This glossary defines key terms in Artificial Intelligence, Machine Learning, Data Science, Software Engineering, Finance and Science.

1

1q43-44 chromosome

biology science

The region 1q43-44 on chromosome 1 is associated with a syndrome characterized by intellectual disability, speech delays, and distinctive facial features. Deletions or duplications in this region can lead to a range of developmental issues, including microcephaly, corpus callosum abnormalities, and seizures.

1q43-q44 Deletion/Duplication Syndrome

biology science syndrome

  • Intellectual Disability: Moderate to severe intellectual disability is a hallmark of this syndrome.
  • Speech Delay: Limited or absent speech is common.
  • Facial Dysmorphism: Characteristic facial features include a round face, flat nasal bridge, prominent forehead, hypertelorism (widely spaced eyes), epicanthal folds, and low-set ears.
  • Corpus Callosum Abnormalities: Abnormalities of the corpus callosum (a part of the brain connecting the two hemispheres) are frequently observed.
  • Other Features: Hypotonia (low muscle tone), poor growth, seizures, and microcephaly (small head size) can also occur.
  • Variability: The severity of symptoms can vary greatly depending on the size and location of the deletion or duplication, and the specific genes involved.

A

Accuracy

machinelearning datascience DataAnalytics metrics

Percentage of observations that are correctly predicted.
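As a rough illustration, accuracy can be computed as the share of predictions that match the true labels. The function name and the sample labels below are made up for this sketch.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 3 of the 4 predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Multiply the result by 100 to express it as a percentage.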

Activation Functions

machinelearning artificialintelligence

Functions that enable neural networks to learn non-linear relationships between features and the label.

Popular activation functions include ReLU and Sigmoid.

For more details, refer to Google Developers ML Course and Keras Activations. Alternatively, refer to my notes on Activation Functions.
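A minimal sketch of the two activation functions named above, written in plain Python for scalar inputs. Real frameworks apply them element-wise to tensors; the function names here are only illustrative.

```python
import math


def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid: squash any real number into the range (0, 1)."""
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 4))
```

Both functions are non-linear, which is what lets stacked layers model more than a straight-line relationship.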

Adenocarcinoma

medical

Adenocarcinoma is a type of cancer that starts in glandular (secretory) cells—the cells that produce and release fluids like mucus. It can occur in organs such as the lungs, colon, breast, prostate, or pancreas.

Example: The biopsy revealed that the tumor was an adenocarcinoma of the colon.

American Standard Code for Information Interchange (ASCII)

Programming softwareengineering standards datascience machinelearning

ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding standard developed in the 1960s to represent 128 English characters, including control characters, digits, uppercase/lowercase letters, and punctuation. It maps binary numbers (0–127) to specific symbols, with 95 printable characters, forming the basis for modern character sets like UTF-8.

Key Aspects of ASCII:

  • Characters Included: The 128 characters consist of 33 non-printing control characters (e.g., newline, tab) and 95 printable characters (digits, English letters, and symbols).
  • Common Values: ‘A’ is 65, ‘a’ is 97, ‘0’ is 48, and a space is 32.
  • Binary Representation: ASCII uses 7 bits per character (values 0–127), though each character is usually stored in an 8-bit byte.
  • Structure: It is organized into blocks: Control Characters (0–31, plus DEL at 127), Special Characters and Digits (32–64), Uppercase Letters (65–90), and Lowercase Letters (97–122), with the remaining punctuation at 91–96 and 123–126.
  • Legacy: While heavily influential and still used, ASCII has largely been succeeded by UTF (specifically UTF-8), which includes the first 128 ASCII characters while supporting far more characters.

Common ASCII Examples:

  • 0–9: 48–57
  • A–Z: 65–90
  • a–z: 97–122
  • Space: 32
  • Control Characters: 0–31 (e.g., NUL, BEL, LF, CR)

ASCII remains fundamental to computing, especially for programming and communication protocols, even though its limitations necessitated the shift to more extensive encoding systems.
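The code values listed above can be checked with Python's built-in `ord`. The helper name `ascii_code` is an illustrative choice, not part of any standard API.

```python
def ascii_code(ch):
    """Return the ASCII code of a single character, or raise if it is not ASCII."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


# Print each character with its decimal code and 7-bit binary form.
for ch in ["A", "a", "0", " "]:
    print(repr(ch), ascii_code(ch), format(ascii_code(ch), "07b"))

# UTF-8 encodes every ASCII character as the same single byte.
print("A".encode("utf-8") == "A".encode("ascii"))
```

The last line illustrates why UTF-8 is backward compatible with ASCII text.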

Amortize

finance

Gradually write off the initial cost of an asset over a period. In other words, spread the cost of an (intangible) asset over its useful life.

Intangible assets include patents, copyrights, and other intellectual property (IP).

Example: Imagine your business has purchased a patent for $10,000 with a useful life of 5 years. The annual amortization is $10,000 / 5 = $2,000.
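The arithmetic in the example can be sketched as straight-line amortization. This assumes the divisor 5 is the useful life in years and that the cost is spread evenly; the function names are hypothetical.

```python
def annual_amortization(cost, useful_life_years):
    """Straight-line amortization: equal expense in every year."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return cost / useful_life_years


def amortization_schedule(cost, useful_life_years):
    """Return (year, expense, remaining book value) for each year."""
    expense = annual_amortization(cost, useful_life_years)
    return [
        (year, expense, cost - expense * year)
        for year in range(1, useful_life_years + 1)
    ]


for row in amortization_schedule(10_000, 5):
    print(row)
```

By the final year the remaining book value reaches zero, which is the point of writing the cost off gradually.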

Anagram

Programming Coding

An anagram is a word or phrase made by rearranging the letters of another word or phrase, using all the original letters exactly once to form a new, meaningful word or phrase, like “listen” becoming “silent,” or “a gentleman” becoming “elegant man”. It’s a popular form of wordplay, used in puzzles, literature, and humor to create hidden meanings or witty connections.

Key characteristics

  • Rearrangement: The core of an anagram is changing the order of letters.
  • All letters used: Every letter from the original must be used in the new word/phrase.
  • Meaningful result: The new arrangement must form a real word or phrase, not just gibberish.

Examples

  • Word to Word: “Triangle” and “Integral”.
  • Phrase to Phrase: “The eyes” and “They see”.
  • Famous Example: “William Shakespeare” becomes “I’ll make a wise phrase”.
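A small sketch of how a program might test the rearrangement rule: two texts are anagrams if they contain exactly the same letters with the same counts. It ignores case, spaces, and punctuation, and it cannot judge whether the result is meaningful. The function names are illustrative.

```python
from collections import Counter


def letter_counts(text):
    """Count letters only, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def is_anagram(first, second):
    """True if both texts use exactly the same letters the same number of times."""
    return letter_counts(first) == letter_counts(second)


print(is_anagram("listen", "silent"))
print(is_anagram("William Shakespeare", "I'll make a wise phrase"))
print(is_anagram("listen", "listens"))
```

Sorting the letters of both strings and comparing them is an equally common approach; counting avoids the sort.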

Anecdote

linguistics

A short amusing or interesting story about a real incident or person.

Example: He told anecdotes about his job.

Antibodies

medical

Antibodies are proteins produced by the immune system that recognize and attach to foreign substances like bacteria, viruses, or toxins to help destroy them. They are a vital part of your immune defense system.

Example: After getting vaccinated, your body produces antibodies to fight off the virus in the future.

Synonyms / Related Terms:

  • Immunoglobulins (technical term)
  • Disease-fighting proteins
  • Immune proteins

Architecture

machinelearning artificialintelligence

The skeleton of the model — the definition of each layer and each operation that happens within the model.

Example: BERT is an architecture while bert-base-cased, a set of weights trained by the Google team for the first release of BERT, is a checkpoint. However, one can say “the BERT model” and “the bert-base-cased model.”

Alpha

finance

The extra return the (active) fund manager can generate over the (passive) index.

Apex

biology

(Biology) Tip or bottom point of the heart.

Arxiv

technology science ResearchPaper research openaccess

arXiv is a curated research-sharing platform open to anyone. It is a pioneer in digital open access.

Extract from Wikipedia: arXiv (pronounced as “archive”; the X represents the Greek letter chi ⟨χ⟩) is an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, but not peer reviewed. It consists of scientific papers in the fields of mathematics, physics, astronomy, electrical engineering, computer science, quantitative biology, statistics, mathematical finance, and economics, which can be accessed online.

Asset class

finance

3 common asset classes are:

  1. Debt: Umbrella term for all financial products that are based on borrowing.
  2. Equity: Ownership of a business and the risk it brings, either directly (through stocks) or indirectly (through mutual funds).
  3. Real assets: Can be physically seen. Gold and real estate fall into this category.

Asset Management Company (AMC)

finance

Assets Under Management (AUM)

finance

Attention-Deficit/Hyperactivity Disorder

medical biology

ADHD (Attention-Deficit/Hyperactivity Disorder) is considered a form of neurodivergence, falling under the neurodiversity umbrella alongside conditions like autism, dyslexia, and dyspraxia, as it involves natural variations in brain function and information processing that differ from the “neurotypical” norm.

Augment

datascience machinelearning LLM


B

Bagging

machinelearning

Bagging is short for Bootstrap Aggregation, where

  • Bootstrap = sampling with replacement to create diverse training sets
  • Aggregation = combining model predictions by averaging (regression) or majority voting (classification)
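The two steps can be sketched in plain Python. The base learner here is deliberately trivial (it always predicts the mean target of its training set) and is only a stand-in; in practice each model would be something like a decision tree. All names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Bootstrap: draw len(rows) rows with replacement."""
    return rng.choices(rows, k=len(rows))


def aggregate_regression(predictions):
    """Aggregation for regression: average the model outputs."""
    return mean(predictions)


def aggregate_classification(predictions):
    """Aggregation for classification: majority vote."""
    return Counter(predictions).most_common(1)[0][0]


def fit_mean_model(rows):
    """Toy base learner: ignores x and predicts the mean y of its sample."""
    average = mean(y for _, y in rows)
    return lambda x: average


def bagged_prediction(train, x, n_models=25, seed=0):
    """Train one model per bootstrap sample, then average their predictions."""
    rng = random.Random(seed)
    models = [fit_mean_model(bootstrap_sample(train, rng)) for _ in range(n_models)]
    return aggregate_regression([model(x) for model in models])


train = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)]
print(bagged_prediction(train, x=1.5))
print(aggregate_classification(["cat", "dog", "cat"]))
```

Because every model sees a slightly different sample, their errors partly cancel when combined.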

base64

Programming softwareengineering

Basic Healthcare Sum (BHS)

singapore retirement

"The Basic Healthcare Sum (BHS) is the estimated savings you need in your MediSave Account for your basic subsidised healthcare needs in old age. It is the maximum amount you can have in your MediSave Account (MA)"

For those who turn 65 in 2025, the BHS is fixed at S$75,500 for the rest of our lives – even though the actual BHS will continue to increase for subsequent cohorts.

The BHS of S$75,500 in 2025 is 5.6% higher than the BHS of S$71,500 set in 2024. In turn, the 2024 BHS of S$71,500 was 4.4% higher than the S$68,500 BHS level set in 2023.

The Basic Healthcare Sum (BHS) has generally increased by around 5% each year – slightly higher than the 4% per annum interest we receive on our MediSave balances.

Assuming the BHS increases 5% yearly, in 35 years (in 2060) it would become roughly S$416,000 (S$75,500 × 1.05^35).
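The projection is plain compound growth. The sketch below uses the 2025 BHS and the 5% growth assumption stated above; the 4% line reuses the MediSave interest rate mentioned earlier for comparison. The function name is illustrative.

```python
def project_value(current, annual_growth, years):
    """Compound a value forward: current * (1 + growth) ** years."""
    return current * (1 + annual_growth) ** years


bhs_2025 = 75_500
bhs_2060 = project_value(bhs_2025, 0.05, 35)
same_sum_at_4_percent = project_value(bhs_2025, 0.04, 35)

print(f"Projected BHS in 2060 at 5% growth: S${bhs_2060:,.0f}")
print(f"S$75,500 compounded at 4% interest: S${same_sum_at_4_percent:,.0f}")
```

The gap between the two figures shows how a one-point difference in the yearly rate compounds over 35 years.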

Bewildered

linguistics

Perplexed, lost and confused; very puzzled

Example: Bilbo Baggins was bewildered and bewuthered

Bilateral ureteral obstruction

medical

In patients with prostate cancer, bilateral ureteral obstruction means that both kidneys are at risk because urine cannot drain properly. It’s a serious complication that needs urgent management to prevent kidney failure.

Bilateral ureteral obstruction in patients with prostate cancer means that both ureters—the tubes that carry urine from the kidneys to the bladder—are blocked, usually because the cancer is compressing or invading them.

What causes this in prostate cancer? The prostate is located just below the bladder, and when prostate cancer:

  • Grows large
  • Invades nearby tissues
  • Or spreads to lymph nodes

…it can press against or invade the ureters, especially where they pass close to the prostate.

Why is this dangerous?

  • Urine backs up into the kidneys, causing hydronephrosis (kidney swelling)
  • Leads to kidney damage or failure if untreated
  • Can cause fatigue, nausea, flank pain, and low urine output

Treatment options:

  • Ureteral stents (small tubes inserted to keep ureters open)
  • Nephrostomy tubes (drains urine directly from kidneys through the back)
  • Treating the underlying cancer (e.g. hormone therapy, radiation)

BLEU

machinelearning artificialintelligence LLM metrics

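BLEU (Bilingual Evaluation Understudy) measures the quality of machine-translated text by comparing it with reference translations. It is similar to Precision: it counts how many n-grams of the candidate also appear in the references.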
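To make the link to precision concrete, here is a sketch of the clipped n-gram precision that BLEU is built on. It is only one component: full BLEU also combines several n-gram orders and applies a brevity penalty, which this sketch leaves out. Tokenization by whitespace and all function names are simplifying assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Share of candidate n-grams found in the references, with clipped counts."""
    candidate_counts = Counter(ngrams(candidate.split(), n))
    if not candidate_counts:
        return 0.0
    # For each n-gram, the most times it appears in any single reference.
    max_reference_counts = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference.split(), n)).items():
            max_reference_counts[gram] = max(max_reference_counts[gram], count)
    clipped = sum(
        min(count, max_reference_counts[gram])
        for gram, count in candidate_counts.items()
    )
    return clipped / sum(candidate_counts.values())


# Repeating a common word does not help: only 2 of the 7 tokens are credited.
print(clipped_precision("the the the the the the the", ["the cat is on the mat"]))
```

Clipping is what stops a candidate from scoring well by repeating a word that happens to appear in the reference.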

Bombay Stock Exchange (BSE)

finance india

Breakthrough Device Designation (by FDA)

regulatory medical

This is not a full approval but a special status granted to devices that:

  • Provide more effective diagnosis or treatment for serious/life-threatening diseases
  • Have the potential to address unmet medical needs

It gives:

  • Faster review process
  • Closer communication with the FDA
  • Priority support during development and review

In March 2019, Paige.AI was granted the breakthrough device designation by the US Food and Drug Administration (FDA) for its AI system in cancer diagnosis, meaning the FDA saw the system as high-potential and innovative.

Bootstrap

machinelearning

In the ML context, it means creating multiple training sets by resampling with replacement from the original dataset.


C

Canonical

softwareengineering Coding

Conforming to a rule, standard, or accepted principle, often referring to official, authoritative or standard form in literature, religion, science, or computing.

Capillary Refill Time (CRT)

biology science

A quick, physical test to objectively measure Perfusion (i.e., good blood circulation). It’s a brilliant and simple way to check how well the circulatory system is working.

  • How the Test is Done: The doctor will press firmly on a capillary-rich area, usually the child’s fingernail bed or the skin over their sternum (breastbone), for a few seconds. This pressure forces the blood out of the tiny blood vessels (capillaries) in that spot, causing it to turn white.
  • What is Measured: The doctor then releases the pressure and counts how long it takes for the normal pink colour to return to the area. This is the “refill time.”
  • What it Means: A time of less than 2 seconds (<2sec) is the goal. It shows that the circulatory system is responsive and that blood pressure is strong enough to quickly push blood back into those tiny vessels. It’s a sign of a healthy, well-hydrated, and robust cardiovascular system. If the time were longer (e.g., 3, 4, or 5 seconds), it could be a sign of issues like dehydration or more serious circulatory problems.

Capital appreciation

finance

Capstone

TBA

Cardiomyopathy

science biology

Cardiomyopathy (kahr-dee-o-my-OP-uh-thee) is a disease of the heart muscle. It causes the heart to have a harder time pumping blood to the rest of the body, which can lead to symptoms of heart failure. Cardiomyopathy also can lead to some other serious heart conditions.

There are various types of cardiomyopathy. The main types include dilated, hypertrophic and restrictive cardiomyopathy. Treatment includes medicines and sometimes surgically implanted devices and heart surgery. Some people with severe cardiomyopathy need a heart transplant. Treatment depends on the type of cardiomyopathy and how serious it is.

Source: https://www.mayoclinic.org/diseases-conditions/cardiomyopathy/symptoms-causes/syc-20370709

Causal

datascience

Relating to or acting as a cause.

Example: Causality, Causal Effect

Pronunciation: kaw·zl

Causality

datascience

The relationship between cause and effect.

Example: Causality, Causal Effect

Pronunciation: kaw·za·luh·tee

Central Provident Fund (CPF)

finance singapore

Civil Aviation Authority of Singapore (CAAS)

singapore aviation

Common Vulnerabilities and Exposures (CVEs)

security softwareengineering

A standardized, industry-recognized dictionary of publicly known cybersecurity flaws in software, hardware, and firmware.

Comprehensive

  • Including or dealing with all or nearly all elements or aspects of something.
  • of large scope; covering or involving much; inclusive

Concave

Concave function

  • A concave function is one where the line segment connecting any two points on the graph of the function lies below or on the graph.

Conformité Européenne (CE Mark)

regulatory

  • The CE mark is required for medical devices (including software) sold in the European Economic Area (EEA).
  • It means the device meets EU safety, health, and environmental protection standards.

A CE mark allows you to legally market a device in Europe.

Conspiracy

linguistics

A secret plan by a group to do something unlawful or harmful.

Similar words: plot, scheme, plan

Container

technology DevOps

For more details, refer to Docker.

Convex

Convex function

  • A convex function is one where the line segment connecting any two points on the graph of the function lies above or on the graph.
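The chord definitions of convex and concave functions can be probed numerically. The sketch below samples point pairs and checks whether the function ever rises above the connecting line segment. A passing check on samples is evidence, not proof; the function names and tolerance are illustrative choices.

```python
def is_convex_on(f, points, steps=10, tol=1e-9):
    """Sample check: f never lies above the chord between any two sample points."""
    for a in points:
        for b in points:
            for k in range(steps + 1):
                t = k / steps
                x = t * a + (1 - t) * b
                chord = t * f(a) + (1 - t) * f(b)
                if f(x) > chord + tol:
                    return False
    return True


def is_concave_on(f, points, steps=10, tol=1e-9):
    """f is concave exactly when -f is convex."""
    return is_convex_on(lambda x: -f(x), points, steps, tol)


samples = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(is_convex_on(lambda x: x ** 2, samples))     # x^2 is convex
print(is_concave_on(lambda x: -(x ** 2), samples))  # -x^2 is concave
print(is_convex_on(lambda x: x ** 3, samples))     # x^3 is neither on this range
```

For a convex function the chord sits above or on the graph, so any sampled point that pokes above the chord disproves convexity.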

Coarse

linguistics

Rough or harsh in texture or structure, inferior quality. Can also refer to a person or their speech being rude or vulgar.

Example: The pointwise evaluation prompt approach might be coarse for our system.

Similar words: rough, bristly, scratchy, rude

Opposite words: fine

Pronunciation: kaws, kors

Cortex

medical

In anatomy, cortex refers to the outer layer of an organ or body part. It’s often used to describe the outer layer of the brain (cerebral cortex), but also applies to other organs like the kidney (renal cortex) and adrenal glands. The cortex plays a crucial role in the function and protection of the organ it surrounds.

Here’s a more detailed breakdown:

  • General Definition:
    • The cortex is essentially the “bark” or outer layer of an organ, providing a protective and functional role.
  • Cerebral Cortex:
    • This is the outer layer of the cerebrum, the largest part of the brain. It’s responsible for higher-level functions like thinking, memory, and language.
  • Other Cortices:
    • The term “cortex” is also used for the outer layer of other organs like the kidney (renal cortex), adrenal glands (adrenal cortex), and even the outer layer of a hair.
  • Plant Cortex:
    • In botany, the cortex refers to the tissue between the epidermis (outer layer) and the vascular tissue in stems and roots.
  • In Technology:
    • Often implies a core processing unit or brain-like system.

Consumer Price Index (CPI)

finance

Designed to measure the average price changes of a fixed basket of consumption goods and services commonly purchased by resident households over time. It is widely used as a measure of consumer price inflation.

Singapore CPI increased 2.4% in 2024. For more details, refer to the Singapore Consumer Price Index by the Department of Statistics.

International CPI increased 5.7% in 2023. For more details, refer to the IMF report.

Crony Capitalism

finance economics

Crony capitalism is a corrupt economic system where businesses gain success through close relationships with political leaders, not merit, securing unfair advantages like subsidies, tax breaks, and favorable contracts, distorting markets and increasing inequality. It’s characterized by collusion between the business and political classes, leading to bailouts, regulatory favoritism, and corruption, ultimately stifling innovation, harming public trust, and favoring special interests over fair competition.

curl

Programming softwareengineering

Command line tool and library to make HTTP requests and fetch content (i.e., transfer data) from URLs.

For more details, refer to the official documentation.


D

Daemon (computing)

Programming softwareengineering

In computing, a daemon is a program that runs as a background computer process, rather than being under the direct control of an interactive user.

In the context of computing, the word is generally pronounced either as /ˈdiːmən/ DEE-mən or /ˈdeɪmən/ DAY-mən.

Docker

A system that packages applications and their dependencies into containers for consistent execution across various machines.

Docker’s benefits include consistency, portability, and isolation.

For more details, refer to Docker.

De Novo Clearance (by FDA)

regulatory medical

This is an FDA regulatory pathway used to approve low- to moderate-risk medical devices that:

  • Are novel (new to the market)
  • Have no predicate device (i.e., nothing similar already approved)

This pathway allows companies to bring new and innovative medical devices to market.

Once granted:

  • The product is fully cleared for use
  • A new device type is created, which future similar products can reference

Paige Prostate Detect (PPD) received de novo clearance (DEN200080) from the FDA in September 2021. It became the first FDA-cleared AI software for detecting prostate cancer in digital slides.

Domain Name System (DNS)

technology network

DNS is the internet’s phonebook, translating human-friendly website names (like google.com) into numerical IP addresses (like 142.250.186.46) that computers use to find and connect to each other online. It allows users to type memorable domain names instead of long strings of numbers, making the internet navigable by mapping text to the correct server locations.

Key Functions

  • Translation (Resolution): When you type a URL, DNS servers look up the domain name and return its corresponding IP address, directing your browser to the right server.
  • Directory: It manages the vast database of domain names and their associated IP addresses, ensuring every site has a unique numerical identifier.
  • Hierarchical & Distributed: Instead of one giant phonebook, DNS uses a network of servers worldwide, making lookups fast and efficient.
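A minimal sketch of the translation step using Python's standard `socket` module, which hands the lookup to the operating system's resolver. The helper name `resolve` is illustrative, and the addresses returned for a public name depend on your network and on when you run it.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# "localhost" resolves without any network access.
print(resolve("localhost"))

# With network access, a public name can be looked up the same way:
# print(resolve("google.com"))
```

A name may map to several addresses (IPv4 and IPv6), which is why the function returns a list.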

Debt

finance

Debt financial products

finance

Products that usually give us an assured return. Examples include:

  • FD
  • Corporate deposits
  • Bond
  • Provident Fund
  • Public Provident Fund (PPF)

The core of each product is a loan. The higher the return it promises, the higher the risk of non-payment of both our investment and the interest.

Debt mutual funds

finance

Delineate

linguistics

Describe or portray (something) precisely. Indicate the exact position of (a border or boundary)

Example: He delineated the state of Texas on the map with a red pencil

Similar words: describe, depict

Direct Preference Optimization (DPO)

datascience artificialintelligence LLM AGI

Direct preference optimization (DPO) is a new method that helps large, unsupervised language models better match human preferences using a simple classification approach.

Diastolic

Systolic and diastolic are two fundamental terms that describe the two main phases of a single heartbeat.

Think of the heart as a muscular pump. For every beat, it has to squeeze and then relax.

Diastolic (The Relaxing Phase)

  • What it is: Diastole is the part of the heartbeat when the heart muscle relaxes.
  • What it does: After squeezing, the heart chambers relax and expand to refill with blood, getting ready for the next contraction. This relaxation phase is just as important as the contraction because if the heart doesn’t fill properly, it can’t pump properly. It’s also during diastole that the coronary arteries deliver oxygen-rich blood to the heart muscle itself.
  • In a Blood Pressure Reading: It represents the pressure in your arteries when the heart is at rest between beats.

Distant metastasis

medical

In patients with prostate cancer, distant metastasis means the disease has spread far from the prostate and is now considered advanced or metastatic prostate cancer. It requires systemic treatment and close monitoring.

In prostate cancer, distant metastasis means that the cancer has spread beyond the pelvis to organs or tissues that are far from the prostate, such as:

  • Bones (especially the spine, ribs, hips, pelvis, and long bones)
  • Lungs
  • Liver
  • Distant lymph nodes (outside the pelvic area)

How does it happen? Prostate cancer cells can:

  • Enter the bloodstream or lymphatic system
  • Travel to distant organs
  • Form new tumors at those sites

Staging context:

  • Distant metastasis corresponds to Stage IV (advanced stage) prostate cancer
  • In TNM staging, it’s often labeled as M1 (M = metastasis)

Distribution Cost

insurance

In the context of Insurance,

  • Distribution cost = basically sales commissions + expenses the insurer pays to the financial adviser/distribution channel.
  • They are front-loaded in the first few years.
  • This is why if you surrender/cancel early, you usually get no refund — because most of what you paid went to distribution costs, not benefits.

For example

  • Year 1: 109% → The first year, distribution cost is more than your first year’s premium (insurer absorbs the excess, but it means almost all of what you pay goes to cover commissions/admin).
  • Year 2: 55% → Over half of your second year’s premium is still distribution cost.
  • Year 3: 30%,
  • Year 4: 25%,
  • Year 5: 15%,
  • Year 6: 15% → Gradually declines.
  • Year 7: 0% → No more commissions.
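To see how front-loaded these figures are, the sketch below adds up the yearly percentages from the example. It assumes the same premium is paid every year, so the premium amount cancels out and only the percentages matter. The names are hypothetical.

```python
# Distribution cost as a percentage of that year's premium, from the example above.
DISTRIBUTION_COST_PCT = {1: 109, 2: 55, 3: 30, 4: 25, 5: 15, 6: 15, 7: 0}


def cumulative_cost_share(years_paid, schedule=DISTRIBUTION_COST_PCT):
    """Fraction of all premiums paid so far that went to distribution cost."""
    total_cost_pct = sum(schedule[year] for year in range(1, years_paid + 1))
    total_premium_pct = 100 * years_paid  # level premium assumed
    return total_cost_pct / total_premium_pct


for years in (1, 2, 3, 7):
    print(years, f"{cumulative_cost_share(years):.0%}")
```

After two years, 82% of everything paid has gone to distribution cost, which is why an early surrender usually returns little or nothing.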

Dragon in a pinch

linguistics idiom

The phrase “as fierce as a dragon in a pinch” is an idiom, meaning someone can be surprisingly fierce or capable in a difficult situation, even if they don't usually appear that way


E

Ecological Fallacy

datascience

Occurs when one draws conclusions about individuals based solely on group-level data.

Example: Ecological Fallacy

Ellipsis

linguistics

A punctuation mark consisting of a series of three dots. An ellipsis can be used in many ways, such as for intentional omission of text or numbers.

Example: A set of dots (…) indicating an ellipsis.

Emanate

(of a feeling, quality, or sensation) issue or spread out from (a source).

Similar words: emerge, come out, originate, arise, start

Empirical

datascience

Based on, concerned with, or verifiable by observation or experience rather than theory or pure logic.

Example: They provided considerable empirical evidence to support their argument.

Opposite words: theoretical, non-empirical

Ephemeral

datascience linguistics

Lasting for a very short time.

Similar words: transitory, transient, fleeting, passing, short-lived

Epigastric

biology science disease

Epigastric pain or burning: …

Equated monthly instalments (EMI)

finance

Equity

finance

Equity Linked Savings Scheme (ELSS)

finance india

An equity fund that gets the tax benefit.

Espionage

technology GAI LLM machinelearning datascience

The practice of spying or of using spies, typically by governments to obtain political and military information

Exchange-traded fund (ETF)

finance

Tracks an index like the Sensex, but also lists its units on a stock exchange, unlike a mutual fund.

Expense ratio

finance

The fees that a mutual fund charges investors for its costs and the profit it makes.

Ex gratia

(of a payment) given as a favor or from a sense of moral obligation rather than because of any legal requirement.

Similar words: out of grace, by favor, out of good will, out of kindness.


F

Facebook AI Similarity Search (FAISS)

LLM Embeddings AI machinelearning datascience

To be added soon.

Fallacy

linguistics

A mistaken belief, especially one based on unsound arguments. An idea which many people believe to be true, but which is in fact false because it is based on incorrect information or reasoning.

Example: The notion that the camera never lies is a fallacy.

Similar words: misconception, misbelief, delusion

Pronunciation: fa·luh·see

Face value

finance

The value written on a financial instrument, like a bond or a stock.

Face value is the “official” value, not necessarily the “market” value.

👉 Example: A bond might have a face value of 1,000, which is the amount repaid at maturity.

Fast Moving Consumer Goods (FMCG)

finance

FDA (Food and Drug Administration, USA)

regulatory

The regulatory agency in the United States that approves drugs, medical devices, and diagnostics.

Fidelity

linguistics

The degree of exactness with which something is copied or reproduced.

Financial Assets

finance

Debt, and Equity

Fixed Deposit (FD)

finance

Fixed obligation-to-income ratio (FOIR)

finance

Floating Point

softwareengineering Coding datascience machinelearning LLM gpu

A floating-point number is a way to represent real numbers in computers, similar to scientific notation in base 10, but using base 2 (binary). It is composed of three main parts:

  1. Sign bit (1 bit): This single bit specifies the number’s sign — 0 means positive, and 1 means negative.
  2. Exponent bits (e.g., 8 bits in float32): These bits represent the exponent, which shifts the position of the binary point (like the decimal point) to scale the number. The exponent is stored with a “bias” (a fixed offset) to allow representation of both positive and negative exponents. This part determines the magnitude or scale of the number.
  3. Mantissa (or significand) bits (e.g., 23 bits in float32): These bits represent the significant digits (precision) of the number. The mantissa encodes the fractional part of the number after normalizing it to have a leading 1 (in normalized form, this leading 1 is implicit and not stored explicitly).

The value of a (normalized) floating-point number is calculated as (−1)^sign × (1 + mantissa fraction) × 2^(exponent − bias), where:

  • The sign bit controls the positive/negative.
  • The exponent bits, after subtracting the bias, give the power of 2.
  • The mantissa bits represent fractional precision after the binary point.

For example, in the IEEE 754 single precision (float32) format:

  • 1 bit for sign
  • 8 bits for exponent (with a bias of 127)
  • 23 bits for mantissa

This means the exponent value stored is an 8-bit unsigned integer, and the actual exponent is obtained by subtracting 127. The 23 mantissa bits represent the fractional part after the implicit leading 1.

This structure allows a wide dynamic range of numbers, from very tiny to very large, with a controlled precision depending on the mantissa length.

In summary:

  • The sign bit decides positive or negative.
  • The exponent bits shift the number’s scale by powers of two.
  • The mantissa bits encode the detailed digits or precision of the number after scaling.

This floating-point format makes it efficient to represent and compute on real numbers in computers, balancing between range and precision.

For more details, refer to Data Types
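The three float32 fields can be inspected with Python's standard `struct` module. This is a sketch for normalized numbers only (it does not handle zero, subnormals, infinity, or NaN), and the function names are illustrative.

```python
import struct


def decompose_float32(x):
    """Split x (rounded to float32) into its sign, exponent, and mantissa fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # 23 bits, fraction after the implicit leading 1
    return sign, exponent, mantissa


def reconstruct_float32(sign, exponent, mantissa):
    """Rebuild the value of a normalized float32 from its three fields."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)


sign, exponent, mantissa = decompose_float32(-0.15625)
print(sign, exponent, format(mantissa, "023b"))
print(reconstruct_float32(sign, exponent, mantissa))
```

For -0.15625 the stored exponent is 124, so the actual exponent is 124 - 127 = -3, and the value is -1.25 × 2^-3.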

Floating Point Operations Per Second (FLOPS)

softwareengineering Coding datascience LLM gpu

A unit of compute performance: the number of floating-point operations a system can perform per second.

For more details, visit FLOPS

Flustered

linguistics

agitated or confused

Example: Bilbo Baggins had been too flustered to put down on his Engagement Tablet.

Fright

linguistics

A sudden intense feeling of fear.

Functional Gastrointestinal Disorders (FGIDs)

science medical disease

F1 score

machinelearning datascience metrics

Score that is a function (harmonic mean) of precision and recall.

F1 score = 2 * TP / (2 * TP + FP + FN) = 2 * PPV * TPR / (PPV + TPR), where PPV is precision and TPR is recall.
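Both forms of the formula can be checked against each other in a few lines. The confusion-matrix counts below are made up for illustration, and the function names are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_from_rates(precision, recall):
    """F1 as the harmonic mean of precision (PPV) and recall (TPR)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


tp, fp, fn = 8, 2, 4            # hypothetical counts
precision = tp / (tp + fp)      # PPV
recall = tp / (tp + fn)         # TPR

print(f1_from_counts(tp, fp, fn))
print(f1_from_rates(precision, recall))
```

The two printed values agree up to floating-point rounding, since the second expression is an algebraic rearrangement of the first.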


G

Genitourinary

medical

Genitourinary refers to the organs of the reproductive and urinary systems. It combines “genital” (related to reproduction) and “urinary” (related to urine and the urinary tract).

Example: The patient was referred to a genitourinary specialist for evaluation of kidney and bladder issues.

Similar words: Urogenital, Reproductive and urinary

Genitourinary Pathologists

medical

What do they do?

  • Diagnoses cancers (e.g., prostate cancer, kidney cancer, bladder cancer)
  • Identifies non-cancerous conditions (like infections, inflammation, or benign tumors)
  • Works closely with urologists and oncologists to guide treatment decisions

Government Technology Agency (GovTech)

singapore technology

GPT-Generated Unified Format (GGUF)

machinelearning deeplearning LLM

GGUF is a binary format designed for efficient loading and inference of large language models (LLMs) on various hardware, particularly CPUs and consumer GPUs, using tools like llama.cpp and Ollama. The “Unified” part reflects its goal to provide a standardized, extensible, and efficient format that unifies model weights, metadata, and other necessary components into one fast-loading binary format optimized for inference and deployment.

Key points about GGUF:

  • It succeeded earlier formats like GGML and was developed to overcome limitations around storage efficiency, loading speed, and cross-platform compatibility for LLMs.
  • GGUF is optimized for quick loading and inference, especially on consumer-grade hardware such as local PCs or servers.
  • It supports advanced compression and quantization techniques to reduce model size with limited loss in output quality.
  • The format is extensible, allowing new features and metadata to be added without breaking compatibility.
  • GGUF is tightly integrated with projects like llama.cpp and supported by tools in the broader open-source LLM ecosystem, such as Hugging Face Transformers.

Group Relative Policy Optimization (GRPO)

LLM datascience AGI artificialintelligence

A reinforcement learning algorithm designed to train large language models (LLMs) for complex tasks like solving math problems or writing code. Unlike older methods, GRPO is memory-efficient because it doesn’t use a separate “value function” (a model that estimates future rewards). Instead, it generates multiple answers for each question, scores them with a reward model, and uses the average score as a reference to decide which answers are better. This makes it easier to train large models on limited hardware while still performing well on tough tasks like reasoning.

Resources: https://aiengineering.academy/LLM/TheoryBehindFinetuning/GRPO/
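The "average score as a reference" idea can be sketched as a group-relative advantage: each answer's reward minus the group mean. Dividing by the group's standard deviation is an assumption added here (a normalization commonly described for GRPO), and the reward values are invented. This shows only the scoring step, not the full training loop.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each answer relative to the other answers to the same question."""
    baseline = mean(rewards)    # the group average replaces a learned value function
    spread = pstdev(rewards)    # assumed normalization by the group's std deviation
    return [(reward - baseline) / (spread + eps) for reward in rewards]


# Hypothetical reward-model scores for four answers to one question.
rewards = [0.2, 0.9, 0.5, 0.4]
for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
    print(reward, round(advantage, 3))
```

Answers with a positive advantage beat the group average and are reinforced; those with a negative advantage are discouraged.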

Gut-brain axis dysregulation

biology science disease


H

Hearth

linguistics

Place in a home where a fire is or was traditionally kept for home heating and for cooking. Fireplace or floor of the fireplace.

High Bandwidth Memory (HBM)

gpu LLM machinelearning artificialintelligence

Hypertrophic cardiomyopathy

biology science syndrome disease

Hypertrophic cardiomyopathy (HCM) is a disease in which the heart muscle becomes thickened, also called hypertrophied. The thickened heart muscle can make it harder for the heart to pump blood.

Many people with hypertrophic cardiomyopathy don’t realize they have it. That’s because they have few, if any, symptoms. But in a small number of people with HCM, the thickened heart muscle can cause serious symptoms. These include shortness of breath and chest pain. Some people with HCM have changes in the heart’s electrical system. These changes can result in life-threatening irregular heartbeats or sudden death.

Sources: https://www.mayoclinic.org/diseases-conditions/hypertrophic-cardiomyopathy/symptoms-causes/syc-20350198

Hypotonia

biology science disease

Hypotonia means decreased muscle tone. It can be a condition on its own, called benign congenital hypotonia, or it can be indicative of another problem where there is progressive loss of muscle tone, such as muscular dystrophy or cerebral palsy. It is usually detected during infancy.


I

Immunohistochemistry (IHC)

medical

Immunohistochemistry (IHC) is a lab technique used to detect specific proteins in tissue samples using antibodies.

It’s a key method in diagnosing cancers, infections, and autoimmune diseases by revealing how cells behave under the microscope.

Example: The immunohistochemistry test confirmed the presence of cancer markers in the tissue sample.

Synonyms/Related terms

  • Immunohistology (synonym)
  • Antibody staining (informal synonym)
  • Biomarker detection
  • Tissue analysis

Break it down:

  • Immuno- = immune system (antibodies)
  • Histo- = tissue
  • Chemistry = reactions

So: Immunohistochemistry = using antibodies in chemistry to study tissues!

For more details, refer to https://www.biomol.com/resources/applications/immunohistochemistry/ and https://oncodaily.com/oncolibrary/immunohistochemistry


Incidence

medical

The occurrence, rate, or frequency of a disease, crime, or other undesirable thing.

Example: An increased incidence of cancer.

Index

Index fund

Ingest

To absorb (information) or take (food, drink, or another substance) into the body by swallowing or absorbing it.

Example: He spent his days ingesting the contents of the library.

Similar words: absorb, consume, eat

Interquartile Range (IQR)

Math Statistics DataAnalytics datascience machinelearning

In statistics, the Interquartile Range (IQR) measures the spread of the middle 50% of your data, calculated as the difference between the third quartile (Q3, 75th percentile) and the first quartile (Q1, 25th percentile): IQR = Q3 - Q1. It indicates the variability of the central part of a dataset, ignoring extreme outliers, and is visualized by the box in a box plot.
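A minimal sketch of the IQR = Q3 - Q1 calculation using Python's standard library. The helper name `interquartile_range` and the sample values are made up for illustration; note that different quartile conventions can give slightly different results on small datasets.

```python
from statistics import quantiles

def interquartile_range(data):
    """Return Q3 - Q1 for a list of numbers."""
    # n=4 splits the data into quartiles; "inclusive" is one of several
    # quartile conventions (an assumption, other methods differ slightly).
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
iqr = interquartile_range(values)
print(iqr)  # 4.0, since Q1 = 3.0 and Q3 = 7.0
```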

Inter-reader vs. Intra-reader variability

| Term | Definition |
| --- | --- |
| Inter-reader variability | Differences between different pathologists (pathologist A vs. B) interpreting the same slide. Used to assess consistency across experts. |
| Intra-reader variability | Differences within the same pathologist (pathologist A vs. A again) interpreting the same slide at different times. Used to test how reliable an individual expert is over time. |

Why it matters:

  • High inter-reader variability can point to ambiguity in diagnostic criteria or training gaps.
  • High intra-reader variability may raise concerns about fatigue, bias, or complexity of the case.
  • Both are critical when validating:
    • AI pathology models
    • Clinical guidelines
    • Consensus diagnoses

Intractable

Hard to control or deal with.

Example: The optimization problem might be intractable to solve.

Similar words: unmanageable, ungovernable, out of control

Investment horizon

The length of time for which we want to invest our money. Also known as “tenor”.


L

Large Language Models Meta AI (Llama)

artificialintelligence LLM OpenSource AGI

According to Wikipedia, Llama (Large Language Model Meta AI) is a family of large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 4, released in April 2025.

For more details, refer to https://github.com/meta-llama/llama-models?tab=readme-ov-file#llama-models-1

Left Ventricle (LV)

biology science

The main pumping chamber of the heart. It sends oxygen-rich blood to the rest of the body.

Left Ventricle Apex Trabeculation

biology science

LV apex trabeculation refers to the presence of prominent, finger-like projections (trabeculae) in the apex (tip) of the left ventricle of the heart. It can be a normal variant, especially in athletes, but excessive trabeculation, particularly when accompanied by a thin compacted myocardial layer, can indicate a condition called LVNC.

Left Ventricle Non-Compaction (LVNC)

LVNC is a type of cardiomyopathy associated with potential complications like heart failure, arrhythmias, and thromboembolism.

Lilliputian

A trivial or very small person or thing.

Example: Lilliput is the name of a fictional island whose people, the Lilliputians, stand only about six inches high.

Loofah or Luffa

A natural, fibrous sponge made from a gourd in the cucumber family, used for exfoliating and cleaning.

Low-Rank Adaptation (LoRA)

  • Instead of fine-tuning the weights of the actual model, we fine-tune small low-rank matrices (a.k.a. adapters, which consist of low-rank matrices).
  • At inference time, these adapters are applied on top of the frozen weights of the base model: the product of the low-rank matrices is added to the base weights. This way, we don’t have to optimise the base model itself.

For more details, refer to LoRA vs. QLoRA.
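A toy sketch of the idea in plain Python, assuming a 3×3 base weight matrix and a rank-1 adapter. The matrices `W`, `A`, `B` and the helper names are hypothetical, and the scaling factor that real LoRA implementations usually apply to the adapter is left out for brevity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Frozen base weights (d x k = 3 x 3): never updated during fine-tuning.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Trainable low-rank adapter with rank r = 1: B is d x r, A is r x k.
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, -1.0]]

delta = matmul(B, A)           # low-rank update, same shape as W
W_effective = add(W, delta)    # base weights and adapter are summed

full_params = len(W) * len(W[0])                   # 9 values to train without LoRA
lora_params = len(B) * len(B[0]) + len(A) * len(A[0])  # 6 values with a rank-1 adapter
```

The saving is small at this size, but for large matrices training r × (d + k) values instead of d × k is the main benefit.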

M

Marginal Utility

theory economics law

The additional satisfaction or benefit a customer gets from consuming one more unit of a good or service.

Diminishing Marginal Utility: The tendency for the satisfaction from each additional unit to decrease as consumption increases.

Example

  • Pizza: The first slice of pizza at a party may be very exciting and bring immense enjoyment. The second slice is still good, but less satisfying than the first, whereas the fifth or sixth slice is not nearly as enjoyable.
  • Money: An extra 100 adds a lot of utility for someone with very little money, but it adds much less to the overall happiness of someone who is already wealthy.

Minimum Detectable Effect (MDE)

Statistics datascience DataAnalytics

In AB Testing & Experimentation and statistical analysis, the Minimum Detectable Effect (MDE) and the p-value are distinct but related concepts used to determine if a change is meaningful and statistically significant.

Key Differences and Relationships

  • Definition of MDE: The smallest change in a metric (e.g., conversion rate) that you want your test to reliably detect. It acts as a “sensitivity dial” for your experiment, set before the test runs.
  • Definition of P-Value: The probability of observing a difference as large as (or larger than) what you saw, assuming there is no actual difference (null hypothesis).
  • The Goal: You want to run a test where the p-value is below your significance threshold (usually 0.05) and the observed effect is at least as large as your MDE.
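Because the MDE is set before the test runs, it largely determines how much traffic the test needs. The sketch below uses one common normal-approximation formula for comparing two conversion rates; the function name, the 80% power default, and the example rates are assumptions for illustration, not values from this entry.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline_rate, mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect an absolute lift of `mde`."""
    p1 = baseline_rate
    p2 = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # assumed statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Hypothetical test: 10% baseline conversion rate.
n_small_effect = sample_size_per_group(0.10, 0.01)  # detect a 1-point lift
n_large_effect = sample_size_per_group(0.10, 0.02)  # detect a 2-point lift
print(n_small_effect, n_large_effect)
```

Halving the MDE roughly quadruples the required sample, which is why the "sensitivity dial" has a real cost.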

Median Absolute Deviation (MAD)

Statistics Math datascience machinelearning

The Median Absolute Deviation (MAD) is a robust measure of statistical dispersion, indicating how spread out a dataset is. It’s calculated by first finding the median of the dataset, then determining the absolute difference between each data point and the median, and finally finding the median of those absolute differences. MAD is particularly useful when dealing with datasets that may contain outliers or have non-normal distributions, as it is less sensitive to extreme values than measures like standard deviation.
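The three steps described above can be sketched with the standard library as follows. The function name and the sample data (which includes one deliberate outlier) are illustrative only.

```python
from statistics import median, pstdev

def median_absolute_deviation(data):
    """Median of the absolute differences from the median."""
    center = median(data)                          # step 1: median of the data
    deviations = [abs(x - center) for x in data]   # step 2: absolute differences
    return median(deviations)                      # step 3: median of those differences

data = [1, 2, 3, 4, 100]  # 100 is an outlier
mad = median_absolute_deviation(data)
spread = pstdev(data)
print(mad, spread)  # the MAD stays small while the standard deviation is inflated
```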

Sources

Metastasis/ Metastasise

To metastasise means for cancer cells to spread from the original (primary) site to other parts of the body through the blood or lymphatic system. This process forms secondary tumors, often in vital organs like the liver, lungs, or bones.

Example:

  • The oncologist explained that the tumor had begun to metastasise to the lungs.
  • The tumour will metastasise if not treated early.

“meta” = beyond and “stasis” = standing/place

💡 So: Metastasise = cancer goes “beyond its place.”

Market Capitalization (Market Cap)

Market cap = No. of shares of company × Price per share.

SEBI defines:

  • A large-cap company is one that ranks within the first 100 companies by market cap on the stock market.
  • A mid-cap company is one that ranks from 101 to 250 by market cap.
  • Small-cap companies are those ranked 251 and below.
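The formula and the rank-based buckets above can be written down as a small sketch. The function names and the example figures are hypothetical; the rank is assumed to be the company's position by market cap, with 1 being the largest.

```python
def market_cap(shares_outstanding, price_per_share):
    """Market cap = number of shares x price per share."""
    return shares_outstanding * price_per_share

def sebi_category(rank):
    """Map a market-cap rank (1 = largest) to the bucket described above."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 100:
        return "large-cap"
    if rank <= 250:
        return "mid-cap"
    return "small-cap"

cap = market_cap(shares_outstanding=1_000_000, price_per_share=250)
print(cap, sebi_category(42), sebi_category(180), sebi_category(600))
```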

Microcephaly

biology science disease

Microcephaly (my-crow-sef-ah-lee) is a birth defect where a baby’s head is smaller than expected. Babies with microcephaly often have smaller brains that might not have developed properly.

Source: https://www.cdc.gov/birth-defects/about/microcephaly.html

Mnemonic

A mnemonic is a memory aid — a tool, trick, rhyme, acronym, or phrase that helps you remember information more easily.

They are super useful for memorizing complex topics, vocabulary, medical terms, or lists.

Mnemonic Tip to Remember “Mnemonic”:

  • Even though it starts with an “m”, it’s silent—just like memory hides the m in “mnemonic!” Think: Mnemonic = Memory aid!

Money Laundering

finance banking

Money Mules

finance banking scam

A money mule is a person who transfers illegally obtained money on behalf of criminals. Mules are often recruited through fake job offers or romance scams: they receive illicit funds in their bank account and forward them elsewhere for a fee or commission, which launders the money and hides the criminal’s identity. The legal consequences for the mule can be severe, including jail time and a criminal record, even if they claim ignorance.

For more details, visit https://www.sc.com/sg/fraud-scam/money-mule/

Morbidity

The condition of suffering from a disease or medical condition.

Example: the therapy can substantially reduce respiratory morbidity in infants.

Morsel

linguistics

A small piece or amount of food; a mouthful.

Example: Bilbo Baggins baked two beautiful round seed-cakes for his after supper morsel.

Mutual Fund

finance

Mutually Exclusive Collectively Exhaustive (MECE)

Mutually exclusive means that each category is distinct and has no overlap, while collectively exhaustive means that all possible options or items are included in the categories. The MECE framework, which stands for Mutually Exclusive, Collectively Exhaustive, is a problem-solving and information-structuring tool, used especially in management consulting, to ensure that data is broken down logically and completely without gaps or overlaps.

MECE in Practice

  • Issue Trees: Consultants use MECE to break down complex problems into smaller, manageable parts, ensuring that every branch is distinct and that the entire tree covers all relevant aspects of the problem.
  • Case Interviews: Candidates for consulting roles are expected to use the MECE framework to structure their solutions logically and comprehensively.
  • Presentations and Reports: The MECE principle helps professionals organize their ideas in a way that is easy for clients to understand, guiding them toward informed decisions.
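One way to make the two conditions concrete is to check them on sets. This is only a sketch: the function `is_mece` and the customer segments are invented for illustration.

```python
def is_mece(universe, categories):
    """Return True if the categories have no overlaps and no gaps."""
    seen = set()
    for members in categories.values():
        if seen & members:        # overlap: not mutually exclusive
            return False
        seen |= members
    return seen == set(universe)  # gaps: not collectively exhaustive

customers = {"ann", "bob", "cho", "dev"}

by_age = {"under_30": {"ann", "bob"}, "30_and_over": {"cho", "dev"}}
overlapping = {"students": {"ann", "bob"}, "employed": {"bob", "cho", "dev"}}
incomplete = {"under_30": {"ann", "bob"}, "over_60": {"dev"}}

print(is_mece(customers, by_age))       # distinct groups that cover everyone
print(is_mece(customers, overlapping))  # "bob" falls into two groups
print(is_mece(customers, incomplete))   # "cho" falls into no group
```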

N

National Stock Exchange (NSE)

In context of India

Net Asset Value (NAV)

The price of one unit of a scheme (mutual fund).

Neurodivergent

medical

Differing in mental or neurological function from what is considered typical or normal (frequently used with reference to autistic spectrum disorders); not neurotypical.

Nexus

“Nexus” generally means a connection, link, or central point of connection between things. It can refer to a relationship, a link, or a core or center of something. It can also describe a connected group or series.

Here’s a more detailed breakdown:

  • Connection or Link: Nexus can describe a relationship or connection between two or more things, like the nexus between teachers and students. It can also refer to a causal link, such as the nexus between poverty and crime.
  • Central Point: It can also signify a central point, a hub, or a core, such as a bookstore being a nexus for a neighborhood.
  • Connected Group: Nexus can also refer to a connected group or series of things, like a nexus of theories or relationships.
  • Formal Usage: It’s often used in a formal context, particularly when discussing complex systems or relationships.
  • In Technology: Generally implies a hub, bridge, or central integration point.
  • Example: A sentence like “The bookstore has become something of a nexus for the downtown neighborhood” illustrates how nexus can describe a central place or focus for a group of people or activities.

Nifty50

A stock market index made up of 50 stocks.

Non-Player Character (NPC)

A non-player character (NPC) is any character that is not directly controlled by a player. Instead, NPCs are typically controlled by the game’s AI (in video games) or the game master (in tabletop RPGs). They often interact with players and can be anything from shopkeepers and quest-givers to enemies and background characters.

AI-powered NPCs (non-player characters) are characters in video games that are controlled by artificial intelligence rather than by human players. These AI-driven characters can interact with players in more dynamic and realistic ways than traditional NPCs, leading to more immersive and engaging gameplay experiences.

Normalized Discounted Cumulative Gain (NDCG or nDCG)

informationretrieval rag llm

Resources


O

Oncology

Oncology is the branch of medicine that deals with cancer—its diagnosis, treatment, and research.

Oncologists

Doctors who specialize in oncology, i.e., who diagnose and treat cancer patients, are called oncologists.

Example: She decided to specialize in oncology to help patients fighting cancer.

One-Hot Encoding (OHE)

artificialintelligence machinelearning datascience LLM NLP

Out of Vocabulary (OOV)

artificialintelligence machinelearning datascience LLM NLP


P

Pangram

A sentence containing every letter of the alphabet.

Example: “The quick brown fox jumps over the lazy dog.” is a pangram.

Parameter-Efficient Fine-Tuning (PEFT)

Resources:

Pareto

Commonly refers to the Pareto principle, also known as the 80/20 rule, which states that roughly 80% of the effects come from 20% of the causes.

In general, the Pareto principle states that a small percentage of inputs often causes a disproportionately large percentage of outputs.

Example

  • 80% of a company’s sales come from 20% of its customers.
  • 80% of software crashes are caused by 20% of its bugs.
  • In personal productivity, 20% of your tasks may contribute to 80% of your results.

Pathology

Pathology is the study of diseases—their causes, effects, and development. It can also refer to the conditions caused by a disease.

Example:

  • She studied pathology to understand how cancer spreads in the body.
  • The report showed severe pathology in the liver tissue.

Synonyms:

  • Disease study
  • Medical science (when related to disease)
  • Morbidity (when referring to the condition itself)

Parity

The state or condition of being equal, especially as regards status or pay.

Pathologist

A doctor who studies disease by analyzing samples such as:

  • Biopsies
  • Surgical specimens
  • Cytology (cells from urine, for example)

Perfusion

science biology

This is a clinical term for good blood circulation. Perfusion is the process of the cardiovascular system (the heart and blood vessels) delivering oxygen-rich blood to all the tissues and organs of the body.

Perplexity

artificialintelligence machinelearning LLM metrics

Quantifies how ‘surprised’ the model is to see some words together.
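A common way to compute perplexity is as the exponential of the average negative log-probability the model assigned to each token. The sketch below assumes we already have those per-token probabilities; the function name and the numbers are hypothetical.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a model gave to the tokens that actually appeared.
less_surprised = perplexity([0.5, 0.5, 0.5, 0.5])
more_surprised = perplexity([0.25, 0.25, 0.25, 0.25])
print(less_surprised, more_surprised)  # lower perplexity means less "surprise"
```

A perplexity of about 4 can be read as the model being as uncertain as if it were choosing uniformly among 4 tokens at each step.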

PIP

Python softwareengineering OpenSource

In Python, PIP is the standard package manager used to install, manage, and uninstall third-party software packages and libraries that are not part of the Python standard library. The name is a recursive acronym for “Pip Installs Packages”.

Role of PIP

PIP is an essential tool for any Python developer because it streamlines the process of adding external dependencies to a project. These packages are sourced primarily from the Python Package Index (PyPI), a vast online repository of community-contributed software.

Instead of manually downloading and managing source code files, PIP automates the process with simple command-line interface commands.

Populism

POPULISM IN POLITICS

While “every politician hopes to be popular”, Mr Ong said, populism is different. Populists often use an “us versus them” narrative, presenting themselves as champions of the people against elites, institutions, or outsiders, and offering simplistic solutions to complex problems. “When people are disillusioned and disgruntled, they hope for a silver bullet and may give these simplistic solutions a chance,” he added.

He cited Argentina’s history of left-wing populism – where businesses eventually left due to excessive taxation, leading to job losses and economic crises. “In the end, the workers and ordinary people are the ones who suffer,” he said.

Far-right populism, often linked to immigration concerns, is more widespread now, Mr Ong said, referencing countries like the US, UK, Australia and Japan.

“Populism takes societies on the road to ruin – creating irreconcilable rifts between communities, and fuelling xenophobia and racism,” he said. “Eventually, either their fiscal system goes broke or the society breaks apart.”

https://www.channelnewsasia.com/singapore/ong-ye-kung-workers-party-ge2025-racial-politics-5363956

Post-prandial

biology science disease

Post-prandial fullness:

Preamble

A preliminary or preparatory statement; an introduction.

Precision

machinelearning datascience metrics

Also known as Positive Predictive Value (PPV)

% of predicted positive that were correct

Precision = TP/(TP+FP) = 1 - FDR
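A small sketch of the formula above, counting true and false positives from binary labels (1 = positive, 0 = negative). The function name and the example labels are made up for illustration.

```python
def precision(y_true, y_pred):
    """TP / (TP + FP) for binary labels where 1 means positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        return 0.0  # convention chosen here when nothing was predicted positive
    return tp / (tp + fp)

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
score = precision(y_true, y_pred)
print(score)  # 2 of the 3 predicted positives were correct
```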

Portmanteau

A word blending the sounds and combining the meanings of two others, for example motel (motor hotel) or brunch (breakfast + lunch).

Price-earnings ratio (P/E ratio or P/E)

finance Investment Stocks

Prompt Engineering

Engineering a prompt so that an LLM does what we want.

By better crafting our prompts, we can improve the quality of results.

Prostate

The prostate is a small gland in males, located just below the bladder and in front of the rectum. It produces seminal fluid, which helps nourish and transport sperm during ejaculation.

It’s a part of the male reproductive system and is often discussed in relation to prostate cancer or benign prostatic hyperplasia (BPH).

Example: The doctor recommended a prostate exam because of the patient’s age and symptoms

Related Terms:

  • Prostate gland (full name)
  • Prostate cancer (a common male cancer)
  • Prostatitis (inflammation of the prostate)

Provident Fund (PF)

finance

Proximal Policy Optimization (PPO)

artificialintelligence datascience LLM AGI

Proximal policy optimization (PPO) is a reinforcement learning (RL) algorithm for training an intelligent agent.

Pseudo-Labels

Automatically generated labels from the data itself. These labels are not manually annotated but are inferred based on the inherent structure or attributes of the data.

In other words, the model automatically generates the labelled data.

Examples in practice:

  1. NLP: BERT uses MLM (Masked Language Modeling), GPT predicts the next word in a sequence (in CLM)
  2. Vision: SimCLR, MAE
  3. Speech: Wav2Vec

Example of a Masking/Prediction Task: In NLP, when training a model like BERT, random words in a sentence are masked. The model is trained to predict these masked words using the context (MLM).

  • Input: “The cat is ___ the table.”
  • Pseudo-label: “on.”
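The masking example can be reproduced with a few lines of code, which shows why no human annotation is needed: the label is cut out of the raw text itself. The helper name and the `___` mask token are illustrative; real models such as BERT work on sub-word tokens with their own mask token.

```python
def make_masked_example(sentence, mask_index, mask_token="___"):
    """Hide one word and return (masked input, pseudo-label)."""
    words = sentence.split()
    label = words[mask_index]       # the hidden word becomes the label
    words[mask_index] = mask_token
    return " ".join(words), label

masked_input, pseudo_label = make_masked_example("The cat is on the table.", 3)
print(masked_input)   # The cat is ___ the table.
print(pseudo_label)   # on
```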

Public Provident Fund (PPF)

In context of India

Pydantic

Pydantic is the most widely used data validation library for Python.

For more details, refer to the official website.

Pylance

Supercharges our Python experience in Visual Studio Code/Cursor.

Pylance is a powerful and popular extension for Visual Studio Code that provides enhanced language support for Python developers. Developed by Microsoft, it has become the default language server for the official Python extension, significantly boosting productivity with its rich set of features focused on speed, accuracy, and intelligent code assistance.

At its core, Pylance is powered by Microsoft’s open-source static type checking tool, Pyright. This foundation allows Pylance to deliver a superior IntelliSense experience, offering intelligent autocompletions, detailed function signature help, and rapid code navigation. By leveraging type information, Pylance can provide more accurate and context-aware suggestions, helping developers write cleaner and more error-free code.

Python Package Index (PyPI)

Python Programming softwareengineering OpenSource

A repository of software for the Python programming language. PyPI helps you find and install software developed and shared by the Python community.


Q

Quantized Low-Rank Adaptation of Large Language Models (QLoRA)

QLoRA is where the base model is also quantised (e.g., weights reduced from 16-bit to 4-bit).

For more details, refer


R

Random Access Memory (RAM)

machinelearning datascience softwareengineering LLM gpu

General-purpose memory accessed by the CPU.

In LLM workflows, RAM is used primarily for loading the model from disk and managing tasks outside the GPU, such as operating system functions, data preprocessing, and orchestration by the CPU. While RAM is important for overall system operation and initial model loading, it is much slower than VRAM for neural network computations. Its size often needs to be at least equal to the uncompressed model size, but it doesn’t directly speed up the heavy computations.

Remote Code Execution (RCE)

security softwareengineering

A remote code execution (RCE) attack is one where an attacker can run malicious code on an organization’s computers or network. The ability to execute attacker-controlled code can be used for various purposes, including deploying additional malware or stealing sensitive data.

Rectified Linear Unit (ReLU)

A popular activation function that outputs the input directly if it’s positive, otherwise it outputs zero.
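In code, the definition above is a single `max` call. A minimal sketch for scalar inputs (the function name is just the conventional one):

```python
def relu(x):
    """Return x if it is positive, otherwise 0."""
    return max(0.0, x)

outputs = [relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(outputs)  # negatives are clipped to zero, positives pass through unchanged
```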

Real Assets

finance Investment

Gold and real estate.

Recall

% of actual positives that were correctly predicted.

Recall = Sensitivity = True Positive Rate (TPR) = TP / P = TP / (TP + FN) = 1 - FNR
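A short sketch of the recall formula working directly from confusion-matrix counts. The function name and the counts are hypothetical.

```python
def recall(true_positives, false_negatives):
    """TP / (TP + FN): share of actual positives that the model found."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        return 0.0  # convention chosen here when there are no actual positives
    return true_positives / actual_positives

# Hypothetical screening test: 10 sick patients, 8 detected and 2 missed.
tpr = recall(true_positives=8, false_negatives=2)
fnr = 1 - tpr  # false negative rate
print(tpr, fnr)
```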

Recession

finance

Reconciliation

finance banking

An accounting process in which a company’s records are reconciled with its bank statements to make sure that the balances match. It involves reviewing transactions, spotting mismatches, and adjusting balances until both figures align, and is typically done monthly.

Why It’s Important

  • Accuracy: Ensures your financial statements reflect your true cash position.
  • Error Detection: Catches human mistakes in recording transactions.
  • Fraud Prevention: Helps spot unauthorized withdrawals or fraudulent activity.
  • Cash Flow Management: Provides a clear picture of available funds.
  • Compliance: Essential for audits and tax filing

For more details, refer to this blog post: https://www.highradius.com/resources/Blog/bank-reconciliation-definition/

Recurrent Neural Network (RNN)

A type of neural network designed for sequential data processing, where connections between nodes form a directed graph along a temporal sequence.

Recurring Deposit (RD)

finance

Reinforcement Learning from Human Feedback (RLHF)

artificialintelligence datascience LLM AGI

A machine learning technique that aligns artificial intelligence (AI) models, especially Large Language Models (LLMs), with human preferences and values.

Reinforcement Learning with Verifiable Rewards (RLVR)

artificialintelligence datascience LLM AGI

REINFORCE Leave-One-Out (RLOO)

Resources: https://huggingface.co/blog/putting_rl_back_in_rlhf_with_rloo

Regularization and Regularization Rate (λ)

One approach to keeping a model simple is to penalize complex models; that is, to force the model to become simpler during training. Penalizing complex models is one form of regularization.

The training optimization algorithm works as:

A regularization rate (lambda) controls the strength of regularization, with higher values leading to simpler models and lower values increasing the risk of overfitting.

Formula:

Regularization types:

  • L1 regularization
  • L2 regularization

A high regularization rate:

  • Strengthens the influence of regularization, thereby reducing the chances of overfitting
  • Tends to produce a histogram of model weights with a normal distribution and a mean weight of 0

A low regularization rate:

  • Lowers the influence of regularization, thereby increasing the chances of overfitting
  • Tends to produce a histogram of model weights with a flat distribution

For more details, refer to Google Developers.
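To see how the regularization rate changes what the optimizer is asked to minimize, here is a minimal sketch assuming L2 regularization, where the penalty is the sum of squared weights. The function names and the numbers are illustrative.

```python
def l2_penalty(weights):
    """Model complexity measured as the sum of squared weights."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Training objective: data loss plus lambda times the complexity penalty."""
    return data_loss + lam * l2_penalty(weights)

weights = [3.0, -4.0]   # hypothetical model weights
data_loss = 2.0         # hypothetical loss on the training data

no_regularization = regularized_loss(data_loss, weights, lam=0.0)
with_regularization = regularized_loss(data_loss, weights, lam=0.1)
print(no_regularization, with_regularization)
```

With a higher λ, large weights cost more, so training is pushed toward smaller weights and a simpler model.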

Repatriation

banking finance

Sending of money back to one’s own country.

Example: The repatriation of profits by foreign investors

Request for Comments (RFC)

RESTful API

A software architectural style that defines a set of constraints for creating web services.

RoBERTa

RoBERTa (Robustly Optimized BERT Pretraining Approach) is an optimized version of BERT that removes the Next Sentence Prediction task and uses different training configurations.

ROUGE

metrics machinelearning artificialintelligence LLM metrics

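ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures the quality of generated text by how much of a reference text it covers, which makes it similar in spirit to Recall.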

Rule of 72

A versatile rule for estimating the annual rate of return implied by a double-your-money proposition.

Ask: over how many years does my money double? Then divide 72 by that number of years to get the approximate annual rate of return (in %).
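A quick sketch comparing the rule of thumb with the exact compound-interest answer. The function names and the 6-year example are chosen for illustration.

```python
def rule_of_72_rate(years_to_double):
    """Approximate annual return (in %) needed to double money in the given years."""
    return 72 / years_to_double

def exact_rate(years_to_double):
    """Exact annual return (in %) from compounding: (1 + r) ** years = 2."""
    return (2 ** (1 / years_to_double) - 1) * 100

approx = rule_of_72_rate(6)
exact = exact_rate(6)
print(approx, round(exact, 2))  # the rule gives 12.0, compounding gives about 12.25
```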

From the book Let’s Talk Money

Role Playing Games (RPG)

RPGs involve players taking on the roles of fictional characters within a narrative, making choices and decisions that influence the story and character development. They can be played in various formats, including tabletop, video games, and live-action settings.

Rome Process and Rome Criteria

science biology disease

The Rome criteria, established and periodically updated by the international Rome Foundation, provide the global standard for symptom-based diagnosis of functional gastrointestinal disorders (FGIDs). This standardization is paramount for ensuring diagnostic consistency and upholding the scientific rigor of clinical research worldwide. The latest iteration, Rome IV, published in 2016, continues this tradition of refinement.

For more details, refer to Rome Process and Rome Criteria.


S

Satiation

science biology disease

Early satiation: …

Self-attention

Self-attention is the mechanism that lets each word in a sentence focus on every other word, assessing their relationships to form a context-aware representation.

In other words, the self-attention mechanism enables the model to weigh the importance of each word in a sequence relative to all other words, thus allowing the model to capture long-range contextual relationships and dependencies.

Both encoders and decoders consist of many layers connected by self-attention mechanisms.

Example: In the phrase “It was a bright sunny day,” the word “bright” can attend to “sunny” and “day” due to self-attention, achieving a bi-directional context that enriches its meaning. This bi-directional influence is essential for accurate tasks like named entity recognition and extractive question answering.

Self-supervised learning

A type of learning in which the objective is automatically computed from the inputs of the model. In other words, a type of machine learning where models learn to predict parts of data from other parts, without labels. That means humans are not needed to label the data.

Often used in natural language processing (NLP) and computer vision to pre-train models on large datasets.

Example: Transformer models like GPT, BERT, BART, T5, etc. have been trained as language models on large amounts of raw data in a self-supervised fashion. This type of model develops a statistical understanding of the language it has been trained on, but it’s not very useful for specific practical tasks. Because of this, the general pretrained model then goes through a process called transfer learning.

Secure Hash Algorithm (SHA)

algorithm security

It’s a cryptographic function that takes any input and produces a fixed-length string of characters (a “hash” or “digest”).

Key properties:

  • Deterministic: same input always produces the same hash
  • One-way: you can’t reverse the hash to get the original input
  • Collision-resistant: it is computationally infeasible to find two inputs with the same hash, and even a tiny change in input produces a completely different hash

SHA-256 is a specific variant of the SHA that produces a 256-bit (64 hex character) hash.

Input:  "hello"
Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

The “256” refers to the output size. Other variants exist:

| Variant | Output size | Example use |
| --- | --- | --- |
| SHA-1 | 160 bits (40 chars) | Git commit hashes (legacy) |
| SHA-256 | 256 bits (64 chars) | Docker image digests, Bitcoin |
| SHA-512 | 512 bits (128 chars) | Password hashing, TLS |

SHA-256 is widely used because it balances security (no known collisions) and performance (fast to compute).
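The "hello" example above can be reproduced with Python's built-in `hashlib` module. A minimal sketch; the second input is just a one-letter variation chosen to show how much the digest changes.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

digest = sha256_hex("hello")
changed = sha256_hex("hellp")  # one character differs

print(digest)
print(changed)
print(len(digest))  # 64 hex characters = 256 bits
```

Running it twice gives the same digest for the same input, while the one-letter change produces an unrelated-looking digest.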

Securities and Exchange Board of India (SEBI)

In context of India

Sets up the rules of the game around the equity market. Stock exchanges have to abide by them. The firms that publicly list must abide by SEBI and stock exchange rules.

Semantic Parsing

Converting language into structured data, often for databases or code.

Example: T5 converts language into structured data (e.g., SQL queries).

Sentiment Analysis

Detecting the emotional tone of text, for example classifying it as positive, negative, or neutral.

Example: BERT-based Sentiment Classifier analyzes text for sentiment polarity.

Sensex

In context of India

A stock market index made up of the 30 most representative companies listed on the BSE. The index has an initial value of 100, as of 1 Apr 1979.

When we say Sensex went up, we mean that of the 30 companies in Sensex more prices rose than fell.

Sensex is a barometer of the activity in stock market during the day, and over a long period of time.

The Sensex and Nifty50 are broad market indices and are also called large-cap indices.

Sequence-to-sequence transformer models

Transformer models that can handle input sequences and generate output sequences, typically used for tasks like translation, summarization, and text generation.

Examples: BART, T5

Shuddered

linguistics

(Of a person) to tremble convulsively, typically as a result of fear or revulsion.

Similar words: shake, shiver, tremble, quiver, palpitate

Example: He shuddered, and very quickly he was plain Mr. Baggins again.

Sigmoid

An activation function that maps any real number to a value between 0 and 1, commonly used in binary classification problems.
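A minimal sketch of the sigmoid function, 1 / (1 + e^(-x)), for scalar inputs. The two-branch form is a common trick to avoid overflow for very negative inputs and is an implementation choice, not part of the definition.

```python
import math

def sigmoid(x):
    """Map any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids computing exp of a huge positive number
    return z / (1.0 + z)

print(sigmoid(0.0))                 # 0.5, the midpoint
print(sigmoid(4.0), sigmoid(-4.0))  # close to 1 and close to 0
```

In binary classification, the output is typically read as the probability of the positive class.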

Smurf

finance banking

Softmax

machinelearning datascience Statistics

Normalises a set of scores into probabilities, so that the values (e.g., each row in a matrix) sum up to 1.
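A small sketch of softmax for a single row of scores. Subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result; the example scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of scores into positive values that sum to 1."""
    shift = max(scores)  # stability trick: keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # the largest score gets the largest share
```

For a matrix, the same function would be applied to each row separately.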

Special Purpose LLMs

Large Language Models highly trained to focus on a single or small set of tasks. This is in contrast to General Purpose LLMs.

Example:

  • Raven-13B: Tuned to provide function calling services
  • Smaller & lower latency than general purpose LLMs

Spinal Metastasis

In patients with prostate cancer, spinal metastasis means that the cancer has spread from the prostate to the bones of the spine.

What does this mean for the patient?

  1. Prostate cancer often spreads to bones, and the spine is one of the most common sites.
  2. The spread happens through the bloodstream or lymphatic system.
  3. It usually affects the vertebrae (bones of the spine), not the spinal cord itself (though the cord can be compressed).

Why is this serious?

  • Indicates advanced (stage 4) cancer
  • May require radiation therapy, hormone therapy, surgery, or pain management
  • Early detection is important to prevent spinal cord compression, which can cause permanent nerve damage


Summarization

Producing concise summaries of longer content.

Example: BART generates concise summaries for long texts.

Succinct

Briefly and clearly expressed.

Example: Use short, succinct sentences.

Similar words: concise, short, brief, compact

Pronunciation: suhk·singkt

Standard Deviation

Math DataAnalytics Statistics

Standard Deviation (σ) is just the square root of the Variance (σ²).
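A quick check of that relationship using the standard library. The data values are made up, and the population versions of the functions (`pvariance`, `pstdev`) are used here as an assumption; the sample versions `variance` and `stdev` are related in the same way.

```python
import math
from statistics import pstdev, pvariance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

variance = pvariance(data)  # average squared distance from the mean
std_dev = pstdev(data)      # standard deviation, in the same units as the data

print(variance, std_dev, math.sqrt(variance))
```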

For more details, refer

Stochastic

Statistics datascience machinelearning AI

Stock Exchange

Stock Market Index

Strode

linguistics

walk with long, decisive steps in one direction.

Example: Gandalf strode away.

Sundae

A popular American ice cream dessert: scoops of ice cream topped with sweet sauce or syrup (like fudge or caramel), whipped cream, nuts, sprinkles, and a cherry.

Swagger

Programming softwareengineering

API documentation. Aids in testing and debugging.

Swagger is a popular set of open-source tools, together with a specification, for designing, building, documenting, and consuming RESTful APIs. Developers describe an API in a machine-readable format (JSON/YAML), which tools then use for interactive docs, code generation, and testing, making API management smoother and more collaborative. Think of it as a blueprint for your API, enabling tools to automatically create documentation (Swagger UI), client libraries, and tests.

Systematic Investment Plan (SIP)

Think of this as a recurring deposit, but instead of putting money in a fixed deposit, we are making periodic investments into a mutual fund.
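A rough sketch of how periodic investments accumulate is shown below. It assumes a constant annual return compounded monthly, which real mutual funds do not provide, so treat the result as an illustration and not a forecast. The function name `sip_future_value` and the amounts used are hypothetical.

```python
def sip_future_value(monthly_amount, annual_return, years):
    """Value of equal monthly investments, each made at the start of a month.

    Assumes a constant return compounded monthly (a simplification).
    """
    monthly_rate = annual_return / 12
    value = 0.0
    for _ in range(years * 12):
        # Add this month's instalment, then let the whole balance grow.
        value = (value + monthly_amount) * (1 + monthly_rate)
    return value


# Illustrative numbers only: 5000 per month for 10 years at an assumed 10% a year.
invested = 5000 * 12 * 10
projected = sip_future_value(5000, 0.10, 10)
print(f"Invested: {invested}, projected value: {projected:.2f}")
```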

Systematic Transfer Plan (STP)

A facility that allows us to space out a big investment over time.

Systematic Withdrawal Plan (SWP)

A facility to periodically redeem our units to generate an income. It works like a dividend plan, but we remain in control of how much money we withdraw from the fund each period.

Systolic

science biology

Systolic and diastolic are two fundamental terms that describe the two main phases of a single heartbeat.

Systolic (The Squeezing Phase)

  • What it is: Systole is the part of the heartbeat when the heart muscle contracts or squeezes.
  • What it does: The powerful lower chambers of the heart (the ventricles) contract forcefully to pump blood out of the heart.
    • The right ventricle pumps blood to the lungs to pick up oxygen.
    • The left ventricle pumps oxygen-rich blood to the rest of the body.
  • In a Blood Pressure Reading: It represents the maximum pressure in your arteries as the heart contracts and pushes blood out.

T

Taxonomy

Taxonomy is a practice and science concerned with classification or categorization. Typically, there are two parts to it: the development of an underlying scheme of classes (a taxonomy) and the allocation of things to the classes (classification).

Originally, taxonomy referred only to the classification of organisms on the basis of shared characteristics. Today it also has a more general sense. It may refer to the classification of things or concepts, as well as to the principles underlying such work. Thus a taxonomy can be used to organize species, documents, videos, or anything else.

Terse

TBA

Text Classification

Categorizing text into predefined labels.

Text Completion

Generating the continuation of a given text.

Example: GPT-3 completes text based on context.

Theology

Related to the study of the nature of God and religious belief.

Throng

linguistics

A large densely packed crowd of people or animals.

Example: He pushed his way through the throng.

Trade Monitoring (TM)

finance banking

Transfer learning

A technique where a model developed for one task is reused as the starting point for a model on a second, related task. In practice, this often means fine-tuning a (general) pretrained model in a supervised way (that is, using human-annotated labels) on a given task.

The principle of transfer learning is to leverage knowledge from one domain (source task) to improve learning in another (target task).

Example: Often used in NLP and computer vision, where pre-trained models (e.g., BERT, ResNet) are fine-tuned for new tasks.

Transformer Model

A deep neural network (DNN) architecture primarily used for natural language processing (NLP) tasks. Transformer models rely on attention mechanisms to process data, enabling parallelization and improved performance on sequential data.

Example: Widely used in NLP tasks like translation, text generation, and question-answering; includes models like BERT and GPT.

Trivial

Of little value or importance.

Example: Huge fines were imposed for trivial offences.

Similar words: unimportant, insignificant, minor

Opposite words: non-trivial

Typology

banking finance

A typology is a system of classification used to organize things according to similar or dissimilar characteristics. Groups of things within a typology are known as “types”.

Typologies are distinct from taxonomies in that they primarily address things not categorizable based on empirical and objective characteristics, such as abstract and conceptual ideas or subjective criteria, though the two terms are sometimes used interchangeably.


U

Uni-directional attention

Decoder models operate in a uni-directional manner, meaning they only consider the context from the left (previous words) when making predictions. They do not access future words, in contrast to the bidirectional attention used in encoder models.
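The left-only context can be sketched with a causal mask applied before the softmax. The example below is a simplified, pure-Python illustration with made-up scores; the function name `causal_attention_weights` is hypothetical, and real models apply the same idea to batched tensors.

```python
import math


def causal_attention_weights(scores):
    """Softmax each row of a square score matrix, masking future positions.

    scores[i][j] is the raw attention score of query position i for key
    position j. Position i may only attend to positions j <= i.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                       # drop future positions
        peak = max(visible)                          # for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


# Made-up scores for a 3-token sequence.
scores = [
    [0.5, 2.0, 1.0],
    [1.0, 1.0, 3.0],
    [0.2, 0.4, 0.6],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

Note that the first token can only attend to itself, and the large score of 3.0 in the second row is ignored because it belongs to a future position.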

UK Conformity Assessed (UKCA Mark)

  • This is the UK’s version of the CE mark that’s required for medical devices (including software), created after Brexit.
  • It applies to products sold in England, Scotland, and Wales.

A UKCA mark ensures the product meets UK-specific safety and regulatory requirements.

Unicode

Programming softwareengineering machinelearning datascience

An international character encoding standard for use with different languages and scripts, by which each letter, digit, or symbol is assigned a unique numeric value called a code point (e.g., U+0041 for ‘A’). It enables consistent text representation, storage, and exchange across different platforms, operating systems, and applications, and it supersedes older, limited standards like ASCII.
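As a small illustration, Python's built-in `ord` and `chr` convert between a character and its numeric value, and the standard `unicodedata` module reports the official character name. The helper `code_point_label` and the sample characters are chosen here only for demonstration.

```python
import unicodedata


def code_point_label(ch):
    """Format a single character's code point in the usual U+XXXX style."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "€", "न", "😀"]:
    # chr() reverses ord(): numeric value back to the character.
    assert chr(ord(ch)) == ch
    print(code_point_label(ch), ch, unicodedata.name(ch))
```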

Unicode Transformation Format (UTF)

Programming softwareengineering machinelearning datascience

UTF (Unicode Transformation Format) is a family of character encodings that map Unicode code points to bytes, allowing computers to represent text in any language across platforms. UTF-8 is the dominant encoding on the web (used by the vast majority of web pages); it uses 1 to 4 bytes per character and is fully backward compatible with ASCII.

Key Aspects of UTF-8:

  • Variable-Length Encoding: UTF-8 uses 1 byte for standard ASCII characters (0-127), and up to 4 bytes for other characters, emojis, and symbols.
  • Universal Compatibility: It represents all Unicode characters, making it ideal for internationalization.
  • Web Standard: It is the standard for HTML5, email, JSON, and modern APIs.
  • Efficiency: For English text, UTF-8 is very compact, while still supporting all characters.
  • Alternative Encodings: Besides UTF-8, other formats include UTF-16 (used internally by Windows, 2 or 4 bytes per character) and UTF-32 (fixed-length, 4 bytes per character).

Common UTF Variants:

  • UTF-8: Most common, 1-4 bytes, ASCII compatible.
  • UTF-16: 2 or 4 bytes, common in the internal string storage of Windows, Java, and JavaScript.
  • UTF-32: 4 bytes, rarely used for storage due to size, but used in some scenarios for fixed-length needs.

UTF-8 ensures that text is encoded into bytes in a consistent way that can be reversed, ensuring “lossless” transport across different systems.
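The byte counts and the lossless round trip can be checked directly with Python's built-in `str.encode` and `bytes.decode`. This is a minimal sketch; the helper `byte_lengths` and the sample characters are chosen only for illustration. The little-endian codec names are used so that no byte-order mark is added to the counts.

```python
def byte_lengths(ch):
    """Return the encoded size of one character in UTF-8, UTF-16, and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" variants omit the byte-order mark
        len(ch.encode("utf-32-le")),
    )


for ch in ["A", "ß", "€", "😀"]:
    utf8, utf16, utf32 = byte_lengths(ch)
    print(f"{ch!r}: UTF-8={utf8}B  UTF-16={utf16}B  UTF-32={utf32}B")

# ASCII text produces identical bytes in ASCII and UTF-8.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")

# Encoding and then decoding returns the original text unchanged.
text = "Unicode: €, न, 😀"
encoded = text.encode("utf-8")
assert encoded.decode("utf-8") == text
```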

Unit linked insurance plans (ULIP)

Upending

linguistics

Set or turn (something) on its end or upside down.

Example: The security inspector upended my bag and dumped everything out.


V

Vanilla

Often means a standard, plain, ordinary, straightforward, conventional, or no-frills approach. Used in contrast to something exotic or complex.

Vanishing gradient

A problem in training deep neural networks where gradients become exponentially small as they propagate back through the network layers, so the earliest layers learn very slowly. In recurrent networks, this makes it difficult to learn long-range dependencies.

For more details, refer to Google Developers ML Course.
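The exponential shrinkage can be seen in a deliberately simplified setting: a chain of single-unit sigmoid layers, where backpropagation multiplies the gradient by `weight * sigmoid'(z)` at every layer. The weights of 1.0 and pre-activations of 0.0 used below are assumptions chosen for illustration (0.0 is where the sigmoid derivative is largest, at 0.25), and the function names are hypothetical.

```python
import math


def sigmoid_derivative(x):
    """Derivative of the sigmoid function; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def gradient_scale(depth, pre_activation=0.0, weight=1.0):
    """Product of the per-layer factors weight * sigmoid'(z) over `depth` layers."""
    factor = weight * sigmoid_derivative(pre_activation)
    return factor ** depth


for depth in (1, 5, 10, 20):
    print(f"{depth:2d} layers: gradient scaled by {gradient_scale(depth):.3e}")
```

Even in this best case for the sigmoid, the gradient reaching the first layer shrinks by a factor of four per layer, which is one reason activations such as ReLU are often preferred in deep networks.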

Variance

Math datascience DataAnalytics Statistics

Variance ($\sigma^2$) is computed as the average of the squared differences from the Mean.

To calculate the variance, follow these steps:

  • Calculate the Mean (the simple average of the numbers)
  • Then for each number: subtract the Mean and square the result (the squared difference).
  • Then calculate the average of those squared differences.

And the Standard Deviation ($\sigma$) is just the square root of Variance.
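The steps above can be sketched in a few lines of Python. The data set is made up for illustration, and the function name `population_variance` is hypothetical. It divides by the number of values, which matches `statistics.pvariance` in the standard library; the sample variance (`statistics.variance`) divides by n - 1 instead.

```python
import math
import statistics


def population_variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)                    # step 1: the mean
    squared_diffs = [(v - mean) ** 2 for v in values]   # step 2: squared differences
    return sum(squared_diffs) / len(squared_diffs)      # step 3: their average


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values with mean 5
variance = population_variance(data)
std_dev = math.sqrt(variance)
print(variance, std_dev)  # 4.0 2.0

# Cross-check against the standard library.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))
```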

For more details, refer

Video RAM (VRAM)

artificialintelligence machinelearning LLM AI datascience gpu

Dedicated memory on GPUs, which are the main processors used to train and run LLMs efficiently. VRAM is much faster and has higher bandwidth than system RAM. It stores model weights, intermediate computations (like gradients during training), and key-value caches (context for inference) during operation. Having enough VRAM is critical because if the model size or context window exceeds VRAM capacity, performance drops drastically, or the process fails to run.

For example, refer to “Which open-source models? Which variants?” for various model parameter counts and their corresponding VRAM requirements.
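As a rough rule of thumb, the memory needed just to hold the model weights is the parameter count multiplied by the bytes per parameter. The sketch below applies that arithmetic; it deliberately ignores the key-value cache, activations, and framework overhead, so actual requirements are higher. The 7-billion-parameter model size and the function name `weights_vram_gib` are assumptions for illustration.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weights_vram_gib(num_params, precision="fp16"):
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


# Hypothetical 7-billion-parameter model at each precision.
for precision in BYTES_PER_PARAM:
    print(f"7B params @ {precision}: {weights_vram_gib(7e9, precision):.1f} GiB")
```

Halving the precision halves the memory for the weights, which is why quantized variants of a model fit on smaller GPUs.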

Virtual Large Language Model (vLLM)

machinelearning datascience LLM OpenSource

vLLM is a fast and easy-to-use library for open-source LLM inference and serving.

For more details, refer

Visceral Hypersensitivity

biology science disease


W

Willingness to Pay (WTP)

Startup

Wintering

(Especially of a bird) spending the winter in a particular place.

Example: birds wintering in the Channel Islands.

Whole Slide Images (WSI)


Appendix

Additional Resources

References

Footnotes

  1. https://www.superannotate.com/blog/direct-preference-optimization-dpo

  2. https://en.wikipedia.org/wiki/Proximal_policy_optimization

  3. https://en.wikipedia.org/wiki/Confusion_matrix

  4. https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback