Term | Definition | Example/Notes
Activation functions: Functions that enable a neural network (NN) to learn non-linear relationships between features and the label.

For more details, refer to the Google Developers ML course and the Keras activations documentation.
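A minimal NumPy sketch of two common activation functions (ReLU and sigmoid), just to make the non-linearity concrete:

```python
import numpy as np

def relu(x):
    # ReLU passes positive values through and zeroes out negatives,
    # which is what introduces the non-linearity.
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid squashes any real value into the range (0, 1).
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))     # [0.  0.  0.  1.5]
print(sigmoid(x))  # values strictly between 0 and 1
```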
Architecture: The skeleton of the model, i.e., the definition of each layer and each operation that happens within the model.
For example, BERT is an architecture, while bert-base-cased, a set of weights trained by the Google team for the first release of BERT, is a checkpoint. However, one can say “the BERT model” as well as “the bert-base-cased model.”
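A short sketch of the distinction in code, assuming the Hugging Face transformers library (with a PyTorch backend) is installed: AutoModel resolves the BERT architecture, and the bert-base-cased checkpoint supplies the trained weights.

```python
from transformers import AutoModel, AutoTokenizer

# BERT is the architecture; "bert-base-cased" is a checkpoint:
# that architecture plus the weights trained by Google.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```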
Pseudo-Labels: Automatically generated labels from the data itself. These labels are not manually annotated but are inferred based on the inherent structure or attributes of the data.

In other words, the model automatically generates the labelled data.

Examples in practice:
1. NLP: BERT predicts masked words in a sentence (MLM); GPT predicts the next word in a sequence (CLM)
2. Vision: SimCLR, MAE
3. Speech: Wav2Vec
Example of a Masking/Prediction Task:
In NLP, when training a model like BERT, random words in a sentence are masked. The model is trained to predict these masked words using the context (MLM).
• Input: “The cat is ___ the table.”
• Pseudo-label: “on.”
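A small sketch of the masked-word example above, assuming the Hugging Face transformers library with a PyTorch backend (BERT’s actual mask token is [MASK]); the pseudo-label “on” comes from the original sentence itself, not from a human annotator.

```python
from transformers import pipeline

# The masked word is hidden from the model; the pseudo-label ("on")
# is recovered from the original sentence, with no human annotation.
fill_mask = pipeline("fill-mask", model="bert-base-cased")
for prediction in fill_mask("The cat is [MASK] the table.")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
```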
Pydantic: Pydantic is the most widely used data validation library for Python. For more details, refer to the official website.
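A minimal sketch of Pydantic’s data validation (Pydantic v2 style field declarations):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str

# Pydantic coerces "42" to an int and validates every field's type.
user = User(id="42", name="Ada", email="ada@example.com")
print(user.id, type(user.id))  # 42 <class 'int'>

try:
    User(id="not-a-number", name="Ada", email="ada@example.com")
except ValidationError as exc:
    print(len(exc.errors()), "validation error(s)")
```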
Rectified Linear Unit (ReLU): An activation function that returns max(0, x), i.e., 0 for negative inputs and the input itself for positive inputs.
Recurrent Neural Network (RNN): A neural network designed for sequential data; the output of the previous step is fed back as input to the current step, giving the network a form of memory over the sequence.
Regularization and Regularization Rate (lambda): One approach to keeping a model simple is to penalize complex models, that is, to force the model to become simpler during training. Penalizing complex models is one form of regularization.

i.e., a training-time optimization technique.

A regularization rate (lambda) controls the strength of regularization, with higher values leading to simpler models and lower values increasing the risk of overfitting.
i.e., the training objective becomes: minimize(loss(data|model) + lambda * complexity(model))

For more details, refer to Google Developers.

Regularization types:
- L1 regularization
- L2 regularization

A high regularization rate:
- Strengthens the influence of regularization, thereby reducing the chances of overfitting.
- Tends to produce a histogram of model weights having the following characteristics:
  - a normal distribution
  - a mean weight of 0.

A low regularization rate:
- Lowers the influence of regularization, thereby increasing the chances of overfitting.
- Tends to produce a histogram of model weights with a flat distribution.
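A toy NumPy sketch of an L2 penalty added to a data loss; the weights and lambda values are made up purely for illustration:

```python
import numpy as np

def loss_with_l2(data_loss, weights, lam):
    # L2 regularization adds lambda * sum(w^2) to the data loss,
    # penalizing large weights and pushing the model toward simplicity.
    return data_loss + lam * np.sum(weights ** 2)

weights = np.array([0.8, -1.2, 0.3])
print(loss_with_l2(data_loss=0.5, weights=weights, lam=0.0))  # no regularization
print(loss_with_l2(data_loss=0.5, weights=weights, lam=0.1))  # higher lambda, larger penalty
```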
RESTful API: An API that follows the REST (Representational State Transfer) architectural style, in which standard HTTP methods (GET, POST, PUT, DELETE) operate on resources identified by URLs, typically exchanging JSON.
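A hedged sketch using the Python requests library; the URL and resource shape are placeholders, not a real service:

```python
import requests

# REST maps HTTP verbs to operations on resources identified by URLs:
# GET reads, POST creates, PUT/PATCH updates, DELETE removes.
response = requests.get("https://api.example.com/users/42")
if response.ok:
    print(response.json())  # the resource, typically returned as JSON

created = requests.post(
    "https://api.example.com/users",
    json={"name": "Ada", "email": "ada@example.com"},
)
print(created.status_code)  # a well-behaved API returns 201 Created
```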
RoBERTa: A Robustly Optimized BERT pretraining approach from Facebook AI; it keeps the BERT architecture but trains longer, on more data, with dynamic masking and without the next-sentence-prediction objective.
Self-attention: Self-attention is the mechanism that lets each word in a sentence focus on every other word, assessing their relationships to form a context-aware representation.

In other words, the self-attention mechanism enables the model to weigh the importance of each word in a sequence relative to all other words, allowing it to capture long-range contextual relationships and dependencies.

Both encoders and decoders consist of many layers connected by the self-attention mechanism.
In the phrase “It was a bright sunny day,” the word “bright” can attend to “sunny” and “day” due to self-attention, achieving a bi-directional context that enriches its meaning. This bi-directional influence is essential for accurate tasks like named entity recognition and extractive question answering.
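A toy NumPy sketch of scaled dot-product self-attention; it omits the learned query/key/value projections and the multi-head structure of a real Transformer layer:

```python
import numpy as np

def self_attention(x):
    # x: one row per token (its embedding).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # how much each token relates to every other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ x             # context-aware representation of each token

tokens = np.random.randn(6, 8)      # 6 tokens, 8-dimensional embeddings
print(self_attention(tokens).shape)  # (6, 8)
```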
Self-supervised learning: A type of learning in which the objective is automatically computed from the inputs of the model. In other words, a type of machine learning where models learn to predict parts of the data from other parts, without labels. That means humans are not needed to label the data. Often used in natural language processing (NLP) and computer vision to pre-train models on large datasets.

Transformer models like GPT, BERT, BART, T5, etc. have been trained as language models on large amounts of raw data in a self-supervised fashion. This type of model develops a statistical understanding of the language it has been trained on, but it’s not very useful for specific practical tasks. Because of this, the general pretrained model then goes through a process called transfer learning.
Semantic Parsing: Converting language into structured data, often for databases or code. Example: T5 converts language into structured data (e.g., SQL queries).
Sentiment Analysis: Detecting the emotional tone of text. Example: a BERT-based sentiment classifier analyzes text for sentiment polarity.
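A small sketch assuming the Hugging Face transformers library; the default sentiment-analysis pipeline downloads a BERT-family classifier (a DistilBERT checkpoint fine-tuned on SST-2):

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed this movie!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```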
Sequence-to-sequence transformer models: Encoder-decoder transformer models that map an input sequence to an output sequence, e.g., BART- and T5-like models.
Sigmoid: An activation function that maps any real-valued input to a value between 0 and 1, defined as sigmoid(x) = 1 / (1 + e^(-x)); commonly used for binary classification outputs.
Special Purpose LLMs: Highly trained to focus on a single task or a small set of tasks, in contrast to general-purpose LLMs. Example: Raven-13B
- Tuned to provide function calling services
- Smaller & lower latency than general purpose LLMs
Summarization: Producing concise summaries of longer content. Example: BART generates concise summaries for long texts.
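A small sketch assuming the Hugging Face transformers library; facebook/bart-large-cnn is a BART checkpoint fine-tuned for summarization, and the article text here is just a stand-in:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "Transformer models rely on attention mechanisms to process sequences in parallel. "
    "They are pretrained on large amounts of raw text in a self-supervised fashion and "
    "then fine-tuned, via transfer learning, for tasks such as translation, "
    "question answering, and summarization."
)
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```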
Text Classification: Categorizing text into predefined labels.
Text Completion: Generating the continuation of a given text. Example: GPT-3 completes text based on context.
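A small sketch assuming the Hugging Face transformers library; GPT-3 itself is only available through an API, so this uses GPT-2 from the same model family:

```python
from transformers import pipeline

# A causal language model continues the prompt one token at a time.
generator = pipeline("text-generation", model="gpt2")
print(generator("The cat sat on the", max_new_tokens=10)[0]["generated_text"])
```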
Transfer learning: A technique where a model developed for one task is reused as the starting point for a model on a second, related task. In other words, a process in which a (general) pretrained model is fine-tuned in a supervised way - that is, using human-annotated labels - on a given task. The principle of transfer learning is to leverage knowledge from one domain (source task) to improve learning in another (target task). Often used in NLP and computer vision, where pre-trained models (e.g., BERT, ResNet) are fine-tuned for new tasks.
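A minimal Keras sketch of the idea, assuming TensorFlow is installed: a ResNet50 pretrained on ImageNet (the source task) is frozen and reused as a feature extractor for a hypothetical 3-class target task.

```python
import tensorflow as tf

# Source task: ImageNet classification. The pretrained ResNet50 is frozen
# and reused as a feature extractor for the new target task.
base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3)
)
base.trainable = False  # keep the transferred knowledge fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),  # new head trained on the target labels
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(target_train_ds, epochs=5)  # fine-tune on the (smaller) labelled target dataset
```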
Transformer Model: A deep neural network (DNN) architecture primarily used for natural language processing (NLP) tasks. Transformer models rely on attention mechanisms to process data, enabling parallelization and improved performance on sequential data. Widely used in NLP tasks like translation, text generation, and question-answering; includes models like BERT and GPT.
Uni-directional attention: Decoder models operate in a uni-directional manner, meaning they only consider the context from the left (previous words) when making predictions. They do not access future words, in contrast to the bidirectional attention used in encoder models.
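A toy NumPy sketch of the causal (uni-directional) mask a decoder applies to its attention scores:

```python
import numpy as np

seq_len = 5
# Position i may only attend to positions <= i (the words to its left);
# a lower-triangular matrix of ones expresses exactly that.
causal_mask = np.tril(np.ones((seq_len, seq_len)))
print(causal_mask)
# An encoder's bidirectional attention would use a mask of all ones instead.
```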
Vanishing gradient: The tendency of the gradients for the lower layers of a deep network to become very small ("vanish") during backpropagation, so those layers' weights barely change and training stalls. For more details, refer to the Google Developers ML course.
Prompt Engineering: Crafting a prompt so that an LLM does what we want.

By crafting better prompts, we can improve the quality of the results.
Incidence: The occurrence, rate, or frequency of a disease, crime, or other undesirable thing.

e.g., An increased incidence of cancer.

Additional Resources