In CLM, the model learns to predict the next token in a sequence from the preceding context, which trains it to generate coherent text. The objective is auto-regressive: each token is predicted conditioned on all tokens to its left in the sequence.
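The snippet below is a minimal sketch of this next-token objective: the labels are simply the input tokens shifted one position to the left, and the loss is cross-entropy over the vocabulary. The token IDs and vocabulary size are made up for illustration, and random logits stand in for a real decoder's output.

```python
import torch
import torch.nn.functional as F

# Illustrative values only: a real tokenizer and model would supply these.
vocab_size = 50257
token_ids = torch.tensor([[464, 3290, 3332, 319, 262, 2603]])  # (batch=1, seq_len=6)

inputs = token_ids[:, :-1]   # context the model conditions on at each step
labels = token_ids[:, 1:]    # the "next token" target at each step

# Stand-in for decoder output; a real model would map `inputs` to these logits.
logits = torch.randn(inputs.size(0), inputs.size(1), vocab_size)

# Standard next-token cross-entropy loss used in CLM pre-training.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))
print(loss.item())
```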
Some variations and strategies within CLM for decoder-only models include:
- Uni-directional attention: the model attends only to previous tokens, masking out future tokens to prevent information leakage (see the mask sketch after this list).
- Prompt-based training: using prompts or templates to guide generation, especially for downstream tasks.
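The following sketch shows how uni-directional (causal) attention is typically enforced: positions above the diagonal of the attention score matrix are set to negative infinity before the softmax, so each position can only attend to itself and earlier tokens. The sequence length and scores here are arbitrary placeholders.

```python
import torch

seq_len = 5
# True above the diagonal marks the "future" positions to be blocked.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

scores = torch.randn(seq_len, seq_len)             # raw attention scores (illustrative)
scores = scores.masked_fill(mask, float("-inf"))   # block attention to future tokens
weights = torch.softmax(scores, dim=-1)            # each row sums to 1 over visible tokens
print(weights)  # entries above the diagonal are exactly zero
```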
These pre-training approaches align with the objectives of models such as GPT, which are designed for tasks like text completion, dialogue, and story generation.
While CLM and NLG are related, they are not synonymous; CLM is a specific training approach used for certain types of NLG applications. In other words, CLM is one method within the broader field of NLG.