mDeBERTa V3 Base | 0.276 | Encoder-only | Text Encoder | 512 | 768 | CC100 multilingual data | 2021-03 | Microsoft | Multilingual understanding | Text | Text | High multilingual accuracy on NER, POS, QA | |
XLM-RoBERTa Base | 0.278 | Encoder-only | Text Encoder | 512 | 768 | 2.5 TB of filtered CommonCrawl data in 100 languages | 2019-11 | Facebook | Multilingual NLP tasks | Text | Text | High accuracy in XNLI, XQuAD, MLQA | |
mBERT | 0.110 | Encoder-only | Text Encoder | 512 | 768 | Wikipedia (Top 104 languages) | 2019 | Google | Multilingual understanding | Text | Text | Multilingual GLUE, XNLI | |
DistilXLM-R | 0.082 | Encoder-only | Text Encoder | 512 | 768 | Distilled from XLM-RoBERTa | 2020 | Hugging Face | Efficiency-focused model | Text | Text | Smaller, faster, with comparable accuracy | |
Llama 3.1 (8B) | 8 | Decoder-only | Text Decoder | 128000 | 4096 | Extensive multilingual and diverse text sources (15 trillion tokens) | 2024-07 | Meta | General-purpose text generation | Text | Text | High performance on benchmarks in 50+ languages | |
Llama 3.1 (70B) | 70 | Decoder-only | Text Decoder | 128000 | 8192 | Extensive multilingual and diverse text sources | 2024-07 | Meta | Advanced text generation | Text | Text | Outperforms previous Llama models in various tasks | |
Llama 3.1 (405B) | 405 | Decoder-only | Text Decoder | 128000 | 16384 | Extensive multilingual and diverse text sources | 2024-07 | Meta | Frontier-level AI capabilities | Text | Text | Competes with leading proprietary models | |
Llama 3.2 (1B) | 1 | Decoder-only | Text Decoder | 128000 | 2048 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Lightweight, edge deployment | Text | Text | Competitive performance on summarization tasks | |
Llama 3.2 (3B) | 3 | Decoder-only | Text Decoder | 128000 | 3072 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Efficient text generation | Text | Text | Matches or exceeds Llama 3.1 8B in certain tasks | |
Llama 3.2 (11B) | 11 | Decoder-only | Multimodal | 128000 | 4096 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Text and image understanding | Text, Image | Text | Excels in visual reasoning and image captioning | |
Llama 3.2 (90B) | 90 | Decoder-only | Multimodal | 128000 | 8192 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Advanced multimodal capabilities | Text, Image | Text | High performance in document-level understanding | |
XGLM-7.5B | 7.5 | Decoder-only | Text Decoder | 2048 | 4096 | 500B tokens across diverse languages | 2021-12 | Facebook | Multilingual text generation | Text | Text | High BLEU and ROUGE scores on translation tasks | |
mBART | 0.61 | Encoder-decoder | Seq2Seq Model | 1024 | 1024 | CC25 multilingual corpus | 2020-06 | Facebook | Translation and paraphrasing | Text | Text | High performance on translation benchmarks | |
T5 (11B) | 11 | Encoder-decoder | Seq2Seq Model | 512 | 1024 | C4 corpus (English-centric) | 2019-10 | Google | General NLP (text-gen, QA) | Text | Text | High scores in GLUE, SuperGLUE | |
GPT-3.5 | 175 | Decoder-only | Text Decoder | 4000 | - | Varied large-scale text data | 2022* | OpenAI | Text generation, QA | Text | Text | Competitive on common benchmarks | |
GPT-4 | 1800 | Decoder-only | Multimodal | 32000 | - | Diverse, curated large-scale text sources | 2023-03 | OpenAI | Text generation, reasoning | Text, Image | Text | Best-in-class on reasoning, code-gen, QA | Parameter count is an unofficial estimate |
GPT-4 Turbo | 1800 | Decoder-only | Multimodal | 128000 | - | Diverse, curated large-scale text sources | 2023-11 | OpenAI | Faster, cost-effective GPT-4 | Text, Image | Text | Comparable to GPT-4 with improved efficiency | Parameter count is an unofficial estimate |
GPT-4o | 1800 | Decoder-only | Multimodal | 128000 | - | Diverse, curated large-scale text sources | 2024-05 | OpenAI | Advanced reasoning capabilities | Text, Image, Audio | Text, Image, Audio | Enhanced reasoning and multimodal tasks | Parameter count is an unofficial estimate |
GPT-4o mini | 8 | Decoder-only | Multimodal | 128000 | - | Diverse, curated large-scale text sources | 2024-07 | OpenAI | Cost-efficient text generation and reasoning | Text, Image, Audio, Video | Text, Image, Audio, Video | Scores 82% on MMLU; outperforms GPT-3.5 Turbo in chat preferences | Parameter count is an unofficial estimate |
Claude 2 | 100 | Decoder-only | Text Decoder | 100000 | - | Curated datasets with extensive filtering | 2023-07 | Anthropic | Long-form text generation | Text | Text | Excels in safety, reasoning, and complex QA | Parameter count is an unofficial estimate |
Claude 3 | 100 | Decoder-only | Multimodal | 200000 | - | Diverse datasets for high reasoning tasks | 2024-03 | Anthropic | Improved comprehension, safety | Text, Image | Text | High scores in long-context benchmarks | Parameter count is an unofficial estimate |
Mistral (7B) | 7 | Decoder-only | Text Decoder | 4096 | 4096 | Varied large datasets for multilingual tasks | 2023-09 | Mistral AI | Open-source text generation | Text | Text | Outperforms Llama 2 13B on all benchmarks; approaches CodeLlama 7B performance on code | |
Ministral 8B 24.10 | 8 | Decoder-only | Text Decoder | 128000 | 4096 | Multilingual and code data | 2024-10 | Mistral AI | On-device and edge computing | Text | Text | Outperforms Mistral 7B; excels in edge applications | |
Ministral 3B 24.10 | 3 | Decoder-only | Text Decoder | 128000 | 4096 | Multilingual and code data | 2024-10 | Mistral AI | On-device and edge computing | Text | Text | Outperforms Mistral 7B; excels in edge applications | |
Gemini 1.5 Flash-8B | | Decoder-only | Multimodal | 1000000 | | Diverse, curated large-scale datasets | 2024-10 | Google | High-volume, high-frequency tasks | Text, Image, Audio | Text | Optimized for speed and efficiency; excels in summarization, chat applications, and data extraction | |
Gemini 1.5 Flash | | Decoder-only | Multimodal | 1000000 | | Diverse, curated large-scale datasets | 2024-05 | Google | Fast and versatile performance | Text, Image, Audio | Text | Balanced performance across diverse tasks; supports long context processing | |
Gemini 1.5 Flash | | Decoder-only | Multimodal | 1000000 | | Diverse, curated large-scale datasets | 2024-09 | Google | Enhanced speed and efficiency | Text, Image, Audio | Text | Improved latency and cost-efficiency; suitable for high-frequency tasks | |
Gemini 1.5 Pro | | Decoder-only | Multimodal | 2000000 | | Diverse, curated large-scale datasets | 2024-05 | Google | Advanced reasoning tasks | Text, Image, Audio | Text | High performance in complex reasoning and multimodal tasks; supports extensive context | |
Gemini 1.5 Pro | | Decoder-only | Multimodal | 2000000 | | Diverse, curated large-scale datasets | 2024-09 | Google | Enhanced reasoning capabilities | Text, Image, Audio | Text | Further improvements in reasoning and multimodal understanding; optimized for complex tasks | |