| Model | Parameters (in billions) | Architecture | Type | Max Context Size (tokens) | Feature Vector Dimension | Training Data | Released On | Owner | Functionality | Input Modality | Output Modality | Benchmarks / Performance | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mDeBERTa V3 Base | 0.276 | Encoder-only | Text Encoder | 8192 | 768 | CC100 multilingual data | 2021-03 | Microsoft | Multilingual understanding | Text | Text | High multilingual accuracy on NER, POS, QA | |
| XLM-RoBERTa Base | 0.125 | Encoder-only | Text Encoder | 512 | 768 | 2.5 TB of CommonCrawl data in 100 languages | 2019-11 | Facebook | Multilingual NLP tasks | Text | Text | High accuracy in XNLI, XQuAD, MLQA | |
| mBERT | 0.110 | Encoder-only | Text Encoder | 512 | 768 | Wikipedia (top 104 languages) | 2019 | Google | Multilingual understanding | Text | Text | Multilingual GLUE, XNLI | |
| DistilXLM-R | 0.082 | Encoder-only | Text Encoder | 512 | 768 | Distilled from XLM-RoBERTa | 2020 | Hugging Face | Efficiency-focused model | Text | Text | Smaller, faster, with comparable accuracy | |
| Llama 3.1 (8B) | 8 | Decoder-only | Text Decoder | 4096 | 4096 | Extensive multilingual and diverse text sources (15 trillion tokens) | 2024-07 | Meta | General-purpose text generation | Text | Text | High performance on benchmarks in 50+ languages | |
| Llama 3.1 (70B) | 70 | Decoder-only | Text Decoder | 4096 | 4096 | Extensive multilingual and diverse text sources | 2024-07 | Meta | Advanced text generation | Text | Text | Outperforms previous Llama models in various tasks | |
| Llama 3.1 (405B) | 405 | Decoder-only | Text Decoder | 4096 | 4096 | Extensive multilingual and diverse text sources | 2024-07 | Meta | Frontier-level AI capabilities | Text | Text | Competes with leading proprietary models | |
| Llama 3.2 (1B) | 1 | Decoder-only | Text Decoder | 128000 | 4096 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Lightweight, edge deployment | Text | Text | Competitive performance on summarization tasks | |
| Llama 3.2 (3B) | 3 | Decoder-only | Text Decoder | 128000 | 4096 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Efficient text generation | Text | Text | Matches or exceeds Llama 3.1 8B in certain tasks | |
| Llama 3.2 (11B) | 11 | Decoder-only | Multimodal | 128000 | 4096 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Text and image understanding | Text, Image | Text | Excels in visual reasoning and image captioning | |
| Llama 3.2 (90B) | 90 | Decoder-only | Multimodal | 128000 | 4096 | Multilingual and domain-diverse corpus | 2024-09 | Meta | Advanced multimodal capabilities | Text, Image | Text | High performance in document-level understanding | |
| XGLM-7.5B | 7.5 | Encoder-decoder | Seq2Seq Model | 2048 | 768 | 500B tokens across diverse languages | 2023* | Facebook | Multilingual text generation | Text | Text | High BLEU and ROUGE scores on translation tasks | |
| mBART | 6 | Encoder-decoder | Seq2Seq Model | 1024 | 1024 | CC25 multilingual corpus | 2020-06 | Facebook | Translation and paraphrasing | Text | Text | High performance on translation benchmarks | |
| T5 (11B) | 11 | Encoder-decoder | Seq2Seq Model | 512 | 1024 | C4 corpus (English-centric) | 2019-10 | Google | General NLP (text generation, QA) | Text | Text | High scores in GLUE, SuperGLUE | |
| GPT-3.5 | 175 | Decoder-only | Text Decoder | 4000 | - | Varied large-scale text data | 2022* | OpenAI | Text generation, QA | Text | Text | Competitive on common benchmarks | |
| GPT-4 | 1800 | Decoder-only | Multimodal | 32000 | - | Diverse, curated large-scale text sources | 2023-03 | OpenAI | Text generation, reasoning | Text, Image | Text | Best-in-class on reasoning, code generation, QA | Parameter count is an estimate |
| GPT-4 Turbo | 1800 | Decoder-only | Multimodal | 128000 | - | Diverse, curated large-scale text sources | 2023-11 | OpenAI | Faster, cost-effective GPT-4 | Text, Image | Text | Comparable to GPT-4 with improved efficiency | Parameter count is an estimate |
| GPT-4o | 1800 | Decoder-only | Multimodal | 128000 | - | Diverse, curated large-scale text sources | 2024-02 | OpenAI | Advanced reasoning capabilities | Text, Image, Audio | Text, Image, Audio | Enhanced reasoning and multimodal tasks | Parameter count is an estimate |
| GPT-4o mini | 8 | Decoder-only | Multimodal | 128000 | - | Diverse, curated large-scale text sources | 2024-07 | OpenAI | Cost-efficient text generation and reasoning | Text, Image, Audio, Video | Text, Image, Audio, Video | Scores 82% on MMLU; outperforms GPT-3.5 Turbo in chat preferences | Parameter count is an estimate |
| Claude 2 | 100 | Decoder-only | Text Decoder | 100000 | - | Curated datasets with extensive filtering | 2023-07 | Anthropic | Long-form text generation | Text | Text | Excels in safety, reasoning, and complex QA | |
| Claude 3 | 100 | Decoder-only | Text Decoder | 200000 | - | Diverse datasets for high reasoning tasks | 2023-10 | Anthropic | Improved comprehension, safety | Text | Text | High scores in long-context benchmarks | |
| Mistral (7B) | 7 | Decoder-only | Text Decoder | 4096 | 4096 | Varied large datasets for multilingual tasks | 2023-09 | Mistral AI | Open-source text generation | Text | Text | Outperforms Llama 2 13B on all benchmarks; approaches CodeLlama 7B performance on code | |
| Ministral 8B 24.10 | 8 | Decoder-only | Text Decoder | 128000 | 4096 | Multilingual and code data | 2024-10 | Mistral AI | On-device and edge computing | Text | Text | Outperforms Mistral 7B; excels in edge applications | |
| Ministral 3B 24.10 | 3 | Decoder-only | Text Decoder | 128000 | 4096 | Multilingual and code data | 2024-10 | Mistral AI | On-device and edge computing | Text | Text | Outperforms Mistral 7B; excels in edge applications | |
| Gemini 1.5 Flash-8B | | Decoder-only | Multimodal | 1000000 | | Diverse, curated large-scale datasets | 2024-10 | Google | High-volume, high-frequency tasks | Text, Images, Audio | Text | Optimized for speed and efficiency; excels in summarization, chat applications, and data extraction | |
| Gemini 1.5 Flash | | Decoder-only | Multimodal | 1000000 | | Diverse, curated large-scale datasets | 2024-05 | Google | Fast and versatile performance | Text, Images, Audio | Text | Balanced performance across diverse tasks; supports long-context processing | |
| Gemini 1.5 Flash | | Decoder-only | Multimodal | 1000000 | | Diverse, curated large-scale datasets | 2024-09 | Google | Enhanced speed and efficiency | Text, Images, Audio | Text | Improved latency and cost-efficiency; suitable for high-frequency tasks | |
| Gemini 1.5 Pro | | Decoder-only | Multimodal | 200000 | | Diverse, curated large-scale datasets | 2024-05 | Google | Advanced reasoning tasks | Text, Images, Audio | Text | High performance in complex reasoning and multimodal tasks; supports extensive context | |
| Gemini 1.5 Pro | | Decoder-only | Multimodal | 200000 | | Diverse, curated large-scale datasets | 2024-09 | Google | Enhanced reasoning capabilities | Text, Images, Audio | Text | Further improvements in reasoning and multimodal understanding; optimized for complex tasks | |