# Core Concepts

## Models

### LLM Models Available for Fine-tuning
This table provides an overview of the Large Language Models (LLMs) available for fine-tuning, ordered roughly from most to least widely known. For each model it lists the name, family, parameter count, context length, and supported features.
Model Name | Family | Parameters (B) | Context Length | vLLM Support | LoRA Support |
---|---|---|---|---|---|
Nvidia-Llama-3.1-Nemotron-70B-Instruct-HF | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.2-3B-Instruct | llama3.2 | 3 | 131,072 | Yes | Yes |
Meta-Llama-3.2-1B-Instruct | llama3.2 | 1 | 131,072 | Yes | Yes |
Mistral-Small-Instruct-2409 | mistral | 7.2 | 128,000 | Yes | Yes |
Ministral-8B-Instruct-2410 | mistral | 8 | 128,000 | Yes | Yes |
Mathstral-7B-v0.1 | mistral | 7 | 32,000 | Yes | Yes |
Qwen2.5-Coder-7B-Instruct | qwen2.5 | 7 | 32,768 | Yes | Yes |
Aya-Expanse-32b | aya | 32 | 128,000 | Yes | No |
Aya-Expanse-8b | aya | 8 | 8,000 | Yes | No |
Nemotron-Mini-4B-Instruct | nemotron | 4 | 4,096 | Yes | No |
Gemma-2-2b-it | gemma2 | 2 | 8,192 | Yes | Yes |
Meta-Llama-3.1-70B-Instruct | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-70B | llama3.1 | 70 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B-Instruct | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3.1-8B | llama3.1 | 8 | 131,072 | Yes | Yes |
Meta-Llama-3-70B-Instruct | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-70B | llama3 | 70 | 8,192 | Yes | Yes |
Meta-Llama-3-8B-Instruct | llama3 | 8 | 8,192 | Yes | Yes |
Meta-Llama-3-8B | llama3 | 8 | 8,192 | Yes | Yes |
Mixtral-8x7B-Instruct-v0.1 | mixtral | 46.7 | 32,768 | Yes | Yes |
Mistral-7B-Instruct-v0.3 | mistral | 7.2 | 32,768 | Yes | Yes |
Mistral-Nemo-Instruct-2407 | mistral | 12.2 | 128,000 | No | No |
Mistral-Nemo-Base-2407 | mistral | 12.2 | 128,000 | No | No |
Gemma-2-27b-it | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-27b | gemma2 | 27 | 8,192 | Yes | Yes |
Gemma-2-9b-it | gemma2 | 9 | 8,192 | Yes | Yes |
Gemma-2-9b | gemma2 | 9 | 8,192 | Yes | Yes |
Phi-3-medium-128k-instruct | phi3 | 14 | 128,000 | Yes | No |
Phi-3-medium-4k-instruct | phi3 | 14 | 4,000 | Yes | No |
Phi-3-small-128k-instruct | phi3 | 7.4 | 128,000 | Yes | No |
Phi-3-small-8k-instruct | phi3 | 7.4 | 8,000 | Yes | No |
Phi-3-mini-128k-instruct | phi3 | 3.8 | 128,000 | Yes | No |
Phi-3-mini-4k-instruct | phi3 | 3.8 | 4,096 | Yes | No |
Qwen2-72B-Instruct | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-72B | qwen2 | 72 | 32,768 | Yes | Yes |
Qwen2-57B-A14B-Instruct | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-57B-A14B | qwen2 | 57 | 32,768 | Yes | Yes |
Qwen2-7B-Instruct | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-7B | qwen2 | 7 | 32,768 | Yes | Yes |
Qwen2-1.5B-Instruct | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-1.5B | qwen2 | 1.5 | 32,768 | Yes | Yes |
Qwen2-0.5B-Instruct | qwen2 | 0.5 | 32,768 | Yes | Yes |
Qwen2-0.5B | qwen2 | 0.5 | 32,768 | Yes | Yes |
TinyLlama_v1.1 | tinyllama | 1.1 | 2,048 | No | No |
DeepSeek-Coder-V2-Lite-Base | deepseek-coder-v2 | 16 | 163,840 | No | No |
InternLM2_5-7B-Chat | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
InternLM2_5-7B | internlm2.5 | 7.74 | 1,000,000 | Yes | No |
Jamba-v0.1 | jamba | 51.6 | 256,000 | Yes | Yes |
Yi-1.5-34B-Chat | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B | yi-1.5 | 34.4 | 4,000 | Yes | Yes |
Yi-1.5-34B-32K | yi-1.5 | 34.4 | 32,000 | Yes | Yes |
Yi-1.5-34B-Chat-16K | yi-1.5 | 34.4 | 16,000 | Yes | Yes |
Yi-1.5-9B-Chat | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B | yi-1.5 | 8.83 | 4,000 | Yes | Yes |
Yi-1.5-9B-32K | yi-1.5 | 8.83 | 32,000 | Yes | Yes |
Yi-1.5-9B-Chat-16K | yi-1.5 | 8.83 | 16,000 | Yes | Yes |
Yi-1.5-6B-Chat | yi-1.5 | 6 | 4,000 | Yes | Yes |
Yi-1.5-6B | yi-1.5 | 6 | 4,000 | Yes | Yes |
c4ai-command-r-v01 | command-r | 35 | 131,072 | Yes | No |
Notes:
- “vLLM Support” indicates whether the model is compatible with the vLLM inference and serving engine.
- “LoRA Support” indicates whether vLLM can serve the model with multiple LoRA adapters.
- Context length is measured in tokens. The effective context length may vary depending on the target inference library.
- Parameter count is shown in billions (B).
- Links lead to the model’s page on Hugging Face or the official website when available.
This table provides a comprehensive overview of the available models, their sizes, and their capabilities. When choosing a model for fine-tuning, consider factors such as the model size, the context length your workload requires, and whether the model supports inference with vLLM and fine-tuning with LoRA adapters.
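The selection criteria above can be applied programmatically. The following is a minimal, illustrative sketch: the entries are a small sample copied from the table, and the field names (`params_b`, `context`, `lora`) are our own shorthand, not part of any official API.

```python
# Shortlist models from the table by fine-tuning requirements.
# Sample entries copied from the table above; field names are illustrative.
models = [
    {"name": "Meta-Llama-3.1-8B-Instruct", "params_b": 8,   "context": 131_072, "vllm": True,  "lora": True},
    {"name": "Phi-3-mini-128k-instruct",   "params_b": 3.8, "context": 128_000, "vllm": True,  "lora": False},
    {"name": "Qwen2-7B-Instruct",          "params_b": 7,   "context": 32_768,  "vllm": True,  "lora": True},
    {"name": "TinyLlama_v1.1",             "params_b": 1.1, "context": 2_048,   "vllm": False, "lora": False},
]

def shortlist(models, max_params_b, min_context, require_lora=True):
    """Return names of models that fit a parameter budget, meet a minimum
    context length, and (optionally) support LoRA adapters."""
    return [
        m["name"]
        for m in models
        if m["params_b"] <= max_params_b
        and m["context"] >= min_context
        and (m["lora"] or not require_lora)
    ]

print(shortlist(models, max_params_b=8, min_context=100_000))
# Only Meta-Llama-3.1-8B-Instruct meets all three constraints in this sample:
# Phi-3-mini lacks LoRA support, and the others fall short on context length.
```

For a real deployment, the same filter would be driven by the full table rather than this four-row sample.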