Supported Models

Complete reference for model support in mistral.rs.

Model Categories

Text Models

  • Granite 4.0
  • SmolLM 3
  • DeepSeek V3
  • GPT-OSS
  • DeepSeek V2
  • Qwen 3 MoE
  • Phi 3.5 MoE
  • Qwen 3
  • GLM 4
  • GLM-4.7-Flash
  • GLM-4.7 (MoE)
  • Gemma 2
  • Qwen 2
  • Starcoder 2
  • Phi 3
  • Mixtral
  • Phi 2
  • Gemma
  • Llama
  • Mistral

Vision Models

  • Qwen 3-VL
  • Gemma 3n
  • Llama 4
  • Gemma 3
  • Mistral 3
  • Phi 4 multimodal
  • Qwen 2.5-VL
  • MiniCPM-O
  • Llama 3.2 Vision
  • Qwen 2-VL
  • Idefics 3
  • Idefics 2
  • LLaVA Next
  • LLaVA
  • Phi 3V

Speech Models

  • Dia

Image Generation Models

  • FLUX

Embedding Models

  • Embedding Gemma
  • Qwen 3 Embedding

To request a new model, please open an issue.

Supported GGUF Architectures

Plain:

  • llama
  • phi2
  • phi3
  • starcoder2
  • qwen2
  • qwen3

With adapters:

  • llama
  • phi3

Quantization Support

The quantization support matrix spans three formats (GGUF, GGML, and ISQ) and varies by model and format; a usage sketch follows the list. It covers the following models:

  • Mistral
  • Gemma
  • Llama
  • Mixtral
  • Phi 2
  • Phi 3
  • Phi 3.5 MoE
  • Qwen 2.5
  • Phi 3 Vision
  • Idefics 2
  • Gemma 2
  • GLM4
  • GLM-4.7-Flash (MoE)
  • GLM-4.7 (MoE)
  • Starcoder 2
  • LLaVa Next
  • LLaVa
  • Llama 3.2 Vision
  • Qwen2-VL
  • Idefics 3
  • Deepseek V2
  • Deepseek V3
  • MiniCPM-O 2.6
  • Qwen2.5-VL
  • Gemma 3
  • Mistral 3
  • Llama 4
  • Qwen 3
  • SmolLM3
  • Dia 1.6b
  • Gemma 3n
  • Qwen 3 VL
  • Granite 4.0
  • GPT-OSS
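
In-situ quantization (ISQ) quantizes safetensors weights at load time via the --isq flag shown under Using Derivative Models below. A minimal sketch, assuming Q4K is one of the available level names; the model ID is illustrative:

# Quantize weights in place while loading (level name and model ID are illustrative).
mistralrs serve -p 1234 -m meta-llama/Llama-3.2-3B-Instruct --isq Q4K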

Device Mapping Support

Device mapping is supported for the following model categories:

  • Plain
  • GGUF
  • GGML
  • Vision Plain

X-LoRA and LoRA Support

The adapter support matrix spans three combinations (X-LoRA, X-LoRA + GGUF, and X-LoRA + GGML) and varies by model; a usage sketch follows the list. It covers the following models:

  • Mistral
  • Gemma
  • Llama
  • Mixtral
  • Phi 2
  • Phi 3
  • Phi 3.5 MoE
  • Qwen 2.5
  • Phi 3 Vision
  • Idefics 2
  • Gemma 2
  • GLM4
  • GLM-4.7-Flash (MoE)
  • GLM-4.7 (MoE)
  • Starcoder 2
  • LLaVa Next
  • LLaVa
  • Qwen2-VL
  • Idefics 3
  • Deepseek V2
  • Deepseek V3
  • MiniCPM-O 2.6
  • Qwen2.5-VL
  • Gemma 3
  • Mistral 3
  • Llama 4
  • Qwen 3
  • SmolLM3
  • Gemma 3n
  • Qwen 3 VL
  • Granite 4.0
  • GPT-OSS
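
Launching with an X-LoRA adapter requires both the adapter and an ordering file, per the --xlora and --xlora-order flags under Using Derivative Models below. A minimal sketch with placeholder IDs and paths:

# X-LoRA needs an ordering file in addition to the adapter itself.
mistralrs serve -p 1234 -m <base-model-id> --xlora <adapter-id> --xlora-order <ordering-file>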

AnyMoE Support

AnyMoE support covers the following models:

  • Mistral 7B
  • Gemma
  • Llama
  • Mixtral
  • Phi 2
  • Phi 3
  • Phi 3.5 MoE
  • Qwen 2.5
  • Phi 3 Vision
  • Idefics 2
  • Gemma 2
  • GLM-4.7-Flash (MoE)
  • GLM-4.7 (MoE)
  • Starcoder 2
  • LLaVa Next
  • LLaVa
  • Llama 3.2 Vision
  • Qwen2-VL
  • Idefics 3
  • Deepseek V2
  • Deepseek V3
  • MiniCPM-O 2.6
  • Qwen2.5-VL
  • Gemma 3
  • Mistral 3
  • Llama 4
  • Qwen 3
  • SmolLM3
  • Gemma 3n
  • Qwen 3 VL
  • Granite 4.0
  • GPT-OSS

Using Derivative Models

The model type is auto-detected; quantized models and adapters require additional flags:

  • Plain: -m <model-id>
  • GGUF Quantized: -m <model-id> --format gguf -f <file>
  • ISQ Quantized: -m <model-id> --isq <level>
  • UQFF Quantized: -m <model-id> --from-uqff <file>
  • LoRA: -m <model-id> --lora <adapter>
  • X-LoRA: -m <model-id> --xlora <adapter> --xlora-order <file>

Example: Zephyr GGUF model

mistralrs serve -p 1234 --log output.txt --format gguf -t HuggingFaceH4/zephyr-7b-beta -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q5_0.gguf
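
By comparison, a LoRA adapter needs only the --lora flag listed above; the IDs below are placeholders:

mistralrs serve -p 1234 -m <base-model-id> --lora <adapter-id>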

Chat Templates and Tokenizer

Mistral.rs attempts to load a chat template and tokenizer automatically, which provides accurate chat templating across a wide range of models. This behavior can be customized if needed.
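
As a sketch of that customization, an explicit template file can be supplied at launch; the --chat-template flag and the placeholder path here are assumptions to verify against mistralrs serve --help:

# Override the auto-loaded chat template with a local file (flag and path assumed).
mistralrs serve -p 1234 -m <model-id> --chat-template <template-file>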