Supported models

mistral.rs auto-detects the architecture from a repo’s config.json. To check whether your model is supported:

  1. Open the model’s config.json on Hugging Face and read the architectures field (e.g. "Qwen3ForCausalLM", "Gemma4ForConditionalGeneration").
  2. Find the matching row below. Each architecture covers every checkpoint that reports that class, including future fine-tunes and sizes, so the families and examples here are a sample, not the full list.
  3. Not listed? You can still try it: force a known architecture with --arch, load a GGUF build, or request the model.
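
The lookup in steps 1 and 2 can be sketched in a few lines of Python. This is an illustration only, not part of mistral.rs: the `SUPPORTED` dictionary is a hand-copied subset of the tables on this page, the helper names `read_architectures` and `lookup` are hypothetical, and the sketch assumes you already have the text of the model's config.json (downloading it is left out).

```python
from __future__ import annotations

import json

# Hypothetical, hand-copied subset of the tables on this page:
# architecture class name -> model families. A real check would cover every row.
SUPPORTED = {
    "MistralForCausalLM": ["Mistral"],
    "LlamaForCausalLM": ["Llama 2", "Llama 3.x"],
    "Qwen3ForCausalLM": ["Qwen3", "Qwen3 Embedding"],
    "Lfm2ForCausalLM": ["LFM2", "LFM2.5"],
    "Qwen3_5ForConditionalGeneration": ["Qwen 3.5", "Qwen 3.6"],
    "Gemma4ForConditionalGeneration": ["Gemma 4"],
}


def read_architectures(config_text: str) -> list[str]:
    """Return the `architectures` list from the text of a config.json."""
    config = json.loads(config_text)
    archs = config.get("architectures") or []
    # Defensive: accept a single string as well as a list.
    return [archs] if isinstance(archs, str) else list(archs)


def lookup(config_text: str) -> dict[str, list[str] | None]:
    """Map each reported architecture to its families, or None if not in SUPPORTED."""
    return {arch: SUPPORTED.get(arch) for arch in read_architectures(config_text)}


if __name__ == "__main__":
    sample = '{"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}'
    for arch, families in lookup(sample).items():
        if families is None:
            # Step 3: the fallbacks named in the list above.
            print(f"{arch}: not listed; try --arch, a GGUF build, or request the model")
        else:
            print(f"{arch}: listed ({', '.join(families)})")
```

Because the match is on the class name rather than the repo name, a fine-tune that reports the same `architectures` value resolves to the same row.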

Once you have a model ID, run it interactively or serve it:

    mistralrs run -m <model>     # interactive
    mistralrs serve -m <model>   # OpenAI-compatible server

The Example column gives a ready-to-run command for each row. One loader often serves several brand names (Qwen 3.5 and 3.6 share Qwen3_5; LFM2 and LFM2.5 share Lfm2); the Model families column lists them.

The Architecture column is the config.json architectures value. Behavior that differs from the defaults, including per-family quantization, thinking, gated-repo, and tool-calling details, is collected in model family notes.

| Architecture | Model families | Example |
| --- | --- | --- |
| MistralForCausalLM | Mistral | mistralrs run -m mistralai/Mistral-7B-Instruct-v0.3 |
| GemmaForCausalLM | Gemma | mistralrs run -m google/gemma-7b-it |
| MixtralForCausalLM | Mixtral | mistralrs run -m mistralai/Mixtral-8x7B-Instruct-v0.1 |
| LlamaForCausalLM | Llama 2, Llama 3.x | mistralrs run -m meta-llama/Llama-3.1-8B-Instruct |
| PhiForCausalLM | Phi-2 | mistralrs run -m microsoft/phi-2 |
| Phi3ForCausalLM | Phi-3, Phi-3.5 | mistralrs run -m microsoft/Phi-3-medium-4k-instruct |
| Qwen2ForCausalLM | Qwen2, Qwen2.5 | 2.5: mistralrs run -m Qwen/Qwen2.5-7B-Instruct; 2: mistralrs run -m Qwen/Qwen2-7B-Instruct |
| Gemma2ForCausalLM | Gemma 2 | mistralrs run -m google/gemma-2-9b-it |
| Starcoder2ForCausalLM | Starcoder2 | mistralrs run -m bigcode/starcoder2-7b |
| PhiMoEForCausalLM | Phi-3.5-MoE | mistralrs run -m microsoft/Phi-3.5-MoE-instruct |
| DeepseekV2ForCausalLM | DeepSeek-V2 | mistralrs run -m deepseek-ai/DeepSeek-V2-Chat |
| DeepseekV3ForCausalLM | DeepSeek-V3, DeepSeek-R1 | V3: mistralrs run -m deepseek-ai/DeepSeek-V3; R1: mistralrs run -m deepseek-ai/DeepSeek-R1 |
| Qwen3ForCausalLM | Qwen3 | mistralrs run -m Qwen/Qwen3-4B |
| Glm4ForCausalLM | GLM-4 | mistralrs run -m zai-org/GLM-4-32B-0414 |
| Glm4MoeLiteForCausalLM | GLM-4.7-Flash | mistralrs run -m zai-org/GLM-4.7-Flash |
| Glm4MoeForCausalLM | GLM-4.7 | mistralrs run -m zai-org/GLM-4.7 |
| Qwen3MoeForCausalLM | Qwen3 MoE | mistralrs run -m Qwen/Qwen3-30B-A3B |
| SmolLM3ForCausalLM | SmolLM3 | mistralrs run -m HuggingFaceTB/SmolLM3-3B |
| GraniteMoeHybridForCausalLM | Granite 4.0 | mistralrs run -m ibm-granite/granite-4.0-micro |
| GptOssForCausalLM | GPT-OSS | 20b: mistralrs run -m openai/gpt-oss-20b; 120b: mistralrs run -m openai/gpt-oss-120b |
| HunYuanDenseV1ForCausalLM | HunYuan | mistralrs run -m tencent/Hunyuan-7B-Instruct |
| HunYuanMoEV1ForCausalLM | HunYuan MoE | mistralrs run -m tencent/Hunyuan-A13B-Instruct |
| Qwen3NextForCausalLM | Qwen3-Next, Qwen3-Coder-Next | mistralrs run -m Qwen/Qwen3-Next-80B-A3B-Instruct |
| Lfm2ForCausalLM | LFM2, LFM2.5 | LFM2.5: mistralrs run -m LiquidAI/LFM2.5-1.2B-Instruct; LFM2: mistralrs run -m LiquidAI/LFM2-1.2B |
| Lfm2MoeForCausalLM | LFM2 MoE, LFM2.5 MoE | LFM2.5: mistralrs run -m LiquidAI/LFM2.5-8B-A1B; LFM2: mistralrs run -m LiquidAI/LFM2-8B-A1B |

| Architecture | Model families | Example |
| --- | --- | --- |
| Phi3VForCausalLM | Phi-3.5-Vision | mistralrs run -m microsoft/Phi-3.5-vision-instruct |
| Idefics2ForConditionalGeneration | Idefics2 | mistralrs run -m HuggingFaceM4/idefics2-8b |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | mistralrs run -m llava-hf/llava-v1.6-mistral-7b-hf |
| LlavaForConditionalGeneration | LLaVA 1.5 | mistralrs run -m llava-hf/llava-1.5-7b-hf |
| Lfm2VlForConditionalGeneration | LFM2-VL, LFM2.5-VL | 1.6B: mistralrs run -m LiquidAI/LFM2.5-VL-1.6B; 450M: mistralrs run -m LiquidAI/LFM2.5-VL-450M |
| MllamaForConditionalGeneration | Llama 3.2 Vision | mistralrs run -m meta-llama/Llama-3.2-11B-Vision-Instruct |
| Qwen2VLForConditionalGeneration | Qwen2-VL | mistralrs run -m Qwen/Qwen2-VL-7B-Instruct |
| Idefics3ForConditionalGeneration | Idefics3, SmolVLM | mistralrs run -m HuggingFaceM4/Idefics3-8B-Llama3 |
| MiniCPMO | MiniCPM-o | mistralrs run -m openbmb/MiniCPM-o-2_6 |
| Phi4MMForCausalLM | Phi-4-multimodal | mistralrs run -m microsoft/Phi-4-multimodal-instruct |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | mistralrs run -m Qwen/Qwen2.5-VL-7B-Instruct |
| Gemma3ForConditionalGeneration | Gemma 3 | mistralrs run -m google/gemma-3-12b-it |
| Mistral3ForConditionalGeneration | Mistral Small 3 | mistralrs run -m mistralai/Mistral-Small-3.2-24B-Instruct-2506 |
| Llama4ForConditionalGeneration | Llama 4 | mistralrs run -m meta-llama/Llama-4-Scout-17B-16E-Instruct |
| Gemma3nForConditionalGeneration | Gemma 3n | mistralrs run -m google/gemma-3n-E4B-it |
| Qwen3VLForConditionalGeneration | Qwen3-VL | mistralrs run -m Qwen/Qwen3-VL-4B-Instruct |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL MoE | mistralrs run -m Qwen/Qwen3-VL-235B-A22B-Instruct |
| Qwen3_5ForConditionalGeneration | Qwen 3.5, Qwen 3.6 | 3.5: mistralrs run -m Qwen/Qwen3.5-27B; 3.6: mistralrs run -m Qwen/Qwen3.6-27B |
| Qwen3_5MoeForConditionalGeneration | Qwen 3.5 MoE, Qwen 3.6 MoE | 3.5: mistralrs run -m Qwen/Qwen3.5-35B-A3B; 3.6: mistralrs run -m Qwen/Qwen3.6-35B-A3B |
| VoxtralForConditionalGeneration | Voxtral | mistralrs run -m mistralai/Voxtral-Mini-3B-2507 |
| Gemma4ForConditionalGeneration | Gemma 4 | E4B: mistralrs run -m google/gemma-4-E4B-it; 26B-A4B MoE: mistralrs run -m google/gemma-4-26B-A4B-it; 31B dense: mistralrs run -m google/gemma-4-31B-it |
| DiffusionGemmaForBlockDiffusion | DiffusionGemma | mistralrs run -m google/diffusiongemma-26B-A4B-it |

| Architecture | Model families | Example |
| --- | --- | --- |
| Flux | FLUX.1 | mistralrs run -m black-forest-labs/FLUX.1-schnell |
| FluxOffloaded | FLUX.1 (offloaded) | mistralrs run -m black-forest-labs/FLUX.1-schnell |

| Architecture | Model families | Example |
| --- | --- | --- |
| Dia | Dia | mistralrs run -m nari-labs/Dia-1.6B |

| Architecture | Model families | Example |
| --- | --- | --- |
| Gemma3TextModel | EmbeddingGemma | mistralrs run -m google/embeddinggemma-300m |
| Qwen3ForCausalLM | Qwen3 Embedding | mistralrs run -m Qwen/Qwen3-Embedding-0.6B |

Text, multimodal, speech, and embedding models support in-situ quantization (ISQ) at load time. Diffusion models (FLUX) do not; they load at native precision. Availability of pre-quantized formats (GGUF, UQFF, GPTQ, AWQ) varies per model on Hugging Face.

| Mode | Target architecture | Assistant checkpoint family | Guide |
| --- | --- | --- | --- |
| MTP | Gemma4 | Gemma 4 assistant checkpoints, PagedAttention required | Speculative decoding (MTP) |