Supported models

Is my model supported?

mistral.rs auto-detects the architecture from a repo’s config.json. To check yours:

Open the model’s config.json on Hugging Face and read the architectures field (e.g. "Qwen3ForCausalLM", "Gemma4ForConditionalGeneration").
Find the matching row below. Each architecture covers every checkpoint that reports that class, including future fine-tunes and sizes, so the families and examples here are a sample, not the full list.
Not listed? You can still try it: force a known architecture with --arch, load a GGUF build, or request the model.

mistralrs run -m <model>     # interactive
mistralrs serve -m <model>   # OpenAI-compatible server

Expand the example in any row to copy a ready-to-run command. One loader often serves several brand names (Qwen 3.5 and 3.6 share Qwen3_5; LFM2 and LFM2.5 share Lfm2) - the Model families column lists them. Behavior that differs from the defaults is collected in model family notes.

The Architecture column is the config.json architectures value. Per-family quantization, thinking, gated-repo, and tool-calling details live in model family notes.

Text models

Architecture	Model families	Example
`MistralForCausalLM`	Mistral	`mistralai/Mistral-7B-Instruct-v0.3` `mistralrs run -m mistralai/Mistral-7B-Instruct-v0.3`
`GemmaForCausalLM`	Gemma	`google/gemma-7b-it` `mistralrs run -m google/gemma-7b-it`
`MixtralForCausalLM`	Mixtral	`mistralai/Mixtral-8x7B-Instruct-v0.1` `mistralrs run -m mistralai/Mixtral-8x7B-Instruct-v0.1`
`LlamaForCausalLM`	Llama 2, Llama 3.x	`meta-llama/Llama-3.1-8B-Instruct` `mistralrs run -m meta-llama/Llama-3.1-8B-Instruct`
`PhiForCausalLM`	Phi-2	`microsoft/phi-2` `mistralrs run -m microsoft/phi-2`
`Phi3ForCausalLM`	Phi-3, Phi-3.5	`microsoft/Phi-3-medium-4k-instruct` `mistralrs run -m microsoft/Phi-3-medium-4k-instruct`
`Qwen2ForCausalLM`	Qwen2, Qwen2.5	`Qwen/Qwen2.5-7B-Instruct` (2.5), `Qwen/Qwen2-7B-Instruct` (2) `mistralrs run -m Qwen/Qwen2.5-7B-Instruct` `mistralrs run -m Qwen/Qwen2-7B-Instruct`
`Gemma2ForCausalLM`	Gemma 2	`google/gemma-2-9b-it` `mistralrs run -m google/gemma-2-9b-it`
`Starcoder2ForCausalLM`	Starcoder2	`bigcode/starcoder2-7b` `mistralrs run -m bigcode/starcoder2-7b`
`PhiMoEForCausalLM`	Phi-3.5-MoE	`microsoft/Phi-3.5-MoE-instruct` `mistralrs run -m microsoft/Phi-3.5-MoE-instruct`
`DeepseekV2ForCausalLM`	DeepSeek-V2	`deepseek-ai/DeepSeek-V2-Chat` `mistralrs run -m deepseek-ai/DeepSeek-V2-Chat`
`DeepseekV3ForCausalLM`	DeepSeek-V3, DeepSeek-R1	`deepseek-ai/DeepSeek-V3` (V3), `deepseek-ai/DeepSeek-R1` (R1) `mistralrs run -m deepseek-ai/DeepSeek-V3` `mistralrs run -m deepseek-ai/DeepSeek-R1`
`Qwen3ForCausalLM`	Qwen3	`Qwen/Qwen3-4B` `mistralrs run -m Qwen/Qwen3-4B`
`Glm4ForCausalLM`	GLM-4	`zai-org/GLM-4-32B-0414` `mistralrs run -m zai-org/GLM-4-32B-0414`
`Glm4MoeLiteForCausalLM`	GLM-4.7-Flash	`zai-org/GLM-4.7-Flash` `mistralrs run -m zai-org/GLM-4.7-Flash`
`Glm4MoeForCausalLM`	GLM-4.7	`zai-org/GLM-4.7` `mistralrs run -m zai-org/GLM-4.7`
`Qwen3MoeForCausalLM`	Qwen3 MoE	`Qwen/Qwen3-30B-A3B` `mistralrs run -m Qwen/Qwen3-30B-A3B`
`SmolLM3ForCausalLM`	SmolLM3	`HuggingFaceTB/SmolLM3-3B` `mistralrs run -m HuggingFaceTB/SmolLM3-3B`
`GraniteMoeHybridForCausalLM`	Granite 4.0	`ibm-granite/granite-4.0-micro` `mistralrs run -m ibm-granite/granite-4.0-micro`
`GptOssForCausalLM`	GPT-OSS	`openai/gpt-oss-20b` (20b), `openai/gpt-oss-120b` (120b) `mistralrs run -m openai/gpt-oss-20b` `mistralrs run -m openai/gpt-oss-120b`
`HunYuanDenseV1ForCausalLM`	HunYuan	`tencent/Hunyuan-7B-Instruct` `mistralrs run -m tencent/Hunyuan-7B-Instruct`
`HunYuanMoEV1ForCausalLM`	HunYuan MoE	`tencent/Hunyuan-A13B-Instruct` `mistralrs run -m tencent/Hunyuan-A13B-Instruct`
`Qwen3NextForCausalLM`	Qwen3-Next, Qwen3-Coder-Next	`Qwen/Qwen3-Next-80B-A3B-Instruct` `mistralrs run -m Qwen/Qwen3-Next-80B-A3B-Instruct`
`Lfm2ForCausalLM`	LFM2, LFM2.5	`LiquidAI/LFM2.5-1.2B-Instruct` (LFM2.5), `LiquidAI/LFM2-1.2B` (LFM2) `mistralrs run -m LiquidAI/LFM2.5-1.2B-Instruct` `mistralrs run -m LiquidAI/LFM2-1.2B`
`Lfm2MoeForCausalLM`	LFM2 MoE, LFM2.5 MoE	`LiquidAI/LFM2.5-8B-A1B` (LFM2.5), `LiquidAI/LFM2-8B-A1B` (LFM2) `mistralrs run -m LiquidAI/LFM2.5-8B-A1B` `mistralrs run -m LiquidAI/LFM2-8B-A1B`

Multimodal models

Architecture	Model families	Example
`Phi3VForCausalLM`	Phi-3.5-Vision	`microsoft/Phi-3.5-vision-instruct` `mistralrs run -m microsoft/Phi-3.5-vision-instruct`
`Idefics2ForConditionalGeneration`	Idefics2	`HuggingFaceM4/idefics2-8b` `mistralrs run -m HuggingFaceM4/idefics2-8b`
`LlavaNextForConditionalGeneration`	LLaVA-NeXT	`llava-hf/llava-v1.6-mistral-7b-hf` `mistralrs run -m llava-hf/llava-v1.6-mistral-7b-hf`
`LlavaForConditionalGeneration`	LLaVA 1.5	`llava-hf/llava-1.5-7b-hf` `mistralrs run -m llava-hf/llava-1.5-7b-hf`
`Lfm2VlForConditionalGeneration`	LFM2-VL, LFM2.5-VL	`LiquidAI/LFM2.5-VL-1.6B` (1.6B), `LiquidAI/LFM2.5-VL-450M` (450M) `mistralrs run -m LiquidAI/LFM2.5-VL-1.6B` `mistralrs run -m LiquidAI/LFM2.5-VL-450M`
`MllamaForConditionalGeneration`	Llama 3.2 Vision	`meta-llama/Llama-3.2-11B-Vision-Instruct` `mistralrs run -m meta-llama/Llama-3.2-11B-Vision-Instruct`
`Qwen2VLForConditionalGeneration`	Qwen2-VL	`Qwen/Qwen2-VL-7B-Instruct` `mistralrs run -m Qwen/Qwen2-VL-7B-Instruct`
`Idefics3ForConditionalGeneration`	Idefics3, SmolVLM	`HuggingFaceM4/Idefics3-8B-Llama3` `mistralrs run -m HuggingFaceM4/Idefics3-8B-Llama3`
`MiniCPMO`	MiniCPM-o	`openbmb/MiniCPM-o-2_6` `mistralrs run -m openbmb/MiniCPM-o-2_6`
`Phi4MMForCausalLM`	Phi-4-multimodal	`microsoft/Phi-4-multimodal-instruct` `mistralrs run -m microsoft/Phi-4-multimodal-instruct`
`Qwen2_5_VLForConditionalGeneration`	Qwen2.5-VL	`Qwen/Qwen2.5-VL-7B-Instruct` `mistralrs run -m Qwen/Qwen2.5-VL-7B-Instruct`
`Gemma3ForConditionalGeneration`	Gemma 3	`google/gemma-3-12b-it` `mistralrs run -m google/gemma-3-12b-it`
`Mistral3ForConditionalGeneration`	Mistral Small 3	`mistralai/Mistral-Small-3.2-24B-Instruct-2506` `mistralrs run -m mistralai/Mistral-Small-3.2-24B-Instruct-2506`
`Llama4ForConditionalGeneration`	Llama 4	`meta-llama/Llama-4-Scout-17B-16E-Instruct` `mistralrs run -m meta-llama/Llama-4-Scout-17B-16E-Instruct`
`Gemma3nForConditionalGeneration`	Gemma 3n	`google/gemma-3n-E4B-it` `mistralrs run -m google/gemma-3n-E4B-it`
`Qwen3VLForConditionalGeneration`	Qwen3-VL	`Qwen/Qwen3-VL-4B-Instruct` `mistralrs run -m Qwen/Qwen3-VL-4B-Instruct`
`Qwen3VLMoeForConditionalGeneration`	Qwen3-VL MoE	`Qwen/Qwen3-VL-235B-A22B-Instruct` `mistralrs run -m Qwen/Qwen3-VL-235B-A22B-Instruct`
`Qwen3_5ForConditionalGeneration`	Qwen 3.5, Qwen 3.6	`Qwen/Qwen3.5-27B` (3.5), `Qwen/Qwen3.6-27B` (3.6) `mistralrs run -m Qwen/Qwen3.5-27B` `mistralrs run -m Qwen/Qwen3.6-27B`
`Qwen3_5MoeForConditionalGeneration`	Qwen 3.5 MoE, Qwen 3.6 MoE	`Qwen/Qwen3.5-35B-A3B` (3.5), `Qwen/Qwen3.6-35B-A3B` (3.6) `mistralrs run -m Qwen/Qwen3.5-35B-A3B` `mistralrs run -m Qwen/Qwen3.6-35B-A3B`
`VoxtralForConditionalGeneration`	Voxtral	`mistralai/Voxtral-Mini-3B-2507` `mistralrs run -m mistralai/Voxtral-Mini-3B-2507`
`Gemma4ForConditionalGeneration`	Gemma 4	`google/gemma-4-E4B-it` (E4B), `google/gemma-4-26B-A4B-it` (26B-A4B MoE), `google/gemma-4-31B-it` (31B dense) `mistralrs run -m google/gemma-4-E4B-it` `mistralrs run -m google/gemma-4-26B-A4B-it` `mistralrs run -m google/gemma-4-31B-it`
`DiffusionGemmaForBlockDiffusion`	DiffusionGemma	`google/diffusiongemma-26B-A4B-it` `mistralrs run -m google/diffusiongemma-26B-A4B-it`

Image generation

Architecture	Model families	Example
`Flux`	FLUX.1	`black-forest-labs/FLUX.1-schnell` `mistralrs run -m black-forest-labs/FLUX.1-schnell`
`FluxOffloaded`	FLUX.1 (offloaded)	`black-forest-labs/FLUX.1-schnell` `mistralrs run -m black-forest-labs/FLUX.1-schnell`

Speech

Architecture	Model families	Example
`Dia`	Dia	`nari-labs/Dia-1.6B` `mistralrs run -m nari-labs/Dia-1.6B`

Embedding

Architecture	Model families	Example
`Gemma3TextModel`	EmbeddingGemma	`google/embeddinggemma-300m` `mistralrs run -m google/embeddinggemma-300m`
`Qwen3ForCausalLM`	Qwen3 Embedding	`Qwen/Qwen3-Embedding-0.6B` `mistralrs run -m Qwen/Qwen3-Embedding-0.6B`

Format and quantization notes

Text, multimodal, speech, and embedding models support ISQ at load time. Diffusion models (FLUX) do not; they load at native precision. Pre-quantized format availability (GGUF, UQFF, GPTQ, AWQ) is per-model on Hugging Face.

Speculative decoding

Mode	Target architecture	Assistant checkpoint family	Guide
MTP	`Gemma4`	Gemma 4 assistant checkpoints, PagedAttention required	Speculative decoding (MTP)