Supported models

This page lists the supported model architectures. Specific model sizes within each family are available on Hugging Face. Architecture names below match the SDK enum variants (Python Architecture / MultimodalArchitecture / EmbeddingArchitecture / DiffusionArchitecture). The --arch CLI flag applies to text models only and accepts the lowercase form (mistral, gpt_oss, glm4moe, …); multimodal, speech, and diffusion architectures are auto-detected and cannot be selected via --arch.

To run:

mistralrs run -m <model>
mistralrs serve -m <model>

Passing --arch is only necessary in rare cases.
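
As a rough illustration of how the two commands above differ only in their subcommand and in the optional --arch flag, the following sketch assembles the argument list in Python. The helper name build_command is hypothetical, and the sketch assumes that --arch is accepted on the same subcommand as -m; only the run and serve subcommands and the -m and --arch flags come from this page.

```python
import shlex


def build_command(subcommand, model, arch=None):
    """Build an argv list for the mistralrs CLI (hypothetical helper).

    Assumes --arch, when given, is passed alongside -m on the same subcommand.
    """
    if subcommand not in ("run", "serve"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    argv = ["mistralrs", subcommand, "-m", model]
    # --arch is only needed in rare cases, so it is omitted by default.
    if arch is not None:
        argv += ["--arch", arch]
    return argv


# Print shell-ready command lines without executing anything.
print(shlex.join(build_command("run", "mistralai/Mistral-7B-Instruct-v0.3")))
print(shlex.join(build_command("serve", "openai/gpt-oss-20b", arch="gpt_oss")))
```

The returned list can be handed to subprocess.run directly, which avoids shell quoting issues for unusual model paths.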

| Architecture | Example repo |
| --- | --- |
| Mistral | mistralai/Mistral-7B-Instruct-v0.3 |
| Gemma | google/gemma-7b-it |
| Mixtral | mistralai/Mixtral-8x7B-Instruct-v0.1 |
| Llama | meta-llama/Llama-3.1-8B-Instruct |
| Phi2 | microsoft/phi-2 |
| Phi3 | microsoft/Phi-3-medium-4k-instruct |
| Qwen2 | Qwen/Qwen2-7B-Instruct |
| Gemma2 | google/gemma-2-9b-it |
| Starcoder2 | bigcode/starcoder2-7b |
| Phi3_5MoE | microsoft/Phi-3.5-MoE-instruct |
| DeepSeekV2 | deepseek-ai/DeepSeek-V2-Chat |
| DeepSeekV3 | deepseek-ai/DeepSeek-V3 |
| Qwen3 | Qwen/Qwen3-4B |
| GLM4 | zai-org/GLM-4-32B-0414 |
| GLM4Moe | zai-org/GLM-4.7 |
| GLM4MoeLite | zai-org/GLM-4.7-Flash |
| Qwen3Moe | Qwen/Qwen3-30B-A3B |
| SmolLm3 | HuggingFaceTB/SmolLM3-3B |
| GraniteMoeHybrid | ibm-granite/granite-4.0-micro |
| GptOss | openai/gpt-oss-20b |
| Qwen3Next | Qwen/Qwen3-Next-80B-A3B-Instruct |

| Architecture | Example repo | Modalities |
| --- | --- | --- |
| Phi3V | microsoft/Phi-3.5-vision-instruct | Text, image |
| Idefics2 | HuggingFaceM4/idefics2-8b | Text, image |
| LLaVANext | llava-hf/llava-v1.6-mistral-7b-hf | Text, image |
| LLaVA | llava-hf/llava-1.5-7b-hf | Text, image |
| VLlama | meta-llama/Llama-3.2-11B-Vision-Instruct | Text, image |
| Qwen2VL | Qwen/Qwen2-VL-7B-Instruct | Text, image, video |
| Idefics3 | HuggingFaceM4/Idefics3-8B-Llama3 | Text, image |
| MiniCpmO | openbmb/MiniCPM-o-2_6 | Text, image, audio |
| Phi4MM | microsoft/Phi-4-multimodal-instruct | Text, image, audio |
| Qwen2_5VL | Qwen/Qwen2.5-VL-7B-Instruct | Text, image, video |
| Gemma3 | google/gemma-3-12b-it | Text, image |
| Mistral3 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 | Text, image |
| Llama4 | meta-llama/Llama-4-Scout-17B-16E-Instruct | Text, image |
| Gemma3n | google/gemma-3n-E4B-it | Text, image, audio, video |
| Qwen3VL | Qwen/Qwen3-VL-4B-Instruct | Text, image, video |
| Qwen3VLMoE | Qwen/Qwen3-VL-235B-A22B-Instruct | Text, image, video |
| Qwen3_5 | Qwen/Qwen3.5-27B | Text, image |
| Qwen3_5Moe | Qwen/Qwen3.5-35B-A3B | Text, image |
| Voxtral | mistralai/Voxtral-Mini-3B-2507 | Text, audio |
| Gemma4 | google/gemma-4-E4B-it | Text, image, audio, video |
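
When picking a multimodal architecture by input type, the Modalities column above can be treated as a simple lookup. The sketch below encodes a handful of rows from that table (not the full list) in a plain Python dictionary; the names MODALITIES and architectures_supporting are hypothetical and are not part of the SDK.

```python
# A subset of the multimodal table above, keyed by architecture name.
# This is illustrative data copied from the table, not an SDK structure.
MODALITIES = {
    "Phi3V": {"text", "image"},
    "Qwen2VL": {"text", "image", "video"},
    "MiniCpmO": {"text", "image", "audio"},
    "Gemma3n": {"text", "image", "audio", "video"},
    "Voxtral": {"text", "audio"},
}


def architectures_supporting(modality):
    """Return architecture names (in table order) that list the given modality."""
    wanted = modality.lower()
    return [name for name, mods in MODALITIES.items() if wanted in mods]


print(architectures_supporting("audio"))
print(architectures_supporting("video"))
```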

| Architecture | Example repo |
| --- | --- |
| Flux | black-forest-labs/FLUX.1-schnell |
| FluxOffloaded | black-forest-labs/FLUX.1-schnell |

FluxOffloaded loads the same FLUX checkpoints as Flux with CPU offload enabled for memory-constrained hosts.

| Architecture | Example repo | Direction |
| --- | --- | --- |
| Dia | nari-labs/Dia-1.6B | Text to speech |

| Architecture | Example repo |
| --- | --- |
| EmbeddingGemma | google/embeddinggemma-300m |
| Qwen3Embedding | Qwen/Qwen3-Embedding-0.6B |

Text, multimodal, speech, and embedding models support ISQ at load time. Diffusion models (FLUX) do not; they load at native precision. Pre-quantized format availability (GGUF, UQFF, GPTQ, AWQ) is per-model on Hugging Face.
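
The ISQ rule above can be summarized as a per-category lookup. The following is a minimal sketch that only restates that rule; the names ISQ_AT_LOAD and supports_isq are hypothetical, and the category strings are labels chosen for this example rather than SDK identifiers.

```python
# Restates the rule above: every category except diffusion supports ISQ at load time.
ISQ_AT_LOAD = {
    "text": True,
    "multimodal": True,
    "speech": True,
    "embedding": True,
    "diffusion": False,  # FLUX models load at native precision
}


def supports_isq(category):
    """Return True if models in this category can use ISQ at load time."""
    try:
        return ISQ_AT_LOAD[category.lower()]
    except KeyError:
        raise ValueError(f"unknown model category: {category!r}") from None


for name in ISQ_AT_LOAD:
    print(name, supports_isq(name))
```

A check like this could guard a launcher script so that a quantization setting is never requested for a FLUX checkpoint.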

For non-standard behavior, see model notes.