Work with specific model types
mistral.rs handles more than text chat. Each model category has its own request shape and conventions.
- Use vision and video input: image and video on Qwen3-VL and Gemma 4.
- Set up video input: FFmpeg, supported formats, CLI, and HTTP video examples.
- Use image generation models: diffusion models like FLUX.
- Use speech models: Voxtral for STT, Dia for TTS.
- Use embedding models: EmbeddingGemma, Qwen3-Embedding.
Model-family walkthroughs:
- Text model walkthroughs: Qwen3 thinking, SmolLM3, DeepSeek, GLM, GPT-OSS, and MoE notes.
- Vision model walkthroughs: Qwen-VL, Gemma, Llama, Mistral, Phi, Idefics, LLaVA, and MiniCPM-O.
Full model and modality list: supported models reference.