Start here

Use this page to pick the first document to read. Most workflows start with auto-detection and add flags only when the model, hardware, or deployment requires them.

Choose by task

If you need to…	Start here	Then read
Chat with a model on one machine	Your first model	Pick a quantization method
Verify install, GPU support, or Hugging Face access	Your first model	Troubleshooting
Expose an OpenAI-compatible endpoint	Serve a model as an API	Configure the HTTP server
Use the built-in browser UI	Serve a model as an API	Use the built-in web UI
Call mistral.rs from Python in-process	Call a model from Python	Python API reference
Embed mistral.rs in Rust	Call a model from Rust	Rust API on docs.rs
Build a local agent app with tools, code execution, web search, multimodal inputs, or session state	Build an agent	Agentic runtime for apps
Fit a larger model on the same hardware	Quantize a model	Auto-tune with mistralrs tune
Split a model across GPUs or machines	Performance	Split a model across multiple GPUs
Run a server for real traffic	Run mistralrs in Docker	Production checklist

Choose by runtime mode

Mode	Use when	Entry point
CLI	You want local interactive use, quick tests, or benchmarking.	`mistralrs run`, `mistralrs bench`, `mistralrs tune`
HTTP server	You want OpenAI-compatible clients, a web UI, or a process boundary around inference.	`mistralrs serve`
Config file	You need repeatable multi-model startup or a deployment config checked into source control.	`mistralrs from-config -f config.toml`
Diagnostics	You want to check hardware detection, build features, or Hugging Face connectivity.	`mistralrs doctor`
Python package	You want in-process access from Python without running a server.	`mistralrs.Runner`
Rust crate	You want inference embedded inside a Rust service.	`mistralrs` crate

If unsure

Start with Your first model, then Serve a model as an API. Those two pages exercise the default local and server paths and make later choices easier to evaluate.