Skip to content

Start here

Use this page to pick the first document to read. Most workflows start with auto-detection and add flags only when the model, hardware, or deployment requires them.

If you need to…Start hereThen read
Chat with a model on one machineYour first modelPick a quantization method
Verify install, GPU support, or Hugging Face accessYour first modelTroubleshooting
Expose an OpenAI-compatible endpointServe a model as an APIConfigure the HTTP server
Use the built-in browser UIServe a model as an APIUse the built-in web UI
Call mistral.rs from Python in-processCall a model from PythonPython API reference
Embed mistral.rs in RustCall a model from RustRust API on docs.rs
Build a local agent app with tools, code execution, web search, multimodal inputs, or session stateBuild an agentAgentic runtime for apps
Fit a larger model on the same hardwareQuantize a modelAuto-tune with mistralrs tune
Split a model across GPUs or machinesPerformanceSplit a model across multiple GPUs
Run a server for real trafficRun mistralrs in DockerProduction checklist
ModeUse whenEntry point
CLIYou want local interactive use, quick tests, or benchmarking.mistralrs run, mistralrs bench, mistralrs tune
HTTP serverYou want OpenAI-compatible clients, a web UI, or a process boundary around inference.mistralrs serve
Config fileYou need repeatable multi-model startup or a deployment config checked into source control.mistralrs from-config -f config.toml
DiagnosticsYou want to check hardware detection, build features, or Hugging Face connectivity.mistralrs doctor
Python packageYou want in-process access from Python without running a server.mistralrs.Runner
Rust crateYou want inference embedded inside a Rust service.mistralrs crate

Start with Your first model, then Serve a model as an API. Those two pages exercise the default local and server paths and make later choices easier to evaluate.