Installation Guide
Quick Install (Recommended)
The install script automatically detects your hardware (CUDA, Metal, MKL) and builds with optimal features.
Linux/macOS:
```bash
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh
```
Windows (PowerShell):
```powershell
irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex
```
Prerequisites
- Install required packages:
  - OpenSSL (Ubuntu): `sudo apt install libssl-dev`
  - pkg-config (Linux only): `sudo apt install pkg-config`
- Install Rust from https://rustup.rs/:
  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source $HOME/.cargo/env
  ```
- (Optional) Set up Hugging Face authentication with `mistralrs login`, or use `huggingface-cli login` as documented in the Hugging Face CLI documentation.
Supported Accelerators
| Accelerator | Feature Flag | Additional Flags |
|---|---|---|
| NVIDIA GPUs (CUDA) | cuda | flash-attn, flash-attn-v3, cudnn |
| Apple Silicon GPU (Metal) | metal | |
| CPU (Intel) | mkl | |
| CPU (Apple Accelerate) | accelerate | |
| Generic CPU (ARM/AVX) | none | ARM NEON / AVX enabled by default |
Note for Linux users: The `metal` feature is macOS-only. Use `--features "cuda flash-attn cudnn"` for NVIDIA GPUs or `--features mkl` for Intel CPUs instead of `--all-features`.
Feature Detection
Determine which features to enable based on your hardware:
| Hardware | Features |
|---|---|
| NVIDIA GPU (Ampere+, compute >=80) | cuda cudnn flash-attn |
| NVIDIA GPU (Hopper, compute 90) | cuda cudnn flash-attn flash-attn-v3 |
| NVIDIA GPU (older) | cuda cudnn |
| Apple Silicon (macOS) | metal accelerate |
| Intel CPU with MKL | mkl |
| CPU only | (no features needed) |
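If you're not sure which row applies, you can query the hardware directly. A minimal sketch for Linux/macOS; the `compute_cap` query field assumes a reasonably recent NVIDIA driver:
```bash
# NVIDIA: print GPU name and compute capability (8.x = Ampere+, 9.0 = Hopper)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# macOS: "arm64" means Apple Silicon (use metal accelerate)
uname -m

# Linux: non-empty output means the CPU supports AVX2
grep -m1 -o avx2 /proc/cpuinfo
```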
Install from crates.io
```bash
cargo install mistralrs-cli --features "<your-features>"
```
Example:
```bash
cargo install mistralrs-cli --features "cuda flash-attn cudnn"
```
Build from Source
```bash
git clone https://github.com/EricLBuehler/mistral.rs.git
cd mistral.rs
cargo install --path mistralrs-cli --features "<your-features>"
```
Example:
```bash
cargo build --release --features "cuda flash-attn cudnn"
```
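Note that `cargo install` places the binary on your `PATH`, while `cargo build --release` leaves it in the target directory. Assuming the CLI binary is named `mistralrs`, as it is invoked elsewhere in this guide:
```bash
# After `cargo build --release`, run the binary from the target directory
./target/release/mistralrs --help
```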
Docker
Docker images are available for quick deployment:
```bash
docker pull ghcr.io/ericlbuehler/mistral.rs:latest
docker run --gpus all -p 1234:1234 ghcr.io/ericlbuehler/mistral.rs:latest \
  serve -m Qwen/Qwen3-4B
```
Images are published on the GitHub Container Registry. Learn more about running Docker containers: https://docs.docker.com/engine/reference/run/
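To avoid re-downloading weights on every container start, you can mount your local Hugging Face cache and forward your token. A hedged sketch; it assumes the image uses the standard cache location under `/root/.cache/huggingface` and reads `HF_TOKEN` from the environment:
```bash
# Persist downloaded models across runs and forward your HF token for gated models.
docker run --gpus all -p 1234:1234 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -e HF_TOKEN \
  ghcr.io/ericlbuehler/mistral.rs:latest \
  serve -m Qwen/Qwen3-4B
```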
Python SDK
Install the Python package:
```bash
pip install mistralrs-cuda   # For NVIDIA GPUs
pip install mistralrs-metal  # For Apple Silicon
pip install mistralrs-mkl    # For Intel CPUs
pip install mistralrs        # CPU-only
```
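Whichever variant you install, a one-line import check confirms the wheel loaded correctly; this only assumes the package exposes a top-level `mistralrs` module:
```bash
python -c "import mistralrs; print('mistralrs imported OK')"
```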
Verify Installation
After installation, verify everything works:
```bash
# Check CLI is installed
mistralrs --help

# Run system diagnostics
mistralrs doctor

# Test with a small model
mistralrs run -m Qwen/Qwen3-0.6B
```
Getting Models
From Hugging Face Hub (Default)
Models download automatically from Hugging Face Hub:
```bash
mistralrs run -m meta-llama/Llama-3.2-3B-Instruct
```
For gated models, authenticate first:
```bash
mistralrs login
# Or: mistralrs run --token-source env:HF_TOKEN -m <model>
```
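If you prefer the environment-variable route, export the token before launching; the value below is a placeholder for your own access token:
```bash
export HF_TOKEN=hf_your_token_here   # placeholder: your Hugging Face access token
mistralrs run --token-source env:HF_TOKEN -m meta-llama/Llama-3.2-3B-Instruct
```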
From Local Files
Pass a path to a downloaded model:
```bash
mistralrs run -m /path/to/model
```
Running GGUF Models
```bash
mistralrs run --format gguf -m author/model-repo -f model-quant.gguf
```
Specify tokenizer if needed:
```bash
mistralrs run --format gguf -m author/model-repo -f file.gguf -t author/official-tokenizer
```
Next Steps
- CLI Reference - All commands and options
- HTTP API - Run as an OpenAI-compatible server
- Python SDK - Python package documentation
- Troubleshooting - Common issues and solutions