Installation Guide
Quick Install (Recommended)
The install script automatically detects your hardware (CUDA, Metal, MKL) and builds with optimal features.
Linux/macOS:
```bash
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh
```
Windows (PowerShell):
```powershell
irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex
```
Prerequisites
- Install required packages:
  - OpenSSL (Ubuntu): `sudo apt install libssl-dev`
  - pkg-config (Linux only): `sudo apt install pkg-config`
- Install Rust from https://rustup.rs/:
  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source $HOME/.cargo/env
  ```
- (Optional) Set up Hugging Face authentication with `mistralrs login`, or use `huggingface-cli login` as documented in the Hugging Face CLI documentation.
Supported Accelerators
| Accelerator | Feature Flag | Additional Flags |
|---|---|---|
| NVIDIA GPUs (CUDA) | cuda | flash-attn, flash-attn-v3, cudnn |
| Apple Silicon GPU (Metal) | metal | |
| CPU (Intel) | mkl | |
| CPU (Apple Accelerate) | accelerate | |
| Generic CPU (ARM/AVX) | none | ARM NEON / AVX enabled by default |
Note for Linux users: The `metal` feature is macOS-only. Use `--features "cuda flash-attn cudnn"` for NVIDIA GPUs or `--features mkl` for Intel CPUs instead of `--all-features`.
Feature Detection
Determine which features to enable based on your hardware:
| Hardware | Features |
|---|---|
| NVIDIA GPU (Ampere+, compute >=80) | cuda cudnn flash-attn |
| NVIDIA GPU (Hopper, compute 90) | cuda cudnn flash-attn flash-attn-v3 |
| NVIDIA GPU (older) | cuda cudnn |
| Apple Silicon (macOS) | metal accelerate |
| Intel CPU with MKL | mkl |
| CPU only | (no features needed) |
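If you're not sure which row applies, you can query the hardware directly. A minimal sketch for Linux/macOS; the `compute_cap` query field assumes a reasonably recent NVIDIA driver:
```bash
# NVIDIA: print GPU name and compute capability (8.x = Ampere+, 9.0 = Hopper)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# macOS: "arm64" means Apple Silicon (use metal accelerate)
uname -m

# Linux: non-empty output means the CPU supports AVX2
grep -m1 -o avx2 /proc/cpuinfo
```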
Install from crates.io
```bash
cargo install mistralrs-cli --features "<your-features>"
```
Example:
```bash
cargo install mistralrs-cli --features "cuda flash-attn cudnn"
```
Build from Source
```bash
git clone https://github.com/EricLBuehler/mistral.rs.git
cd mistral.rs
cargo install --path mistralrs-cli --features "<your-features>"
```
Example:
```bash
cargo build --release --features "cuda flash-attn cudnn"
```
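Note that `cargo install` places the binary on your `PATH`, while `cargo build --release` leaves it in the target directory. Assuming the CLI binary is named `mistralrs`, as it is invoked elsewhere in this guide:
```bash
# After `cargo build --release`, run the binary from the target directory
./target/release/mistralrs --help
```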
Docker
Docker images are available for quick deployment:
```bash
docker pull ghcr.io/ericlbuehler/mistral.rs:latest
docker run --gpus all -p 1234:1234 ghcr.io/ericlbuehler/mistral.rs:latest \
  serve -m Qwen/Qwen3-4B
```
Images are published on the GitHub Container Registry. Learn more about running Docker containers: https://docs.docker.com/engine/reference/run/
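To avoid re-downloading weights on every container start, you can mount your local Hugging Face cache and forward your token. A hedged sketch; it assumes the image uses the standard cache location under `/root/.cache/huggingface` and reads `HF_TOKEN` from the environment:
```bash
# Persist downloaded models across runs and forward your HF token for gated models.
docker run --gpus all -p 1234:1234 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -e HF_TOKEN \
  ghcr.io/ericlbuehler/mistral.rs:latest \
  serve -m Qwen/Qwen3-4B
```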
Python SDK
Install the Python package:
```bash
pip install mistralrs-cuda   # For NVIDIA GPUs
pip install mistralrs-metal  # For Apple Silicon
pip install mistralrs-mkl    # For Intel CPUs
pip install mistralrs        # CPU-only
```
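Whichever variant you install, a one-line import check confirms the wheel loaded correctly; this only assumes the package exposes a top-level `mistralrs` module:
```bash
python -c "import mistralrs; print('mistralrs imported OK')"
```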
Verify Installation
After installation, verify everything works:
```bash
# Check CLI is installed
mistralrs --help

# Run system diagnostics
mistralrs doctor

# Test with a small model
mistralrs run -m Qwen/Qwen3-0.6B
```
Getting Models
From Hugging Face Hub (Default)
Models download automatically from Hugging Face Hub:
```bash
mistralrs run -m meta-llama/Llama-3.2-3B-Instruct
```
For gated models, authenticate first:
```bash
mistralrs login
# Or: mistralrs run --token-source env:HF_TOKEN -m <model>
```
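If you prefer the environment-variable route, export the token before launching; the value below is a placeholder for your own access token:
```bash
export HF_TOKEN=hf_your_token_here   # placeholder: your Hugging Face access token
mistralrs run --token-source env:HF_TOKEN -m meta-llama/Llama-3.2-3B-Instruct
```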
From Local Files
Pass a path to a downloaded model:
```bash
mistralrs run -m /path/to/model
```
Running GGUF Models
```bash
mistralrs run --format gguf -m author/model-repo -f model-quant.gguf
```
Specify tokenizer if needed:
```bash
mistralrs run --format gguf -m author/model-repo -f file.gguf -t author/official-tokenizer
```
Next Steps
- CLI Reference - All commands and options
- HTTP API - Run as an OpenAI-compatible server
- Python SDK - Python package documentation
- Troubleshooting - Common issues and solutions