# Python SDK Installation
## Quick Install from PyPI (Recommended)
Pre-built wheels are available for common platforms. Choose the package that matches your hardware:
| Hardware | Install Command |
|---|---|
| Recommended (auto-optimized) | `pip install mistralrs` |
| NVIDIA GPUs (CUDA) | `pip install mistralrs-cuda` |
| Apple Silicon (Metal) | `pip install mistralrs-metal` |
| Apple Accelerate | `pip install mistralrs-accelerate` |
| Intel CPUs (MKL) | `pip install mistralrs-mkl` |
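For example, to set up the auto-optimized package in a fresh virtual environment (a minimal sketch assuming a POSIX shell; the virtual environment is optional but keeps the install isolated):

```bash
# Create an isolated environment and install the auto-optimized package
python -m venv .venv
source .venv/bin/activate
pip install mistralrs
```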
### Platform-Specific Optimizations

The `mistralrs` base package includes platform-specific optimizations:

- macOS Apple Silicon: Metal GPU support built-in
- Linux/Windows x86_64: Intel MKL optimizations built-in
- Linux aarch64: CPU-only (use `mistralrs-cuda` for GPU support)

All packages install the same `mistralrs` Python module. The package suffix controls which accelerator features are enabled.
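Because every variant ships the same module, switching accelerators is just a package swap. As a precaution (not a documented requirement), uninstalling the old variant first avoids leaving stale copies of the module behind:

```bash
# Swap the base package for the CUDA variant
pip uninstall -y mistralrs
pip install mistralrs-cuda
```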
### Supported Platforms

| Package | Linux x86_64 | Linux aarch64 | Windows x86_64 | macOS aarch64 |
|---|---|---|---|---|
| `mistralrs` | MKL | CPU | MKL | Metal |
| `mistralrs-cuda` | CUDA | CUDA | CUDA | - |
| `mistralrs-metal` | - | - | - | Metal |
| `mistralrs-accelerate` | - | - | - | Accelerate |
| `mistralrs-mkl` | MKL | - | MKL | - |
Python version: 3.10+ (wheels use abi3 for forward compatibility)
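To confirm your interpreter meets this requirement before installing (a one-liner sketch; substitute `python3` for `python` if needed):

```bash
# abi3 wheels target CPython >= 3.10; this exits non-zero on older versions
python -c 'import sys; assert sys.version_info >= (3, 10), sys.version'
```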
### Windows Requirements

WSL2 is recommended on Windows. If you run natively, additional runtime dependencies may be required:

- CUDA packages: install the NVIDIA CUDA Toolkit and ensure its `bin` directory is on your `PATH`
- MKL packages: install the Intel oneAPI Math Kernel Library runtime
```bash
# Example: Install with CUDA support
pip install mistralrs-cuda -v
```
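A quick sanity check that the CUDA Toolkit's `bin` directory made it onto your `PATH` is to invoke `nvcc` directly; if the shell cannot find it, the runtime libraries are unlikely to resolve either:

```bash
# Prints the installed CUDA release if the toolkit bin directory is on PATH
nvcc --version
```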
## Build from Source
Building from source gives you access to the latest features and allows customization of build options.
### Prerequisites

1. Install system packages:

   Ubuntu/Debian:

   ```bash
   sudo apt install libssl-dev pkg-config
   ```

   macOS:

   ```bash
   brew install openssl pkg-config
   ```

2. Install Rust from https://rustup.rs/:

   ```bash
   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
   source $HOME/.cargo/env
   ```

3. (Optional) Set up HuggingFace authentication for gated models:

   ```bash
   mkdir -p ~/.cache/huggingface
   echo "YOUR_HF_TOKEN" > ~/.cache/huggingface/token
   ```

   Or use `huggingface-cli login`.
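If you set up a token, `huggingface-cli whoami` (shipped with the `huggingface_hub` package) is a quick way to confirm it is picked up:

```bash
# Prints your HuggingFace username if the token is valid
huggingface-cli whoami
```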
### Build Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/EricLBuehler/mistral.rs.git
   cd mistral.rs/mistralrs-pyo3
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # Linux/macOS
   # or: .venv\Scripts\activate  # Windows
   ```

3. Install maturin (Rust + Python build tool):

   ```bash
   pip install "maturin[patchelf]"
   ```

4. Build and install:

   ```bash
   maturin develop -r --features <your-features>
   ```
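For a plain CPU build, the feature list can be omitted entirely (assuming the default feature set is CPU-only, which matches the platform table above):

```bash
# CPU-only release build with no accelerator features
maturin develop -r
```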
### Feature Flags

| Feature | Description |
|---|---|
| `cuda` | NVIDIA GPU support |
| `flash-attn` | Flash Attention (CUDA, Ampere+) |
| `flash-attn-v3` | Flash Attention v3 (CUDA, Hopper) |
| `cudnn` | cuDNN optimizations |
| `metal` | Apple Silicon GPU (macOS only) |
| `accelerate` | Apple Accelerate framework |
| `mkl` | Intel MKL |
Example with CUDA and Flash Attention:

```bash
maturin develop -r --features "cuda flash-attn cudnn"
```
## Verify Installation

```python
import mistralrs
print(mistralrs.__version__)
```
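The same check as a shell one-liner, convenient in scripts or CI:

```bash
# Fails with ImportError if the wheel is not installed for this interpreter
python -c "import mistralrs; print(mistralrs.__version__)"
```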
Quick test:

```python
from mistralrs import Runner, Which, ChatCompletionRequest

runner = Runner(
    which=Which.Plain(model_id="Qwen/Qwen3-0.6B"),
)

response = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=50,
    )
)
print(response.choices[0].message.content)
```
## Next Steps
- SDK Documentation - Full SDK reference
- Examples - Python examples
- Cookbook - Interactive tutorial