# Python SDK Installation
## Quick Install from PyPI (Recommended)
Pre-built wheels are available for common platforms. Choose the package that matches your hardware:
| Hardware | Install Command |
|---|---|
| Recommended (auto-optimized) | `pip install mistralrs` |
| NVIDIA GPUs (CUDA) | `pip install mistralrs-cuda` |
| Apple Silicon (Metal) | `pip install mistralrs-metal` |
| Apple Accelerate | `pip install mistralrs-accelerate` |
| Intel CPUs (MKL) | `pip install mistralrs-mkl` |
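For example, to set up the auto-optimized package in a fresh virtual environment (a minimal sketch assuming a POSIX shell; the virtual environment is optional but keeps the install isolated):

```bash
# Create an isolated environment and install the auto-optimized package
python -m venv .venv
source .venv/bin/activate
pip install mistralrs
```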
### Platform-Specific Optimizations

The `mistralrs` base package includes platform-specific optimizations:

- macOS Apple Silicon: Metal GPU support built-in
- Linux/Windows x86_64: Intel MKL optimizations built-in
- Linux aarch64: CPU-only (use `mistralrs-cuda` for GPU support)

All packages install the same `mistralrs` Python module. The package suffix controls which accelerator features are enabled.
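Because every variant ships the same module, switching accelerators is just a package swap. As a precaution (not a documented requirement), uninstalling the old variant first avoids leaving stale copies of the module behind:

```bash
# Swap the base package for the CUDA variant
pip uninstall -y mistralrs
pip install mistralrs-cuda
```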
### Supported Platforms

| Package | Linux x86_64 | Linux aarch64 | Windows x86_64 | macOS aarch64 |
|---|---|---|---|---|
| `mistralrs` | MKL | CPU | MKL | Metal |
| `mistralrs-cuda` | CUDA | CUDA | CUDA | - |
| `mistralrs-metal` | - | - | - | Metal |
| `mistralrs-accelerate` | - | - | - | Accelerate |
| `mistralrs-mkl` | MKL | - | MKL | - |
Python version: 3.10+ (wheels use abi3 for forward compatibility)
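To confirm your interpreter meets this requirement before installing (a one-liner sketch; substitute `python3` for `python` if needed):

```bash
# abi3 wheels target CPython >= 3.10; this exits non-zero on older versions
python -c 'import sys; assert sys.version_info >= (3, 10), sys.version'
```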
### Windows Requirements

WSL2 is recommended on Windows. If you run natively, additional runtime dependencies may be required:

- CUDA packages: install the NVIDIA CUDA Toolkit and ensure its `bin` directory is on your `PATH`
- MKL packages: install the Intel oneAPI Math Kernel Library runtime
```bash
# Example: Install with CUDA support
pip install mistralrs-cuda -v
```
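A quick sanity check that the CUDA Toolkit's `bin` directory made it onto your `PATH` is to invoke `nvcc` directly; if the shell cannot find it, the runtime libraries are unlikely to resolve either:

```bash
# Prints the installed CUDA release if the toolkit bin directory is on PATH
nvcc --version
```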
## Build from Source
Building from source gives you access to the latest features and allows customization of build options.
### Prerequisites

1. Install system packages:

   Ubuntu/Debian:

   ```bash
   sudo apt install libssl-dev pkg-config
   ```

   macOS:

   ```bash
   brew install openssl pkg-config
   ```

2. Install Rust from https://rustup.rs/:

   ```bash
   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
   source $HOME/.cargo/env
   ```

3. (Optional) Set up HuggingFace authentication for gated models:

   ```bash
   mkdir -p ~/.cache/huggingface
   echo "YOUR_HF_TOKEN" > ~/.cache/huggingface/token
   ```

   Or use `huggingface-cli login`.
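If you set up a token, `huggingface-cli whoami` (shipped with the `huggingface_hub` package) is a quick way to confirm it is picked up:

```bash
# Prints your HuggingFace username if the token is valid
huggingface-cli whoami
```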
### Build Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/EricLBuehler/mistral.rs.git
   cd mistral.rs/mistralrs-pyo3
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # Linux/macOS
   # or: .venv\Scripts\activate  # Windows
   ```

3. Install maturin (Rust + Python build tool):

   ```bash
   pip install "maturin[patchelf]"
   ```

4. Build and install:

   ```bash
   maturin develop -r --features <your-features>
   ```
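For a plain CPU build, the feature list can be omitted entirely (assuming the default feature set is CPU-only, which matches the platform table above):

```bash
# CPU-only release build with no accelerator features
maturin develop -r
```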
### Feature Flags

| Feature | Description |
|---|---|
| `cuda` | NVIDIA GPU support |
| `flash-attn` | Flash Attention (CUDA, Ampere+) |
| `flash-attn-v3` | Flash Attention v3 (CUDA, Hopper) |
| `cudnn` | cuDNN optimizations |
| `metal` | Apple Silicon GPU (macOS only) |
| `accelerate` | Apple Accelerate framework |
| `mkl` | Intel MKL |
Example with CUDA and Flash Attention:

```bash
maturin develop -r --features "cuda flash-attn cudnn"
```
## Verify Installation

```python
import mistralrs
print(mistralrs.__version__)
```
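The same check as a shell one-liner, convenient in scripts or CI:

```bash
# Fails with ImportError if the wheel is not installed for this interpreter
python -c "import mistralrs; print(mistralrs.__version__)"
```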
Quick test:

```python
from mistralrs import Runner, Which, ChatCompletionRequest

runner = Runner(
    which=Which.Plain(model_id="Qwen/Qwen3-0.6B"),
)

response = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=50,
    )
)
print(response.choices[0].message.content)
```
## Next Steps
- SDK Documentation - Full SDK reference
- Examples - Python examples
- Cookbook - Interactive tutorial