Python SDK Installation

Pre-built wheels are available for common platforms. Choose the package that matches your hardware:

| Hardware | Install Command |
|---|---|
| Recommended (auto-optimized) | pip install mistralrs |
| NVIDIA GPUs (CUDA) | pip install mistralrs-cuda |
| Apple Silicon (Metal) | pip install mistralrs-metal |
| Apple Accelerate | pip install mistralrs-accelerate |
| Intel CPUs (MKL) | pip install mistralrs-mkl |

Platform-Specific Optimizations

The mistralrs base package includes platform-specific optimizations:

  • macOS Apple Silicon: Metal GPU support built-in
  • Linux/Windows x86_64: Intel MKL optimizations built-in
  • Linux aarch64: CPU-only (use mistralrs-cuda for GPU support)

All packages install the mistralrs Python module. The package suffix controls which accelerator features are enabled.
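The choice above can be sketched as a small helper that suggests a package name from the host platform. This is an illustrative convenience only, not part of the SDK, and it cannot detect an NVIDIA GPU, so CUDA users must opt in explicitly:

```python
import platform

def suggest_package(want_cuda: bool = False) -> str:
    """Suggest a mistralrs pip package for the current host (illustrative sketch)."""
    if want_cuda:
        if platform.system() == "Darwin":
            raise ValueError("CUDA wheels are not published for macOS")
        return "mistralrs-cuda"
    # The base package already bundles the platform-specific optimization:
    # Metal on Apple Silicon, MKL on x86_64, plain CPU on Linux aarch64.
    return "mistralrs"

print(f"pip install {suggest_package()}")
```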

Supported Platforms

| Package | Linux x86_64 | Linux aarch64 | Windows x86_64 | macOS aarch64 |
|---|---|---|---|---|
| mistralrs | MKL | CPU | MKL | Metal |
| mistralrs-cuda | CUDA | CUDA | CUDA | - |
| mistralrs-metal | - | - | - | Metal |
| mistralrs-accelerate | - | - | - | Accelerate |
| mistralrs-mkl | MKL | - | MKL | - |

Python version: 3.10+ (wheels use abi3 for forward compatibility)

Windows Requirements

WSL2 is recommended on Windows machines.

When installing natively on Windows, additional runtime dependencies may be required; verbose pip output helps diagnose what is missing:

# Example: Install with CUDA support
pip install mistralrs-cuda -v

Build from Source

Building from source gives you access to the latest features and allows customization of build options.

Prerequisites

  1. Install system packages:

    Ubuntu/Debian:

    sudo apt install libssl-dev pkg-config
    

    macOS:

    brew install openssl pkg-config
    
  2. Install Rust from https://rustup.rs/:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  3. (Optional) Set up HuggingFace authentication for gated models:

    mkdir -p ~/.cache/huggingface
    echo "YOUR_HF_TOKEN" > ~/.cache/huggingface/token
    

    Or use huggingface-cli login.
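The same token setup can be done from Python. This sketch mirrors the shell commands above, writing the token to the standard Hugging Face cache location (the helper name is ours, not part of any library):

```python
from pathlib import Path

def write_hf_token(token, cache_dir=None):
    """Write a Hugging Face token to the path the shell commands above use."""
    cache = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "huggingface"
    cache.mkdir(parents=True, exist_ok=True)
    token_path = cache / "token"
    # echo adds a trailing newline; the token readers trim whitespace anyway.
    token_path.write_text(token.strip() + "\n")
    return token_path
```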

Build Steps

  1. Clone the repository:

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs/mistralrs-pyo3
    
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # Linux/macOS
    # or: .venv\Scripts\activate  # Windows
    
  3. Install maturin (Rust + Python build tool):

    pip install "maturin[patchelf]"  # quotes prevent shell glob expansion of the brackets
    
  4. Build and install:

    maturin develop -r --features <your-features>
    

Feature Flags

| Feature | Description |
|---|---|
| cuda | NVIDIA GPU support |
| flash-attn | Flash Attention (CUDA, Ampere+) |
| flash-attn-v3 | Flash Attention v3 (CUDA, Hopper) |
| cudnn | cuDNN optimizations |
| metal | Apple Silicon GPU (macOS only) |
| accelerate | Apple Accelerate framework |
| mkl | Intel MKL |

Example with CUDA and Flash Attention:

maturin develop -r --features "cuda flash-attn cudnn"

Verify Installation

import mistralrs
print(mistralrs.__version__)

Quick test:

from mistralrs import Runner, Which, ChatCompletionRequest

runner = Runner(
    which=Which.Plain(model_id="Qwen/Qwen3-0.6B"),
)

response = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=50,
    )
)
print(response.choices[0].message.content)

Next Steps