Install on macOS with Metal
On Apple Silicon (M1, M2, M3, M4), the install script detects the chip and builds with Metal support:
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | shFor manual builds, follow the steps below.
Prerequisites
Section titled “Prerequisites”- macOS 13 (Ventura) or newer. Earlier Metal Performance Shaders versions lack required operations.
- Xcode Command Line Tools. Install with
xcode-select --install. - Rust 1.88 or newer via rustup.
- Homebrew with FFmpeg for video input:
brew install ffmpeg. See Set up video input for the full checklist.
A full Xcode install is not required. The command-line tools include the Metal Shading Language compiler and required headers.
Feature selection
Section titled “Feature selection”The standard combination on Apple Silicon is metal accelerate. Metal is the GPU backend; Accelerate provides BLAS/LAPACK on the CPU.
From crates.io:
cargo install mistralrs-cli --features "metal accelerate"From source:
git clone https://github.com/EricLBuehler/mistral.rs.gitcd mistral.rscargo install --path mistralrs-cli --features "metal accelerate"For Intel Macs, omit Metal and use accelerate, or use mkl if Intel’s MKL is installed.
Memory and unified memory notes
Section titled “Memory and unified memory notes”Apple Silicon uses unified memory; the GPU and CPU share physical RAM. Implications:
- No separate VRAM budget. A model that fits in RAM fits on the GPU.
mistralrs doctorreports total system memory rather than separate GPU memory.- Default paged attention block sizes are tuned for dedicated VRAM. See the paged attention guide for tuning on unified memory.
Total RAM caps model size. A 32 GB machine fits models up to ~20B parameters at 4-bit with moderate context. A 64 GB machine fits 70B-class models at 4-bit.
AFQ quantization
Section titled “AFQ quantization”On Metal, numeric --quant falls back to AFQ (Adaptive Float Quantization) when runtime ISQ is used. AFQ is tuned for Apple’s GPU. Pass --isq only when you want to force runtime ISQ and skip the UQFF lookup.
Existing GGUF files load and run directly. AFQ performance benefits require re-quantization with ISQ.
Verifying the install
Section titled “Verifying the install”mistralrs doctorThe output includes a [INFO] Build features: ... line listing metal (and accelerate if compiled in). If metal is missing, rebuild with --features "metal accelerate".