Cargo features

mistral.rs uses Cargo features to gate platform-specific and optional functionality.

| Feature | Crates | Purpose |
| --- | --- | --- |
| cuda | mistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-core | NVIDIA GPU support via CUDA, including CUDA paged attention and FlashInfer (paged-attention kernel library) paged kernels. |
| cudnn | as above | cuDNN-accelerated kernels. |
| flash-attn | as above | Flash attention v2 (Ampere+, requires cuda). |
| flash-attn-v3 | mistralrs-cli, mistralrs-core, mistralrs-server-core | Flash attention v3 (Hopper, requires cuda). Not exposed by the top-level mistralrs crate. |
| cutile | mistralrs-cli, mistralrs-core | cuTile JIT MoE (Mixture of Experts) kernels. Requires CUDA >= 13.2 on Ampere/Ada, CUDA >= 13.3 on Hopper, CUDA >= 13.1 on Blackwell+, and the tileiras assembler at runtime. Without it, MoE models fall back to the built-in CUTLASS (NVIDIA GEMM template library) kernels. See MoE expert backends. Not exposed by the top-level mistralrs crate. |
| metal | as above | Apple Silicon GPU support via Metal. |
| accelerate | as above | Apple Accelerate framework for CPU math. |
| mkl | as above | Intel MKL for CPU math. |
| nccl | mistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-core | NCCL single-machine CUDA multi-GPU support. Requires the NCCL runtime library at build and runtime. |

Typical combinations:

  • NVIDIA Hopper: cuda flash-attn flash-attn-v3 cudnn (add cutile with CUDA >= 13.3)
  • NVIDIA Ampere or Ada: cuda flash-attn cudnn (add cutile with CUDA >= 13.2)
  • NVIDIA Blackwell with CUDA >= 13.1: cuda flash-attn cudnn cutile
  • Older NVIDIA GPUs (pre-Ampere): cuda cudnn
  • Apple Silicon: metal
  • Intel CPU with MKL: mkl

For Linux CUDA multi-GPU, add nccl when NCCL is installed. The Linux installer and CUDA wheel builder add it automatically when they detect libnccl.
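The combinations above can be captured in a small lookup, which is handy in install scripts. The following is a minimal sketch, not part of mistral.rs: the function name `features_for` and the hardware class labels are hypothetical, and it prints the install command instead of running it.

```shell
# Hypothetical helper: map a hardware class to the feature sets listed above.
# The class labels (hopper, ampere, ...) are invented for this sketch.
# cutile is only included for blackwell, which assumes CUDA >= 13.1 is installed;
# add it by hand on Hopper (CUDA >= 13.3) or Ampere/Ada (CUDA >= 13.2).
features_for() {
  case "$1" in
    hopper)        echo "cuda flash-attn flash-attn-v3 cudnn" ;;
    ampere|ada)    echo "cuda flash-attn cudnn" ;;
    blackwell)     echo "cuda flash-attn cudnn cutile" ;;
    nvidia-older)  echo "cuda cudnn" ;;
    apple-silicon) echo "metal" ;;
    intel-mkl)     echo "mkl" ;;
    *)             echo "unknown hardware class: $1" >&2; return 1 ;;
  esac
}

# Print the command for review rather than executing it.
printf 'cargo install mistralrs-cli --features "%s"\n' "$(features_for hopper)"
```

For a multi-GPU Linux build, append nccl to the printed feature list by hand once NCCL is installed.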

| Feature | Crates | Purpose |
| --- | --- | --- |
| code-execution | mistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-core | Python code execution tool. On by default in mistralrs-cli. |
| ring | as above | Multi-machine ring distributed inference. |
| swagger-ui | mistralrs-server-core | Mounts Swagger UI on the HTTP server. On by default in mistralrs-server-core. |

From cargo install:

cargo install mistralrs-cli --features "cuda nccl flash-attn cudnn"

From a source checkout:

cargo install --path mistralrs-cli --features "cuda nccl flash-attn cudnn"

In a consumer crate depending on mistralrs:

[dependencies]
mistralrs = { version = "0.8", features = ["cuda", "nccl", "flash-attn", "cudnn"] }

mistralrs-cli’s default feature is code-execution. mistralrs-server-core’s default feature is swagger-ui. To exclude defaults, use --no-default-features.
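To make the effect of --no-default-features concrete, here is a minimal sketch that prints the resulting cargo install commands without running them. The helper name `install_cmd` and its yes/no argument are hypothetical; only the cargo flags themselves come from Cargo.

```shell
# Hypothetical helper: print (do not run) a cargo install command for mistralrs-cli.
# Usage: install_cmd <keep-defaults: yes|no> <feature>...
install_cmd() {
  keep_defaults="$1"; shift
  flags=""
  if [ "$keep_defaults" = "no" ]; then
    flags=" --no-default-features"
  fi
  # "$*" joins the remaining arguments with spaces into one feature string.
  printf 'cargo install mistralrs-cli%s --features "%s"\n' "$flags" "$*"
}

install_cmd yes metal                 # defaults kept: code-execution is built alongside metal
install_cmd no metal                  # defaults dropped: metal only
install_cmd no metal code-execution   # defaults dropped, code-execution re-added explicitly
```

Dropping defaults and then listing code-execution explicitly yields the same feature set as keeping the defaults, but makes the choice visible in the command.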

No crate enables an accelerator feature by default. Opt in to the accelerator matching your hardware.

mistralrs doctor prints a Build features: line listing the compiled-in accelerator features (cuda, metal, cudnn, flash-attn, flash-attn-v3, accelerate, mkl). Other features such as cutile, nccl, ring, code-execution, and swagger-ui are not shown on that line.
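A script can check for a compiled-in accelerator by filtering that line. The sketch below is an illustration under stated assumptions: the helper name `has_build_feature` is hypothetical, and it assumes the feature names after "Build features:" are separated by commas and/or whitespace, which the text above does not specify.

```shell
# Hypothetical helper: succeed if feature $1 appears on the "Build features:"
# line read from stdin. Assumes names on that line are separated by commas
# and/or spaces; adjust the tr step if the real layout differs.
has_build_feature() {
  grep 'Build features:' \
    | sed 's/.*Build features://' \
    | tr ', ' '\n\n' \
    | grep -qxF -- "$1"
}

# Usage (not run here):
#   mistralrs doctor | has_build_feature cuda && echo "CUDA build"
```

The whole-line match (-x) keeps flash-attn from matching flash-attn-v3. A miss for cutile, nccl, ring, code-execution, or swagger-ui says nothing about the build, since those features are never listed on that line.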