Skip to content

Cargo features

mistral.rs uses Cargo features to gate platform-specific and optional functionality.

FeatureCratesPurpose
cudamistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-coreNVIDIA GPU support via CUDA.
cudnnas abovecuDNN-accelerated kernels.
flash-attnas aboveFlash attention v2 (Ampere+, requires cuda).
flash-attn-v3mistralrs-cli, mistralrs-core, mistralrs-server-coreFlash attention v3 (Hopper, requires cuda). Not exposed by the top-level mistralrs crate.
metalas aboveApple Silicon GPU support via Metal.
accelerateas aboveApple Accelerate framework for CPU math.
mklas aboveIntel MKL for CPU math.
ncclmistralrs-cliNCCL multi-GPU support.

Typical combinations:

  • NVIDIA Hopper: cuda flash-attn flash-attn-v3 cudnn
  • NVIDIA Ampere or Ada: cuda flash-attn cudnn
  • NVIDIA older: cuda cudnn
  • Apple Silicon: metal accelerate
  • Intel CPU with MKL: mkl
FeatureCratesPurpose
code-executionmistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-corePython code execution tool. In mistralrs-cli defaults.
ringas aboveMulti-machine ring distributed inference.
swagger-uimistralrs-server-coreMounts Swagger UI on the HTTP server.

From cargo install:

Terminal window
cargo install mistralrs-cli --features "cuda flash-attn cudnn"

From a source checkout:

Terminal window
cargo install --path mistralrs-cli --features "cuda flash-attn cudnn"

In a consumer crate depending on mistralrs:

[dependencies]
mistralrs = { version = "0.8", features = ["cuda", "flash-attn", "cudnn"] }

mistralrs-cli’s default feature is code-execution. To exclude it, use --no-default-features.

Other crates enable no accelerator features by default. Opt in to the accelerator matching your hardware.

mistralrs doctor prints a Build features: line listing compiled-in features.