Skip to content

Quantization types

ISQ (in-situ quantization) types supported by mistral.rs. For format selection guidance and underlying tradeoffs, see the quantization guide.

Flag choice for normal CLI usage:

mistral.rs resolves N to a format based on the detected backend (see table). This happens when --quant falls back to runtime ISQ, or when you pass --isq N directly.

ShorthandMetal resolves toCUDA / CPU resolves to
2AFQ2Q2K
3AFQ3Q3K
4AFQ4Q4K
5Q5KQ5K
6AFQ6Q6K
8AFQ8Q8_0

Affine quantization, optimized for Apple Silicon. Runs on Metal (native kernels), CUDA (dedicated backend), and CPU (fallback).

TypeBits
afq22
afq33
afq44
afq66
afq88

GGML K-quant formats. Supported on all backends.

TypeBits
q2k2
q3k3
q4k4
q5k5
q6k6

Supported for GGUF compatibility:

TypeBits
q4_0, q4_14
q5_0, q5_15
q8_08

E4M3 FP8. Native acceleration on NVIDIA Ada/Hopper (compute 8.9+); runs emulated elsewhere.

TypeBitsLayout
fp88E4M3 (4-bit exponent, 3-bit mantissa)
f8q88FP8 weights, INT8 activations

4-bit microscaling format. Native on Blackwell; emulated elsewhere.

TypeBits
mxfp44

Half-quadratic quantization.

TypeBits
hqq44
hqq88

Not ISQ types, pre-quantized formats. Load directly when a Hugging Face model is available as GPTQ or AWQ:

Terminal window
mistralrs run --format plain -m <gptq-or-awq-repo>

mistral.rs detects the quantization from the model’s config. No --quant or --isq required.

See the quantization guide for format selection.