Quantization types

ISQ types supported by mistral.rs. For normal CLI usage, use --quant; use --isq only when you want to force runtime ISQ and skip the UQFF lookup. For format selection guidance, see the quantization decision guide. For underlying tradeoffs, see the explanation page.

Pass --quant N for the normal CLI path. If that path falls back to runtime ISQ, or if you pass --isq N directly, mistral.rs resolves N to a format based on the detected backend.

| Shorthand | Metal resolves to | CUDA / CPU resolves to |
| --- | --- | --- |
| 2 | AFQ2 | Q2K |
| 3 | AFQ3 | Q3K |
| 4 | AFQ4 | Q4K |
| 5 | Q5K | Q5K |
| 6 | AFQ6 | Q6K |
| 8 | AFQ8 | Q8_0 |
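The resolution rule can be sketched as a plain lookup. This is an illustration of the table only, not mistral.rs code: the names `SHORTHAND` and `resolve_shorthand` and the backend strings are hypothetical.

```python
# Illustrative lookup only; mirrors the shorthand table, not mistral.rs internals.
SHORTHAND = {
    # N: (Metal, CUDA / CPU)
    2: ("AFQ2", "Q2K"),
    3: ("AFQ3", "Q3K"),
    4: ("AFQ4", "Q4K"),
    5: ("Q5K", "Q5K"),  # the table lists no 5-bit AFQ, so Metal also gets Q5K
    6: ("AFQ6", "Q6K"),
    8: ("AFQ8", "Q8_0"),
}


def resolve_shorthand(n: int, backend: str) -> str:
    """Return the format name that shorthand `n` maps to on `backend`."""
    if n not in SHORTHAND:
        raise ValueError(f"unsupported shorthand: {n}")
    metal, other = SHORTHAND[n]
    return metal if backend == "metal" else other


print(resolve_shorthand(4, "metal"))  # AFQ4
print(resolve_shorthand(8, "cuda"))   # Q8_0
```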

Adaptive float quantization (AFQ). Metal backend only.

| Type | Bits |
| --- | --- |
| afq2 | 2 |
| afq3 | 3 |
| afq4 | 4 |
| afq6 | 6 |
| afq8 | 8 |

Loading AFQ on CUDA or CPU returns an error.

GGML K-quant formats. Supported on all backends.

| Type | Bits |
| --- | --- |
| q2k | 2 |
| q3k | 3 |
| q4k | 4 |
| q5k | 5 |
| q6k | 6 |
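As a rough guide to what the bit widths mean in practice, the sketch below estimates weight storage from the nominal bit count. The helper `approx_weight_bytes` and the 7-billion-parameter model are hypothetical. Real K-quant files also store per-block scales, so the effective bits per weight are somewhat higher than the nominal figure.

```python
def approx_weight_bytes(num_params: int, bits: int) -> float:
    """Nominal weight storage in bytes, ignoring per-block scale overhead."""
    return num_params * bits / 8


# Hypothetical 7-billion-parameter model at each K-quant width.
for name, bits in [("q2k", 2), ("q3k", 3), ("q4k", 4), ("q5k", 5), ("q6k", 6)]:
    gib = approx_weight_bytes(7_000_000_000, bits) / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```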

Supported for GGUF compatibility:

| Type | Bits |
| --- | --- |
| q4_0, q4_1 | 4 |
| q5_0, q5_1 | 5 |
| q8_0 | 8 |

GGUF files using these types load correctly.

Native FP8 on NVIDIA GPUs with compute capability 8.9+.

| Type | Bits | Layout |
| --- | --- | --- |
| fp8 | 8 | E4M3 (4-bit exponent, 3-bit mantissa) |
| f8q8 | 8 | FP8 weights, INT8 activations |
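To make the E4M3 layout concrete, here is a minimal decoder for a single byte. It assumes the common E4M3 variant with exponent bias 7 and no infinities (often written E4M3FN). Whether mistral.rs uses exactly this variant is an assumption, and `decode_e4m3` is a hypothetical helper, not part of its API.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (assumed E4M3FN: bias 7, no infinities)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF  # 4 exponent bits
    man = byte & 0x7         # 3 mantissa bits
    if exp == 0xF and man == 0x7:
        return math.nan      # the only NaN encoding in this variant
    if exp == 0:
        return sign * (man / 8) * 2.0 ** -6  # subnormal: no implicit leading 1
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


print(decode_e4m3(0x38))  # 1.0
print(decode_e4m3(0x7E))  # 448.0, the largest finite value
```

With only 3 mantissa bits, each power-of-two interval holds 8 representable values, which is why FP8 weights usually need a scale factor alongside them.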

4-bit microscaling format. Native on Blackwell; emulated elsewhere.

| Type | Bits |
| --- | --- |
| mxfp4 | 4 |
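"Microscaling" means that a block of elements shares one scale. The sketch below follows the published OCP MX layout, where each element is a 4-bit E2M1 value and each block carries an 8-bit power-of-two scale (E8M0). It is an illustration of the format, not the mistral.rs kernel. `decode_mxfp4_block` is a hypothetical helper that takes already-unpacked 4-bit codes, so it makes no assumption about nibble order.

```python
# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_mxfp4_block(scale_byte: int, codes: list[int]) -> list[float]:
    """Decode one block: a shared E8M0 scale plus 4-bit element codes.

    The E8M0 NaN encoding (scale_byte == 255) is not handled in this sketch.
    """
    scale = 2.0 ** (scale_byte - 127)  # E8M0: power of two with bias 127
    out = []
    for code in codes:
        sign = -1.0 if code & 0x8 else 1.0
        out.append(sign * E2M1_MAGNITUDES[code & 0x7] * scale)
    return out


print(decode_mxfp4_block(128, [0x1, 0x7, 0xA]))  # [1.0, 12.0, -2.0]
```

In the OCP layout a block holds 32 elements; the example passes three codes only to keep the output short.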

Half-quadratic quantization.

| Type | Bits |
| --- | --- |
| hqq4 | 4 |
| hqq8 | 8 |

GPTQ and AWQ are pre-quantized formats, not ISQ types. When a Hugging Face model is available as GPTQ or AWQ, load it directly:

mistralrs run --format plain -m <gptq-or-awq-repo>

mistral.rs detects the quantization from the model’s config. No --quant or --isq required.

See the pick-a-quantization guide for format selection.