UQFF format
UQFF is the native mistral.rs quantized file format. To use UQFF models, see the UQFF guide; knowledge of the binary layout is not required.
File structure
A UQFF export is a directory containing:
- One or more <stem>-<shard>.uqff shards holding the quantized layer blobs.
- residual.safetensors for unquantized tensors (token embeddings, norms, lm_head, etc.).
- Model assets copied from the source repo so the directory is self-contained: config.json, tokenizer.json, tokenizer_config.json, generation_config.json, and (when present) modules.json, chat_template.jinja, processor_config.json, preprocessor_config.json.
A loader is pointed at one or more shard files (from_uqff); the residual safetensors and the JSON assets are picked up by sibling-path lookup.
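The sibling-path lookup can be illustrated with a small pre-flight check. This is a minimal sketch, not part of mistral.rs: the helper name missing_siblings and the REQUIRED_SIBLINGS tuple are hypothetical, and the required set is limited to the two files this document names as mandatory (residual.safetensors and config.json).

```python
from pathlib import Path
import tempfile

# Files that must sit next to every shard, per the caveats in this document.
# Hypothetical constant; the real loader may look for more assets.
REQUIRED_SIBLINGS = ("residual.safetensors", "config.json")


def missing_siblings(shard_path):
    """Return the required sibling files that are absent next to a .uqff shard."""
    export_dir = Path(shard_path).parent
    return [name for name in REQUIRED_SIBLINGS if not (export_dir / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shard = Path(tmp) / "model-0.uqff"
        shard.touch()
        # Nothing but the shard exists yet, so both required files are reported.
        print(missing_siblings(shard))
```

Running a check like this before handing shard paths to a loader turns a late load failure into an early, readable error.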
Sharded files
The output path passed to mistral.rs must end in .uqff. The writer emits <stem>-0.uqff, <stem>-1.uqff, … in the same directory, splitting by cumulative size with a soft cap of 10 GiB per shard.
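The naming and splitting rule above can be sketched as follows. The function names shard_path and assign_shards are hypothetical, and the greedy policy shown (start a new shard when the next blob would push the running total past the cap, and let a single oversized blob occupy a shard by itself) is one plausible reading of "soft cap", not the writer's confirmed algorithm.

```python
from pathlib import Path

SOFT_CAP_BYTES = 10 * 1024**3  # 10 GiB, the soft cap stated in this document


def shard_path(output_path, index):
    """Derive <stem>-<index>.uqff in the same directory as the requested output path."""
    out = Path(output_path)
    if out.suffix != ".uqff":
        raise ValueError("output path must end in .uqff")
    return out.with_name(f"{out.stem}-{index}.uqff")


def assign_shards(blob_sizes, cap=SOFT_CAP_BYTES):
    """Group blob indices into shards by cumulative size (assumed greedy policy)."""
    shards, current, total = [], [], 0
    for i, size in enumerate(blob_sizes):
        # Close the current shard if this blob would exceed the cap.
        # The cap is soft: a blob larger than the cap still gets written, alone.
        if current and total + size > cap:
            shards.append(current)
            current, total = [], 0
        current.append(i)
        total += size
    if current:
        shards.append(current)
    return shards
```

For example, with a toy cap of 10, sizes [6, 5, 4, 12, 1] are grouped as [[0], [1, 2], [3], [4]]; the 12-unit blob exceeds the cap but is still placed in a shard.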
Version compatibility
UQFF carries a packed major.minor.patch version. Readers reject a different major version and reject a minor newer than they support; older minor versions within the same major are accepted.
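The compatibility rule can be expressed as a short check. This is a sketch under an assumed bit layout (one byte each for major, minor, and patch, packed as major << 16 | minor << 8 | patch); the text above only says the version is packed, so treat the layout and the function names as illustrative rather than as the format's definition.

```python
def pack_version(major, minor, patch):
    """Pack a version into one integer (assumed layout: 8 bits per field)."""
    return (major << 16) | (minor << 8) | patch


def unpack_version(packed):
    """Inverse of pack_version under the same assumed layout."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


def is_readable(file_version, reader_version):
    """Apply the rule from the text: same major, and file minor not newer than the reader's."""
    file_major, file_minor, _ = unpack_version(file_version)
    reader_major, reader_minor, _ = unpack_version(reader_version)
    if file_major != reader_major:
        return False
    # Patch is ignored; older or equal minors within the same major are accepted.
    return file_minor <= reader_minor
```

Under this rule a reader at 0.2.0 accepts a 0.1.3 file, but rejects both a 0.3.0 file (newer minor) and a 1.0.0 file (different major).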
Reference implementation
Canonical implementations: mistralrs-quant (writer) and the model loader in mistralrs-core (reader).
Caveats
- UQFF is inference-only; no optimizer state or training metadata.
- The export directory is the unit of distribution. A shard alone is not loadable: the residual safetensors and config.json are required.