UQFF format

UQFF is the native mistral.rs quantized file format. To use UQFF models, see the UQFF guide; knowledge of the binary layout is not required.

A UQFF export is a directory containing:

  • One or more <stem>-<shard>.uqff shards holding the quantized layer blobs.
  • residual.safetensors for unquantized tensors (token embeddings, norms, lm_head, etc.).
  • Model assets copied from the source repo so the directory is self-contained: config.json, tokenizer.json, tokenizer_config.json, generation_config.json, and (when present) modules.json, chat_template.jinja, processor_config.json, preprocessor_config.json.

A loader is pointed at one or more shard files (from_uqff); the residual safetensors and the JSON assets are found by sibling-path lookup, that is, they are read from the same directory as the shards.
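
The sibling-path lookup can be pictured with a small sketch. This is illustrative Python, not the mistral.rs loader; the helper names sibling_paths and missing_siblings are hypothetical, and the choice of which files count as required is taken from the note that residual.safetensors and config.json must accompany a shard.

```python
from pathlib import Path
from typing import Dict, List

# Files a shard cannot be loaded without, per this page.
REQUIRED_SIBLINGS = ("residual.safetensors", "config.json")


def sibling_paths(shard: str) -> Dict[str, Path]:
    """Map each required file name to its expected path next to the shard."""
    directory = Path(shard).parent
    return {name: directory / name for name in REQUIRED_SIBLINGS}


def missing_siblings(shard: str) -> List[str]:
    """Return the required file names that are absent from the shard's directory."""
    return [name for name, path in sibling_paths(shard).items() if not path.is_file()]
```

Calling missing_siblings on a shard path before loading gives an early, readable error when someone has copied a .uqff file out of its export directory.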

The output path passed to mistral.rs must end in .uqff. The writer emits <stem>-0.uqff, <stem>-1.uqff, … in the same directory, splitting by cumulative size with a soft cap of 10 GiB per shard.
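
As a rough illustration of the naming and splitting rules, the sketch below derives shard file names from an output path and groups blob sizes under a soft cap. It is not the mistralrs-quant writer. The function names are hypothetical, and the exact splitting policy (start a new shard when the next blob would push the running total past the cap, but never leave a shard empty) is an assumption; the source only states that the cap is soft and based on cumulative size.

```python
from pathlib import Path

SOFT_CAP_BYTES = 10 * 1024**3  # 10 GiB soft cap per shard, as stated above


def shard_path(output: str, index: int) -> Path:
    """Turn <stem>.uqff into <stem>-<index>.uqff in the same directory."""
    out = Path(output)
    if out.suffix != ".uqff":
        raise ValueError("output path must end in .uqff")
    return out.with_name(f"{out.stem}-{index}.uqff")


def plan_shards(blob_sizes, cap=SOFT_CAP_BYTES):
    """Group blob sizes into shards by cumulative size (assumed policy)."""
    shards, current, total = [], [], 0
    for size in blob_sizes:
        # Close the current shard if this blob would exceed the cap.
        # An empty shard always accepts a blob, so one oversized blob
        # yields a shard larger than the cap: the cap is soft.
        if current and total + size > cap:
            shards.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        shards.append(current)
    return shards
```

Under this assumed policy, blobs of 4, 4, 4, 12, and 1 GiB would land in four shards, with the 12 GiB blob alone in a shard that exceeds the cap.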

UQFF carries a packed major.minor.patch version. Readers reject a different major version and reject a minor newer than they support; older minor versions within the same major are accepted.
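
The compatibility rule can be sketched as follows. The bit layout here (one byte per component, major in the highest byte) is an assumption made for illustration; the actual packing is defined by mistralrs-quant. Ignoring the patch component is also an assumption, since the rule above is stated only for major and minor. All function names are hypothetical.

```python
def pack_version(major: int, minor: int, patch: int) -> int:
    """Pack a version into one integer (assumed layout: 0x00MMmmpp)."""
    for part in (major, minor, patch):
        if not 0 <= part <= 0xFF:
            raise ValueError("each component must fit in one byte")
    return (major << 16) | (minor << 8) | patch


def unpack_version(packed: int):
    """Inverse of pack_version: return (major, minor, patch)."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


def is_readable(file_version: int, reader_version: int) -> bool:
    """Apply the rule above: same major, and file minor not newer than the reader's."""
    file_major, file_minor, _ = unpack_version(file_version)
    reader_major, reader_minor, _ = unpack_version(reader_version)
    if file_major != reader_major:
        return False
    return file_minor <= reader_minor
```

For example, a reader at 0.2.0 would accept a 0.1.x file and reject both a 0.3.x file (newer minor) and a 1.0.x file (different major).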

Canonical implementations: mistralrs-quant (writer) and the model loader in mistralrs-core (reader).

  • UQFF is inference-only: it stores no optimizer state or training metadata.
  • The export directory is the unit of distribution. A shard alone is not loadable; the residual safetensors and config.json are also required.