Skip to content

Use pre-quantized UQFF models

UQFF (Universal Quantized File Format) stores pre-quantized weights and loads directly without runtime conversion.

Terminal window
mistralrs run -m <repo> --from-uqff model.q4k-0.uqff

-m <repo> is required for tokenizer/base resolution. --from-uqff accepts a numeric shorthand (2, 3, 4, 5, 6, 8) or an ISQ type name (q4k, afq8, etc.).

For sharded UQFFs, pass the first shard’s filename. Subsequent shards are discovered by filename pattern (model.<isq-type>-0.uqff, model.<isq-type>-1.uqff, …).

For locally-stored UQFF files, -m can be the local directory and --from-uqff the filename.

The quantize subcommand converts an unquantized model to UQFF:

Terminal window
mistralrs quantize \
-m google/gemma-4-E4B-it \
--isq q4k \
-o gemma-q4k.uqff

A one-time operation. The result loads directly afterward.

--isq can be repeated or comma-separated to produce multiple variants in one run; pass a directory as -o in that case.

A README is generated alongside the output unless --no-readme is passed. --uqff-base-model and --uqff-repo-id set fields in the README.

Binary layout: UQFF format reference.