mistralrs quantize

Generate UQFF quantized model file

mistralrs quantize [OPTIONS] [COMMAND]

Option	Default	Description
`-m, --model-id <MODEL_ID>`		HuggingFace model ID or local path to model directory
`-t, --tokenizer <TOKENIZER>`		Path to local tokenizer.json file
`--dtype <DTYPE>`	`auto`	Model data type
`--isq <IN_SITU_QUANT>`		In-situ quantization level(s). Multiple values can be comma-separated or given via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`)
`--isq-organization <ISQ_ORGANIZATION>`		ISQ organization strategy: default or moqe
`--imatrix <IMATRIX>`		imatrix file for enhanced quantization
`--calibration-file <CALIBRATION_FILE>`		Calibration file for imatrix generation
`--cpu`	`false`	Force CPU-only execution
`-n, --device-layers <DEVICE_LAYERS>`		Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`)
`--topology <TOPOLOGY>`		Topology YAML file for device mapping
`--hf-cache <HF_CACHE>`		Custom HuggingFace cache directory
`--max-seq-len <MAX_SEQ_LEN>`	`4096`	Max sequence length for automatic device mapping
`--max-batch-size <MAX_BATCH_SIZE>`	`1`	Max batch size for automatic device mapping
`-o, --output <OUTPUT_PATH>`		Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type)
`--no-readme`	`false`	Skip README.md model card generation (generated by default in directory mode)
`--uqff-base-model <UQFF_BASE_MODEL>`		Base model ID for the generated README (skips interactive prompt)
`--uqff-repo-id <UQFF_REPO_ID>`		HF repo ID for the generated README and upload hint (skips interactive prompt)
`--max-edge <MAX_EDGE>`		Maximum edge length for image resizing (aspect ratio preserved)
`--max-num-images <MAX_NUM_IMAGES>`		Maximum number of images per request
`--max-image-length <MAX_IMAGE_LENGTH>`		Maximum image dimension for device mapping
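
The `--device-layers` format can be easier to read once it is split into pairs. The sketch below is only an illustration of the `ORD:NUM;…` shape using the example value from the table. It assumes `ORD` is a device ordinal and `NUM` is a layer count, which the table does not state explicitly, and it is not the CLI's own parser.

```shell
# Split a --device-layers value into ORD:NUM pairs.
# Assumption: ORD = device ordinal, NUM = number of layers on that device.
DEVICE_LAYERS="0:10;1:20"

TOTAL=0
PAIRS=""
OLD_IFS=$IFS
IFS=';'                      # pairs are separated by semicolons
for pair in $DEVICE_LAYERS; do
  ord=${pair%%:*}            # text before the first colon
  num=${pair#*:}             # text after the first colon
  PAIRS="$PAIRS device=$ord layers=$num;"
  TOTAL=$((TOTAL + num))
done
IFS=$OLD_IFS

echo "$PAIRS"
echo "total layers mapped: $TOTAL"
```

Quote the value when passing it on a real command line (`-n "0:10;1:20"`), since an unquoted `;` ends the command in most shells.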

mistralrs quantize auto

Auto-detect model type (recommended)

mistralrs quantize auto [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>

Option	Default	Description
`-m, --model-id <MODEL_ID>`	required	Model ID to load (HuggingFace repo or local path)
`-t, --tokenizer <TOKENIZER>`		Path to local tokenizer.json file
`--dtype <DTYPE>`	`auto`	Model data type
`--isq <IN_SITU_QUANT>`	required	In-situ quantization level(s). Multiple values can be comma-separated or given via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`)
`--isq-organization <ISQ_ORGANIZATION>`		ISQ organization strategy: default or moqe
`--imatrix <IMATRIX>`		imatrix file for enhanced quantization
`--calibration-file <CALIBRATION_FILE>`		Calibration file for imatrix generation
`--cpu`	`false`	Force CPU-only execution
`-n, --device-layers <DEVICE_LAYERS>`		Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`)
`--topology <TOPOLOGY>`		Topology YAML file for device mapping
`--hf-cache <HF_CACHE>`		Custom HuggingFace cache directory
`--max-seq-len <MAX_SEQ_LEN>`	`4096`	Max sequence length for automatic device mapping
`--max-batch-size <MAX_BATCH_SIZE>`	`1`	Max batch size for automatic device mapping
`-o, --output <OUTPUT_PATH>`	required	Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/`
`--no-readme`	`false`	Skip README.md model card generation (generated by default in directory mode)
`--uqff-base-model <UQFF_BASE_MODEL>`		Base model ID for the generated README (skips interactive prompt)
`--uqff-repo-id <UQFF_REPO_ID>`		HF repo ID for the generated README and upload hint (skips interactive prompt)
`--max-edge <MAX_EDGE>`		Maximum edge length for image resizing (aspect ratio preserved)
`--max-num-images <MAX_NUM_IMAGES>`		Maximum number of images per request
`--max-image-length <MAX_IMAGE_LENGTH>`		Maximum image dimension for device mapping
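
As a minimal sketch of the three required options, the snippet below assembles a single-file `auto` invocation and prints it for review. The model ID is a hypothetical placeholder. The ISQ level and output path are taken from the examples in the table above.

```shell
# Placeholder values; substitute your own model and output path.
MODEL_ID="your-org/your-model"   # hypothetical Hugging Face repo ID
OUTPUT="model/model-q4k.uqff"    # a .uqff file path, so one ISQ level only

CMD="mistralrs quantize auto --model-id $MODEL_ID --isq q4k --output $OUTPUT"

# Print the command instead of running it, so it can be checked first.
echo "$CMD"
```

Once the values look right, the printed line can be pasted into a terminal or run with `eval "$CMD"`.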

mistralrs quantize text

Text generation model with explicit architecture

mistralrs quantize text [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>

Option	Default	Description
`-m, --model-id <MODEL_ID>`	required	Model ID to load (HuggingFace repo or local path)
`-t, --tokenizer <TOKENIZER>`		Path to local tokenizer.json file
`--dtype <DTYPE>`	`auto`	Model data type
`-a, --arch <ARCH>`		Model architecture (required for text models)
`--isq <IN_SITU_QUANT>`	required	In-situ quantization level(s). Multiple values can be comma-separated or given via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`)
`--isq-organization <ISQ_ORGANIZATION>`		ISQ organization strategy: default or moqe
`--imatrix <IMATRIX>`		imatrix file for enhanced quantization
`--calibration-file <CALIBRATION_FILE>`		Calibration file for imatrix generation
`--cpu`	`false`	Force CPU-only execution
`-n, --device-layers <DEVICE_LAYERS>`		Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`)
`--topology <TOPOLOGY>`		Topology YAML file for device mapping
`--hf-cache <HF_CACHE>`		Custom HuggingFace cache directory
`--max-seq-len <MAX_SEQ_LEN>`	`4096`	Max sequence length for automatic device mapping
`--max-batch-size <MAX_BATCH_SIZE>`	`1`	Max batch size for automatic device mapping
`-o, --output <OUTPUT_PATH>`	required	Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/`
`--no-readme`	`false`	Skip README.md model card generation (generated by default in directory mode)
`--uqff-base-model <UQFF_BASE_MODEL>`		Base model ID for the generated README (skips interactive prompt)
`--uqff-repo-id <UQFF_REPO_ID>`		HF repo ID for the generated README and upload hint (skips interactive prompt)
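
The following sketch shows directory mode with two ISQ levels for a text model, where the table says files are auto-named per ISQ type. The model ID is a hypothetical placeholder, and `mistral` is only an assumed `--arch` value. Check `mistralrs quantize text --help` for the architectures your build accepts.

```shell
# Placeholder values; substitute your own.
MODEL_ID="your-org/your-text-model"   # hypothetical Hugging Face repo ID
ARCH="mistral"                        # assumed architecture name, verify with --help
OUT_DIR="output/"                     # a directory, so several ISQ levels are allowed

CMD="mistralrs quantize text --model-id $MODEL_ID --arch $ARCH --isq q4k,q8_0 --output $OUT_DIR"

# Print the command instead of running it, so it can be checked first.
echo "$CMD"
```

The comma-separated form `--isq q4k,q8_0` could equally be written as `--isq q4k --isq q8_0`, as described in the table.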

mistralrs quantize multimodal

Multimodal model

mistralrs quantize multimodal [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>

Option	Default	Description
`-m, --model-id <MODEL_ID>`	required	Model ID to load (HuggingFace repo or local path)
`-t, --tokenizer <TOKENIZER>`		Path to local tokenizer.json file
`--dtype <DTYPE>`	`auto`	Model data type
`--isq <IN_SITU_QUANT>`	required	In-situ quantization level(s). Multiple values can be comma-separated or given via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`)
`--isq-organization <ISQ_ORGANIZATION>`		ISQ organization strategy: default or moqe
`--imatrix <IMATRIX>`		imatrix file for enhanced quantization
`--calibration-file <CALIBRATION_FILE>`		Calibration file for imatrix generation
`--cpu`	`false`	Force CPU-only execution
`-n, --device-layers <DEVICE_LAYERS>`		Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`)
`--topology <TOPOLOGY>`		Topology YAML file for device mapping
`--hf-cache <HF_CACHE>`		Custom HuggingFace cache directory
`--max-seq-len <MAX_SEQ_LEN>`	`4096`	Max sequence length for automatic device mapping
`--max-batch-size <MAX_BATCH_SIZE>`	`1`	Max batch size for automatic device mapping
`-o, --output <OUTPUT_PATH>`	required	Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/`
`--no-readme`	`false`	Skip README.md model card generation (generated by default in directory mode)
`--uqff-base-model <UQFF_BASE_MODEL>`		Base model ID for the generated README (skips interactive prompt)
`--uqff-repo-id <UQFF_REPO_ID>`		HF repo ID for the generated README and upload hint (skips interactive prompt)
`--max-edge <MAX_EDGE>`		Maximum edge length for image resizing (aspect ratio preserved)
`--max-num-images <MAX_NUM_IMAGES>`		Maximum number of images per request
`--max-image-length <MAX_IMAGE_LENGTH>`		Maximum image dimension for device mapping
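
A multimodal invocation might combine the required options with the image limits listed above. In this sketch the model ID and the numeric limits (`1024`, `4`) are hypothetical values chosen for illustration, not recommended settings.

```shell
# Placeholder values; substitute your own.
MODEL_ID="your-org/your-vision-model"   # hypothetical Hugging Face repo ID
MAX_EDGE=1024                           # illustrative value only
MAX_NUM_IMAGES=4                        # illustrative value only

CMD="mistralrs quantize multimodal --model-id $MODEL_ID --isq q4k --max-edge $MAX_EDGE --max-num-images $MAX_NUM_IMAGES --output output/"

# Print the command instead of running it, so it can be checked first.
echo "$CMD"
```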

mistralrs quantize embedding

Embedding model

mistralrs quantize embedding [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>

Option	Default	Description
`-m, --model-id <MODEL_ID>`	required	Model ID to load (HuggingFace repo or local path)
`-t, --tokenizer <TOKENIZER>`		Path to local tokenizer.json file
`--dtype <DTYPE>`	`auto`	Model data type
`--isq <IN_SITU_QUANT>`	required	In-situ quantization level(s). Multiple values can be comma-separated or given via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`)
`--isq-organization <ISQ_ORGANIZATION>`		ISQ organization strategy: default or moqe
`--imatrix <IMATRIX>`		imatrix file for enhanced quantization
`--calibration-file <CALIBRATION_FILE>`		Calibration file for imatrix generation
`--cpu`	`false`	Force CPU-only execution
`-n, --device-layers <DEVICE_LAYERS>`		Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`)
`--topology <TOPOLOGY>`		Topology YAML file for device mapping
`--hf-cache <HF_CACHE>`		Custom HuggingFace cache directory
`--max-seq-len <MAX_SEQ_LEN>`	`4096`	Max sequence length for automatic device mapping
`--max-batch-size <MAX_BATCH_SIZE>`	`1`	Max batch size for automatic device mapping
`-o, --output <OUTPUT_PATH>`	required	Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/`
`--no-readme`	`false`	Skip README.md model card generation (generated by default in directory mode)
`--uqff-base-model <UQFF_BASE_MODEL>`		Base model ID for the generated README (skips interactive prompt)
`--uqff-repo-id <UQFF_REPO_ID>`		HF repo ID for the generated README and upload hint (skips interactive prompt)
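
For scripted or CI use, the two `--uqff-*` options are described above as skipping the interactive prompts. The sketch below assembles such a non-interactive embedding run. Both repo IDs are hypothetical placeholders.

```shell
# Placeholder values; substitute your own.
BASE_MODEL="your-org/your-embedding-model"     # hypothetical source model
REPO_ID="your-user/your-embedding-model-UQFF"  # hypothetical upload target

CMD="mistralrs quantize embedding --model-id $BASE_MODEL --isq q8_0 --output output/ --uqff-base-model $BASE_MODEL --uqff-repo-id $REPO_ID"

# Print the command instead of running it, so it can be checked first.
echo "$CMD"
```

If no model card is wanted at all, `--no-readme` is the alternative listed in the table.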