mistralrs quantize
Generate UQFF quantized model file
`mistralrs quantize [OPTIONS] [COMMAND]`

| Option | Default | Description |
|---|---|---|
| `-m, --model-id <MODEL_ID>` | | HuggingFace model ID or local path to model directory |
| `-t, --tokenizer <TOKENIZER>` | | Path to local tokenizer.json file |
| `--dtype <DTYPE>` | auto | Model data type |
| `--isq <IN_SITU_QUANT>` | | In-situ quantization level(s). Multiple values can be comma-separated or specified via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`) |
| `--isq-organization <ISQ_ORGANIZATION>` | | ISQ organization strategy: `default` or `moqe` |
| `--imatrix <IMATRIX>` | | imatrix file for enhanced quantization |
| `--calibration-file <CALIBRATION_FILE>` | | Calibration file for imatrix generation |
| `--cpu` | false | Force CPU-only execution |
| `-n, --device-layers <DEVICE_LAYERS>` | | Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`) |
| `--topology <TOPOLOGY>` | | Topology YAML file for device mapping |
| `--hf-cache <HF_CACHE>` | | Custom HuggingFace cache directory |
| `--max-seq-len <MAX_SEQ_LEN>` | 4096 | Max sequence length for automatic device mapping |
| `--max-batch-size <MAX_BATCH_SIZE>` | 1 | Max batch size for automatic device mapping |
| `-o, --output <OUTPUT_PATH>` | | Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type) |
| `--no-readme` | false | Skip README.md model card generation (generated by default in directory mode) |
| `--uqff-base-model <UQFF_BASE_MODEL>` | | Base model ID for the generated README (skips interactive prompt) |
| `--uqff-repo-id <UQFF_REPO_ID>` | | HF repo ID for the generated README and upload hint (skips interactive prompt) |
| `--max-edge <MAX_EDGE>` | | Maximum edge length for image resizing (aspect ratio preserved) |
| `--max-num-images <MAX_NUM_IMAGES>` | | Maximum number of images per request |
| `--max-image-length <MAX_IMAGE_LENGTH>` | | Maximum image dimension for device mapping |

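
`--isq` accepts either a comma-separated list or repeated flags, so both spellings in the table should yield the same levels. The Python sketch below uses `argparse` to show this equivalence. `parse_isq` is a hypothetical helper written for illustration. It is not mistral.rs's own parser.

```python
import argparse


def parse_isq(argv: list[str]) -> list[str]:
    """Collect ISQ levels from repeated and/or comma-separated --isq flags.

    Hypothetical helper mirroring the documented syntax; not mistral.rs code.
    """
    parser = argparse.ArgumentParser(prog="isq-sketch")
    # action="append" stores one string per occurrence of --isq.
    parser.add_argument("--isq", action="append", required=True)
    args = parser.parse_args(argv)
    levels: list[str] = []
    for value in args.isq:
        # Each occurrence may itself hold a comma-separated list.
        levels.extend(part.strip() for part in value.split(",") if part.strip())
    return levels


print(parse_isq(["--isq", "q4k,q8_0"]))
print(parse_isq(["--isq", "q4k", "--isq", "q8_0"]))
```

Both calls print `['q4k', 'q8_0']`.
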
mistralrs quantize auto
Auto-detect model type (recommended)
`mistralrs quantize auto [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>`

| Option | Default | Description |
|---|---|---|
| `-m, --model-id <MODEL_ID>` | required | Model ID to load (HuggingFace repo or local path) |
| `-t, --tokenizer <TOKENIZER>` | | Path to local tokenizer.json file |
| `--dtype <DTYPE>` | auto | Model data type |
| `--isq <IN_SITU_QUANT>` | required | In-situ quantization level(s). Multiple values can be comma-separated or specified via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`) |
| `--isq-organization <ISQ_ORGANIZATION>` | | ISQ organization strategy: `default` or `moqe` |
| `--imatrix <IMATRIX>` | | imatrix file for enhanced quantization |
| `--calibration-file <CALIBRATION_FILE>` | | Calibration file for imatrix generation |
| `--cpu` | false | Force CPU-only execution |
| `-n, --device-layers <DEVICE_LAYERS>` | | Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`) |
| `--topology <TOPOLOGY>` | | Topology YAML file for device mapping |
| `--hf-cache <HF_CACHE>` | | Custom HuggingFace cache directory |
| `--max-seq-len <MAX_SEQ_LEN>` | 4096 | Max sequence length for automatic device mapping |
| `--max-batch-size <MAX_BATCH_SIZE>` | 1 | Max batch size for automatic device mapping |
| `-o, --output <OUTPUT_PATH>` | required | Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/` |
| `--no-readme` | false | Skip README.md model card generation (generated by default in directory mode) |
| `--uqff-base-model <UQFF_BASE_MODEL>` | | Base model ID for the generated README (skips interactive prompt) |
| `--uqff-repo-id <UQFF_REPO_ID>` | | HF repo ID for the generated README and upload hint (skips interactive prompt) |
| `--max-edge <MAX_EDGE>` | | Maximum edge length for image resizing (aspect ratio preserved) |
| `--max-num-images <MAX_NUM_IMAGES>` | | Maximum number of images per request |
| `--max-image-length <MAX_IMAGE_LENGTH>` | | Maximum image dimension for device mapping |

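
The following is a minimal Python sketch of how to assemble a `mistralrs quantize auto` invocation. It uses only the flags listed above. `build_auto_command` and the model path `./my-model` are hypothetical. The check that rejects several ISQ levels with a single `.uqff` file is an assumption drawn from the `--output` description, not confirmed CLI behavior.

```python
import shlex
from pathlib import PurePosixPath


def build_auto_command(model_id: str, isq_levels: list[str], output: str) -> list[str]:
    """Return argv for `mistralrs quantize auto` (hypothetical helper)."""
    if not isq_levels:
        raise ValueError("--isq is required: pass at least one level")
    # Assumption: a path ending in .uqff selects single-file mode, which holds
    # exactly one ISQ level; anything else is treated as an output directory.
    if PurePosixPath(output).suffix == ".uqff" and len(isq_levels) > 1:
        raise ValueError("use a directory for several ISQ levels")
    return [
        "mistralrs", "quantize", "auto",
        "-m", model_id,
        "--isq", ",".join(isq_levels),  # comma-separated form of --isq
        "-o", output,
    ]


single = build_auto_command("./my-model", ["q4k"], "model/model-q4k.uqff")
multi = build_auto_command("./my-model", ["q4k", "q8_0"], "output/")
print(shlex.join(single))
print(shlex.join(multi))
```

The first command writes one file. The second targets `output/`, where a file name is generated automatically for each ISQ type.
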
mistralrs quantize text
Text generation model with explicit architecture
`mistralrs quantize text [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>`

| Option | Default | Description |
|---|---|---|
| `-m, --model-id <MODEL_ID>` | required | Model ID to load (HuggingFace repo or local path) |
| `-t, --tokenizer <TOKENIZER>` | | Path to local tokenizer.json file |
| `--dtype <DTYPE>` | auto | Model data type |
| `-a, --arch <ARCH>` | | Model architecture (required for text models) |
| `--isq <IN_SITU_QUANT>` | required | In-situ quantization level(s). Multiple values can be comma-separated or specified via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`) |
| `--isq-organization <ISQ_ORGANIZATION>` | | ISQ organization strategy: `default` or `moqe` |
| `--imatrix <IMATRIX>` | | imatrix file for enhanced quantization |
| `--calibration-file <CALIBRATION_FILE>` | | Calibration file for imatrix generation |
| `--cpu` | false | Force CPU-only execution |
| `-n, --device-layers <DEVICE_LAYERS>` | | Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`) |
| `--topology <TOPOLOGY>` | | Topology YAML file for device mapping |
| `--hf-cache <HF_CACHE>` | | Custom HuggingFace cache directory |
| `--max-seq-len <MAX_SEQ_LEN>` | 4096 | Max sequence length for automatic device mapping |
| `--max-batch-size <MAX_BATCH_SIZE>` | 1 | Max batch size for automatic device mapping |
| `-o, --output <OUTPUT_PATH>` | required | Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/` |
| `--no-readme` | false | Skip README.md model card generation (generated by default in directory mode) |
| `--uqff-base-model <UQFF_BASE_MODEL>` | | Base model ID for the generated README (skips interactive prompt) |
| `--uqff-repo-id <UQFF_REPO_ID>` | | HF repo ID for the generated README and upload hint (skips interactive prompt) |

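
For `text`, the architecture is passed with `-a`. The Python sketch below builds such a command and renders it with `shlex.join`. This surfaces one practical detail: a POSIX shell requires the `;` in a `-n` mapping to be quoted. The path `./my-model` is a placeholder. `llama` is assumed, not confirmed, to be a valid `--arch` value for that model.

```python
import shlex

# Illustrative argv: "./my-model" is a placeholder local path, and "llama"
# is an ASSUMED --arch value; substitute the architecture of your model.
argv = [
    "mistralrs", "quantize", "text",
    "-m", "./my-model",
    "-a", "llama",
    "--isq", "q4k",
    "-n", "0:10;1:20",  # ORD:NUM pairs; the ';' needs quoting in a shell
    "-o", "model/model-q4k.uqff",
]

# shlex.join quotes any argument containing shell metacharacters such as ';'.
command_line = shlex.join(argv)
print(command_line)
```

Here the printed command line wraps the mapping in single quotes (`-n '0:10;1:20'`). Without the quotes, the shell would treat everything after the `;` as a separate command.
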
mistralrs quantize multimodal
Multimodal model
`mistralrs quantize multimodal [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>`

| Option | Default | Description |
|---|---|---|
| `-m, --model-id <MODEL_ID>` | required | Model ID to load (HuggingFace repo or local path) |
| `-t, --tokenizer <TOKENIZER>` | | Path to local tokenizer.json file |
| `--dtype <DTYPE>` | auto | Model data type |
| `--isq <IN_SITU_QUANT>` | required | In-situ quantization level(s). Multiple values can be comma-separated or specified via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`) |
| `--isq-organization <ISQ_ORGANIZATION>` | | ISQ organization strategy: `default` or `moqe` |
| `--imatrix <IMATRIX>` | | imatrix file for enhanced quantization |
| `--calibration-file <CALIBRATION_FILE>` | | Calibration file for imatrix generation |
| `--cpu` | false | Force CPU-only execution |
| `-n, --device-layers <DEVICE_LAYERS>` | | Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`) |
| `--topology <TOPOLOGY>` | | Topology YAML file for device mapping |
| `--hf-cache <HF_CACHE>` | | Custom HuggingFace cache directory |
| `--max-seq-len <MAX_SEQ_LEN>` | 4096 | Max sequence length for automatic device mapping |
| `--max-batch-size <MAX_BATCH_SIZE>` | 1 | Max batch size for automatic device mapping |
| `-o, --output <OUTPUT_PATH>` | required | Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/` |
| `--no-readme` | false | Skip README.md model card generation (generated by default in directory mode) |
| `--uqff-base-model <UQFF_BASE_MODEL>` | | Base model ID for the generated README (skips interactive prompt) |
| `--uqff-repo-id <UQFF_REPO_ID>` | | HF repo ID for the generated README and upload hint (skips interactive prompt) |
| `--max-edge <MAX_EDGE>` | | Maximum edge length for image resizing (aspect ratio preserved) |
| `--max-num-images <MAX_NUM_IMAGES>` | | Maximum number of images per request |
| `--max-image-length <MAX_IMAGE_LENGTH>` | | Maximum image dimension for device mapping |

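
`--max-edge` limits the longest image side while keeping the aspect ratio. The sketch below shows that arithmetic under three assumptions:

* The longer side is scaled down to the limit.
* The other side is rounded to the nearest pixel.
* Images already within the limit are left untouched.

mistral.rs's actual rounding and resampling may differ. `fit_within_max_edge` is a hypothetical helper.

```python
def fit_within_max_edge(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_edge.

    Hypothetical helper; assumes proportional scaling and no upscaling.
    """
    if max_edge <= 0:
        raise ValueError("max_edge must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already within the limit; leave unchanged
    scale = max_edge / longest
    # Round each scaled side, but never let a side collapse to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within_max_edge(4032, 3024, 1024))
```

A 4032×3024 photo with `--max-edge 1024` becomes 1024×768 under these assumptions.
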
mistralrs quantize embedding
Embedding model
`mistralrs quantize embedding [OPTIONS] --model-id <MODEL_ID> --isq <IN_SITU_QUANT> --output <OUTPUT_PATH>`

| Option | Default | Description |
|---|---|---|
| `-m, --model-id <MODEL_ID>` | required | Model ID to load (HuggingFace repo or local path) |
| `-t, --tokenizer <TOKENIZER>` | | Path to local tokenizer.json file |
| `--dtype <DTYPE>` | auto | Model data type |
| `--isq <IN_SITU_QUANT>` | required | In-situ quantization level(s). Multiple values can be comma-separated or specified via repeated `--isq` flags (e.g., `--isq q4k,q8_0` or `--isq q4k --isq q8_0`) |
| `--isq-organization <ISQ_ORGANIZATION>` | | ISQ organization strategy: `default` or `moqe` |
| `--imatrix <IMATRIX>` | | imatrix file for enhanced quantization |
| `--calibration-file <CALIBRATION_FILE>` | | Calibration file for imatrix generation |
| `--cpu` | false | Force CPU-only execution |
| `-n, --device-layers <DEVICE_LAYERS>` | | Device layer mapping (format: `ORD:NUM;…`, e.g., `0:10;1:20`) |
| `--topology <TOPOLOGY>` | | Topology YAML file for device mapping |
| `--hf-cache <HF_CACHE>` | | Custom HuggingFace cache directory |
| `--max-seq-len <MAX_SEQ_LEN>` | 4096 | Max sequence length for automatic device mapping |
| `--max-batch-size <MAX_BATCH_SIZE>` | 1 | Max batch size for automatic device mapping |
| `-o, --output <OUTPUT_PATH>` | required | Output path: a `.uqff` file path (single ISQ) or a directory (auto-names files per ISQ type). Examples: `-o model/model-q4k.uqff` or `-o output/` |
| `--no-readme` | false | Skip README.md model card generation (generated by default in directory mode) |
| `--uqff-base-model <UQFF_BASE_MODEL>` | | Base model ID for the generated README (skips interactive prompt) |
| `--uqff-repo-id <UQFF_REPO_ID>` | | HF repo ID for the generated README and upload hint (skips interactive prompt) |

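
Like the other subcommands, `embedding` accepts a `-n` mapping in `ORD:NUM;…` form. Below is a hypothetical pre-flight check in Python. It reads `ORD` as a device ordinal and `NUM` as the number of layers placed on that device. Its validation rules (no duplicate ordinals, no negative counts) are assumptions, not documented CLI behavior.

```python
def parse_device_layers(spec: str) -> dict[int, int]:
    """Parse an ORD:NUM;... string such as "0:10;1:20" into {ordinal: num_layers}.

    Hypothetical helper for checking a mapping before a run; not mistral.rs's parser.
    """
    mapping: dict[int, int] = {}
    for entry in spec.split(";"):
        ordinal_str, sep, count_str = entry.strip().partition(":")
        if not sep:
            raise ValueError(f"expected ORD:NUM, got {entry!r}")
        ordinal, count = int(ordinal_str), int(count_str)
        if ordinal in mapping:
            raise ValueError(f"device {ordinal} appears more than once")
        if count < 0:
            raise ValueError(f"negative layer count for device {ordinal}")
        mapping[ordinal] = count
    return mapping


layers = parse_device_layers("0:10;1:20")
print(layers, "total layers:", sum(layers.values()))
```

For `0:10;1:20` this prints `{0: 10, 1: 20} total layers: 30`. Comparing that total with the model's layer count is a quick sanity check before a long quantization run.
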