Crate mistralrs_quant

Re-exports

pub use safetensors::Shard;
pub use safetensors::ShardedSafeTensors;
pub use safetensors::ShardedVarBuilder;
pub use distributed::layers::compute_kv_shard;
pub use distributed::layers::compute_n_kv_groups;
pub use distributed::layers::ColumnParallelLayer;
pub use distributed::layers::FusedExperts;
pub use distributed::layers::PackedExperts;
pub use distributed::layers::ReplicatedLayer;
pub use distributed::layers::RowParallelLayer;
pub use distributed::socket::Client;
pub use distributed::socket::Server;
pub use distributed::BarrierLike;
pub use distributed::Comm;
pub use distributed::Id;
pub use distributed::RingConfig;
pub use distributed::SumAllReduce;

Modules

cublaslt
distributed
log
rotary
safetensors

Structs

AfqLayer
BnbLinear
BnbQuantParmas
CollectedImatrixData
DummyLayer
FP8Linear
GgufMatMul
GptqLayer
HqqConfig
HqqLayer
ImatrixLayerStats
ImmediateIsqParams
LoraAdapter
LoraConfig
MatMul: Device-aware, configurable matrix multiplication.
QuantizeOntoGuard: Gates access to quantizing onto the host device.
StaticLoraConfig
UnquantLinear

Enums

AfqBits
AfqGroupSize
BnbQuantType
DistributedKind
HqqAxis
HqqBits
IsqType
QuantMethodConfig
QuantizeOntoDropGuard: Drop guard that is either Real (for Metal) or Fake (for CUDA).
QuantizedConfig
QuantizedSerdeType
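
The Real/Fake guard idea behind QuantizeOntoGuard and QuantizeOntoDropGuard can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the crate's implementation: `GateSketch`, `DropGuardSketch`, `acquire`, and the `serialize` flag are hypothetical names, and the use of a `Mutex` is a guess at one reasonable mechanism.

```rust
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical guard: `Real` holds a lock until dropped, `Fake` holds nothing.
pub enum DropGuardSketch<'a> {
    Real(MutexGuard<'a, ()>),
    Fake,
}

/// Hypothetical gate shared between workers that quantize layers.
#[derive(Clone, Default)]
pub struct GateSketch {
    inner: Arc<Mutex<()>>,
}

impl GateSketch {
    /// Take the lock when `serialize` is true (assumed: a backend that needs
    /// exclusive access); otherwise hand back a no-op guard.
    pub fn acquire(&self, serialize: bool) -> DropGuardSketch<'_> {
        if serialize {
            DropGuardSketch::Real(self.inner.lock().expect("gate poisoned"))
        } else {
            DropGuardSketch::Fake
        }
    }

    /// True while some `Real` guard is alive.
    pub fn is_locked(&self) -> bool {
        self.inner.try_lock().is_err()
    }
}
```

Callers treat both variants the same way: hold the returned value for the duration of the quantization and let it drop afterwards. Only the `Real` variant actually blocks other holders.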

Constants

MULTI_LORA_DELIMITER
UQFF_QUANT_TYPE_OFFSET: Offset for the quant type. UQFF always serializes the version first.
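
Because the version is serialized first, the quant type can be read at a fixed offset without parsing anything else. The sketch below illustrates that idea on a toy header. The layout (a 4-byte little-endian version followed by a one-byte type tag), the constant `QUANT_TYPE_OFFSET_SKETCH`, and both helper functions are assumptions for illustration, not the documented UQFF format.

```rust
/// Hypothetical offset: assumes a 4-byte version field comes first.
pub const QUANT_TYPE_OFFSET_SKETCH: usize = std::mem::size_of::<u32>();

/// Write a toy header: version (little-endian u32), then a one-byte type tag.
pub fn write_header(version: u32, quant_type: u8) -> Vec<u8> {
    let mut buf = Vec::with_capacity(QUANT_TYPE_OFFSET_SKETCH + 1);
    buf.extend_from_slice(&version.to_le_bytes());
    buf.push(quant_type);
    buf
}

/// Read the type tag directly at the fixed offset.
/// Returns `None` if the buffer is too short to contain it.
pub fn peek_quant_type(buf: &[u8]) -> Option<u8> {
    buf.get(QUANT_TYPE_OFFSET_SKETCH).copied()
}
```

A loader could use a peek like this to decide which deserializer to dispatch to before reading the rest of the payload.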

Traits

BitWiseOp
CumSumOp
LeftshiftOp
NonZeroOp
QuantMethod: Quantization method backing a quantized matmul.
QuantizedSerde
SortOp: Extension trait adding argsort / sort convenience calls on Tensor.
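
The extension-trait pattern that the SortOp description refers to can be sketched on a plain slice. This is a stand-in only: `SortExt`, `argsort_asc`, and `sort_asc` are hypothetical names, and `[f32]` replaces Tensor so the example needs no external crates. The real method names and signatures of SortOp are not shown in this listing.

```rust
/// Hypothetical extension trait; not the crate's `SortOp`.
pub trait SortExt {
    /// Indices that would sort the values in ascending order.
    fn argsort_asc(&self) -> Vec<usize>;
    /// The values themselves, sorted ascending.
    fn sort_asc(&self) -> Vec<f32>;
}

impl SortExt for [f32] {
    fn argsort_asc(&self) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..self.len()).collect();
        // `total_cmp` is a total order over f32, so NaN values do not panic.
        idx.sort_by(|&a, &b| self[a].total_cmp(&self[b]));
        idx
    }

    fn sort_asc(&self) -> Vec<f32> {
        // Reuse argsort so both methods agree on ordering.
        self.argsort_asc().into_iter().map(|i| self[i]).collect()
    }
}
```

With the trait in scope, the methods are callable directly on the existing type, for example `[3.0f32, 1.0, 2.0].argsort_asc()`, which is the convenience an extension trait provides.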

Functions

apply_immediate_isq
clear_applied_loras: Clear all LoRA adapters for the current engine thread.
clear_immediate_isq
fp8_blockwise_dequantize: FP8 blockwise dequantize.
fp8_blockwise_quantize: FP8 blockwise quantize.
get_applied_loras: Get the LoRA adapters for the current engine thread.
get_immediate_isq
linear
linear_b
linear_no_bias
linear_no_bias_static_lora: Static LoRA in the style of Phi-4 multimodal. Applied only when the layer regex for the specific LoRA matches.
push_applied_lora: Push a LoRA adapter for the current engine thread.
set_immediate_isq
should_apply_immediate_isq
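
The per-thread behavior described for push_applied_lora, get_applied_loras, and clear_applied_loras can be illustrated with a small registry. This is a minimal sketch assuming thread-local storage. The listing does not say how the crate tracks state per engine thread, so `APPLIED`, `push_adapter`, `get_adapters`, and `clear_adapters` are hypothetical, and an adapter is reduced to a name.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread registry; each thread sees its own Vec.
    static APPLIED: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Record an adapter for the calling thread only.
pub fn push_adapter(name: &str) {
    APPLIED.with(|a| a.borrow_mut().push(name.to_string()));
}

/// Snapshot of the adapters recorded by the calling thread.
pub fn get_adapters() -> Vec<String> {
    APPLIED.with(|a| a.borrow().clone())
}

/// Remove every adapter recorded by the calling thread.
pub fn clear_adapters() {
    APPLIED.with(|a| a.borrow_mut().clear());
}
```

Adapters pushed on one thread are invisible to another, so two engines running on separate threads would not interfere with each other's adapter lists under this scheme.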