Crate mistralrs_quant

Re-exports

pub use safetensors::Shard;
pub use safetensors::ShardedSafeTensors;
pub use safetensors::ShardedVarBuilder;
pub use distributed::layers::compute_kv_shard;
pub use distributed::layers::compute_n_kv_groups;
pub use distributed::layers::ColumnParallelLayer;
pub use distributed::layers::FusedExperts;
pub use distributed::layers::PackedExperts;
pub use distributed::layers::ReplicatedLayer;
pub use distributed::layers::RowParallelLayer;
pub use distributed::socket::Client;
pub use distributed::socket::Server;
pub use distributed::BarrierLike;
pub use distributed::Comm;
pub use distributed::Id;
pub use distributed::RingConfig;
pub use distributed::SumAllReduce;

Modules

cublaslt
distributed
log
rotary
safetensors

Structs

AfqLayer
BnbLinear
BnbQuantParmas
CollectedImatrixData
DummyLayer
FP8Linear
GgufMatMul
GptqLayer
HqqConfig
HqqLayer
ImatrixLayerStats
ImmediateIsqParams
LoraAdapter
LoraConfig
MatMul
Device/configurable intelligent matrix multiplication
QuantizeOntoGuard
Used to gate access to quantizing onto the host device
StaticLoraConfig
UnquantLinear

Enums

AfqBits
AfqGroupSize
BnbQuantType
DistributedKind
HqqAxis
HqqBits
IsqType
QuantMethodConfig
QuantizeOntoDropGuard
Real (for Metal) and Fake (for CUDA)
QuantizedConfig
QuantizedSerdeType
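
The descriptions of QuantizeOntoGuard and QuantizeOntoDropGuard suggest a guard that really holds something on Metal and is a no-op on CUDA. The following is a minimal sketch of that pattern only, not the crate's implementation: the names `Backend`, `OntoGuard`, and `OntoDropGuard` are hypothetical, and the use of a `Mutex` for the "real" case is an assumption.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for a device kind; not a type from the crate.
#[derive(Clone, Copy)]
enum Backend {
    Metal,
    Cuda,
}

/// Hypothetical gate: owns the lock that serializes access.
struct OntoGuard {
    lock: Mutex<()>,
}

/// Hypothetical drop guard: `Real` holds the lock until dropped,
/// `Fake` holds nothing and never blocks anyone.
#[allow(dead_code)]
enum OntoDropGuard<'a> {
    Real(MutexGuard<'a, ()>),
    Fake,
}

impl OntoGuard {
    fn new() -> Self {
        Self { lock: Mutex::new(()) }
    }

    /// Assumption: Metal callers take the lock, CUDA callers get a no-op guard.
    fn acquire(&self, backend: Backend) -> OntoDropGuard<'_> {
        match backend {
            Backend::Metal => OntoDropGuard::Real(self.lock.lock().expect("lock poisoned")),
            Backend::Cuda => OntoDropGuard::Fake,
        }
    }
}
```

In this sketch a caller binds the result, for example `let _guard = gate.acquire(Backend::Metal);`, and the lock is released when the binding goes out of scope. Returning an enum keeps the call site identical on both backends.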

Constants

MULTI_LORA_DELIMITER
UQFF_QUANT_TYPE_OFFSET
Offset for the quant type. UQFF always serializes the version first.
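
Because the version is serialized first, the quant type sits at a fixed offset just past it. The sketch below illustrates that idea under stated assumptions: a 4-byte little-endian version followed by a 1-byte quant type tag. The layout, the constant values, and the function `read_header` are hypothetical and are not taken from the crate.

```rust
/// Assumed width of the version field (a u32). Hypothetical, for illustration.
const VERSION_LEN: usize = std::mem::size_of::<u32>();
/// Assumed offset of the quant type tag: immediately after the version.
const QUANT_TYPE_OFFSET: usize = VERSION_LEN;

/// Reads `(version, quant_type)` from the start of a serialized buffer.
/// Returns `None` if the buffer is too short to contain both fields.
fn read_header(bytes: &[u8]) -> Option<(u32, u8)> {
    let v = bytes.get(..VERSION_LEN)?;
    let version = u32::from_le_bytes([v[0], v[1], v[2], v[3]]);
    let quant_type = *bytes.get(QUANT_TYPE_OFFSET)?;
    Some((version, quant_type))
}
```

A reader that only needs to dispatch on the quant type can index straight to the offset without parsing the rest of the payload.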

Statics

APPLIED_LORAS

Traits

BitWiseOp
CumSumOp
LeftshiftOp
NonZeroOp
QuantMethod
Quantized method for a quantized matmul.
QuantizedSerde
SortOp
Extension trait adding argsort / sort convenience calls on Tensor.
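
The extension-trait pattern that SortOp's description refers to can be sketched on a plain slice. This is an illustration of the pattern only: the trait `SortExt` and its method names are hypothetical, `[f32]` stands in for a tensor, and nothing here reflects SortOp's actual signatures.

```rust
/// Hypothetical extension trait; a plain slice stands in for a tensor.
trait SortExt {
    /// Indices that would sort the values in ascending order.
    fn argsort(&self) -> Vec<usize>;
    /// The values themselves in ascending order.
    fn sorted(&self) -> Vec<f32>;
}

impl SortExt for [f32] {
    fn argsort(&self) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..self.len()).collect();
        // Stable sort of the indices by the values they point to.
        // `total_cmp` gives a total order, so NaNs do not cause a panic.
        idx.sort_by(|&a, &b| self[a].total_cmp(&self[b]));
        idx
    }

    fn sorted(&self) -> Vec<f32> {
        // Gather the values in argsort order.
        self.argsort().into_iter().map(|i| self[i]).collect()
    }
}
```

With the trait in scope, the methods are callable directly on the data, for example `xs.argsort()` on a `Vec<f32>`, which is the convenience an extension trait provides over free functions.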

Functions

apply_immediate_isq
get_immediate_isq
linear
linear_b
linear_no_bias
linear_no_bias_static_lora
Static LoRA in the style of Phi-4 multimodal. Applied only when the layer regex for the specific LoRA matches.
set_immediate_isq
should_apply_immediate_isq