Crate mistralrs_quant

Re-exports

pub use safetensors::Shard;
pub use safetensors::ShardedSafeTensors;
pub use safetensors::ShardedVarBuilder;
pub use distributed::layers::compute_kv_shard;
pub use distributed::layers::compute_n_kv_groups;
pub use distributed::layers::ColumnParallelLayer;
pub use distributed::layers::ReplicatedLayer;
pub use distributed::layers::RowParallelLayer;
pub use distributed::socket::Client;
pub use distributed::socket::Server;
pub use distributed::BarrierLike;
pub use distributed::Comm;
pub use distributed::Id;
pub use distributed::SumAllReduce;
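
The distributed re-exports implement Megatron-style tensor parallelism: ColumnParallelLayer splits a weight along its output dimension, RowParallelLayer splits along its input dimension, and SumAllReduce combines the row-parallel partial results across ranks (Comm, Id, and the socket Server/Client coordinate the process group). The sketch below illustrates that split in plain Rust; it is conceptual only and uses none of the crate's types or signatures.

    // Conceptual sketch only: plain Rust, not the crate's API. It shows the
    // Megatron-style split that ColumnParallelLayer, RowParallelLayer, and
    // SumAllReduce implement. All names and shapes here are illustrative.

    /// x (length k) times W (k x n, stored as n columns of length k) -> y (length n).
    fn matvec(x: &[f32], w_cols: &[Vec<f32>]) -> Vec<f32> {
        w_cols
            .iter()
            .map(|col| col.iter().zip(x).map(|(w, xi)| w * xi).sum())
            .collect()
    }

    fn main() {
        let x = vec![1.0_f32, 2.0, 3.0, 4.0];

        // Column-parallel: each "rank" owns a disjoint set of output columns and
        // produces its own slice of the activation with no communication.
        let rank0_cols = vec![vec![1.0, 0.0, 0.0, 0.0], vec![0.0, 1.0, 0.0, 0.0]];
        let rank1_cols = vec![vec![0.0, 0.0, 1.0, 0.0], vec![0.0, 0.0, 0.0, 1.0]];
        let h0 = matvec(&x, &rank0_cols); // rank 0's activation slice
        let h1 = matvec(&x, &rank1_cols); // rank 1's activation slice

        // Row-parallel: the next weight is split along its input dimension, so each
        // rank consumes only its local slice and produces a partial output.
        let rank0_rows = vec![vec![1.0, 1.0]];
        let rank1_rows = vec![vec![1.0, 1.0]];
        let partial0 = matvec(&h0, &rank0_rows);
        let partial1 = matvec(&h1, &rank1_rows);

        // The partial outputs are summed across ranks; this is the role SumAllReduce
        // plays in the crate.
        let y: Vec<f32> = partial0.iter().zip(&partial1).map(|(a, b)| a + b).collect();
        println!("tensor-parallel output: {y:?}");
    }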

Modules

cublaslt
distributed
rotary
safetensors

Structs

AfqLayer
BnbLinear
BnbQuantParmas
CollectedImatrixData
DummyLayer
FP8Linear
GgufMatMul
GptqLayer
HqqConfig
HqqLayer
ImatrixLayerStats
LoraAdapter
LoraConfig
MatMul
Device/configurable intelligent matrix multiplication
QuantizeOntoGuard
Used to gate access to quantizing onto the host device
StaticLoraConfig
UnquantLinear
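
MatMul is documented above as a device/configurable matmul dispatcher. Below is a minimal usage sketch; the matmul method name and its (a, b) argument order are assumptions inferred from that summary, not verified signatures.

    use candle_core::{Device, Result, Tensor};
    use mistralrs_quant::MatMul;

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        let a = Tensor::randn(0f32, 1f32, (4, 8), &dev)?;
        let b = Tensor::randn(0f32, 1f32, (8, 16), &dev)?;
        // Assumption: MatMul is a zero-sized dispatcher exposing `matmul(&a, &b)`
        // that picks a matmul path suited to the tensors' device and dtype.
        let c = MatMul.matmul(&a, &b)?;
        assert_eq!(c.dims(), &[4, 16]);
        Ok(())
    }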

Enums

AfqBits
AfqGroupSize
BnbQuantType
DistributedKind
HqqAxis
HqqBits
IsqType
QuantMethodConfig
QuantizeOntoDropGuard
Real (for Metal) and Fake (for CUDA)
QuantizedConfig
QuantizedSerdeType

Constants

MULTI_LORA_DELIMITER
UQFF_QUANT_TYPE_OFFSET
Offset for the quant type. UQFF always serializes the version first.

Statics

APPLIED_LORAS

Traits

QuantMethod
Quantized method for a quantized matmul.
QuantizedSerde
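
QuantMethod is the trait behind which the quantized (and unquantized) linear implementations listed above are returned, so model code can hold any of them uniformly. A hedged sketch follows; the forward method on the trait is an assumption here, not a verified signature.

    use std::sync::Arc;
    use candle_core::{Result, Tensor};
    use mistralrs_quant::QuantMethod;

    // Hypothetical MLP holding its projections behind the trait object, so the
    // same model code works for GGUF, GPTQ, FP8, or unquantized weights.
    struct Mlp {
        gate_proj: Arc<dyn QuantMethod>,
        down_proj: Arc<dyn QuantMethod>,
    }

    impl Mlp {
        fn forward(&self, xs: &Tensor) -> Result<Tensor> {
            // Assumption: QuantMethod::forward runs this layer's (possibly
            // dequantizing or fused) quantized matmul on `xs`.
            let hidden = self.gate_proj.forward(xs)?.relu()?;
            self.down_proj.forward(&hidden)
        }
    }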

Functions

linear
linear_b
linear_no_bias
linear_no_bias_static_lora
Static LoRA in the style of Phi-4 multimodal; applied only when the layer regex for the specific LoRA matches.
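
A hedged construction sketch for the linear helpers: it assumes linear_no_bias takes (in_dim, out_dim, &Option<QuantizedConfig>, ShardedVarBuilder) and returns Arc<dyn QuantMethod>, and that ShardedVarBuilder exposes a candle-style pp prefix method. Treat the parameter order and types as assumptions, not the verified signatures.

    use std::sync::Arc;
    use candle_core::Result;
    use mistralrs_quant::{linear_no_bias, QuantMethod, QuantizedConfig, ShardedVarBuilder};

    // Hypothetical helper: build an attention q_proj, letting the crate pick a
    // quantized implementation (GGUF, GPTQ, FP8, ...) or UnquantLinear based on
    // the optional QuantizedConfig found in the model configuration.
    fn build_q_proj(
        hidden_size: usize,
        num_heads: usize,
        head_dim: usize,
        quant_cfg: &Option<QuantizedConfig>,
        vb: ShardedVarBuilder,
    ) -> Result<Arc<dyn QuantMethod>> {
        // Assumed parameter order: (in_dim, out_dim, quant config, var builder);
        // `pp` (push prefix) is assumed from the candle VarBuilder convention.
        linear_no_bias(hidden_size, num_heads * head_dim, quant_cfg, vb.pp("q_proj"))
    }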