Crate mistralrs_quant

Re-exports

pub use safetensors::Shard;
pub use safetensors::ShardedSafeTensors;
pub use safetensors::ShardedVarBuilder;
pub use distributed::layers::compute_kv_shard;
pub use distributed::layers::compute_n_kv_groups;
pub use distributed::layers::ColumnParallelLayer;
pub use distributed::layers::FusedExperts;
pub use distributed::layers::PackedExperts;
pub use distributed::layers::ReplicatedLayer;
pub use distributed::layers::RowParallelLayer;
pub use distributed::socket::Client;
pub use distributed::socket::Server;
pub use distributed::BarrierLike;
pub use distributed::Comm;
pub use distributed::Id;
pub use distributed::RingConfig;
pub use distributed::SumAllReduce;

Modules

cublaslt
distributed
log
rotary
safetensors

Structs

AfqLayer
BnbLinear
BnbQuantParmas
CollectedImatrixData
DummyLayer
FP8Linear
GgufMatMul
GptqLayer
HqqConfig
HqqLayer
ImatrixLayerStats
ImmediateIsqParams
LoraAdapter
LoraConfig
MatMul
Device/configurable intelligent matrix multiplication
QuantizeOntoGuard
Used to gate access to quantizing onto the host device
StaticLoraConfig
UnquantLinear

Enums

AfqBits
AfqGroupSize
BnbQuantType
DistributedKind
HqqAxis
HqqBits
IsqType
QuantMethodConfig
QuantizeOntoDropGuard
Real (for Metal) and Fake (for CUDA)
QuantizedConfig
QuantizedSerdeType

Constants

MULTI_LORA_DELIMITER
UQFF_QUANT_TYPE_OFFSET
Offset for the quant type. UQFF always serializes the version first.
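
To illustrate what "serializes the version first" implies for a reader, here is a minimal sketch of parsing a header in which a version field precedes the quant type. The field widths (a 4-byte little-endian version, a 1-byte quant type), the constant `QUANT_TYPE_OFFSET`, and the function `read_header` are assumptions made for illustration; they are not taken from the crate, so check the UQFF format documentation for the real layout.

```rust
/// Assumed width of the version field (hypothetical: a little-endian u32).
const VERSION_LEN: usize = std::mem::size_of::<u32>();

/// Hypothetical stand-in for an offset constant: the quant type is assumed
/// to sit immediately after the version field.
const QUANT_TYPE_OFFSET: usize = VERSION_LEN;

/// Reads `(version, quant_type)` from the start of a buffer.
/// Returns `None` if the buffer is too short to contain both fields.
fn read_header(bytes: &[u8]) -> Option<(u32, u8)> {
    let v = bytes.get(..VERSION_LEN)?;
    let version = u32::from_le_bytes([v[0], v[1], v[2], v[3]]);
    let quant_type = *bytes.get(QUANT_TYPE_OFFSET)?;
    Some((version, quant_type))
}

fn main() {
    // Version 2 (little-endian), then quant type 7, then payload bytes.
    let buf = [2u8, 0, 0, 0, 7, 0xAA, 0xBB];
    println!("{:?}", read_header(&buf));
}
```

Keeping the offset as a named constant means that code which only needs the quant type can index straight to it without re-deriving the width of the version field.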

Traits

BitWiseOp
CumSumOp
LeftshiftOp
NonZeroOp
QuantMethod
Quantization method for a quantized matmul.
QuantizedSerde
SortOp
Extension trait adding argsort / sort convenience calls on Tensor.
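
Several of the traits above follow the extension-trait pattern: a trait implemented for an existing type so that new methods can be called on it directly. The sketch below shows that pattern on a plain `[f32]` slice rather than on `Tensor`, so it runs without any dependencies. The trait `SortExt` and its method names are hypothetical and do not reflect the actual signatures of `SortOp`.

```rust
/// Hypothetical extension trait, named for illustration only.
trait SortExt {
    /// Indices that would sort the values in ascending order.
    fn argsort_asc(&self) -> Vec<usize>;
    /// The values themselves, sorted in ascending order.
    fn sorted_asc(&self) -> Vec<f32>;
}

impl SortExt for [f32] {
    fn argsort_asc(&self) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..self.len()).collect();
        // `total_cmp` gives a total order over f32, so NaN cannot cause a panic.
        idx.sort_by(|&a, &b| self[a].total_cmp(&self[b]));
        idx
    }

    fn sorted_asc(&self) -> Vec<f32> {
        self.argsort_asc().into_iter().map(|i| self[i]).collect()
    }
}

fn main() {
    let v = vec![3.0_f32, 1.0, 2.0];
    // The methods are available on Vec<f32> through auto-deref to [f32].
    println!("{:?}", v.argsort_asc()); // prints [1, 2, 0]
    println!("{:?}", v.sorted_asc()); // prints [1.0, 2.0, 3.0]
}
```

The extension methods only become callable once the trait is in scope, which is why a crate that uses this pattern exports the trait itself.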

Functions

apply_immediate_isq
clear_applied_loras
Clear all LoRA adapters for the current engine thread
clear_immediate_isq
get_applied_loras
Get the LoRA adapters for the current engine thread
get_immediate_isq
linear
linear_b
linear_no_bias
linear_no_bias_static_lora
Static LoRA in the style of Phi-4 multimodal, applied only when the layer regex for the specific LoRA matches.
push_applied_lora
Push a LoRA adapter for the current engine thread
set_immediate_isq
should_apply_immediate_isq
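
The LoRA functions above are described as acting on "the current engine thread". One common way to get per-thread state in Rust is `thread_local!`; the sketch below shows that general technique. It is an assumption that the crate works this way, and `Adapter`, `push_adapter`, `get_adapters`, and `clear_adapters` are hypothetical stand-ins, not the crate's types or signatures.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for an adapter type.
#[derive(Clone, Debug, PartialEq)]
struct Adapter {
    name: String,
}

thread_local! {
    // Each thread gets its own independent list of applied adapters.
    static APPLIED: RefCell<Vec<Adapter>> = RefCell::new(Vec::new());
}

/// Appends an adapter to the calling thread's list.
fn push_adapter(adapter: Adapter) {
    APPLIED.with(|list| list.borrow_mut().push(adapter));
}

/// Returns a copy of the calling thread's list.
fn get_adapters() -> Vec<Adapter> {
    APPLIED.with(|list| list.borrow().clone())
}

/// Empties the calling thread's list.
fn clear_adapters() {
    APPLIED.with(|list| list.borrow_mut().clear());
}

fn main() {
    push_adapter(Adapter { name: "demo".to_string() });
    // A different thread sees its own, still empty, list.
    let seen_elsewhere = std::thread::spawn(|| get_adapters().len())
        .join()
        .unwrap();
    println!("here: {}, other thread: {}", get_adapters().len(), seen_elsewhere);
    clear_adapters();
    println!("after clear: {}", get_adapters().len());
}
```

With per-thread state like this, a value pushed on one thread is invisible to every other thread, so callers need to push, read, and clear from the same thread.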