Re-exports§
pub use safetensors::Shard;
pub use safetensors::ShardedSafeTensors;
pub use safetensors::ShardedVarBuilder;
pub use distributed::layers::compute_kv_shard;
pub use distributed::layers::compute_n_kv_groups;
pub use distributed::layers::ColumnParallelLayer;
pub use distributed::layers::ReplicatedLayer;
pub use distributed::layers::RowParallelLayer;
pub use distributed::socket::Client;
pub use distributed::socket::Server;
pub use distributed::BarrierLike;
pub use distributed::Comm;
pub use distributed::Id;
pub use distributed::SumAllReduce;
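The re-exported `ColumnParallelLayer`, `RowParallelLayer`, and `SumAllReduce` follow the standard tensor-parallel pattern: a column-parallel linear splits the output dimension across ranks, the following row-parallel linear splits the input dimension, and a single sum all-reduce at the end recovers the full result. The sketch below illustrates that algebra with plain `Vec`s and two simulated ranks; it does not use the crate's own types, and the shapes and two-rank split are illustrative assumptions.

```rust
// Conceptual sketch of the column-parallel -> row-parallel -> all-reduce
// pattern, simulated on one machine with two "ranks".

fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    // y[i] = sum_j w[i][j] * x[j]
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
        .collect()
}

fn main() {
    let x = vec![1.0, 2.0];
    // Full weights: w1 is 4x2 (up projection), w2 is 2x4 (down projection).
    let w1: Vec<Vec<f32>> = vec![
        vec![1.0, 0.0],
        vec![0.0, 1.0],
        vec![1.0, 1.0],
        vec![2.0, 1.0],
    ];
    let w2: Vec<Vec<f32>> = vec![vec![1.0, 2.0, 3.0, 4.0], vec![4.0, 3.0, 2.0, 1.0]];

    // Reference: single-device result y = w2 * (w1 * x).
    let reference = matvec(&w2, &matvec(&w1, &x));

    // Column-parallel: each rank holds half of w1's output rows; no
    // communication is needed, activations stay sharded.
    let h0 = matvec(&w1[..2], &x); // rank 0
    let h1 = matvec(&w1[2..], &x); // rank 1

    // Row-parallel: each rank holds the matching half of w2's columns and
    // produces a partial (incomplete) output.
    let w2_cols = |lo: usize, hi: usize| -> Vec<Vec<f32>> {
        w2.iter().map(|row| row[lo..hi].to_vec()).collect()
    };
    let p0 = matvec(&w2_cols(0, 2), &h0);
    let p1 = matvec(&w2_cols(2, 4), &h1);

    // The sum all-reduce is the single communication point.
    let reduced: Vec<f32> = p0.iter().zip(&p1).map(|(a, b)| a + b).collect();
    assert_eq!(reduced, reference);
    println!("{:?}", reduced);
}
```

This is why the pair of layers costs only one collective per block: the column-parallel output is already laid out exactly as the row-parallel input expects.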
Modules§
Structs§
- AfqLayer
- BnbLinear
- BnbQuantParmas
- CollectedImatrixData
- DummyLayer
- FP8Linear
- GgufMatMul
- GptqLayer
- HqqConfig
- HqqLayer
- ImatrixLayerStats
- LoraAdapter
- LoraConfig
- MatMul: Device/configurable intelligent matrix multiplication
- QuantizeOntoGuard: Used to gate access to quantizing onto the host device
- StaticLoraConfig
- UnquantLinear
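`QuantizeOntoGuard` gates access to quantizing onto the host device, and the related `QuantizeOntoDropGuard` enum distinguishes a Real guard (Metal) from a Fake no-op (CUDA). The crate's actual mechanism is not shown here; the sketch below only illustrates the general pattern such a guard follows, using a `std::sync::Mutex` stand-in whose lock is released when the guard drops.

```rust
use std::sync::{Arc, Mutex, MutexGuard};

/// Stand-in for the QuantizeOntoGuard idea: serialize quantize-onto-host
/// work so only one thread stages tensors in host memory at a time.
/// (Hypothetical type, not the crate's API.)
struct QuantizeGate(Arc<Mutex<()>>);

impl QuantizeGate {
    fn new() -> Self {
        QuantizeGate(Arc::new(Mutex::new(())))
    }

    /// The returned value plays the role of the drop guard: the lock is
    /// released when it goes out of scope, after quantization finishes.
    fn acquire(&self) -> MutexGuard<'_, ()> {
        self.0.lock().unwrap()
    }
}

fn main() {
    let gate = QuantizeGate::new();
    {
        let _guard = gate.acquire();
        // ... quantize one layer onto the host device here ...
    } // guard dropped: the next layer's quantization may begin
    let _next = gate.acquire();
    println!("gate reacquired after drop");
}
```

A "Fake" variant of such a guard would simply hold nothing, making acquisition free on backends where gating is unnecessary.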
Enums§
- AfqBits
- AfqGroupSize
- BnbQuantType
- DistributedKind
- HqqAxis
- HqqBits
- IsqType
- QuantMethodConfig
- QuantizeOntoDropGuard: Real (for Metal) and Fake (for CUDA)
- QuantizedConfig
- QuantizedSerdeType
Constants§
- MULTI_LORA_DELIMITER
- UQFF_QUANT_TYPE_OFFSET: Offset for the quant type. UQFF always serializes the version first.
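Because UQFF always serializes the version first, the quant type can be read at a fixed byte offset. The snippet below is illustrative only: the real UQFF field widths and endianness are assumptions here (a little-endian `u32` version followed by a one-byte tag, with a locally defined hypothetical offset), and only "version comes first" is taken from the docs.

```rust
// Hypothetical header layout showing why a fixed quant-type offset works.
const QUANT_TYPE_OFFSET: usize = 4; // assumed: directly after a u32 version

fn read_quant_type(buf: &[u8]) -> u8 {
    // Skip the serialized version and read the tag at the fixed offset.
    buf[QUANT_TYPE_OFFSET]
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&2u32.to_le_bytes()); // version, serialized first
    buf.push(7); // quant type tag
    assert_eq!(read_quant_type(&buf), 7);
    println!("quant type: {}", read_quant_type(&buf));
}
```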
Statics§
Traits§
- QuantMethod: Quantized method for a quantized matmul.
- QuantizedSerde
Functions§
- linear
- linear_b
- linear_no_bias
- linear_no_bias_static_lora: Static LoRA in the style of Phi-4 multimodal. Only when the layer regex for the specific LoRA matches.
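The `linear_no_bias_static_lora` description says a static LoRA applies only when the layer regex for that LoRA matches. The sketch below shows just that gating idea: an adapter delta is merged into a layer's weight only when the layer's name matches the adapter's pattern. It does not use the crate's API; the function name and flattened shapes are hypothetical, and a substring check stands in for the real regex to stay dependency-free.

```rust
/// Illustrative gate for static LoRA: merge `delta` (conceptually
/// B * A * scale, same shape as `weight`) into `weight` only when the
/// layer name matches the adapter's pattern. Returns whether it merged.
fn maybe_merge_lora(
    layer_name: &str,
    pattern: &str,      // stands in for the LoRA's layer regex
    weight: &mut [f32], // flattened base weight
    delta: &[f32],      // flattened adapter delta
) -> bool {
    if !layer_name.contains(pattern) {
        return false; // no match: leave the layer untouched
    }
    for (w, d) in weight.iter_mut().zip(delta) {
        *w += d;
    }
    true
}

fn main() {
    let mut w = vec![1.0_f32, 2.0];
    let d = vec![0.5_f32, 0.5];
    // Matching layer: the delta is merged.
    assert!(maybe_merge_lora("model.layers.0.mlp.down_proj", "down_proj", &mut w, &d));
    assert_eq!(w, vec![1.5, 2.5]);
    // Non-matching layer: weights are unchanged.
    assert!(!maybe_merge_lora("model.layers.0.self_attn.q_proj", "down_proj", &mut w, &d));
    println!("{:?}", w);
}
```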