Re-exports§
pub use safetensors::Shard;
pub use safetensors::ShardedSafeTensors;
pub use safetensors::ShardedVarBuilder;
pub use distributed::layers::compute_kv_shard;
pub use distributed::layers::compute_n_kv_groups;
pub use distributed::layers::ColumnParallelLayer;
pub use distributed::layers::FusedExperts;
pub use distributed::layers::PackedExperts;
pub use distributed::layers::ReplicatedLayer;
pub use distributed::layers::RowParallelLayer;
pub use distributed::socket::Client;
pub use distributed::socket::Server;
pub use distributed::BarrierLike;
pub use distributed::Comm;
pub use distributed::Id;
pub use distributed::RingConfig;
pub use distributed::SumAllReduce;
Modules§
Structs§
- AfqLayer
- BnbLinear
- BnbQuantParams
- CollectedImatrixData
- Convolution: Device/configurable intelligent convolution
- DummyLayer
- FP8Linear
- GgufMatMul
- GptqLayer
- HqqConfig
- HqqLayer
- ImatrixLayerStats
- ImmediateIsqParams
- LoraAdapter
- LoraConfig
- MXFP4Layer
- MatMul: Device/configurable intelligent matrix multiplication
- QuantizeOntoGuard: Used to gate access to quantizing onto the host device
- StaticLoraConfig
- UnquantLinear
Enums§
- AfqBits
- AfqGroupSize
- BnbQuantType
- DistributedKind
- HqqAxis
- HqqBits
- IsqType
- QuantMethodConfig
- QuantizeOntoDropGuard: Real (for Metal) and Fake (for CUDA)
- QuantizedConfig
- QuantizedSerdeType
Constants§
- MULTI_LORA_DELIMITER
- UQFF_QUANT_TYPE_OFFSET: Offset for the quant type. UQFF always serializes the version first.
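A "version first, then quant type at a fixed offset" layout can be read as in the minimal sketch below. The field widths (a 4-byte little-endian version followed by a 1-byte type tag) and the names `SKETCH_QUANT_TYPE_OFFSET` and `read_header` are assumptions made for illustration; they are not taken from the crate, so check the actual UQFF definition before relying on them.

```rust
// Hypothetical layout for illustration only: a 4-byte little-endian version,
// followed immediately by a 1-byte quant type tag. Neither width is taken
// from the crate's source.
const SKETCH_QUANT_TYPE_OFFSET: usize = std::mem::size_of::<u32>();

/// Read the version and the quant type tag from the front of a buffer.
/// Returns `None` if the buffer is too short to hold both fields.
fn read_header(bytes: &[u8]) -> Option<(u32, u8)> {
    // The version occupies the bytes before the quant type offset.
    let mut version = [0u8; 4];
    version.copy_from_slice(bytes.get(..SKETCH_QUANT_TYPE_OFFSET)?);

    // The quant type tag sits at the offset itself.
    let quant_type = *bytes.get(SKETCH_QUANT_TYPE_OFFSET)?;

    Some((u32::from_le_bytes(version), quant_type))
}
```

Keeping the offset in a named constant means a reader and a writer of the format agree on where the type tag lives without repeating the number.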
Traits§
- BitWiseOp
- CumSumOp
- LeftshiftOp
- NonZeroOp
- QuantMethod: Quantized method for a quantized matmul.
- QuantizedSerde
- SortOp: Extension trait adding argsort/sort convenience calls on Tensor.
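The extension-trait pattern behind a trait like `SortOp` can be sketched on a plain `[f32]` slice instead of a tensor. `SliceSortExt` and its methods are hypothetical names chosen for this example; the crate's real trait and its signatures may differ.

```rust
/// Hypothetical extension trait, named to avoid confusion with the crate's `SortOp`.
trait SliceSortExt {
    /// Indices that would sort the values in ascending order.
    fn argsort(&self) -> Vec<usize>;
    /// A sorted copy of the values, ascending.
    fn sorted(&self) -> Vec<f32>;
}

impl SliceSortExt for [f32] {
    fn argsort(&self) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..self.len()).collect();
        // `total_cmp` gives a total order over f32, so NaN cannot cause a panic.
        idx.sort_by(|&a, &b| self[a].total_cmp(&self[b]));
        idx
    }

    fn sorted(&self) -> Vec<f32> {
        // Reuse argsort so both methods agree on ordering.
        self.argsort().into_iter().map(|i| self[i]).collect()
    }
}
```

Once the trait is in scope, the methods read as if they were defined on the type itself, for example `values.argsort()` on a `Vec<f32>` or an array.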
Functions§
- apply_immediate_isq
- clear_applied_loras: Clear all LoRA adapters for the current engine thread
- clear_immediate_isq
- fp8_blockwise_dequantize: FP8 blockwise dequantize.
- fp8_blockwise_quantize: FP8 blockwise quantize.
- fp8_vector_dequantize: FP8 vector dequantize.
- fp8_vector_quantize: FP8 vector quantize.
- get_applied_loras: Get the LoRA adapters for the current engine thread
- get_immediate_isq
- linear
- linear_b
- linear_no_bias
- linear_no_bias_static_lora: Static LoRA in the style of Phi-4 multimodal. Applied only when the layer regex for the specific LoRA matches.
- push_applied_lora: Push a LoRA adapter for the current engine thread
- set_immediate_isq
- should_apply_immediate_isq
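The push/get/clear functions for LoRA adapters are described as acting on "the current engine thread". One common way to get that per-thread behavior in Rust is a `thread_local!` registry, sketched below. This is an assumption about the general pattern, not the crate's implementation: the function names are hypothetical and adapters are reduced to plain strings.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread registry; each thread sees its own list.
    static APPLIED: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Append an adapter name to the calling thread's list.
fn push_adapter(name: &str) {
    APPLIED.with(|a| a.borrow_mut().push(name.to_string()));
}

/// Snapshot of the calling thread's adapters, in push order.
fn current_adapters() -> Vec<String> {
    APPLIED.with(|a| a.borrow().clone())
}

/// Remove every adapter registered on the calling thread.
fn clear_adapters() {
    APPLIED.with(|a| a.borrow_mut().clear());
}
```

With this design, adapters pushed on one thread are invisible to every other thread, so no locking is needed, at the cost of having to register adapters on the thread that will use them.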