Trait QuantMethod

pub trait QuantMethod:
    Send
    + Sync
    + Debug
    + QuantizedSerde
{
    // Required methods
    fn new(method: QuantMethodConfig) -> Result<Self>
       where Self: Sized;
    fn dequantize_w(&self) -> Result<Tensor>;
    fn forward(&self, a: &Tensor) -> Result<Tensor>;
    fn quantized_act_type(&self) -> Option<DType>;
    fn dtype_and_device(&self) -> (DType, Device);
    fn add_delta_w(&self, delta: &Tensor) -> Result<Arc<dyn QuantMethod>>;
    fn apply_isq(
        self: Arc<Self>,
        dtype: Option<IsqType>,
        device: Device,
        n_quantized: &AtomicUsize,
        imatrix_weight: Option<Vec<f32>>,
        guard: QuantizeOntoGuard,
    ) -> Result<Arc<dyn QuantMethod>>;

    // Provided methods
    fn forward_autocast(&self, a: &Tensor) -> Result<Tensor> { ... }
    fn gather_forward_autocast(
        &self,
        a: &Tensor,
        indices: &Tensor,
    ) -> Result<Tensor> { ... }
    fn gather_forward(&self, _a: &Tensor, _indices: &Tensor) -> Result<Tensor> { ... }
    fn unquant_weight_bias(&self) -> Option<(Tensor, Option<Tensor>)> { ... }
    fn begin_track_stats(&mut self) -> Result<()> { ... }
    fn end_track_stats(&self) -> Result<Tensor> { ... }
    fn is_distributed(&self) -> Option<DistributedKind> { ... }
}

A quantization method backing a quantized matmul.

Required Methods§


fn new(method: QuantMethodConfig) -> Result<Self>
where Self: Sized,


fn dequantize_w(&self) -> Result<Tensor>


fn forward(&self, a: &Tensor) -> Result<Tensor>

Compute matmul of self and a. self should contain the weights.
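The reference semantics here (ignoring the quantized kernels) are a plain linear-layer matmul against the transposed weight, y = a · Wᵀ. A minimal sketch with nested slices, where `forward` and its shapes are illustrative stand-ins rather than the actual implementation:

```rust
// Illustrative stand-in for QuantMethod::forward: y = a · Wᵀ, where
// `w` plays the role of the (dequantized) weights held by `self`.
// Shapes: a is (batch, in_dim), w is (out_dim, in_dim), y is (batch, out_dim).
fn forward(a: &[Vec<f32>], w: &[Vec<f32>]) -> Vec<Vec<f32>> {
    a.iter()
        .map(|row| {
            w.iter()
                .map(|wrow| row.iter().zip(wrow).map(|(x, y)| x * y).sum())
                .collect()
        })
        .collect()
}

fn main() {
    let a = vec![vec![1.0, 2.0]];                  // (1, 2)
    let w = vec![vec![3.0, 4.0], vec![5.0, 6.0]];  // (2, 2)
    let y = forward(&a, &w);
    println!("{y:?}"); // [[11.0, 17.0]]
}
```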


fn quantized_act_type(&self) -> Option<DType>

If a quantized method, return the activation dtype.


fn dtype_and_device(&self) -> (DType, Device)

Weight dtype and device


fn add_delta_w(&self, delta: &Tensor) -> Result<Arc<dyn QuantMethod>>

Add a delta weight from LoRA to the weights. This should be prescaled with alpha.
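For LoRA, the merged delta is ΔW = α·(B·A), and this method expects the α scaling to have been applied by the caller already. A hedged sketch of the merge over flat weight buffers (names and shapes are illustrative, not the actual API):

```rust
// Sketch of merging a prescaled LoRA delta into base weights:
// w_new = w + delta, where delta = alpha * (B·A) was computed by the caller.
fn add_delta_w(w: &[f32], delta: &[f32]) -> Vec<f32> {
    w.iter().zip(delta).map(|(w, d)| w + d).collect()
}

fn main() {
    let w = [1.0_f32, 2.0, 3.0];
    // Caller prescales: delta = alpha * (B·A); here alpha = 0.5.
    let ba = [2.0_f32, 4.0, 6.0];
    let delta: Vec<f32> = ba.iter().map(|x| 0.5 * x).collect();
    let merged = add_delta_w(&w, &delta);
    println!("{merged:?}"); // [2.0, 4.0, 6.0]
}
```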

fn apply_isq(
    self: Arc<Self>,
    dtype: Option<IsqType>,
    device: Device,
    n_quantized: &AtomicUsize,
    imatrix_weight: Option<Vec<f32>>,
    guard: QuantizeOntoGuard,
) -> Result<Arc<dyn QuantMethod>>

Apply in-situ quantization (ISQ) to this layer, re-quantizing the weights to the given dtype on the given device; n_quantized is incremented for each quantized tensor.
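The n_quantized counter in the signature above allows concurrent quantization passes to report progress safely; a minimal sketch of that bookkeeping (the quantization work itself is elided, and the worker function is hypothetical):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Sketch: each worker "quantizes" some layers and bumps the shared counter,
// mirroring how apply_isq increments `n_quantized` per quantized tensor.
fn quantize_layers(n_layers: usize, n_quantized: &AtomicUsize) {
    for _ in 0..n_layers {
        // ... quantization work would happen here ...
        n_quantized.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let n_quantized = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| quantize_layers(8, &n_quantized));
        }
    });
    println!("{}", n_quantized.load(Ordering::Relaxed)); // 32
}
```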

Provided Methods§


fn forward_autocast(&self, a: &Tensor) -> Result<Tensor>

Compute matmul of self and a. self should contain the weights. Automatically casts to the required quantization activation type and back.
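The cast-in/cast-out pattern can be sketched as follows, assuming the result is cast back to the input's original dtype; the DType, Tensor, and Layer stand-ins below are simplified placeholders, not candle's types:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum DType { F32, F16 }

struct Tensor { data: Vec<f32>, dtype: DType }

impl Tensor {
    fn to_dtype(&self, dtype: DType) -> Tensor {
        // A real cast changes the storage; this stand-in only re-tags the dtype.
        Tensor { data: self.data.clone(), dtype }
    }
}

struct Layer { act_dtype: Option<DType> }

impl Layer {
    // quantized_act_type analogue.
    fn quantized_act_type(&self) -> Option<DType> { self.act_dtype }

    // forward stand-in: identity, so the example stays runnable.
    fn forward(&self, a: &Tensor) -> Tensor {
        Tensor { data: a.data.clone(), dtype: a.dtype }
    }

    // forward_autocast: cast to the required activation dtype, run forward,
    // then cast the result back to the input's original dtype.
    fn forward_autocast(&self, a: &Tensor) -> Tensor {
        match self.quantized_act_type() {
            Some(dt) if dt != a.dtype => self.forward(&a.to_dtype(dt)).to_dtype(a.dtype),
            _ => self.forward(a),
        }
    }
}

fn main() {
    let layer = Layer { act_dtype: Some(DType::F16) };
    let a = Tensor { data: vec![1.0, 2.0], dtype: DType::F32 };
    let y = layer.forward_autocast(&a);
    println!("{:?}", y.dtype); // F32: the result is back in the input dtype
}
```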


fn gather_forward_autocast(&self, a: &Tensor, indices: &Tensor) -> Result<Tensor>

Compute matmul of self and a. self should contain the weights. Automatically casts to the required quantization activation type and back.

If a is (n_tokens, n_experts, cols) and the weights of self are (n_experts, rows, cols), then indices is (n_tokens, n_experts).


fn gather_forward(&self, _a: &Tensor, _indices: &Tensor) -> Result<Tensor>

Compute matmul of self and a. self should contain the weights.

If a is (n_tokens, n_experts, cols) and the weights of self are (n_experts, rows, cols), then indices is (n_tokens, n_experts).
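Given those shapes, each (token, expert-slot) pair selects one expert's weight matrix by index and matmuls its activation row against it. A small sketch with nested Vecs (illustrative, not the actual kernel):

```rust
// a:       (n_tokens, n_experts, cols)   — per-slot activations
// weights: (n_experts_total, rows, cols) — one matrix per expert
// indices: (n_tokens, n_experts)         — which expert each slot uses
// out:     (n_tokens, n_experts, rows)
fn gather_forward(
    a: &[Vec<Vec<f32>>],
    weights: &[Vec<Vec<f32>>],
    indices: &[Vec<usize>],
) -> Vec<Vec<Vec<f32>>> {
    a.iter()
        .zip(indices)
        .map(|(slots, idx_row)| {
            slots
                .iter()
                .zip(idx_row)
                .map(|(x, &e)| {
                    // Select expert e's (rows, cols) matrix and apply it: W · x.
                    weights[e]
                        .iter()
                        .map(|wrow| wrow.iter().zip(x).map(|(w, x)| w * x).sum())
                        .collect()
                })
                .collect()
        })
        .collect()
}

fn main() {
    // 1 token, 2 expert slots, cols = 2; two experts, each with rows = 1.
    let a = vec![vec![vec![1.0, 1.0], vec![2.0, 2.0]]];
    let weights = vec![vec![vec![1.0, 0.0]], vec![vec![0.0, 1.0]]];
    let indices = vec![vec![0, 1]];
    let out = gather_forward(&a, &weights, &indices);
    println!("{out:?}"); // [[[1.0], [2.0]]]
}
```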


fn unquant_weight_bias(&self) -> Option<(Tensor, Option<Tensor>)>


fn begin_track_stats(&mut self) -> Result<()>

Begin tracking stats into an ImatrixLayerStats.


fn end_track_stats(&self) -> Result<Tensor>

End tracking stats into an ImatrixLayerStats. Returns the computed imatrix.
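An importance matrix of this kind is typically accumulated as running statistics of the activations seen per weight column (e.g. the mean of squared activations, as in llama.cpp's imatrix tooling). A hedged sketch of that lifecycle; the struct below is a hypothetical stand-in, not the actual ImatrixLayerStats internals:

```rust
// Hypothetical stand-in for ImatrixLayerStats: accumulate per-column
// squared activations across forward passes, then report the mean.
struct ImatrixStats {
    sums: Vec<f32>, // per-column sum of x^2
    count: usize,   // number of activation rows accumulated
}

impl ImatrixStats {
    // begin_track_stats analogue.
    fn begin(cols: usize) -> Self {
        ImatrixStats { sums: vec![0.0; cols], count: 0 }
    }

    // Called per forward pass with one activation row.
    fn track(&mut self, row: &[f32]) {
        for (s, x) in self.sums.iter_mut().zip(row) {
            *s += x * x;
        }
        self.count += 1;
    }

    // end_track_stats analogue: the computed importance per column.
    fn end(&self) -> Vec<f32> {
        self.sums.iter().map(|s| s / self.count as f32).collect()
    }
}

fn main() {
    let mut stats = ImatrixStats::begin(2);
    stats.track(&[1.0, 2.0]);
    stats.track(&[3.0, 0.0]);
    println!("{:?}", stats.end()); // [5.0, 2.0]
}
```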


fn is_distributed(&self) -> Option<DistributedKind>

Trait Implementations§


impl Module for dyn QuantMethod


fn forward(&self, xs: &Tensor) -> Result<Tensor>

Implementors§