Function blockwise_fp8_moe

pub fn blockwise_fp8_moe(
    weight: Tensor,
    weight_scale_inv: Tensor,
    weight_block_size: Vec<usize>,
    dequant_dtype: DType,
) -> Result<Arc<dyn QuantMethod>>

Creates a BlockwiseFP8Linear for MoE with 3D weights [num_experts, N, K]. It is used by FusedExperts to enable gather_forward with a native FP8 GEMM.
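
A minimal usage sketch, assuming the function and QuantMethod can be imported from mistralrs_quant and that Result is candle_core's Result; the 128x128 block size, the BF16 dequant dtype, and the scale-tensor shape noted in the comments are illustrative assumptions that must match how the checkpoint was quantized:

use std::sync::Arc;

use candle_core::{DType, Result, Tensor};
use mistralrs_quant::{blockwise_fp8_moe, QuantMethod}; // assumed import paths

// Wrap pre-loaded 3D expert weights and their per-block inverse scales
// into a quantized layer usable by FusedExperts.
fn wrap_experts(
    weight: Tensor,           // FP8 expert weights, shape [num_experts, N, K]
    weight_scale_inv: Tensor, // per-block scales, e.g. [num_experts, N/128, K/128] (assumed layout)
) -> Result<Arc<dyn QuantMethod>> {
    // weight_block_size = [128, 128] and dequant_dtype = BF16 are common choices,
    // but they must match the quantization recipe of the loaded weights.
    blockwise_fp8_moe(weight, weight_scale_inv, vec![128, 128], DType::BF16)
}

In practice weight and weight_scale_inv are read directly from an FP8 checkpoint (e.g. via safetensors), so only the block size and dequantization dtype need to be chosen to match that checkpoint.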