pub fn blockwise_fp8_moe(
weight: Tensor,
weight_scale_inv: Tensor,
weight_block_size: Vec<usize>,
dequant_dtype: DType,
) -> Result<Arc<dyn QuantMethod>>
Create a BlockwiseFP8Linear for MoE with 3D weights of shape [num_experts, N, K]. This is used by FusedExperts to enable gather_forward with native FP8 GEMM.
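A minimal usage sketch, assuming candle-core's Tensor, DType, and Result types, that blockwise_fp8_moe and the QuantMethod trait are in scope from this crate, and that the 128x128 block size and BF16 dequantization dtype are illustrative values chosen for the example rather than taken from this page:

use std::sync::Arc;
use candle_core::{DType, Result, Tensor};

// `blockwise_fp8_moe` and `QuantMethod` are assumed to be in scope from this crate.
// Wrap pre-loaded FP8 expert weights into a quantized MoE linear layer.
// `weight` is expected as [num_experts, N, K]; `weight_scale_inv` holds the
// per-block inverse scales matching `weight_block_size`.
fn build_fp8_moe_layer(
    weight: Tensor,
    weight_scale_inv: Tensor,
) -> Result<Arc<dyn QuantMethod>> {
    // Blockwise scaling granularity over the last two dims (assumed 128x128 blocks).
    let weight_block_size = vec![128usize, 128];
    // Dtype used when the FP8 weights must be dequantized (assumed BF16).
    let dequant_dtype = DType::BF16;
    blockwise_fp8_moe(weight, weight_scale_inv, weight_block_size, dequant_dtype)
}

The returned Arc<dyn QuantMethod> can then be handed to the expert container (e.g. FusedExperts, per the description above) so that gather_forward dispatches to the native FP8 GEMM path.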