pub fn fp8_blockwise_quantize(
input: &Tensor,
weight_block_size: Vec<usize>,
) -> Result<(Tensor, Tensor)>
FP8 blockwise quantize.
- Expects `input` to be f32, f16, or bf16.
- `weight_block_size` gives the dimensions of each quantization block; one scale is computed per block.
- Returns a tuple of `(quantized_weight, scales)`, where `quantized_weight` is fp8 and `scales` is f32.
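
The docs don't spell out the quantization scheme, so below is a minimal, dependency-free sketch of what per-block fp8 quantization typically computes. It assumes an E4M3 fp8 format (maximum finite magnitude 448.0), a 2-D row-major input tiled into 2-D blocks, and a multiply-to-dequantize scale convention; none of these details are confirmed by this function's documentation, and the real implementation may differ (e.g. rounding to actual fp8 bits, different block layout, or GPU kernels).

```rust
/// Sketch of blockwise fp8 quantization math (NOT this crate's implementation).
/// Assumed: E4M3 max magnitude 448.0; `block` stands in for `weight_block_size`;
/// dequantization is `quantized * scale`.
fn blockwise_quantize_sketch(
    input: &[f32], // row-major (rows x cols) matrix
    rows: usize,
    cols: usize,
    block: [usize; 2],
) -> (Vec<f32>, Vec<f32>) {
    const FP8_E4M3_MAX: f32 = 448.0;
    let (bh, bw) = (block[0], block[1]);
    let (block_rows, block_cols) = (rows.div_ceil(bh), cols.div_ceil(bw));
    let mut q = vec![0.0f32; input.len()];
    let mut scales = vec![0.0f32; block_rows * block_cols];
    for br in 0..block_rows {
        for bc in 0..block_cols {
            // Largest magnitude in this tile determines the scale.
            let mut amax = 0.0f32;
            for r in (br * bh)..((br + 1) * bh).min(rows) {
                for c in (bc * bw)..((bc + 1) * bw).min(cols) {
                    amax = amax.max(input[r * cols + c].abs());
                }
            }
            // Map the tile's extreme value onto the fp8 maximum.
            let scale = if amax > 0.0 { amax / FP8_E4M3_MAX } else { 1.0 };
            scales[br * block_cols + bc] = scale;
            for r in (br * bh)..((br + 1) * bh).min(rows) {
                for c in (bc * bw)..((bc + 1) * bw).min(cols) {
                    // Real code would round/cast to fp8 bits here; we keep f32.
                    q[r * cols + c] = (input[r * cols + c] / scale)
                        .clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX);
                }
            }
        }
    }
    (q, scales)
}

fn main() {
    let w: Vec<f32> = (0..16).map(|i| i as f32 - 8.0).collect();
    let (q, s) = blockwise_quantize_sketch(&w, 4, 4, [2, 2]);
    println!("scales: {s:?}");
    // Round-trip check: w ≈ q * scale(block containing w).
    assert!((q[0] * s[0] - w[0]).abs() < 1e-3);
}
```

For the real function, a call might look like `fp8_blockwise_quantize(&weight, vec![128, 128])?`; `[128, 128]` is a common block size in fp8 checkpoints (e.g. DeepSeek-V3), though the block sizes this function accepts are not stated here.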