pub fn fp8_vector_quantize(input: &Tensor) -> Result<(Tensor, Tensor)>
Quantizes a tensor to FP8 using per-vector (block-wise) scales.
- Expects input to be f32, f16, or bf16
- Returns a tuple of (quantized_weight, scales)
- quantized_weight is fp8
- scales is f32
- Each scale corresponds to a contiguous vector of 128 elements
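To illustrate the scheme the bullets describe, here is a minimal sketch in plain Rust, not the library's implementation: it computes one scale per 128-element block and scales each block so its largest magnitude maps onto the FP8 maximum. The `FP8_E4M3_MAX` constant and the use of `f32` to stand in for FP8 storage are assumptions for illustration; the real function returns an actual FP8 tensor.

```rust
// Illustrative sketch only. Assumes the E4M3 fp8 format (max finite
// value 448.0) and uses f32 as a stand-in for fp8 storage.
const BLOCK: usize = 128;
const FP8_E4M3_MAX: f32 = 448.0; // assumption; E5M2 would differ

fn fp8_vector_quantize_sketch(input: &[f32]) -> (Vec<f32>, Vec<f32>) {
    assert!(input.len() % BLOCK == 0, "length must be a multiple of 128");
    let mut quantized = Vec::with_capacity(input.len());
    let mut scales = Vec::with_capacity(input.len() / BLOCK);
    for chunk in input.chunks(BLOCK) {
        // One f32 scale per 128-element vector: the block's largest
        // magnitude is mapped onto the fp8 maximum.
        let max_abs = chunk.iter().fold(0f32, |m, &x| m.max(x.abs()));
        let scale = if max_abs > 0.0 { max_abs / FP8_E4M3_MAX } else { 1.0 };
        scales.push(scale);
        for &x in chunk {
            // Real code would cast to an fp8 type here; we just clamp.
            quantized.push((x / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX));
        }
    }
    (quantized, scales)
}

fn main() {
    let input: Vec<f32> = (0..256).map(|i| i as f32 / 10.0).collect();
    let (q, s) = fp8_vector_quantize_sketch(&input);
    assert_eq!(q.len(), 256);
    assert_eq!(s.len(), 2); // one scale per 128-element vector
    println!("scales = {:?}", s);
}
```

Dequantization is the inverse: multiply each 128-element block of `quantized` by its scale to recover an approximation of the original values.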