# FlashAttention in mistral.rs
mistral.rs supports FlashAttention V2 and V3 on CUDA devices (V3 requires a Hopper-class GPU with compute capability 9.0; see the table below).
Note: If mistral.rs is compiled with FlashAttention and PagedAttention is enabled, FlashAttention is used in tandem with PagedAttention to accelerate the prefill phase.
## GPU Architecture Compatibility
| Architecture | Compute Capability | Example GPUs | Feature Flag |
|---|---|---|---|
| Ampere | 8.0, 8.6 | RTX 30*, A100, A40 | `--features flash-attn` |
| Ada Lovelace | 8.9 | RTX 40*, L40S | `--features flash-attn` |
| Hopper | 9.0 | H100, H800 | `--features flash-attn-v3` |
| Blackwell | 10.0, 12.0 | RTX 50* | `--features flash-attn` |
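For reference, a minimal sketch of build invocations that select one of the feature flags above. The `cuda` feature and a release build are assumptions based on the project's standard CUDA build instructions; adjust to your setup and enable exactly one FlashAttention flag.

```bash
# FlashAttention V2 (Ampere / Ada Lovelace / Blackwell).
# The `cuda` feature is an assumption from the standard CUDA build; adjust as needed.
cargo build --release --features "cuda flash-attn"

# FlashAttention V3 (Hopper, compute capability 9.0).
# Do not combine with flash-attn: the two flags are mutually exclusive.
cargo build --release --features "cuda flash-attn-v3"
```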
Note: The `flash-attn` and `flash-attn-v3` feature flags are mutually exclusive; enable only one.

Note: To use FlashAttention from the Python SDK, compile the package from source with the appropriate feature flag.
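A hedged sketch of a source build of the Python SDK with FlashAttention enabled, assuming a maturin-based workflow and the `mistralrs-pyo3` manifest path; consult the repository's Python installation docs for the exact command.

```bash
# Assumed maturin-based source build; the manifest path, the `cuda` feature,
# and the flag names are assumptions based on the repository layout and the table above.
pip install maturin
maturin develop --release -m mistralrs-pyo3/Cargo.toml --features "cuda flash-attn"
```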