diffusion_rs_common::nn

Module attention


Functions

  • Computes softmax(QK^T / sqrt(d_k) + M)V. M is the attention mask, applied as an additive bias before the softmax (0 for unmasked positions, -inf for masked positions), so masked positions receive zero attention weight.
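
The sketch below only illustrates the formula above for a single attention head, using plain slices rather than the crate's tensor types; it is not the crate's implementation, and the function name and shapes here are assumptions chosen for the example.

```rust
/// Illustrative single-head scaled dot-product attention:
/// softmax(QK^T / sqrt(d_k) + M) V, with M an additive bias
/// (0.0 for unmasked, f32::NEG_INFINITY for masked positions).
fn scaled_dot_product_attention(
    q: &[Vec<f32>],    // queries, shape (seq_q, d_k)
    k: &[Vec<f32>],    // keys,    shape (seq_k, d_k)
    v: &[Vec<f32>],    // values,  shape (seq_k, d_v)
    mask: &[Vec<f32>], // bias M,  shape (seq_q, seq_k)
) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .zip(mask)
        .map(|(q_row, m_row)| {
            // Scaled scores with the mask added *before* the softmax.
            let scores: Vec<f32> = k
                .iter()
                .zip(m_row)
                .map(|(k_row, &m)| {
                    let dot: f32 = q_row.iter().zip(k_row).map(|(a, b)| a * b).sum();
                    dot * scale + m
                })
                .collect();
            // Numerically stable softmax over the key dimension.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            // Weighted sum of the value rows.
            let mut out = vec![0.0f32; v[0].len()];
            for (w, v_row) in exps.iter().zip(v) {
                for (o, &x) in out.iter_mut().zip(v_row) {
                    *o += (w / sum) * x;
                }
            }
            out
        })
        .collect()
}

fn main() {
    // Two query positions, two key/value positions; the second key is
    // masked out for the first query only.
    let q = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let k = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let v = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let mask = vec![vec![0.0, f32::NEG_INFINITY], vec![0.0, 0.0]];
    println!("{:?}", scaled_dot_product_attention(&q, &k, &v, &mask));
}
```

With the mask applied inside the softmax, the first query attends only to the first value row; adding M after the softmax instead would propagate -inf into the output, which is why the bias must be added to the scores.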