Module layers


Structs

ColumnParallelLayer
This layer has a weight that is parallelized along the output dimension: each rank holds a slice of the output features and takes the “full” input dimension.
PackedExperts
ReplicatedLayer
This layer has no parallelization; the weight is replicated in full.
RowParallelLayer
This layer has a weight that is parallelized along the input dimension: each rank holds a slice of the input features and returns the “full” output dimension.
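
To make the two sharding directions concrete, here is a minimal sketch on plain `Vec<Vec<f32>>` matrices. It does not use this module's types: the helper names (`matvec`, `column_shard_sketch`, `row_shard_sketch`), the `[out][in]` weight layout, and the explicit `rank` / `world_size` arguments are all assumptions made for illustration, and dimensions are assumed to divide evenly by `world_size`.

```rust
/// Dense matrix-vector product; `w` is laid out as [out][in], `x` as [in].
fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Hypothetical column-parallel shard: this rank keeps a contiguous block of
/// output rows and every input column, so it consumes the full input.
fn column_shard_sketch(w: &[Vec<f32>], rank: usize, world_size: usize) -> Vec<Vec<f32>> {
    let chunk = w.len() / world_size; // assumes out % world_size == 0
    w[rank * chunk..(rank + 1) * chunk].to_vec()
}

/// Hypothetical row-parallel shard: this rank keeps every output row but only
/// a contiguous block of input columns, so it consumes a slice of the input.
fn row_shard_sketch(w: &[Vec<f32>], rank: usize, world_size: usize) -> Vec<Vec<f32>> {
    let chunk = w[0].len() / world_size; // assumes in % world_size == 0
    w.iter()
        .map(|row| row[rank * chunk..(rank + 1) * chunk].to_vec())
        .collect()
}
```

In this sketch, concatenating the per-rank outputs of the column-parallel shards reproduces the unsharded result, while the per-rank outputs of the row-parallel shards are partial sums that have to be added element-wise across ranks to recover the full output. How the real layers perform that reduction is not stated on this page.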

Functions

compute_kv_shard
Compute the appropriate KV shard, handling KV head replication. Always use it in tandem with compute_n_kv_groups.
compute_n_kv_groups
Compute the number of KV groups, taking into account KV head replication.
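
The arithmetic behind KV head replication can be sketched as follows. This is not the signature or implementation of `compute_kv_shard` or `compute_n_kv_groups`: the function names, the `(offset, len)` return value, and the plain `rank` / `world_size` integers are hypothetical, and the sketch assumes that head counts and `world_size` divide each other evenly. The idea illustrated is that when there are more ranks than KV heads, several ranks load the same KV head, and the number of query heads per KV head on each rank shrinks accordingly.

```rust
/// Hypothetical helper: how many ranks share each KV head.
/// Returns 1 when there are at least as many KV heads as ranks.
fn kv_replication_sketch(total_kv_heads: usize, world_size: usize) -> usize {
    if world_size > total_kv_heads {
        world_size / total_kv_heads
    } else {
        1
    }
}

/// Hypothetical: query heads per KV head as seen by a single rank.
fn n_kv_groups_sketch(
    total_kv_heads: usize,
    num_attention_heads: usize,
    world_size: usize,
) -> usize {
    (num_attention_heads / total_kv_heads) / kv_replication_sketch(total_kv_heads, world_size)
}

/// Hypothetical: (row offset, row count) of the KV projection weight that
/// `rank` would load, measured along the output dimension.
fn kv_shard_sketch(
    total_kv_heads: usize,
    head_dim: usize,
    rank: usize,
    world_size: usize,
) -> (usize, usize) {
    let replicate = kv_replication_sketch(total_kv_heads, world_size);
    if replicate > 1 {
        // More ranks than KV heads: `replicate` consecutive ranks share one head.
        let head = rank / replicate;
        (head * head_dim, head_dim)
    } else {
        // Enough KV heads to split: each rank takes a contiguous block of heads.
        let local_heads = total_kv_heads / world_size;
        (rank * local_heads * head_dim, local_heads * head_dim)
    }
}
```

For example, with 32 attention heads, 2 KV heads, and 4 ranks, each KV head is shared by 2 ranks; each rank then sees 8 query heads against 1 KV head, so the sketch yields 8 groups per rank rather than the unsharded 16. This is why the shard and the group count need to be computed consistently.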