Module layers
- ColumnParallelLayer
- This layer has a weight that is parallelized along the output dimension,
taking the “full” input dimension.
- PackedExperts
- ReplicatedLayer
- This layer has no parallelization.
- RowParallelLayer
- This layer has a weight that is parallelized along the input dimension,
returning the “full” output dimension.
- compute_kv_shard
- Compute the appropriate KV shard. This handles KV head replication. Be sure to use compute_n_kv_groups in tandem.
- compute_n_kv_groups
- Compute the number of KV groups, taking into account KV head replication.
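
The sharding ideas behind the items above can be sketched in plain Rust. This is an illustrative sketch only, not this module's implementation: `ShardRange`, `even_shard`, `column_parallel_forward`, `row_parallel_forward`, `KvLayout`, and `kv_layout` are hypothetical names, the weight is assumed to be stored as `[out][in]`, and every dimension is assumed to divide evenly across ranks. The KV part assumes one common replication scheme, in which each KV head is copied to several ranks when there are more ranks than KV heads.

```rust
/// Contiguous range of indices along one dimension owned by a rank.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ShardRange {
    pub start: usize,
    pub len: usize,
}

/// Evenly split `dim` across `world_size` ranks (assumes divisibility).
pub fn even_shard(dim: usize, rank: usize, world_size: usize) -> ShardRange {
    assert!(rank < world_size, "rank must be < world_size");
    assert_eq!(dim % world_size, 0, "dim must divide evenly across ranks");
    let len = dim / world_size;
    ShardRange { start: rank * len, len }
}

/// Column-parallel: the rank owns a slice of the OUTPUT rows of `w` and sees
/// the full input. It returns only its slice of the output; concatenating the
/// per-rank results gives the full output.
pub fn column_parallel_forward(w: &[Vec<f32>], x: &[f32], rank: usize, world_size: usize) -> Vec<f32> {
    let s = even_shard(w.len(), rank, world_size);
    w[s.start..s.start + s.len]
        .iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Row-parallel: the rank owns a slice of the INPUT columns of `w` and the
/// matching slice of `x`. It returns a full-length partial output; summing
/// across ranks (an all-reduce in a distributed run) gives the full output.
pub fn row_parallel_forward(w: &[Vec<f32>], x: &[f32], rank: usize, world_size: usize) -> Vec<f32> {
    let s = even_shard(x.len(), rank, world_size);
    let cols = s.start..s.start + s.len;
    w.iter()
        .map(|row| {
            row[cols.clone()]
                .iter()
                .zip(&x[cols.clone()])
                .map(|(a, b)| a * b)
                .sum::<f32>()
        })
        .collect()
}

/// Per-rank KV layout (hypothetical struct for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct KvLayout {
    pub first_kv_head: usize,  // index of the first KV head held by this rank
    pub local_kv_heads: usize, // number of KV heads held by this rank
    pub n_kv_groups: usize,    // local query heads per local KV head
}

/// Assumed replication rule: with more ranks than KV heads, each KV head is
/// shared by `world_size / total_kv_heads` consecutive ranks.
pub fn kv_layout(total_kv_heads: usize, num_attention_heads: usize, rank: usize, world_size: usize) -> KvLayout {
    assert!(rank < world_size, "rank must be < world_size");
    let (first_kv_head, local_kv_heads) = if world_size <= total_kv_heads {
        let local = total_kv_heads / world_size;
        (rank * local, local)
    } else {
        let replicate = world_size / total_kv_heads;
        (rank / replicate, 1)
    };
    let local_q_heads = num_attention_heads / world_size;
    KvLayout { first_kv_head, local_kv_heads, n_kv_groups: local_q_heads / local_kv_heads }
}
```

Under these assumptions, `kv_layout(2, 32, 5, 8)` places rank 5 on KV head 1 with 4 query heads per KV head. The shard and the group count come out of the same calculation here, which suggests why the two functions are meant to be used together: the group count has to match however many KV heads the rank ended up holding.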