Module mistralrs_quant::distributed::layers

Structs

ColumnParallelLayer: A layer whose weight is sharded along the output dimension. Each rank takes the “full” input dimension and produces its slice of the output.
FusedExperts
PackedExperts
ReplicatedLayer: A layer with no parallelization; the full weight is used as-is.
RowParallelLayer: A layer whose weight is sharded along the input dimension. Each rank works on its slice of the input, and the layer returns the “full” output dimension.
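The difference between the replicated, column-parallel, and row-parallel layouts can be illustrated with a minimal single-process sketch. It does not use the `mistralrs_quant` API: `matvec`, `column_parallel`, and `row_parallel` are hypothetical helpers, weights are plain `Vec<Vec<f32>>` in `[out_dim][in_dim]` layout, the cross-rank collectives are simulated by concatenation (all-gather) and summation (all-reduce), and the dimensions are assumed to divide evenly by `world_size`.

```rust
// Hypothetical sketch, not the mistralrs_quant API: simulates tensor
// parallelism across `world_size` ranks inside one process.
// Weights are [out_dim][in_dim]; the layer computes y = W x.

fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Column-parallel: each rank owns a contiguous block of output rows and sees
/// the full input. Concatenating the per-rank outputs (an all-gather)
/// reproduces the full output. Assumes out_dim % world_size == 0.
fn column_parallel(w: &[Vec<f32>], x: &[f32], world_size: usize) -> Vec<f32> {
    let rows_per_rank = w.len() / world_size;
    let mut out = Vec::with_capacity(w.len());
    for rank in 0..world_size {
        let shard = &w[rank * rows_per_rank..(rank + 1) * rows_per_rank];
        out.extend(matvec(shard, x)); // this rank's slice of the output
    }
    out
}

/// Row-parallel: each rank owns a contiguous block of input columns plus the
/// matching slice of the input. Every rank produces a full-size partial
/// output; summing them (an all-reduce) reproduces the full output.
/// Assumes in_dim % world_size == 0.
fn row_parallel(w: &[Vec<f32>], x: &[f32], world_size: usize) -> Vec<f32> {
    let cols_per_rank = x.len() / world_size;
    let mut out = vec![0.0f32; w.len()];
    for rank in 0..world_size {
        let (start, end) = (rank * cols_per_rank, (rank + 1) * cols_per_rank);
        let shard: Vec<Vec<f32>> = w.iter().map(|row| row[start..end].to_vec()).collect();
        let partial = matvec(&shard, &x[start..end]);
        for (o, p) in out.iter_mut().zip(partial) {
            *o += p; // stand-in for the all-reduce sum
        }
    }
    out
}

fn main() {
    // 4x4 weight with W[i][j] = 4i + j, a 4-element input, and 2 ranks.
    let w: Vec<Vec<f32>> = (0..4)
        .map(|i| (0..4).map(|j| (i * 4 + j) as f32).collect())
        .collect();
    let x = vec![1.0f32, 2.0, 3.0, 4.0];

    // Replicated: every rank holds the whole weight and computes this directly.
    let replicated = matvec(&w, &x);
    assert_eq!(column_parallel(&w, &x, 2), replicated);
    assert_eq!(row_parallel(&w, &x, 2), replicated);
    println!("{:?}", replicated); // prints [20.0, 60.0, 100.0, 140.0]
}
```

A column-parallel layer followed by a row-parallel layer is a common pairing: the sharded intermediate activations stay local to each rank, so only the row-parallel sum needs a collective.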

Functions

compute_kv_shard: Compute the appropriate KV shard, handling KV head replication. Use it together with compute_n_kv_groups.
compute_n_kv_groups: Compute the number of KV groups, taking into account KV head replication.
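Why the KV shard and the KV group count must agree can be shown with a small sketch of grouped-query attention under tensor parallelism. It is an assumption-labeled illustration, not the crate's implementation: `kv_heads_for_rank` and `n_kv_groups` are hypothetical stand-ins for `compute_kv_shard` and `compute_n_kv_groups`, query heads are assumed to be split evenly across ranks, and when there are more ranks than KV heads each KV head is assumed to be replicated on `world_size / total_kv_heads` consecutive ranks (divisibility assumed throughout).

```rust
use std::ops::Range;

/// Hypothetical helper: which KV heads a rank holds. Assumes the counts divide evenly.
fn kv_heads_for_rank(total_kv_heads: usize, world_size: usize, rank: usize) -> Range<usize> {
    assert!(rank < world_size);
    if world_size <= total_kv_heads {
        // Enough KV heads to split them evenly across ranks.
        let per_rank = total_kv_heads / world_size;
        rank * per_rank..(rank + 1) * per_rank
    } else {
        // More ranks than KV heads: replicate each head on `replicate` consecutive ranks.
        let replicate = world_size / total_kv_heads;
        let head = rank / replicate;
        head..head + 1
    }
}

/// Hypothetical helper: number of local query heads that share each local KV head.
fn n_kv_groups(num_attention_heads: usize, total_kv_heads: usize, world_size: usize) -> usize {
    let q_per_rank = num_attention_heads / world_size;
    let kv_per_rank = (total_kv_heads / world_size).max(1); // at least one (replicated) head
    q_per_rank / kv_per_rank
}

fn main() {
    // 32 query heads and 8 KV heads: 4 query heads share each KV head.
    let (n_q, n_kv) = (32, 8);
    for world_size in [4, 16] {
        let groups = n_kv_groups(n_q, n_kv, world_size);
        let q_per_rank = n_q / world_size;
        for rank in 0..world_size {
            let kv = kv_heads_for_rank(n_kv, world_size, rank);
            // Every query head on this rank must find its KV head locally.
            for q in rank * q_per_rank..(rank + 1) * q_per_rank {
                let needed_kv = q / (n_q / n_kv);
                assert!(kv.contains(&needed_kv), "rank {rank} is missing KV head {needed_kv}");
            }
            assert_eq!(q_per_rank, groups * kv.len());
        }
        println!("world_size={world_size}: {groups} query heads per local KV head");
    }
}
```

With 16 ranks and only 8 KV heads, each rank holds a single replicated KV head and the local group count drops from 4 to 2, which is why the shard and the group count need to be computed consistently.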