Configure model topology
Topology is a per-layer placement and quantization mechanism. A YAML file specifies, per layer range, the device and quantization to use.
Most cases do not need a topology file. Defaults work for typical hardware, and mistralrs tune covers common optimizations.
Config
A YAML file keyed by start-end layer-range selectors:
    0-16:
      device: cuda[0]
      isq: q4k
    16-32:
      device: cuda[1]
      isq: q4k
    32-40:
      device: cpu
      isq: q8_0

Layers outside any range use defaults. device is a CUDA (cuda[N]), Metal (metal[N]), or CPU (cpu) specifier. isq accepts any ISQ type name recognized by --isq.
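Before handing a topology file to the server, it can help to check the selectors for overlaps and to see which layers fall back to defaults. The following is a minimal sketch, not part of mistralrs: the helper names parse_selector and check_topology are hypothetical, the input is a plain mapping as a YAML loader would produce, and the 48-layer count is made up for illustration. It also assumes ranges are half-open (start inclusive, end exclusive), which is inferred from the example where 0-16 is followed by 16-32.

```python
import re

# Selector pattern "start-end", as used by the keys in the example file.
_RANGE = re.compile(r"^(\d+)-(\d+)$")


def parse_selector(key):
    """Parse a 'start-end' selector into a (start, end) pair of integers."""
    match = _RANGE.match(str(key).strip())
    if match is None:
        raise ValueError(f"not a start-end selector: {key!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start >= end:
        raise ValueError(f"empty or reversed range: {key!r}")
    return start, end


def check_topology(topology, num_layers):
    """Return (overlaps, uncovered) for a mapping of selector -> settings.

    ASSUMPTION: ranges are half-open, [start, end), inferred from the
    example where 0-16 is followed directly by 16-32.
    """
    ranges = sorted(parse_selector(key) for key in topology)

    # Ranges are sorted by start, so a later range overlaps an earlier one
    # exactly when it starts before the earlier one ends.
    overlaps = []
    for i, (s1, e1) in enumerate(ranges):
        for s2, e2 in ranges[i + 1:]:
            if s2 < e1:
                overlaps.append(((s1, e1), (s2, e2)))

    # Layers not matched by any range would use the defaults.
    covered = set()
    for start, end in ranges:
        covered.update(range(start, min(end, num_layers)))
    uncovered = [i for i in range(num_layers) if i not in covered]
    return overlaps, uncovered


# The example file from above, as a loader would return it.
example = {
    "0-16": {"device": "cuda[0]", "isq": "q4k"},
    "16-32": {"device": "cuda[1]", "isq": "q4k"},
    "32-40": {"device": "cpu", "isq": "q8_0"},
}

# Hypothetical 48-layer model: layers 40-47 are left to the defaults.
overlaps, uncovered = check_topology(example, num_layers=48)
print(overlaps)   # []
print(uncovered)  # [40, 41, 42, 43, 44, 45, 46, 47]
```

If the half-open assumption does not hold for your version, adjacent selectors such as 0-16 and 16-32 would share layer 16, and the overlap test would need to use `s2 <= e1` instead.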
Pass with --topology:
    mistralrs serve --topology topology.yaml -m <model>

Embedding layers, the LM head, and pre/post-norm are not individually addressable; they follow the first or last transformer layer’s placement.
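When the split is regular, the topology file can be generated rather than written by hand. The sketch below spreads a layer count evenly over a list of devices and renders the result in the same layout as the example file. The helpers even_split and to_yaml are hypothetical names, not mistralrs APIs, and the layer count of 40 is only an illustration. It follows the example's convention that consecutive ranges share a boundary (0-16, then 16-32), which assumes the end of a range is exclusive.

```python
def even_split(num_layers, devices, isq):
    """Build (selector, device, isq) entries spreading layers over devices.

    ASSUMPTION: consecutive ranges share a boundary, as in the example
    file (0-16 then 16-32), i.e. the end of each range is exclusive.
    """
    per_device, extra = divmod(num_layers, len(devices))
    entries, start = [], 0
    for i, device in enumerate(devices):
        # The first `extra` devices take one additional layer each.
        end = start + per_device + (1 if i < extra else 0)
        if end > start:
            entries.append((f"{start}-{end}", device, isq))
        start = end
    return entries


def to_yaml(entries):
    """Render entries in the same layout as the example topology file."""
    lines = []
    for selector, device, isq in entries:
        lines += [f"{selector}:", f"  device: {device}", f"  isq: {isq}"]
    return "\n".join(lines) + "\n"


# Hypothetical 40-layer model split over two CUDA devices.
print(to_yaml(even_split(40, ["cuda[0]", "cuda[1]"], "q4k")))
```

Writing the returned string to topology.yaml gives a file that can be passed with --topology as shown above.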
For an introduction to per-layer quantization tradeoffs, see the explanation page on quantization tradeoffs.