Run across multiple machines

When a model exceeds one machine’s GPU memory, mistral.rs can split it across multiple hosts via a ring backend.

The ring feature must be compiled in:

cargo install --path mistralrs-cli --features "cuda flash-attn ring"

The ring backend reads its configuration from a JSON file pointed to by the RING_CONFIG environment variable. Each participant has its own RING_CONFIG with rank-specific values.

Config shape:

{
  "master_ip": "10.0.0.1",
  "master_port": 9000,
  "port": 9001,
  "right_port": 9002,
  "right_ip": "10.0.0.2",
  "rank": 0,
  "world_size": 3
}

Every non-master rank (rank != 0) must specify master_ip, the address at which the master rank (rank = 0) is reachable.
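As an illustration of per-rank configs, the sketch below writes one JSON file per rank for a three-host ring. It rests on assumptions the text above does not state: that the hosts are 10.0.0.1 through 10.0.0.3, that rank r listens on port 9001 + r, and that each rank's right neighbor is rank (r + 1) mod world_size. The output directory and file names are hypothetical. Under those assumptions the rank 0 file matches the config shown above.

```shell
# Sketch: generate one ring config per rank (POSIX sh).
# Assumed layout: rank r runs on 10.0.0.(r+1), listens on 9001+r, and its
# right neighbor is rank (r+1) % world_size. Adjust to your own hosts.
world_size=3
out_dir="${TMPDIR:-/tmp}/ring-configs"   # hypothetical output directory
mkdir -p "$out_dir"

rank=0
while [ "$rank" -lt "$world_size" ]; do
  right=$(( (rank + 1) % world_size ))   # wraps so the last rank points at rank 0
  cat > "$out_dir/rank${rank}.json" <<EOF
{
  "master_ip": "10.0.0.1",
  "master_port": 9000,
  "port": $(( 9001 + rank )),
  "right_port": $(( 9001 + right )),
  "right_ip": "10.0.0.$(( right + 1 ))",
  "rank": $rank,
  "world_size": $world_size
}
EOF
  rank=$(( rank + 1 ))
done

# On each host, point RING_CONFIG at that host's own file, e.g. on rank 0:
export RING_CONFIG="$out_dir/rank0.json"
```

Only the file for the local rank needs to exist on a given host; generating all of them in one place and copying each out is just a convenience.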

Multi-node coordination is controlled through environment variables, not CLI flags:

| Variable | Purpose |
| --- | --- |
| RING_CONFIG | Path to the per-rank ring JSON config. |
| MISTRALRS_MN_GLOBAL_WORLD_SIZE | Total world size across nodes. |
| MISTRALRS_MN_LOCAL_WORLD_SIZE | Local TP size override on the node. |
| MISTRALRS_MN_HEAD_NUM_WORKERS | Number of worker nodes (set on head). |
| MISTRALRS_MN_HEAD_PORT | Head node port. |
| MISTRALRS_MN_WORKER_SERVER_ADDR | Head node address (set on workers). |
| MISTRALRS_MN_WORKER_ID | Worker node id. |
| MISTRALRS_NO_NCCL=1 | Disable NCCL fallback. |
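A rough sketch of how these variables might be set before launching on a head node versus a worker. Every value is a placeholder. The ROLE and WORKER_ID switches and the config path are hypothetical, and the host:port form of the worker address and the zero-based worker id are assumptions, since the table only describes each variable's purpose. The launch command itself is omitted.

```shell
# Sketch: export multi-node variables for one node (POSIX sh).
ROLE="${ROLE:-head}"   # hypothetical switch: "head" or "worker"

# Hypothetical config path; use this node's own per-rank file.
export RING_CONFIG="${RING_CONFIG:-/etc/mistralrs/ring.json}"
export MISTRALRS_MN_GLOBAL_WORLD_SIZE=3   # placeholder total across nodes

if [ "$ROLE" = "head" ]; then
  export MISTRALRS_MN_HEAD_NUM_WORKERS=2  # placeholder: two worker nodes
  export MISTRALRS_MN_HEAD_PORT=9100      # placeholder port
else
  # Assumed host:port form; the table only says "head node address".
  export MISTRALRS_MN_WORKER_SERVER_ADDR="10.0.0.1:9100"
  export MISTRALRS_MN_WORKER_ID="${WORKER_ID:-0}"  # assumed zero-based id
fi
```

Source the script rather than executing it (for example `. ./mn-env.sh`, a hypothetical file name) so that the exports persist in the shell that starts mistral.rs.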

For the full list, see the environment variables reference.

The ring backend is Linux-only. For single-machine multi-GPU, prefer NCCL-based tensor parallelism.