Environment variables

User-facing environment variables read by mistralrs or its build scripts. Standard Cargo build variables such as OUT_DIR and TARGET are omitted.

| Variable | Purpose |
| --- | --- |
| HF_HOME | Root of the Hugging Face cache. Default ~/.cache/huggingface. |
| HF_HUB_CACHE | Hugging Face hub cache location. Default $HF_HOME/hub. |
| HF_TOKEN | Auth token. Overrides any token saved by mistralrs login at $HF_HOME/token. |
| HF_HUB_TOKEN | Auth token fallback when HF_TOKEN is not set. |
| HF_HUB_OFFLINE | HF_HUB_OFFLINE=1 (or true/yes/on) disables all network calls to the Hugging Face Hub. Files and repo listings are served only from $HF_HUB_CACHE (or $HF_HOME/hub when it is unset); missing files fail fast with a clear error. The mistralrs doctor connectivity check is also skipped. |

If --token-source env:NAME is used, mistral.rs reads the environment variable named by NAME as the token source.
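The token lookup described above can be sketched in a few lines. This is an illustration only, not the mistral.rs implementation: the function names are hypothetical, and the position of the saved token after HF_HUB_TOKEN is an assumption, since this page only states that HF_TOKEN overrides the saved token and that HF_HUB_TOKEN is the fallback for HF_TOKEN.

```python
import os
from typing import Mapping, Optional


def token_from_source(source: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve a --token-source value of the form env:NAME (hypothetical helper)."""
    prefix = "env:"
    if not source.startswith(prefix):
        # Other source forms are not described on this page, so they are not handled here.
        raise ValueError(f"unsupported token source in this sketch: {source!r}")
    name = source[len(prefix):]
    return env.get(name)


def default_token(env: Mapping[str, str] = os.environ,
                  saved_token: Optional[str] = None) -> Optional[str]:
    """Pick a token when no explicit source is given.

    Assumed order: HF_TOKEN, then HF_HUB_TOKEN, then the token saved at
    $HF_HOME/token (passed in here as saved_token).
    """
    return env.get("HF_TOKEN") or env.get("HF_HUB_TOKEN") or saved_token
```

Passing the environment as a mapping keeps the helpers easy to exercise without touching the real process environment.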

Set HF_HUB_OFFLINE=1 to guarantee no network calls are made to the Hugging Face Hub. mistral.rs will only resolve files from the local cache ($HF_HUB_CACHE, falling back to $HF_HOME/hub, falling back to ~/.cache/huggingface/hub). Pre-download the model on a machine with network access (e.g. with huggingface-cli download <repo> or by running mistral.rs once online), then launch with HF_HUB_OFFLINE=1. A local model path (-m /path/to/dir) always reads from disk and never hits the network, so it works in offline mode without any cache lookup.
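Before launching offline, it can help to check which cache directory will be consulted. The following is a minimal sketch of the lookup order described above, not the code mistral.rs runs; the function names are hypothetical, and treating the truthy values case-insensitively is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_hub_cache(env: Mapping[str, str] = os.environ) -> Path:
    """Return the hub cache directory using the documented fallback chain."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def offline_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True when HF_HUB_OFFLINE holds one of the documented truthy values.

    Assumption: matching ignores case and surrounding whitespace.
    """
    value = env.get("HF_HUB_OFFLINE", "").strip().lower()
    return value in {"1", "true", "yes", "on"}
```

A launcher script could call resolve_hub_cache() and confirm the directory exists before setting HF_HUB_OFFLINE=1, so that a missing pre-download is caught early.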

| Variable | Purpose |
| --- | --- |
| RUST_LOG | Override the tracing log filter. Examples: mistralrs_core=debug,tower_http=info or trace. CLI users can usually use -v or -vv instead. |
| MISTRALRS_DEBUG | MISTRALRS_DEBUG=1 enables extra debug-level engine tracing. |

| Variable | Purpose |
| --- | --- |
| MISTRALRS_NO_MMAP | MISTRALRS_NO_MMAP=1 loads safetensors without mmap. |
| MISTRALRS_ISQ_SINGLETHREAD | If set, runs ISQ quantization single-threaded. |

| Variable | Purpose |
| --- | --- |
| MISTRALRS_SANDBOX | auto, on, or off. Takes effect only when the sandbox mode resolved from CLI/TOML is auto; an explicit on or off in CLI/TOML wins. See sandbox reference. |

| Variable | Purpose |
| --- | --- |
| MCP_CONFIG_PATH | MCP client configuration path used when --mcp-config is not passed. |
| KEEP_ALIVE_INTERVAL | SSE keep-alive interval in milliseconds. Falls back to the default if missing or invalid. |
| XDG_CACHE_HOME | Base cache directory for web UI state. The UI uses $XDG_CACHE_HOME/mistralrs. |
| HOME | Fallback for the web UI cache path when XDG_CACHE_HOME is not set. |

| Variable | Purpose |
| --- | --- |
| MISTRALRS_NO_MLA | MISTRALRS_NO_MLA=1 disables the MLA-specific attention path for DeepSeek V2/V3. Generic attention is used instead. |

| Variable | Purpose |
| --- | --- |
| MISTRALRS_NO_NCCL | MISTRALRS_NO_NCCL=1 disables NCCL. Falls back to the ring backend. |
| MISTRALRS_MN_GLOBAL_WORLD_SIZE | Total world size across nodes. Presence of this variable enables multi-node mode. |
| MISTRALRS_MN_LOCAL_WORLD_SIZE | Local tensor-parallel (TP) size override on a single node. |
| MISTRALRS_MN_HEAD_NUM_WORKERS | Set on the head node: number of worker nodes. |
| MISTRALRS_MN_HEAD_PORT | Set on the head node: listening port for worker connections. |
| MISTRALRS_MN_WORKER_SERVER_ADDR | Set on worker nodes: address of the head node. |
| MISTRALRS_MN_WORKER_ID | Set on worker nodes: worker index (0-based). |
| RING_CONFIG | Path to the ring backend JSON config. Presence of this variable enables the ring backend when built with the ring feature. |

See the multi-machine ring guide for use.
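For the MISTRALRS_MN_* variables listed above, the split between head-node and worker-node settings can be sketched as two small helpers that build the environment for each role. The helper names and the validation are hypothetical, and the head address is passed through unchanged because its exact format is not specified on this page.

```python
from typing import Dict


def head_env(global_world_size: int, num_workers: int, port: int) -> Dict[str, str]:
    """Variables set on the head node (hypothetical helper)."""
    return {
        "MISTRALRS_MN_GLOBAL_WORLD_SIZE": str(global_world_size),
        "MISTRALRS_MN_HEAD_NUM_WORKERS": str(num_workers),
        "MISTRALRS_MN_HEAD_PORT": str(port),
    }


def worker_env(global_world_size: int, head_addr: str, worker_id: int) -> Dict[str, str]:
    """Variables set on a worker node (hypothetical helper)."""
    if worker_id < 0:
        # Worker indices are documented as 0-based.
        raise ValueError("worker_id must be >= 0")
    return {
        "MISTRALRS_MN_GLOBAL_WORLD_SIZE": str(global_world_size),
        "MISTRALRS_MN_WORKER_SERVER_ADDR": head_addr,  # format is not specified here
        "MISTRALRS_MN_WORKER_ID": str(worker_id),
    }
```

Either dictionary can be merged over os.environ and handed to subprocess.run(..., env=...) when a script launches the process on each node.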

| Variable | Purpose |
| --- | --- |
| MISTRALRS_IGPU_MEMORY_FRACTION | Fraction of integrated GPU memory usable on CUDA systems with iGPUs. Default 0.75. |

These are read by build scripts, not at runtime.

| Variable | Purpose |
| --- | --- |
| MISTRALRS_METAL_PRECOMPILE | MISTRALRS_METAL_PRECOMPILE=0 skips Metal kernel precompilation at build time; kernels are compiled at runtime on first use. |
| CUDA_NVCC_FLAGS | Extra compiler options passed to CUDA builds. |
| MISTRALRS_GIT_REVISION | Git revision embedded in the binary by the build script. |

Not intended for direct use.

| Variable | Purpose |
| --- | --- |
| __MISTRALRS_DAEMON_INTERNAL | Set by the engine on spawned worker processes. |