Environment variables

User-facing environment variables read by mistralrs or its build scripts. Standard Cargo build variables such as OUT_DIR and TARGET are omitted.

| Variable | Purpose |
| --- | --- |
| `HF_HOME` | Root of the Hugging Face cache. Default `~/.cache/huggingface`. |
| `HF_HUB_CACHE` | Hugging Face hub cache location. |
| `HF_TOKEN` | Auth token. Overrides any token saved by `mistralrs login` at `$HF_HOME/token`. |
| `HF_HUB_TOKEN` | Auth token fallback when `HF_TOKEN` is not set. |
| `HF_HUB_OFFLINE` | Set to `1`, `true`, `yes`, or `on` to disable all Hugging Face Hub network calls. Files and listings are then served only from `$HF_HUB_CACHE` (or `$HF_HOME/hub`), and a missing file errors out. Also skips the `mistralrs doctor` connectivity check. |

If --token-source env:NAME is used, mistral.rs reads the environment variable named by NAME as the token source.
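The token precedence described above can be sketched in Python. This is an illustration only, not the mistral.rs implementation: both function names are hypothetical, treating an empty variable as unset is an assumption, and so is consulting the saved token file only after `HF_HUB_TOKEN` (the table states only that `HF_TOKEN` overrides the saved token).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first available token, or None (hypothetical helper)."""
    # HF_TOKEN wins; HF_HUB_TOKEN is the fallback when HF_TOKEN is not set.
    # Assumption: an empty value counts as "not set".
    for name in ("HF_TOKEN", "HF_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    # Assumed last resort: the token saved by `mistralrs login`.
    hf_home = Path(env.get("HF_HOME") or "~/.cache/huggingface").expanduser()
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


def token_from_env_source(
    spec: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Read the variable named by an `env:NAME` token-source spec."""
    kind, _, name = spec.partition(":")
    if kind != "env" or not name:
        raise ValueError(f"not an env:NAME spec: {spec!r}")
    return env.get(name)
```

A wrapper script could call `resolve_hf_token()` before launching to fail early when no token is available for a gated model.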

For the offline workflow (pre-downloading models, local paths), see run any model.
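As a rough sketch of the offline lookup, the following Python helpers show how a launcher script might decide whether offline mode is on and which directory files would be served from. The helper names are hypothetical, and two details are assumptions rather than documented behavior: values are matched case-insensitively, and `HF_HUB_CACHE` is preferred over `$HF_HOME/hub` when both are set.

```python
import os
from pathlib import Path
from typing import Mapping

# Values listed for HF_HUB_OFFLINE in the table above.
TRUTHY = {"1", "true", "yes", "on"}


def hub_offline(env: Mapping[str, str] = os.environ) -> bool:
    """True when HF_HUB_OFFLINE holds one of the accepted values."""
    # Assumption: matching ignores case and surrounding whitespace.
    return env.get("HF_HUB_OFFLINE", "").strip().lower() in TRUTHY


def hub_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Directory that cached hub files are expected to live in."""
    explicit = env.get("HF_HUB_CACHE")
    if explicit:
        # Assumption: an explicit hub cache path takes priority.
        return Path(explicit).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"
```

With these, a pre-flight check can confirm that `hub_cache_dir()` exists before starting with `HF_HUB_OFFLINE=1`, since a missing file is an error in offline mode.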

| Variable | Purpose |
| --- | --- |
| `RUST_LOG` | Override the tracing log filter. Examples: `mistralrs_core=debug,tower_http=info` or `trace`. CLI users can usually use `-v` or `-vv` instead. |
| `MISTRALRS_DEBUG` | `MISTRALRS_DEBUG=1` enables extra debug-level engine tracing. |

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_NO_MMAP` | `MISTRALRS_NO_MMAP=1` loads safetensors without mmap. |
| `MISTRALRS_ISQ_SINGLETHREAD` | If set, runs ISQ (in-situ quantization) single-threaded. |

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_SANDBOX` | `auto`, `on`, or `off`. Takes effect only when the sandbox mode resolved from the CLI/TOML is `auto`; an explicit `on` or `off` set there takes precedence. See sandbox reference. |
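
The precedence rule for `MISTRALRS_SANDBOX` can be read as the small decision function below. It is a sketch under stated assumptions, not the actual resolver: the function name is hypothetical, and ignoring an unrecognized variable value (rather than reporting an error) is an assumption.

```python
import os
from typing import Mapping

VALID_MODES = ("auto", "on", "off")


def effective_sandbox_mode(
    configured: str, env: Mapping[str, str] = os.environ
) -> str:
    """Combine the CLI/TOML mode with MISTRALRS_SANDBOX (hypothetical helper)."""
    if configured not in VALID_MODES:
        raise ValueError(f"unknown sandbox mode: {configured!r}")
    # An explicit on/off from the CLI or TOML always wins.
    if configured != "auto":
        return configured
    override = env.get("MISTRALRS_SANDBOX", "").strip().lower()
    # Assumption: unrecognized or missing values leave the mode at "auto".
    return override if override in VALID_MODES else "auto"
```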

| Variable | Purpose |
| --- | --- |
| `MCP_CONFIG_PATH` | MCP (Model Context Protocol) client configuration path used when `--mcp-config` is not passed. |
| `KEEP_ALIVE_INTERVAL` | SSE (Server-Sent Events) keep-alive interval in milliseconds. Falls back to the default if missing or invalid. |
| `XDG_CACHE_HOME` | Base cache directory for web UI state. The UI uses `$XDG_CACHE_HOME/mistralrs`. |
| `HOME` | Fallback for web UI cache path when `XDG_CACHE_HOME` is not set. |

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_CUDA_GRAPHS` | CUDA decode graph capture and replay is enabled by default for supported paged-attention decode steps. Set to `0`, `false`, `no`, or `off` to disable. See CUDA graphs. |
| `MISTRALRS_FLASHINFER_DECODE` | Set to `0`, `false`, `no`, or `off` to disable the FlashInfer (paged-attention kernel library) paged decode/cache layout and use the generic paged KV-cache layout instead. Defaults to enabled on CUDA when compatible. |
| `MISTRALRS_NO_MLA` | `MISTRALRS_NO_MLA=1` disables the MLA (Multi-head Latent Attention) path for DeepSeek V2/V3. Generic attention is used instead. |
| `MISTRALRS_MOE_BACKEND` | Forces the MoE (Mixture of Experts) expert backend: `cutile`, `cutlass`, `fused` (also `wmma`, `native`, `legacy`), or `fast`. Default is automatic selection. See MoE expert backends. |
| `CUTILE_TILEIRAS_PATH` | Path to a specific `tileiras` binary for the cuTile JIT instead of resolving it from `PATH`. |

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_NO_NCCL` | `MISTRALRS_NO_NCCL=1` disables NCCL at runtime; single-machine CUDA multi-GPU then falls back to layer mapping. When using the ring backend on a binary also built with `nccl`, set this so the ring backend is selected. |
| `MISTRALRS_MN_GLOBAL_WORLD_SIZE` | Total NCCL tensor-parallel world size across nodes. Presence of this variable enables multi-node NCCL mode. |
| `MISTRALRS_MN_LOCAL_WORLD_SIZE` | Local NCCL tensor-parallel size contributed by each node. |
| `MISTRALRS_MN_HEAD_NUM_WORKERS` | Set on the head node: number of worker nodes. |
| `MISTRALRS_MN_HEAD_PORT` | Set on the head node: listening port for worker connections. |
| `MISTRALRS_MN_WORKER_SERVER_ADDR` | Set on worker nodes: address of the head node. |
| `MISTRALRS_MN_WORKER_ID` | Set on worker nodes: worker index (0-based). |
| `RING_CONFIG` | Path to the ring backend JSON config. Setting it selects the ring backend when built with the `ring` feature. If the binary also has `nccl`, set `MISTRALRS_NO_NCCL=1` as well. |

See the distributed inference guide for usage details.
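
To make the head/worker split concrete, here is a minimal Python sketch that builds the environment for each role in a multi-node NCCL run. The helper names are hypothetical. The world sizes are passed in rather than derived, because how the two sizes relate to the node count is not spelled out here, and the head address is passed through verbatim because its exact format (for example, whether it includes the port) is likewise an assumption left to the caller.

```python
from typing import Dict


def _common_env(global_world_size: int, local_world_size: int) -> Dict[str, str]:
    """Variables set on every node (head and workers)."""
    if local_world_size < 1 or global_world_size < local_world_size:
        raise ValueError("world sizes must satisfy 1 <= local <= global")
    return {
        # Presence of the global size is what enables multi-node NCCL mode.
        "MISTRALRS_MN_GLOBAL_WORLD_SIZE": str(global_world_size),
        "MISTRALRS_MN_LOCAL_WORLD_SIZE": str(local_world_size),
    }


def head_node_env(
    global_world_size: int, local_world_size: int, num_workers: int, port: int
) -> Dict[str, str]:
    """Environment for the head node (hypothetical helper)."""
    env = _common_env(global_world_size, local_world_size)
    env["MISTRALRS_MN_HEAD_NUM_WORKERS"] = str(num_workers)
    env["MISTRALRS_MN_HEAD_PORT"] = str(port)
    return env


def worker_node_env(
    global_world_size: int, local_world_size: int, head_addr: str, worker_id: int
) -> Dict[str, str]:
    """Environment for one worker node; worker_id is 0-based."""
    if worker_id < 0:
        raise ValueError("worker_id is 0-based and cannot be negative")
    env = _common_env(global_world_size, local_world_size)
    # head_addr is passed through unchanged; its format is up to the caller.
    env["MISTRALRS_MN_WORKER_SERVER_ADDR"] = head_addr
    env["MISTRALRS_MN_WORKER_ID"] = str(worker_id)
    return env
```

Each dictionary can be merged into `os.environ` and handed to `subprocess.run(..., env=...)` on the corresponding machine. Keeping the two roles in separate functions makes it harder to set a head-only variable on a worker by accident.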

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_IGPU_MEMORY_FRACTION` | Fraction of integrated GPU memory usable on CUDA systems with iGPUs. Default `0.75`. |

The following variables are read by build scripts or by the installers, not by mistralrs at runtime.

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_METAL_PRECOMPILE` | `MISTRALRS_METAL_PRECOMPILE=0` skips Metal kernel precompilation at build time; kernels are compiled at runtime on first use. Also accepts `false`, `no`, and `off`. |
| `MISTRALRS_METAL_PLATFORMS` | Limits which Metal platform metallibs are precompiled. Accepts comma-separated `macos`, `ios`, `tvos`, or `all`; defaults to all platforms. For local macOS development, use `MISTRALRS_METAL_PLATFORMS=macos`. |
| `CUDA_NVCC_FLAGS` | Extra compiler options passed to CUDA builds. |
| `MISTRALRS_CUTLASS_COMMIT` | Overrides the CUTLASS git commit used by CUDA build scripts for flash-attention and CUTLASS MoE kernels. Defaults to the project-pinned commit. |
| `MISTRALRS_INSTALL_TAG` | Pins the installers to a specific release tag (e.g. `v0.8.23`): the prebuilt is downloaded from that release, and a source build checks out that git tag. Default is the latest stable release (prebuilt) or latest master (source). |
| `MISTRALRS_INSTALL_FROM_SOURCE` | `MISTRALRS_INSTALL_FROM_SOURCE=1` makes the shell and PowerShell installers skip the prebuilt download and build from the latest master (bleeding edge) instead of the latest stable release. |
| `MISTRALRS_INSTALL_NCCL` | `MISTRALRS_INSTALL_NCCL=1` forces the shell and PowerShell installers to add the `nccl` feature for CUDA builds even if NCCL is not detected. |
| `MISTRALRS_INSTALL_NO_NCCL` | `MISTRALRS_INSTALL_NO_NCCL=1` makes the shell and PowerShell installers skip the `nccl` feature. |
| `MISTRALRS_INSTALL_ALLOW_CUDA_MISMATCH` | `MISTRALRS_INSTALL_ALLOW_CUDA_MISMATCH=1` lets a source build continue when local `nvcc` is newer than the CUDA version reported by the NVIDIA driver. |
| `MISTRALRS_INSTALL_YES` | `MISTRALRS_INSTALL_YES=1` auto-confirms every installer prompt (non-interactive installs for CI/containers; used by `mistralrs update`). |
| `MISTRALRS_INSTALL_IGNORE_FFMPEG` | `MISTRALRS_INSTALL_IGNORE_FFMPEG=1` skips the installer's FFmpeg step, leaving any existing FFmpeg untouched. |
| `MISTRALRS_GIT_REVISION` | Git revision embedded in the binary by the build script. |
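
How `MISTRALRS_INSTALL_TAG` and `MISTRALRS_INSTALL_FROM_SOURCE` combine can be summarized as a small decision function. This Python sketch only restates the table rows; `install_plan` is a hypothetical name, it is not code from the installers, and it ignores anything else the installers may take into account (such as whether a prebuilt exists for the platform).

```python
import os
from typing import Mapping, Tuple


def install_plan(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (mode, ref) as described by the two installer variables."""
    # MISTRALRS_INSTALL_FROM_SOURCE=1 skips the prebuilt download.
    from_source = env.get("MISTRALRS_INSTALL_FROM_SOURCE") == "1"
    mode = "source" if from_source else "prebuilt"
    # A pinned tag applies to both modes: the release to download from,
    # or the git tag to check out for a source build.
    tag = env.get("MISTRALRS_INSTALL_TAG")
    if tag:
        return mode, tag
    return mode, "latest master" if from_source else "latest stable release"
```

For example, `install_plan({"MISTRALRS_INSTALL_TAG": "v0.8.23", "MISTRALRS_INSTALL_FROM_SOURCE": "1"})` yields a source build of the `v0.8.23` tag, which is the combination to use for a reproducible from-source install.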

The following variable is internal and not intended for direct use.

| Variable | Purpose |
| --- | --- |
| `__MISTRALRS_DAEMON_INTERNAL` | Set by the engine on spawned worker processes. |