Configuration Reference

This document covers environment variables and server configuration for mistral.rs.

Runtime Environment Variables

Variable | Description
--- | ---
MISTRALRS_DEBUG=1 | Enable debug mode: write tensor info files for GGUF/GGML models and increase logging verbosity
MISTRALRS_NO_MMAP=1 | Disable memory-mapped file loading, forcing all tensor data into memory
MISTRALRS_NO_MLA=1 | Disable the MLA (Multi-head Latent Attention) optimization for DeepSeek V2/V3 and GLM-4.7-Flash
MISTRALRS_ISQ_SINGLETHREAD=1 | Force ISQ (In-Situ Quantization) to run single-threaded
MCP_CONFIG_PATH | Fallback path for the MCP client configuration (used if --mcp-config is not provided)
KEEP_ALIVE_INTERVAL | SSE keep-alive interval in milliseconds (default: 10000)
HF_HUB_CACHE | Override the Hugging Face Hub cache directory
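
As an illustration of how these switches combine, the following Python sketch copies an existing environment and sets the documented runtime variables on top of it. The helper name `runtime_env` and its keyword arguments are hypothetical and not part of mistral.rs; the sketch assumes the boolean switches only need the documented `=1` form.

```python
import os
from typing import Mapping, Optional


def runtime_env(
    base: Mapping[str, str],
    *,
    debug: bool = False,
    no_mmap: bool = False,
    no_mla: bool = False,
    isq_single_thread: bool = False,
    keep_alive_ms: Optional[int] = None,
    hf_hub_cache: Optional[str] = None,
) -> dict:
    """Return a copy of `base` with mistral.rs runtime variables set.

    Hypothetical helper for illustration; not part of mistral.rs.
    """
    env = dict(base)  # never mutate the caller's mapping
    # Assumption: the switches are enabled with the documented "=1" form,
    # so they are only set when requested.
    flags = {
        "MISTRALRS_DEBUG": debug,
        "MISTRALRS_NO_MMAP": no_mmap,
        "MISTRALRS_NO_MLA": no_mla,
        "MISTRALRS_ISQ_SINGLETHREAD": isq_single_thread,
    }
    for name, enabled in flags.items():
        if enabled:
            env[name] = "1"
    if keep_alive_ms is not None:
        # Milliseconds; the server default is 10000 (10 seconds).
        env["KEEP_ALIVE_INTERVAL"] = str(keep_alive_ms)
    if hf_hub_cache is not None:
        env["HF_HUB_CACHE"] = hf_hub_cache
    return env


if __name__ == "__main__":
    env = runtime_env(os.environ, debug=True, keep_alive_ms=5000)
    print({k: v for k, v in env.items() if k.startswith(("MISTRALRS_", "KEEP_ALIVE"))})
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting `mistralrs serve`, which keeps the overrides scoped to that one process.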

Build-Time Environment Variables

Variable | Description
--- | ---
MISTRALRS_METAL_PRECOMPILE=0 | Skip Metal kernel precompilation (useful for CI)
NVCC_CCBIN | Set the host C/C++ compiler used by nvcc (equivalent to nvcc's -ccbin option)
CUDA_NVCC_FLAGS=-fPIE | Required on some Linux distributions
CUDA_COMPUTE_CAP | Override the CUDA compute capability, written without the dot (e.g., “86” for an RTX 3090, “80” for an A100)
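
CUDA reports compute capability in a dotted form (8.6 for an RTX 3090, 8.0 for an A100), while CUDA_COMPUTE_CAP takes the digits without the dot. The minimal Python sketch below performs that conversion; the function name `compute_cap_value` is hypothetical. On recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` should print the dotted value for each installed GPU.

```python
def compute_cap_value(capability: str) -> str:
    """Convert a dotted capability such as "8.6" into the "86" form.

    Hypothetical helper for illustration; not part of mistral.rs.
    """
    major, _, minor = capability.strip().partition(".")
    # A bare major version such as "9" is treated as "9.0".
    minor = minor or "0"
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"unrecognised compute capability: {capability!r}")
    return major + minor


if __name__ == "__main__":
    print(compute_cap_value("8.6"))  # RTX 3090 -> 86
```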

Server Defaults

When running the HTTP server with mistralrs serve, these defaults apply:

Setting | Default value
--- | ---
Server IP | 0.0.0.0 (all interfaces)
Max request body | 50 MB
Max running sequences | 16
Prefix cache count | 16
SSE keep-alive | 10 seconds
PagedAttention (CUDA) | Enabled
PagedAttention (Metal) | Disabled
PagedAttention GPU memory usage | 90% of free memory
PagedAttention block size | 32 tokens
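
To get a feel for what the PagedAttention defaults mean in tokens, the sketch below estimates how many 32-token blocks fit in 90% of free GPU memory. It assumes the usual KV-cache size per token (one key and one value vector per layer and KV head) and ignores any other memory the server reserves, so treat it as a rough approximation rather than mistral.rs's own accounting. The helper names and the model dimensions in the example are hypothetical.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def paged_attention_blocks(
    free_bytes: int, bytes_per_token: int, *, mem_percent: int = 90, block_size: int = 32
) -> int:
    # Integer arithmetic avoids floating-point rounding in the percentage budget.
    budget = free_bytes * mem_percent // 100
    return budget // (block_size * bytes_per_token)


if __name__ == "__main__":
    # Illustrative dimensions: 32 layers, 8 KV heads, head_dim 128, 2-byte cache entries.
    per_token = kv_bytes_per_token(32, 8, 128, 2)  # 131072 bytes (128 KiB)
    blocks = paged_attention_blocks(20 * 1024**3, per_token)  # 20 GiB free
    print(blocks, blocks * 32)  # 4608 blocks, 147456 tokens
```

Under these assumptions, if all 16 default running sequences were full at once, each would get roughly 147456 / 16 = 9216 tokens of cache.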

Multi-Node Distributed Configuration

For multi-node setups, configure the head node and workers using environment variables.

Head Node

Variable | Description
--- | ---
MISTRALRS_MN_GLOBAL_WORLD_SIZE | Total number of devices across all nodes
MISTRALRS_MN_HEAD_NUM_WORKERS | Number of worker nodes
MISTRALRS_MN_HEAD_PORT | Port for head node communication
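
The head-node values follow from the cluster layout: the global world size is the sum of devices on every node, and the worker count excludes the head. The sketch below assumes the head node also contributes its own GPUs and that the first list entry describes the head; the helper name and the port in the example are hypothetical.

```python
def head_env(gpus_per_node: list[int], port: int) -> dict[str, str]:
    """Build head-node variables; gpus_per_node[0] is assumed to be the head.

    Hypothetical helper for illustration; not part of mistral.rs.
    """
    if not gpus_per_node:
        raise ValueError("need at least the head node")
    return {
        # Total devices across all nodes, head included (assumption).
        "MISTRALRS_MN_GLOBAL_WORLD_SIZE": str(sum(gpus_per_node)),
        # Every node except the head is a worker.
        "MISTRALRS_MN_HEAD_NUM_WORKERS": str(len(gpus_per_node) - 1),
        "MISTRALRS_MN_HEAD_PORT": str(port),
    }


if __name__ == "__main__":
    # Three nodes with 4 GPUs each; port 8765 is an arbitrary example.
    print(head_env([4, 4, 4], 8765))
```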

Worker Nodes

Variable | Description
--- | ---
MISTRALRS_MN_WORKER_SERVER_ADDR | Address of the head server to connect to
MISTRALRS_MN_WORKER_ID | This worker’s ID
MISTRALRS_MN_LOCAL_WORLD_SIZE | Number of GPUs on this node
MISTRALRS_NO_NCCL=1 | Disable NCCL (use an alternative backend)
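
Each worker needs the head's address plus its own ID and local GPU count. The sketch below generates one environment per worker node. It assumes worker IDs are numbered from 0 and that the server address is written as host:port, neither of which this page specifies; the helper name, host name, and port are placeholders.

```python
def worker_envs(head_host: str, head_port: int, worker_gpus: list[int]) -> list[dict[str, str]]:
    """Build one environment per worker node.

    Hypothetical helper; assumes 0-based worker IDs and a "host:port" address.
    """
    addr = f"{head_host}:{head_port}"
    return [
        {
            "MISTRALRS_MN_WORKER_SERVER_ADDR": addr,
            "MISTRALRS_MN_WORKER_ID": str(worker_id),
            "MISTRALRS_MN_LOCAL_WORLD_SIZE": str(gpus),
        }
        for worker_id, gpus in enumerate(worker_gpus)
    ]


if __name__ == "__main__":
    for env in worker_envs("head.example.internal", 8765, [4, 4]):
        print(env)
```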

See Also