Server configuration

For the full TOML schema, see the CLI TOML config reference. For a prose walkthrough, see the HTTP server guide.

| CLI flag | TOML key | Default | Meaning |
| --- | --- | --- | --- |
| --host | server.host | 0.0.0.0 | Bind interface. |
| -p, --port | server.port | 1234 | TCP port. |

| CLI flag | TOML key | Default | Meaning |
| --- | --- | --- | --- |
| --no-ui | server.no_ui | false | Disable the built-in web UI (mounted at /ui by default). |

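
As a rough sketch, the bind and UI settings listed above might look like this in a TOML config. The `[server]` table name follows from the dotted keys in the tables; the specific values (loopback address, port 8080) are illustrative assumptions, and the value types are inferred from the defaults shown. Everything else a config file may need is omitted here; the CLI TOML config reference has the full schema.

```toml
# Fragment only: other parts of the config file are omitted.
[server]
host = "127.0.0.1"  # assumed example: bind to loopback instead of the 0.0.0.0 default
port = 8080         # assumed example: overrides the 1234 default
no_ui = true        # disables the built-in web UI
```

Going by the tables, the matching CLI flags would be `--host 127.0.0.1 -p 8080 --no-ui`.
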
| CLI flag | TOML key | Default | Meaning |
| --- | --- | --- | --- |
| --mcp-port | server.mcp_port | not set | Enable the MCP server on this port. |
| --mcp-config | server.mcp_config | not set | Path to MCP client config (outbound servers). |

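
A minimal sketch of the two MCP keys in TOML form, assuming they sit in the same `[server]` table as the other `server.*` keys. The port number and the file path are hypothetical placeholders, not defaults.

```toml
# Fragment only: merge these keys into the existing [server] table if one is already present.
[server]
mcp_port = 4321                 # assumed example port; setting it enables the MCP server
mcp_config = "mcp-config.json"  # hypothetical path to an MCP client config (outbound servers)
```

Both keys default to "not set", so leaving them out should keep the MCP server disabled and no outbound MCP client config loaded.
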
| CLI flag | TOML key | Default | Meaning |
| --- | --- | --- | --- |
| --agent (alias --agentic) | runtime.agent | false | Build a local agent: enables search and code execution with a per-session temp workdir. |
| --enable-search | runtime.enable_search | false | Enable web search tool. |
| --enable-code-execution | runtime.enable_code_execution | false | Enable Python code execution. |
| --max-tool-rounds | server.max_tool_rounds | 256 | Cap on agentic tool loop rounds. |
| --tool-dispatch-url | server.tool_dispatch_url | not set | External URL for tool execution. |
| --search-embedding-model | runtime.search_embedding_model | not set | Reranker for web search. Only embedding-gemma accepted. |
| --code-exec-python | runtime.code_exec_python | python on Windows, python3 elsewhere | Python interpreter for code execution. |
| --code-exec-workdir | runtime.code_exec_workdir | per-session temp dir | Code execution working directory. |
| --code-exec-timeout | runtime.code_exec_timeout | 30 | Code execution timeout (seconds). |
| --agent-permission | runtime.agent_permission | auto | auto, ask, or deny. Controls whether agent actions run automatically, require approval, or are denied. See agent permissions. --code-exec-permission and runtime.code_exec_permission are accepted as aliases. |

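
The agent-related keys are split between the `runtime` and `server` tables, which is easy to miss. The sketch below enables the two tools individually rather than through `runtime.agent`, and tightens a few limits. All values are illustrative assumptions, with types inferred from the defaults in the table.

```toml
# Fragment only: other parts of the config file are omitted.
[runtime]
enable_search = true          # web search tool
enable_code_execution = true  # Python code execution
code_exec_timeout = 60        # seconds; assumed example, the default is 30
agent_permission = "ask"      # one of "auto", "ask", "deny"

[server]
max_tool_rounds = 16          # assumed example, the default is 256
```

With `agent_permission = "ask"`, agent actions require approval before they run, which is a cautious starting point when code execution is enabled.
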
| CLI flag | TOML key | Default | Meaning |
| --- | --- | --- | --- |
| --paged-attn | paged_attn.mode | auto | auto, on, or off. |
| --pa-context-len | paged_attn.context_len | not set | KV cache context length. |
| --pa-memory-mb | paged_attn.memory_mb | not set | KV cache budget in MB. |
| --pa-memory-fraction | paged_attn.memory_fraction | not set | KV cache budget as a fraction of VRAM. |
| --pa-block-size | paged_attn.block_size | not set | Tokens per block. |
| --pa-cache-type | paged_attn.cache_type | auto | KV cache quantization type. |
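
A hedged sketch of a paged attention block in TOML. It assumes the fraction is written as a number between 0 and 1, which the table does not state outright, and it sets only one of the sizing keys because this page does not say how `context_len`, `memory_mb`, and `memory_fraction` interact when combined.

```toml
# Fragment only: other parts of the config file are omitted.
[paged_attn]
mode = "on"            # one of "auto", "on", "off"
memory_fraction = 0.8  # assumed example: KV cache budget as a fraction of VRAM
cache_type = "auto"    # the documented default, written out explicitly
```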

CORS allowed origins and the request body limit (default 50 MB) are configurable only programmatically through MistralRsServerRouterBuilder in mistralrs-server-core.

| Variable | Meaning |
| --- | --- |
| RUST_LOG | Override the tracing log filter. CLI users can usually use -v or -vv instead. |
| HF_HOME | Hugging Face cache root. |
| HF_TOKEN | Override cached auth token. |
| HF_HUB_OFFLINE | HF_HUB_OFFLINE=1 runs fully offline; only the local Hugging Face cache is consulted and no network calls are made. |
| MCP_CONFIG_PATH | Alternative to --mcp-config. |

Full list: environment variables.