
Connect to an MCP server

mistral.rs can act as a Model Context Protocol (MCP) client, connecting to one or more MCP servers at startup and merging their tools into the model's available set.

MCP tools automatically use strict tool calling when the MCP server provides an input schema.

The same connection setup is available from the CLI/server (via a config file), the Python SDK, and the Rust SDK.

mistralrs serve --mcp-config mcp.json -m <model>

The MCP_CONFIG_PATH environment variable is an alternative to the flag.

Minimal mcp.json:

{
  "servers": [
    {
      "name": "filesystem",
      "source": {
        "type": "Process",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
      }
    }
  ],
  "auto_register_tools": true,
  "tool_timeout_secs": 30,
  "max_concurrent_calls": 4
}

Full schema: MCP config schema reference.
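
If the config is generated rather than hand-written, a small script can emit the same structure. The following is a minimal sketch that reproduces the minimal mcp.json above using only the Python standard library; the helper names `build_mcp_config` and `write_mcp_config` are illustrative and not part of mistral.rs.

```python
import json
from pathlib import Path


def build_mcp_config(allowed_dir: str) -> dict:
    """Return a dict mirroring the minimal mcp.json shown above."""
    return {
        "servers": [
            {
                "name": "filesystem",
                "source": {
                    "type": "Process",
                    "command": "npx",
                    "args": ["-y", "@modelcontextprotocol/server-filesystem", allowed_dir],
                },
            }
        ],
        "auto_register_tools": True,
        "tool_timeout_secs": 30,
        "max_concurrent_calls": 4,
    }


def write_mcp_config(path: str, config: dict) -> None:
    """Serialize the config as indented JSON (hypothetical helper)."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


config = build_mcp_config("/tmp")
print(json.dumps(config, indent=2))
```

Calling `write_mcp_config("mcp.json", config)` would then produce a file that can be passed to `--mcp-config`.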

Three source.type values are supported:

  • Process: launch the server as a subprocess and communicate over stdio.
  • Http: connect over HTTP.
  • WebSocket: connect over WebSockets.

Remote servers can carry per-server auth and naming. A Hugging Face MCP entry, for example:

{
  "servers": [
    {
      "name": "Hugging Face MCP",
      "source": {"type": "Http", "url": "https://hf.co/mcp", "timeout_secs": 30},
      "bearer_token": "hf_xxx",
      "tool_prefix": "hf"
    }
  ],
  "auto_register_tools": true
}

bearer_token is sent in the Authorization header of requests to that server. Setting "enabled": false on an entry keeps it in the file without connecting to it.

Each server's tools are exposed to the model as <prefix>_<tool> (separator is an underscore). The prefix is:

  • tool_prefix when set on the server entry.
  • otherwise an auto-generated mcp_<uuid>.
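
The naming rule can be sketched in a few lines. This is an illustration of the rule as described above, not the mistral.rs implementation: the exact formatting of the generated UUID is an assumption, and `model_search` is a hypothetical tool name.

```python
import uuid
from typing import Optional


def resolve_prefix(tool_prefix: Optional[str]) -> str:
    """Use the configured tool_prefix when set, else fall back to mcp_<uuid>."""
    if tool_prefix:
        return tool_prefix
    # Assumption: the UUID is rendered as 32 hex characters; the real
    # format used by mistral.rs may differ.
    return f"mcp_{uuid.uuid4().hex}"


def exposed_name(prefix: str, tool: str) -> str:
    """Join prefix and tool name with an underscore separator."""
    return f"{prefix}_{tool}"


# "model_search" is a hypothetical tool name used only for illustration.
print(exposed_name(resolve_prefix("hf"), "model_search"))
print(exposed_name(resolve_prefix(None), "model_search"))
```

Because the fallback prefix is generated, setting tool_prefix explicitly is the simplest way to get stable, readable tool names in logs and prompts.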

max_concurrent_calls caps in-flight MCP calls (default 10). tool_timeout_secs is the per-call timeout (default 30s). An Http source can set its own timeout_secs for that server's requests. (The WebSocket source carries a timeout_secs field but it is currently unused.)
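
To make the interaction of the two limits concrete, here is a conceptual model in Python: a semaphore stands in for max_concurrent_calls and a per-call timeout stands in for tool_timeout_secs. It is a sketch of the semantics only, not how mistral.rs implements them. The `LimitedCaller` class and the fake tools are hypothetical, and starting the timeout only after a slot is acquired is an assumption.

```python
import asyncio


class LimitedCaller:
    """Conceptual model of a concurrency cap plus a per-call timeout."""

    def __init__(self, max_concurrent_calls: int = 10, tool_timeout_secs: float = 30.0):
        self._max = max_concurrent_calls
        self._timeout = tool_timeout_secs
        self.in_flight = 0
        self.peak_in_flight = 0

    async def run_all(self, tool_fns):
        sem = asyncio.Semaphore(self._max)

        async def one(fn):
            async with sem:  # wait for a free slot
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
                try:
                    # Assumption: the timeout clock starts once the call is in flight.
                    return await asyncio.wait_for(fn(), timeout=self._timeout)
                except asyncio.TimeoutError:
                    return "timeout"
                finally:
                    self.in_flight -= 1

        # gather preserves the order of the submitted calls
        return await asyncio.gather(*(one(fn) for fn in tool_fns))


async def fast_tool():
    await asyncio.sleep(0.01)
    return "ok"


async def slow_tool():
    await asyncio.sleep(0.5)
    return "late"


caller = LimitedCaller(max_concurrent_calls=2, tool_timeout_secs=0.1)
results = asyncio.run(caller.run_all([fast_tool, fast_tool, slow_tool, fast_tool]))
print(results, caller.peak_in_flight)
```

In this model a slow tool occupies a slot until its timeout fires, so a low max_concurrent_calls combined with a long tool_timeout_secs can delay queued calls.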

GET /v1/models reports MCP status per model:

{
  "object": "list",
  "data": [{
    "id": "Qwen/Qwen3-4B",
    "object": "model",
    "tools_available": true,
    "mcp_tools_count": 3,
    "mcp_servers_connected": 1
  }]
}
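
A script can read these fields to confirm that MCP came up as expected. The sketch below uses only the standard library and the field names from the response above; the base URL passed to `fetch_models` is whatever address your server listens on, and treating missing fields as zero or false is an assumption.

```python
import json
import urllib.request


def fetch_models(base_url: str) -> dict:
    """GET <base_url>/v1/models and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return json.load(resp)


def mcp_status(payload: dict) -> list:
    """Extract the per-model MCP fields from a /v1/models response."""
    rows = []
    for model in payload.get("data", []):
        rows.append({
            "id": model.get("id"),
            # Assumption: absent fields are treated as "no MCP tools".
            "tools_available": bool(model.get("tools_available", False)),
            "mcp_tools_count": int(model.get("mcp_tools_count", 0)),
            "mcp_servers_connected": int(model.get("mcp_servers_connected", 0)),
        })
    return rows


# Offline demonstration using the sample response shown above.
sample = {
    "object": "list",
    "data": [{
        "id": "Qwen/Qwen3-4B",
        "object": "model",
        "tools_available": True,
        "mcp_tools_count": 3,
        "mcp_servers_connected": 1,
    }],
}
for row in mcp_status(sample):
    print(row)
```

Against a running server, `mcp_status(fetch_models(base_url))` returns the same rows for the live models.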

Failure handling:

  • Invalid configurations are reported at startup.
  • MCP initialization is all-or-nothing: if any configured server fails to connect, initialization aborts, a warning is logged, and the HTTP server continues with MCP disabled (no MCP tools from any server). Surviving servers do not stay connected.
  • Once connected, individual tool-call failures do not crash the server.
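
Because one failing server disables MCP for all of them, it can be worth checking the config before starting the server. The following preflight sketch is not part of mistral.rs: `preflight` is a hypothetical helper, it only catches obvious problems (a missing executable or URL), and the assumption that WebSocket sources carry a url field like Http sources is not confirmed by this page.

```python
import shutil


def preflight(config: dict) -> list:
    """Return a list of human-readable problems found in an MCP config dict."""
    problems = []
    for server in config.get("servers", []):
        # Entries with "enabled": false are not connected, so skip them.
        if server.get("enabled", True) is False:
            continue
        name = server.get("name", "<unnamed>")
        source = server.get("source", {})
        kind = source.get("type")
        if kind == "Process":
            command = source.get("command")
            if not command or shutil.which(command) is None:
                problems.append(f"{name}: command {command!r} not found on PATH")
        elif kind in ("Http", "WebSocket"):
            # Assumption: both remote source types are addressed by a "url" field.
            if not source.get("url"):
                problems.append(f"{name}: missing url for {kind} source")
        else:
            problems.append(f"{name}: unsupported source.type {kind!r}")
    return problems


example = {
    "servers": [
        {"name": "hf", "source": {"type": "Http", "url": "https://hf.co/mcp"}},
        {"name": "broken", "source": {"type": "Process", "command": "no-such-mcp-binary-xyz"}},
        {"name": "parked", "enabled": False, "source": {"type": "Process", "command": "no-such-mcp-binary-xyz"}},
    ]
}
for problem in preflight(example):
    print(problem)
```

A clean preflight does not guarantee that every server will connect; checking mcp_servers_connected in GET /v1/models after startup remains the authoritative signal.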

MCP tool calls appear, under their prefixed names, in agentic_tool_calls records and agentic_tool_call_progress streaming events. They count toward the same tool-round cap as other tool calls.

Full examples: Python MCP client, Rust MCP client, HTTP chat with MCP tools.