Tool calling

Tool calling lets the model request external work through a structured invocation. mistral.rs supports both the standard OpenAI client-side flow and a server-side loop that runs tools inside a single request. Tools are declared in the request body:

{
  "model": "default",
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given city",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"],
          "additionalProperties": false
        }
      }
    }
  ]
}

Client-side loop. Standard OpenAI flow: the model emits a tool_calls field; the caller runs the tool and sends the result back as a tool message; the model then produces another response.

Server-side loop. mistral.rs runs the entire tool loop inside one request: built-in tools (web search, code execution, shell, OpenAI-compatible Skills), MCP (Model Context Protocol) tools, SDK callbacks, or HTTP dispatch. The client sends one request and receives one final reply. See how the loop runs for the round structure.

Tool definitions follow the OpenAI schema (see the request above). When the model calls the tool, the response carries a tool_calls array:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

The caller runs the tool (for example, by calling a real weather API) and sends the result back in a tool message whose tool_call_id matches the call:

{
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [...]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature\": 18}"}
  ],
  "tools": [...]
}
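The client-side round trip can be sketched as a pure function: given the assistant message as a dict shaped like the response above, it runs each requested tool from a local registry and builds the tool messages for the next request. get_weather, TOOLS, and tool_messages are hypothetical names for illustration; with the openai Python client, message.model_dump() should give a dict of this shape.

```python
import json
from typing import Any, Callable


# Hypothetical local tool: a stand-in for a real weather API.
def get_weather(city: str) -> dict[str, Any]:
    return {"city": city, "temperature": 18}


TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def tool_messages(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Run every tool call in an assistant message and build the
    role="tool" messages to send back in the next request."""
    out = []
    for call in assistant_message.get("tool_calls") or []:
        # KeyError for unknown tools; a real client should report the error back.
        fn = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not as an object.
        args = json.loads(call["function"]["arguments"])
        out.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id of the call
            "content": json.dumps(fn(**args)),
        })
    return out


assistant = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
# Conversation to resend: user turn, assistant tool call, then tool results.
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}, assistant]
messages += tool_messages(assistant)
print(messages[-1])
```

The resulting messages list, together with the same tools, forms the body of the follow-up request.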

The tool_choice field controls whether, and which, tools the model calls:

  • tool_choice: "none": disable tool calling for the request.
  • tool_choice: "auto" (default): model decides.
  • tool_choice: {"type": "function", "function": {"name": "..."}}: force a specific tool in Chat Completions.
  • tool_choice: {"type": "function", "name": "..."}: force a specific function tool in the Responses API.
  • tool_choice: {"type": "allowed_tools", "mode": "auto"|"required", "tools": [{"type": "function", "name": "..."}]}: restrict the request to a subset of function tools, optionally requiring one of them.
  • tool_choice: "required": require at least one tool call. Requests with required and no available tools are rejected.
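The tool_choice rules above can be summarized as a small resolver that returns the function names the model may call and whether a call is required. This is a hypothetical client-side model of the documented behavior, not mistral.rs's implementation; resolve_tool_choice and _name are illustrative names. Hosted tools are skipped because tool_choice only selects function tools.

```python
from typing import Any, Union


def _name(entry: dict[str, Any]) -> str:
    # Chat Completions nests the name under "function"; the Responses API does not.
    return entry["function"]["name"] if "function" in entry else entry["name"]


def resolve_tool_choice(
    tools: list[dict[str, Any]],
    choice: Union[str, dict[str, Any], None] = None,
) -> tuple[set[str], bool]:
    """Return (function names the model may call, whether a call is required)."""
    # Only function tools take part; hosted tools such as web_search are skipped.
    defined = {_name(t) for t in tools if t.get("type") == "function"}
    if choice is None or choice == "auto":
        return defined, False
    if choice == "none":
        return set(), False
    if choice == "required":
        if not defined:
            # Documented: "required" with no available tools is rejected.
            raise ValueError('tool_choice "required" needs at least one tool')
        return defined, True
    if isinstance(choice, dict) and choice.get("type") == "function":
        return {_name(choice)}, True  # named tool: the call is forced
    if isinstance(choice, dict) and choice.get("type") == "allowed_tools":
        # Other definitions stay in the request; they are just not callable.
        allowed = {_name(t) for t in choice["tools"]} & defined
        return allowed, choice.get("mode") == "required"
    raise ValueError(f"unsupported tool_choice: {choice!r}")


tools = [
    {"type": "function", "function": {"name": "get_weather"}},
    {"type": "function", "function": {"name": "book_flight"}},
]
print(resolve_tool_choice(tools, {
    "type": "allowed_tools",
    "mode": "required",
    "tools": [{"type": "function", "name": "get_weather"}],
}))  # → ({'get_weather'}, True)
```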

For required and named tool choices, mistral.rs enforces the tool call at the engine level. The model may still emit normal text or reasoning first, but the sequence cannot finish until a valid tool call is produced. If the model has not called a tool by an internal deadline, decoding switches to a constrained tool-call grammar so that the remaining generation budget is reserved for the required call. When the chat template exposes a known tool-call format, the forced grammar uses that model's native wrapper; supported formats include Liquid/LFM2.5, Harmony/GPT-OSS, Gemma 4, Qwen-style formats, and other detected formats.

Hosted tools such as Responses web search, code interpreter, and shell execution are selected by adding them to tools; forcing or filtering a specific hosted tool through tool_choice is not currently supported.

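allowed_tools restricts the function set visible to the grammar without removing other function definitions from the request. With mode: "required", the model must call one of the allowed functions before the sequence can finish. In the following request, get_weather and book_flight are both defined, but only get_weather may be called: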

from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Use a tool to help me plan for Tokyo."}],
    # Both tools are defined in the request...
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
                "strict": True,
            },
        },
        {
            "type": "function",
            "function": {
                "name": "book_flight",
                "description": "Book a flight to a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
                "strict": True,
            },
        },
    ],
    # ...but only get_weather is callable, and mode="required" forces a call.
    tool_choice={
        "type": "allowed_tools",
        "mode": "required",
        "tools": [{"type": "function", "name": "get_weather"}],
    },
)
# Prints the name of the first tool the model called (get_weather here).
print(completion.choices[0].message.tool_calls[0].function.name)

With function.strict: true, mistral.rs constrains the generated arguments to the tool's parameters JSON Schema during decoding. Non-strict tools are only parsed as JSON after generation completes, so malformed output can still appear. Use strict mode when malformed or extra arguments would be expensive, unsafe, or annoying to handle in application code.

Built-in tools are strict by default, so the corresponding request fields get constrained arguments without adding strict yourself:

  • Web search and page extraction (web_search_options on Chat Completions, tools: [{"type":"web_search"}] on Responses).
  • Python code execution (tools: [{"type":"code_interpreter","container":{"type":"auto"}}]).
  • The read_file / list_files file helpers (declared files).
  • MCP tools, whenever the MCP server provides an input schema.

Notes:

  • Strict mode does not force the model to call a tool; use tool_choice for that. It only constrains the argument object if the model calls a strict tool.
  • For a tight schema, include required, enum, nested object schemas, array item schemas, and additionalProperties: false.
  • A tool marked strict without a parameters schema falls back to a generic object schema.
  • Strict tool calling is separate from response_format: {"type": "json_schema", ...}. Tool strictness constrains a tool call's arguments; response-format schemas constrain the assistant's final text.
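Without strict mode, arguments are only parsed after generation, so application code has to check them itself. A minimal sketch of such a check, handling only flat object schemas with scalar properties; check_arguments and WEATHER_SCHEMA are illustrative names, and a real application would more likely use a full JSON Schema validator such as the third-party jsonschema package.

```python
import json
from typing import Any

# JSON Schema scalar type names mapped to Python types (flat schemas only).
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_arguments(raw: str, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems with a tool call's raw arguments string."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems: list[str] = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required field {key!r}")
    for key, value in args.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field {key!r}")
            continue
        type_name = props[key].get("type")
        expected = _TYPES.get(type_name)
        if expected is None:
            continue  # arrays, objects, enums, etc. are not checked by this sketch
        # bool is a subclass of int in Python, so reject booleans for other types.
        is_bad_bool = isinstance(value, bool) and type_name != "boolean"
        if not isinstance(value, expected) or is_bad_bool:
            problems.append(f"field {key!r} should be {type_name}")
    return problems


WEATHER_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
    "additionalProperties": False,
}
print(check_arguments('{"city": "Tokyo"}', WEATHER_SCHEMA))  # → []
print(check_arguments('{"town": "Tokyo"}', WEATHER_SCHEMA))
# → ["missing required field 'city'", "unexpected field 'town'"]
```

With strict: true, mistral.rs constrains decoding so that arguments like the second example should not be generated in the first place.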
To use the server-side loop with the built-in tools, enable them when starting the server:
mistralrs serve --enable-search --enable-code-execution -m <model>

--agent is shorthand for both flags; see build an agent. The built-in tools are covered in web search and code execution.

For custom tools, the cleanest path is to run them as an MCP server and connect mistralrs as a client; see connect to an MCP server. The SDKs can also register custom callbacks directly: Python uses Runner(tool_callbacks=...) (full example); Rust builders use with_tool_callback(...) or with_tool_callback_and_tool(...) (full example).

--max-tool-rounds caps the number of tool rounds before the loop forces a final response. It is unset by default, in which case the loop falls back to an internal cap of 256 rounds. Connected MCP tools share the same cap.

mistralrs serve --max-tool-rounds 10 -m <model>

--max-tool-rounds sets the server default. A request can override it with its own max_tool_rounds.
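The precedence described above (a request's max_tool_rounds, then the server's --max-tool-rounds, then the internal fallback of 256) and the effect of the cap can be modeled in a few lines. This is an illustrative sketch: effective_max_rounds and count_tool_rounds are hypothetical names, and the toy loop only counts rounds instead of executing tools.

```python
from typing import Callable, Optional

# Documented fallback when neither the request nor the server sets a cap.
INTERNAL_FALLBACK_CAP = 256


def effective_max_rounds(request_value: Optional[int], server_default: Optional[int]) -> int:
    """A request's max_tool_rounds wins over --max-tool-rounds; otherwise use the fallback."""
    if request_value is not None:
        return request_value
    if server_default is not None:
        return server_default
    return INTERNAL_FALLBACK_CAP


def count_tool_rounds(wants_tool: Callable[[int], bool], max_rounds: int) -> int:
    """Toy loop: run rounds while the model keeps asking for tools, stopping at
    max_rounds, at which point a final response would be forced."""
    rounds = 0
    while rounds < max_rounds and wants_tool(rounds):
        rounds += 1  # one round: execute the requested tool call(s), feed results back
    return rounds


# A model that never stops calling tools is cut off at the server default of 10.
print(count_tool_rounds(lambda r: True, effective_max_rounds(None, 10)))  # → 10
```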

--tool-dispatch-url POSTs each tool call to an external URL instead of running it in-process:

mistralrs serve --tool-dispatch-url http://localhost:7070/tools -m <model>

Request body sent to the dispatch URL:

{"name": "search", "arguments": {"query": "mistralrs"}}

Expected response: a bare string (the tool result) or {"content": "..."}. The dispatch URL is server-level only; it cannot be set per request over HTTP.
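A dispatch endpoint only has to accept the body shown above and answer with a string or {"content": "..."}. A minimal sketch using Python's standard-library http.server, assuming a single hypothetical search tool; handle_call and DispatchHandler are illustrative names. How mistral.rs expects tool errors to be reported is not described here, so this sketch simply returns an error message as the content.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any


# Hypothetical tool implementation; replace with real work.
def search(query: str) -> str:
    return f"results for {query!r}"


TOOLS = {"search": search}


def handle_call(call: dict[str, Any]) -> dict[str, str]:
    """Map a dispatch body {"name": ..., "arguments": {...}} to a {"content": ...} reply."""
    tool = TOOLS.get(call.get("name", ""))
    if tool is None:
        return {"content": f"unknown tool: {call.get('name')}"}
    # Unlike tool_calls in a chat response, arguments arrive here as a JSON object.
    return {"content": str(tool(**call.get("arguments", {})))}


class DispatchHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Every POST is treated as a tool call, whatever the path (e.g. /tools).
        length = int(self.headers.get("Content-Length", 0))
        reply = handle_call(json.loads(self.rfile.read(length)))
        body = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


print(handle_call({"name": "search", "arguments": {"query": "mistralrs"}}))
# → {'content': "results for 'mistralrs'"}
```

To serve it at the URL used in the command above, run HTTPServer(("127.0.0.1", 7070), DispatchHandler).serve_forever(). The http.server module is fine for local experiments but is not intended for production use.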

Per-request fields:

  • tools, tool_choice: the tools the model can see, and whether it may, must, or must not call them.
  • web_search_options for Chat Completions search; Responses uses tools: [{"type":"web_search"}].
  • tools: [{"type":"code_interpreter","container":{"type":"auto"}}]: opt into the built-in Python tools when the server has code execution enabled.
  • max_tool_rounds: override the server default for one request.
  • session_id: reuse persistent agentic state.
  • agent_permission: tighten the permission mode.
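Putting the per-request fields together, a hypothetical helper can build a Chat Completions body that opts into the server-side loop for one request. The field names come from the list above; the helper name, the session identifier format, and the assumption that an empty web_search_options object is enough to opt in (as in the OpenAI API) are illustrative.

```python
import json
from typing import Any, Optional


def agentic_chat_request(
    prompt: str,
    *,
    search: bool = False,
    code: bool = False,
    max_tool_rounds: Optional[int] = None,
    session_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build a Chat Completions body that uses mistral.rs's server-side tool loop."""
    body: dict[str, Any] = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
    }
    if search:
        # Assumption: an empty object opts into search, as in the OpenAI API.
        body["web_search_options"] = {}
    if code:
        # Only works when the server was started with code execution enabled.
        body["tools"] = [{"type": "code_interpreter", "container": {"type": "auto"}}]
    if max_tool_rounds is not None:
        body["max_tool_rounds"] = max_tool_rounds  # overrides --max-tool-rounds
    if session_id is not None:
        body["session_id"] = session_id  # reuse persistent agentic state
    return body


print(json.dumps(agentic_chat_request("Summarize today's Rust news.", search=True, max_tool_rounds=5), indent=2))
```

With the openai Python client, fields that are not part of the OpenAI API, such as max_tool_rounds and session_id, can be passed through the extra_body argument of client.chat.completions.create.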

Server-level flags:

  • --max-tool-rounds, --tool-dispatch-url
  • --enable-search, --enable-code-execution, --agent
  • --search-embedding-model
  • --code-exec-python, --code-exec-workdir, --code-exec-timeout
  • --agent-permission

See the serve CLI reference for the full flag list.

When the server-side loop runs, the response includes an agentic_tool_calls array (one entry per executed round) and, when streaming, agentic_tool_call_progress Server-Sent Events (SSE) around each tool execution. The wire format for both is in the HTTP API reference.