Tool calling
Tool calling lets the model request external work through a structured invocation. mistral.rs supports the standard OpenAI client-side flow as well as a server-side loop that runs tools inside a single request. A minimal request that exposes one tool:
```json
{
  "model": "default",
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given city",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"],
          "additionalProperties": false
        }
      }
    }
  ]
}
```

The two modes
Client-side loop. Standard OpenAI flow: the model emits a tool_calls field, the caller runs the tool and sends the result back as a tool message, and the model produces another response.
Server-side loop. mistral.rs runs the entire tool loop inside one request, executing tools from any of these sources: built-in tools (web search, code execution, shell, OpenAI-compatible Skills), MCP (Model Context Protocol) tools, SDK callbacks, or HTTP dispatch. The client sends one request and receives one final reply. See how the loop runs for the round structure.
Client-side: defining tools
Tool definitions follow the OpenAI schema (see the request above). When the model calls the tool, the response carries a tool_calls array:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```

The caller invokes the real API and sends the result back:
```json
{
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [...]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature\": 18}"}
  ],
  "tools": [...]
}
```

With the Python SDK, pass OpenAI-compatible tool schemas as JSON strings on ChatCompletionRequest.tool_schemas:
```python
import json

from mistralrs import ChatCompletionRequest, Runner, ToolChoice, Which

# tool_schemas takes each OpenAI-style tool definition as a JSON string.
tool_schema = json.dumps({
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
})

# Load the model with in-situ quantization.
runner = Runner(which=Which.Plain(model_id="Qwen/Qwen3-4B"), in_situ_quant="4")
response = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        tool_schemas=[tool_schema],
        tool_choice=ToolChoice.Auto,
    )
)
print(response.choices[0].message.tool_calls)
```

Run the tool yourself and send the result back as a tool message, or register a matching tool_callbacks entry on Runner to have mistral.rs execute it server-side.
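Writing the schema dict by hand for every tool gets repetitive. The sketch below uses a hypothetical helper, function_tool_schema (it is not part of the mistralrs package), to build one OpenAI-style function tool and serialize it into the JSON string that tool_schemas expects. It assumes you want additionalProperties: false on every tool, which keeps strict schemas tight.

```python
import json
from typing import Any


def function_tool_schema(
    name: str,
    description: str,
    properties: dict[str, Any],
    required: list[str],
    strict: bool = True,
) -> str:
    """Build an OpenAI-style function tool and return it as a JSON string.

    Hypothetical helper: the output is meant for ChatCompletionRequest.tool_schemas.
    """
    missing = [key for key in required if key not in properties]
    if missing:
        raise ValueError(f"required keys not declared in properties: {missing}")
    tool = {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": strict,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
                # Reject argument keys the schema does not declare.
                "additionalProperties": False,
            },
        },
    }
    return json.dumps(tool)


if __name__ == "__main__":
    schema = function_tool_schema(
        "get_weather",
        "Get the current weather for a city.",
        {"city": {"type": "string"}},
        ["city"],
    )
    print(schema)
```

The returned string can be passed as tool_schemas=[schema] in a ChatCompletionRequest like the one above.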
With the Rust SDK, set up a Tool with a Function definition and inspect tool_calls on the response:
```rust
use std::collections::HashMap;

use anyhow::Result;
use mistralrs::{
    Function, IsqBits, ModelBuilder, RequestBuilder, TextMessageRole, Tool, ToolChoice, ToolType,
};
use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<()> {
    let model = ModelBuilder::new("Qwen/Qwen3-4B")
        .with_auto_isq(IsqBits::Four)
        .build()
        .await?;

    // JSON Schema describing the tool's arguments.
    let parameters: HashMap<String, Value> = serde_json::from_value(json!({
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": false
    }))?;

    let tools = vec![Tool {
        tp: ToolType::Function,
        function: Function {
            name: "get_weather".to_string(),
            description: Some("Get the current weather in a city.".to_string()),
            parameters: Some(parameters),
            strict: Some(true),
        },
    }];

    let request = RequestBuilder::new()
        .add_message(TextMessageRole::User, "What's the weather in Tokyo?")
        .set_tools(tools)
        .set_tool_choice(ToolChoice::Auto);

    let response = model.send_chat_request(request).await?;
    // With ToolChoice::Auto the model decides whether to call the tool.
    println!("{:?}", response.choices[0].message.tool_calls);
    Ok(())
}
```

For server-side execution, register the callback together with the Tool definition via with_tool_callback_and_tool.
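On the client side, the round trip shown in the JSON examples above repeats until the model answers without tool_calls. The sketch below drives that loop against any send function that returns an OpenAI-style assistant message as a dict. The names run_client_loop and fake_send, the get_weather stub, and the local max_rounds guard are all hypothetical; fake_send stands in for a real call to the server or SDK. Note that function.arguments arrives as a JSON string and must be decoded before the tool runs.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_client_loop(
    send: Callable[[list[Message]], Message],
    tool_impls: dict[str, Callable[..., Any]],
    messages: list[Message],
    max_rounds: int = 8,
) -> Message:
    """Run tools locally until the model replies without tool_calls."""
    for _ in range(max_rounds):
        reply = send(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply  # final answer, no more tools requested
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"])  # arguments is a JSON string
            result = tool_impls[fn["name"]](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("model still requesting tools after max_rounds")


def fake_send(messages: list[Message]) -> Message:
    """Stand-in model: call get_weather once, then answer."""
    if messages[-1]["role"] == "user":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
            }],
        }
    return {"role": "assistant", "content": "It is 18 degrees in Tokyo."}


if __name__ == "__main__":
    history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
    final = run_client_loop(fake_send, {"get_weather": lambda city: {"temperature": 18}}, history)
    print(final["content"])
```

To run the same loop end to end, replace fake_send with a function that calls your server and returns the assistant message as a dict.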
Forcing or suppressing tool use
- tool_choice: "none": disable tool calling for the request.
- tool_choice: "auto" (default): the model decides.
- tool_choice: {"type": "function", "function": {"name": "..."}}: force a specific tool.
- tool_choice: {"type": "function", "name": "..."}: force a specific function tool in the Responses API.
- tool_choice: {"type": "allowed_tools", "mode": "auto"|"required", "tools": [{"type": "function", "name": "..."}]}: restrict the request to a subset of function tools, optionally requiring one of them.
- tool_choice: "required": require at least one tool call. Requests with required and no available tools are rejected.
For required and named tool choices, mistral.rs treats the tool call as an engine obligation. The model may still emit normal text or reasoning first, but the sequence cannot finish until it produces a valid tool call. If the model has not called a tool by an internal deadline, decoding switches to a constrained tool-call grammar so the remaining generation budget is reserved for the required call. When the chat template exposes a known tool-call format, the forced grammar uses that model-native wrapper; detected formats include Liquid/LFM2.5, Harmony/GPT-OSS, Gemma 4, and Qwen-style formats, among others.
Hosted tools such as Responses web search, code interpreter, and shell execution are selected by adding them to tools; forcing or filtering a specific hosted tool through tool_choice is not currently supported.
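The tool_choice forms above reduce to two questions: which function tools may the model call, and must it call one? The sketch below answers both for the string forms and the two named-function forms, as a client-side pre-check before sending a request. resolve_tool_choice is a hypothetical helper. It mirrors the documented rejection of "required" with no tools, but rejecting undeclared tool names and leaving allowed_tools unhandled are this sketch's own choices, not confirmed mistral.rs behavior. Tools are assumed to use the Chat Completions shape, with the name under function.

```python
from typing import Any, Union

ToolChoiceValue = Union[str, dict[str, Any]]


def resolve_tool_choice(
    tools: list[dict[str, Any]], tool_choice: ToolChoiceValue = "auto"
) -> tuple[set[str], bool]:
    """Return (callable function names, whether a tool call is mandatory)."""
    names = {t["function"]["name"] for t in tools if t.get("type") == "function"}
    if tool_choice == "none":
        return set(), False
    if tool_choice == "auto":
        return names, False
    if tool_choice == "required":
        if not names:
            raise ValueError('tool_choice "required" needs at least one tool')
        return names, True
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        # Chat Completions nests the name under "function"; Responses puts it at the top level.
        name = tool_choice.get("function", {}).get("name") or tool_choice.get("name")
        if name not in names:
            raise ValueError(f"tool_choice names an undeclared tool: {name!r}")
        return {name}, True
    raise ValueError(f"unsupported tool_choice in this sketch: {tool_choice!r}")
```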
Allowed tools examples
allowed_tools restricts the grammar-visible function set without removing other function definitions from the request. With mode: "required", the model must call one of the allowed functions before the sequence can finish.
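Put concretely, every declared function stays in the request, but only the allowed subset is visible to the constrained grammar. The sketch below computes that subset for the get_weather/book_flight request used in the examples that follow. apply_allowed_tools is hypothetical. Raising on names that are not declared is an assumption of this sketch, not documented mistral.rs behavior.

```python
from typing import Any


def apply_allowed_tools(
    tools: list[dict[str, Any]], tool_choice: dict[str, Any]
) -> tuple[list[str], bool]:
    """Return (grammar-visible function names, whether a call is required).

    The tools list itself is left untouched: other definitions stay in the request.
    """
    if tool_choice.get("type") != "allowed_tools":
        raise ValueError("expected an allowed_tools tool_choice")
    mode = tool_choice["mode"]
    if mode not in ("auto", "required"):
        raise ValueError(f"unknown allowed_tools mode: {mode!r}")
    declared = [t["function"]["name"] for t in tools if t.get("type") == "function"]
    allowed = {t["name"] for t in tool_choice["tools"] if t.get("type") == "function"}
    undeclared = allowed - set(declared)
    if undeclared:
        raise ValueError(f"allowed_tools lists undeclared functions: {sorted(undeclared)}")
    # Keep declaration order so the visible list is deterministic.
    visible = [name for name in declared if name in allowed]
    return visible, mode == "required"
```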
```python
from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Use a tool to help me plan for Tokyo."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
                "strict": True,
            },
        },
        {
            "type": "function",
            "function": {
                "name": "book_flight",
                "description": "Book a flight to a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
                "strict": True,
            },
        },
    ],
    tool_choice={
        "type": "allowed_tools",
        "mode": "required",
        "tools": [{"type": "function", "name": "get_weather"}],
    },
)

print(completion.choices[0].message.tool_calls[0].function.name)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "foobar",
  baseURL: "http://localhost:1234/v1/",
});

const completion = await client.chat.completions.create({
  model: "default",
  messages: [{ role: "user", content: "Use a tool to help me plan for Tokyo." }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city.",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
        strict: true,
      },
    },
    {
      type: "function",
      function: {
        name: "book_flight",
        description: "Book a flight to a city.",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
        strict: true,
      },
    },
  ],
  tool_choice: {
    type: "allowed_tools",
    mode: "required",
    tools: [{ type: "function", name: "get_weather" }],
  },
});

console.log(completion.choices[0].message.tool_calls[0].function.name);
```

```bash
curl http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer foobar' \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Use a tool to help me plan for Tokyo."}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          },
          "strict": true
        }
      },
      {
        "type": "function",
        "function": {
          "name": "book_flight",
          "description": "Book a flight to a city.",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          },
          "strict": true
        }
      }
    ],
    "tool_choice": {
      "type": "allowed_tools",
      "mode": "required",
      "tools": [{"type": "function", "name": "get_weather"}]
    }
  }'
```

```rust
use mistralrs::{
    AllowedToolChoice, AllowedToolsMode, AllowedToolsToolChoice, AllowedToolsToolChoiceType,
    ToolChoice,
};

// Pass this to RequestBuilder::set_tool_choice alongside set_tools.
let tool_choice = ToolChoice::AllowedTools(AllowedToolsToolChoice {
    tp: AllowedToolsToolChoiceType::AllowedTools,
    mode: AllowedToolsMode::Required,
    tools: vec![AllowedToolChoice::Function {
        name: "get_weather".to_string(),
    }],
});
```

Strict tool calling
With function.strict: true, mistral.rs constrains the generated arguments to the tool's parameters JSON Schema during decoding. Non-strict tools are only parsed as JSON after generation completes, so malformed output can still appear. Use strict mode when malformed or extra arguments would be expensive, unsafe, or annoying to handle in application code.
Built-in tools are strict by default, so the corresponding request fields get constrained arguments without adding strict yourself:
- Web search and page extraction (web_search_options on Chat Completions, tools: [{"type":"web_search"}] on Responses).
- Python code execution (tools: [{"type":"code_interpreter","container":{"type":"auto"}}]).
- The read_file/list_files file helpers (declared files).
- MCP tools, whenever the MCP server provides an input schema.
Notes:
- Strict mode does not force the model to call a tool; use tool_choice for that. It only constrains the argument object if the model calls a strict tool.
- For a tight schema, include required, enum, nested object schemas, array item schemas, and additionalProperties: false.
- A tool marked strict without a parameters schema falls back to a generic object schema.
- Strict tool calling is separate from response_format: {"type": "json_schema", ...}. Tool strictness constrains a tool call's arguments; response-format schemas constrain the assistant's final text.
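Strict mode moves this checking into decoding. For non-strict tools, whose arguments are only parsed after generation, application code still has to defend against malformed output. The sketch below shows one way to do that. It decodes the arguments string and checks the result against a small subset of JSON Schema (type, enum, required, properties, additionalProperties: false, and array items). check_tool_arguments and schema_errors are hypothetical helpers and do not reflect how mistral.rs enforces strictness. For full JSON Schema coverage, a dedicated validator such as the jsonschema package is the safer choice.

```python
import json
from typing import Any

# Subset of JSON Schema types mapped to Python types.
_JSON_TYPES: dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def schema_errors(value: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Collect violations of a small JSON Schema subset."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so rule it out for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, _JSON_TYPES[expected]):
            return [f"{path}: expected {expected}"]
    errors: list[str] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum")
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required {key!r}")
        for key, item in value.items():
            if key in props:
                errors.extend(schema_errors(item, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected property {key!r}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(schema_errors(item, schema["items"], f"{path}[{i}]"))
    return errors


def check_tool_arguments(raw: str, schema: dict[str, Any]) -> list[str]:
    """Decode a tool call's arguments string, then validate it."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"$: invalid JSON ({exc.msg})"]
    return schema_errors(value, schema)
```

One option is to return the error list to the model in the tool message so it can retry. That retry policy is an application choice, not something the server does for you.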
Server-side: enabling built-in tools
```bash
mistralrs serve --enable-search --enable-code-execution -m <model>
```

--agent is shorthand for both flags; see build an agent. The built-in tools are covered in web search and code execution.
For custom tools, the cleanest path is to run them as an MCP server and connect mistralrs as a client; see connect to an MCP server. The SDKs can also register custom callbacks directly: Python uses Runner(tool_callbacks=...) (full example); Rust builders use with_tool_callback(...) or with_tool_callback_and_tool(...) (full example).
Configuring the tool loop
Maximum tool rounds
Caps the number of tool rounds before the loop forces a final response. Unset by default, in which case the loop falls back to an internal cap of 256. Connected MCP tools share the same cap.
```bash
mistralrs serve --max-tool-rounds 10 -m <model>
```

--max-tool-rounds sets the server default. A request can override it with its own max_tool_rounds.
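The precedence described above can be written out as a small function, together with a bounded loop that stops running tool rounds once the cap is reached. Both are hypothetical sketches of the documented behavior (request value first, then the server default, then the internal fallback of 256), not mistral.rs source. The wants_tool_round callback stands in for "the model emitted another tool call".

```python
from typing import Callable, Optional

# Internal fallback cap used when neither the request nor the server sets one.
FALLBACK_MAX_TOOL_ROUNDS = 256


def effective_max_tool_rounds(
    request_value: Optional[int], server_default: Optional[int]
) -> int:
    """Per-request max_tool_rounds wins, then --max-tool-rounds, then the fallback."""
    if request_value is not None:
        return request_value
    if server_default is not None:
        return server_default
    return FALLBACK_MAX_TOOL_ROUNDS


def run_bounded_loop(
    wants_tool_round: Callable[[int], bool], max_rounds: int
) -> tuple[int, bool]:
    """Return (tool rounds executed, whether the cap forced the final response)."""
    rounds = 0
    while wants_tool_round(rounds):
        if rounds == max_rounds:
            return rounds, True  # cap reached: force a final answer instead
        rounds += 1
    return rounds, False
```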
```json
{
  "model": "default",
  "messages": [{"role": "user", "content": "..."}],
  "max_tool_rounds": 10
}
```

```python
ChatCompletionRequest(
    model="default",
    messages=[{"role": "user", "content": "..."}],
    max_tool_rounds=10,
)
```

```rust
let request = RequestBuilder::new()
    .add_message(TextMessageRole::User, "...")
    .with_max_tool_rounds(10);
```

Dispatch URL
--tool-dispatch-url POSTs each tool call to an external URL instead of running it in-process:
```bash
mistralrs serve --tool-dispatch-url http://localhost:7070/tools -m <model>
```

Request body sent to the dispatch URL:
```json
{"name": "search", "arguments": {"query": "mistralrs"}}
```

Expected response: a bare string (the tool result) or {"content": "..."}. The dispatch URL is server-level only; it cannot be set per-request over HTTP. Full example
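A dispatch endpoint only has to accept that JSON body and answer with the tool result. The sketch below uses Python's standard-library http.server to serve http://localhost:7070/tools from the example above and returns the {"content": ...} form. The search function and the TOOLS table are hypothetical placeholders, and the handler accepts any path. Reporting an unknown tool as result text with status 200 is an assumption, since the expected error format is not covered here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Callable


def search(query: str) -> str:
    """Hypothetical tool implementation."""
    return f"results for {query}"


# Maps tool names from the dispatch body to local functions.
TOOLS: dict[str, Callable[..., Any]] = {"search": search}


def dispatch(call: dict[str, Any]) -> str:
    """Run one {"name": ..., "arguments": {...}} call and return its result text."""
    fn = TOOLS.get(call.get("name", ""))
    if fn is None:
        return f"unknown tool: {call.get('name')}"
    return str(fn(**call.get("arguments", {})))


class ToolDispatchHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        call = json.loads(self.rfile.read(length))
        body = json.dumps({"content": dispatch(call)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence per-request logging in this sketch


if __name__ == "__main__":
    # Pair with: mistralrs serve --tool-dispatch-url http://localhost:7070/tools -m <model>
    HTTPServer(("localhost", 7070), ToolDispatchHandler).serve_forever()
```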
Per-request versus server-level
Per-request fields:
- tools, tool_choice: which tools the model can see, and whether tool use is forced, disabled, or left to the model.
- web_search_options for Chat Completions search; Responses uses tools: [{"type":"web_search"}].
- tools: [{"type":"code_interpreter","container":{"type":"auto"}}]: opt into the built-in Python tools when the server has code execution enabled.
- max_tool_rounds: override the server default for one request.
- session_id: reuse persistent agentic state.
- agent_permission: tighten the permission mode.
Server-level flags:
- --max-tool-rounds, --tool-dispatch-url
- --enable-search, --enable-code-execution, --agent
- --search-embedding-model
- --code-exec-python, --code-exec-workdir, --code-exec-timeout
- --agent-permission
See the serve CLI reference for the full flag list.
Response extensions
When the server-side loop runs, the response includes an agentic_tool_calls array (one entry per executed round) and, when streaming, agentic_tool_call_progress Server-Sent Events (SSE) around each tool execution. The wire format for both is in the HTTP API reference.