The agentic loop
With tools enabled, the engine runs the tool and continues the request without returning control to the client.
Entry conditions
Section titled “Entry conditions”The loop runs only when the model emits a tool call. The server advertises tools to the model in any of these cases:
--enable-searchis on (advertises the web search tool).--enable-code-executionis on and the request setsenable_code_execution: true(advertises themistralrs_execute_pythontool).- A registered tool callback exists (Rust/Python SDK
tool_callbacksor MCP client tools). --tool-dispatch-urlis set.
Otherwise the request is dispatched normally and this page does not apply.
Round structure
Section titled “Round structure”Each iteration:
- The engine runs inference. The result is a model response that either contains tool calls or does not.
- If the response contains no tool calls, the loop exits and the response is forwarded to the client. If more than one tool call is returned, only the first is executed and a warning is logged.
- The loop emits a progress event with phase
callingand the tool arguments. - The tool is executed through one of four paths: built-in web search, built-in code execution, a registered callback, or a POST to
--tool-dispatch-url. - The loop emits a progress event with phase
completeand the structured result. - The message history is extended with the assistant’s tool-call message and a
tool-role response, so the next inference pass sees the outcome. - If the round counter reaches the configured cap, the loop exits without another tool opportunity.
The cap is set by --max-tool-rounds. When unset, the loop uses an internal fallback of 256 rounds.
Progress events
Section titled “Progress events”Non-streaming responses include an agentic_tool_calls array with one entry per executed round. Streaming responses emit agentic_tool_call_progress Server-Sent Events around each tool execution.
Event shape:
- Phase
calling: before the tool runs. Includes the tool name and parsed arguments. - Phase
complete: after the tool runs. Data is tool-type-specific:- Code execution:
code,stdout,stderr,exception,images_base64,video_frames_base64,video_frame_count,working_directory,execution_time_ms. - Web search:
query,results_count. - Custom tools:
arguments,content.
- Code execution:
The loop also produces typed File outputs alongside the tool-call records. When the request declares files: [...] or a tool writes into the working directory and lists the file in its outputs parameter, the runtime captures it, attributes it to the producing round, and emits it as a file_produced SSE event during streaming or as a top-level files[] entry on the non-streaming response. Each agentic_tool_calls[*].file_ids lists the ids attributable to that round. See agentic runtime: files.
Session interaction
Section titled “Session interaction”At termination, the expanded message list (synthesized assistant tool-call messages and tool-role responses) is written back to the session. On the next request with the same session id, that history is spliced back in.
Client-side path
Section titled “Client-side path”If none of the entry conditions are met, the request is dispatched directly. The model’s tool_calls field is returned to the client and the client runs the next round. This is the standard OpenAI-compatible flow.
See also
Section titled “See also”- Guide: agentic runtime for apps, tool calling basics, configure the tool loop.
- Reference: HTTP API.