Agentic runtime for apps
mistral.rs can act as a local-first runtime for agent applications. A runtime request can include model generation, server-side tool execution, Python code execution that is sandboxed by default on Linux and macOS, web search, generated images or video frames, and persistent session state.
The most complete app-facing event stream today is /v1/chat/completions with stream: true. It emits normal OpenAI-compatible chunks plus mistral.rs agentic_tool_call_progress events.
Built-in runtime tools use strict tool calling by default. Web search, code execution, and file helper calls are constrained to their declared JSON Schemas before dispatch.
What the runtime does
Section titled “What the runtime does”| Runtime part | What mistral.rs provides |
|---|---|
| Model output | Chat-completion responses and streaming chunks. |
| Tool execution | Built-in search, code execution, MCP tools, callbacks, or HTTP tool dispatch. |
| Generated media | Captured images and video frames from tools as base64 fields. |
| Session state | Reusable session_id values for multi-turn tool and code state. |
This is the lane for applications that want local inference and local action in one process, without building a separate tool loop around a raw model server.
Agent permissions
Section titled “Agent permissions”agent_permission controls whether mistral.rs may run an agent action after the model asks for one. It applies to all server-executed actions, not just Python: code execution, web search, file tools, registered callbacks, and external tool dispatch.
| Mode | Behavior |
|---|---|
auto | Run the action as soon as the tool call is valid. |
ask | Pause before the action and ask the app, callback, or CLI user to approve it. |
deny | Keep the tool visible to the model, but return a denied tool result instead of running it. |
The server or runner policy is a floor. A request can tighten it, for example from auto to ask or deny, but cannot loosen a server started with --agent-permission ask or --agent-permission deny. code_execution_permission and --code-exec-permission are compatibility aliases for code-execution-focused apps; prefer agent_permission for new code.
Permissioning is separate from sandboxing. Permission mode decides whether an action may start. The sandbox controls what generated Python can access after it starts.
CLI, the built-in UI, HTTP, Rust, and Python expose the same approval semantics:
| Concept | Meaning |
|---|---|
| Approve or deny | Allow the action, or return a denied tool result to the model. |
message | Optional deny message returned to the model as the tool result. |
remember_for_session | On approve, skip later approval prompts for the same session_id. |
In interactive mode, ask prompts inline before each agent action. Choosing always approves later actions in the same CLI session.
mistralrs run --agent -m google/gemma-4-E4B-it --agent-permission askdeny is useful when you want to inspect proposed actions without letting them run:
mistralrs run --agent -m google/gemma-4-E4B-it --agent-permission denyBuilt-in UI
Section titled “Built-in UI”The built-in UI has a Tool approval control in the settings drawer. Set it to ask to show approval cards inline before agent actions run, or to deny to keep tool calls visible while denying execution.
Approval cards show the tool metadata and decision controls, with collapsible arguments when useful. Choose Approve, Always, or Deny; Always sets remember_for_session for the current chat session.
For HTTP, ask is only supported with stream: true. The stream emits agentic_tool_approval_required, then waits while the app resolves that approval.
{ "model": "default", "stream": true, "messages": [ {"role": "user", "content": "Use Python to inspect data.csv."} ], "enable_code_execution": true, "agent_permission": "ask", "session_id": "analysis-demo"}The approval event contains stable public metadata for app display and routing:
event: agentic_tool_approval_requireddata: {"approval_id":"appr_abc123","session_id":"analysis-demo","round":1,"tool":{"source":"built_in","kind":"code_execution","label":"Python code"},"arguments":{"code":"...","outputs":[]}}Resolve it with POST /v1/agent/approvals/{approval_id}:
{"decision":"approve","remember_for_session":true}decision is approve or deny. A deny response may include message, which is returned to the model as the tool result. remember_for_session: true on an approve response is the HTTP version of “always for this chat”: later actions in the same session_id do not ask again.
If an approval is not resolved, the action is denied after five minutes.
The approval endpoint returns {"status":"resolved"}, {"status":"queued"}, or {"status":"not_found"}. See the HTTP API reference for the exact wire schema and the HTTP approval example for a complete client.
Python SDK
Section titled “Python SDK”For Python, set agent_permission and pass an agent_approval_callback on the request. The callback receives an AgentToolApproval with approval_id, session_id, round, stable tool metadata, arguments_json, and a convenience code field when the action is Python code. Return True or False for simple callbacks, or return AgentToolApprovalDecision for deny messages and remember_for_session.
from mistralrs import ( AgentPermission, AgentToolApprovalDecision, AgentToolKind, ChatCompletionRequest,)
def approve(call): print(call.tool.label) if call.tool.kind == AgentToolKind.CodeExecution: print(call.code or "") else: print(call.arguments_json) answer = input("Approve? [y/N/a] ").strip().lower() if answer == "a": return AgentToolApprovalDecision.approve(remember_for_session=True) if answer in {"y", "yes"}: return AgentToolApprovalDecision.approve() return AgentToolApprovalDecision.deny("The user denied this action.")
request = ChatCompletionRequest( model="default", messages=[{"role": "user", "content": "Plot sin(x)."}], enable_code_execution=True, agent_permission=AgentPermission.Ask, agent_approval_callback=approve,)See the Python approval example and the Python agent approval reference.
Rust SDK
Section titled “Rust SDK”For Rust, set AgentPermission::Ask and pass an AgentToolApprovalCallback. The callback receives approval_id, session_id, round, stable tool metadata, and JSON arguments. Return AgentToolApprovalDecision::approve(), approve_for_session(), deny(None), or deny_with_message(...).
use std::sync::Arc;
use mistralrs::{ AgentPermission, AgentToolApprovalCallback, AgentToolApprovalDecision, RequestBuilder,};
let approval: AgentToolApprovalCallback = Arc::new(|approval| { println!("{}", approval.tool.label); println!("{}", approval.arguments); AgentToolApprovalDecision::approve_for_session()});
let request = RequestBuilder::from(messages) .with_code_execution() .with_agent_permission(AgentPermission::Ask) .with_agent_approval_callback(approval);Rust also supports with_agent_approval_async_callback for approval flows backed by async state, such as a UI event, database row, or message queue.
Both Rust callback forms use the same approval semantics. If an approval handler fails, the action is denied.
let request = RequestBuilder::from(messages) .with_code_execution() .with_agent_permission(AgentPermission::Ask) .with_agent_approval_async_callback(|approval| async move { println!("{}", approval.tool.label); AgentToolApprovalDecision::approve() });See the Rust approval example.
HTTP run stream
Section titled “HTTP run stream”Start a server with the tools your app is allowed to use:
mistralrs serve --agent -m google/gemma-4-E4B-it(--agent is a shorthand for --enable-search --enable-code-execution; the UI is on by default.)
Send a streaming chat-completions request:
{ "model": "default", "stream": true, "messages": [ { "role": "user", "content": "Use Python to plot sin(x), then explain the chart." } ], "enable_code_execution": true, "agent_permission": "auto", "web_search_options": {}, "max_tool_rounds": 4, "session_id": "analysis-demo"}Model output arrives as standard chat-completion chunks. Tool progress arrives as named SSE events:
event: agentic_tool_call_progressdata: {"type":"agentic_tool_call_progress","round":0,"tool_name":"mistralrs_execute_python","phase":"calling","data":{"tool_type":"code_execution","code":"print('hello')"}}A complete code-execution event can include captured output and media:
{ "type": "agentic_tool_call_progress", "round": 0, "tool_name": "mistralrs_execute_python", "phase": "complete", "data": { "tool_type": "code_execution", "stdout": "saved plot\n", "images_base64": ["..."], "video_frames_base64": ["..."], "video_frame_count": 12, "working_directory": "/tmp/mistralrs-code-demo", "execution_time_ms": 118 }}Event fields
Section titled “Event fields”| Field | Meaning |
|---|---|
type | agentic_tool_call_progress. |
round | Agentic loop round. |
tool_name | Tool that started or completed. |
phase | calling or complete. |
data.tool_type | code_execution, web_search, or custom. |
Tool-specific data:
| Tool type | Fields |
|---|---|
code_execution | code, stdout, stderr, exception, images_base64, video_frames_base64, video_frame_count, working_directory, execution_time_ms. |
web_search | query, results_count. |
custom | arguments, content. |
A File is a typed output produced by a tool (typically code execution). Each file has a stable id, a name, a format, a mime type, a size in bytes, and either an inline body or a reference for fetching it. Files are first-class on the wire: they ride alongside the model transcript, not buried inside tool output strings.
Declare required outputs on the request to give the model a contract:
{ "model": "default", "messages": [ {"role": "user", "content": "Generate a sin(x) plot and a CSV of the samples."} ], "enable_code_execution": true, "files": [ {"name": "plot.png", "format": "png"}, {"name": "samples.csv", "format": "csv", "description": "x, sin(x) columns"} ]}The non-streaming response carries the produced files in a top-level files array:
{ "files": [ { "id": "file_abc_r0_0", "name": "plot.png", "format": "png", "mime_type": "image/png", "bytes": 14823, "source": {"tool": "mistralrs_execute_python", "round": 0, "turn": 0}, "data_base64": "iVBORw0KGgo..." }, { "id": "file_abc_r0_1", "name": "samples.csv", "format": "csv", "mime_type": "text/csv", "bytes": 412, "source": {"tool": "mistralrs_execute_python", "round": 0, "turn": 0}, "text": "x,y\n0,0\n..." } ]}When streaming, each file is emitted as soon as it is produced via a named SSE event:
event: file_produceddata: {"id":"file_abc_r0_0","name":"plot.png","format":"png","mime_type":"image/png","bytes":14823,"source":{"tool":"mistralrs_execute_python","round":0,"turn":0},"data_base64":"iVBORw0KGgo..."}Each agentic_tool_calls[*] record gains a file_ids field that lists the ids of files attributable to that round, so apps can correlate files with the tool that wrote them.
Size policy
Section titled “Size policy”| Body | Inline | Above the cap |
|---|---|---|
| Up to 8 MB | Full body in text / data_base64. | The body field is omitted; fetch via GET /v1/files/{id}/content. |
Inside the model’s context, text files only ever expose the first 1024 bytes as a preview; the model uses read_file to inspect more. is_truncated() on the SDK File returns true when the wire body was elided.
Model tools
Section titled “Model tools”When the request declares files or enable_code_execution is on, the model gets two helper tools registered automatically:
read_file(file_id, start?, end?)— fetch part or all of a file the model has emitted.list_files()— enumerate files produced so far in this turn.
The Python executor tool also accepts an outputs: [string] parameter the model can use to declare what it wrote. Files declared via request.files are surfaced regardless.
Fetching by id
Section titled “Fetching by id”GET /v1/files/{id}/content returns the raw bytes with Content-Type, Content-Length, and Content-Disposition set. GET /v1/files/{id} returns the OpenAI-style metadata JSON. Status codes for the content endpoint:
| Code | Meaning |
|---|---|
| 200 | Body returned. |
| 404 | Unknown or expired file id. |
| 410 | File body was elided (request via the producing run instead). |
| 422 | The file is an error placeholder. |
Sessions
Section titled “Sessions”Use session_id when your app needs continuity across requests. Sessions can preserve tool history, generated media references, and code-execution state.
| Endpoint | Purpose |
|---|---|
GET /v1/sessions/{session_id} | Export a serialized session. |
PUT /v1/sessions/{session_id} | Import or replace a session. |
DELETE /v1/sessions/{session_id} | Delete a session. |
SDK boundaries
Section titled “SDK boundaries”| Surface | Current behavior |
|---|---|
| HTTP | Best surface for live model chunks, tool-progress timelines, files, and agent approval events. |
| Rust SDK | Model::stream_chat_request yields raw Response::AgenticToolCallProgress events. |
| Python SDK | Supports agentic requests, callbacks, code execution, and sessions. The streaming iterator currently yields model chunks; use HTTP SSE for the full timeline. |
| Web UI | Renders code execution, search, reasoning blocks, generated media, and approval cards inline. |
Security
Section titled “Security”Code execution runs with the permissions of the configured Python interpreter. Use agent_permission: "ask" or "deny" per request when an app needs tighter control over any server-executed agent action; a server-wide --agent-permission ask or deny cannot be loosened by the request. HTTP "ask" approval is app-driven over SSE, not a server terminal prompt. For untrusted users, run mistral.rs in a container or VM, use a low-privilege user, and constrain network access.