HTTP API reference

mistral.rs implements the OpenAI Chat Completions API, the Responses API, and a few mistral.rs-specific endpoints. This page lists every path with its request and response shape.

Fields not documented here are either standard OpenAI fields (passed through unchanged) or ignored. mistral.rs-specific extensions are called out explicitly.

Chat completion request.

{
  "model": "default",
  "messages": [ ... ],
  "max_tokens": 256,
  "temperature": 0.7,
  "stream": false,
  "tools": [ ... ],
  "tool_choice": "auto",
  "session_id": "optional-string",
  "web_search_options": { ... },
  "enable_code_execution": false,
  "agent_permission": "auto",
  "max_tool_rounds": 4
}

tools accepts OpenAI-compatible function tool definitions. mistral.rs also honors tools[*].function.strict: true, which constrains generated tool arguments to the tool’s parameters JSON Schema. See strict tool calling.
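As a rough illustration of the request shape above, the following sketch assembles a chat request body with one strict function tool. The `build_chat_request` helper and the `get_weather` tool are hypothetical names made up for this example; only the field names come from this page.

```python
import json


def build_chat_request(user_text, tools=None, **extensions):
    """Assemble a chat completion request body as a plain dict."""
    body = {
        "model": "default",
        "messages": [{"role": "user", "content": user_text}],
    }
    if tools:
        body["tools"] = tools
        body["tool_choice"] = "auto"
    # mistral.rs-specific fields (session_id, max_tool_rounds, ...) are
    # passed through as extra top-level keys.
    body.update(extensions)
    return body


# Hypothetical tool, used only for illustration.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        # strict: true asks the server to constrain generated arguments
        # to the JSON Schema under "parameters".
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

request_body = build_chat_request(
    "What is the weather in Paris?",
    tools=[get_weather],
    max_tool_rounds=4,
)
print(json.dumps(request_body, indent=2))
```

The resulting dict can be serialized and sent with any HTTP client.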

Response (non-streaming):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "default",
  "choices": [ ... ],
  "usage": { ... },
  "session_id": "...",
  "agentic_tool_calls": [ ... ]
}

mistral.rs-specific request fields include session_id, web_search_options, enable_code_execution, agent_permission, max_tool_rounds, and files. The server must be started with the corresponding capabilities, such as --enable-search or --enable-code-execution.

agent_permission accepts "auto", "ask", or "deny" and applies to server-executed agent actions: code execution, web search, file tools, registered callbacks, and external tool dispatch. code_execution_permission is accepted as a compatibility alias. See agent permissions for the shared behavior across CLI, HTTP, Python, and Rust.

Over HTTP, "ask" requires stream: true. The stream emits a named agentic_tool_approval_required event when an action needs approval, then waits for the app to approve or deny it with POST /v1/agent/approvals/{approval_id}. Non-streaming chat requests with "ask" return a validation error.
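A client can catch the "ask" without streaming case before sending the request. The check below is a minimal sketch: `check_agent_permission` is a hypothetical helper, and preferring agent_permission over the code_execution_permission alias when both are present is an assumption, not documented behavior.

```python
def check_agent_permission(body):
    """Return the effective permission, or raise ValueError for a bad request.

    Returns None when neither field is set, leaving the choice to the server.
    """
    # Assumption: agent_permission wins if both it and the alias are present.
    permission = body.get("agent_permission", body.get("code_execution_permission"))
    if permission is None:
        return None
    if permission not in ("auto", "ask", "deny"):
        raise ValueError(f"unsupported agent_permission: {permission!r}")
    if permission == "ask" and not body.get("stream", False):
        # Mirrors the server-side rule: "ask" needs a stream to carry
        # the approval events.
        raise ValueError('agent_permission "ask" requires "stream": true over HTTP')
    return permission


print(check_agent_permission({"agent_permission": "ask", "stream": True}))
```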

mistral.rs-specific response fields: session_id (string), agentic_tool_calls (array of tool-call records from the agentic loop, each with a file_ids array), files (array of File objects produced during the request).

When stream: true, the response is a stream of Server-Sent Events: unnamed data: lines carry chat completion chunks, named agentic_tool_call_progress events carry tool-loop milestones, named agentic_tool_approval_required events carry pending agent approvals, and named file_produced events carry each typed file emitted during the run. The stream terminates with data: [DONE].

Approval event:

event: agentic_tool_approval_required
data: {"approval_id":"appr_abc123","session_id":"...","round":1,"tool":{"source":"built_in","kind":"code_execution","label":"Python code"},"arguments":{"code":"...","outputs":[]}}

Resolve the approval:

POST /v1/agent/approvals/{approval_id}
Content-Type: application/json
{"decision":"deny","remember_for_session":false,"message":"Do not run code for this request."}

decision is "approve" or "deny". Set remember_for_session: true on an approve response to allow later agent actions in the same session_id without another approval event. A deny response may include message; that text is returned to the model as the tool result.

Unanswered approvals are denied after five minutes.

The endpoint returns {"status":"resolved"} when the waiting tool call was released, {"status":"queued"} if the app answered before the runtime started waiting, and {"status":"not_found"} for an unknown or expired approval ID.
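The approval call can be sketched with the standard library alone. The helper name `build_approval_request` and the base URL are assumptions for illustration; the path, body fields, and status values are the ones described above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234"  # assumption: adjust to your server address


def build_approval_request(approval_id, decision, remember_for_session=False,
                           message=None, base_url=BASE_URL):
    """Build (but do not send) the POST that resolves a pending approval."""
    if decision not in ("approve", "deny"):
        raise ValueError('decision must be "approve" or "deny"')
    payload = {"decision": decision, "remember_for_session": remember_for_session}
    if message is not None:
        # On a deny, this text is returned to the model as the tool result.
        payload["message"] = message
    return urllib.request.Request(
        f"{base_url}/v1/agent/approvals/{approval_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_approval_request(
    "appr_abc123", "deny", message="Do not run code for this request."
)
print(req.get_method(), req.full_url)
```

Sending it is then `urllib.request.urlopen(req)`; the JSON reply carries a status of "resolved", "queued", or "not_found", and a client should treat "not_found" as an approval that has already expired or never existed.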

For app-facing tool timelines, generated media fields, and sessions, see agentic runtime for apps.

Text completion (non-chat). Schema is OpenAI-compatible. Supported mistral.rs extensions: top_k, min_p, repetition_penalty, dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakers, grammar, truncate_sequence. The chat-only fields (session_id, enable_code_execution, agent_permission, files, web_search_options, enable_thinking, reasoning_effort, max_tool_rounds) have no effect on this endpoint.

Embedding request. input and encoding_format ("float" or "base64") are supported. Passing dimensions returns an error. Extension: truncate_sequence.

Image generation. Uses height and width in place of OpenAI’s size. response_format defaults to "Url". See the image generation guide.

Text to speech. model and input are supported. response_format accepts only wav and pcm; other OpenAI values return a validation error. voice, speed, and instructions are ignored.

Lists loaded models.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "default",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local",
      "status": "loaded",
      "tools_available": true,
      "mcp_tools_count": 5,
      "mcp_servers_connected": 1
    }
  ]
}

Status values: loaded, unloaded, reloading.
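Because the list can include models that are not currently in memory, a client may want to filter on status before routing a request. A minimal sketch, using a sample payload shaped like the response above (the second entry is invented for illustration):

```python
def loaded_model_ids(models_response):
    """Return the ids of models whose status is "loaded"."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if model.get("status") == "loaded"
    ]


# Sample payload in the documented shape; the "qwen" entry is illustrative.
sample = {
    "object": "list",
    "data": [
        {"id": "default", "object": "model", "status": "loaded"},
        {"id": "qwen", "object": "model", "status": "unloaded"},
    ],
}

print(loaded_model_ids(sample))
```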

OpenAI Responses API. Schema matches OpenAI’s spec. See the Responses guide for supported and unsupported fields.

Function tools in Responses requests also accept strict: true and use the same strict tool-calling path as Chat Completions.

Retrieve a response by id.

Delete a response.

Cancel a background response.

Unload a model, freeing its memory.

{ "model_id": "qwen" }

Response: { "model_id": "qwen", "status": "unloaded" }.

Reload a previously unloaded model.

{ "model_id": "qwen" }

Response: { "model_id": "qwen", "status": "loaded" }.

Query a model’s current status.

{ "model_id": "qwen" }

Launch a tune run.

Export an agentic session. Response is a SerializedSession object with messages, tool-call history, images, and videos. Returns 404 if the session does not exist.

Import a session. Body is a SerializedSession produced by a previous GET. Replaces any existing session with the same id.

Delete a session. Always returns 200 whether the session existed or not.

Returns 200 when the server is up. Does not verify model load status.

Returns system information (OS, memory, GPUs, mistralrs version).

Returns a diagnostic report equivalent to mistralrs doctor output.

Re-apply ISQ to the loaded model.

Streaming responses are Server-Sent Events. Default (unnamed) data: lines carry chat completion chunks in OpenAI format; the stream ends with data: [DONE]. Named events are used for the agentic timeline:

| Event | Body |
| --- | --- |
| (default data:) | Chat completion chunk in OpenAI format. Stream terminator is data: [DONE]. |
| agentic_tool_call_progress | Tool-loop progress. Includes round, tool_name, phase (calling or complete), and structured data. |
| file_produced | A File object emitted during the run. Each file is sent once. |

In tool-progress events, data.tool_type is code_execution, web_search, or custom. Code execution events can include images_base64 and video_frames_base64.
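To make the event routing concrete, here is a small stdlib-only sketch that splits an SSE stream into unnamed chunks and named events. It is not the project's client: `parse_sse` and `collect_stream` are hypothetical helpers, the sample lines are invented to match the shapes described on this page, and only the subset of the SSE format used here (event:, data:, comments, blank-line separators) is handled.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs; event_name is None for unnamed events."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = None, []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        yield event_name, "\n".join(data_lines)


def collect_stream(lines):
    """Group a stream into chunks, tool progress, files, and approvals."""
    out = {"chunks": [], "progress": [], "files": [], "approvals": []}
    named = {
        "agentic_tool_call_progress": "progress",
        "file_produced": "files",
        "agentic_tool_approval_required": "approvals",
    }
    for name, data in parse_sse(lines):
        if name is None:
            if data == "[DONE]":
                break
            out["chunks"].append(json.loads(data))
        elif name in named:
            out[named[name]].append(json.loads(data))
    return out


# Invented sample stream in the documented shapes.
sample_stream = [
    'event: agentic_tool_call_progress',
    'data: {"round":0,"tool_name":"mistralrs_execute_python","phase":"calling","data":{"tool_type":"code_execution"}}',
    '',
    'event: file_produced',
    'data: {"id":"file_abc_r0_0","name":"plot.png","format":"png"}',
    '',
    'data: {"object":"chat.completion.chunk","choices":[]}',
    '',
    'data: [DONE]',
    '',
]

result = collect_stream(sample_stream)
print({key: len(value) for key, value in result.items()})
```

In a real client, `lines` would be the decoded lines of the HTTP response body read incrementally rather than a list.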

mistral.rs returns typed file outputs from agentic runs as first-class objects, separate from the model transcript.

| Field | Type | Notes |
| --- | --- | --- |
| files[].name | string | Filename. Required. |
| files[].format | string | Format hint (png, csv, json, …). Inferred from the extension if omitted. |
| files[].description | string | Optional hint surfaced to the model. |

Example:

{
  "model": "default",
  "messages": [
    {"role": "user", "content": "Plot sin(x) and save as plot.png."}
  ],
  "enable_code_execution": true,
  "files": [{"name": "plot.png", "format": "png"}]
}

The non-streaming response gains a top-level files array of File objects:

| Field | Type | Notes |
| --- | --- | --- |
| id | string | Stable id, format file_<run>_r<round>_<idx>. |
| name | string | Filename as written. |
| format | string | Open-ended format string. |
| mime_type | string | Content-Type. |
| bytes | integer | Body size. |
| created_at | integer | Unix epoch seconds. |
| source | object | {"tool", "round", "turn"} attribution. |
| text | string | Full text body for text files. Absent if elided. |
| preview | string | Short UTF-8 preview for text files. |
| data_base64 | string | Base64 body for binary files. Absent if elided. |
| code, message | strings | Present if the file failed to materialize. |

Each entry in agentic_tool_calls carries a file_ids array listing the files attributable to that round.

Example response:

{
  "files": [
    {
      "id": "file_abc_r0_0",
      "name": "plot.png",
      "format": "png",
      "mime_type": "image/png",
      "bytes": 14823,
      "source": {"tool": "mistralrs_execute_python", "round": 0, "turn": 0},
      "data_base64": "iVBORw0KGgo..."
    }
  ],
  "agentic_tool_calls": [
    {
      "round": 0,
      "name": "mistralrs_execute_python",
      "file_ids": ["file_abc_r0_0"]
    }
  ]
}
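The file_ids arrays make it possible to attribute each file to the tool call that produced it. One way to join the two arrays, sketched with a hypothetical `files_per_tool_call` helper and a trimmed copy of the example response:

```python
def files_per_tool_call(response):
    """Return [(round, tool_name, [file names])] for each agentic tool call."""
    by_id = {f["id"]: f for f in response.get("files", [])}
    rows = []
    for call in response.get("agentic_tool_calls", []):
        names = [
            by_id[file_id]["name"]
            for file_id in call.get("file_ids", [])
            if file_id in by_id  # skip ids with no matching File object
        ]
        rows.append((call.get("round"), call.get("name"), names))
    return rows


# Trimmed copy of the example response above.
response = {
    "files": [{"id": "file_abc_r0_0", "name": "plot.png", "format": "png"}],
    "agentic_tool_calls": [
        {"round": 0, "name": "mistralrs_execute_python", "file_ids": ["file_abc_r0_0"]}
    ],
}

for row in files_per_tool_call(response):
    print(row)
```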

Streaming requests emit each file as soon as it is produced. The body is the same File JSON as the non-streaming files[] entry.

event: file_produced
data: {"id":"file_abc_r0_0","name":"plot.png","format":"png","mime_type":"image/png","bytes":14823,"source":{"tool":"mistralrs_execute_python","round":0,"turn":0},"data_base64":"iVBORw0KGgo..."}

Bodies up to 8 MB ship inline (as text or data_base64). Above the cap, the body field is omitted and the client fetches the raw bytes via GET /v1/files/{id}/content. Inside the model’s context, only the first 1024 bytes of a text file are visible as a preview; the model uses read_file to inspect more.
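A client therefore has four cases to handle for each File object: an error placeholder, inline text, inline base64, or a body that must be fetched separately. The sketch below shows one way to branch on them; `resolve_file_body` is a hypothetical helper, and checking the code field first is an assumption about how to prioritize the cases.

```python
import base64


def resolve_file_body(file_obj):
    """Return (kind, payload) for a File object.

    kind is "error", "text", "binary", or "fetch". For "fetch", the payload
    is the path to request the raw bytes from.
    """
    if "code" in file_obj:
        # The file failed to materialize; code/message describe why.
        return "error", file_obj.get("message", "")
    if "text" in file_obj:
        return "text", file_obj["text"]
    if "data_base64" in file_obj:
        return "binary", base64.b64decode(file_obj["data_base64"])
    # No inline body: fetch it from the content endpoint.
    return "fetch", f"/v1/files/{file_obj['id']}/content"


inline_png = {
    "id": "file_abc_r0_0",
    "name": "plot.png",
    "data_base64": base64.b64encode(b"\x89PNG").decode("ascii"),
}
elided = {"id": "file_abc_r0_1", "name": "big.csv", "bytes": 20000000}

print(resolve_file_body(inline_png))
print(resolve_file_body(elided))
```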

OpenAI-compatible Files endpoints. Upload (POST /v1/files) is not implemented; files arrive via agentic tool calls.

| Method | Path | Returns |
| --- | --- | --- |
| GET | /v1/files | {object: "list", data: [<File metadata>]} |
| GET | /v1/files/{id} | File metadata JSON |
| GET | /v1/files/{id}/content | Raw bytes (Content-Type, Content-Length, Content-Disposition) |
| DELETE | /v1/files/{id} | {id, object: "file", deleted: bool} |

File metadata shape:

{
  "id": "file_abc_r0_0",
  "object": "file",
  "bytes": 14823,
  "created_at": 1735632000,
  "filename": "plot.png",
  "purpose": "agent_output",
  "format": "png",
  "mime_type": "image/png",
  "source": {"tool": "mistralrs_execute_python", "round": 0, "turn": 0}
}

/v1/files/{id}/content response codes:

| Code | Meaning |
| --- | --- |
| 200 | Body returned. |
| 404 | Unknown or expired file id. |
| 410 | File body was elided. |
| 422 | The file is an error placeholder. |
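A download helper can map these codes to distinct outcomes instead of treating every non-200 as a generic failure. This is a sketch under assumptions: the function names, the outcome labels, and the base URL are invented for illustration, and `fetch_file_content` needs a running server to do anything useful.

```python
import urllib.error
import urllib.request

# Outcome labels are illustrative; the codes are the documented ones.
CONTENT_OUTCOMES = {
    200: "ok",
    404: "unknown_or_expired",
    410: "body_elided",
    422: "error_placeholder",
}


def classify_content_status(code):
    """Map a /v1/files/{id}/content status code to an outcome label."""
    return CONTENT_OUTCOMES.get(code, "unexpected")


def fetch_file_content(file_id, base_url="http://localhost:1234"):
    """Return (outcome, body_or_None). The base URL is an assumption."""
    url = f"{base_url}/v1/files/{file_id}/content"
    try:
        with urllib.request.urlopen(url) as resp:
            return classify_content_status(resp.status), resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is on the exception.
        return classify_content_status(err.code), None


print(classify_content_status(410))
```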

See the OpenAI compatibility reference for the supported and unsupported fields.