HTTP API reference
mistral.rs implements the OpenAI Chat Completions API, the Responses API, and a few mistral.rs-specific endpoints. This page lists every path with its request and response shape.
Fields not documented here are either standard OpenAI fields (pass through unchanged) or ignored. mistral.rs-specific extensions are called out explicitly.
Core endpoints
Section titled “Core endpoints”POST /v1/chat/completions
Section titled “POST /v1/chat/completions”Chat completion request.
{ "model": "default", "messages": [ ... ], "max_tokens": 256, "temperature": 0.7, "stream": false, "tools": [ ... ], "tool_choice": "auto", "session_id": "optional-string", "web_search_options": { ... }, "enable_code_execution": false, "agent_permission": "auto", "max_tool_rounds": 4}tools accepts OpenAI-compatible function tool definitions. mistral.rs also honors tools[*].function.strict: true, which constrains generated tool arguments to the tool’s parameters JSON Schema. See strict tool calling.
Response (non-streaming):
{ "id": "chatcmpl-...", "object": "chat.completion", "created": 1234567890, "model": "default", "choices": [ ... ], "usage": { ... }, "session_id": "...", "agentic_tool_calls": [ ... ]}mistral.rs-specific request fields include session_id, web_search_options, enable_code_execution, agent_permission, max_tool_rounds, and files. The server must be started with the corresponding capabilities, such as --enable-search or --enable-code-execution.
agent_permission accepts "auto", "ask", or "deny" and applies to server-executed agent actions: code execution, web search, file tools, registered callbacks, and external tool dispatch. code_execution_permission is accepted as a compatibility alias. See agent permissions for the shared behavior across CLI, HTTP, Python, and Rust.
Over HTTP, "ask" requires stream: true. The stream emits a named agentic_tool_approval_required event when an action needs approval, then waits for the app to approve or deny it with POST /v1/agent/approvals/{approval_id}. Non-streaming chat requests with "ask" return a validation error.
mistral.rs-specific response fields: session_id (string), agentic_tool_calls (array of tool-call records from the agentic loop, each with a file_ids array), files (array of File objects produced during the request).
When stream: true, the response is Server-Sent Events: unnamed data: lines carry chat completion chunks, named agentic_tool_call_progress events carry tool-loop milestones, named agentic_tool_approval_required events carry pending agent approvals, and named file_produced events carry each typed file emitted during the run. Stream terminates with data: [DONE].
Approval event:
event: agentic_tool_approval_requireddata: {"approval_id":"appr_abc123","session_id":"...","round":1,"tool":{"source":"built_in","kind":"code_execution","label":"Python code"},"arguments":{"code":"...","outputs":[]}}Resolve the approval:
POST /v1/agent/approvals/{approval_id}Content-Type: application/json
{"decision":"deny","remember_for_session":false,"message":"Do not run code for this request."}decision is "approve" or "deny". Set remember_for_session: true on an approve response to allow later agent actions in the same session_id without another approval event. A deny response may include message; that text is returned to the model as the tool result.
Unanswered approvals are denied after five minutes.
The endpoint returns {"status":"resolved"} when the waiting tool call was released, {"status":"queued"} if the app answered before the runtime started waiting, and {"status":"not_found"} for an unknown or expired approval ID.
For app-facing tool timelines, generated media fields, and sessions, see agentic runtime for apps.
POST /v1/completions
Section titled “POST /v1/completions”Text completion (non-chat). Schema is OpenAI-compatible. Supported mistralrs extensions: top_k, min_p, repetition_penalty, dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakers, grammar, truncate_sequence. The chat-only fields (session_id, enable_code_execution, agent_permission, files, web_search_options, enable_thinking, reasoning_effort, max_tool_rounds) have no effect on this endpoint.
POST /v1/embeddings
Section titled “POST /v1/embeddings”Embedding request. input, encoding_format ("float" or "base64") supported. dimensions returns an error. Extension: truncate_sequence.
POST /v1/images/generations
Section titled “POST /v1/images/generations”Image generation. Uses height and width in place of OpenAI’s size. response_format defaults to "Url". See the image generation guide.
POST /v1/audio/speech
Section titled “POST /v1/audio/speech”Text to speech. model and input supported. response_format accepts only wav and pcm; other OpenAI values return a validation error. voice, speed, instructions are ignored.
GET /v1/models
Section titled “GET /v1/models”Lists loaded models.
Response:
{ "object": "list", "data": [ { "id": "default", "object": "model", "created": 1234567890, "owned_by": "local", "status": "loaded", "tools_available": true, "mcp_tools_count": 5, "mcp_servers_connected": 1 } ]}Status values: loaded, unloaded, reloading.
Responses API
Section titled “Responses API”POST /v1/responses
Section titled “POST /v1/responses”OpenAI Responses API. Schema matches OpenAI’s spec. See the Responses guide for supported and unsupported fields.
Function tools in Responses requests also accept strict: true and use the same strict tool-calling path as Chat Completions.
GET /v1/responses/{id}
Section titled “GET /v1/responses/{id}”Retrieve a response by id.
DELETE /v1/responses/{id}
Section titled “DELETE /v1/responses/{id}”Delete a response.
POST /v1/responses/{id}/cancel
Section titled “POST /v1/responses/{id}/cancel”Cancel a background response.
Model management
Section titled “Model management”POST /v1/models/unload
Section titled “POST /v1/models/unload”Unload a model, freeing its memory.
{ "model_id": "qwen" }Response: { "model_id": "qwen", "status": "unloaded" }.
POST /v1/models/reload
Section titled “POST /v1/models/reload”Reload a previously unloaded model.
{ "model_id": "qwen" }Response: { "model_id": "qwen", "status": "loaded" }.
POST /v1/models/status
Section titled “POST /v1/models/status”Query a model’s current status.
{ "model_id": "qwen" }POST /v1/models/tune
Section titled “POST /v1/models/tune”Launch a tune run.
Session management
Section titled “Session management”GET /v1/sessions/{session_id}
Section titled “GET /v1/sessions/{session_id}”Export an agentic session. Response is a SerializedSession object with messages, tool-call history, images, and videos. Returns 404 if the session does not exist.
PUT /v1/sessions/{session_id}
Section titled “PUT /v1/sessions/{session_id}”Import a session. Body is a SerializedSession produced by a previous GET. Replaces any existing session with the same id.
DELETE /v1/sessions/{session_id}
Section titled “DELETE /v1/sessions/{session_id}”Delete a session. Always returns 200 whether the session existed or not.
System
Section titled “System”GET /health
Section titled “GET /health”Returns 200 when the server is up. Does not verify model load status.
GET /v1/system/info
Section titled “GET /v1/system/info”Returns system information (OS, memory, GPUs, mistralrs version).
POST /v1/system/doctor
Section titled “POST /v1/system/doctor”Returns a diagnostic report equivalent to mistralrs doctor output.
POST /re_isq
Section titled “POST /re_isq”Re-apply ISQ to the loaded model.
Streaming event types
Section titled “Streaming event types”Streaming responses are Server-Sent Events. Default (unnamed) data: lines carry chat completion chunks in OpenAI format; the stream ends with data: [DONE]. Named events are used for the agentic timeline:
| Event | Body |
|---|---|
(default data:) | Chat completion chunk in OpenAI format. Stream terminator is data: [DONE]. |
agentic_tool_call_progress | Tool-loop progress. Includes round, tool_name, phase (calling or complete), and structured data. |
file_produced | A File object emitted during the run. Each file is sent once. |
Tool-progress data.tool_type is code_execution, web_search, or custom. Code execution events can include images_base64 and video_frames_base64.
mistral.rs returns typed file outputs from agentic runs as first-class objects, separate from the model transcript.
Request schema
Section titled “Request schema”| Field | Type | Notes |
|---|---|---|
files[].name | string | Filename. Required. |
files[].format | string | Format hint (png, csv, json, …). Inferred from the extension if omitted. |
files[].description | string | Optional hint surfaced to the model. |
Example:
{ "model": "default", "messages": [ {"role": "user", "content": "Plot sin(x) and save as plot.png."} ], "enable_code_execution": true, "files": [{"name": "plot.png", "format": "png"}]}Response schema
Section titled “Response schema”The non-streaming response gains a top-level files array of File objects:
| Field | Type | Notes |
|---|---|---|
id | string | Stable id, format file_<run>_r<round>_<idx>. |
name | string | Filename as written. |
format | string | Open-ended format string. |
mime_type | string | Content-Type. |
bytes | integer | Body size. |
created_at | integer | Unix epoch seconds. |
source | object | {"tool", "round", "turn"} attribution. |
text | string | Full text body for text files. Absent if elided. |
preview | string | Short UTF-8 preview for text files. |
data_base64 | string | Base64 body for binary files. Absent if elided. |
code, message | strings | Present if the file failed to materialize. |
Each entry in agentic_tool_calls carries a file_ids array listing the files attributable to that round.
Example response:
{ "files": [ { "id": "file_abc_r0_0", "name": "plot.png", "format": "png", "mime_type": "image/png", "bytes": 14823, "source": {"tool": "mistralrs_execute_python", "round": 0, "turn": 0}, "data_base64": "iVBORw0KGgo..." } ], "agentic_tool_calls": [ { "round": 0, "name": "mistralrs_execute_python", "file_ids": ["file_abc_r0_0"] } ]}file_produced event
Section titled “file_produced event”Streaming requests emit each file as soon as it is produced. The body is the same File JSON as the non-streaming files[] entry.
event: file_produceddata: {"id":"file_abc_r0_0","name":"plot.png","format":"png","mime_type":"image/png","bytes":14823,"source":{"tool":"mistralrs_execute_python","round":0,"turn":0},"data_base64":"iVBORw0KGgo..."}Size policy
Section titled “Size policy”Bodies up to 8 MB ship inline (text or data_base64). Above the cap, the body field is omitted and the client fetches the raw bytes via GET /v1/files/{id}/content. Inside the model’s context, text files only ever see the first 1024 bytes as a preview; the model uses read_file to inspect more.
Files API
Section titled “Files API”OpenAI-compatible Files endpoints. Upload (POST /v1/files) is not implemented; files arrive via agentic tool calls.
| Method | Path | Returns |
|---|---|---|
GET | /v1/files | {object: "list", data: [<File metadata>]} |
GET | /v1/files/{id} | File metadata JSON |
GET | /v1/files/{id}/content | Raw bytes (Content-Type, Content-Length, Content-Disposition) |
DELETE | /v1/files/{id} | {id, object: "file", deleted: bool} |
File metadata shape:
{ "id": "file_abc_r0_0", "object": "file", "bytes": 14823, "created_at": 1735632000, "filename": "plot.png", "purpose": "agent_output", "format": "png", "mime_type": "image/png", "source": {"tool": "mistralrs_execute_python", "round": 0, "turn": 0}}/v1/files/{id}/content response codes:
| Code | Meaning |
|---|---|
| 200 | Body returned. |
| 404 | Unknown or expired file id. |
| 410 | File body was elided. |
| 422 | The file is an error placeholder. |
OpenAI compatibility
Section titled “OpenAI compatibility”See the OpenAI compatibility reference for the supported and unsupported fields.