Anthropic Messages API
mistral.rs exposes Anthropic-compatible Messages endpoints at POST /v1/messages
and POST /v1/messages/count_tokens. They run through the same local model,
scheduler, chat templates, multimodal handling, tool calling, and agentic runtime
as /v1/chat/completions. Anthropic clients use http://localhost:1234 as the
base URL (no /v1 suffix; the client appends /v1/messages itself).
Start the server as for the OpenAI-compatible API; both APIs are always served. For Claude Code configuration, see Use Codex and Claude Code.
Basic request
Section titled “Basic request”curl http://localhost:1234/v1/messages \ -H 'content-type: application/json' \ -H 'x-api-key: not-used' \ -H 'anthropic-version: 2023-06-01' \ -d '{ "model": "default", "max_tokens": 128, "system": "You are concise.", "messages": [ {"role": "user", "content": "Write a haiku about local inference."} ] }'Response shape:
{ "id": "0", "type": "message", "role": "assistant", "content": [ {"type": "text", "text": "..."} ], "model": "Qwen/Qwen3-4B", "stop_reason": "end_turn", "stop_sequence": null, "usage": { "input_tokens": 18, "output_tokens": 31 }}id is the server’s per-request sequence number and model echoes the loaded model’s name, not the "default" alias sent in the request.
The server accepts Anthropic headers for client compatibility, but does not validate
x-api-key. Put authentication in a reverse proxy when exposing the server to users.
Streaming
Section titled “Streaming”Set stream: true to receive Anthropic-style Server-Sent Events:
curl http://localhost:1234/v1/messages \ -H 'content-type: application/json' \ -H 'x-api-key: not-used' \ -H 'anthropic-version: 2023-06-01' \ -d '{ "model": "default", "max_tokens": 128, "stream": true, "messages": [{"role": "user", "content": "Count to three."}] }'The stream uses message_start, content_block_start, content_block_delta,
content_block_stop, message_delta, and message_stop events. It also emits
Anthropic ping events while idle. Text deltas use
{"type":"text_delta","text":"..."}. Thinking deltas use
{"type":"thinking_delta","thinking":"..."} when the model produces separate
reasoning content. Tool-call argument deltas use
{"type":"input_json_delta","partial_json":"..."}.
Count tokens
Section titled “Count tokens”Use POST /v1/messages/count_tokens with the same request shape to count the
input tokens after chat-template formatting:
curl http://localhost:1234/v1/messages/count_tokens \ -H 'content-type: application/json' \ -H 'x-api-key: not-used' \ -H 'anthropic-version: 2023-06-01' \ -d '{ "model": "default", "messages": [{"role": "user", "content": "Count these tokens."}] }'Response:
{"input_tokens": 14}Supported fields
Section titled “Supported fields”Request fields:
| Anthropic field | Support |
|---|---|
model | Supported. Use default or a loaded model id. |
messages | Supports user and assistant messages with string content or content blocks. |
system | Supported as a top-level string or text-block array. |
max_tokens | Supported. |
temperature, top_p, top_k, min_p | Supported. |
stop_sequences | Supported. |
stream | Supported. |
tools | Client tools are converted to OpenAI-compatible function tools. Anthropic server tools for web_search_* and code_execution_* map to mistral.rs agentic features. |
tool_choice | auto, none, and specific client tool choices are supported. any is accepted as auto. Anthropic server-tool choices are accepted as auto. |
container.skills | Supports uploaded custom Skills with {"type":"custom","skill_id":"...","version":"latest"}. Anthropic-managed built-in Skills such as pptx, xlsx, docx, and pdf are not bundled. |
thinking | {"type":"enabled"} maps to mistral.rs thinking mode (reasoning content emitted by the model) when the loaded chat template supports it. |
enable_thinking, reasoning_effort | Supported as mistral.rs extensions. |
logit_bias, logprobs, top_logprobs | Supported as mistral.rs extensions. |
presence_penalty, frequency_penalty, repetition_penalty | Supported as mistral.rs extensions. |
response_format, grammar | Supported as mistral.rs extensions. Do not set both in one request. |
dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakers | Supported as mistral.rs extensions. |
metadata | Accepted for client compatibility. |
Content blocks:
| Block | Support |
|---|---|
text | Supported. |
image | Supports base64 and URL sources. Requires a multimodal model. |
tool_use | Supported on assistant messages. |
tool_result | Supported on user messages. Text results are forwarded as tool messages. |
thinking, redacted_thinking | Accepted in request history. Returned when the model exposes separate reasoning content. |
mistral.rs agentic extensions accepted on this endpoint: session_id,
web_search_options, enable_code_execution, agent_permission,
code_execution_permission, files, max_tool_rounds, and truncate_sequence.
Tool use
Section titled “Tool use”Anthropic tool definitions:
{ "tools": [ { "name": "get_weather", "description": "Get weather for a city.", "input_schema": { "type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"] } } ]}When the model asks for a tool, the response contains a tool_use content block:
{ "type": "tool_use", "id": "call-...", "name": "get_weather", "input": {"city": "Paris"}}Return the result in a later user message with a tool_result block:
{ "role": "user", "content": [ { "type": "tool_result", "tool_use_id": "call-...", "content": "Light rain, 12 C." } ]}For server-executed tools, use the same mistral.rs agent fields as Chat Completions.
Streaming may include mistral.rs named events such as agentic_tool_call_progress,
agentic_tool_approval_required, and file_produced.
Agentic server tools
Section titled “Agentic server tools”Anthropic web search server-tool declarations enable mistral.rs web search for
the request. The server must be started with search enabled, for example with
mistralrs serve --agent ... or mistralrs serve --enable-search .... Two
declaration types are accepted:
web_search_20250305enables web search only.web_search_20260209(Anthropic’s dynamic web-search variant) also enables code execution for the request, because it uses code-backed filtering. The server must additionally be started with code execution enabled.
{ "tools": [ { "type": "web_search_20250305", "name": "web_search", "user_location": { "type": "approximate", "city": "New York", "country": "US", "region": "NY", "timezone": "America/New_York" } } ]}Anthropic code-execution server-tool declarations enable the built-in Python
tool for the request. The server must be started with code execution enabled,
for example with mistralrs serve --agent ... or
mistralrs serve --enable-code-execution ....
{ "tools": [ {"type": "code_execution_20250825", "name": "code_execution"} ], "agent_permission": "auto"}You can combine both server tools. The Anthropic request field
tool_choice: {"type":"none"} disables both client and server tools for that
request.
Skills
Section titled “Skills”Anthropic-compatible Skills use the same uploaded skill store as
OpenAI-compatible Skills. Upload a zip or
multipart skill directory to POST /v1/skills, then reference the returned
custom skill id from container.skills on /v1/messages.
Start the server with shell execution enabled. --agent is recommended because
it also enables the rest of the local agent runtime:
mistralrs serve --agent -p 1234 -m Qwen/Qwen3-4BList and upload endpoints accept Anthropic headers. When an Anthropic header is
present, GET /v1/skills returns Anthropic-style fields such as
display_title, latest_version, source, has_more, and next_page.
source=custom returns uploaded local Skills; source=anthropic returns an
empty list because mistral.rs does not bundle Anthropic-managed built-in Skills.
Use the uploaded Skill in a Messages request:
curl http://localhost:1234/v1/messages \ -H 'content-type: application/json' \ -H 'x-api-key: not-used' \ -H 'anthropic-version: 2023-06-01' \ -H 'anthropic-beta: code-execution-2025-08-25,skills-2025-10-02' \ -d '{ "model": "default", "max_tokens": 1024, "agent_permission": "auto", "max_tool_rounds": 6, "container": { "skills": [ {"type": "custom", "skill_id": "skill_...", "version": "latest"} ] }, "tools": [ {"type": "code_execution_20250825", "name": "code_execution"} ], "messages": [ {"role": "user", "content": "Use the uploaded skill to complete the task."} ] }'Skills are mounted into the shell runtime. The Anthropic code execution tool
declaration is accepted for client compatibility and enables Python code
execution too, but the skill bundle itself is read from the shell workdir under
skills/<skill-name>/.
Generated artifacts are available the same way as other mistral.rs agentic
files. Non-streaming responses include a top-level files array, and streaming
responses emit file_produced events. Download bytes from
GET /v1/files/{file_id}/content.
Examples
Section titled “Examples”| Example | What it shows |
|---|---|
| anthropic_chat | Plain non-streaming Messages request. |
| anthropic_streaming | Anthropic SSE parsing. |
| anthropic_tool_calling | Client-side tool use with tool_use and tool_result. |
| anthropic_agentic | Anthropic server-tool declarations mapped to mistral.rs web search and code execution. |
| anthropic_skills | Upload a custom Skill and use it from container.skills. |