Skip to content

Anthropic Messages API

mistral.rs exposes Anthropic-compatible Messages endpoints at POST /v1/messages and POST /v1/messages/count_tokens. They run through the same local model, scheduler, chat templates, multimodal handling, tool calling, and agentic runtime as /v1/chat/completions. Anthropic clients use http://localhost:1234 as the base URL (no /v1 suffix; the client appends /v1/messages itself).

Start the server as for the OpenAI-compatible API; both APIs are always served. For Claude Code configuration, see Use Codex and Claude Code.

Terminal window
curl http://localhost:1234/v1/messages \
-H 'content-type: application/json' \
-H 'x-api-key: not-used' \
-H 'anthropic-version: 2023-06-01' \
-d '{
"model": "default",
"max_tokens": 128,
"system": "You are concise.",
"messages": [
{"role": "user", "content": "Write a haiku about local inference."}
]
}'

Response shape:

{
"id": "0",
"type": "message",
"role": "assistant",
"content": [
{"type": "text", "text": "..."}
],
"model": "Qwen/Qwen3-4B",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 18,
"output_tokens": 31
}
}

id is the server’s per-request sequence number and model echoes the loaded model’s name, not the "default" alias sent in the request.

The server accepts Anthropic headers for client compatibility, but does not validate x-api-key. Put authentication in a reverse proxy when exposing the server to users.

Set stream: true to receive Anthropic-style Server-Sent Events:

Terminal window
curl http://localhost:1234/v1/messages \
-H 'content-type: application/json' \
-H 'x-api-key: not-used' \
-H 'anthropic-version: 2023-06-01' \
-d '{
"model": "default",
"max_tokens": 128,
"stream": true,
"messages": [{"role": "user", "content": "Count to three."}]
}'

The stream uses message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop events. It also emits Anthropic ping events while idle. Text deltas use {"type":"text_delta","text":"..."}. Thinking deltas use {"type":"thinking_delta","thinking":"..."} when the model produces separate reasoning content. Tool-call argument deltas use {"type":"input_json_delta","partial_json":"..."}.

Use POST /v1/messages/count_tokens with the same request shape to count the input tokens after chat-template formatting:

Terminal window
curl http://localhost:1234/v1/messages/count_tokens \
-H 'content-type: application/json' \
-H 'x-api-key: not-used' \
-H 'anthropic-version: 2023-06-01' \
-d '{
"model": "default",
"messages": [{"role": "user", "content": "Count these tokens."}]
}'

Response:

{"input_tokens": 14}

Request fields:

Anthropic fieldSupport
modelSupported. Use default or a loaded model id.
messagesSupports user and assistant messages with string content or content blocks.
systemSupported as a top-level string or text-block array.
max_tokensSupported.
temperature, top_p, top_k, min_pSupported.
stop_sequencesSupported.
streamSupported.
toolsClient tools are converted to OpenAI-compatible function tools. Anthropic server tools for web_search_* and code_execution_* map to mistral.rs agentic features.
tool_choiceauto, none, and specific client tool choices are supported. any is accepted as auto. Anthropic server-tool choices are accepted as auto.
container.skillsSupports uploaded custom Skills with {"type":"custom","skill_id":"...","version":"latest"}. Anthropic-managed built-in Skills such as pptx, xlsx, docx, and pdf are not bundled.
thinking{"type":"enabled"} maps to mistral.rs thinking mode (reasoning content emitted by the model) when the loaded chat template supports it.
enable_thinking, reasoning_effortSupported as mistral.rs extensions.
logit_bias, logprobs, top_logprobsSupported as mistral.rs extensions.
presence_penalty, frequency_penalty, repetition_penaltySupported as mistral.rs extensions.
response_format, grammarSupported as mistral.rs extensions. Do not set both in one request.
dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakersSupported as mistral.rs extensions.
metadataAccepted for client compatibility.

Content blocks:

BlockSupport
textSupported.
imageSupports base64 and URL sources. Requires a multimodal model.
tool_useSupported on assistant messages.
tool_resultSupported on user messages. Text results are forwarded as tool messages.
thinking, redacted_thinkingAccepted in request history. Returned when the model exposes separate reasoning content.

mistral.rs agentic extensions accepted on this endpoint: session_id, web_search_options, enable_code_execution, agent_permission, code_execution_permission, files, max_tool_rounds, and truncate_sequence.

Anthropic tool definitions:

{
"tools": [
{
"name": "get_weather",
"description": "Get weather for a city.",
"input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
]
}

When the model asks for a tool, the response contains a tool_use content block:

{
"type": "tool_use",
"id": "call-...",
"name": "get_weather",
"input": {"city": "Paris"}
}

Return the result in a later user message with a tool_result block:

{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "call-...",
"content": "Light rain, 12 C."
}
]
}

For server-executed tools, use the same mistral.rs agent fields as Chat Completions. Streaming may include mistral.rs named events such as agentic_tool_call_progress, agentic_tool_approval_required, and file_produced.

Anthropic web search server-tool declarations enable mistral.rs web search for the request. The server must be started with search enabled, for example with mistralrs serve --agent ... or mistralrs serve --enable-search .... Two declaration types are accepted:

  • web_search_20250305 enables web search only.
  • web_search_20260209 (Anthropic’s dynamic web-search variant) also enables code execution for the request, because it uses code-backed filtering. The server must additionally be started with code execution enabled.
{
"tools": [
{
"type": "web_search_20250305",
"name": "web_search",
"user_location": {
"type": "approximate",
"city": "New York",
"country": "US",
"region": "NY",
"timezone": "America/New_York"
}
}
]
}

Anthropic code-execution server-tool declarations enable the built-in Python tool for the request. The server must be started with code execution enabled, for example with mistralrs serve --agent ... or mistralrs serve --enable-code-execution ....

{
"tools": [
{"type": "code_execution_20250825", "name": "code_execution"}
],
"agent_permission": "auto"
}

You can combine both server tools. The Anthropic request field tool_choice: {"type":"none"} disables both client and server tools for that request.

Anthropic-compatible Skills use the same uploaded skill store as OpenAI-compatible Skills. Upload a zip or multipart skill directory to POST /v1/skills, then reference the returned custom skill id from container.skills on /v1/messages.

Start the server with shell execution enabled. --agent is recommended because it also enables the rest of the local agent runtime:

Terminal window
mistralrs serve --agent -p 1234 -m Qwen/Qwen3-4B

List and upload endpoints accept Anthropic headers. When an Anthropic header is present, GET /v1/skills returns Anthropic-style fields such as display_title, latest_version, source, has_more, and next_page. source=custom returns uploaded local Skills; source=anthropic returns an empty list because mistral.rs does not bundle Anthropic-managed built-in Skills.

Use the uploaded Skill in a Messages request:

Terminal window
curl http://localhost:1234/v1/messages \
-H 'content-type: application/json' \
-H 'x-api-key: not-used' \
-H 'anthropic-version: 2023-06-01' \
-H 'anthropic-beta: code-execution-2025-08-25,skills-2025-10-02' \
-d '{
"model": "default",
"max_tokens": 1024,
"agent_permission": "auto",
"max_tool_rounds": 6,
"container": {
"skills": [
{"type": "custom", "skill_id": "skill_...", "version": "latest"}
]
},
"tools": [
{"type": "code_execution_20250825", "name": "code_execution"}
],
"messages": [
{"role": "user", "content": "Use the uploaded skill to complete the task."}
]
}'

Skills are mounted into the shell runtime. The Anthropic code execution tool declaration is accepted for client compatibility and enables Python code execution too, but the skill bundle itself is read from the shell workdir under skills/<skill-name>/.

Generated artifacts are available the same way as other mistral.rs agentic files. Non-streaming responses include a top-level files array, and streaming responses emit file_produced events. Download bytes from GET /v1/files/{file_id}/content.

ExampleWhat it shows
anthropic_chatPlain non-streaming Messages request.
anthropic_streamingAnthropic SSE parsing.
anthropic_tool_callingClient-side tool use with tool_use and tool_result.
anthropic_agenticAnthropic server-tool declarations mapped to mistral.rs web search and code execution.
anthropic_skillsUpload a custom Skill and use it from container.skills.