
OpenAI-compatible APIs

mistral.rs exposes OpenAI-compatible endpoints under /v1. Use http://localhost:1234/v1 as the base URL for OpenAI SDKs and compatible clients.

The same server also exposes the Anthropic Messages API at http://localhost:1234.

mistralrs serve -m Qwen/Qwen3-4B

Use model: "default" for a single-model server. In multi-model serving, use the configured model id exactly as it appears in GET /v1/models.
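
To pick the right model id in a multi-model setup, a client can read GET /v1/models first. The sketch below uses only the Python standard library and assumes the response follows the OpenAI list shape (an object with a `data` array whose entries carry an `id`). The helper names `extract_model_ids` and `list_model_ids` are illustrative and are not part of mistral.rs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # base URL given on this page


def extract_model_ids(payload: dict) -> list[str]:
    """Return model ids from an OpenAI-style model list payload (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = BASE_URL) -> list[str]:
    """Fetch GET /v1/models from a running server and return the ids."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return extract_model_ids(json.load(resp))


# Offline illustration with a hand-written payload (not real server output).
sample = {"object": "list", "data": [{"id": "default", "object": "model"}]}
print(extract_model_ids(sample))  # prints ['default']
```

Against a running server, `list_model_ids()` returns the ids to pass as `model` in later requests.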

| Endpoint | Purpose |
| --- | --- |
| GET /v1/models | List loaded models. |
| POST /v1/chat/completions | OpenAI-compatible chat, streaming, tool calling, multimodal inputs, and mistral.rs agentic extensions. |
| POST /v1/responses | OpenAI Responses API for response objects, polling, background runs, and cancellation. |
| POST /v1/completions | Legacy text completions. |
| POST /v1/embeddings | Embedding generation. |
| POST /v1/images/generations | Image generation. |
| POST /v1/audio/speech | Text to speech. |
| GET /v1/files | List files produced by agentic runs. |

For every path, request schema, and response schema, see the HTTP API reference. For field-level compatibility notes, see OpenAI compatibility.

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer not-used" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "user", "content": "Write a haiku about local inference."}
    ],
    "max_tokens": 128
  }'

The Authorization header is accepted for client compatibility but is not validated, so any token value works. When exposing the server to users, enforce authentication in a reverse proxy placed in front of it.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-used")
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Say hello from mistral.rs."}],
)
print(response.choices[0].message.content)
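
Chat Completions also supports streaming, as listed in the endpoint table. With `"stream": true`, OpenAI-style servers send server-sent events: one `data:` line per chunk, then a final `data: [DONE]`. The sketch below parses that framing with the standard library only. It assumes mistral.rs follows the OpenAI chunk shape (`choices[].delta.content`); `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events (not captured from a real server).
sample_events = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " there"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_events)))  # prints Hello there
```

With the OpenAI Python SDK, passing `stream=True` to `client.chat.completions.create` returns an iterator of chunks that expose the same `delta.content` field, so manual parsing is only needed for clients without an SDK.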

Use /v1/responses when the client expects OpenAI’s Responses shape or needs response ids, polling, background processing, or cancellation.

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer not-used" \
  -d '{
    "model": "default",
    "input": "Summarize the benefits of local inference.",
    "max_output_tokens": 128
  }'

See OpenAI Responses API for supported fields and endpoint-specific behavior.
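
The same Responses request can be built from Python without an SDK. This is a minimal sketch using only the standard library. `build_responses_request` and `output_text` are hypothetical helpers, and the parsing assumes the response object follows the OpenAI shape (an `output` list of `message` items whose `content` holds `output_text` parts).

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # base URL given on this page


def build_responses_request(prompt: str, max_output_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/responses request."""
    body = {"model": "default", "input": prompt, "max_output_tokens": max_output_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer not-used"},
        method="POST",
    )


def output_text(response: dict) -> str:
    """Join output_text parts from an OpenAI-style response object (assumed shape)."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # ignore non-message output items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


req = build_responses_request("Summarize the benefits of local inference.")
print(req.get_method(), req.full_url)
# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       print(output_text(json.load(resp)))
```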

OpenAI-compatible function tools are supported on Chat Completions and Responses. mistral.rs also supports strict: true inside function definitions for JSON-Schema-constrained tool arguments.
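
As an illustration, a Chat Completions request body with a strict function tool might look like the sketch below. The `get_weather` tool and the `strict_function_tool` helper are made up for this example. Listing every property under `required` and setting `additionalProperties` to false follows OpenAI's strict-mode conventions; whether mistral.rs requires the same is an assumption here.

```python
import json


def strict_function_tool(name: str, description: str, properties: dict) -> dict:
    """Build a Chat Completions function tool with strict: true."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": True,  # ask for schema-constrained tool arguments
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": sorted(properties),  # OpenAI strict-mode convention
                "additionalProperties": False,
            },
        },
    }


# get_weather is a made-up tool used only for illustration.
tool = strict_function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}},
)
request_body = {
    "model": "default",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [tool],
}
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be sent as the JSON body of POST /v1/chat/completions, or its `tools` list passed as the `tools` argument of `client.chat.completions.create`.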

When the server is started with agentic capabilities, OpenAI-compatible requests can also use mistral.rs extensions such as session_id, web_search_options, enable_code_execution, agent_permission, files, and max_tool_rounds.

mistralrs serve --agent -m Qwen/Qwen3-4B
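
The extension fields travel in the same JSON body as the standard OpenAI fields. The sketch below merges them into a request body and rejects names that are not in the list above. The `with_extensions` helper is hypothetical, and the value types shown (a string `session_id`, an integer `max_tool_rounds`) are assumptions; check the HTTP API reference for the actual schema.

```python
# Extension field names as listed on this page.
EXTENSION_FIELDS = {
    "session_id",
    "web_search_options",
    "enable_code_execution",
    "agent_permission",
    "files",
    "max_tool_rounds",
}


def with_extensions(body: dict, **extensions) -> dict:
    """Return a copy of body with mistral.rs extension fields merged in."""
    unknown = set(extensions) - EXTENSION_FIELDS
    if unknown:
        raise ValueError(f"not a documented extension field: {sorted(unknown)}")
    return {**body, **extensions}


base = {
    "model": "default",
    "messages": [{"role": "user", "content": "Find recent news on Rust."}],
}
# Value types below are assumptions, not taken from the mistral.rs schema.
agentic_body = with_extensions(base, session_id="demo-session", max_tool_rounds=4)
print(sorted(agentic_body))
```

With the OpenAI Python SDK, fields outside the standard schema can be passed through the `extra_body` argument of `client.chat.completions.create`.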

For app-facing tool timelines, generated files, search, code execution, and session state, see agentic runtime for apps.

Server examples live in examples/server/:

| File | What it shows |
| --- | --- |
| chat.py | Basic Chat Completions request. |
| streaming.py | Chat Completions streaming. |
| tool_calling.py | OpenAI-compatible function tools. |
| responses.py | Responses API request. |
| responses_vision.py | Responses API with image input. |
| web_search.py | Search through OpenAI-compatible request fields. |
| codex_config.toml | Codex provider config for /v1/responses. |

For Codex setup, see Use Codex and Claude Code.