OpenAI compatibility
mistral.rs targets field-level OpenAI API compatibility. Most OpenAI client libraries work against mistral.rs unchanged. This page lists the exceptions. For setup and examples, see OpenAI-compatible APIs.
Chat Completions fields
Implemented
- model
- messages (including multimodal content parts)
- max_tokens
- max_completion_tokens (OpenAI’s newer alias for max_tokens)
- temperature
- top_p
- stream
- stop
- tools, tool_choice
- response_format (text, json_schema)
- logit_bias
- logprobs, top_logprobs
- presence_penalty, frequency_penalty
- n (multiple completions)
Implemented with deviation
- tool_choice: "auto", "none", and specific function objects work. "required" is unsupported; use a specific function object to force tool use.
- tools[*].function.strict: accepted on function tools. When true, mistral.rs constrains generated tool arguments to the tool’s parameters JSON Schema. See strict tool calling.
- response_format with json_schema: uses llguidance for constrained decoding. Output shape may differ from OpenAI’s on ambiguous schemas. json_object is not accepted.
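Since "required" is unsupported, a client written against OpenAI can rewrite it before sending. The following is a minimal client-side sketch, not part of mistral.rs: the helper name force_tool_choice is hypothetical, and it assumes the request is a plain dict in OpenAI's Chat Completions shape.

```python
import copy
from typing import Optional


def force_tool_choice(payload: dict, function_name: Optional[str] = None) -> dict:
    """Return a copy of `payload` with tool_choice="required" rewritten.

    Hypothetical helper: replaces "required" with a specific function
    object, which is the form mistral.rs accepts for forcing tool use.
    """
    out = copy.deepcopy(payload)
    if out.get("tool_choice") != "required":
        return out  # "auto", "none", and function objects pass through

    # Collect the names of the function tools declared in the request.
    names = [
        tool["function"]["name"]
        for tool in out.get("tools", [])
        if tool.get("type") == "function"
    ]
    if function_name is None:
        if len(names) != 1:
            raise ValueError(
                "pass function_name unless the request defines exactly one tool"
            )
        function_name = names[0]
    elif function_name not in names:
        raise ValueError(f"unknown tool: {function_name}")

    out["tool_choice"] = {"type": "function", "function": {"name": function_name}}
    return out
```

Note that this narrows the meaning: "required" lets the model pick any tool, while a function object forces one particular tool, so requests with several tools need an explicit choice.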
Silently ignored
seed, user, stream_options, metadata, service_tier, parallel_tool_calls, store. The request body accepts these fields (unknown fields are not rejected) but no behavior is wired to them. Use mistral.rs session_id for persistence.
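Because these fields are accepted without effect, a request that relies on them (for example seed for reproducibility) fails quietly rather than loudly. A small sketch of a client-side check follows; the function name ignored_fields is hypothetical and the field set is copied from the list above.

```python
# Fields that mistral.rs accepts on Chat Completions but does not act on,
# taken from the list above.
IGNORED_CHAT_FIELDS = frozenset({
    "seed", "user", "stream_options", "metadata",
    "service_tier", "parallel_tool_calls", "store",
})


def ignored_fields(payload: dict) -> list:
    """Return the sorted names of request fields that will have no effect."""
    return sorted(IGNORED_CHAT_FIELDS.intersection(payload))
```

Logging the returned names before sending makes it easier to spot a request that depends on behavior the server does not provide.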
mistral.rs extensions
Accepted alongside OpenAI fields. OpenAI ignores them:
- top_k: hard candidate cap.
- min_p: min-p sampling threshold.
- repetition_penalty: simpler alternative to frequency/presence.
- dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakers: DRY sampling parameters.
- grammar: llguidance constraints beyond JSON schemas.
- enable_thinking: tri-state for supporting models. true forces thinking on, false forces it off, omitting the field (or sending null) uses the chat template’s default (currently thinking on). Note that the Python SDK’s ChatCompletionRequest constructor defaults this to False rather than None.
- web_search_options: search tool configuration (de facto OpenAI field, not yet universal).
- session_id: multi-turn session persistence.
- truncate_sequence: truncate long prompts at the model’s context limit instead of erroring.
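The tri-state behavior of enable_thinking is easy to lose when building request bodies by hand, so here is a sketch of one way to preserve it over HTTP. The helper name build_chat_body and the model name used in the usage note are assumptions for illustration; only the field names come from the list above.

```python
import json
from typing import Optional


def build_chat_body(model: str, messages: list, *,
                    enable_thinking: Optional[bool] = None,
                    **extensions) -> bytes:
    """Serialize a chat request, keeping enable_thinking tri-state.

    None means "omit the field", so the chat template's default applies;
    True and False are sent explicitly.
    """
    body = {"model": model, "messages": messages, **extensions}
    if enable_thinking is not None:
        body["enable_thinking"] = enable_thinking
    return json.dumps(body).encode("utf-8")
```

For example, build_chat_body("default", messages, top_k=40, min_p=0.05) sends two sampling extensions and leaves thinking at the template default. With the official openai Python client, the same fields can be forwarded through the extra_body argument of chat.completions.create.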
Responses API fields
See the Responses guide. Notable exceptions:
- parallel_tool_calls must be true or omitted. false returns an error.
- max_tool_calls returns an error for any value.
- Function tools support strict: true with the same JSON-Schema-constrained argument generation as Chat Completions.
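The two rejected settings can be caught before a request leaves the client. This is a rough pre-flight sketch with a hypothetical function name; it treats an explicit null for max_tool_calls as a value too, which is a conservative assumption rather than documented behavior.

```python
def check_responses_request(payload: dict) -> list:
    """Return human-readable problems that would make the request fail."""
    problems = []
    # Only an explicit false is rejected; true or an absent key is fine.
    if payload.get("parallel_tool_calls") is False:
        problems.append("parallel_tool_calls must be true or omitted")
    # Assumption: any presence of the key counts, including null.
    if "max_tool_calls" in payload:
        problems.append("max_tool_calls is rejected for any value")
    return problems
```

An empty list means neither exception applies; it does not validate the rest of the request.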
Completions (legacy)
/v1/completions (non-chat) is supported with a subset of Chat Completions extensions: top_k, min_p, repetition_penalty, dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakers, grammar, truncate_sequence. The agentic, session, file, web-search, thinking, and reasoning-effort fields are not part of this endpoint’s schema and have no effect.
Embeddings
- input accepts a string or a list of strings.
- encoding_format: "float" (default) or "base64".
- dimensions: passing any value returns an error. Custom dimensions are not supported.
- user: accepted but not used.
Extensions:
- truncate_sequence: truncate long prompts at the model’s context limit instead of erroring.
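When encoding_format is "base64", the client has to unpack the vector itself. The sketch below assumes mistral.rs follows OpenAI's layout, where the payload is a packed array of little-endian 32-bit floats; that layout is an assumption here and should be checked against an actual response.

```python
import base64
import struct


def decode_embedding(b64: str) -> list:
    """Decode a base64 embedding into a list of Python floats.

    Assumes packed little-endian float32 values (OpenAI's layout).
    """
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))
```

A quick sanity check is to request the same input with "float" and compare the two vectors element by element.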
Image Generation
- prompt
- n
- response_format: "Url" (default; response carries a server-side filename in url) or "B64Json" (response carries a data:image/png;base64,... string in b64_json).
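OpenAI returns bare base64 in b64_json, so client code that decodes the field directly will trip over the data: prefix described above. A minimal sketch of a tolerant decoder follows; the function name is hypothetical.

```python
import base64


def decode_b64_json(value: str) -> bytes:
    """Decode b64_json whether or not it carries a data URI prefix."""
    if value.startswith("data:"):
        # Split "data:image/png;base64" from the payload at the first comma.
        header, sep, value = value.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("not a base64 data URI")
    return base64.b64decode(value)
```

The returned bytes can be written straight to a .png file.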
OpenAI’s size string (e.g. "1024x1024") is not supported. Use the height and width fields instead:
- height (default 720)
- width (default 1280)
quality, style, steps, and guidance_scale are ignored.
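Code ported from OpenAI usually carries a size string. A small conversion sketch, assuming the string is in OpenAI's WIDTHxHEIGHT order (the function name is hypothetical):

```python
def size_to_dimensions(size: str) -> dict:
    """Convert an OpenAI-style "WIDTHxHEIGHT" string to width/height fields."""
    width_text, sep, height_text = size.lower().partition("x")
    if not sep or not width_text.isdigit() or not height_text.isdigit():
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    return {"width": int(width_text), "height": int(height_text)}
```

The result can be merged into the request body in place of size, for example body.update(size_to_dimensions("1024x1024")).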
/v1/audio/speech (TTS)
- model, input: supported.
- response_format: only wav and pcm are accepted; mp3, opus, aac, flac return a validation error.
- voice, instructions, speed: ignored.
/v1/audio/transcriptions and /v1/audio/translations
Not exposed as dedicated endpoints. Voxtral and similar STT models go through /v1/chat/completions with audio content parts. See the speech models guide.
Moderation
Not supported. mistral.rs has no built-in moderation model; run one as a separate service if needed.
Files and Assistants APIs
File uploads (OpenAI’s POST /v1/files) are not supported. mistral.rs exposes GET /v1/files, GET /v1/files/{id}, GET /v1/files/{id}/content, and DELETE /v1/files/{id} for files produced by the agentic loop. The Assistants API is not supported; the mistral.rs equivalent is the session-based agentic loop on the chat completions endpoint.
Fine-tuning and Batch
Not supported. mistral.rs is an inference engine, not a training platform.
Tokenization
mistral.rs does not expose /v1/tokenize or /v1/detokenize HTTP endpoints. Tokenizer access is available through the SDKs (tokenize_text / detokenize_text in Python; tokenize_with_model / detokenize_with_model in Rust).
Authentication
OpenAI requires an Authorization: Bearer ... header. mistral.rs does not validate it. Clients that require an API key for initialization can send any non-empty string. For real authentication, place an authenticating reverse proxy in front.
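As an illustration of the placeholder key, the sketch below builds (but does not send) a chat request with the standard library. The base URL, port, and the "not-needed" key are assumptions for the example; substitute whatever address the server is actually listening on.

```python
import json
import urllib.request

# Assumption: a local mistral.rs server; adjust host and port as needed.
BASE_URL = "http://localhost:1234/v1"


def chat_request(body: dict, api_key: str = "not-needed") -> urllib.request.Request:
    """Build a POST request for /chat/completions with a placeholder key."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # mistral.rs does not validate this value; any non-empty string works.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. Behind an authenticating reverse proxy, the same header would carry a real credential instead.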
Response headers
Content-Type: application/json for non-streaming responses; text/event-stream for streaming. The session id (when assigned or matched) is in the response body’s session_id field.
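A client that handles both modes can branch on Content-Type. The following sketch assumes the stream uses OpenAI's server-sent-event framing, with one JSON object per data: line and a [DONE] sentinel at the end; the function name is hypothetical.

```python
import json


def parse_body(content_type: str, raw: bytes) -> list:
    """Return the decoded JSON objects from either kind of response."""
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = raw.decode("utf-8")

    if media_type == "application/json":
        return [json.loads(text)]
    if media_type != "text/event-stream":
        raise ValueError(f"unexpected Content-Type: {content_type!r}")

    events = []
    for line in text.splitlines():
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        events.append(json.loads(data))
    return events
```

For a non-streaming response, the session id, when present, can then be read with parse_body(...)[0].get("session_id").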