OpenAI compatibility
mistral.rs targets field-level OpenAI API compatibility. Most OpenAI client libraries work against mistral.rs unchanged. This page lists the exceptions.
Chat Completions fields
Section titled “Chat Completions fields”Implemented
Section titled “Implemented”modelmessages(including multimodal content parts)max_tokensmax_completion_tokens(OpenAI’s newer alias formax_tokens)temperaturetop_pstreamstoptools,tool_choiceresponse_format(text,json_schema)logit_biaslogprobs,top_logprobspresence_penalty,frequency_penaltyn(multiple completions)
Implemented with deviation
Section titled “Implemented with deviation”tool_choice:"auto","none", and specific function objects work."required"is unsupported; use a specific function object to force tool use.tools[*].function.strict: accepted on function tools. Whentrue, mistral.rs constrains generated tool arguments to the tool’sparametersJSON Schema. See strict tool calling.response_formatwithjson_schema: uses llguidance for constrained decoding. Output shape may differ from OpenAI’s on ambiguous schemas.json_objectis not accepted.
Silently ignored
Section titled “Silently ignored”seed, user, stream_options, metadata, service_tier, parallel_tool_calls, store. The request body accepts these fields (unknown fields are not rejected) but no behavior is wired to them. Use mistral.rs session_id for persistence.
mistralrs extensions
Section titled “mistralrs extensions”Accepted alongside OpenAI fields. OpenAI ignores them:
top_k: hard candidate cap.min_p: min-p sampling threshold.repetition_penalty: simpler alternative to frequency/presence.dry_multiplier,dry_base,dry_allowed_length,dry_sequence_breakers: DRY sampling parameters.grammar: llguidance constraints beyond JSON schemas.enable_thinking: tri-state for supporting models.trueforces thinking on,falseforces it off, omitting the field (or sendingnull) uses the chat template’s default (currently thinking on). Note that the Python SDK’sChatCompletionRequestconstructor defaults this toFalserather thanNone.web_search_options: search tool configuration (de facto OpenAI field, not yet universal).session_id: multi-turn session persistence.truncate_sequence: truncate long prompts at the model’s context limit instead of erroring.
Responses API fields
Section titled “Responses API fields”See the Responses guide. Notable exceptions:
parallel_tool_callsmust betrueor omitted.falsereturns an error.max_tool_callsreturns an error for any value.- Function tools support
strict: truewith the same JSON-Schema-constrained argument generation as Chat Completions.
Completions (legacy)
Section titled “Completions (legacy)”/v1/completions (non-chat) is supported with a subset of Chat Completions extensions: top_k, min_p, repetition_penalty, dry_multiplier, dry_base, dry_allowed_length, dry_sequence_breakers, grammar, truncate_sequence. The agentic, session, file, web-search, thinking, and reasoning-effort fields are not part of this endpoint’s schema and have no effect.
Embeddings
Section titled “Embeddings”inputaccepts a string or a list of strings.encoding_format:"float"(default) or"base64".dimensions: passing any value returns an error. Custom dimensions are not supported.user: accepted but not used.
Extensions:
truncate_sequence: truncate long prompts at the model’s context limit instead of erroring.
Image Generation
Section titled “Image Generation”promptnresponse_format:"Url"(default; response carries a server-side filename inurl) or"B64Json"(response carries adata:image/png;base64,...string inb64_json).
OpenAI’s size string (e.g. "1024x1024") is not supported. Use the height and width fields instead:
height(default 720)width(default 1280)
quality, style, steps, guidance_scale are ignored.
/v1/audio/speech (TTS)
Section titled “/v1/audio/speech (TTS)”model,input: supported.response_format: onlywavandpcmare accepted;mp3,opus,aac,flacreturn a validation error.voice,instructions,speed: ignored.
/v1/audio/transcriptions and /v1/audio/translations
Section titled “/v1/audio/transcriptions and /v1/audio/translations”Not exposed as dedicated endpoints. Voxtral and similar STT models go through /v1/chat/completions with audio content parts. See speech models guide.
Moderation
Section titled “Moderation”Not supported. mistral.rs has no built-in moderation model; run one as a separate service if needed.
Files and Assistants APIs
Section titled “Files and Assistants APIs”File uploads (OpenAI’s POST /v1/files) are not supported. mistral.rs exposes GET /v1/files, GET /v1/files/{id}, GET /v1/files/{id}/content, and DELETE /v1/files/{id} for files produced by the agentic loop. The Assistants API is not supported; the mistral.rs equivalent is the session-based agentic loop on the chat completions endpoint.
Fine-tuning and Batch
Section titled “Fine-tuning and Batch”Not supported. mistral.rs is an inference engine, not a training platform.
Tokenization
Section titled “Tokenization”mistral.rs does not expose /v1/tokenize or /v1/detokenize HTTP endpoints. Tokenizer access is available through the SDKs (tokenize_text / detokenize_text in Python; tokenize_with_model / detokenize_with_model in Rust).
Authentication
Section titled “Authentication”OpenAI requires an Authorization: Bearer ... header. mistral.rs does not validate it. Clients that require an API key for initialization can send any non-empty string. For real authentication, place an authenticating reverse proxy in front.
Response headers
Section titled “Response headers”Content-Type: application/json for non-streaming responses; text/event-stream for streaming. The session id (when assigned or matched) is in the response body’s session_id field.