OpenAI-compatible chat completions endpoint handler.
const url = 'https://example.com/v1/chat/completions';
const options = {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    agent_permission: null,
    code_execution_permission: null,
    dry_allowed_length: null,
    dry_base: null,
    dry_multiplier: null,
    dry_sequence_breakers: null,
    enable_code_execution: true,
    enable_thinking: null,
    files: ['example'],
    frequency_penalty: null,
    grammar: {type: 'regex', value: 'example'},
    logit_bias: null,
    logprobs: false,
    max_tokens: 256,
    max_tool_rounds: null,
    messages: [
      {
        content: 'example',
        name: 'example',
        role: 'example',
        tool_call_id: 'example',
        tool_calls: [
          {function: {arguments: 'example', name: 'example'}, id: 'example', type: 'function'}
        ]
      }
    ],
    min_p: null,
    model: 'mistral',
    n: 1,
    presence_penalty: null,
    reasoning_effort: null,
    repetition_penalty: null,
    response_format: {type: 'text'},
    session_id: 'example',
    stop: 'example',
    stream: true,
    temperature: 0.7,
    tool_choice: 'none',
    tools: null,
    top_k: null,
    top_logprobs: null,
    top_p: null,
    truncate_sequence: null,
    web_search_options: {
      extract_description: 'example',
      search_context_size: 'low',
      search_description: 'example',
      user_location: {
        approximate: {city: 'example', country: 'example', region: 'example', timezone: 'example'},
        type: 'approximate'
      }
    }
  })
};

try {
  const response = await fetch(url, options);
  const data = await response.json();
  console.log(data);
} catch (error) {
  console.error(error);
}

curl --request POST \
  --url https://example.com/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "agent_permission": null,
    "code_execution_permission": null,
    "dry_allowed_length": null,
    "dry_base": null,
    "dry_multiplier": null,
    "dry_sequence_breakers": null,
    "enable_code_execution": true,
    "enable_thinking": null,
    "files": ["example"],
    "frequency_penalty": null,
    "grammar": {"type": "regex", "value": "example"},
    "logit_bias": null,
    "logprobs": false,
    "max_tokens": 256,
    "max_tool_rounds": null,
    "messages": [
      {
        "content": "example",
        "name": "example",
        "role": "example",
        "tool_call_id": "example",
        "tool_calls": [
          {"function": {"arguments": "example", "name": "example"}, "id": "example", "type": "function"}
        ]
      }
    ],
    "min_p": null,
    "model": "mistral",
    "n": 1,
    "presence_penalty": null,
    "reasoning_effort": null,
    "repetition_penalty": null,
    "response_format": {"type": "text"},
    "session_id": "example",
    "stop": "example",
    "stream": true,
    "temperature": 0.7,
    "tool_choice": "none",
    "tools": null,
    "top_k": null,
    "top_logprobs": null,
    "top_p": null,
    "truncate_sequence": null,
    "web_search_options": {
      "extract_description": "example",
      "search_context_size": "low",
      "search_description": "example",
      "user_location": {
        "approximate": {"city": "example", "country": "example", "region": "example", "timezone": "example"},
        "type": "approximate"
      }
    }
  }'

Request Body required
Chat completion request following OpenAI’s specification.
object

agent_permission
Permission policy for agentic tools.
Example: null

code_execution_permission
Permission policy for code execution.
Example: null

dry_allowed_length
Longest repeated sequence DRY leaves unpenalized.
Example: null

dry_base
Base for DRY’s exponential penalty growth.
Example: null

dry_multiplier
DRY repetition penalty multiplier; 0 disables DRY.
Example: null

dry_sequence_breakers
Sequences that reset DRY repetition matching.
Example: null

enable_code_execution
Enable Python code execution tools for this request.
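
The DRY parameters above can be read together as one penalty rule. The following is a minimal sketch, assuming the commonly described DRY formulation in which a token that would extend an already repeated sequence of `matchLength` tokens is penalized by `multiplier * base ** (matchLength - allowedLength)` once `matchLength` reaches the allowed length. The helper name `dryPenalty` and the numeric values are illustrative assumptions, not server defaults, and the server's actual implementation may differ.

```javascript
// Hypothetical helper: the amount subtracted from a token's logit when
// choosing it would extend a repeated sequence of `matchLength` tokens.
function dryPenalty(matchLength, { multiplier, base, allowedLength }) {
  // A multiplier of 0 disables DRY entirely.
  if (multiplier === 0) return 0;
  // Repeats shorter than the allowed length are left unpenalized.
  if (matchLength < allowedLength) return 0;
  // Beyond that, the penalty grows exponentially with the match length.
  return multiplier * Math.pow(base, matchLength - allowedLength);
}

// Illustrative values only (assumed, not taken from the server).
const dry = { multiplier: 0.8, base: 1.75, allowedLength: 2 };
for (const len of [1, 2, 3, 4]) {
  console.log(len, dryPenalty(len, dry));
}
```

Under this reading, raising `dry_base` makes long repeats sharply more expensive, while `dry_multiplier` scales the whole curve.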

enable_thinking
Toggle thinking output for models that support it.
Example: null

files
Required output files. The runtime asks the model to produce them and surfaces a File (or error placeholder) for each.

frequency_penalty
Penalize tokens by how often they have appeared so far; positive values reduce repetition.
Example: null

grammar
object
object
object
object
Top-level grammar configuration for LLGuidance
object
List of grammar configurations
Grammar configuration with lexer settings
object
The JSON schema that the grammar should generate
The Lark grammar that the grammar should generate
The name of this grammar, can be used in GenGrammar nodes
Maximum number of tokens to generate
object

logit_bias
Bias added to the logits of these token IDs before sampling.
Example: null

logprobs
Return log probabilities of the output tokens.
Example: false

max_tokens
Maximum number of tokens to generate.
Example: 256

max_tool_rounds
Maximum number of tool-call rounds the server will auto-execute.
Example: null

messages
Represents a single message in a conversation.
Examples
use either::Either;
use mistralrs_server_core::openai::{Message, MessageContent};

// User message
let user_msg = Message {
    content: Some(MessageContent(Either::Left("What's 2+2?".to_string()))),
    role: "user".to_string(),
    name: None,
    // Only set on tool messages; the field is listed in the schema below.
    tool_call_id: None,
    tool_calls: None,
};

// System message
let system_msg = Message {
    content: Some(MessageContent(Either::Left("You are a helpful assistant.".to_string()))),
    role: "system".to_string(),
    name: None,
    tool_call_id: None,
    tool_calls: None,
};

object

name
Optional participant name for this message.

role
The role of the message sender (“user”, “assistant”, “system”, “tool”, etc.).

tool_call_id
Tool call ID this message is responding to (for tool messages).

tool_calls
Optional list of tool calls (for assistant messages).
Each entry represents a tool call made by the assistant. This structure wraps a function call with its type information.
object

  function
  The function call details.
  object

    arguments
    The function arguments (JSON string).

    name
    The name of the function to call.

  id
  Unique identifier for this tool call.

  type
  The type of tool being called.
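
The message fields above imply a round trip: the assistant emits `tool_calls`, and the client answers each one with a `tool` message that carries the matching `tool_call_id`. The sketch below illustrates that flow on the client side. The `localTools` registry, the `answerToolCalls` helper, and the `add` tool are hypothetical names introduced for illustration; encoding the result as a JSON string in `content` is also an assumption.

```javascript
// Hypothetical local tool implementations, keyed by function name.
const localTools = {
  add: ({ a, b }) => a + b,
};

// Turn one assistant message into the list of tool messages that answer it.
function answerToolCalls(assistantMessage, tools) {
  const calls = assistantMessage.tool_calls ?? [];
  return calls.map((call) => {
    let content;
    try {
      const fn = tools[call.function.name];
      if (!fn) throw new Error(`unknown tool: ${call.function.name}`);
      // `arguments` is a JSON string, so it must be parsed before use.
      const args = JSON.parse(call.function.arguments);
      content = JSON.stringify(fn(args));
    } catch (err) {
      // Report failures back to the model instead of throwing.
      content = JSON.stringify({ error: String(err.message) });
    }
    // The reply references the originating call through `tool_call_id`.
    return { role: "tool", tool_call_id: call.id, content };
  });
}

const assistantMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "add", arguments: '{"a":2,"b":2}' } },
  ],
};

const replies = answerToolCalls(assistantMessage, localTools);
console.log(replies);
```

The resulting messages would be appended to `messages`, after the assistant message, in the next request.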

min_p
Drop tokens below this fraction of the top token’s probability.
Example: null

model
Model ID; “default” targets the only loaded model.
Example: mistral

n
How many choices to generate.
Example: 1

presence_penalty
Penalize tokens that have already appeared; positive values push toward new topics.
Example: null

reasoning_effort
Reasoning effort level for Harmony-format models (GPT-OSS). Controls the depth of reasoning/analysis: “low”, “medium”, or “high”.
Example: null

repetition_penalty
Multiplicative repetition penalty; 1.0 disables it.
Example: null

session_id
Persistent agentic state. If null, a new session is created and the ID is returned in the response.
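
To make the sampling parameters above more concrete, here is a small sketch of how `presence_penalty`, `repetition_penalty`, and `min_p` are conventionally applied to a token distribution. The function names are hypothetical, and the repetition-penalty convention (divide positive logits, multiply negative ones) is an assumption based on common implementations rather than something this reference states.

```javascript
// Presence penalty: subtract a flat amount from every token seen so far.
function applyPresencePenalty(logits, seenTokenIds, penalty) {
  const seen = new Set(seenTokenIds);
  return logits.map((logit, id) => (seen.has(id) ? logit - penalty : logit));
}

// Multiplicative repetition penalty (assumed convention: positive logits are
// divided by the penalty, negative logits multiplied, so 1.0 is a no-op).
function applyRepetitionPenalty(logits, seenTokenIds, penalty) {
  const seen = new Set(seenTokenIds);
  return logits.map((logit, id) => {
    if (!seen.has(id)) return logit;
    return logit > 0 ? logit / penalty : logit * penalty;
  });
}

// min_p: drop tokens whose probability is below minP * (top probability),
// then renormalize the survivors.
function minPFilter(probs, minP) {
  const threshold = minP * Math.max(...probs);
  const kept = probs.map((p) => (p >= threshold ? p : 0));
  const total = kept.reduce((sum, p) => sum + p, 0);
  return kept.map((p) => p / total);
}

const probs = [0.6, 0.3, 0.07, 0.03];
console.log(minPFilter(probs, 0.2));
console.log(applyPresencePenalty([2.0, 1.0, -1.0], [0], 0.5));
console.log(applyRepetitionPenalty([2.0, 1.0, -1.0], [0, 2], 2.0));
```

Because `min_p` scales with the top token's probability, it prunes aggressively when the model is confident and leniently when the distribution is flat.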

stop
Either a list of multiple possible stop sequences or a single stop sequence.

stream
Stream the response as server-sent events.
Example: true

temperature
Sampling temperature; higher values increase randomness.
Example: 0.7

tool_choice
One of:
Disallow selection of tools.
Allow automatic selection of any given tool, or none.
Force selection of a given tool.
object
Function definition for a tool
object
When true, the tool’s parameters JSON schema is enforced on the generated arguments via constrained decoding (llguidance).
Type of tool

tools
Tools the model may call.
Tool definition
object
Function definition for a tool
object
When true, the tool’s parameters JSON schema is enforced on the generated arguments via constrained decoding (llguidance).
Type of tool
Example: null

top_k
Sample only from the k most likely tokens.
Example: null

top_logprobs
Number of most likely tokens to return per position; requires logprobs.
Example: null

top_p
Nucleus sampling: only tokens within the top cumulative probability mass are considered.
Example: null

truncate_sequence
Truncate inputs that exceed the model’s context length instead of erroring.
Example: null

web_search_options
Enable the built-in web search tool.
object

  extract_description
  Override the description for the extraction tool.

  search_description
  Override the description for the search tool.
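
Returning to the `top_k` and `top_p` sampling parameters described above, the sketch below shows the usual way such filters select candidate tokens. It is an illustration under assumptions: `topKTopP` is a hypothetical helper, and it applies the cumulative cutoff to the original probabilities without renormalizing after the top-k step, which real samplers may do differently.

```javascript
// Keep the k most likely tokens, then the smallest prefix of those whose
// cumulative probability reaches topP. Returns the surviving token ids,
// ordered from most to least likely.
function topKTopP(probs, { topK = probs.length, topP = 1.0 } = {}) {
  const ranked = probs
    .map((p, id) => ({ id, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, topK);
  const kept = [];
  let cumulative = 0;
  for (const entry of ranked) {
    kept.push(entry.id);
    cumulative += entry.p;
    if (cumulative >= topP) break; // nucleus is complete
  }
  return kept;
}

const probs = [0.5, 0.3, 0.15, 0.05];
console.log(topKTopP(probs, { topK: 3 }));
console.log(topKTopP(probs, { topP: 0.7 }));
```

Leaving both options unset keeps every token, which mirrors the `null` examples shown for these fields.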
Responses
Chat completions
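
When `stream` is true, the response arrives as server-sent events instead of a single JSON document, so `response.json()` is not the right way to read it. The sketch below shows one way to parse such a stream, assuming the OpenAI-style framing where each event is a `data: <json>` line, events are separated by a blank line, and the stream ends with `data: [DONE]`. The chunk shape (`choices[0].delta.content`) and the helper names `parseSse` and `collectText` are assumptions for illustration.

```javascript
// Extract the JSON payloads from a buffer of server-sent events.
// Returns the parsed events plus any trailing partial event to keep buffering.
function parseSse(buffer) {
  const events = [];
  let done = false;
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop(); // possibly incomplete final event
  for (const block of blocks) {
    for (const line of block.split("\n")) {
      if (!line.startsWith("data:")) continue; // skip comments and other fields
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") {
        done = true;
        continue;
      }
      events.push(JSON.parse(payload));
    }
  }
  return { events, rest, done };
}

// Concatenate the text deltas (assumed OpenAI-style chunk shape).
function collectText(events) {
  return events.map((e) => e.choices?.[0]?.delta?.content ?? "").join("");
}

const sample =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n' +
  "data: [DONE]\n\n";
const { events, rest, done } = parseSse(sample);
console.log(collectText(events), done, JSON.stringify(rest));
```

In practice the bytes read from `response.body` would be decoded with `TextDecoder`, appended to the previous `rest`, and passed to `parseSse` again until `done` is true. This sketch does not handle CRLF line endings.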