Get structured output

mistral.rs constrains generation server-side with llguidance, so the output is guaranteed to match your schema — no retry loops or post-hoc validation. The common case is a JSON schema; regex, Lark, and raw llguidance grammars cover everything else.

Use OpenAI’s response_format with type: "json_schema":

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Gimme a sample address."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "Address",
        "schema": {
          "type": "object",
          "properties": {
            "street": {"type": "string"},
            "city": {"type": "string"},
            "state": {"type": "string", "pattern": "^[A-Z]{2}$"},
            "zip": {"type": "integer", "minimum": 10000, "maximum": 99999}
          },
          "required": ["street", "city", "state", "zip"],
          "additionalProperties": false
        }
      }
    }
  }'
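For callers who prefer not to shell out to curl, the same request can be sketched with only the Python standard library. The helper names (build_payload, build_request, request_structured) are hypothetical, and the base URL is simply the one used in the curl example. The sketch assumes the server returns the usual Chat Completions shape, with the constrained text in choices[0].message.content.

```python
import json
import urllib.request

# Assumed server address, copied from the curl example.
BASE_URL = "http://localhost:1234/v1"

ADDRESS_SCHEMA = {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string", "pattern": "^[A-Z]{2}$"},
        "zip": {"type": "integer", "minimum": 10000, "maximum": 99999},
    },
    "required": ["street", "city", "state", "zip"],
    "additionalProperties": False,
}


def build_payload(prompt, name, schema, model="default"):
    """Build a Chat Completions body with a json_schema response_format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema},
        },
    }


def build_request(payload, base_url=BASE_URL):
    """Wrap the payload in a POST request; nothing is sent until urlopen runs."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_structured(prompt, name, schema):
    """Send the request and decode the constrained message content as JSON."""
    req = build_request(build_payload(prompt, name, schema))
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumption: the constrained text is a JSON string in the message content.
    return json.loads(body["choices"][0]["message"]["content"])
```

With a server running, request_structured("Gimme a sample address.", "Address", ADDRESS_SCHEMA) should return a dict with the four required keys.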

With the OpenAI Python client, client.beta.chat.completions.parse accepts a pydantic model directly and returns parsed objects:

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(api_key="not-used", base_url="http://localhost:1234/v1/")


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip: int


completion = client.beta.chat.completions.parse(
    model="default",
    messages=[{"role": "user", "content": "Gimme a sample address."}],
    response_format=Address,
)
print(completion.choices[0].message.parsed)

Full examples: openai_response_format, json_schema.

For JSON-Schema-constrained tool arguments (rather than the whole response), set strict: true on the function tool; see tool calling.
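As a rough illustration, a function tool with strict enabled might be assembled as below. The get_weather tool and the strict_function_tool helper are hypothetical, and placing strict inside the function object follows OpenAI's Chat Completions convention; treat that placement as an assumption and check the tool calling page for the exact shape mistral.rs expects.

```python
def strict_function_tool(name, description, parameters):
    """Describe a function tool whose arguments are constrained by `parameters`."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # JSON schema for the tool arguments
            "strict": True,  # assumed location, per the OpenAI convention
        },
    }


# Hypothetical tool, used only to show the shape of the request body.
WEATHER_TOOL = strict_function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
)

REQUEST_BODY = {
    "model": "default",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [WEATHER_TOOL],
}
```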

Beyond JSON schemas, the grammar request field accepts regex, Lark, and raw llguidance constraints. grammar and response_format are mutually exclusive in one request.

Over HTTP, grammar is a tagged object (on Chat Completions, Responses, legacy Completions, and the Anthropic Messages endpoint):

{"grammar": {"type": "regex", "value": "(- [^\\n]*\\n)+(- [^\\n]*)"}}

| type | value | Use for |
| --- | --- | --- |
| regex | regex string | Fixed-shape text: lists, IDs, dates. |
| json_schema | JSON schema object | Same constraint as response_format, without the wrapper. |
| lark | Lark grammar string | Context-free syntax: expressions, DSLs. |
| llguidance | llguidance grammar object | Composed grammars; full control. |
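The tagged object is easy to build programmatically. The sketch below uses two hypothetical helpers: grammar_field builds the object for any of the four types, and with_grammar rejects a body that already carries response_format, mirroring the mutual-exclusion rule. It also tries the bullet-list regex against Python's re module, which is only a local approximation of the engine llguidance uses.

```python
import json
import re

# The bullet-list regex from the HTTP example, as a raw Python string.
BULLET_LIST = r"(- [^\n]*\n)+(- [^\n]*)"

GRAMMAR_TYPES = {"regex", "json_schema", "lark", "llguidance"}


def grammar_field(kind, value):
    """Build the tagged grammar object for one of the four constraint types."""
    if kind not in GRAMMAR_TYPES:
        raise ValueError(f"unknown grammar type: {kind!r}")
    return {"grammar": {"type": kind, "value": value}}


def with_grammar(body, kind, value):
    """Return a copy of the request body with a grammar attached."""
    if "response_format" in body:
        # grammar and response_format cannot appear in the same request.
        raise ValueError("grammar and response_format are mutually exclusive")
    return {**body, **grammar_field(kind, value)}


# Serializing doubles each backslash, giving the "\\n" form used over HTTP.
ENCODED = json.dumps(grammar_field("regex", BULLET_LIST))

# Local sanity check: the pattern needs at least two "- " items.
TWO_ITEMS_OK = re.fullmatch(BULLET_LIST, "- first\n- second") is not None
ONE_ITEM_OK = re.fullmatch(BULLET_LIST, "- only one") is not None
```

Here TWO_ITEMS_OK is True and ONE_ITEM_OK is False, because the pattern requires one or more newline-terminated items followed by a final item.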

In the Python SDK the same four constraints are selected by grammar_type ("regex", "json_schema", "lark", "llguidance") with grammar as a string; JSON-encode the json_schema and llguidance object forms. In Rust, pass a Constraint to RequestBuilder::set_constraint:
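For the Python SDK side, the JSON-encoding rule can be sketched with a small hypothetical helper that returns the grammar_type and grammar values. Only those two field names come from the text above; how they are passed to the SDK's request object is not shown here and should be checked against the linked Python examples.

```python
import json

STRING_FORMS = ("regex", "lark")
OBJECT_FORMS = ("json_schema", "llguidance")


def sdk_grammar_kwargs(grammar_type, constraint):
    """Return grammar_type/grammar values, JSON-encoding the object forms."""
    if grammar_type in STRING_FORMS:
        if not isinstance(constraint, str):
            raise TypeError(f"{grammar_type} expects a string")
        return {"grammar_type": grammar_type, "grammar": constraint}
    if grammar_type in OBJECT_FORMS:
        # The SDK takes grammar as a string, so objects are serialized first.
        return {"grammar_type": grammar_type, "grammar": json.dumps(constraint)}
    raise ValueError(f"unknown grammar_type: {grammar_type!r}")


REGEX_KWARGS = sdk_grammar_kwargs("regex", r"(- [^\n]*\n)+(- [^\n]*)")
SCHEMA_KWARGS = sdk_grammar_kwargs(
    "json_schema",
    {"type": "object", "properties": {"city": {"type": "string"}}},
)
```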

let request = RequestBuilder::new()
    .set_constraint(mistralrs::Constraint::Regex(
        "(- [^\n]*\n)+(- [^\n]*)(\n\n)?".to_string(),
    ))
    .add_message(TextMessageRole::User, "Please write a few jokes.");

Constraint::Regex, Constraint::Lark, Constraint::JsonSchema, and Constraint::Llguidance mirror the HTTP variants.

Full examples: regex, lark, llguidance (HTTP); regex, lark_llg (Python); grammar, llguidance (Rust).

  • response_format accepts {"type": "text"} (no constraint) and {"type": "json_schema", ...}. OpenAI’s json_object mode is not accepted; supply a schema instead. See OpenAI compatibility for field-level deviations.
  • Constrained decoding restricts which tokens can be sampled; it does not change what the model wants to say. Prompting for the data you expect still matters, and a low temperature helps on extraction tasks.
  • Schema output shape may differ from OpenAI’s behavior on ambiguous schemas (llguidance enforces the schema literally).
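Client code written against OpenAI's json_object mode needs a small change before it will run here. One possible migration shim is sketched below. The replace_json_object helper, the AnyObject name, and the fallback schema are all hypothetical, and the sketch assumes a bare {"type": "object"} schema is an acceptable loosest substitute; a schema that describes the data you actually expect is the better fix.

```python
# Assumed fallback: the loosest schema that still forces a JSON object.
ANY_OBJECT_SCHEMA = {"type": "object"}


def replace_json_object(body, schema=None, name="AnyObject"):
    """Swap an OpenAI-style json_object response_format for a json_schema one."""
    response_format = body.get("response_format")
    if not response_format or response_format.get("type") != "json_object":
        return body  # nothing to rewrite
    migrated = dict(body)  # shallow copy so the caller's dict is untouched
    migrated["response_format"] = {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema or ANY_OBJECT_SCHEMA},
    }
    return migrated


LEGACY_BODY = {
    "model": "default",
    "messages": [{"role": "user", "content": "Extract the address as JSON."}],
    "response_format": {"type": "json_object"},
    "temperature": 0.1,  # a low temperature, as suggested for extraction tasks
}

MIGRATED_BODY = replace_json_object(LEGACY_BODY)
```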