Get structured output
mistral.rs constrains generation server-side with llguidance, so the output is guaranteed to match your schema — no retry loops or post-hoc validation. The common case is a JSON schema; regex, Lark, and raw llguidance grammars cover everything else.
Over HTTP, use OpenAI’s response_format with type: "json_schema":

    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "default",
        "messages": [{"role": "user", "content": "Gimme a sample address."}],
        "response_format": {
          "type": "json_schema",
          "json_schema": {
            "name": "Address",
            "schema": {
              "type": "object",
              "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string", "pattern": "^[A-Z]{2}$"},
                "zip": {"type": "integer", "minimum": 10000, "maximum": 99999}
              },
              "required": ["street", "city", "state", "zip"],
              "additionalProperties": false
            }
          }
        }
      }'

With the OpenAI Python client, client.beta.chat.completions.parse accepts a pydantic model directly and returns parsed objects:

    from openai import OpenAI
    from pydantic import BaseModel

    client = OpenAI(api_key="not-used", base_url="http://localhost:1234/v1/")

    class Address(BaseModel):
        street: str
        city: str
        state: str
        zip: int

    completion = client.beta.chat.completions.parse(
        model="default",
        messages=[{"role": "user", "content": "Gimme a sample address."}],
        response_format=Address,
    )
    print(completion.choices[0].message.parsed)

Full examples: openai_response_format, json_schema.
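If neither curl nor the OpenAI client is available, the same request can be assembled with the Python standard library. The sketch below is illustrative only: build_request, parse_content, and ADDRESS_SCHEMA are hypothetical names, the server address is the one used in the examples above, and the response is assumed to follow the usual Chat Completions layout (choices[0].message.content holding the JSON text).

```python
import json
import urllib.request

# Same schema as the curl example above.
ADDRESS_SCHEMA = {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string", "pattern": "^[A-Z]{2}$"},
        "zip": {"type": "integer", "minimum": 10000, "maximum": 99999},
    },
    "required": ["street", "city", "state", "zip"],
    "additionalProperties": False,
}


def build_request(prompt, schema, name, base_url="http://localhost:1234/v1"):
    """Build (but do not send) a Chat Completions request with a JSON schema."""
    body = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema},
        },
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_content(response_body):
    """Decode the constrained JSON text from a Chat Completions response body."""
    reply = json.loads(response_body)
    return json.loads(reply["choices"][0]["message"]["content"])


# Usage against a running server (not executed here):
#   req = build_request("Gimme a sample address.", ADDRESS_SCHEMA, "Address")
#   with urllib.request.urlopen(req) as resp:
#       print(parse_content(resp.read()))
```

The message content arrives as a JSON string, so a second json.loads is needed to turn it into a dictionary.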
The Python SDK takes the schema as a JSON string via grammar / grammar_type. Derive it from a pydantic model with model_json_schema():

    import json
    from pydantic import BaseModel
    from mistralrs import Runner, Which, ChatCompletionRequest

    class Address(BaseModel):
        street: str
        city: str
        state: str
        zip: int

    runner = Runner(which=Which.Plain(model_id="Qwen/Qwen3-4B"))

    res = runner.send_chat_completion_request(
        ChatCompletionRequest(
            model="default",
            messages=[{"role": "user", "content": "Gimme a sample address."}],
            grammar_type="json_schema",
            grammar=json.dumps(Address.model_json_schema()),
        )
    )
    print(res.choices[0].message.content)

Full examples: pydantic_schema, json_schema.
Model::generate_structured::<T>() derives the schema from a Rust type with schemars::JsonSchema, constrains generation to it, and deserializes the result:

    use mistralrs::{IsqBits, ModelBuilder, TextMessageRole, TextMessages};
    use schemars::JsonSchema;
    use serde::Deserialize;

    #[derive(Debug, Deserialize, JsonSchema)]
    struct Address {
        street: String,
        city: String,
        state: String,
        zip: u32,
    }

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        let model = ModelBuilder::new("Qwen/Qwen3-4B")
            .with_auto_isq(IsqBits::Four) // optional: 4-bit ISQ, see /reference/quantization-types/
            .build()
            .await?;

        let messages = TextMessages::new()
            .add_message(TextMessageRole::User, "Give me a sample US address.");

        let address: Address = model.generate_structured::<Address>(messages).await?;
        println!("{address:?}");
        Ok(())
    }

Full examples: cookbook/structured, advanced/json_schema.
For JSON-Schema-constrained tool arguments (rather than the whole response), set strict: true on the function tool; see tool calling.
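As a rough illustration of where strict: true goes, the sketch below builds a request body using OpenAI's function-tool layout. The get_weather tool and its parameters are hypothetical, and the exact set of fields mistral.rs honors is described on the tool calling page rather than here.

```python
import json

# Hypothetical tool definition in OpenAI's function-tool layout.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
        # Ask for the tool arguments to be constrained to the schema above.
        "strict": True,
    },
}

request_body = {
    "model": "default",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [get_weather_tool],
}

print(json.dumps(request_body, indent=2))
```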
Grammar constraints
Beyond JSON schemas, the grammar request field accepts regex, Lark, and raw llguidance constraints. grammar and response_format are mutually exclusive in one request.
Over HTTP, grammar is a tagged object (on Chat Completions, Responses, legacy Completions, and the Anthropic Messages endpoint):

    {"grammar": {"type": "regex", "value": "(- [^\\n]*\\n)+(- [^\\n]*)"}}

| type | value | Use for |
|---|---|---|
| regex | regex string | Fixed-shape text: lists, IDs, dates. |
| json_schema | JSON schema object | Same constraint as response_format, without the wrapper. |
| lark | Lark grammar string | Context-free syntax: expressions, DSLs. |
| llguidance | llguidance grammar object | Composed grammars; full control. |
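A small client-side helper can catch malformed grammar fields before they reach the server. This is a sketch based only on the table above: grammar_field and with_grammar are hypothetical helpers, not part of any mistral.rs client, and the server remains the authority on what it accepts.

```python
# Expected Python type of "value" for each grammar type, per the table above.
GRAMMAR_VALUE_TYPES = {
    "regex": str,
    "json_schema": dict,
    "lark": str,
    "llguidance": dict,
}


def grammar_field(kind, value):
    """Return the tagged grammar object for an HTTP request body."""
    expected = GRAMMAR_VALUE_TYPES.get(kind)
    if expected is None:
        raise ValueError(f"unknown grammar type: {kind!r}")
    if not isinstance(value, expected):
        raise TypeError(f"{kind} expects a {expected.__name__} value")
    return {"type": kind, "value": value}


def with_grammar(body, kind, value):
    """Copy a request body and attach a grammar, refusing to mix it with response_format."""
    if "response_format" in body:
        raise ValueError("grammar and response_format are mutually exclusive")
    return {**body, "grammar": grammar_field(kind, value)}


body = with_grammar(
    {"model": "default", "messages": [{"role": "user", "content": "List three fruits."}]},
    "regex",
    r"(- [^\n]*\n)+(- [^\n]*)",
)
```

Serializing body with json.dumps escapes the backslashes, which yields the doubled form shown in the JSON example above.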
In the Python SDK the same four constraints are selected by grammar_type ("regex", "json_schema", "lark", "llguidance") with grammar as a string; JSON-encode the json_schema and llguidance object forms. In Rust, pass a Constraint to RequestBuilder::set_constraint:

    let request = RequestBuilder::new()
        .set_constraint(mistralrs::Constraint::Regex(
            "(- [^\n]*\n)+(- [^\n]*)(\n\n)?".to_string(),
        ))
        .add_message(TextMessageRole::User, "Please write a few jokes.");

Constraint::Regex, Constraint::Lark, Constraint::JsonSchema, and Constraint::Llguidance mirror the HTTP variants.
Full examples: regex, lark, llguidance (HTTP); regex, lark_llg (Python); grammar, llguidance (Rust).
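Before sending a regex constraint, it can help to check locally which strings it admits. The sketch below uses Python's re module on the bullet-list pattern from the HTTP example. Python's regex dialect is not guaranteed to match the one llguidance implements, so treat this as a sanity check on simple patterns only; is_bullet_list is a hypothetical helper.

```python
import re

# The bullet-list pattern from the HTTP example: one or more "- ..." lines
# ending in a newline, followed by a final "- ..." line without one.
BULLET_LIST = re.compile(r"(- [^\n]*\n)+(- [^\n]*)")


def is_bullet_list(text):
    """Return True if the whole text matches the bullet-list pattern."""
    return BULLET_LIST.fullmatch(text) is not None


print(is_bullet_list("- apples\n- pears\n- plums"))  # at least two bullets
print(is_bullet_list("- apples"))                    # a single bullet
print(is_bullet_list("Here is a list:\n- apples"))   # leading prose
```

Because the pattern requires at least one newline-terminated bullet before the final one, a single bullet does not match; only the first call returns True.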
- response_format accepts {"type": "text"} (no constraint) and {"type": "json_schema", ...}. OpenAI’s json_object mode is not accepted; supply a schema instead. See OpenAI compatibility for field-level deviations.
- Constrained decoding restricts which tokens can be sampled; it does not change what the model wants to say. Prompting for the data you expect still matters, and a low temperature helps on extraction tasks.
- Schema output shape may differ from OpenAI’s behavior on ambiguous schemas (llguidance enforces the schema literally).
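The point about constrained decoding can be illustrated with a toy example. This is a conceptual sketch, not how llguidance is implemented: the four-token vocabulary and its logits are made up, and constrained_distribution is a hypothetical function that masks disallowed tokens and renormalizes the rest.

```python
import math


def constrained_distribution(logits, allowed):
    """Drop disallowed tokens, then apply softmax over what remains."""
    kept = {tok: logit for tok, logit in logits.items() if allowed(tok)}
    if not kept:
        raise ValueError("the constraint allows no token at this step")
    peak = max(kept.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(logit - peak) for tok, logit in kept.items()}
    total = sum(weights.values())
    return {tok: weight / total for tok, weight in weights.items()}


# Made-up logits: the model would rather open with prose than with JSON.
logits = {"Sure": 4.0, "The": 3.0, "{": 1.0, "[": 0.5}

# A JSON constraint only permits an opening brace or bracket as the first token.
dist = constrained_distribution(logits, lambda tok: tok in {"{", "["})
print(dist)
```

The mask removes "Sure" and "The" entirely, but the model's relative preference between "{" and "[" is untouched. A prompt that asks for JSON raises those logits in the first place, which is why prompting still matters under a constraint.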