Chat templates
A chat template formats messages into the string the model receives. Different models use different formats, and the wrong format produces output that is coherent but degraded. mistral.rs auto-detects the template for almost every supported model. This guide covers manual override.
Auto-detection
mistral.rs checks, in order:
- The chat_template field in the model’s tokenizer_config.json on Hugging Face. Most modern models include this.
- A bundled template in chat_templates/ keyed by architecture.
- A generic fallback for some older models.
If none match, auto-detection has failed and an override is required.
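To predict whether the first check is likely to succeed for a given model, the chat_template field can be inspected directly. The following is a minimal sketch, not mistral.rs's own detection code: chat_template_from_config is a hypothetical helper that takes an already-parsed tokenizer_config.json. It assumes the field is either a single string or, as seen in some Hugging Face configs, a list of named templates; this guide does not say how mistral.rs treats the list form.

```python
from typing import Optional


def chat_template_from_config(config: dict) -> Optional[str]:
    """Return the chat template stored in a parsed tokenizer_config.json, or None.

    Hypothetical helper for illustration; not part of mistral.rs.
    """
    template = config.get("chat_template")
    if isinstance(template, str):
        return template
    # Assumption: some configs hold several named templates as a list of
    # {"name": ..., "template": ...} entries. Prefer the one named "default".
    if isinstance(template, list):
        named = {
            entry.get("name"): entry.get("template")
            for entry in template
            if isinstance(entry, dict)
        }
        return named.get("default") or next(iter(named.values()), None)
    return None
```

Load the file with json.load and pass the resulting dict to the helper. A None result suggests the first check will not match and that one of the later checks, or an override, will be needed.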
Symptoms of a wrong template:
- Output quality below expectations.
- Special tokens (<|im_start|>, <bos>, etc.) leaking into output.
- Multi-turn degrading faster than single-turn.
- System prompts ignored or treated as user input.
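Leaked special tokens are the easiest of these symptoms to check mechanically. The sketch below is a rough heuristic and not a mistral.rs feature: find_leaked_tokens is a hypothetical helper, and its pattern covers only <|...|> style markers plus a few common bare tokens, so it should be extended with the model's actual special tokens.

```python
import re

# Heuristic pattern: <|...|> markers plus a few common bare tokens.
# Assumption: extend this list with the tokens your model actually defines.
SPECIAL_TOKEN_PATTERN = re.compile(r"<\|[^|<>\s]+\|>|<bos>|<eos>|</?s>")


def find_leaked_tokens(text: str) -> list:
    """Return every special-token-like marker found in generated text, in order."""
    return SPECIAL_TOKEN_PATTERN.findall(text)
```

An empty result does not prove the template is right, and output that legitimately discusses such tokens (or contains HTML like <s>) will produce false positives.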
Overriding the template
A file
Pass a Jinja template file with --chat-template:
mistralrs run -m <model> --chat-template my-template.jinja
The template uses standard Jinja2 with Hugging Face conventions (variables for messages, bos_token, eos_token). As a starting point, copy the original template from tokenizer_config.json and modify it.
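Copying the original template by hand means un-escaping a JSON string, which is easy to get wrong. As one possible shortcut, the sketch below writes the field out to a file that can then be edited. export_chat_template is a hypothetical helper, and it assumes the chat_template field is a single string.

```python
import json
from pathlib import Path


def export_chat_template(config_path, out_path):
    """Copy the chat_template string from a tokenizer_config.json into a file.

    Hypothetical helper for illustration; assumes the field is a single string.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    template = config.get("chat_template")
    if not isinstance(template, str):
        raise ValueError(f"no single-string chat_template field in {config_path}")
    # json.loads has already un-escaped newlines and quotes, so the file
    # contains the template exactly as a Jinja engine would see it.
    Path(out_path).write_text(template, encoding="utf-8")
    return template
```

For example, export_chat_template("tokenizer_config.json", "my-template.jinja") produces a file suitable for --chat-template once a local copy of the model's tokenizer_config.json is available.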
Inline
--jinja-explicit accepts an inline template string for one-off tests:
mistralrs run -m <model> --jinja-explicit "{% for msg in messages %}..."
When both are set, --jinja-explicit overrides --chat-template.
Picking a bundled template
mistral.rs ships templates for common architectures in chat_templates/. For a new model whose architecture is known but not auto-detected, point at the bundled template:
mistralrs run -m <new-model> --chat-template chat_templates/llama3.jinja
Bundled templates are looked up by name relative to the binary install location when not given as a full path.
Writing a template from scratch
For models without an existing template, write one. General pattern:
{% if messages[0]['role'] == 'system' %}{{ bos_token }}<|system|>{{ messages[0]['content'] }}<|end|>{% endif %}{% for msg in messages[(1 if messages[0]['role'] == 'system' else 0):] %}{% if msg['role'] == 'user' %}<|user|>{{ msg['content'] }}<|end|>{% elif msg['role'] == 'assistant' %}<|assistant|>{{ msg['content'] }}<|end|>{% endif %}{% endfor %}{% if add_generation_prompt %}<|assistant|>{% endif %}
Available variables:
- messages: the chat message list.
- bos_token, eos_token: model special tokens.
- add_generation_prompt: true when building a prompt for generation.
For model-specific tokens and role markers, the model’s Hugging Face page is authoritative.
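Before running a new template against a model, it helps to know exactly which string it should produce. The sketch below mirrors the logic of the general pattern above in plain Python, so the expected prompt can be inspected without a Jinja engine. render_prompt is a hypothetical helper, and the <|system|>, <|user|>, <|assistant|>, and <|end|> markers are the illustrative ones from the pattern, not those of any particular model.

```python
def render_prompt(messages, bos_token, add_generation_prompt):
    """Build the prompt string that the general-pattern template describes.

    Plain-Python mirror for illustration; this is not a Jinja renderer.
    """
    parts = []
    has_system = messages[0]["role"] == "system"
    if has_system:
        # As in the template, bos_token is emitted only with a system message.
        parts.append(f"{bos_token}<|system|>{messages[0]['content']}<|end|>")
    for msg in messages[1 if has_system else 0:]:
        if msg["role"] == "user":
            parts.append(f"<|user|>{msg['content']}<|end|>")
        elif msg["role"] == "assistant":
            parts.append(f"<|assistant|>{msg['content']}<|end|>")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|assistant|>")
    return "".join(parts)


example = render_prompt(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    bos_token="<s>",
    add_generation_prompt=True,
)
print(example)  # <s><|system|>Be brief.<|end|><|user|>Hi<|end|><|assistant|>
```

Comparing this string with what the real template renders is a quick way to catch stray whitespace or a missing end marker.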
Multimodal templates
Multimodal models need templates that handle non-text content parts. Most models use placeholder tokens like <|image|> or <|audio|>. Bundled templates handle this for supported architectures; custom multimodal templates must do so too.
The chat_templates/ directory contains templates for Gemma 4 and Qwen3-VL.
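The handling a custom multimodal template needs can be sketched in plain Python as flattening each message's content into text plus placeholders. This is an illustration under stated assumptions, not the behaviour of any bundled template: flatten_content is a hypothetical helper, the part shape (a dict with a type key of "text", "image", or "audio") is assumed, and real models may use different type names and placeholder tokens.

```python
def flatten_content(content, placeholders=None):
    """Turn message content into one string, replacing non-text parts.

    Hypothetical helper; the part shape and type names are assumptions.
    """
    if placeholders is None:
        # Placeholder tokens vary by model; these are examples only.
        placeholders = {"image": "<|image|>", "audio": "<|audio|>"}
    if isinstance(content, str):
        # Text-only messages pass through unchanged.
        return content
    pieces = []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            pieces.append(part.get("text", ""))
        elif kind in placeholders:
            pieces.append(placeholders[kind])
        else:
            raise ValueError(f"unsupported content part type: {kind!r}")
    return "".join(pieces)
```

A Jinja template would express the same branching with a loop over the content parts, emitting the placeholder wherever the model expects the image or audio embedding to be spliced in.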