Chat templates and tokenizer customization
JINJA chat templates (recommended method)
Some models do not come with support for tool calling or other features, and as such it might be necessary to specify your own chat template.
We provide some chat templates in the chat_templates directory of the repository, and it is easy to modify or create others to customize chat template behavior.
To use a custom template, pass the --jinja-explicit parameter to the various APIs. For example:
mistralrs serve -p 1234 --isq 4 --jinja-explicit chat_templates/mistral_small_tool_call.jinja -m mistralai/Mistral-Small-3.1-24B-Instruct-2503
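Before serving, it can help to confirm that a custom template at least renders. Below is a minimal sketch using the jinja2 Python package; the inputs are illustrative stand-ins (a given template may expect more, such as a tools list), and mistral.rs performs its own rendering regardless:

```python
# Sanity-check a custom chat template before passing it to --jinja-explicit.
# Rough sketch only: mistral.rs does its own rendering; this just catches
# syntax errors and obvious breakage early.
import json
from jinja2 import Environment


def raise_exception(message):
    # Hugging Face-style templates call raise_exception; plain jinja2
    # does not provide it, so supply a stand-in.
    raise ValueError(message)


env = Environment()
env.globals["raise_exception"] = raise_exception
env.filters["tojson"] = lambda value, **kwargs: json.dumps(value, **kwargs)

with open("chat_templates/mistral_small_tool_call.jinja") as f:
    template = env.from_string(f.read())

# Illustrative inputs; the real values come from the model's tokenizer config.
print(template.render(
    messages=[{"role": "user", "content": "What is the weather in Boston?"}],
    add_generation_prompt=True,
    bos_token="<s>",
    eos_token="</s>",
    unk_token="<unk>",
))
```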
Chat template overrides
Mistral.rs attempts to automatically load a chat template from the tokenizer_config.json file. This enables high flexibility across instruction-tuned models and ensures accurate chat templating. However, if the chat_template field is missing, then a JINJA chat template should be provided. The JINJA chat template may use messages, add_generation_prompt, bos_token, eos_token, and unk_token as inputs.
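For reference, a minimal ChatML-style template exercising those inputs might look like the sketch below (illustrative only, not one of the bundled templates verbatim):

```jinja
{# Minimal ChatML-style sketch using the documented template inputs. #}
{{ bos_token }}{% for message in messages %}<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
{% endif %}
```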
To use the chatml template, specify --chat-template before the model architecture. For example:
mistralrs serve -p 1234 --log output.log --chat-template ./chat_templates/chatml.json -m meta-llama/Llama-3.2-3B-Instruct
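The override files under chat_templates are small JSON documents whose chat_template field holds the Jinja source. A sketch of the general shape (abbreviated; see the bundled files for the exact contents):

```json
{
  "chat_template": "{% for message in messages %}<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
}
```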
Note: For GGUF models, the chat template may be loaded directly from the GGUF file by omitting any other chat template sources.
Tokenizer
Some models do not provide a tokenizer.json file, although mistral.rs expects one. To solve this, please run the scripts/get_tokenizers_json.py script; it will output the tokenizer.json file for your specific model. The resulting file may be used by passing the --tokenizer-json flag after the model architecture. For example:
$ python3 scripts/get_tokenizers_json.py
Enter model ID: microsoft/Orca-2-13b
$ mistralrs serve -p 1234 --log output.log -m microsoft/Orca-2-13b --tokenizer-json tokenizer.json
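The bundled script is the supported path, but conceptually it amounts to something like the following sketch: transformers loads the model's tokenizer with the fast (Rust) backend, converting legacy formats such as a sentencepiece tokenizer.model, and then writes out tokenizer.json:

```python
# Rough sketch of the idea behind scripts/get_tokenizers_json.py (the real
# script may differ): convert a model's tokenizer to the fast format and
# save it, which produces tokenizer.json in the working directory.
from transformers import AutoTokenizer

model_id = input("Enter model ID: ")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.save_pretrained(".")  # writes tokenizer.json alongside config files
```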
Putting it all together, to run, for example, an Orca model (which does not come with a tokenizer.json or chat template):
- Generate the tokenizer.json by running the script at scripts/get_tokenizers_json.py. This will output some files, including tokenizer.json, in the working directory.
- Find and copy the correct chat template from chat_templates to the working directory (e.g., cp chat_templates/chatml.json .)
- Run mistralrs serve, specifying the tokenizer and chat template: mistralrs serve -p 1234 --log output.txt --chat-template chatml.json -m microsoft/Orca-2-13b -t tokenizer.json
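Once the server is up, a quick request confirms that the tokenizer and chat template are being applied. mistral.rs exposes an OpenAI-compatible HTTP API, so a smoke test with the openai Python package (the base URL and model name below assume the command above) might look like:

```python
# Smoke test against the OpenAI-compatible server started above. Assumes
# the openai Python package; the API key is unused here, so a placeholder
# string is fine. The model name is assumed to match the served model ID.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="placeholder")
response = client.chat.completions.create(
    model="microsoft/Orca-2-13b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```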
Note: For GGUF models, the tokenizer may be loaded directly from the GGUF file by omitting the tokenizer model ID.