Web search
--enable-search exposes a web_search tool to the model.
The built-in search and extraction tools use strict tool calling by default, so generated queries and URLs are constrained to the declared JSON Schema.
Turning it on
Section titled “Turning it on”mistralrs serve --enable-search -m <model>The built-in backend uses DuckDuckGo (https://html.duckduckgo.com/html/?q=...). Up to 10 results are returned per query. Results pass through a readability-style extractor.
Reranking
Section titled “Reranking”Retrieved results pass through an embedding-based reranker before reaching the model. To enable a reranker:
mistralrs serve --enable-search \ --search-embedding-model embedding-gemma \ -m <model>--search-embedding-model accepts embedding-gemma. It requires --enable-search (or --agent/--agentic, which turns search on as part of the one-flag agent preset).
Per-request options
Section titled “Per-request options”The OpenAI web_search_options field controls per-request behavior:
{ "model": "default", "messages": [{"role": "user", "content": "What happened at CES this year?"}], "web_search_options": { "search_context_size": "medium" }}Fields on WebSearchOptions:
search_context_size:low,medium(default),high.user_location: optional location hint.search_description: optional description shown to the model.extract_description: optional description for content extraction.
Custom search backends
Section titled “Custom search backends”The Python and Rust SDKs accept a search_callback. The callback receives a query string and returns a list of result dicts. Used for searching internal corpora.
Python:
def my_search(query: str) -> list[dict]: return [ {"title": "...", "description": "...", "url": "internal://...", "content": "..."}, ... ]
runner = Runner( which=Which.Plain(model_id="Qwen/Qwen3-4B"), enable_search=True, search_callback=my_search,)