Web search
--enable-search exposes a web_search tool to the model.
Turning it on
    mistralrs serve --enable-search -m <model>

Once the server is started with --enable-search, Chat Completions requests control per-request behavior with web_search_options:
    {
      "model": "default",
      "messages": [{"role": "user", "content": "What happened at CES this year?"}],
      "web_search_options": { "search_context_size": "medium" }
    }

Responses requests use OpenAI hosted-tool syntax instead:
    {
      "model": "default",
      "input": "What happened at CES this year?",
      "tools": [
        {
          "type": "web_search",
          "search_context_size": "medium",
          "return_token_budget": "default"
        }
      ],
      "tool_choice": "required"
    }

Python SDK:

    from mistralrs import ChatCompletionRequest, Runner, Which

    runner = Runner(
        which=Which.Plain(model_id="Qwen/Qwen3-4B"),
        enable_search=True,
    )

Rust SDK:

    use mistralrs::{IsqBits, ModelBuilder, SearchEmbeddingModel};

    let model = ModelBuilder::new("Qwen/Qwen3-4B")
        .with_auto_isq(IsqBits::Four)
        .with_search(SearchEmbeddingModel::default())
        .build()
        .await?;

The built-in backend uses DuckDuckGo (https://html.duckduckgo.com/html/?q=...). Up to 10 results are returned per query. Results pass through a readability-style extractor.
The built-in search and extraction tools use strict tool calling by default, so generated queries and URLs are constrained to the declared JSON Schema.
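The two request shapes above differ only in where the search settings live. A minimal Python sketch that builds both bodies as plain dicts may make the difference easier to see. The helper names chat_completions_body and responses_body are hypothetical and not part of any SDK, and sending the request is left out because the host, port, and endpoint path depend on the deployment.

```python
import json

# Values listed for search_context_size under "Per-request options".
CONTEXT_SIZES = ("low", "medium", "high")


def _check_size(context_size: str) -> None:
    if context_size not in CONTEXT_SIZES:
        raise ValueError(f"search_context_size must be one of {CONTEXT_SIZES}")


def chat_completions_body(prompt: str, context_size: str = "medium") -> dict:
    """Chat Completions: search settings go in a top-level web_search_options object."""
    _check_size(context_size)
    return {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "web_search_options": {"search_context_size": context_size},
    }


def responses_body(prompt: str, context_size: str = "medium") -> dict:
    """Responses: search settings go in a hosted-tool entry inside tools."""
    _check_size(context_size)
    return {
        "model": "default",
        "input": prompt,
        "tools": [
            {
                "type": "web_search",
                "search_context_size": context_size,
                "return_token_budget": "default",
            }
        ],
        "tool_choice": "required",
    }


# Print one body so it can be pasted into an HTTP client.
print(json.dumps(chat_completions_body("What happened at CES this year?"), indent=2))
```

Either dict can be serialized with json.dumps and sent as the request body to a server started with --enable-search.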
Reranking
The neural reranker is opt-in. When a search embedding model is configured, retrieved results pass through a BM25 keyword pre-filter and then an embedding-based reranker before reaching the model. With no embedding model, results go straight to the model unranked.
In the SDKs, enabling search already loads the reranker (EmbeddingGemma by default). On the CLI it is off until you pass --search-embedding-model:
    mistralrs serve --enable-search \
      --search-embedding-model embedding-gemma \
      -m <model>

--search-embedding-model accepts embedding-gemma. It requires --enable-search (or --agent/--agentic, the one-flag agent preset that turns search on).
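To illustrate the two-stage idea described in this section (a BM25 keyword pre-filter followed by an embedding-based rerank), here is a rough, self-contained Python sketch. It is not the mistral.rs implementation: the BM25 parameters, the choice to drop results with a zero keyword score, the shortlist size, and the toy_embed function (a word-count vector standing in for a real embedding model such as EmbeddingGemma) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each document against the query with a standard BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter()  # number of documents containing each term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(query: str, docs: list, embed=toy_embed, keep: int = 5) -> list:
    # Stage 1 (assumed behavior): keep the best keyword matches, drop non-matches.
    scores = bm25_scores(query, docs)
    matched = [i for i, s in enumerate(scores) if s > 0]
    shortlist = sorted(matched, key=lambda i: scores[i], reverse=True)[:keep]
    # Stage 2: order the shortlist by embedding similarity to the query.
    q = embed(query)
    return sorted((docs[i] for i in shortlist), key=lambda d: cosine(q, embed(d)), reverse=True)
```

Swapping toy_embed for a real embedding function changes only the second stage; the keyword pre-filter keeps the number of embedding calls small.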
Per-request options
Fields on WebSearchOptions:
- search_context_size: low, medium (default), high.
- user_location: optional location hint.
- filters: allowed_domains and blocked_domains, each up to 100 domains. A domain matches its subdomains.
- return_token_budget: default or unlimited for Responses web_search.
- search_content_types: ["text"] is supported. Image search is not supported.
- external_web_access: true or omitted. false is not supported for web_search; it is ignored for web_search_preview.
- search_description: optional description shown to the model.
- extract_description: optional description for content extraction.
web_search_preview is accepted for Responses, but it does not support filters or return_token_budget.
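The filters rule that a domain matches its subdomains can be sketched in a few lines of Python. This is an illustration of the matching semantics only, not the server's code. The function name url_passes_filters is hypothetical, and two behaviors are assumptions: that a blocked domain wins over an allowed one, and that an empty allow list means no restriction.

```python
from urllib.parse import urlsplit

MAX_DOMAINS = 100  # per-list limit stated in the options above


def _matches(host: str, domain: str) -> bool:
    """True when host is the domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def url_passes_filters(url: str, allowed_domains=None, blocked_domains=None) -> bool:
    allowed = allowed_domains or []
    blocked = blocked_domains or []
    if len(allowed) > MAX_DOMAINS or len(blocked) > MAX_DOMAINS:
        raise ValueError(f"each domain list is limited to {MAX_DOMAINS} entries")
    host = urlsplit(url).hostname or ""
    # Assumption: the block list takes precedence over the allow list.
    if any(_matches(host, d) for d in blocked):
        return False
    # Assumption: an empty allow list places no restriction on the host.
    if allowed:
        return any(_matches(host, d) for d in allowed)
    return True
```

Matching on a leading dot boundary is what keeps notexample.com from being treated as a subdomain of example.com.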
Custom search backends
The Python and Rust SDKs accept a search_callback that replaces the default web search. The callback receives a query string and returns search results. This is useful for searching internal corpora.
The callback returns a list of dicts with keys title, description, url, and content:
    from mistralrs import Runner, Which

    def my_search(query: str) -> list[dict]:
        return [
            {"title": "...", "description": "...", "url": "internal://...", "content": "..."},
        ]

    runner = Runner(
        which=Which.Plain(model_id="Qwen/Qwen3-4B"),
        enable_search=True,
        search_callback=my_search,
    )

In Rust, the callback receives SearchFunctionParameters and returns Vec<SearchResult>:
    use std::sync::Arc;
    use mistralrs::{ModelBuilder, SearchFunctionParameters, SearchResult};

    let model = ModelBuilder::new("Qwen/Qwen3-4B")
        .with_search_callback(Arc::new(|params: &SearchFunctionParameters| {
            Ok(vec![SearchResult {
                title: "...".to_string(),
                description: "...".to_string(),
                url: "internal://...".to_string(),
                content: "...".to_string(),
            }])
        }))
        .build()
        .await?;
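As a slightly fuller illustration of the Python callback shape, the sketch below searches a small in-memory corpus and returns dicts with the four documented keys. The corpus entries, the internal://wiki/... URLs, and the term-count scoring are all made up for the example; a real backend would more likely query a search index or database.

```python
# Hypothetical internal documents; every value here is illustrative.
CORPUS = [
    {
        "title": "VPN setup",
        "description": "How to connect to the office VPN.",
        "url": "internal://wiki/vpn",
        "content": "Install the VPN client, then sign in with your SSO account.",
    },
    {
        "title": "Expense policy",
        "description": "Rules for submitting expenses.",
        "url": "internal://wiki/expenses",
        "content": "Submit receipts within 30 days of purchase.",
    },
]

RESULT_KEYS = ("title", "description", "url", "content")


def my_search(query: str) -> list[dict]:
    """Return matching documents, best match first, in the callback's dict shape."""
    terms = query.lower().split()
    scored = []
    for doc in CORPUS:
        text = f"{doc['title']} {doc['content']}".lower()
        hits = sum(text.count(term) for term in terms)  # naive term-frequency score
        if hits:
            scored.append((hits, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Cap at 10 results, mirroring the built-in backend's per-query limit.
    return [{key: doc[key] for key in RESULT_KEYS} for _, doc in scored[:10]]
```

The function can be passed as search_callback=my_search when constructing the Runner, in the same way as the stub shown earlier in this section.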