
Web search

--enable-search exposes a web_search tool to the model.

mistralrs serve --enable-search -m <model>

The built-in backend uses DuckDuckGo (https://html.duckduckgo.com/html/?q=...). Up to 10 results are returned per query. Results pass through a readability-style extractor.
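The sketch below shows how a query could map onto the DuckDuckGo HTML endpoint and how the 10-result cap applies. It assumes the query is URL-encoded into the q parameter. The helper names build_search_url and cap_results are hypothetical and are not part of mistral.rs.

```python
from urllib.parse import quote_plus

# Endpoint named in the docs; the helpers below are hypothetical illustrations.
DDG_HTML_ENDPOINT = "https://html.duckduckgo.com/html/"
MAX_RESULTS = 10  # documented per-query cap


def build_search_url(query: str) -> str:
    """URL-encode the query into the q parameter (assumed request shape)."""
    return f"{DDG_HTML_ENDPOINT}?q={quote_plus(query)}"


def cap_results(results: list) -> list:
    """Keep at most MAX_RESULTS results, preserving their order."""
    return results[:MAX_RESULTS]


if __name__ == "__main__":
    print(build_search_url("rust async runtime"))
    # https://html.duckduckgo.com/html/?q=rust+async+runtime
    print(len(cap_results([{"title": str(i)} for i in range(25)])))  # 10
```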

The built-in search and extraction tools use strict tool calling by default, so generated queries and URLs are constrained to the declared JSON Schema.
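Strict tool calling constrains decoding, so the generated arguments always match the declared schema. The sketch below does not reproduce that constrained decoding. It only checks, after the fact, the property that strict mode guarantees. It assumes a hypothetical web_search schema with a single required string field, query, and uses a hand-written checker for a tiny subset of JSON Schema. The real mistral.rs schema may differ.

```python
# Hypothetical argument schema; the schema mistral.rs actually declares may differ.
WEB_SEARCH_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
    "additionalProperties": False,
}


def conforms(args, schema: dict) -> bool:
    """Check a flat object against a small JSON Schema subset: required keys,
    string-typed properties, and additionalProperties=False."""
    if not isinstance(args, dict):
        return False
    props = schema.get("properties", {})
    # Every required key must be present.
    if any(key not in args for key in schema.get("required", [])):
        return False
    # With additionalProperties=False, undeclared keys are rejected.
    if schema.get("additionalProperties") is False and any(k not in props for k in args):
        return False
    # Declared string properties must hold strings.
    for key, value in args.items():
        if props.get(key, {}).get("type") == "string" and not isinstance(value, str):
            return False
    return True


if __name__ == "__main__":
    print(conforms({"query": "tokio runtime"}, WEB_SEARCH_SCHEMA))  # True
    print(conforms({"query": "x", "site": "y"}, WEB_SEARCH_SCHEMA))  # False
```

Under strict tool calling, the second kind of output (an undeclared key) should not be generated in the first place.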

The neural reranker is optional. When a search embedding model is configured, retrieved results first pass through a BM25 keyword pre-filter and then an embedding-based reranker before reaching the model. Without an embedding model, results go straight to the model unranked.
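The two-stage pipeline can be sketched as follows. Okapi BM25, using the non-negative Lucene-style IDF, keeps the top keyword matches, and cosine similarity between embeddings reorders the survivors. Several details are assumptions and do not come from mistral.rs: the whitespace tokenizer, the pre-filter size prefilter_k, and the toy embedding table that stands in for a real model such as EmbeddingGemma.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query (Lucene-style IDF)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query: str, docs: Sequence[str], embed: Callable[[str], Vector], prefilter_k: int = 3) -> List[int]:
    """Return doc indices: BM25 keeps the top prefilter_k, then the
    survivors are reordered by embedding similarity to the query."""
    bm25 = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:prefilter_k]
    q = embed(query)
    return sorted(candidates, key=lambda i: cosine(q, embed(docs[i])), reverse=True)


if __name__ == "__main__":
    docs = ["cuda gpu kernels", "gpu prices today", "pasta recipe", "cuda cuda install guide"]
    # Toy embeddings standing in for a real embedding model.
    toy = {
        "cuda gpu": [1.0, 0.0],
        "cuda gpu kernels": [0.6, 0.8],
        "gpu prices today": [0.0, 1.0],
        "pasta recipe": [1.0, 0.0],
        "cuda cuda install guide": [1.0, 0.1],
    }
    print(rerank("cuda gpu", docs, toy.__getitem__))  # [3, 0, 1]
```

In this toy run, "pasta recipe" has the highest embedding similarity to the query, but the keyword pre-filter drops it before reranking. The pre-filter keeps the comparatively expensive embedding step to a small candidate set.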

In the SDKs, enabling search already loads the reranker (EmbeddingGemma by default). On the CLI it is off until you pass --search-embedding-model:

mistralrs serve --enable-search \
--search-embedding-model embedding-gemma \
-m <model>

--search-embedding-model accepts embedding-gemma. It requires --enable-search (or --agent/--agentic, the one-flag agent preset that turns search on).

Fields on WebSearchOptions:

  • search_context_size: low, medium (default), high.
  • user_location: optional location hint.
  • filters: allowed_domains and blocked_domains, each up to 100 domains. A domain matches its subdomains.
  • return_token_budget: default or unlimited; applies to the Responses web_search tool.
  • search_content_types: only ["text"] is supported; image search is not.
  • external_web_access: true or omitted. Setting it to false is not supported for web_search and is ignored for web_search_preview.
  • search_description: optional description shown to the model.
  • extract_description: optional description for content extraction.
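The documented filter rules are that a domain matches its subdomains and that each list holds at most 100 domains. The sketch below shows one way to apply them to a result URL. The function names are hypothetical. The choice to let a blocked match win over an allowed match is an assumption that the docs do not state.

```python
from typing import Sequence
from urllib.parse import urlsplit

MAX_DOMAINS = 100  # documented per-list limit


def domain_matches(host: str, domain: str) -> bool:
    """True if host equals domain or is one of its subdomains."""
    host, domain = host.lower().rstrip("."), domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def passes_filters(url: str, allowed: Sequence[str] = (), blocked: Sequence[str] = ()) -> bool:
    """Apply allowed_domains / blocked_domains to a result URL.
    Assumption: a blocked match wins over an allowed match."""
    if len(allowed) > MAX_DOMAINS or len(blocked) > MAX_DOMAINS:
        raise ValueError(f"at most {MAX_DOMAINS} domains per list")
    host = urlsplit(url).hostname or ""
    if any(domain_matches(host, d) for d in blocked):
        return False
    if allowed:
        return any(domain_matches(host, d) for d in allowed)
    return True


if __name__ == "__main__":
    print(passes_filters("https://api.github.com/repos", allowed=["github.com"]))  # True
    print(passes_filters("https://notgithub.com/", allowed=["github.com"]))  # False
```

Matching on a leading dot (".github.com") rather than a plain suffix keeps look-alike hosts such as notgithub.com from passing the filter.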

web_search_preview is accepted for Responses, but it does not support filters or return_token_budget.
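The option rules above can be condensed into a single check. The validator below is a hypothetical sketch, not the server's code. It treats "not supported" as an error, drops settings the docs describe as ignored, and applies the text-only content-type rule to both tool types for simplicity. The real server may reject or ignore these settings differently.

```python
CONTEXT_SIZES = {"low", "medium", "high"}
TOKEN_BUDGETS = {"default", "unlimited"}


def validate_search_tool(tool_type: str, options: dict) -> dict:
    """Hypothetical check of WebSearchOptions-style fields. Raises ValueError
    for settings treated as unsupported; returns a copy with ignored fields dropped."""
    opts = dict(options)
    if tool_type not in ("web_search", "web_search_preview"):
        raise ValueError(f"unknown tool type: {tool_type}")
    if opts.get("search_context_size", "medium") not in CONTEXT_SIZES:
        raise ValueError("search_context_size must be low, medium, or high")
    if opts.get("search_content_types", ["text"]) != ["text"]:
        raise ValueError('only ["text"] is supported')
    if tool_type == "web_search_preview":
        for key in ("filters", "return_token_budget"):
            if key in opts:
                raise ValueError(f"{key} is not supported for web_search_preview")
        # external_web_access=false is ignored for the preview tool.
        if opts.get("external_web_access") is False:
            del opts["external_web_access"]
    else:
        if opts.get("external_web_access") is False:
            raise ValueError("external_web_access=false is not supported for web_search")
        if opts.get("return_token_budget", "default") not in TOKEN_BUDGETS:
            raise ValueError("return_token_budget must be default or unlimited")
    return opts


if __name__ == "__main__":
    print(validate_search_tool("web_search_preview", {"external_web_access": False}))  # {}
```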

The Python and Rust SDKs accept a search_callback that replaces the built-in web search. The callback receives a query string and returns search results, which makes it useful for searching internal corpora.

The callback returns a list of dicts with keys title, description, url, and content:

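from mistralrs import Runner, Which

# Called with the model's query; returns results from your own corpus.
def my_search(query: str) -> list[dict]:
    return [
        {"title": "...", "description": "...", "url": "internal://...", "content": "..."},
    ]

# my_search replaces the built-in DuckDuckGo search for this runner.
runner = Runner(
    which=Which.Plain(model_id="Qwen/Qwen3-4B"),
    enable_search=True,
    search_callback=my_search,
)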
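The callback above returns placeholders. The sketch below shows a slightly fuller callback. It assumes a small hypothetical in-memory corpus and plain keyword-overlap ranking, and the 80-character description snippet is also an assumption. It returns dicts with the four documented keys, so it could be passed as search_callback=corpus_search in the Runner call above.

```python
import re
from typing import Dict, List, Set

# Hypothetical in-memory corpus; swap in your own document store.
CORPUS = [
    {"title": "VPN setup", "url": "internal://wiki/vpn",
     "content": "How to configure the corporate VPN client on Linux and macOS."},
    {"title": "Expense policy", "url": "internal://wiki/expenses",
     "content": "Meals are reimbursed up to the daily limit when receipts are attached."},
    {"title": "On-call rota", "url": "internal://wiki/oncall",
     "content": "The on-call engineer rotates weekly; handoff happens on Monday."},
]


def _words(text: str) -> Set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def corpus_search(query: str, limit: int = 5) -> List[Dict[str, str]]:
    """Rank CORPUS entries by how many query words they contain and return
    dicts with the title, description, url, and content keys."""
    terms = _words(query)
    scored = []
    for doc in CORPUS:
        hits = len(terms & _words(doc["title"] + " " + doc["content"]))
        if hits:
            scored.append((hits, doc))
    # Most matching words first; ties keep corpus order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {
            "title": doc["title"],
            "description": doc["content"][:80],  # short snippet
            "url": doc["url"],
            "content": doc["content"],
        }
        for _, doc in scored[:limit]
    ]


if __name__ == "__main__":
    print([r["title"] for r in corpus_search("weekly on call")])
```

A production callback would more likely query a search index or vector store, but the return shape stays the same.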