Web search tool in mistral.rs
mistral.rs is compatible with OpenAI’s web_search_options parameter! Once enabled, this lets models search the web to answer requests.
This works with all models that support tool calling, although your mileage may vary depending on the specific model. The following models worked well during testing and are recommended:
- Hermes 3 3b/8b
- Mistral 3 24b
- Llama 4 Scout/Maverick
- Qwen 3 (⭐ Recommended!)
Web search is supported in both streaming and completion responses! This makes it easy to integrate and to try out in interactive mode!
Besides tool calling and parsing of web content, we also use an embedding model to select the most relevant search results.
You can use the web search tool in all the APIs: Python, Rust, and server.
Selecting a search embedding model
Internally, we now use google/embeddinggemma-300m to embed documents for ranking. You can pick from the built-in reranker variants (currently just embedding_gemma) in every API:
- Rust: with_search(SearchEmbeddingModel::EmbeddingGemma300M) in the builder
- Python: search_embedding_model="embedding_gemma" in the Runner
- Server: the --search-embedding-model embedding_gemma flag
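For example, in the Python API the reranker is chosen when constructing the Runner. Below is a minimal sketch combining the enable_search and search_embedding_model arguments listed above; the model ID is simply the Hermes 3 8B example reused from later in this document:

from mistralrs import Runner, Which, Architecture

# Minimal sketch: enable web search and explicitly select the built-in
# EmbeddingGemma reranker rather than relying on the default.
runner = Runner(
    which=Which.Plain(
        model_id="NousResearch/Hermes-3-Llama-3.1-8B",
        arch=Architecture.Llama,
    ),
    enable_search=True,
    search_embedding_model="embedding_gemma",
)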
Specifying a custom search callback
By default, mistral.rs uses a DuckDuckGo-based search callback. To override this, you can provide your own search function:
- Rust: use .with_search_callback(...) on the model builder with an Arc<dyn Fn(&SearchFunctionParameters) -> anyhow::Result<Vec<SearchResult>> + Send + Sync>.
- Python: pass the search_callback keyword argument to Runner, which should be a function def search_callback(query: str) -> List[Dict[str, str]] returning a list of results with keys "title", "description", "url", and "content".
Example in Python:
def search_callback(query: str) -> list[dict[str, str]]:
    # Implement your custom search logic here, returning a list of result dicts
    return [
        {
            "title": "Example Result",
            "description": "An example description",
            "url": "https://example.com",
            "content": "Full text content of the page",
        },
        # more results...
    ]

from mistralrs import Runner, Which, Architecture

runner = Runner(
    which=Which.Plain(model_id="YourModel/ID", arch=Architecture.Mistral),
    enable_search=True,
    search_callback=search_callback,
)
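For a callback that performs a real lookup, here is a hedged sketch that queries the public MediaWiki search API using the requests package. The endpoint, its parameters, and the response shape are assumptions about that external service, not part of mistral.rs; any backend that returns the four keys above works the same way.

import re
import requests

def wikipedia_search_callback(query: str) -> list[dict[str, str]]:
    # Query the MediaWiki search API (an assumed external service, unrelated to mistral.rs).
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "list": "search", "srsearch": query, "format": "json"},
        headers={"User-Agent": "mistralrs-search-example/0.1"},
        timeout=10,
    )
    resp.raise_for_status()
    results = []
    for hit in resp.json()["query"]["search"]:
        # Search snippets include <span> highlight markup; strip the tags.
        snippet = re.sub(r"<[^>]+>", "", hit["snippet"])
        results.append(
            {
                "title": hit["title"],
                "description": snippet,
                "url": "https://en.wikipedia.org/wiki/" + hit["title"].replace(" ", "_"),
                "content": snippet,
            }
        )
    return results

This plugs into Runner exactly like the static example above: search_callback=wikipedia_search_callback.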
HTTP server
Be sure to add --enable-search!
Here are some examples using various models. Web search works for both streaming and completion requests, so interactive mode is featured here as well!
mistralrs run --enable-search --isq 4 -m Qwen/Qwen3-4B
mistralrs serve --enable-search -p 1234 --isq 4 --jinja-explicit chat_templates/mistral_small_tool_call.jinja -m mistralai/Mistral-Small-3.1-24B-Instruct-2503
mistralrs run --enable-search --isq 4 -m NousResearch/Hermes-3-Llama-3.1-8B
from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

messages = [
    {
        "role": "user",
        "content": "Can you show me some code using mistral.rs for running Llama 3.2 Vision?",
    }
]

completion = client.chat.completions.create(
    model="default",
    messages=messages,
    tool_choice="auto",
    max_tokens=1024,
    web_search_options={},
)
# print(completion.usage)
print(completion.choices[0].message.content)
if completion.choices[0].message.tool_calls is not None:
    # Should never happen.
    tool_called = completion.choices[0].message.tool_calls[0].function
    print(tool_called)
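Streaming works the same way. Reusing the client and messages from above, here is a sketch that passes stream=True and prints tokens as they arrive:

stream = client.chat.completions.create(
    model="default",
    messages=messages,
    tool_choice="auto",
    max_tokens=1024,
    web_search_options={},
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no choices or no delta text; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()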
Python SDK
from mistralrs import (
    Runner,
    Which,
    ChatCompletionRequest,
    Architecture,
    WebSearchOptions,
)

# Define a custom search callback if desired
def my_search_callback(query: str) -> list[dict[str, str]]:
    # Fetch or compute search results here
    return [
        {
            "title": "Mistral.rs GitHub",
            "description": "Official mistral.rs repository",
            "url": "https://github.com/EricLBuehler/mistral.rs",
            "content": "mistral.rs is a blazingly fast LLM inference engine written in Rust...",
        },
    ]

runner = Runner(
    which=Which.Plain(
        model_id="NousResearch/Hermes-3-Llama-3.1-8B",
        arch=Architecture.Llama,
    ),
    enable_search=True,
    search_callback=my_search_callback,
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[
            {
                "role": "user",
                "content": "Can you show me some code using mistral.rs for running Llama 3.2 Vision?",
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
        web_search_options=WebSearchOptions(
            search_context_size=None, user_location=None
        ),
    )
)
print(res.choices[0].message.content)
print(res.usage)
Rust SDK
use anyhow::Result;
use mistralrs::{
    IsqType, RequestBuilder, SearchEmbeddingModel, TextMessageRole, TextMessages,
    TextModelBuilder, WebSearchOptions,
};

#[tokio::main]
async fn main() -> Result<()> {
    let model = TextModelBuilder::new("NousResearch/Hermes-3-Llama-3.1-8B")
        .with_isq(IsqType::Q4K)
        .with_logging()
        .with_search(SearchEmbeddingModel::default())
        .build()
        .await?;

    let messages = TextMessages::new().add_message(
        TextMessageRole::User,
        "What is the weather forecast for Boston?",
    );
    let messages =
        RequestBuilder::from(messages).with_web_search_options(WebSearchOptions::default());

    let response = model.send_chat_request(messages).await?;

    println!("What is the weather forecast for Boston?\n\n");
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );

    Ok(())
}