Rust SDK getting started
The Rust SDK embeds the engine directly into a Rust program. A Rust toolchain is required; see rustup.rs.

    use anyhow::Result;
    use mistralrs::{IsqBits, ModelBuilder, TextMessageRole, TextMessages};

    #[tokio::main]
    async fn main() -> Result<()> {
        let model = ModelBuilder::new("Qwen/Qwen3-4B")
            .with_auto_isq(IsqBits::Four)
            .with_logging()
            .build()
            .await?;

        let messages = TextMessages::new().add_message(
            TextMessageRole::User,
            "In one sentence, what is Rust known for?",
        );

        let response = model.send_chat_request(messages).await?;
        println!("{}", response.choices[0].message.content.as_ref().unwrap());

        Ok(())
    }

Run with cargo run --release. The first run downloads the weights into the Hugging Face cache.
with_auto_isq(IsqBits::Four) enables ISQ (in-situ quantization) to 4 bits; omit it to run the model unquantized.
Full example: text-generation.
Project setup
Create a new binary crate:

    cargo new --bin hello-mistralrs
    cd hello-mistralrs

Add the dependencies to Cargo.toml:

    [dependencies]
    anyhow = "1"
    mistralrs = "0.8"
    tokio = { version = "1", features = ["full"] }

The default features build for CPU. For hardware acceleration, replace the mistralrs line with the one that matches your platform:

    # NVIDIA GPU (CUDA)
    mistralrs = { version = "0.8", features = ["cuda", "flash-attn", "cudnn"] }

    # Apple Silicon (Metal)
    mistralrs = { version = "0.8", features = ["metal"] }

    # Intel CPU with MKL
    mistralrs = { version = "0.8", features = ["mkl"] }

Feature names match the CLI build features. The cargo features reference lists every option.
The pieces
ModelBuilder is a fluent configuration object; each method returns self. The only required input is the Hugging Face repository id (or local path) passed to ModelBuilder::new. Everything else has a default. build() loads the weights and returns a Model.
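For readers new to the fluent style, the pattern can be sketched with the standard library alone. This is a minimal illustration, not the mistralrs implementation: SketchBuilder, SketchConfig, and with_isq_bits are hypothetical names invented for this example, and the real build() is async and loads weights rather than returning a plain struct.

```rust
// Illustrative only: `SketchBuilder` and `SketchConfig` are hypothetical
// stand-ins, not types from the mistralrs crate.

#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub model_id: String,
    pub isq_bits: Option<u8>,
    pub logging: bool,
}

pub struct SketchBuilder {
    config: SketchConfig,
}

impl SketchBuilder {
    /// The model id is the only required input; every other field starts at a default.
    pub fn new(model_id: impl Into<String>) -> Self {
        Self {
            config: SketchConfig {
                model_id: model_id.into(),
                isq_bits: None,
                logging: false,
            },
        }
    }

    /// Takes `self` by value and returns it, which is what allows chaining.
    pub fn with_isq_bits(mut self, bits: u8) -> Self {
        self.config.isq_bits = Some(bits);
        self
    }

    /// Same shape as above: mutate one field, hand the builder back.
    pub fn with_logging(mut self) -> Self {
        self.config.logging = true;
        self
    }

    /// Consumes the builder and yields the finished configuration.
    pub fn build(self) -> SketchConfig {
        self.config
    }
}
```

Calling SketchBuilder::new("Qwen/Qwen3-4B").build() yields a configuration with every optional field at its default, while inserting .with_isq_bits(4).with_logging() before build() overrides only those two fields.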
with_auto_isq(IsqBits::Four) matches --isq 4 on the CLI: the engine picks the 4-bit format best suited to the platform (AFQ4 on Metal, Q4K on CUDA or CPU). To pin a specific format instead, use with_isq(IsqType::Q4K); see the quantization reference.
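The per-platform choice described above amounts to a small lookup. The sketch below restates it as code under stated assumptions: Backend, FourBitFormat, and auto_four_bit_format are hypothetical names made up for illustration, and the engine's actual selection logic may consider more than the backend.

```rust
// Illustrative only: these enums and this function are hypothetical and do
// not exist in the mistralrs crate. They restate the mapping from the text.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Metal,
    Cuda,
    Cpu,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FourBitFormat {
    Afq4,
    Q4k,
}

/// Maps a backend to the 4-bit format named in the documentation:
/// AFQ4 on Metal, Q4K on CUDA or CPU.
pub fn auto_four_bit_format(backend: Backend) -> FourBitFormat {
    match backend {
        Backend::Metal => FourBitFormat::Afq4,
        Backend::Cuda | Backend::Cpu => FourBitFormat::Q4k,
    }
}
```

Pinning a format with with_isq corresponds to skipping this lookup and naming the format directly.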
TextMessages assembles a basic chat conversation. For per-message sampling, tool schemas, or logprobs, use RequestBuilder.
The Rust SDK reference lists the full Model surface; specialized builders (GgufModelBuilder, EmbeddingModelBuilder, LoraModelBuilder, …) are on docs.rs.
Streaming
stream_chat_request returns a futures Stream of response chunks. The example imports futures::StreamExt, so add the futures crate (futures = "0.3") to Cargo.toml alongside the dependencies from Project setup.

    use anyhow::Result;
    use futures::StreamExt;
    use mistralrs::{
        ChatCompletionChunkResponse, ChunkChoice, Delta, IsqBits, ModelBuilder, Response,
        TextMessageRole, TextMessages,
    };
    use std::io::Write;

    #[tokio::main]
    async fn main() -> Result<()> {
        let model = ModelBuilder::new("Qwen/Qwen3-4B")
            .with_auto_isq(IsqBits::Four)
            .build()
            .await?;

        let messages = TextMessages::new().add_message(
            TextMessageRole::User,
            "Write me a haiku about ownership.",
        );

        let mut stream = model.stream_chat_request(messages).await?;
        let stdout = std::io::stdout();
        let mut out = std::io::BufWriter::new(stdout.lock());

        while let Some(item) = stream.next().await {
            if let Response::Chunk(ChatCompletionChunkResponse { choices, .. }) = item {
                if let Some(ChunkChoice {
                    delta: Delta {
                        content: Some(text),
                        ..
                    },
                    ..
                }) = choices.first()
                {
                    // Write each piece as it arrives and flush so it shows up immediately.
                    out.write_all(text.as_bytes())?;
                    out.flush()?;
                }
            }
        }

        Ok(())
    }

The stream yields Response values. Most are Response::Chunk carrying assistant output in choices[0].delta.content; other variants cover errors and the final completion event. The example above silently ignores those variants, so production code should match them explicitly. The streaming guide covers the variant taxonomy, error handling, and tool-call progress events.
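What matching the variants explicitly might look like can be sketched without the SDK. Everything here is an assumption made for illustration: SketchEvent is a hypothetical three-variant enum standing in for the real Response type (whose variants are listed in the streaming guide), and a plain iterator stands in for the async stream.

```rust
// Illustrative only: `SketchEvent` is a hypothetical stand-in for the SDK's
// `Response` enum, and a synchronous iterator replaces the async stream.

#[derive(Debug, Clone, PartialEq)]
pub enum SketchEvent {
    /// A chunk whose delta may or may not carry text.
    Chunk(Option<String>),
    /// The final completion event.
    Done,
    /// An error reported mid-stream.
    Error(String),
}

/// Accumulates chunk text, stops at `Done`, and surfaces the first error
/// instead of silently dropping it.
pub fn collect_text<I>(events: I) -> Result<String, String>
where
    I: IntoIterator<Item = SketchEvent>,
{
    let mut text = String::new();
    for event in events {
        match event {
            SketchEvent::Chunk(Some(piece)) => text.push_str(&piece),
            // A chunk without content (for example, a role-only delta) adds nothing.
            SketchEvent::Chunk(None) => {}
            SketchEvent::Done => break,
            SketchEvent::Error(message) => return Err(message),
        }
    }
    Ok(text)
}
```

Because the match has no wildcard arm, adding a variant to the enum becomes a compile error here, which is the main reason to prefer an exhaustive match over a single if let.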
ModelBuilder::build() performs all model loading and is expensive. Call it once at startup. Model methods take &self and are safe to call concurrently, so you can share one instance by reference. To move it across spawned tasks, wrap it in Arc<Model> (Model does not implement Clone).
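The sharing pattern can be sketched with the standard library. This is an illustration under stated assumptions, not SDK code: SketchModel and its answer method are hypothetical stand-ins for Model, and OS threads stand in for spawned tokio tasks. The ownership story is the same in both cases: clone the Arc, not the model.

```rust
use std::sync::Arc;
use std::thread;

// Illustrative only: `SketchModel` is a hypothetical stand-in for the SDK's
// `Model`. Like `Model`, it is not `Clone` and its methods take `&self`.
pub struct SketchModel {
    name: String,
}

impl SketchModel {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
        }
    }

    /// Takes `&self`, so many callers can use one instance at once.
    pub fn answer(&self, prompt: &str) -> String {
        format!("[{}] {}", self.name, prompt)
    }
}

/// Runs one worker per prompt against a single shared model and returns
/// the answers in the same order as the prompts.
pub fn fan_out(model: Arc<SketchModel>, prompts: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = prompts
        .into_iter()
        .map(|prompt| {
            // Cloning the Arc copies a pointer and bumps a reference count;
            // the model itself is never duplicated.
            let model = Arc::clone(&model);
            thread::spawn(move || model.answer(&prompt))
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

In an async program the same shape would apply with a task spawn in place of thread::spawn: build the model once, wrap it in Arc, and move a clone of the Arc into each task.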
Requests through the Rust SDK bypass the HTTP layer; there is no /v1/chat/completions endpoint and no OpenAI compatibility shim. To expose a model over HTTP alongside direct in-process access, see the embed-in-axum guide.