Embed mistralrs inside an Axum application
To add mistral.rs to an existing Axum app, mount the mistralrs router under a sub-path. The pattern uses two builders from mistralrs-server-core:
MistralRsForServerBuilderconstructs the engine state (SharedMistralRsState = Arc<MistralRs>, used later for custom handlers).MistralRsServerRouterBuilderproduces an AxumRouterfrom that state.
Dependencies
Section titled “Dependencies”[dependencies]anyhow = "1"mistralrs-core = "0.8"mistralrs-server-core = "0.8"axum = "0.8"tokio = { version = "1", features = ["full"] }The high-level mistralrs crate is not needed here; the server builders take a ModelSelected from mistralrs-core directly.
Mount under a sub-path
Section titled “Mount under a sub-path”use axum::{Router, routing::get};use mistralrs_core::{AutoDeviceMapParams, ModelDType, ModelSelected};use mistralrs_server_core::{ mistralrs_for_server_builder::MistralRsForServerBuilder, mistralrs_server_router_builder::MistralRsServerRouterBuilder,};
#[tokio::main]async fn main() -> anyhow::Result<()> { let model = ModelSelected::Plain { model_id: "Qwen/Qwen3-4B".into(), tokenizer_json: None, arch: None, dtype: ModelDType::Auto, topology: None, organization: None, write_uqff: None, from_uqff: None, imatrix: None, calibration_file: None, max_seq_len: AutoDeviceMapParams::DEFAULT_MAX_SEQ_LEN, max_batch_size: AutoDeviceMapParams::DEFAULT_MAX_BATCH_SIZE, hf_cache_path: None, matformer_config_path: None, matformer_slice_name: None, };
let shared_mistralrs = MistralRsForServerBuilder::new() .with_model(model) .with_in_situ_quant("4".to_string()) .build() .await?;
let mistralrs_router = MistralRsServerRouterBuilder::new() .with_mistralrs(shared_mistralrs) .build() .await?;
let app = Router::new() .route("/", get(|| async { "My app" })) .nest("/ai", mistralrs_router);
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?; axum::serve(listener, app).await?; Ok(())}POST /ai/v1/chat/completions then behaves identically to the standalone server, as do the other routes.
with_in_situ_quant("4") applies ISQ (in-situ quantization) to 4-bit; omit it to run the model unquantized.
ModelSelected names every field, so this literal will not compile when new fields are added. For the current field list, see the docs.rs ModelSelected entry or the mistralrs-server-core crate-level docs.
Builder options
Section titled “Builder options”MistralRsServerRouterBuilder exposes:
with_include_swagger_routes(bool)with_base_path(&str)with_allowed_origins(Vec<String>)with_max_body_limit(usize)with_max_tool_rounds(usize)with_tool_dispatch_url(String)with_agent_permission(AgentPermission)andwith_code_execution_permission(CodeExecutionPermission)
MistralRsForServerBuilder exposes engine-level options (with_model, with_in_situ_quant, set_paged_attn, with_seed, multi-model via add_model, etc.).
Calling the model directly from a handler
Section titled “Calling the model directly from a handler”For custom request shapes, share the SharedMistralRsState directly with Axum handlers and use the lower-level helpers exposed by mistralrs-server-core (chat_completion::parse_request, handler_core::send_request, …).
A complete example with custom OpenAPI integration is in the mistralrs-server-core crate-level documentation.