LoRA and X-LoRA adapters
LoRA (Low-Rank Adaptation) adapters add task-specific fine-tuning on top of a base model without modifying the base weights. X-LoRA loads several adapters at once and lets the model select among them per request.
Loading a LoRA
mistral.rs reads adapter_config.json from the LoRA repo to determine the targeted modules and rank. To load multiple adapters from the CLI, pass them as a semicolon-separated list.
CLI:

    mistralrs run -m <base-model> --lora <lora-repo>

Multiple adapters:

    mistralrs run -m <base-model> --lora "<lora-repo-1>;<lora-repo-2>"

Python:

    from mistralrs import Runner, Which

    runner = Runner(
        which=Which.Lora(
            model_id="<base-model>",
            adapter_model_ids=["<lora-repo-1>", "<lora-repo-2>"],
        ),
    )

Rust:

    use mistralrs::{LoraModelBuilder, TextModelBuilder};

    let model = LoraModelBuilder::from_text_model_builder(
        TextModelBuilder::new("<base-model>"),
        vec!["<lora-repo-1>", "<lora-repo-2>"],
    )
    .build()
    .await?;

Full examples: lora-zephyr (Python), lora (Rust).
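When the adapter list is assembled programmatically, the semicolon-separated value can be built before invoking the CLI. The following is a minimal sketch: the helper name `lora_cli_args` is hypothetical and not part of mistral.rs, and it relies only on the `-m` and `--lora` flags shown above.

```python
import shlex


def lora_cli_args(base_model: str, adapters: list[str]) -> list[str]:
    """Build an argv list for `mistralrs run` with one or more LoRA adapters.

    Hypothetical helper for illustration; not part of mistral.rs.
    """
    if not adapters:
        raise ValueError("at least one adapter is required")
    for adapter in adapters:
        # ';' is the list separator, so it cannot appear inside an adapter id.
        if ";" in adapter:
            raise ValueError(f"adapter id may not contain ';': {adapter!r}")
    return ["mistralrs", "run", "-m", base_model, "--lora", ";".join(adapters)]


argv = lora_cli_args("<base-model>", ["<lora-repo-1>", "<lora-repo-2>"])
# shlex.join quotes each argument so the ';' is not treated as a shell separator.
print(shlex.join(argv))
```

Passing `argv` directly to `subprocess.run` avoids shell quoting altogether; the quoted string is only needed when the command is pasted into a shell.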
X-LoRA
X-LoRA loads multiple adapters together with a learned scaling head that computes a per-token weighting of the adapters. The ordering file maps adapters to the scaler's output positions.
CLI:

    mistralrs run \
      -m <base-model> \
      --xlora <xlora-repo> \
      --xlora-order <ordering-file.json>

Flag rules:
- --xlora conflicts with --lora.
- --xlora-order and --tgt-non-granular-index are only valid alongside --xlora.
- --tgt-non-granular-index <n> controls how often the X-LoRA scaler recomputes. Without it, the scaler recomputes every token.
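The flag rules can be checked before launching the CLI, for example in a wrapper script. This is a sketch under stated assumptions: `check_adapter_flags` is a hypothetical helper, not part of mistral.rs, and it encodes only the two compatibility rules listed here.

```python
def check_adapter_flags(flags: dict[str, str]) -> list[str]:
    """Return a list of rule violations for a mapping of flag name to value.

    Hypothetical pre-flight check; it mirrors only the documented rules.
    """
    errors = []
    has_xlora = "--xlora" in flags
    # Rule 1: --xlora and --lora are mutually exclusive.
    if has_xlora and "--lora" in flags:
        errors.append("--xlora conflicts with --lora")
    # Rule 2: these flags only make sense when --xlora is present.
    for dependent in ("--xlora-order", "--tgt-non-granular-index"):
        if dependent in flags and not has_xlora:
            errors.append(f"{dependent} is only valid alongside --xlora")
    return errors


valid = check_adapter_flags(
    {"--xlora": "<xlora-repo>", "--xlora-order": "<ordering-file.json>"}
)
invalid = check_adapter_flags(
    {"--lora": "<lora-repo>", "--tgt-non-granular-index": "1"}
)
print(valid)
print(invalid)
```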
Python:

    from mistralrs import Runner, Which

    runner = Runner(
        which=Which.XLora(
            model_id="<base-model>",
            xlora_model_id="<xlora-repo>",
            order="<ordering-file.json>",
            # tgt_non_granular_index=...,
        ),
    )

Rust:

    use std::fs::File;
    use mistralrs::{XLoraModelBuilder, TextModelBuilder};

    let model = XLoraModelBuilder::from_text_model_builder(
        TextModelBuilder::new("<base-model>"),
        "<xlora-repo>",
        serde_json::from_reader(File::open("<ordering-file.json>")?)?,
    )
    .build()
    .await?;

Full examples: xlora-zephyr (Python), xlora (Rust).
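To make the per-token weighting and the role of the ordering concrete, the following is a conceptual sketch in plain Python. It is not the mistral.rs implementation: the function names are hypothetical, the softmax normalization is an assumption, and the adapter outputs are toy vectors rather than real LoRA deltas. The point is only that position `i` of the scaler output is applied to the `i`-th adapter in the order.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapters(
    order: list[str],
    scaler_logits: list[float],
    adapter_outputs: dict[str, list[float]],
) -> list[float]:
    """Weight each adapter's output for ONE token by the scaler's output.

    `order[i]` names the adapter that scaler position `i` refers to.
    Softmax normalization is an assumption made for this illustration.
    """
    if len(order) != len(scaler_logits):
        raise ValueError("scaler must emit one value per adapter in the order")
    weights = softmax(scaler_logits)
    dim = len(adapter_outputs[order[0]])
    mixed = [0.0] * dim
    for position, name in enumerate(order):
        for i, value in enumerate(adapter_outputs[name]):
            mixed[i] += weights[position] * value
    return mixed


outputs = {"adapter-a": [1.0, 0.0], "adapter-b": [0.0, 1.0]}
# Equal logits give both adapters the same weight for this token.
mixed = mix_adapters(["adapter-a", "adapter-b"], [0.0, 0.0], outputs)
print(mixed)
```

If the order were swapped while the scaler logits stayed the same, each weight would be applied to the wrong adapter, which is why the ordering file has to match the order the scaler was trained with.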
AnyMoE
AnyMoE goes a step further than adapters: it composes several fine-tunes of the same base model into a MoE (Mixture of Experts) configuration at inference time, training only a small per-layer router.
- It is exposed through the Rust SDK (AnyMoeModelBuilder) and the Python SDK (AnyMoeConfig, AnyMoeExpertType); it is not configurable via the CLI.
- Expert checkpoints must share the base model architecture, and a small JSON calibration dataset is required to train the router.
- The AnyMoeConfig docstrings in the AnyMoE Python reference cover finding the prefix/mlp values from a model's model.safetensors.index.json.
Full examples: anymoe (Python), anymoe and anymoe-lora (Rust).
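As a starting point for locating the prefix/mlp values, the tensor names in model.safetensors.index.json can be summarized with a short script. This sketch assumes the standard sharded-checkpoint index layout, where a top-level `weight_map` object maps tensor names to shard files. The helper name and the sample tensor names are hypothetical, and reading the part before the layer number as the prefix and the part after it as the module name is an assumption; the AnyMoeConfig docstrings remain the authority on what to pass.

```python
import json
from collections import Counter


def layer_prefixes_and_modules(weight_map: dict[str, str]) -> Counter:
    """Count (prefix, module) pairs found around the numeric layer index.

    For a name like 'model.layers.0.mlp.gate_proj.weight' this yields
    ('model.layers', 'mlp'). Names without a layer index are skipped.
    """
    pairs = Counter()
    for name in weight_map:
        parts = name.split(".")
        for i, part in enumerate(parts):
            if part.isdigit() and 0 < i < len(parts) - 1:
                pairs[(".".join(parts[:i]), parts[i + 1])] += 1
                break
    return pairs


def load_weight_map(path: str) -> dict[str, str]:
    """Read the 'weight_map' object from an index file on disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["weight_map"]


# Small stand-in for a real index; with a downloaded model, use
# load_weight_map("model.safetensors.index.json") instead.
sample_weight_map = {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
}
pairs = layer_prefixes_and_modules(sample_weight_map)
for (prefix, module), count in sorted(pairs.items()):
    print(prefix, module, count)
```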