LoRA and X-LoRA adapters

LoRA (Low-Rank Adaptation) adapters add task-specific fine-tuning on top of a base model without modifying the base weights. X-LoRA loads several adapters at once and uses a learned scaler to weight them per token.
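
To make "without modifying the base weights" concrete, here is a minimal plain-Python sketch of the standard LoRA formulation, where the adapter contributes a low-rank update on top of the frozen weight matrix. It is an illustration only, not mistral.rs code: the function names, the toy matrices, and the `alpha / r` scaling convention are assumptions made for the example.

```python
# Illustrative sketch of a LoRA forward pass; not how mistral.rs implements it.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """Return W x + (alpha / r) * B (A x), where r is the adapter rank."""
    r = len(a)                       # A has shape (r, d_in); B has shape (d_out, r)
    base = matvec(w, x)              # output of the frozen base layer
    delta = matvec(b, matvec(a, x))  # low-rank update supplied by the adapter
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights (toy 2x2 identity)
A = [[1.0, 1.0]]              # rank-1 down-projection, shape (1, 2)
B = [[0.5], [0.0]]            # rank-1 up-projection, shape (2, 1)
x = [2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=1.0)
print(y)
```

The base matrix `W` is only read, never written, so the same base model can be shared by any number of adapters.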

mistral.rs reads adapter_config.json from the LoRA repo to determine the target modules and rank. To load multiple adapters, pass them as a semicolon-separated list.

mistralrs run -m <base-model> --lora <lora-repo>

Multiple adapters:

mistralrs run -m <base-model> --lora "<lora-repo-1>;<lora-repo-2>"
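
When the adapter list is assembled by a script, the semicolon-separated value can be built programmatically. The sketch below is one way to do that in Python. `build_lora_command` is a hypothetical helper, not part of mistral.rs, and it uses only the flags shown above.

```python
import shlex


def build_lora_command(base_model, adapters):
    """Build the argv for `mistralrs run` with one or more LoRA adapters.

    Hypothetical helper: it only joins adapter ids with ';' as described above.
    """
    if not adapters:
        raise ValueError("at least one adapter repo is required")
    for repo in adapters:
        if ";" in repo:
            # ';' is the list separator, so it cannot appear inside an id.
            raise ValueError(f"adapter id must not contain ';': {repo!r}")
    return ["mistralrs", "run", "-m", base_model, "--lora", ";".join(adapters)]


argv = build_lora_command("<base-model>", ["<lora-repo-1>", "<lora-repo-2>"])
print(shlex.join(argv))  # shell-quoted form, safe to paste into a terminal
```

Passing `argv` as a list to `subprocess.run` avoids shell quoting altogether, which matters because an unquoted `;` would otherwise be read by the shell as a command separator.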

Full examples: lora-zephyr (Python), lora (Rust).

X-LoRA loads multiple adapters together with a learned scaling head that assigns each adapter a per-token weight. The ordering file maps each adapter to its position in the scaler's output.
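
The role of the ordering can be illustrated with a small plain-Python sketch of mixing adapter outputs for a single token. This is a conceptual illustration under stated assumptions, not the mistral.rs implementation: the softmax normalization, the function names, and the adapter names `adapter_a` and `adapter_b` are all hypothetical.

```python
import math


def softmax(logits):
    """Normalize raw scaler outputs into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapters(base_out, adapter_deltas, scaler_logits, order):
    """Combine adapter deltas for ONE token.

    adapter_deltas: dict mapping adapter name -> delta vector
    order: list of adapter names; order[i] is the adapter that
           scaler output position i refers to
    """
    weights = softmax(scaler_logits)
    out = list(base_out)
    for position, name in enumerate(order):
        delta = adapter_deltas[name]
        for j in range(len(out)):
            out[j] += weights[position] * delta[j]
    return out, dict(zip(order, weights))


order = ["adapter_a", "adapter_b"]  # hypothetical adapter names
deltas = {"adapter_a": [2.0, 0.0], "adapter_b": [0.0, 4.0]}

mixed, weights = mix_adapters([1.0, 1.0], deltas, [0.0, 0.0], order)
print(mixed, weights)
```

If `order` listed the adapters in a different sequence than the one the scaler was trained with, each weight would be applied to the wrong adapter, which is why the ordering has to be supplied explicitly.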

mistralrs run \
  -m <base-model> \
  --xlora <xlora-repo> \
  --xlora-order <ordering-file.json>

Flag rules:

  • --xlora conflicts with --lora.
  • --xlora-order and --tgt-non-granular-index are only valid alongside --xlora.
  • --tgt-non-granular-index <n> controls how often the X-LoRA scaler recomputes. Without it, the scaler recomputes every token.
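
The flag rules above can be checked before launching a process. The following is a minimal Python sketch of such a pre-flight check. `check_adapter_flags` is a hypothetical helper that encodes only the rules listed here; the CLI itself remains the source of truth.

```python
def check_adapter_flags(flags):
    """Return a list of rule violations for a dict of flag name -> value."""
    errors = []
    has_xlora = "--xlora" in flags

    # Rule 1: --xlora and --lora are mutually exclusive.
    if has_xlora and "--lora" in flags:
        errors.append("--xlora conflicts with --lora")

    # Rule 2: these flags are only meaningful together with --xlora.
    for dependent in ("--xlora-order", "--tgt-non-granular-index"):
        if dependent in flags and not has_xlora:
            errors.append(f"{dependent} requires --xlora")

    return errors


ok = check_adapter_flags({
    "--xlora": "<xlora-repo>",
    "--xlora-order": "<ordering-file.json>",
})
bad = check_adapter_flags({
    "--lora": "<lora-repo>",
    "--xlora-order": "<ordering-file.json>",
})
print(ok, bad)
```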

Full examples: xlora-zephyr (Python), xlora (Rust).

AnyMoE goes a step further than adapters: it composes several fine-tunes of the same base model into a MoE (Mixture of Experts) configuration at inference time, training only a small per-layer router.

  • It is exposed through the Rust SDK (AnyMoeModelBuilder) and the Python SDK (AnyMoeConfig, AnyMoeExpertType); it is not configurable via the CLI.
  • Expert checkpoints must share the base model architecture, and a small JSON calibration dataset is required to train the router.
  • The AnyMoeConfig docstrings in the AnyMoE Python reference explain how to find the prefix/mlp values from a model's model.safetensors.index.json.
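
As a rough aid for the last point, the sketch below scans the `weight_map` of a safetensors index for tensor names shaped like `<prefix>.<layer-number>.<module>.…` and reports candidate prefix/module pairs. It assumes a Llama-style naming scheme such as `model.layers.0.mlp.gate_proj.weight`. The helper name, the `mlp_hint` parameter, and the sample index are hypothetical, and the AnyMoeConfig docstrings remain the authority on what values are expected.

```python
import json
import re

# Matches names like "model.layers.12.mlp.gate_proj.weight".
LAYER_RE = re.compile(r"^(?P<prefix>.+)\.(?P<index>\d+)\.(?P<module>[^.]+)\.")


def guess_prefix_and_mlp(index_json_text, mlp_hint="mlp"):
    """Return sorted (prefix, module) candidates found in the weight map."""
    weight_map = json.loads(index_json_text)["weight_map"]
    candidates = set()
    for tensor_name in weight_map:
        match = LAYER_RE.match(tensor_name)
        if match and mlp_hint in match.group("module"):
            candidates.add((match.group("prefix"), match.group("module")))
    return sorted(candidates)


# Tiny stand-in for a real model.safetensors.index.json (assumed layout).
sample_index = json.dumps({
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.1.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
})

candidates = guess_prefix_and_mlp(sample_index)
print(candidates)
```

For a real checkpoint, read the file with `open(path).read()` and inspect the result by hand; architectures that name their feed-forward block differently will need a different `mlp_hint`.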

Full examples: anymoe (Python), anymoe and anymoe-lora (Rust).