Skip to content

AnyMoE

Expert type for an AnyMoE model. May be:

  • AnyMoeExpertType.FineTuned()
  • AnyMoeExpertType.LoraAdapter(rank: int, alpha: float, target_modules: list[str])
FieldType
rankint
alphafloat
target_moduleslist[str]
__init__(
hidden_size: int,
dataset_json: str,
prefix: str,
mlp: str,
model_ids: list[str],
expert_type: AnyMoeExpertType,
layers: list[int] = [],
lr: float = 0.001,
epochs: int = 100,
batch_size: int = 4,
gate_model_id: str | None = None,
training: bool = True,
loss_csv_path: str | None = None,
) -> None

Create an AnyMoE config from the hidden size, dataset, and other metadata. The model IDs may be local paths.

To find the prefix/mlp values:

  • Go to https://huggingface.co/<MODEL ID>/tree/main?show_file_info=model.safetensors.index.json
  • Look for the mlp layers: for example model.layers.27.mlp.down_proj.weight means the prefix is model.layers and the mlp is mlp.

To find the hidden size:

  • Look it up in https://huggingface.co/<BASE MODEL ID>/blob/main/config.json.

Note: gate_model_id specifies the gating model ID. If training == True, safetensors are written here; otherwise the pretrained safetensors are loaded and no training occurs.

Note: if training == True, loss_csv_path has no effect. Otherwise, a CSV loss file is saved at that path.


Generated from mistralrs-pyo3/mistralrs.pyi.