AnyMoE
AnyMoeExpertType
Section titled “AnyMoeExpertType”Expert type for an AnyMoE model. May be:
AnyMoeExpertType.FineTuned()AnyMoeExpertType.LoraAdapter(rank: int, alpha: float, target_modules: list[str])
AnyMoeExpertType.FineTuned
Section titled “AnyMoeExpertType.FineTuned”AnyMoeExpertType.LoraAdapter
Section titled “AnyMoeExpertType.LoraAdapter”| Field | Type |
|---|---|
rank | int |
alpha | float |
target_modules | list[str] |
AnyMoeConfig
Section titled “AnyMoeConfig”AnyMoeConfig.__init__
Section titled “AnyMoeConfig.__init__”__init__( hidden_size: int, dataset_json: str, prefix: str, mlp: str, model_ids: list[str], expert_type: AnyMoeExpertType, layers: list[int] = [], lr: float = 0.001, epochs: int = 100, batch_size: int = 4, gate_model_id: str | None = None, training: bool = True, loss_csv_path: str | None = None,) -> NoneCreate an AnyMoE config from the hidden size, dataset, and other metadata. The model IDs may be local paths.
To find the prefix/mlp values:
- Go to
https://huggingface.co/<MODEL ID>/tree/main?show_file_info=model.safetensors.index.json - Look for the mlp layers: for example
model.layers.27.mlp.down_proj.weightmeans the prefix ismodel.layersand the mlp ismlp.
To find the hidden size:
- Look it up in
https://huggingface.co/<BASE MODEL ID>/blob/main/config.json.
Note: gate_model_id specifies the gating model ID. If training == True, safetensors are written here; otherwise the pretrained safetensors are loaded and no training occurs.
Note: if training == True, loss_csv_path has no effect. Otherwise, a CSV loss file is saved at that path.
Generated from mistralrs-pyo3/mistralrs.pyi.