pub struct TextModelBuilder { /* private fields */ }
Configure a text model with the various parameters for loading, running, and other inference behaviors.
Implementations
impl TextModelBuilder
pub fn new(model_id: impl ToString) -> Self
A few defaults are applied here:
- MoQE ISQ organization
- Token source is from the cache (.cache/huggingface/token)
- Maximum number of sequences running is 32
- Number of sequences to hold in prefix cache is 16.
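The defaults and the `self -> Self` method shape above follow the consuming builder pattern. A minimal sketch of that pattern, using a hypothetical `SketchBuilder` stand-in (not the real `TextModelBuilder`, whose fields are private) that mirrors only the two numeric defaults listed above:

```rust
// Hypothetical stand-in for illustration; not the real TextModelBuilder.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchBuilder {
    pub model_id: String,
    pub max_num_seqs: usize,
    pub prefix_cache_n: Option<usize>,
}

impl SketchBuilder {
    /// Apply the documented defaults: 32 running sequences and
    /// 16 sequences held in the prefix cache.
    pub fn new(model_id: impl ToString) -> Self {
        Self {
            model_id: model_id.to_string(),
            max_num_seqs: 32,
            prefix_cache_n: Some(16),
        }
    }

    /// Each setter consumes the builder and returns it, so calls chain.
    pub fn with_max_num_seqs(mut self, max_num_seqs: usize) -> Self {
        self.max_num_seqs = max_num_seqs;
        self
    }

    /// `None` stands for "prefix cache disabled" in this sketch.
    pub fn with_prefix_cache_n(mut self, n_seqs: Option<usize>) -> Self {
        self.prefix_cache_n = n_seqs;
        self
    }
}
```

Because every setter takes `self` by value, a chain such as `SketchBuilder::new("some/model").with_max_num_seqs(8)` only overrides the values it names and leaves the other defaults in place.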
pub fn with_prompt_batchsize(self, prompt_batchsize: NonZeroUsize) -> Self
Set the prompt batchsize to use for inference.
pub fn with_topology(self, topology: Topology) -> Self
Set the model topology for use during loading. If there is an overlap, the topology type is used over the ISQ type.
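The precedence rule can be pictured as a per-layer lookup with a global fallback. The sketch below is an assumption-laden simplification: `Quant` and `effective_quant` are hypothetical names, and the topology is reduced to a plain map from layer index to quantization type, which is not how the real `Topology` type is defined.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a quantization type; the real crate uses IsqType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quant {
    Q4,
    Q8,
}

/// Resolve the quantization for one layer. A topology entry for the layer,
/// when present, wins over the global ISQ setting; otherwise the global
/// setting (possibly `None`) applies.
pub fn effective_quant(
    layer: usize,
    global_isq: Option<Quant>,
    topology: &HashMap<usize, Quant>,
) -> Option<Quant> {
    topology.get(&layer).copied().or(global_isq)
}
```

With layer 0 mapped to `Quant::Q8` in the topology and a global setting of `Quant::Q4`, layer 0 resolves to `Q8` while every other layer resolves to `Q4`.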
pub fn with_mixture_qexperts_isq(self) -> Self
Organize ISQ to enable MoQE (Mixture of Quantized Experts, https://arxiv.org/abs/2310.02410)
pub fn with_chat_template(self, chat_template: impl ToString) -> Self
Literal Jinja chat template OR Path (ending in .json) to one.
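One way to read this rule is as a suffix check on the supplied string. The sketch below assumes exactly that; `ChatTemplateSource` and `classify_chat_template` are hypothetical names, and the library's actual detection logic may differ.

```rust
// Hypothetical classification of the value passed to with_chat_template.
#[derive(Debug, PartialEq)]
pub enum ChatTemplateSource<'a> {
    /// The value is treated as a path to a JSON file holding the template.
    JsonFile(&'a str),
    /// The value is treated as the Jinja template text itself.
    Literal(&'a str),
}

/// Assumption: a value ending in ".json" is a path; anything else is a literal.
pub fn classify_chat_template(value: &str) -> ChatTemplateSource<'_> {
    if value.ends_with(".json") {
        ChatTemplateSource::JsonFile(value)
    } else {
        ChatTemplateSource::Literal(value)
    }
}
```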
pub fn with_tokenizer_json(self, tokenizer_json: impl ToString) -> Self
Path to a discrete tokenizer.json file.
pub fn with_loader_type(self, loader_type: NormalLoaderType) -> Self
Manually set the model loader type. Otherwise, it will attempt to automatically determine the loader type.
pub fn with_dtype(self, dtype: ModelDType) -> Self
Load the model in a certain dtype.
pub fn with_force_cpu(self) -> Self
Force usage of the CPU device. Do not use PagedAttention with this.
pub fn with_token_source(self, token_source: TokenSource) -> Self
Source of the Hugging Face token.
pub fn with_hf_revision(self, revision: impl ToString) -> Self
Set the revision to use for a Hugging Face remote model.
pub fn with_isq(self, isq: IsqType) -> Self
Use ISQ of a certain type. If there is an overlap, the topology type is used over the ISQ type.
pub fn with_imatrix(self, path: PathBuf) -> Self
Utilise this imatrix file during ISQ. Incompatible with specifying a calibration file.
pub fn with_calibration_file(self, path: PathBuf) -> Self
Utilise this calibration file to collect an imatrix. Incompatible with specifying an imatrix file.
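Since the imatrix and calibration-file options exclude each other, a caller may want to check for the conflict before building. The following is a sketch of such a check under stated assumptions: `ImatrixOptions` and `validate` are hypothetical helpers, and nothing here describes how or when the library itself reports the conflict.

```rust
use std::path::PathBuf;

// Hypothetical holder for the two mutually exclusive options.
#[derive(Debug, Default)]
pub struct ImatrixOptions {
    pub imatrix: Option<PathBuf>,
    pub calibration_file: Option<PathBuf>,
}

impl ImatrixOptions {
    /// Reject the combination of a precomputed imatrix file and a
    /// calibration file; any other combination is accepted.
    pub fn validate(&self) -> Result<(), String> {
        match (&self.imatrix, &self.calibration_file) {
            (Some(_), Some(_)) => Err(
                "specify either an imatrix file or a calibration file, not both".to_string(),
            ),
            _ => Ok(()),
        }
    }
}
```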
pub fn with_paged_attn(self, paged_attn_cfg: impl FnOnce() -> Result<PagedAttentionConfig>) -> Result<Self>
Enable PagedAttention. Configure PagedAttention with a PagedAttentionConfig object, which can be created with sensible values with a PagedAttentionMetaBuilder.
If PagedAttention is not supported (query with paged_attn_supported), this will do nothing.
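The signature takes a closure that returns a `Result` and itself returns `Result<Self>`. One plausible reading, sketched below with hypothetical stand-in types (`PagedSketch`, `PagedCfgSketch`, and an explicit `supported` flag in place of `paged_attn_supported`), is that the configuration is only constructed when PagedAttention is available, and that a failure while constructing it propagates to the caller.

```rust
// Hypothetical stand-ins; not the real PagedAttentionConfig or builder.
#[derive(Debug, Clone, PartialEq)]
pub struct PagedCfgSketch {
    pub block_size: usize,
}

#[derive(Debug, Default)]
pub struct PagedSketch {
    pub paged_attn: Option<PagedCfgSketch>,
}

impl PagedSketch {
    /// Run the closure only when `supported` is true. When it is false the
    /// builder is returned unchanged and the closure is never called.
    pub fn with_paged_attn(
        mut self,
        supported: bool,
        cfg: impl FnOnce() -> Result<PagedCfgSketch, String>,
    ) -> Result<Self, String> {
        if supported {
            // A failing closure surfaces as an error from this method.
            self.paged_attn = Some(cfg()?);
        }
        Ok(self)
    }
}
```

In this sketch an unsupported platform yields `Ok` with `paged_attn` still `None`, which matches the documented "this will do nothing" behavior.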
pub fn with_max_num_seqs(self, max_num_seqs: usize) -> Self
Set the maximum number of sequences which can be run at once.
pub fn with_no_kv_cache(self) -> Self
Disable the KV cache. This trades performance for lower memory usage.
pub fn with_prefix_cache_n(self, n_seqs: Option<usize>) -> Self
Set the number of sequences to hold in the prefix cache. Set to None to disable the prefix cacher.
pub fn with_logging(self) -> Self
Enable logging.
pub fn with_device_mapping(self, device_mapping: DeviceMapMetadata) -> Self
Provide metadata to initialize the device mapper. Generally, it is more programmatic and easier to use the Topology, see Self::with_topology.
pub fn write_uqff(self, path: PathBuf) -> Self
Path to write a UQFF file to.
The parent (part of the path excluding the filename) will determine where any other files generated are written to. These can be used to load UQFF models standalone, and may include:
- residual.safetensors
- tokenizer.json
- config.json
- And others
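The role of the parent directory can be illustrated with the standard library alone. The helper below, `sibling_outputs`, is a hypothetical name and only covers the three file names listed above; it computes where those files would sit next to a given UQFF path and does not write anything.

```rust
use std::path::{Path, PathBuf};

/// Given the UQFF output path, return the paths of the companion files
/// that would be placed in the same parent directory.
pub fn sibling_outputs(uqff_path: &Path) -> Vec<PathBuf> {
    // A bare filename has an empty parent, which joins to a relative path.
    let parent = uqff_path.parent().unwrap_or_else(|| Path::new(""));
    ["residual.safetensors", "tokenizer.json", "config.json"]
        .iter()
        .map(|name| parent.join(name))
        .collect()
}
```

For a path such as `out/phi/model.uqff`, the parent is `out/phi`, so the companion files resolve to `out/phi/residual.safetensors`, `out/phi/tokenizer.json`, and `out/phi/config.json`.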
pub async fn build(self) -> Result<Model>
Auto Trait Implementations
impl Freeze for TextModelBuilder
impl RefUnwindSafe for TextModelBuilder
impl Send for TextModelBuilder
impl Sync for TextModelBuilder
impl Unpin for TextModelBuilder
impl UnwindSafe for TextModelBuilder
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Downcast for T where T: AsAny + ?Sized
fn downcast_ref<T>(&self) -> Option<&T> where T: AsAny
fn downcast_mut<T>(&mut self) -> Option<&mut T> where T: AsAny
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.