Struct TextModelBuilder

pub struct TextModelBuilder { /* private fields */ }

Configure a text model with the various parameters for loading, running, and other inference behaviors.

Implementations

impl TextModelBuilder

pub fn new(model_id: impl ToString) -> Self

A few defaults are applied here:

MoQE ISQ organization
Token source is from the cache (.cache/huggingface/token)
Maximum number of running sequences is 32
Number of sequences held in the prefix cache is 16
Automatic device mapping with model defaults according to AutoDeviceMapParams
Web searching compatible with the OpenAI web_search_options setting is disabled
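
The defaults above, together with the consuming `with_*` setters documented on this page, can be illustrated with a small standalone sketch. `SketchBuilder` and its public fields are hypothetical stand-ins written for this example; they are not the real TextModelBuilder, whose fields are private, and only the numeric defaults (32 and 16) and the disabled search flag come from the list above.

```rust
/// Hypothetical stand-in for illustration only; NOT the real `TextModelBuilder`.
#[derive(Clone, Debug, PartialEq)]
pub struct SketchBuilder {
    pub model_id: String,
    pub max_num_seqs: usize,
    pub prefix_cache_n: Option<usize>,
    pub search_enabled: bool,
}

impl SketchBuilder {
    /// Apply the documented defaults: 32 running sequences,
    /// 16 prefix-cache entries, web search disabled.
    pub fn new(model_id: impl ToString) -> Self {
        Self {
            model_id: model_id.to_string(),
            max_num_seqs: 32,
            prefix_cache_n: Some(16),
            search_enabled: false,
        }
    }

    /// Consuming setter: takes `self` by value and returns it, so calls chain.
    pub fn with_max_num_seqs(mut self, max_num_seqs: usize) -> Self {
        self.max_num_seqs = max_num_seqs;
        self
    }

    /// `None` models "prefix cache disabled" in this sketch.
    pub fn with_prefix_cache_n(mut self, n_seqs: Option<usize>) -> Self {
        self.prefix_cache_n = n_seqs;
        self
    }
}
```

Because every setter consumes and returns the builder, a call such as `SketchBuilder::new("org/model").with_max_num_seqs(8).with_prefix_cache_n(None)` overrides only the named defaults and leaves the rest untouched.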

pub fn with_search(self, search_bert_model: BertEmbeddingModel) -> Self

Enable searching compatible with the OpenAI web_search_options setting. This uses the BERT model specified or the default.

pub fn with_throughput_logging(self) -> Self

Enable runner throughput logging.

pub fn with_jinja_explicit(self, jinja_explicit: String) -> Self

Path to an explicit Jinja chat template file (.jinja). If specified, this overrides all other chat templates.

pub fn with_prompt_chunksize(self, prompt_chunksize: NonZeroUsize) -> Self

Set the prompt chunk size to use for inference.
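
The parameter type is NonZeroUsize, so a zero chunk size cannot be expressed. A minimal sketch of converting a plain integer (for example from a config file) into the required type; the helper name `chunk_size` is hypothetical and not part of the crate:

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: convert a raw size into the `NonZeroUsize`
/// that a setter like `with_prompt_chunksize` expects.
/// Returns `None` for zero, which `NonZeroUsize` cannot represent.
pub fn chunk_size(raw: usize) -> Option<NonZeroUsize> {
    NonZeroUsize::new(raw)
}
```

The caller decides what to do with `None`, for instance by skipping the setter and keeping whatever chunk size the builder would otherwise use.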

pub fn with_topology(self, topology: Topology) -> Self

Set the model topology for use during loading. If there is an overlap, the topology type is used over the ISQ type.

pub fn with_mixture_qexperts_isq(self) -> Self

Organize ISQ to enable MoQE (Mixture of Quantized Experts, https://arxiv.org/abs/2310.02410)

pub fn with_chat_template(self, chat_template: impl ToString) -> Self

A literal Jinja chat template, or a path (ending in .json) to a file containing one.

pub fn with_tokenizer_json(self, tokenizer_json: impl ToString) -> Self

Path to a discrete tokenizer.json file.

pub fn with_loader_type(self, loader_type: NormalLoaderType) -> Self

Manually set the model loader type. If unset, the loader type is determined automatically.

pub fn with_dtype(self, dtype: ModelDType) -> Self

Load the model in the specified dtype.

pub fn with_force_cpu(self) -> Self

Force usage of the CPU device. Do not use PagedAttention with this.

pub fn with_token_source(self, token_source: TokenSource) -> Self

Source of the Hugging Face token.

pub fn with_hf_revision(self, revision: impl ToString) -> Self

Set the revision to use for a Hugging Face remote model.

pub fn with_isq(self, isq: IsqType) -> Self

Use ISQ of a certain type. If there is an overlap, the topology type is used over the ISQ type.
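
The overlap rule stated here and under with_topology can be read as a simple precedence: where the topology assigns a quantization type, that type is used, and the ISQ type applies otherwise. The sketch below illustrates only that reading; `SketchQuant` and `effective_quant` are hypothetical names, and this is not the crate's implementation.

```rust
/// Hypothetical quantization tag, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchQuant {
    Q4,
    Q8,
}

/// Resolve the type for one layer: a topology entry, when present,
/// takes precedence over the global ISQ setting.
pub fn effective_quant(
    topology: Option<SketchQuant>,
    isq: Option<SketchQuant>,
) -> Option<SketchQuant> {
    topology.or(isq)
}
```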

pub fn with_imatrix(self, path: PathBuf) -> Self

Utilise this imatrix file during ISQ. Incompatible with specifying a calibration file.

pub fn with_calibration_file(self, path: PathBuf) -> Self

Utilise this calibration file to collect an imatrix. Incompatible with specifying an imatrix file.

pub fn with_paged_attn(self, paged_attn_cfg: impl FnOnce() -> Result<PagedAttentionConfig>) -> Result<Self>

Enable PagedAttention. Configure PagedAttention with a PagedAttentionConfig object, which can be created with sensible defaults using a PagedAttentionMetaBuilder.

If PagedAttention is not supported (query with paged_attn_supported), this will do nothing.
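
The signature takes a fallible closure rather than a ready-made config, which lets the configuration step be skipped when PagedAttention is unavailable. The sketch below shows one plausible shape for such a method. All names (`SketchPagedConfig`, `PagedSketchBuilder`, the `supported` flag) are hypothetical, and the assumption that the closure is not invoked on unsupported platforms is an interpretation of "this will do nothing", not confirmed behaviour.

```rust
/// Hypothetical config type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchPagedConfig {
    pub block_size: usize,
}

/// Hypothetical builder; NOT the real `TextModelBuilder`.
#[derive(Debug, Default)]
pub struct PagedSketchBuilder {
    pub paged_attn: Option<SketchPagedConfig>,
}

impl PagedSketchBuilder {
    /// `supported` stands in for a platform check such as `paged_attn_supported`.
    /// The closure runs only when the feature is supported; its error, if any,
    /// is propagated to the caller with `?`.
    pub fn with_paged_attn(
        mut self,
        cfg: impl FnOnce() -> Result<SketchPagedConfig, String>,
        supported: bool,
    ) -> Result<Self, String> {
        if supported {
            self.paged_attn = Some(cfg()?);
        }
        Ok(self)
    }
}
```

Returning `Result<Self>` means the call site needs a `?` or an explicit match before chaining further setters.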

pub fn with_max_num_seqs(self, max_num_seqs: usize) -> Self

Set the maximum number of sequences which can be run at once.

pub fn with_no_kv_cache(self) -> Self

Disable the KV cache. This trades performance for lower memory usage.

pub fn with_prefix_cache_n(self, n_seqs: Option<usize>) -> Self

Set the number of sequences to hold in the prefix cache. Set to None to disable the prefix cacher.

pub fn with_logging(self) -> Self

Enable logging.

pub fn with_device_mapping(self, device_mapping: DeviceMapSetting) -> Self

Provide metadata to initialize the device mapper.

pub fn from_uqff(self, path: Vec<PathBuf>) -> Self

Paths to read UQFF files from.

pub fn write_uqff(self, path: PathBuf) -> Self

Path to write a UQFF file to.

The parent directory (the path excluding the filename) determines where any other generated files are written. These can be used to load UQFF models standalone, and may include:

residual.safetensors
tokenizer.json
config.json
And others
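
To make the parent-directory rule concrete, the following sketch computes where a companion file would land for a given UQFF output path, using only the standard library. The helper name `sibling_path` and the example paths are hypothetical; the crate's own logic may differ in detail.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: given the UQFF output path, return the location of a
/// companion file (for example `tokenizer.json`) in the same parent directory.
/// Returns `None` when the path has no parent (for example the root `/`).
pub fn sibling_path(uqff_path: &Path, file_name: &str) -> Option<PathBuf> {
    uqff_path.parent().map(|dir| dir.join(file_name))
}
```

Under this reading, writing to `out/model.uqff` would place companion files such as tokenizer.json and config.json in `out/`.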

pub fn from_hf_cache_pathf(self, hf_cache_path: PathBuf) -> Self

Cache path for Hugging Face models downloaded locally.

pub fn with_device(self, device: Device) -> Self

Set the main device to load this model onto. Automatic device mapping will be performed starting with this device.

pub async fn build(self) -> Result<Model>

Trait Implementations

impl Clone for TextModelBuilder

fn clone(&self) -> TextModelBuilder

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl From<UqffTextModelBuilder> for TextModelBuilder

fn from(value: UqffTextModelBuilder) -> Self

Converts to this type from the input type.

Auto Trait Implementations

impl UnwindSafe for TextModelBuilder

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> AsAny for T
where T: Any,

fn as_any(&self) -> &(dyn Any + 'static)

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

fn type_name(&self) -> &'static str

Gets the type name of self

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> Downcast for T
where T: AsAny + ?Sized,

fn is<T>(&self) -> bool
where T: AsAny,

Returns true if the boxed type is the same as T.

fn downcast_ref<T>(&self) -> Option<&T>
where T: AsAny,

Forward to the method defined on the type Any.

fn downcast_mut<T>(&mut self) -> Option<&mut T>
where T: AsAny,

Forward to the method defined on the type Any.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.