pub struct GgufModelBuilder { /* private fields */ }
Configure a text GGUF model with the various parameters for loading, running, and other inference behaviors.
Implementations
impl GgufModelBuilder
pub fn new(model_id: impl ToString, files: Vec<impl ToString>) -> Self
A few defaults are applied here:
- Token source is from the cache (.cache/huggingface/token)
- Maximum number of sequences running is 32
- Number of sequences to hold in prefix cache is 16.
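The defaults listed above can be pictured with a small stand-in type. The sketch below is not the crate's code: `SketchBuilder`, `SketchTokenSource`, and their fields are hypothetical names, since the real fields of GgufModelBuilder are private. It only illustrates how a consuming builder (`self -> Self`) could start from these defaults and let later calls override them.

```rust
// Hypothetical stand-in for illustration; GgufModelBuilder's real fields are private.
#[derive(Debug, Clone, PartialEq)]
enum SketchTokenSource {
    // Mirrors the documented default: read the token from the local cache.
    CacheToken,
    // Assumed alternative, included only so the default can be overridden.
    Literal(String),
}

#[derive(Debug, Clone, PartialEq)]
struct SketchBuilder {
    model_id: String,
    files: Vec<String>,
    token_source: SketchTokenSource,
    max_num_seqs: usize,
    prefix_cache_n: Option<usize>,
}

impl SketchBuilder {
    // Same argument shape as the documented `new`; the defaults follow the list above.
    fn new(model_id: impl ToString, files: Vec<impl ToString>) -> Self {
        Self {
            model_id: model_id.to_string(),
            files: files.iter().map(|f| f.to_string()).collect(),
            token_source: SketchTokenSource::CacheToken,
            max_num_seqs: 32,
            prefix_cache_n: Some(16),
        }
    }

    // Each setter takes `self` by value and returns it, so calls can be chained.
    fn with_token_source(mut self, token_source: SketchTokenSource) -> Self {
        self.token_source = token_source;
        self
    }

    fn with_max_num_seqs(mut self, max_num_seqs: usize) -> Self {
        self.max_num_seqs = max_num_seqs;
        self
    }

    // `None` stands for "prefix cacher disabled".
    fn with_prefix_cache_n(mut self, n_seqs: Option<usize>) -> Self {
        self.prefix_cache_n = n_seqs;
        self
    }
}
```

With this shape, `SketchBuilder::new("org/model", vec!["model.gguf"]).with_max_num_seqs(8)` keeps every default except the one that was explicitly overridden.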
pub fn with_tok_model_id(self, tok_model_id: impl ToString) -> Self
Source the tokenizer and chat template from this model ID (must contain tokenizer.json and tokenizer_config.json).
pub fn with_prompt_batchsize(self, prompt_batchsize: NonZeroUsize) -> Self
Set the prompt batchsize to use for inference.
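Because the parameter is a `NonZeroUsize`, a batch size of zero is ruled out at the type level. A minimal sketch of producing one from user input with only the standard library follows. `parse_batchsize` is a hypothetical helper name, not part of the crate.

```rust
use std::num::NonZeroUsize;

// Hypothetical helper: turn raw input into a value that with_prompt_batchsize could accept.
fn parse_batchsize(raw: &str) -> Option<NonZeroUsize> {
    // `parse` fails on non-numeric input; `NonZeroUsize::new` returns None for 0.
    raw.trim().parse::<usize>().ok().and_then(NonZeroUsize::new)
}
```

The caller then decides what a `None` means, for example falling back to whatever default the library applies by not calling with_prompt_batchsize at all.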
pub fn with_topology(self, topology: Topology) -> Self
Set the model topology for use during loading. If there is an overlap, the topology type is used over the ISQ type.
pub fn with_chat_template(self, chat_template: impl ToString) -> Self
A literal Jinja chat template, or a path (ending in .json) to one.
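One way to read the "literal or path" rule is as a suffix check on the supplied string. The sketch below is an assumption about how such a value could be classified, not the crate's actual logic. `ChatTemplateSource` and `classify_chat_template` are hypothetical names.

```rust
use std::path::PathBuf;

// Hypothetical classification of the single string argument.
#[derive(Debug, PartialEq)]
enum ChatTemplateSource {
    // The string names a .json file that holds the template.
    JsonPath(PathBuf),
    // The string is itself the Jinja template text.
    Literal(String),
}

// Assumption: a value ending in ".json" is treated as a path, anything else as a literal.
fn classify_chat_template(value: &str) -> ChatTemplateSource {
    if value.ends_with(".json") {
        ChatTemplateSource::JsonPath(PathBuf::from(value))
    } else {
        ChatTemplateSource::Literal(value.to_string())
    }
}
```

Under this reading, a literal template that happens to end in `.json` would be mistaken for a path, so it is worth checking the value being passed.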
pub fn with_tokenizer_json(self, tokenizer_json: impl ToString) -> Self
Path to a discrete tokenizer.json file.
pub fn with_force_cpu(self) -> Self
Force usage of the CPU device. Do not use PagedAttention with this.
pub fn with_token_source(self, token_source: TokenSource) -> Self
Source of the Hugging Face token.
pub fn with_hf_revision(self, revision: impl ToString) -> Self
Set the revision to use for a Hugging Face remote model.
pub fn with_paged_attn(
    self,
    paged_attn_cfg: impl FnOnce() -> Result<PagedAttentionConfig>,
) -> Result<Self>
Enable PagedAttention. Configure PagedAttention with a PagedAttentionConfig object, which can be created with sensible values with a PagedAttentionMetaBuilder.
If PagedAttention is not supported (query with paged_attn_supported), this will do nothing.
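The signature takes a closure that returns a `Result` rather than a ready-made config, which fits the "do nothing when unsupported" behavior. The sketch below shows that shape with stand-in types. `PagedAttnSketch`, `SketchPagedAttnConfig`, the `supported` flag, and the `String` error type are all hypothetical. It also assumes the closure is evaluated only when PagedAttention is supported, which the documentation above does not state explicitly.

```rust
// Hypothetical stand-ins; these are not the crate's types.
#[derive(Debug, Clone, PartialEq)]
struct SketchPagedAttnConfig {
    block_size: usize,
}

#[derive(Debug, Default)]
struct PagedAttnSketch {
    paged_attn_cfg: Option<SketchPagedAttnConfig>,
}

impl PagedAttnSketch {
    // `supported` stands in for a platform query such as paged_attn_supported().
    fn with_paged_attn(
        mut self,
        supported: bool,
        paged_attn_cfg: impl FnOnce() -> Result<SketchPagedAttnConfig, String>,
    ) -> Result<Self, String> {
        if supported {
            // The closure runs only on this branch, so an unsupported platform
            // never builds the config and cannot fail while doing so.
            self.paged_attn_cfg = Some(paged_attn_cfg()?);
        }
        // Unsupported: return the builder unchanged.
        Ok(self)
    }
}
```

Returning `Result<Self>` means the call site needs a `?` (or explicit error handling) in the middle of the builder chain, before the next `with_*` call.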
pub fn with_max_num_seqs(self, max_num_seqs: usize) -> Self
Set the maximum number of sequences which can be run at once.
pub fn with_no_kv_cache(self) -> Self
Disable KV cache. Trade performance for memory usage.
pub fn with_prefix_cache_n(self, n_seqs: Option<usize>) -> Self
Set the number of sequences to hold in the prefix cache. Set to None to disable the prefix cacher.
pub fn with_logging(self) -> Self
Enable logging.
pub fn with_device_mapping(self, device_mapping: DeviceMapMetadata) -> Self
Provide metadata to initialize the device mapper. Generally, it is more programmatic and easier to use the Topology instead; see Self::with_topology.
pub async fn build(self) -> Result<Model>
Auto Trait Implementations
impl Freeze for GgufModelBuilder
impl RefUnwindSafe for GgufModelBuilder
impl Send for GgufModelBuilder
impl Sync for GgufModelBuilder
impl Unpin for GgufModelBuilder
impl UnwindSafe for GgufModelBuilder
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Downcast for T where T: AsAny + ?Sized
fn downcast_ref<T>(&self) -> Option<&T> where T: AsAny
fn downcast_mut<T>(&mut self) -> Option<&mut T> where T: AsAny
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.