Crate mistralrs_core

Re-exports§

pub use llguidance;

Modules§

distributed
layers

Structs§

AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
CalledFunction
ChatCompletionChunkResponse
Chat completion streaming request chunk.
ChatCompletionResponse
An OpenAI compatible chat completion response.
ChatTemplate
Template for chat models including bos/eos/unk as well as the chat template.
Choice
Chat completion choice.
ChunkChoice
Chat completion streaming chunk choice.
CompletionChoice
Completion request choice.
CompletionChunkChoice
Completion streaming chunk choice.
CompletionChunkResponse
Completion streaming response chunk.
CompletionResponse
An OpenAI compatible completion response.
Delta
Delta in content for streaming response.
DetokenizationRequest
Request to detokenize some text.
DeviceLayerMapMetadata
DeviceMapMetadata
Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader
A loader for a diffusion (non-quantized) model.
DiffusionLoaderBuilder
A builder for a loader for a diffusion (non-quantized) model.
DiffusionSpecificConfig
Config specific to loading a diffusion model.
DrySamplingParams
Function
GGMLLoader
A loader for a GGML model.
GGMLLoaderBuilder
A builder for a GGML loader.
GGMLSpecificConfig
Config for a GGML loader.
GGUFLoader
Loader for a GGUF model.
GGUFLoaderBuilder
A builder for a GGUF loader.
GGUFSpecificConfig
Config for a GGUF loader.
GemmaLoader
NormalLoader for a Gemma model.
Idefics2Loader
VisionLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
LLaVALoader
VisionLoader for an LLaVA Vision model.
LLaVANextLoader
VisionLoader for an LLaVANext Vision model.
LayerDeviceMapper
A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader
NormalLoader for a Llama model.
LoaderBuilder
A builder for a loader using the selected model.
LocalModelPaths
All local paths and metadata necessary to load a model.
Logprobs
Logprobs per token.
MemoryUsage
MistralLoader
MistralRs
The MistralRs struct handles sending requests to the engine. It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the engine.
MistralRsBuilder
The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
MistralRsConfig
MixtralLoader
NormalLoader
A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder
A builder for a loader for a “normal” (non-quantized) model.
NormalRequest
A normal request to the MistralRs.
NormalSpecificConfig
Config specific to loading a normal model.
Ordering
Adapter model ordering information.
PagedAttentionConfig
All memory counts in MB. Default for block size is 32.
Phi2Loader
NormalLoader for a Phi 2 model.
Phi3Loader
NormalLoader for a Phi 3 model.
Phi3VLoader
VisionLoader for a Phi 3 Vision model.
Qwen2Loader
NormalLoader for a Qwen 2 model.
ResponseLogprob
A logprob with the top logprobs for this token.
ResponseMessage
Chat completion response message.
SamplingParams
Sampling params are used to control sampling.
SpeculativeConfig
Metadata for a speculative pipeline.
SpeculativeLoader
A loader for a speculative pipeline using 2 Loaders.
SpeculativePipeline
Speculative decoding pipeline: https://arxiv.org/pdf/2211.17192
Starcoder2Loader
NormalLoader for a Starcoder2 model.
TokenizationRequest
Request to tokenize some messages or some text.
Tool
ToolCallResponse
TopLogprob
Top-n logprobs element.
Topology
Usage
OpenAI compatible (superset) usage during a request.
VisionLoader
A loader for a vision (non-quantized) model.
VisionLoaderBuilder
A builder for a loader for a vision (non-quantized) model.
VisionSpecificConfig
Config specific to loading a vision model.
WebSearchOptions
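
The MistralRs and MistralRsBuilder entries above describe an engine that runs on its own thread and is reached through mpsc Sender and Receiver primitives. The following is a minimal sketch of that pattern using only the standard library. `SketchRequest`, `spawn_engine`, and `ask` are hypothetical names invented for illustration; they are not the crate's types or methods, and the real crate may use a different channel implementation.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for a request; NOT a mistralrs_core type.
struct SketchRequest {
    prompt: String,
    // Each request carries its own Sender so the engine can reply to the caller.
    response: Sender<String>,
}

/// Spawn a toy "engine" thread and return the handle used to submit requests.
fn spawn_engine() -> Sender<SketchRequest> {
    let (tx, rx) = channel::<SketchRequest>();
    thread::spawn(move || {
        // The loop ends once every request Sender has been dropped.
        for req in rx {
            let reply = format!("echo: {}", req.prompt);
            // Ignore the error if the caller stopped listening.
            let _ = req.response.send(reply);
        }
    });
    tx
}

/// Send one prompt and block until the engine thread replies.
fn ask(engine: &Sender<SketchRequest>, prompt: &str) -> Option<String> {
    let (resp_tx, resp_rx) = channel();
    engine
        .send(SketchRequest {
            prompt: prompt.to_string(),
            response: resp_tx,
        })
        .ok()?;
    resp_rx.recv().ok()
}
```

Calling `spawn_engine()` once and then `ask(&engine, "hi")` from the calling thread mirrors the split described above: the handle stays with the caller while the work happens on the engine thread.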

Enums§

AnyMoeExpertType
AutoDeviceMapParams
BertEmbeddingModel
Embedding model used for ranking web search results internally.
Constraint
Control the constraint with llguidance.
DefaultSchedulerMethod
The scheduler method controls how sequences are scheduled during each step of the engine. At each scheduling step, the scheduler method is consulted only when there are both running and waiting sequences; it is skipped when the sequences are all running, all waiting, or absent. When it is consulted, it decides whether waiting sequences are allowed to run.
DeviceMapSetting
DiffusionLoaderType
The architecture to load the diffusion model as.
EngineInstruction
GGUFArchitecture
ImageGenerationResponseFormat
Image generation response format.
IsqOrganization
IsqType
MemoryGpuConfig
MistralRsError
ModelCategory
Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
ModelDType
DType for the model.
ModelKind
The kind of model to build.
ModelSelected
NormalLoaderType
The architecture to load the normal model as.
Request
A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage
Message or messages for a Request.
Response
The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
StopTokens
Stop sequences or ids.
TokenSource
The source of the HF token.
ToolCallType
ToolChoice
ToolType
VisionLoaderType
The architecture to load the vision model as.
WebSearchUserLocation
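
The DefaultSchedulerMethod description above states when the scheduler method is consulted: only when running and waiting sequences coexist. A small sketch of that rule follows. Both function names are hypothetical, and the fixed cap on concurrently running sequences is an assumed policy chosen for illustration, not a statement about the crate's scheduler.

```rust
/// True only when running and waiting sequences coexist, which is the one
/// case in which the description says the scheduler method is used.
fn needs_scheduler_method(running: usize, waiting: usize) -> bool {
    running > 0 && waiting > 0
}

/// Assumed policy: allow at most `cap` sequences to run at once.
/// Returns None when the method is not consulted at all, otherwise the
/// number of waiting sequences that may start running in this step.
fn waiting_to_admit(cap: usize, running: usize, waiting: usize) -> Option<usize> {
    if !needs_scheduler_method(running, waiting) {
        return None;
    }
    // Free slots under the cap, never more than the number waiting.
    Some(cap.saturating_sub(running).min(waiting))
}
```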

Constants§

GGUF_MULTI_FILE_DELIMITER
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT

Statics§

ENGINE_INSTRUCTIONS
Engine instructions, per Engine (MistralRs) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP
Terminate all sequences on the next scheduling step. Be sure to reset this.
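
The reminder to reset TERMINATE_ALL_NEXT_STEP matters because a global flag that stays set would also cancel every later step. The sketch below shows the set, observe, reset cycle with a standard-library `AtomicBool`. `SKETCH_TERMINATE`, `step`, and `terminate_all_once` are hypothetical names, and the assumption that the real static is an atomic boolean is not confirmed by this page.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for a global "terminate everything" flag.
static SKETCH_TERMINATE: AtomicBool = AtomicBool::new(false);

/// One toy scheduling step: returns how many sequences survive it.
fn step(active: usize) -> usize {
    if SKETCH_TERMINATE.load(Ordering::SeqCst) {
        0 // flag set: every sequence is dropped in this step
    } else {
        active
    }
}

/// Set the flag, let exactly one step observe it, then reset it.
fn terminate_all_once(active: usize) -> usize {
    SKETCH_TERMINATE.store(true, Ordering::SeqCst);
    let remaining = step(active);
    // Reset so that later steps are not terminated as well.
    SKETCH_TERMINATE.store(false, Ordering::SeqCst);
    remaining
}
```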

Traits§

CustomLogitsProcessor
Customizable logits processor.
Loader
The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths
ModelPaths abstracts the mechanism to get all necessary files for running a model. For example LocalModelPaths implements ModelPaths when all files are in the local file system.
Pipeline
TryIntoDType
Type which can be converted to a DType.
VisionPromptPrefixer
Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed to start at
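
The ModelPaths entry above describes a trait that hides where model files come from, with LocalModelPaths covering the case where everything is on the local file system. The sketch below illustrates that kind of abstraction. The trait `SketchModelPaths`, its two methods, and the struct `SketchLocalPaths` are invented for illustration and do not reproduce the crate's actual trait signature.

```rust
use std::path::{Path, PathBuf};

// Hypothetical trait; the real ModelPaths trait has its own method set.
trait SketchModelPaths {
    fn weight_files(&self) -> &[PathBuf];
    fn tokenizer_file(&self) -> &Path;
}

// Hypothetical implementation for files that already exist locally.
struct SketchLocalPaths {
    weights: Vec<PathBuf>,
    tokenizer: PathBuf,
}

impl SketchModelPaths for SketchLocalPaths {
    fn weight_files(&self) -> &[PathBuf] {
        &self.weights
    }
    fn tokenizer_file(&self) -> &Path {
        &self.tokenizer
    }
}

/// Code that consumes the paths only sees the trait, not where files came from.
fn count_files(paths: &dyn SketchModelPaths) -> usize {
    // All weight files plus the single tokenizer file.
    paths.weight_files().len() + 1
}
```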

Functions§

get_auto_device_map_params
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
initialize_logging
This should be called to initialize the debug flag and logging. This should not be called in mistralrs-core code due to Rust usage.
paged_attn_supported
true if built with CUDA (requires Unix) or Metal, false otherwise.
parse_isq_value
Parse ISQ value.
using_flash_attn
true if built with the flash-attn or flash-attn-v3 features, false otherwise.
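
Functions such as paged_attn_supported and using_flash_attn report build-time configuration. One common way to write such checks is the `cfg!` macro, sketched below. The function names are hypothetical, the `cuda` and `metal` feature names are assumptions, and only `flash-attn` and `flash-attn-v3` are named on this page; the crate's real implementation may differ.

```rust
/// True when compiled with an assumed `cuda` feature on a Unix target,
/// or with an assumed `metal` feature.
fn sketch_paged_attn_supported() -> bool {
    cfg!(any(
        all(feature = "cuda", target_family = "unix"),
        feature = "metal"
    ))
}

/// True when compiled with either flash attention feature.
fn sketch_using_flash_attn() -> bool {
    cfg!(any(feature = "flash-attn", feature = "flash-attn-v3"))
}
```

Because `cfg!` is evaluated at compile time, both functions return a constant for a given build, and both return false when none of the features is enabled.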

Type Aliases§

LlguidanceGrammar
MessageContent