Crate mistralrs_core

Re-exports§

pub use llguidance;

Modules§

distributed
layers
matformer
speech_utils

Structs§

AddModelConfig: Configuration for adding a model to MistralRs
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
AudioInput: Raw audio input consisting of PCM samples and a sample rate.
AutoLoader: Automatically selects between a normal or vision loader based on the architectures field.
AutoLoaderBuilder
CalledFunction: Called function with name and arguments
ChatCompletionChunkResponse: Chat completion streaming request chunk.
ChatCompletionResponse: An OpenAI compatible chat completion response.
ChatTemplate: Template for chat models including bos/eos/unk as well as the chat template.
Choice: Chat completion choice.
ChunkChoice: Chat completion streaming chunk choice.
CompletionChoice: Completion request choice.
CompletionChunkChoice: Completion streaming chunk choice.
CompletionChunkResponse: Completion streaming chunk response.
CompletionResponse: An OpenAI compatible completion response.
Delta: Delta in content for streaming response.
DetokenizationRequest: Request to detokenize some tokens back into text.
DeviceLayerMapMetadata
DeviceMapMetadata: Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader: A loader for a diffusion (non-quantized) model.
DiffusionLoaderBuilder: A builder for a loader for a diffusion (non-quantized) model.
DrySamplingParams
EngineConfig: Configuration for creating an engine instance
Function: Function definition for a tool
GGMLLoader: A loader for a GGML model.
GGMLLoaderBuilder: A builder for a GGML loader.
GGMLSpecificConfig: Config for a GGML loader.
GGUFLoader: Loader for a GGUF model.
GGUFLoaderBuilder: A builder for a GGUF loader.
GGUFSpecificConfig: Config for a GGUF loader.
GemmaLoader: NormalLoader for a Gemma model.
Idefics2Loader: VisionLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
LLaVALoader: VisionLoader for an LLaVA Vision model.
LLaVANextLoader: VisionLoader for an LLaVANext Vision model.
LayerDeviceMapper: A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader: NormalLoader for a Llama model.
LoaderBuilder: A builder for a loader using the selected model.
LocalModelPaths: All local paths and metadata necessary to load a model.
Logprobs: Logprobs per token.
LoraAdapterPaths
McpClient: MCP client that manages connections to multiple MCP servers
McpClientConfig: Configuration for MCP client integration
McpServerConfig: Configuration for an individual MCP server
McpToolInfo: Information about a tool discovered from an MCP server
MemoryUsage
MistralLoader
MistralRs: The MistralRs struct handles sending requests to multiple engines. It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
MistralRsBuilder: The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
MistralRsConfig
MixtralLoader
Modalities
NormalLoader: A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder: A builder for a loader for a “normal” (non-quantized) model.
NormalRequest: A normal request to the MistralRs.
NormalSpecificConfig: Config specific to loading a normal model.
Ordering: Adapter model ordering information.
PagedAttentionConfig: All memory counts in MB. Default for block size is 32.
Phi2Loader: NormalLoader for a Phi 2 model.
Phi3Loader: NormalLoader for a Phi 3 model.
Phi3VLoader: VisionLoader for a Phi 3 Vision model.
Qwen2Loader: NormalLoader for a Qwen 2 model.
ResponseLogprob: A logprob with the top logprobs for this token.
ResponseMessage: Chat completion response message.
SamplingParams: Sampling params are used to control sampling.
SearchFunctionParameters
SearchResult
SpeculativeConfig: Metadata for a speculative pipeline
SpeculativeLoader: A loader for a speculative pipeline using 2 Loaders.
SpeculativePipeline: Speculative decoding pipeline: https://arxiv.org/pdf/2211.17192
SpeechLoader
SpeechPipeline
Starcoder2Loader: NormalLoader for a Starcoder2 model.
TokenizationRequest: Request to tokenize some messages or some text.
Tool: Tool definition
ToolCallResponse
ToolCallbackWithTool: A tool callback with its associated Tool definition.
TopLogprob: Top-n logprobs element
Topology
Usage: OpenAI compatible (superset) usage during a request.
VisionLoader: A loader for a vision (non-quantized) model.
VisionLoaderBuilder: A builder for a loader for a vision (non-quantized) model.
VisionSpecificConfig: Config specific to loading a vision model.
WebSearchOptions
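
The MistralRs entry above describes routing requests to engines by model ID over mpsc channels, with each engine on its own thread. The following is a minimal sketch of that general pattern only, not the crate's implementation: `DemoRequest`, `DemoRouter`, `add_engine`, and `send` are hypothetical names, and `std::sync::mpsc` is an assumption (the crate may use a different channel type).

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical request: a prompt plus the Sender used to return the reply.
struct DemoRequest {
    prompt: String,
    respond_to: Sender<String>,
}

// Hypothetical router holding one request Sender per model ID.
struct DemoRouter {
    engines: HashMap<String, Sender<DemoRequest>>,
}

impl DemoRouter {
    fn new() -> Self {
        Self { engines: HashMap::new() }
    }

    // Spawn a worker thread standing in for an engine and register its Sender.
    fn add_engine(&mut self, model_id: &str) {
        let (tx, rx) = channel::<DemoRequest>();
        let id = model_id.to_string();
        thread::spawn(move || {
            // The loop ends once every Sender for this engine has been dropped.
            for req in rx {
                let reply = format!("[{}] {}", id, req.prompt);
                let _ = req.respond_to.send(reply);
            }
        });
        self.engines.insert(model_id.to_string(), tx);
    }

    // Route a prompt to the engine for `model_id` and block until it replies.
    // Returns None if no engine is registered under that ID.
    fn send(&self, model_id: &str, prompt: &str) -> Option<String> {
        let engine = self.engines.get(model_id)?;
        let (tx, rx) = channel();
        engine
            .send(DemoRequest { prompt: prompt.to_string(), respond_to: tx })
            .ok()?;
        rx.recv().ok()
    }
}
```

Carrying the response Sender inside each request keeps the router stateless per call: the engine thread needs no table of callers, it simply replies on whatever channel arrived with the request.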

Enums§

AdapterPaths
AnyMoeExpertType
AutoDeviceMapParams
BertEmbeddingModel: Embedding model used for ranking web search results internally.
Constraint: Control the constraint with llguidance.
DefaultSchedulerMethod: The scheduler method controls how sequences are scheduled during each step of the engine. At each scheduling step, the scheduler method is used unless there are only running sequences, only waiting sequences, or none at all. When it is used, it allows waiting sequences to run.
DeviceMapSetting
DiffusionLoaderType: The architecture to load the diffusion model as.
EngineInstruction
GGUFArchitecture
ImageGenerationResponseFormat: Image generation response format
IsqOrganization
IsqType
McpServerSource: Supported MCP server transport sources
MemoryGpuConfig
MistralRsError
ModelCategory: Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
ModelDType: DType for the model.
ModelKind: The kind of model to build.
ModelSelected
NormalLoaderType: The architecture to load the normal model as.
PagedCacheType
Request: A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage: Message or messages for a Request.
Response: The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
SearchContextSize
SpeechGenerationConfig
SpeechLoaderType
StopTokens: Stop sequences or ids.
SupportedModality
TokenSource: The source of the HF token.
ToolCallType
ToolChoice
ToolType: Type of tool
VisionLoaderType: The architecture to load the vision model as.
WebSearchUserLocation

Constants§

GGUF_MULTI_FILE_DELIMITER
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT
UQFF_MULTI_FILE_DELIMITER

Statics§

ENGINE_INSTRUCTIONS: Engine instructions, per Engine (MistralRs) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP: Terminate all sequences on the next scheduling step. Be sure to reset this. This is a global flag for terminating all engines at once (e.g., Ctrl+C).
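
The note on TERMINATE_ALL_NEXT_STEP says the flag must be reset after use. A minimal sketch of one way to honor that with an atomic flag follows; `DEMO_TERMINATE` and `run_steps` are hypothetical stand-ins, and the actual type and handling of the crate's static are not shown in this listing.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for a global "terminate on the next step" flag.
static DEMO_TERMINATE: AtomicBool = AtomicBool::new(false);

// Runs up to `max_steps` scheduling steps and returns how many completed.
fn run_steps(max_steps: usize) -> usize {
    let mut done = 0;
    for _ in 0..max_steps {
        // `swap` reads the flag and resets it in a single atomic operation,
        // so a later run is not terminated by a stale request.
        if DEMO_TERMINATE.swap(false, Ordering::SeqCst) {
            break;
        }
        done += 1;
    }
    done
}
```

A signal handler (for example on Ctrl+C) would call `DEMO_TERMINATE.store(true, Ordering::SeqCst)`; the next step observes the flag, stops, and leaves it cleared.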

Traits§

CustomLogitsProcessor: Customizable logits processor.
Loader: The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths: ModelPaths abstracts the mechanism to get all necessary files for running a model. For example LocalModelPaths implements ModelPaths when all files are in the local file system.
MultimodalPromptPrefixer: Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed to start at 0.
Pipeline
TryIntoDType: Type which can be converted to a DType

Functions§

get_auto_device_map_params
get_engine_terminate_flag: Get or create a termination flag for the current engine thread.
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
initialize_logging: This should be called to initialize the debug flag and logging. This should not be called in mistralrs-core code due to Rust usage.
paged_attn_supported: true if built with CUDA (requires Unix) or Metal, false otherwise.
parse_isq_value: Parse ISQ value.
reset_engine_terminate_flag: Reset termination flags for the current engine.
should_terminate_engine_sequences: Check if the current engine should terminate sequences.
using_flash_attn: true if built with the flash-attn or flash-attn-v3 features, false otherwise.

Type Aliases§

LlguidanceGrammar
MessageContent
SearchCallback: Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback: Callback used for custom tool functions. Receives the called function (name and JSON arguments) and returns the tool output as a string.
ToolCallbacks: Collection of callbacks keyed by tool name.
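
The ToolCallback and ToolCallbacks entries describe callbacks that receive a called function (name and JSON arguments), return the tool output as a string, and are collected in a map keyed by tool name. The sketch below illustrates that shape under stated assumptions: every `Demo*` name, the `dispatch` helper, and the `Result<String, String>` return type are hypothetical, and the crate's real alias definitions may differ.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical mirror of a called function: tool name plus JSON-encoded arguments.
struct DemoCalledFunction {
    name: String,
    arguments: String,
}

// Hypothetical callback type: takes the called function, returns the tool output.
type DemoToolCallback =
    Arc<dyn Fn(&DemoCalledFunction) -> Result<String, String> + Send + Sync>;

// Hypothetical collection of callbacks keyed by tool name.
type DemoToolCallbacks = HashMap<String, DemoToolCallback>;

// Look up the callback by the called function's name and invoke it.
fn dispatch(
    callbacks: &DemoToolCallbacks,
    call: &DemoCalledFunction,
) -> Result<String, String> {
    match callbacks.get(&call.name) {
        Some(cb) => cb(call),
        None => Err(format!("no callback registered for tool `{}`", call.name)),
    }
}

// Build a collection with a single "echo" tool that returns its raw arguments.
fn demo_callbacks() -> DemoToolCallbacks {
    let mut callbacks: DemoToolCallbacks = HashMap::new();
    callbacks.insert(
        "echo".to_string(),
        Arc::new(|call: &DemoCalledFunction| -> Result<String, String> {
            Ok(format!("echo: {}", call.arguments))
        }),
    );
    callbacks
}
```

Keying by tool name means the dispatcher needs only the name reported in the model's tool call to find the handler; an unknown name becomes an error rather than a panic.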
