Crate mistralrs_core
pub use llguidance;
distributed
layers
speech_utils
AddModelConfig Configuration for adding a model to MistralRs
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
AudioInput Raw audio input consisting of PCM samples and a sample rate.
AutoLoader Automatically selects between a normal or vision loader based on the architectures field.
AutoLoaderBuilder
CalledFunction Called function with name and arguments
ChatCompletionChunkResponse Chat completion streaming request chunk.
ChatCompletionResponse An OpenAI compatible chat completion response.
ChatTemplate Template for chat models including bos/eos/unk as well as the chat template.
Choice Chat completion choice.
ChunkChoice Chat completion streaming chunk choice.
CompletionChoice Completion request choice.
CompletionChunkChoice Completion streaming chunk choice.
CompletionChunkResponse Completion streaming response chunk.
CompletionResponse An OpenAI compatible completion response.
Delta Delta in content for streaming response.
DetokenizationRequest Request to detokenize some tokens.
DeviceLayerMapMetadata
DeviceMapMetadata Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader A loader for a diffusion (non-quantized) model.
DiffusionLoaderBuilder A builder for a loader for a diffusion (non-quantized) model.
DrySamplingParams
EngineConfig Configuration for creating an engine instance
Function Function definition for a tool
GGMLLoader A loader for a GGML model.
GGMLLoaderBuilder A builder for a GGML loader.
GGMLSpecificConfig Config for a GGML loader.
GGUFLoader Loader for a GGUF model.
GGUFLoaderBuilder A builder for a GGUF loader.
GGUFSpecificConfig Config for a GGUF loader.
GemmaLoader NormalLoader for a Gemma model.
Idefics2Loader VisionLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
LLaVALoader VisionLoader for an LLaVA Vision model.
LLaVANextLoader VisionLoader for an LLaVANext Vision model.
LayerDeviceMapper A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader NormalLoader for a Llama model.
LoaderBuilder A builder for a loader using the selected model.
LocalModelPaths All local paths and metadata necessary to load a model.
Logprobs Logprobs per token.
LoraAdapterPaths
McpClient MCP client that manages connections to multiple MCP servers
McpClientConfig Configuration for MCP client integration
McpServerConfig Configuration for an individual MCP server
McpToolInfo Information about a tool discovered from an MCP server
MemoryUsage
MistralLoader
MistralRs The MistralRs struct handles sending requests to multiple engines.
It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
MistralRsBuilder The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
MistralRsConfig
MixtralLoader
Modalities
NormalLoader A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder A builder for a loader for a “normal” (non-quantized) model.
NormalRequest A normal request to the MistralRs.
NormalSpecificConfig Config specific to loading a normal model.
Ordering Adapter model ordering information.
PagedAttentionConfig All memory counts in MB. Default for block size is 32.
Phi2Loader NormalLoader for a Phi 2 model.
Phi3Loader NormalLoader for a Phi 3 model.
Phi3VLoader VisionLoader for a Phi 3 Vision model.
Qwen2Loader NormalLoader for a Qwen 2 model.
ResponseLogprob A logprob with the top logprobs for this token.
ResponseMessage Chat completion response message.
SamplingParams Sampling params are used to control sampling.
SearchFunctionParameters
SearchResult
SpeculativeConfig Metadata for a speculative pipeline
SpeculativeLoader A loader for a speculative pipeline using 2 Loaders.
SpeculativePipeline Speculative decoding pipeline: https://arxiv.org/pdf/2211.17192
SpeechLoader
SpeechPipeline
Starcoder2Loader NormalLoader for a Starcoder2 model.
TokenizationRequest Request to tokenize some messages or some text.
Tool Tool definition
ToolCallResponse
ToolCallbackWithTool A tool callback with its associated Tool definition.
TopLogprob Top-n logprobs element
Topology
Usage OpenAI compatible (superset) usage during a request.
VisionLoader A loader for a vision (non-quantized) model.
VisionLoaderBuilder A builder for a loader for a vision (non-quantized) model.
VisionSpecificConfig Config specific to loading a vision model.
WebSearchOptions
AdapterPaths
AnyMoeExpertType
AutoDeviceMapParams
BertEmbeddingModel Embedding model used for ranking web search results internally.
Constraint Control the constraint with llguidance.
DefaultSchedulerMethod The scheduler method controls how sequences are scheduled during each step of the engine. For each scheduling step, the scheduler method is used unless there are only running sequences, only waiting sequences, or none. If it is used, then it is used to allow waiting sequences to run.
DeviceMapSetting
DiffusionLoaderType The architecture to load the diffusion model as.
EngineInstruction
GGUFArchitecture
ImageGenerationResponseFormat Image generation response format
IsqOrganization
IsqType
McpServerSource Supported MCP server transport sources
MemoryGpuConfig
MistralRsError
ModelCategory Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
ModelDType DType for the model.
ModelKind The kind of model to build.
ModelSelected
NormalLoaderType The architecture to load the normal model as.
PagedCacheType
Request A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage Message or messages for a Request.
Response The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
SearchContextSize
SpeechGenerationConfig
SpeechLoaderType
StopTokens Stop sequences or ids.
SupportedModality
TokenSource The source of the HF token.
ToolCallType
ToolChoice
ToolType Type of tool
VisionLoaderType The architecture to load the vision model as.
WebSearchUserLocation
GGUF_MULTI_FILE_DELIMITER
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT
UQFF_MULTI_FILE_DELIMITER
ENGINE_INSTRUCTIONS Engine instructions, per Engine (MistralRs) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP Terminate all sequences on the next scheduling step. Be sure to reset this.
This is a global flag for terminating all engines at once (e.g., Ctrl+C).
CustomLogitsProcessor Customizable logits processor.
Loader The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths ModelPaths abstracts the mechanism to get all necessary files for running a model. For example, LocalModelPaths implements ModelPaths when all files are in the local file system.
MultimodalPromptPrefixer Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed to start at 0.
Pipeline
TryIntoDType Type which can be converted to a DType
get_auto_device_map_params
get_engine_terminate_flag Get or create a termination flag for the current engine thread.
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
initialize_logging This should be called to initialize the debug flag and logging.
This should not be called in mistralrs-core code due to Rust usage.
paged_attn_supported true if built with CUDA (requires Unix) or Metal.
parse_isq_value Parse ISQ value.
reset_engine_terminate_flag Reset termination flags for the current engine.
should_terminate_engine_sequences Check if the current engine should terminate sequences.
using_flash_attn true if built with the flash-attn or flash-attn-v3 features, false otherwise.
LlguidanceGrammar
MessageContent
SearchCallback Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback Callback used for custom tool functions. Receives the called function (name and JSON arguments) and returns the tool output as a string.
ToolCallbacks Collection of callbacks keyed by tool name.
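
The ToolCallback and ToolCallbacks entries describe a callback that receives a called function (name and JSON arguments) and a collection of such callbacks keyed by tool name. The following is a minimal, std-only sketch of that shape. It does not use the crate's real definitions: CalledFunctionSketch, ToolCallbackSketch, ToolCallbacksSketch, dispatch, echo, and example_registry are all hypothetical stand-ins, and the String error type is an assumption made to keep the example dependency-free.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a called function: a tool name plus JSON-encoded arguments.
#[derive(Debug, Clone)]
pub struct CalledFunctionSketch {
    pub name: String,
    /// JSON text; this sketch passes it through without parsing it.
    pub arguments: String,
}

/// Hypothetical stand-in for a tool callback: receives the called function
/// and returns the tool output as a string (or an error message).
pub type ToolCallbackSketch =
    dyn Fn(&CalledFunctionSketch) -> Result<String, String> + Send + Sync;

/// Hypothetical stand-in for the collection of callbacks keyed by tool name.
pub type ToolCallbacksSketch = HashMap<String, Arc<ToolCallbackSketch>>;

/// Look up the callback registered under `call.name` and run it.
pub fn dispatch(
    callbacks: &ToolCallbacksSketch,
    call: &CalledFunctionSketch,
) -> Result<String, String> {
    match callbacks.get(&call.name) {
        Some(callback) => callback(call),
        None => Err(format!("no callback registered for tool `{}`", call.name)),
    }
}

/// Illustrative tool: returns its JSON arguments unchanged.
fn echo(call: &CalledFunctionSketch) -> Result<String, String> {
    Ok(call.arguments.clone())
}

/// Build a registry containing only the illustrative `echo` tool.
pub fn example_registry() -> ToolCallbacksSketch {
    let mut callbacks: ToolCallbacksSketch = HashMap::new();
    let callback: Arc<ToolCallbackSketch> = Arc::new(echo);
    callbacks.insert("echo".to_string(), callback);
    callbacks
}
```

Calling dispatch with a CalledFunctionSketch named "echo" runs the registered callback and returns its arguments; any other name produces an error instead of a panic. Keying by tool name keeps the lookup to a single map access, and wrapping each callback in Arc lets the registry be cloned or shared across threads cheaply.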