Crate mistralrs_core

Re-exports
pub use llguidance;

Modules
distributed
layers
matformer
speech_utils

Structs
AddModelConfig: Configuration for adding a model to MistralRs.
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
AudioInput: Raw audio input consisting of PCM samples and a sample rate.
AutoLoader: Automatically selects between a normal or vision loader based on the architectures field.
AutoLoaderBuilder
CalledFunction: Called function with name and arguments.
ChatCompletionChunkResponse: Chat completion streaming request chunk.
ChatCompletionResponse: An OpenAI compatible chat completion response.
ChatTemplate: Template for chat models including bos/eos/unk as well as the chat template.
Choice: Chat completion choice.
ChunkChoice: Chat completion streaming chunk choice.
CompletionChoice: Completion request choice.
CompletionChunkChoice: Chat completion streaming chunk choice.
CompletionChunkResponse: Completion request choice.
CompletionResponse: An OpenAI compatible completion response.
Delta: Delta in content for streaming response.
DetokenizationRequest: Request to detokenize some text.
DeviceLayerMapMetadata
DeviceMapMetadata: Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader: A loader for a vision (non-quantized) model.
DiffusionLoaderBuilder: A builder for a loader for a vision (non-quantized) model.
DrySamplingParams
EmbeddingLoader: A loader for a vision (non-quantized) model.
EmbeddingLoaderBuilder: A builder for a loader for a vision (non-quantized) model.
EmbeddingModelPaths: All local paths and metadata necessary to load an embedding model.
EmbeddingSpecificConfig: Config specific to loading a vision model.
EngineConfig: Configuration for creating an engine instance.
Function: Function definition for a tool.
GGMLLoader: A loader for a GGML model.
GGMLLoaderBuilder: A builder for a GGML loader.
GGMLSpecificConfig: Config for a GGML loader.
GGUFLoader: Loader for a GGUF model.
GGUFLoaderBuilder: A builder for a GGUF loader.
GGUFSpecificConfig: Config for a GGUF loader.
GemmaLoader: NormalLoader for a Gemma model.
Idefics2Loader: VisionLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
LLaVALoader: VisionLoader for an LLaVA Vision model.
LLaVANextLoader: VisionLoader for an LLaVANext Vision model.
LayerDeviceMapper: A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader: NormalLoader for a Llama model.
LoaderBuilder: A builder for a loader using the selected model.
LocalModelPaths: All local paths and metadata necessary to load a model.
Logprobs: Logprobs per token.
LoraAdapterPaths
McpClient: MCP client that manages connections to multiple MCP servers.
McpClientConfig: Configuration for MCP client integration.
McpServerConfig: Configuration for an individual MCP server.
McpToolInfo: Information about a tool discovered from an MCP server.
MemoryUsage
MistralLoader
MistralRs: The MistralRs struct handles sending requests to multiple engines. It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
MistralRsBuilder: The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
MistralRsConfig
MixtralLoader
Modalities
NormalLoader: A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder: A builder for a loader for a “normal” (non-quantized) model.
NormalRequest: A normal request to the MistralRs.
NormalSpecificConfig: Config specific to loading a normal model.
Ordering: Adapter model ordering information.
PagedAttentionConfig: All memory counts in MB. Default for block size is 32.
Phi2Loader: NormalLoader for a Phi 2 model.
Phi3Loader: NormalLoader for a Phi 3 model.
Phi3VLoader: VisionLoader for a Phi 3 Vision model.
Qwen2Loader: NormalLoader for a Qwen 2 model.
ResponseLogprob: A logprob with the top logprobs for this token.
ResponseMessage: Chat completion response message.
SamplingParams: Sampling params are used to control sampling.
SearchFunctionParameters
SearchResult
SpeculativeConfig: Metadata for a speculative pipeline.
SpeculativeLoader: A loader for a speculative pipeline using 2 Loaders.
SpeculativePipeline: Speculative decoding pipeline: https://arxiv.org/pdf/2211.17192
SpeechLoader
SpeechPipeline
Starcoder2Loader: NormalLoader for a Starcoder2 model.
TokenizationRequest: Request to tokenize some messages or some text.
Tool: Tool definition.
ToolCallResponse
ToolCallbackWithTool: A tool callback with its associated Tool definition.
TopLogprob: Top-n logprobs element.
Topology
Usage: OpenAI compatible (superset) usage during a request.
VisionLoader: A loader for a vision (non-quantized) model.
VisionLoaderBuilder: A builder for a loader for a vision (non-quantized) model.
VisionSpecificConfig: Config specific to loading a vision model.
WebSearchOptions

Enums
AdapterPaths
AnyMoeExpertType
AutoDeviceMapParams
BertEmbeddingModel: Embedding model used for ranking web search results internally.
Constraint: Control the constraint with llguidance.
DefaultSchedulerMethod: The scheduler method controls how sequences are scheduled during each
step of the engine. For each scheduling step, the scheduler method is used only when both running and waiting sequences are present (that is, not when there are only running sequences, only waiting sequences, or none). If it is used, then it is used to allow waiting sequences to run.
DeviceMapSetting
DiffusionLoaderType: The architecture to load the vision model as.
EmbeddingLoaderType: The architecture to load the embedding model as.
EngineInstruction
GGUFArchitecture
ImageGenerationResponseFormat: Image generation response format.
IsqOrganization
IsqType
McpServerSource: Supported MCP server transport sources.
MemoryGpuConfig
MistralRsError
ModelCategory: Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
ModelDType: DType for the model.
ModelKind: The kind of model to build.
ModelSelected
NormalLoaderType: The architecture to load the normal model as.
PagedCacheType
Request: A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage: Message or messages for a Request.
Response: The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
SearchContextSize
SpeechGenerationConfig
SpeechLoaderType
StopTokens: Stop sequences or ids.
SupportedModality
TokenSource: The source of the HF token.
ToolCallType
ToolChoice
ToolType: Type of tool.
VisionLoaderType: The architecture to load the vision model as.
WebSearchUserLocation

Constants
GGUF_MULTI_FILE_DELIMITER
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT
UQFF_MULTI_FILE_DELIMITER

Statics
ENGINE_INSTRUCTIONS: Engine instructions, per Engine (MistralRs) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP: Terminate all sequences on the next scheduling step. Be sure to reset this.
This is a global flag for terminating all engines at once (e.g., Ctrl+C).

Traits
CustomLogitsProcessor: Customizable logits processor.
Loader: The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths: ModelPaths abstracts the mechanism to get all necessary files for running a model. For example, LocalModelPaths implements ModelPaths when all files are in the local file system.
MultimodalPromptPrefixer: Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed to start at 0.
Pipeline
TryIntoDType: Type which can be converted to a DType.

Functions
get_auto_device_map_params
get_engine_terminate_flag: Get or create a termination flag for the current engine thread.
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
initialize_logging: This should be called to initialize the debug flag and logging. This should not be called in mistralrs-core code due to Rust usage.
paged_attn_supported: true if built with CUDA (requires Unix) or Metal.
parse_isq_value: Parse ISQ value.
reset_engine_terminate_flag: Reset termination flags for the current engine.
should_terminate_engine_sequences: Check if the current engine should terminate sequences.
using_flash_attn: true if built with the flash-attn or flash-attn-v3 features, false otherwise.

Type Aliases
LlguidanceGrammar
MessageContent
SearchCallback: Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback: Callback used for custom tool functions. Receives the called function (name and JSON arguments) and returns the tool output as a string.
ToolCallbacks: Collection of callbacks keyed by tool name.
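
Example (illustrative sketch only)
The ToolCallback and ToolCallbacks entries describe a callback that receives a called function (name and JSON arguments) and returns a string, stored in a collection keyed by tool name. The sketch below shows roughly how such a registry could be looked up and invoked, using only the standard library. Every name in it (SketchCalledFunction, SketchToolCallback, SketchToolCallbacks, dispatch_tool, echo_tool, example_callbacks) is hypothetical, and the Result<String, String> return shape is an assumption; the crate's real type definitions and signatures may differ, so consult the item pages before relying on any of this.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a called function: the tool name plus its
/// arguments as a raw JSON string. This is NOT the crate's actual type.
#[derive(Debug, Clone)]
pub struct SketchCalledFunction {
    pub name: String,
    pub arguments: String,
}

/// Assumed callback shape: takes the called function and returns the tool
/// output as a string, or an error message.
pub type SketchToolCallback =
    Arc<dyn Fn(&SketchCalledFunction) -> Result<String, String> + Send + Sync>;

/// Assumed collection shape: callbacks keyed by tool name.
pub type SketchToolCallbacks = HashMap<String, SketchToolCallback>;

/// Look up the callback registered under the called function's name and run it.
/// Returns an error if no callback is registered for that name.
pub fn dispatch_tool(
    callbacks: &SketchToolCallbacks,
    called: &SketchCalledFunction,
) -> Result<String, String> {
    match callbacks.get(&called.name) {
        Some(callback) => callback(called),
        None => Err(format!("no callback registered for tool `{}`", called.name)),
    }
}

/// Toy tool: echoes the raw JSON arguments back as its output.
pub fn echo_tool(called: &SketchCalledFunction) -> Result<String, String> {
    Ok(format!("echo: {}", called.arguments))
}

/// Build a registry containing only the toy `echo` tool.
pub fn example_callbacks() -> SketchToolCallbacks {
    let mut callbacks: SketchToolCallbacks = HashMap::new();
    callbacks.insert("echo".to_string(), Arc::new(echo_tool));
    callbacks
}

fn main() {
    let callbacks = example_callbacks();
    let call = SketchCalledFunction {
        name: "echo".to_string(),
        arguments: r#"{"text":"hi"}"#.to_string(),
    };
    // Dispatch by tool name and print whatever the callback returned.
    println!("{:?}", dispatch_tool(&callbacks, &call));
}
```

Keying the map by tool name keeps dispatch to a single lookup, and wrapping each callback in an Arc lets the same registry be shared across threads, which fits the multi-threaded design described for MistralRs; whether the crate makes the same choices is not stated in this listing.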