Crate mistralrs_core

Re-exports
pub use llguidance;

Modules
distributed
layers

Structs
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
CalledFunction
ChatCompletionChunkResponse: Chat completion streaming request chunk.
ChatCompletionResponse: An OpenAI compatible chat completion response.
ChatTemplate: Template for chat models including bos/eos/unk as well as the chat template.
Choice: Chat completion choice.
ChunkChoice: Chat completion streaming chunk choice.
CompletionChoice: Completion request choice.
CompletionChunkChoice: Chat completion streaming chunk choice.
CompletionChunkResponse: Completion request choice.
CompletionResponse: An OpenAI compatible completion response.
Delta: Delta in content for streaming response.
DetokenizationRequest: Request to detokenize some text.
DeviceLayerMapMetadata
DeviceMapMetadata: Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader: A loader for a vision (non-quantized) model.
DiffusionLoaderBuilder: A builder for a loader for a vision (non-quantized) model.
DiffusionSpecificConfig: Config specific to loading a vision model.
DrySamplingParams
Function
GGMLLoader: A loader for a GGML model.
GGMLLoaderBuilder: A builder for a GGML loader.
GGMLSpecificConfig: Config for a GGML loader.
GGUFLoader: Loader for a GGUF model.
GGUFLoaderBuilder: A builder for a GGUF loader.
GGUFSpecificConfig: Config for a GGUF loader.
GemmaLoader: NormalLoader for a Gemma model.
Idefics2Loader: VisionLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
LLaVALoader: VisionLoader for an LLaVA Vision model.
LLaVANextLoader: VisionLoader for an LLaVANext Vision model.
LayerDeviceMapper: A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader: NormalLoader for a Llama model.
LoaderBuilder: A builder for a loader using the selected model.
LocalModelPaths: All local paths and metadata necessary to load a model.
Logprobs: Logprobs per token.
MemoryUsage
MistralLoader
MistralRs: The MistralRs struct handles sending requests to the engine.
It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the engine.
MistralRsBuilder: The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
MistralRsConfig
MixtralLoader
NormalLoader: A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder: A builder for a loader for a “normal” (non-quantized) model.
NormalRequest: A normal request to the MistralRs.
NormalSpecificConfig: Config specific to loading a normal model.
Ordering: Adapter model ordering information.
PagedAttentionConfig: All memory counts in MB. Default for block size is 32.
Phi2Loader: NormalLoader for a Phi 2 model.
Phi3Loader: NormalLoader for a Phi 3 model.
Phi3VLoader: VisionLoader for a Phi 3 Vision model.
Qwen2Loader: NormalLoader for a Qwen 2 model.
ResponseLogprob: A logprob with the top logprobs for this token.
ResponseMessage: Chat completion response message.
SamplingParams: Sampling params are used to control sampling.
SpeculativeConfig: Metadata for a speculative pipeline.
SpeculativeLoader: A loader for a speculative pipeline using 2 Loaders.
SpeculativePipeline: Speculative decoding pipeline: https://arxiv.org/pdf/2211.17192
Starcoder2Loader: NormalLoader for a Starcoder2 model.
TokenizationRequest: Request to tokenize some messages or some text.
Tool
ToolCallResponse
TopLogprob: Top-n logprobs element.
Topology
Usage: OpenAI compatible (superset) usage during a request.
VisionLoader: A loader for a vision (non-quantized) model.
VisionLoaderBuilder: A builder for a loader for a vision (non-quantized) model.
VisionSpecificConfig: Config specific to loading a vision model.
WebSearchOptions

Enums
AnyMoeExpertType
AutoDeviceMapParams
BertEmbeddingModel: Embedding model used for ranking web search results internally.
Constraint: Control the constraint with llguidance.
DefaultSchedulerMethod: The scheduler method controls how sequences are scheduled during each
step of the engine. For each scheduling step, the scheduler method is used if there are not only running, only waiting sequences, or none. If it is used, then it is used to allow waiting sequences to run.
DeviceMapSetting
DiffusionLoaderType: The architecture to load the vision model as.
EngineInstruction
GGUFArchitecture
ImageGenerationResponseFormat: Image generation response format.
IsqOrganization
IsqType
MemoryGpuConfig
MistralRsError
ModelCategory: Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
ModelDType: DType for the model.
ModelKind: The kind of model to build.
ModelSelected
NormalLoaderType: The architecture to load the normal model as.
Request: A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage: Message or messages for a Request.
Response: The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
StopTokens: Stop sequences or ids.
TokenSource: The source of the HF token.
ToolCallType
ToolChoice
ToolType
VisionLoaderType: The architecture to load the vision model as.
WebSearchUserLocation

Constants
GGUF_MULTI_FILE_DELIMITER
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT

Statics
ENGINE_INSTRUCTIONS: Engine instructions, per Engine (MistralRs) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP: Terminate all sequences on the next scheduling step. Be sure to reset this.

Traits
CustomLogitsProcessor: Customizable logits processor.
Loader: The Loader
trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths: ModelPaths abstracts the mechanism to get all necessary files for running a model. For example, LocalModelPaths implements ModelPaths when all files are in the local file system.
Pipeline
TryIntoDType: Type which can be converted to a DType.
VisionPromptPrefixer: Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed that start at

Functions
get_auto_device_map_params
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
initialize_logging: This should be called to initialize the debug flag and logging.
This should not be called in mistralrs-core code due to Rust usage.
paged_attn_supported: true if built with CUDA (requires Unix) / Metal.
parse_isq_value: Parse ISQ value.
using_flash_attn: true if built with the flash-attn or flash-attn-v3 features, false otherwise.

Type Aliases
LlguidanceGrammar
MessageContent
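
Example (illustrative sketch)
The MistralRs, MistralRsBuilder and Request entries describe an engine that runs on its own thread, receives requests over an mpsc channel, and returns each Response through a Sender carried inside the request. The sketch below shows that general pattern only. It uses std::sync::mpsc so that it is self-contained; which mpsc implementation the crate uses is not stated in this listing. The names ToyRequest, ToyResponse, spawn_toy_engine and run_demo are hypothetical and are not part of mistralrs_core.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-ins for illustration; these are NOT mistralrs_core types.
struct ToyRequest {
    prompt: String,
    // Each request carries the Sender on which its response is returned.
    response: Sender<ToyResponse>,
}

#[derive(Debug, PartialEq)]
enum ToyResponse {
    Done(String),
}

// Spawn a toy "engine" on a separate thread and hand back the Sender
// used to submit requests, plus the thread handle.
fn spawn_toy_engine() -> (Sender<ToyRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = channel::<ToyRequest>();
    let handle = thread::spawn(move || {
        // The loop ends once every request Sender has been dropped.
        for req in rx {
            // Placeholder "work": upper-case the prompt.
            let reply = ToyResponse::Done(req.prompt.to_uppercase());
            // Ignore the error if the caller is no longer listening.
            let _ = req.response.send(reply);
        }
    });
    (tx, handle)
}

// Send one request from the calling thread and wait for its response.
fn run_demo() -> ToyResponse {
    let (engine_tx, handle) = spawn_toy_engine();
    let (resp_tx, resp_rx) = channel();
    engine_tx
        .send(ToyRequest {
            prompt: "hello".to_string(),
            response: resp_tx,
        })
        .expect("engine thread should be running");
    let resp = resp_rx.recv().expect("engine should reply");
    drop(engine_tx); // closing the request channel lets the engine loop exit
    handle.join().expect("engine thread should not panic");
    resp
}
```

Calling run_demo() from a main function submits a single request and blocks until the toy engine replies. Giving every request its own response channel keeps replies from different callers separate without any shared state.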