This crate provides an asynchronous API to mistral.rs.
To get started loading a model, check out the following builders:
- TextModelBuilder
- LoraModelBuilder
- XLoraModelBuilder
- GgufModelBuilder
- GgufLoraModelBuilder
- GgufXLoraModelBuilder
- VisionModelBuilder
- AnyMoeModelBuilder
Check out the v0_4_api module for concise documentation of this newer API.
§Example
use anyhow::Result;
use mistralrs::{
    IsqType, PagedAttentionMetaBuilder, TextMessageRole, TextMessages, TextModelBuilder,
};

#[tokio::main]
async fn main() -> Result<()> {
    let model = TextModelBuilder::new("microsoft/Phi-3.5-mini-instruct".to_string())
        .with_isq(IsqType::Q8_0)
        .with_logging()
        .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
        .build()
        .await?;

    let messages = TextMessages::new()
        .add_message(
            TextMessageRole::System,
            "You are an AI agent with a specialty in programming.",
        )
        .add_message(
            TextMessageRole::User,
            "Hello! How are you? Please write generic binary search function in Rust.",
        );

    let response = model.send_chat_request(messages).await?;
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );
    Ok(())
}
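The other builders listed above follow the same pattern. For instance, a quantized GGUF model might be loaded as in the following minimal sketch; the Hugging Face repository and filename here are illustrative assumptions, not pinned artifacts:

use anyhow::Result;
use mistralrs::{GgufModelBuilder, TextMessageRole, TextMessages};

#[tokio::main]
async fn main() -> Result<()> {
    // Assumed GGUF repo and quantized weight file; substitute your own.
    let model = GgufModelBuilder::new(
        "bartowski/Phi-3.5-mini-instruct-GGUF",
        vec!["Phi-3.5-mini-instruct-Q4_K_M.gguf"],
    )
    .with_logging()
    .build()
    .await?;

    let messages = TextMessages::new()
        .add_message(TextMessageRole::User, "Why is the sky blue?");

    let response = model.send_chat_request(messages).await?;
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}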
§Streaming example
use anyhow::Result;
use mistralrs::{
    IsqType, PagedAttentionMetaBuilder, Response, TextMessageRole, TextMessages,
    TextModelBuilder,
};
use mistralrs_core::{ChatCompletionChunkResponse, ChunkChoice, Delta};

#[tokio::main]
async fn main() -> Result<()> {
    let model = TextModelBuilder::new("microsoft/Phi-3.5-mini-instruct".to_string())
        .with_isq(IsqType::Q8_0)
        .with_logging()
        .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
        .build()
        .await?;

    let messages = TextMessages::new()
        .add_message(
            TextMessageRole::System,
            "You are an AI agent with a specialty in programming.",
        )
        .add_message(
            TextMessageRole::User,
            "Hello! How are you? Please write generic binary search function in Rust.",
        );

    let mut stream = model.stream_chat_request(messages).await?;
    while let Some(chunk) = stream.next().await {
        if let Response::Chunk(ChatCompletionChunkResponse { choices, .. }) = chunk {
            if let Some(ChunkChoice {
                delta:
                    Delta {
                        content: Some(content),
                        ..
                    },
                ..
            }) = choices.first()
            {
                // Print the streamed token delta, not the literal string "content".
                print!("{content}");
            };
        }
    }
    Ok(())
}
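One practical detail for the streaming loop: print! does not flush stdout, so deltas may appear in bursts rather than token by token. Flushing after each chunk (standard library only) keeps output incremental:

use std::io::{self, Write};

// Inside the while-let loop, after printing the delta:
print!("{content}");
io::stdout().flush()?; // `?` works because main returns anyhow::Result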
Re-exports§
pub use v0_4_api::*;
Modules§
- distributed
- layers
- llguidance
- v0_4_api - This will be the API as of v0.4.0. Other APIs will not be deprecated, but moved into a module such as this one.
Structs§
- AnyMoeConfig
- AnyMoeLoader
- AnyMoePipeline
- ApproximateUserLocation
- CalledFunction
- ChatCompletionChunkResponse - Chat completion streaming request chunk.
- ChatCompletionResponse - An OpenAI compatible chat completion response.
- ChatTemplate - Template for chat models including bos/eos/unk as well as the chat template.
- Choice - Chat completion choice.
- ChunkChoice - Chat completion streaming chunk choice.
- CompletionChoice - Completion request choice.
- CompletionChunkChoice - Chat completion streaming chunk choice.
- CompletionChunkResponse - Completion request choice.
- CompletionResponse - An OpenAI compatible completion response.
- Delta - Delta in content for streaming response.
- DetokenizationRequest - Request to detokenize some text.
- DeviceLayerMapMetadata
- DeviceMapMetadata - Metadata to initialize the device mapper.
- DiffusionGenerationParams
- DiffusionLoader - A loader for a diffusion (non-quantized) model.
- DiffusionLoaderBuilder - A builder for a loader for a diffusion (non-quantized) model.
- DiffusionSpecificConfig - Config specific to loading a diffusion model.
- DrySamplingParams
- Function
- GGMLLoader - A loader for a GGML model.
- GGMLLoaderBuilder - A builder for a GGML loader.
- GGMLSpecificConfig - Config for a GGML loader.
- GGUFLoader - Loader for a GGUF model.
- GGUFLoaderBuilder - A builder for a GGUF loader.
- GGUFSpecificConfig - Config for a GGUF loader.
- GemmaLoader - NormalLoader for a Gemma model.
- Idefics2Loader - VisionLoader for an Idefics 2 Vision model.
- ImageChoice
- ImageGenerationResponse
- LLaVALoader - VisionLoader for an LLaVA Vision model.
- LLaVANextLoader - VisionLoader for an LLaVANext Vision model.
- LayerDeviceMapper - A device mapper which does device mapping per hidden layer.
- LayerTopology
- LlamaLoader - NormalLoader for a Llama model.
- LoaderBuilder - A builder for a loader using the selected model.
- LocalModelPaths - All local paths and metadata necessary to load a model.
- Logprobs - Logprobs per token.
- MemoryUsage
- MistralLoader
- MistralRs - The MistralRs struct handles sending requests to the engine. It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the engine.
- MistralRsBuilder - The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
- MistralRsConfig
- MixtralLoader
- NormalLoader - A loader for a “normal” (non-quantized) model.
- NormalLoaderBuilder - A builder for a loader for a “normal” (non-quantized) model.
- NormalRequest - A normal request to the MistralRs.
- NormalSpecificConfig - Config specific to loading a normal model.
- Ordering - Adapter model ordering information.
- PagedAttentionConfig - All memory counts in MB. Default for block size is 32.
- Phi2Loader - NormalLoader for a Phi 2 model.
- Phi3Loader - NormalLoader for a Phi 3 model.
- Phi3VLoader - VisionLoader for a Phi 3 Vision model.
- Qwen2Loader - NormalLoader for a Qwen 2 model.
- ResponseLogprob - A logprob with the top logprobs for this token.
- ResponseMessage - Chat completion response message.
- SamplingParams - Sampling params are used to control sampling.
- SpeculativeConfig - Metadata for a speculative pipeline.
- SpeculativeLoader - A loader for a speculative pipeline using 2 Loaders.
- SpeculativePipeline - Speculative decoding pipeline: https://arxiv.org/pdf/2211.17192
- Starcoder2Loader - NormalLoader for a Starcoder2 model.
- Tensor - The core struct for manipulating tensors.
- TokenizationRequest - Request to tokenize some messages or some text.
- Tool
- ToolCallResponse
- TopLogprob - Top-n logprobs element.
- Topology
- Usage - OpenAI compatible (superset) usage during a request.
- VisionLoader - A loader for a vision (non-quantized) model.
- VisionLoaderBuilder - A builder for a loader for a vision (non-quantized) model.
- VisionSpecificConfig - Config specific to loading a vision model.
- WebSearchOptions
Enums§
- AnyMoeExpertType
- AutoDeviceMapParams
- BertEmbeddingModel - Embedding model used for ranking web search results internally.
- Constraint - Control the constraint with llguidance.
- DType - The different types of elements allowed in tensors.
- DefaultSchedulerMethod - The scheduler method controls how sequences are scheduled during each step of the engine. For each scheduling step, the scheduler method is consulted unless the sequences are all running, all waiting, or absent; when consulted, it decides which waiting sequences are allowed to run.
- Device
- DeviceMapSetting
- DiffusionLoaderType - The architecture to load the diffusion model as.
- EngineInstruction
- GGUFArchitecture
- ImageGenerationResponseFormat - Image generation response format.
- IsqOrganization
- IsqType
- MemoryGpuConfig
- MistralRsError
- ModelCategory - Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
- ModelDType - DType for the model.
- ModelKind - The kind of model to build.
- ModelSelected
- NormalLoaderType - The architecture to load the normal model as.
- Request - A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
- RequestMessage - Message or messages for a Request.
- Response - The response enum contains 3 types of variants.
- ResponseErr
- ResponseOk
- SchedulerConfig
- StopTokens - Stop sequences or ids.
- TokenSource - The source of the HF token.
- ToolCallType
- ToolChoice
- ToolType
- VisionLoaderType - The architecture to load the vision model as.
- WebSearchUserLocation
Constants§
Statics§
- ENGINE_INSTRUCTIONS - Engine instructions, per Engine (MistralRs) ID.
- GLOBAL_HF_CACHE
- TERMINATE_ALL_NEXT_STEP - Terminate all sequences on the next scheduling step. Be sure to reset this.
Traits§
- CustomLogitsProcessor - Customizable logits processor.
- Loader - The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
- ModelPaths - ModelPaths abstracts the mechanism to get all necessary files for running a model. For example LocalModelPaths implements ModelPaths when all files are in the local file system.
- Pipeline
- TryIntoDType - Type which can be converted to a DType.
- VisionPromptPrefixer - Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed to start at 0.
Functions§
- cross_entropy_loss - The cross-entropy loss.
- get_auto_device_map_params
- get_model_dtype
- get_tgt_non_granular_index
- get_toml_selected_model_device_map_params
- get_toml_selected_model_dtype
- initialize_logging - This should be called to initialize the debug flag and logging. This should not be called in mistralrs-core code due to Rust usage.
- paged_attn_supported - true if built with CUDA (requires Unix) / Metal.
- parse_isq_value - Parse ISQ value.
- using_flash_attn - true if built with the flash-attn or flash-attn-v3 features, false otherwise.
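As a closing illustration, the capability probes above can be checked at startup. A minimal sketch, assuming both functions take no arguments and return bool as their descriptions indicate:

use mistralrs::{paged_attn_supported, using_flash_attn};

fn main() {
    // Report which acceleration paths this build was compiled with.
    println!("PagedAttention supported: {}", paged_attn_supported());
    println!("FlashAttention enabled: {}", using_flash_attn());
}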