mistralrs
An object wrapping the underlying Rust system to handle requests and process conversations.
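For example, a minimal construction sketch (the `Which.Plain` selector and the model ID follow the upstream examples; adjust to your setup):

```python
from mistralrs import Runner, Which

# Load a plain (unquantized) Hugging Face model; the model ID is illustrative.
runner = Runner(
    which=Which.Plain(
        model_id="mistralai/Mistral-7B-Instruct-v0.1",
    ),
)
```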
Send an OpenAI API compatible chat completion request, returning the result.
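For example, reusing the `runner` built above (the parameters mirror the OpenAI chat API):

```python
from mistralrs import ChatCompletionRequest

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
        max_tokens=64,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)
```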
Send an embeddings request, returning the embedding vectors in the same order as the inputs. Embeddings are returned with shape [batch size, embedding dim].
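A hypothetical sketch only; the method name and argument form below are assumptions, not confirmed API:

```python
# Hypothetical: exact method name and argument form are assumptions.
embeddings = runner.send_embedding_request(["first input", "second input"])
# Shape is [batch size, embedding dim], ordered like the inputs.
print(len(embeddings), len(embeddings[0]))
```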
Send an OpenAI API compatible completion request, returning the result.
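For example, using the `CompletionRequest` class listed below:

```python
from mistralrs import CompletionRequest

res = runner.send_completion_request(
    CompletionRequest(
        model="mistral",
        prompt="The Rust borrow checker is",
        max_tokens=32,
    )
)
print(res.choices[0].text)
```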
Generate an image.
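A sketch assuming an image-generation model is loaded; the positional-argument form is an assumption, and `ImageGenerationResponseFormat` is listed below:

```python
from mistralrs import ImageGenerationResponseFormat

res = runner.generate_image(
    "A mountain lake at sunrise, high quality",
    ImageGenerationResponseFormat.Url,
)
print(res.data[0].url)
```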
Send a request to re-ISQ (re-quantize in situ) the model. If the model was loaded as GGUF or GGML, nothing will happen.
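For example, re-quantizing in place to 4-bit (the quantization level string is illustrative):

```python
# No-op if the model was loaded from GGUF or GGML.
runner.send_re_isq("Q4K")
```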
Tokenize some text, returning raw tokens.
Detokenize some tokens, returning text.
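A round-trip sketch; the keyword names for special-token handling are assumptions:

```python
tokens = runner.tokenize_text("Hello, world!", add_special_tokens=True)  # assumed keyword
text = runner.detokenize_text(tokens, skip_special_tokens=True)          # assumed keyword
print(tokens, text)
```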
Return the maximum supported sequence length for the requested model, if available.
Send an OpenAI API compatible chat completion request to a specific model, returning the result.
Send an OpenAI API compatible completion request to a specific model, returning the result.
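A hypothetical sketch of the per-model variants; the method names follow the summaries above, while the argument order is an assumption. `chat_request` and `completion_request` are requests built as in the earlier sketches:

```python
# Hypothetical argument order; check the actual signatures.
res_a = runner.send_chat_completion_request_to_model(chat_request, "model-a")
res_b = runner.send_completion_request_to_model(completion_request, "model-b")
```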
Unload a model from memory while preserving its configuration for later reload. The model can be reloaded automatically when a request is sent to it, or manually using reload_model().
Check if a model is currently loaded (as opposed to unloaded).
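A lifecycle sketch; only reload_model() is named above, so the other method names and the model-ID argument are assumptions:

```python
runner.unload_model("model-a")            # assumed name: frees weights, keeps config
print(runner.is_model_loaded("model-a"))  # assumed name: False after unloading
runner.reload_model("model-a")            # or just send a request; it reloads automatically
```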
An OpenAI API compatible chat completion request.
An OpenAI API compatible completion request.
Chat completion response message.
Delta in content for a streaming response.
A logprob with the top logprobs for this token.
Logprobs per token.
Chat completion choice.
Chat completion streaming chunk choice.
OpenAI compatible (superset) usage during a request.
An OpenAI compatible chat completion response.
Chat completion streaming request chunk.
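Streaming ties several of these classes together: each chunk carries choices whose delta holds the incremental content. A sketch, assuming stream=True turns the call into an iterator of chunks:

```python
from mistralrs import ChatCompletionRequest

stream = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[{"role": "user", "content": "Write a haiku about Rust."}],
        stream=True,
    )
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```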
Completion response choice.
An OpenAI compatible completion response.
Top-n logprobs element.
DType for the model.
If the model is quantized, this is ignored, so it is reasonable to use the [Default] impl.
Note: when using Auto, the fallback order is BF16 -> F16 -> F32.
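A sketch; whether Runner accepts a dtype keyword of this type is an assumption:

```python
from mistralrs import ModelDType, Runner, Which

runner = Runner(
    which=Which.Plain(model_id="mistralai/Mistral-7B-Instruct-v0.1"),
    dtype=ModelDType.Auto,  # assumed keyword; Auto tries BF16, then F16, then F32
)
```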
Image generation response format.
MCP server source configuration for different transport types.
Configuration for an individual MCP server.
Configuration for MCP client integration.
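A configuration sketch built from the three classes above; the field names and the mcp_client_config keyword on Runner are assumptions:

```python
from mistralrs import (
    McpClientConfigPy,
    McpServerConfigPy,
    McpServerSourcePy,
    Runner,
    Which,
)

# Field names below are assumptions based on typical MCP wiring.
mcp_config = McpClientConfigPy(
    servers=[
        McpServerConfigPy(
            name="filesystem",
            source=McpServerSourcePy.Process(
                command="npx",
                args=["-y", "@modelcontextprotocol/server-filesystem", "."],
            ),
        )
    ],
    auto_register_tools=True,
)

runner = Runner(
    which=Which.Plain(model_id="mistralai/Mistral-7B-Instruct-v0.1"),
    mcp_client_config=mcp_config,  # assumed keyword
)
```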