Crate mistralrs

Structs§

ChatCompletionResponse
An OpenAI-compatible chat completion response.
CompletionResponse
An OpenAI-compatible completion response.
DeviceMapMetadata
GGMLLoaderBuilder
A builder for a GGML loader.
GGMLSpecificConfig
Config for a GGML loader.
GGUFLoaderBuilder
A builder for a GGUF loader.
GGUFSpecificConfig
A config for a GGUF loader.
MistralRs
The MistralRs struct handles sending requests to the engine. It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to exchange requests with the engine.
MistralRsBuilder
The MistralRsBuilder takes the pipeline and a scheduler method and constructs an Engine and a MistralRs instance. The Engine runs on a separate thread, and the MistralRs instance stays on the calling thread.
NormalLoader
A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder
A builder for a loader for a “normal” (non-quantized) model.
NormalRequest
NormalSpecificConfig
Config specific to loading a normal model.
SamplingParams
Sampling params are used to control sampling.
Usage
OpenAI-compatible usage statistics for a request (a superset of the OpenAI fields).
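
The threading model described for MistralRs and MistralRsBuilder (an engine on its own thread, a handle on the calling thread, and mpsc channels between them) can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: ToyRequest, ToyHandle, spawn and ask are hypothetical names, and the "engine" here only echoes the prompt.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical request type: a prompt plus the channel on which to reply.
struct ToyRequest {
    prompt: String,
    response: Sender<String>,
}

// Hypothetical handle that stays on the calling thread.
struct ToyHandle {
    sender: Sender<ToyRequest>,
}

impl ToyHandle {
    // Spawn the toy "engine" on its own thread and return a handle to it.
    fn spawn() -> Self {
        let (tx, rx) = channel::<ToyRequest>();
        thread::spawn(move || {
            // The loop ends once every Sender paired with `rx` is dropped.
            for req in rx {
                let reply = format!("echo: {}", req.prompt);
                // Ignore the error if the caller stopped listening.
                let _ = req.response.send(reply);
            }
        });
        ToyHandle { sender: tx }
    }

    // Send one request and block until the engine answers.
    fn ask(&self, prompt: &str) -> String {
        let (tx, rx) = channel();
        self.sender
            .send(ToyRequest {
                prompt: prompt.to_string(),
                response: tx,
            })
            .expect("engine thread has exited");
        rx.recv().expect("engine dropped the response sender")
    }
}

fn main() {
    let handle = ToyHandle::spawn();
    println!("{}", handle.ask("hello"));
}
```

Carrying the response Sender inside each request lets the engine answer many callers without keeping a registry of them.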

Enums§

Constraint
Control the constraint with Regex or Yacc.
NormalLoaderType
The architecture to load the normal model as.
Request
A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage
Message or messages for a Request.
Response
The response enum contains 3 types of variants:
SchedulerMethod
The scheduler method controls how sequences are scheduled during each step of the engine. At each scheduling step, the scheduler method is consulted unless there are only running sequences, only waiting sequences, or no sequences at all. When it is consulted, it is used to allow waiting sequences to run.
StopTokens
Stop sequences or ids.
TokenSource
The source of the HF token.
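
The rule stated for SchedulerMethod (consult the method only when running and waiting sequences coexist) can be written as a small decision function. This is a sketch under assumptions: StepDecision and decide are hypothetical names, and what the engine does in the three cases where the method is not consulted is a guess made for illustration, since this index does not say.

```rust
// Hypothetical outcome of one scheduling step.
#[derive(Debug, PartialEq)]
enum StepDecision {
    // Nothing is running or waiting.
    Idle,
    // Only running sequences exist (assumed: keep running them).
    ContinueRunning,
    // Only waiting sequences exist (assumed: start them).
    StartWaiting,
    // Both kinds exist, so the scheduler method has to arbitrate.
    ConsultMethod,
}

// Decide whether the scheduler method is needed for this step.
fn decide(running: usize, waiting: usize) -> StepDecision {
    match (running > 0, waiting > 0) {
        (false, false) => StepDecision::Idle,
        (true, false) => StepDecision::ContinueRunning,
        (false, true) => StepDecision::StartWaiting,
        (true, true) => StepDecision::ConsultMethod,
    }
}

fn main() {
    println!("{:?}", decide(2, 3));
}
```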

Traits§

Loader
The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
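
The idea of a trait that hides how a model is loaded behind a single entry point can be illustrated as follows. This is only a sketch: ToyLoader, ToyModel and InMemoryLoader are hypothetical, and the load_model signature shown here is an assumption, because the real signature does not appear in this index.

```rust
// Hypothetical model type produced by a loader.
#[derive(Debug)]
struct ToyModel {
    name: String,
    weights: Vec<f32>,
}

// Hypothetical trait: callers depend on this, not on a concrete loader.
trait ToyLoader {
    // Assumed signature, for illustration only.
    fn load_model(&self) -> Result<ToyModel, String>;
}

// A loader that fabricates zeroed weights in memory.
struct InMemoryLoader {
    name: String,
    size: usize,
}

impl ToyLoader for InMemoryLoader {
    fn load_model(&self) -> Result<ToyModel, String> {
        if self.size == 0 {
            return Err("model must have at least one weight".to_string());
        }
        Ok(ToyModel {
            name: self.name.clone(),
            weights: vec![0.0; self.size],
        })
    }
}

// Code written against the trait works with any loader.
fn load_with(loader: &dyn ToyLoader) -> Result<ToyModel, String> {
    loader.load_model()
}

fn main() {
    let loader = InMemoryLoader {
        name: "toy".to_string(),
        size: 4,
    };
    match load_with(&loader) {
        Ok(model) => println!("{} has {} weights", model.name, model.weights.len()),
        Err(e) => println!("load failed: {}", e),
    }
}
```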