OpenAI-compatible completions endpoint handler.

POST

/v1/completions

const url = 'https://example.com/v1/completions';
const options = {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: '{"adapter":{"generation":"5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a"},"best_of":1,"dry_allowed_length":null,"dry_base":null,"dry_multiplier":null,"dry_sequence_breakers":null,"echo":false,"frequency_penalty":null,"grammar":{"type":"regex","value":"example"},"logit_bias":null,"logprobs":null,"max_tokens":16,"min_p":null,"model":"mistral","n":1,"presence_penalty":null,"prompt":"Say this is a test.","repetition_penalty":null,"stop":"example","stream":true,"suffix":null,"temperature":0.7,"tool_choice":"none","tools":null,"top_k":null,"top_p":null,"truncate_sequence":null,"user":"example"}'
};

try {
  const response = await fetch(url, options);
  const data = await response.json();
  console.log(data);
} catch (error) {
  console.error(error);
}

curl --request POST \
  --url https://example.com/v1/completions \
  --header 'Content-Type: application/json' \
  --data '{ "adapter": { "generation": "5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a" }, "best_of": 1, "dry_allowed_length": null, "dry_base": null, "dry_multiplier": null, "dry_sequence_breakers": null, "echo": false, "frequency_penalty": null, "grammar": { "type": "regex", "value": "example" }, "logit_bias": null, "logprobs": null, "max_tokens": 16, "min_p": null, "model": "mistral", "n": 1, "presence_penalty": null, "prompt": "Say this is a test.", "repetition_penalty": null, "stop": "example", "stream": true, "suffix": null, "temperature": 0.7, "tool_choice": "none", "tools": null, "top_k": null, "top_p": null, "truncate_sequence": null, "user": "example" }'

Request Body^required

application/json

Legacy OpenAI compatible text completion request

object

adapter

One of:

null
object

null

One of:

string
object

Resolve this alias when the request is admitted.

string

Pin this exact resident generation.

object

generation

required

The 64-character hexadecimal generation ID returned by the adapter management API.

string

Example

5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a

best_of

integer | null

Example

dry_allowed_length

Longest repeated sequence DRY leaves unpenalized.

integer | null

Example

null

dry_base

Base for DRY’s exponential penalty growth.

number | null format: float

Example

null

dry_multiplier

DRY repetition penalty multiplier; 0 disables DRY.

number | null format: float

Example

null

dry_sequence_breakers

Sequences that reset DRY repetition matching.

Array<string> | null

Example

null

echo

Echo the prompt back alongside the completion.

boolean

Example

false

frequency_penalty

Penalize tokens by how often they have appeared so far; positive values reduce repetition.

number | null format: float

Example

null

grammar

One of:

null
object

null

logit_bias

Bias added to the logits of these token IDs before sampling.

object | null

Example

null

logprobs

Include log probabilities of this many most likely tokens.

integer | null

Example

null

max_tokens

Maximum number of tokens to generate.

integer | null

Example

min_p

Drop tokens below this fraction of the top token’s probability.

number | null format: double

Example

null

model

Model ID; “default” targets the only loaded model.

string

Example

mistral

How many choices to generate.

integer

Example

presence_penalty

Penalize tokens that have already appeared; positive values push toward new topics.

number | null format: float

Example

null

prompt

required

string

Example

Say this is a test.

repetition_penalty

Multiplicative repetition penalty; 1.0 disables it.

number | null format: float

Example

null

stop

One of:

null
unknown

null

stream

Stream the response as server-sent events.

boolean | null

suffix

Text appended after the completion.

string | null

Example

null

temperature

Sampling temperature; higher values increase randomness.

number | null format: double

Example

0.7

tool_choice

One of:

null
object

null

One of:

Disallow selection of tools.

string

Allowed values: none

Tool definition

object

function

required

Function definition for a tool

object

description

string | null

name

required

string

parameters

object | null

strict

When true, the tool’s parameters JSON schema is enforced on the generated arguments via constrained decoding (llguidance).

boolean | null

type

required

Type of tool

string

Allowed values: function

tools

Tools the model may call.

Array<object> | null

Tool definition

object

function

required

Function definition for a tool

object

description

string | null

name

required

string

parameters

object | null

strict

When true, the tool’s parameters JSON schema is enforced on the generated arguments via constrained decoding (llguidance).

boolean | null

type

required

Type of tool

string

Allowed values: function

Example

null

top_k

Sample only from the k most likely tokens.

integer | null

Example

null

top_p

Nucleus sampling: only tokens within the top cumulative probability mass are considered.

number | null format: double

Example

null

truncate_sequence

Truncate inputs that exceed the model’s context length instead of erroring.

boolean | null

Example

null

user

string | null

Responses

200

Completion JSON or server-sent event chunks

object

adapter_generation

string | null

choices

required

Array<object>

object

finish_reason

required

string

index

required

integer

logprobs

text

required

string

created

required

integer format: int64

required

string

model

required

string

object

required

string

system_fingerprint

required

string

usage

required

object

avg_compl_tok_per_sec

required

number format: float

avg_prompt_tok_per_sec

required

number format: float

avg_tok_per_sec

required

number format: float

completion_tokens

required

integer

prompt_tokens

required

integer

total_completion_time_sec

required

number format: float

total_prompt_time_sec

required

number format: float

total_time_sec

required

number format: float

total_tokens

required

integer

Example ^generated

{
  "adapter_generation": "example",
  "choices": [
    {
      "finish_reason": "example",
      "index": 1,
      "logprobs": "example",
      "text": "example"
    }
  ],
  "created": 1,
  "id": "example",
  "model": "example",
  "object": "example",
  "system_fingerprint": "example",
  "usage": {
    "avg_compl_tok_per_sec": 1,
    "avg_prompt_tok_per_sec": 1,
    "avg_tok_per_sec": 1,
    "completion_tokens": 1,
    "prompt_tokens": 1,
    "total_completion_time_sec": 1,
    "total_prompt_time_sec": 1,
    "total_time_sec": 1,
    "total_tokens": 1
  }
}

OpenAI-compatible completions endpoint handler.

Request Body required

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Responses

200

Example generated

Request Body^required

Example ^generated