OpenAI-compatible completions endpoint handler.
const url = 'https://example.com/v1/completions';
const options = {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: '{"best_of":1,"dry_allowed_length":null,"dry_base":null,"dry_multiplier":null,"dry_sequence_breakers":null,"echo":false,"frequency_penalty":null,"grammar":{"type":"regex","value":"example"},"logit_bias":null,"logprobs":null,"max_tokens":16,"min_p":null,"model":"mistral","n":1,"presence_penalty":null,"prompt":"Say this is a test.","repetition_penalty":null,"stop":"example","stream":true,"suffix":null,"temperature":0.7,"tool_choice":"none","tools":null,"top_k":null,"top_p":null,"truncate_sequence":null,"user":"example"}'
};

try {
  const response = await fetch(url, options);
  // Note: the sample body sets "stream": true, which returns server-sent events.
  // response.json() expects a single JSON document, so set "stream": false to use it.
  const data = await response.json();
  console.log(data);
} catch (error) {
  console.error(error);
}

curl --request POST \
  --url https://example.com/v1/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "best_of": 1,
    "dry_allowed_length": null,
    "dry_base": null,
    "dry_multiplier": null,
    "dry_sequence_breakers": null,
    "echo": false,
    "frequency_penalty": null,
    "grammar": { "type": "regex", "value": "example" },
    "logit_bias": null,
    "logprobs": null,
    "max_tokens": 16,
    "min_p": null,
    "model": "mistral",
    "n": 1,
    "presence_penalty": null,
    "prompt": "Say this is a test.",
    "repetition_penalty": null,
    "stop": "example",
    "stream": true,
    "suffix": null,
    "temperature": 0.7,
    "tool_choice": "none",
    "tools": null,
    "top_k": null,
    "top_p": null,
    "truncate_sequence": null,
    "user": "example"
  }'

Request Body (required)
Legacy OpenAI-compatible text completion request.
object
best_of
  Example: 1
dry_allowed_length
  Longest repeated sequence DRY leaves unpenalized.
  Example: null
dry_base
  Base for DRY’s exponential penalty growth.
  Example: null
dry_multiplier
  DRY repetition penalty multiplier; 0 disables DRY.
  Example: null
dry_sequence_breakers
  Sequences that reset DRY repetition matching.
  Example: null
echo
  Echo the prompt back alongside the completion.
  Example: false
frequency_penalty
  Penalize tokens by how often they have appeared so far; positive values reduce repetition.
  Example: null
grammar
  object
  object
  object
  object
  Top-level grammar configuration for LLGuidance
  object
  List of grammar configurations
  Grammar configuration with lexer settings
  object
  The JSON schema that the grammar should generate
  The Lark grammar that the grammar should generate
  The name of this grammar; it can be used in GenGrammar nodes
  Maximum number of tokens to generate
  object
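As a hedged illustration of the grammar field, the sketch below attaches a regex grammar shaped like the one in the sample request ({"type": "regex", "value": ...}) and checks a returned completion against the same pattern on the client. The helper names withRegexGrammar and matchesPattern are hypothetical, the prompt and pattern are made up, and the check assumes the pattern is also valid for JavaScript's RegExp, which the server's regex dialect may not guarantee.

```javascript
// Hypothetical helper: returns a copy of a request body with a regex grammar attached.
// The {type: 'regex', value: ...} shape is taken from the sample request.
function withRegexGrammar(body, pattern) {
  return { ...body, grammar: { type: 'regex', value: pattern } };
}

// Client-side sanity check: does the whole completion text match the pattern?
// Assumes the pattern uses syntax that JavaScript's RegExp accepts.
function matchesPattern(text, pattern) {
  return new RegExp(`^(?:${pattern})$`).test(text);
}

const request = withRegexGrammar(
  { model: 'mistral', prompt: 'Answer yes or no: is water wet?', max_tokens: 16 },
  '(yes|no)'
);

console.log(JSON.stringify(request.grammar));
console.log(matchesPattern('yes', '(yes|no)'));   // true
console.log(matchesPattern('maybe', '(yes|no)')); // false
```

The request object can then be serialized with JSON.stringify and sent as the body of the POST shown at the top of this page.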
logit_bias
  Bias added to the logits of these token IDs before sampling.
  Example: null
logprobs
  Include log probabilities of this many most likely tokens.
  Example: null
max_tokens
  Maximum number of tokens to generate.
  Example: 16
min_p
  Drop tokens below this fraction of the top token’s probability.
  Example: null
model
  Model ID; “default” targets the only loaded model.
  Example: mistral
n
  How many choices to generate.
  Example: 1
presence_penalty
  Penalize tokens that have already appeared; positive values push toward new topics.
  Example: null
prompt
  Example: Say this is a test.
repetition_penalty
  Multiplicative repetition penalty; 1.0 disables it.
  Example: null
stop
  Multiple possible stop sequences
  Single stop sequence
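To make the repetition controls described above more concrete (frequency_penalty, presence_penalty, repetition_penalty, and the dry_* parameters), here is a minimal sketch of how such penalties are commonly applied to raw logits. It is not taken from this server's implementation: the function names, the order of operations, and the DRY formula (multiplier times base raised to the match length minus the allowed length) are assumptions based on how these techniques are usually described, and the numbers are made up.

```javascript
// Sketch only: one common way to apply repetition controls to raw logits.
// logits[i] is the score of token i; counts[i] is how often token i has appeared.
function applyPenalties(logits, counts, opts = {}) {
  const {
    frequencyPenalty = 0,  // frequency_penalty
    presencePenalty = 0,   // presence_penalty
    repetitionPenalty = 1, // repetition_penalty (1.0 disables it)
  } = opts;

  return logits.map((logit, tokenId) => {
    const count = counts[tokenId] || 0;
    if (count === 0) return logit; // unseen tokens are left unchanged

    // Multiplicative penalty: shrink positive logits, push negative ones lower.
    let adjusted = logit > 0 ? logit / repetitionPenalty : logit * repetitionPenalty;

    // Additive penalties: frequency scales with the count, presence is a flat cost.
    adjusted -= count * frequencyPenalty + presencePenalty;
    return adjusted;
  });
}

// Assumed DRY-style penalty size. matchLength is the number of tokens before
// the candidate token that already repeat an earlier part of the text.
function dryPenalty(matchLength, { multiplier, base, allowedLength }) {
  if (multiplier === 0 || matchLength < allowedLength) return 0;
  return multiplier * Math.pow(base, matchLength - allowedLength);
}

const penalized = applyPenalties([2.0, -1.0, 0.5], [2, 1, 0], {
  frequencyPenalty: 0.5,
  presencePenalty: 0.25,
  repetitionPenalty: 2,
});
console.log(penalized); // [ -0.25, -2.75, 0.5 ]

const dry = { multiplier: 0.5, base: 2, allowedLength: 2 }; // hypothetical values
console.log(dryPenalty(1, dry), dryPenalty(2, dry), dryPenalty(4, dry)); // 0 0.5 2
```

In this sketch the DRY penalty grows exponentially once a repeat is at least allowedLength tokens long, which is why even a small base can suppress long verbatim loops.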
stream
  Stream the response as server-sent events.
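When stream is true, the body arrives as server-sent events rather than as one JSON document. The sketch below shows one way a client might pull JSON payloads out of a chunk of event text. It assumes each event is a single "data: <json>" line and that the stream ends with a "data: [DONE]" sentinel, as in OpenAI-style streaming; neither detail is confirmed by this page, and the event payloads in the sample are illustrative only.

```javascript
// Sketch: extract JSON payloads from a chunk of server-sent-event text.
// Assumes one "data: <json>" line per event and a "data: [DONE]" terminator.
function parseSseChunk(chunkText) {
  const events = [];
  let done = false;
  for (const line of chunkText.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank lines, comments, other fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    events.push(JSON.parse(payload));
  }
  return { events, done };
}

// Hypothetical chunk; the field layout of each event is an assumption.
const sample =
  'data: {"choices":[{"text":"This"}]}\n\n' +
  'data: {"choices":[{"text":" is a test."}]}\n\n' +
  'data: [DONE]\n\n';

const { events, done } = parseSseChunk(sample);
const text = events.map((e) => e.choices[0].text).join('');
console.log(text, done); // This is a test. true
```

Real network reads can split an event across chunks, so a production client would buffer incomplete lines before parsing; this sketch leaves that out for brevity.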
suffix
  Text appended after the completion.
  Example: null
temperature
  Sampling temperature; higher values increase randomness.
  Example: 0.7
tool_choice
  Disallow selection of tools.
  Allow automatic selection of any given tool, or none.
  Force selection of a given tool.
  object
  Force selection of a given tool.
  object
  Function definition for a tool
  object
  When true, the tool’s parameters JSON schema is enforced on the generated arguments via constrained decoding (llguidance).
  Type of tool
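The three tool_choice variants above (disallow, automatic, force a given tool) can be sketched as follows. Only the value "none" appears in the sample request; the "auto" string, the tool object layout (type, function.name, function.description, function.parameters), and the strict flag name are assumptions borrowed from the OpenAI convention, and get_weather is a hypothetical tool.

```javascript
// Hypothetical tool definition in an OpenAI-style layout (field names assumed).
const weatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city.',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
    strict: true, // assumed name for the "enforce the parameters schema" flag
  },
};

// Build a tool_choice value: 'none', 'auto', or the tool object to force.
function toolChoice(mode, tools = [], name) {
  if (mode === 'none' || mode === 'auto') return mode;
  const tool = tools.find((t) => t.function.name === name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool;
}

const body = {
  model: 'mistral',
  prompt: 'What is the weather in Paris?',
  tools: [weatherTool],
  tool_choice: toolChoice('force', [weatherTool], 'get_weather'),
};
console.log(JSON.stringify(body.tool_choice.function.name)); // "get_weather"
```

Looking the forced tool up by name keeps tool_choice and tools in sync, so a typo fails on the client instead of producing a request the server may reject.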
tools
  Tools the model may call.
  Tool definition
  object
  Function definition for a tool
  object
  When true, the tool’s parameters JSON schema is enforced on the generated arguments via constrained decoding (llguidance).
  Type of tool
  Example: null
top_k
  Sample only from the k most likely tokens.
  Example: null
top_p
  Nucleus sampling: only the most likely tokens whose cumulative probability stays within top_p are considered.
  Example: null
truncate_sequence
  Truncate inputs that exceed the model’s context length instead of erroring.
  Example: null

Responses
Completions