Sampling parameters
Sampling parameters control how the engine selects the next token from the model’s probability distribution.
Temperature, top-p, top-k
Temperature scales the logits before sampling: each logit is divided by the temperature. Higher temperature flattens the resulting distribution; lower temperature sharpens it.
- temperature: 0.0: greedy. Always picks the most likely token.
- temperature: 1.0: no scaling. Samples from the model’s unmodified output distribution.
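The effect of temperature can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the engine’s implementation; the function name softmax_with_temperature is hypothetical, and treating 0.0 as an argmax special case is an assumption about how the greedy limit is handled.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by `temperature`."""
    if temperature == 0.0:
        # Greedy limit (assumed special case): all mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)    # more mass on the top token
neutral = softmax_with_temperature(logits, 1.0)  # plain softmax
flat = softmax_with_temperature(logits, 2.0)     # mass spread more evenly
```

Comparing sharp[0], neutral[0], and flat[0] shows the top token’s share shrinking as temperature rises.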
Top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability exceeds p, then renormalizes.
top_p: 1.0: disables nucleus sampling.
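As a rough sketch of nucleus filtering, assuming the input is already a list of probabilities indexed by token id (the helper name top_p_filter is hypothetical):

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass exceeds p,
    then renormalize the survivors so they sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

nucleus = top_p_filter([0.5, 0.3, 0.2], 0.7)   # keeps token ids 0 and 1
everything = top_p_filter([0.5, 0.3, 0.2], 1.0)  # nothing is dropped
```

With p set to 1.0 the cumulative mass never exceeds the threshold, so every token survives, which is why that value disables the filter.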
Top-k caps the candidate count at k.
- top_k: 1: equivalent to greedy.
- top_k: 0: disabled (default when unset).
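A minimal sketch of a top-k cap, assuming dropped tokens are masked with negative infinity so they get zero probability after softmax (the masking convention and the name top_k_filter are assumptions):

```python
def top_k_filter(logits, k):
    """Mask all but the k highest logits; k == 0 leaves the logits unchanged."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

greedy_like = top_k_filter([1.0, 3.0, 2.0], 1)  # only the best token survives
disabled = top_k_filter([1.0, 3.0, 2.0], 0)     # no filtering
```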
Min-p scales with the most likely token’s probability. The threshold is min_p times the top-token probability; everything below is dropped.
When the model is confident, min-p filters more tokens. When uncertain, it filters fewer.
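The confident-versus-uncertain behavior is easy to see in a small sketch. The renormalization step and the name min_p_filter are assumptions made for illustration; the text above only specifies the threshold rule.

```python
def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability."""
    threshold = min_p * max(probs)
    kept = {i: q for i, q in enumerate(probs) if q >= threshold}
    total = sum(kept.values())
    return {i: q / total for i, q in kept.items()}  # renormalize (assumed)

confident = [0.90, 0.05, 0.03, 0.02]  # threshold is 0.09 at min_p = 0.1
uncertain = [0.30, 0.25, 0.25, 0.20]  # threshold is 0.03 at min_p = 0.1

kept_confident = min_p_filter(confident, 0.1)
kept_uncertain = min_p_filter(uncertain, 0.1)
```

With the same min_p, the confident distribution keeps a single token while the uncertain one keeps all four.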
DRY (Don’t Repeat Yourself)
DRY penalizes tokens that would extend a sequence already present in the preceding text.
Parameters:
- dry_multiplier: penalty strength.
- dry_base: exponent base for penalty scaling.
- dry_allowed_length: match length before the penalty applies.
- dry_sequence_breakers: tokens that reset matching.
Off by default.
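To make the roles of the four parameters concrete, here is a simplified sketch. It assumes the commonly described DRY formula, penalty = multiplier * base ** (match_length - allowed_length), which this page does not confirm; the function name dry_penalty and the quadratic-time matching loop are illustrative only.

```python
def dry_penalty(context, candidate, multiplier, base, allowed_length,
                breakers=frozenset()):
    """Return the amount to subtract from `candidate`'s logit.

    Finds the longest run of tokens at the end of `context` that also
    appears earlier, immediately followed by `candidate`.
    """
    if candidate in breakers:
        return 0.0
    longest = 0
    end = len(context) - 1
    for j, token in enumerate(context):
        if token != candidate:
            continue
        # Count how many tokens before position j mirror the end of context.
        n = 0
        while (j - 1 - n >= 0
               and context[j - 1 - n] == context[end - n]
               and context[end - n] not in breakers):  # breakers reset matching
            n += 1
        longest = max(longest, n)
    if longest < allowed_length:
        return 0.0
    return multiplier * base ** (longest - allowed_length)

context = ["the", "cat", "sat", "on", "the", "cat"]
repeat = dry_penalty(context, "sat", 0.8, 1.75, 2)  # would repeat "the cat sat"
fresh = dry_penalty(context, "dog", 0.8, 1.75, 2)   # no earlier occurrence
```

Here repeat is positive because "sat" would extend a two-token match, while fresh is zero.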
Repetition penalty and frequency penalty
- presence_penalty: flat penalty on tokens that appeared at all.
- frequency_penalty: penalty proportional to occurrence count.
Both are OpenAI-compatible.
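The two penalties can be sketched with the formula published in the OpenAI API documentation, where each logit is reduced by count * frequency_penalty plus a flat presence_penalty if the count is nonzero. Whether this engine applies exactly that formula is an assumption, and apply_penalties is a hypothetical name.

```python
from collections import Counter

def apply_penalties(logits, generated, presence_penalty, frequency_penalty):
    """Return a copy of `logits` (token -> logit) with both penalties applied."""
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            # Flat term for having appeared, plus a term that grows with count.
            adjusted[token] -= presence_penalty + frequency_penalty * count
    return adjusted

logits = {"a": 1.0, "b": 1.0, "c": 1.0}
adjusted = apply_penalties(logits, ["a", "a", "b"],
                           presence_penalty=0.5, frequency_penalty=0.25)
```

Token "a" appeared twice and is penalized most, "b" appeared once, and "c" is untouched.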
Interaction order
When multiple filters are active, the application order is:
- Temperature (logit scaling)
- Top-k (hard candidate cap)
- Top-p or min-p
- Repetition/frequency/presence/DRY penalties
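The ordering above can be pictured as a list of stages, each taking logits and returning logits. The sketch below is a toy model of that idea under several assumptions: the stage names are hypothetical, dropped tokens are marked with negative infinity, and the penalty stage is reduced to a fixed per-token subtraction.

```python
import math

NEG_INF = float("-inf")

def temperature_stage(t):
    return lambda logits: [x / t for x in logits]

def top_k_stage(k):
    def stage(logits):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        keep = set(ranked[:k])
        return [x if i in keep else NEG_INF for i, x in enumerate(logits)]
    return stage

def top_p_stage(p):
    def stage(logits):
        peak = max(logits)
        weights = [math.exp(x - peak) for x in logits]  # masked tokens get 0.0
        total = sum(weights)
        order = sorted(range(len(logits)), key=lambda i: weights[i], reverse=True)
        keep, cumulative = set(), 0.0
        for i in order:
            keep.add(i)
            cumulative += weights[i] / total
            if cumulative > p:
                break
        return [x if i in keep else NEG_INF for i, x in enumerate(logits)]
    return stage

def penalty_stage(penalties):
    # `penalties` maps token id -> amount to subtract (simplified stand-in).
    return lambda logits: [x - penalties.get(i, 0.0) for i, x in enumerate(logits)]

def run_pipeline(logits, stages):
    for stage in stages:  # stages run strictly in list order
        logits = stage(logits)
    return logits

stages = [temperature_stage(0.5), top_k_stage(3), top_p_stage(0.9),
          penalty_stage({0: 1.0})]
result = run_pipeline([4.0, 3.0, 2.0, 1.0], stages)
```

In this run, temperature doubles the logits, top-k drops the last token, top-p drops the third, and the penalty lowers the first.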
Setting parameters
All parameters work on the HTTP API, in SDK request types, and in interactive-mode slash commands (/temperature, /topk, /topp). Slash-command values persist between requests; per-request API values override them.
For deployment-wide defaults, the CLI TOML config has a [sampling] section. Its values apply to any parameter a request does not specify.
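The fallback rule can be sketched as a dictionary merge. The key names and values below are hypothetical stand-ins for a parsed [sampling] section, and treating None as "not specified" is an assumption.

```python
def resolve_sampling(config_defaults, request_params):
    """Fill in parameters the request left unset from the config defaults."""
    resolved = dict(config_defaults)
    # Only explicitly set request values override the defaults.
    resolved.update({k: v for k, v in request_params.items() if v is not None})
    return resolved

# Hypothetical contents of a [sampling] section, already parsed into a dict.
config_defaults = {"temperature": 0.7, "top_p": 0.9, "top_k": 40}
# The request sets temperature and leaves the rest unspecified.
request_params = {"temperature": 0.2, "top_k": None}

resolved = resolve_sampling(config_defaults, request_params)
```

The request’s temperature wins, while top_p and top_k fall back to the configured defaults.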
The seed field in the request controls randomness. Identical seeds with identical prompts and sampling parameters produce identical output.
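The reproducibility guarantee follows from seeding the random number generator. The sketch below uses Python’s standard random module as a stand-in for the engine’s sampler; sample_tokens and the toy vocabulary are hypothetical.

```python
import random

def sample_tokens(vocab, weights, n, seed):
    """Draw n tokens from a weighted vocabulary using a seeded generator."""
    rng = random.Random(seed)  # private generator; global state is untouched
    return rng.choices(vocab, weights=weights, k=n)

vocab = ["the", "cat", "sat", "mat"]
weights = [0.4, 0.3, 0.2, 0.1]

first = sample_tokens(vocab, weights, 5, seed=42)
second = sample_tokens(vocab, weights, 5, seed=42)  # same seed, same draws
```

Both calls return the same sequence because the generator starts from the same state each time.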