Generation parameters for a single inference call.
#include <entropic/types/config.h>
| float | temperature = 0.7f | Sampling temperature. |
| float | top_p = 0.9f | Nucleus sampling threshold. |
| int | top_k = 40 | Top-K sampling. |
| float | repeat_penalty = 1.1f | Repetition penalty. |
| int | max_tokens = 4096 | Maximum tokens to generate. |
| int | seed = -1 | RNG seed for reproducible sampling. |
| int | reasoning_budget = -1 | Per-call think budget override (-1 = unlimited) |
| bool | enable_thinking = true | Enable <think> blocks (false if reasoning_budget == 0) |
| std::string | grammar | GBNF grammar string (empty = unconstrained) |
| std::string | grammar_key | Grammar registry key. |
| std::vector< std::string > | stop | Stop sequences. |
| int | logprobs = 0 | Top log-probs per token (0 = disabled) |
| int | time_limit_ms = 0 | Wall-clock time cap in milliseconds. |
| std::string | profile | GPU resource profile name. |
| bool | auto_adapt = true | Enable throughput-based max_tokens auto-adaptation. |
| float | adapt_headroom = 0.9f | Target time usage fraction for auto-adaptation. |
Generation parameters for a single inference call.
- Version
- 2.0.6-rc16 — added seed
Definition at line 227 of file config.h.
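As a rough illustration of how the fields fit together, the sketch below fills in parameters for a short, repeatable completion. To stay compilable on its own it uses a stand-in struct in a hypothetical `sketch` namespace that mirrors the documented fields and defaults. Real code would include `<entropic/types/config.h>` and use `entropic::GenerationParams` directly. The helper `make_repeatable_params` and the values it picks are assumptions for this example only.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Stand-in mirroring the documented fields and defaults of
// entropic::GenerationParams, so this example builds without the library.
struct GenerationParams {
    float temperature = 0.7f;
    float top_p = 0.9f;
    int top_k = 40;
    float repeat_penalty = 1.1f;
    int max_tokens = 4096;
    int seed = -1;
    int reasoning_budget = -1;
    bool enable_thinking = true;
    std::string grammar;
    std::string grammar_key;
    std::vector<std::string> stop;
    int logprobs = 0;
    int time_limit_ms = 0;
    std::string profile;
    bool auto_adapt = true;
    float adapt_headroom = 0.9f;
};

// Hypothetical helper: parameters for a short, repeatable completion.
inline GenerationParams make_repeatable_params() {
    GenerationParams p;      // start from the documented defaults
    p.seed = 42;             // fixed, non-negative seed for reproducible sampling
    p.max_tokens = 256;      // keep the reply short
    p.time_limit_ms = 5000;  // wall-clock cap of 5 seconds
    p.stop = {"\n\n"};       // stop at the first blank line
    return p;
}

}  // namespace sketch
```

Fields that are not assigned keep their defaults, so `auto_adapt` stays enabled and `profile` stays empty here.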
◆ adapt_headroom
| float entropic::GenerationParams::adapt_headroom = 0.9f |
Target time usage fraction for auto-adaptation.
0.9 means "use at most 90% of time_limit_ms for generation".
- Version
- 1.9.7
Definition at line 272 of file config.h.
◆ auto_adapt
| bool entropic::GenerationParams::auto_adapt = true |
Enable throughput-based max_tokens auto-adaptation.
When true, the orchestrator may reduce max_tokens to fit within time_limit_ms based on recent throughput measurements. Ignored if time_limit_ms == 0.
- Version
- 1.9.7
Definition at line 267 of file config.h.
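The orchestrator's actual adaptation logic is not shown on this page. The sketch below is one plausible reading of the description, assuming the cap is derived from a measured throughput in tokens per second and the `adapt_headroom` fraction of `time_limit_ms`. The function name `adapted_max_tokens` and the one-token floor are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of the documented API: estimate a
// max_tokens cap from recent throughput (tokens per second).
inline int adapted_max_tokens(int max_tokens, int time_limit_ms,
                              bool auto_adapt, float adapt_headroom,
                              double tokens_per_second) {
    // Adaptation is skipped when disabled or when no time limit is set.
    if (!auto_adapt || time_limit_ms == 0) {
        return max_tokens;
    }
    assert(tokens_per_second > 0.0);
    // Share of the wall-clock limit reserved for generation, in seconds.
    const double budget_s = (time_limit_ms / 1000.0) * adapt_headroom;
    // Number of tokens expected to fit in that budget.
    const int fit = static_cast<int>(budget_s * tokens_per_second);
    // Only ever reduce max_tokens; the floor of one token is an assumption.
    return std::max(1, std::min(max_tokens, fit));
}
```

With a 10 000 ms limit, a headroom of 0.5, and 50 tokens per second, the budget is 5 seconds, so a `max_tokens` of 4096 would be reduced to 250 under these assumptions.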
◆ enable_thinking
| bool entropic::GenerationParams::enable_thinking = true |
Enable <think> blocks (false if reasoning_budget == 0)
Definition at line 239 of file config.h.
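One way to read the note about reasoning_budget is that a budget of 0 disables thinking regardless of this flag. The sketch below encodes that reading. The helper name `thinking_enabled` is hypothetical, and the library may apply the rule differently.

```cpp
#include <cassert>

// Hypothetical helper: effective thinking state under the assumption that
// reasoning_budget == 0 forces thinking off, while -1 means unlimited.
inline bool thinking_enabled(bool enable_thinking, int reasoning_budget) {
    return enable_thinking && reasoning_budget != 0;
}
```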
◆ grammar
| std::string entropic::GenerationParams::grammar |
GBNF grammar string (empty = unconstrained)
Definition at line 240 of file config.h.
◆ grammar_key
| std::string entropic::GenerationParams::grammar_key |
Grammar registry key.
Resolved to GBNF content by the orchestrator before passing to the backend. If both grammar and grammar_key are set, grammar (raw string) takes precedence.
- Version
- 1.9.3
Definition at line 245 of file config.h.
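The precedence rule between grammar and grammar_key can be sketched as follows. The registry type, the function name `resolve_grammar`, and the fallback to an unconstrained (empty) grammar for an unknown key are all assumptions. The real orchestrator may report an error in that case instead.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry type: key -> GBNF text.
using GrammarRegistry = std::map<std::string, std::string>;

// Hypothetical helper: pick the GBNF text to hand to the backend.
inline std::string resolve_grammar(const std::string& grammar,
                                   const std::string& grammar_key,
                                   const GrammarRegistry& registry) {
    // A raw grammar string takes precedence over the registry key.
    if (!grammar.empty()) {
        return grammar;
    }
    // Neither field set: empty result means unconstrained generation.
    if (grammar_key.empty()) {
        return std::string{};
    }
    // Assumption: an unknown key falls back to unconstrained generation.
    const auto it = registry.find(grammar_key);
    return it != registry.end() ? it->second : std::string{};
}
```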
◆ logprobs
| int entropic::GenerationParams::logprobs = 0 |
Top log-probs per token (0 = disabled)
Definition at line 247 of file config.h.
◆ max_tokens
| int entropic::GenerationParams::max_tokens = 4096 |
Maximum tokens to generate.
Definition at line 232 of file config.h.
◆ profile
| std::string entropic::GenerationParams::profile |
GPU resource profile name.
Resolved to GPUResourceProfile by the orchestrator before passing to the backend. Empty string means use the "balanced" profile.
- Version
- 1.9.7
Definition at line 260 of file config.h.
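The empty-string default can be expressed as a small name-resolution step that runs before the profile lookup. This is a minimal sketch. The helper `effective_profile_name` is hypothetical, and the lookup from name to GPUResourceProfile is not shown because its API is not documented here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: an empty profile name selects "balanced".
inline std::string effective_profile_name(const std::string& profile) {
    return profile.empty() ? std::string("balanced") : profile;
}
```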
◆ reasoning_budget
| int entropic::GenerationParams::reasoning_budget = -1 |
Per-call think budget override (-1 = unlimited)
Definition at line 238 of file config.h.
◆ repeat_penalty
| float entropic::GenerationParams::repeat_penalty = 1.1f |
Repetition penalty.
Definition at line 231 of file config.h.
◆ seed
| int entropic::GenerationParams::seed = -1 |
RNG seed for reproducible sampling.
-1 = random (default). Any negative value maps to LLAMA_DEFAULT_SEED. (P2-14)
- Version
- 2.0.6-rc16
Definition at line 237 of file config.h.
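The mapping from the signed field to an unsigned backend seed might look like the sketch below. `kDefaultSeed` is a local stand-in for `LLAMA_DEFAULT_SEED`. Its value of 0xFFFFFFFF matches the definition in llama.cpp's `llama.h` as far as known, but treat it as an assumption and prefer the macro itself in real code. The helper name `resolve_seed` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for LLAMA_DEFAULT_SEED (assumed to be 0xFFFFFFFF, meaning
// "pick a random seed"). Use the real macro when llama.h is available.
constexpr std::uint32_t kDefaultSeed = 0xFFFFFFFFu;

// Hypothetical helper: negative seeds request a random seed, while
// non-negative seeds are passed through unchanged.
inline std::uint32_t resolve_seed(int seed) {
    return seed < 0 ? kDefaultSeed : static_cast<std::uint32_t>(seed);
}
```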
◆ stop
| std::vector<std::string> entropic::GenerationParams::stop |
Stop sequences.
Definition at line 246 of file config.h.
◆ temperature
| float entropic::GenerationParams::temperature = 0.7f |
Sampling temperature.
Definition at line 228 of file config.h.
◆ time_limit_ms
| int entropic::GenerationParams::time_limit_ms = 0 |
Wall-clock time cap in milliseconds.
Generation is cancelled if this limit is reached. 0 = no time limit (default).
- Version
- 1.9.7
Definition at line 254 of file config.h.
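A cancellation check consistent with this description could compare elapsed wall-clock time against the limit once per generated token. The sketch below assumes a monotonic clock and a hypothetical helper named `time_limit_reached`. How the library actually polls or cancels is not documented here.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: true once the elapsed time since `start` has
// reached time_limit_ms. A limit of 0 means no time limit.
inline bool time_limit_reached(std::chrono::steady_clock::time_point start,
                               std::chrono::steady_clock::time_point now,
                               int time_limit_ms) {
    if (time_limit_ms == 0) {
        return false;
    }
    return (now - start) >= std::chrono::milliseconds(time_limit_ms);
}
```

A generation loop would typically record `std::chrono::steady_clock::now()` before the first token and call this check with the current time after each token.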
◆ top_k
| int entropic::GenerationParams::top_k = 40 |
Top-K sampling.
Definition at line 230 of file config.h.
◆ top_p
| float entropic::GenerationParams::top_p = 0.9f |
Nucleus sampling threshold.
Definition at line 229 of file config.h.
The documentation for this struct was generated from the following file: