Model configuration for a single tier.

#include <entropic/types/config.h>

Public Attributes
std::filesystem::path	path
	Resolved model file path.

std::string	adapter = "qwen35"
	Chat adapter name.

int	context_length = 16384
	Context window size (512–131072)

int	gpu_layers = -1
	GPU offload layers (-1 = all)

bool	keep_warm = false
	Pre-warm model at startup.

bool	use_mlock = true
	Lock model in system RAM.

int	reasoning_budget = -1
	Think token budget (-1 = unlimited)

std::string	cache_type_k = "f16"
	KV cache key quantization type.

std::string	cache_type_v = "f16"
	KV cache value quantization type.

int	n_batch = 512
	Batch size for prompt processing.

int	n_ubatch = 0
	Physical micro-batch size for prompt processing (gh#23 MVP item 5).

int	n_threads = 0
	CPU threads (0 = auto-detect)

std::string	tensor_split
	Multi-GPU tensor split ratios (empty = single GPU)

std::string	split_mode
	Multi-GPU split mode for model load (gh#23 MVP item 6).

int	main_gpu = 0
	Primary GPU index for model load (gh#23 MVP item 7).

bool	offload_kqv = true
	Offload KQV ops (incl. KV cache) to the GPU (gh#23 MVP item 8).

float	rope_freq_base = 0.0f
	RoPE base frequency override (gh#23 MVP item 9).

float	rope_freq_scale = 0.0f
	RoPE frequency scaling factor (gh#23 MVP item 10).

int	n_parallel = 1
	Max parallel sequences per context (gh#23 MVP item 11).

bool	flash_attn = true
	Enable flash attention.

std::optional< std::vector< std::string > >	allowed_tools
	Tool whitelist (nullopt = all)

std::filesystem::path	mmproj_path
	Vision projector GGUF path.

std::string	model_format = "gguf"
	Expected model format.

Detailed Description

Model configuration for a single tier.

Contains all parameters needed to load and configure a model, including llama.cpp pass-through fields for KV cache, batching, threading, and attention.

Version: 1.8.0

Definition at line 148 of file config.h.

Member Data Documentation

◆ adapter

std::string entropic::ModelConfig::adapter = "qwen35"

Chat adapter name.

Definition at line 150 of file config.h.

◆ allowed_tools

std::optional<std::vector<std::string> > entropic::ModelConfig::allowed_tools

Tool whitelist (nullopt = all)

Definition at line 236 of file config.h.
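A minimal sketch of how the whitelist semantics could be applied when dispatching a tool call. tool_allowed is a hypothetical helper, not part of entropic. Treating an engaged but empty vector as "allow nothing" is an assumption that the brief does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper mirroring the documented allowed_tools semantics.
// std::nullopt means there is no whitelist, so every tool is allowed.
// An engaged vector (even an empty one) is treated as an exact whitelist;
// the empty-vector behavior is an assumption.
bool tool_allowed(const std::optional<std::vector<std::string>>& allowed_tools,
                  const std::string& tool_name) {
    if (!allowed_tools.has_value()) {
        return true;
    }
    const auto& list = *allowed_tools;
    return std::find(list.begin(), list.end(), tool_name) != list.end();
}
```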

◆ cache_type_k

std::string entropic::ModelConfig::cache_type_k = "f16"

KV cache key quantization type.

Definition at line 158 of file config.h.

◆ cache_type_v

std::string entropic::ModelConfig::cache_type_v = "f16"

KV cache value quantization type.

Definition at line 159 of file config.h.

◆ context_length

int entropic::ModelConfig::context_length = 16384

Context window size (512–131072)

Definition at line 151 of file config.h.
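The brief gives an inclusive range of 512 to 131072 tokens. The sketch below checks a value against it. context_length_in_range and the constant names are hypothetical, and whether entropic rejects or clamps out-of-range values is not documented here; this sketch simply rejects them.

```cpp
#include <cassert>

// Documented bounds for ModelConfig::context_length (inclusive).
constexpr int kMinContextLength = 512;
constexpr int kMaxContextLength = 131072;

// Hypothetical validator: true when the requested context window falls
// inside the documented range.
bool context_length_in_range(int context_length) {
    return context_length >= kMinContextLength &&
           context_length <= kMaxContextLength;
}
```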

◆ flash_attn

bool entropic::ModelConfig::flash_attn = true

Enable flash attention.

Definition at line 233 of file config.h.

◆ gpu_layers

int entropic::ModelConfig::gpu_layers = -1

GPU offload layers (-1 = all)

Definition at line 152 of file config.h.
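A sketch of resolving gpu_layers into a concrete layer count, assuming the caller knows the model's total number of layers. effective_gpu_layers is hypothetical. Capping oversized requests at the model's layer count and treating every negative value like -1 are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: turns ModelConfig::gpu_layers into the number of
// layers actually offloaded for a model with `model_layers` layers.
// -1 (the documented default) means "offload everything"; this sketch
// treats any negative value that way, which is an assumption.
int effective_gpu_layers(int gpu_layers, int model_layers) {
    assert(model_layers >= 0);
    if (gpu_layers < 0) {
        return model_layers;
    }
    return std::min(gpu_layers, model_layers);
}
```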

◆ keep_warm

bool entropic::ModelConfig::keep_warm = false

Pre-warm model at startup.

Definition at line 153 of file config.h.

◆ main_gpu

int entropic::ModelConfig::main_gpu = 0

Primary GPU index for model load (gh#23 MVP item 7).

llama.cpp's mparams.main_gpu. Effective when split_mode == "none" (single-GPU pinning) or "row" (small tensors go to this GPU). Ignored when split_mode == "layer". 0 (default) preserves pre-v2.3.19 behavior bit-for-bit.

Version: 2.3.19

Definition at line 194 of file config.h.
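The rule above reduces to a check on split_mode. main_gpu_is_effective is a hypothetical helper. It treats the empty default and unrecognized strings as layer mode, following the split_mode documentation (both fall back to llama.cpp's LAYER default).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports whether ModelConfig::main_gpu influences
// model load. Only "none" (single-GPU pinning) and "row" use it; "layer",
// the empty default, and unrecognized values all end up in layer mode,
// where main_gpu is ignored.
bool main_gpu_is_effective(const std::string& split_mode) {
    return split_mode == "none" || split_mode == "row";
}
```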

◆ mmproj_path

std::filesystem::path entropic::ModelConfig::mmproj_path

Vision projector GGUF path.

When non-empty, the backend loads an mtmd_context alongside the base model for multimodal inference. Empty (default) = text-only model.

Version: 1.9.11

Definition at line 244 of file config.h.

◆ model_format

std::string entropic::ModelConfig::model_format = "gguf"

Expected model format.

"gguf" (default), "axmodel", "onnx", or empty (auto-detect). The backend validates that the file matches the expected format during load(). Mismatch returns ENTROPIC_ERROR_LOAD_FAILED with a diagnostic message identifying the actual format.

Version: 1.9.13

Definition at line 254 of file config.h.
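A sketch of the format check for the GGUF case only. GGUF files begin with the four ASCII bytes "GGUF". sniff_model_format and model_format_matches are hypothetical. The sketch does not recognize axmodel or ONNX files, and it reports a mismatch as false rather than reproducing entropic's ENTROPIC_ERROR_LOAD_FAILED path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sniffing helper. Only GGUF is recognized: GGUF files start
// with the ASCII magic "GGUF". Anything else reports "unknown" here.
std::string sniff_model_format(const std::vector<unsigned char>& header) {
    static const unsigned char kGgufMagic[4] = {'G', 'G', 'U', 'F'};
    if (header.size() >= 4) {
        bool is_gguf = true;
        for (std::size_t i = 0; i < 4; ++i) {
            if (header[i] != kGgufMagic[i]) {
                is_gguf = false;
                break;
            }
        }
        if (is_gguf) return "gguf";
    }
    return "unknown";
}

// Mirrors the documented check: an empty expected format means auto-detect,
// so this sketch accepts the file; otherwise the sniffed format must match.
bool model_format_matches(const std::string& expected,
                          const std::vector<unsigned char>& header) {
    if (expected.empty()) return true;
    return sniff_model_format(header) == expected;
}
```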

◆ n_batch

int entropic::ModelConfig::n_batch = 512

Batch size for prompt processing.

Definition at line 160 of file config.h.

◆ n_parallel

int entropic::ModelConfig::n_parallel = 1

Max parallel sequences per context (gh#23 MVP item 11).

llama.cpp's cparams.n_seq_max. 1 (default) matches llama.cpp's default — single-sequence context, bit-identical pre-v2.3.23 behavior. Raising this enables KV-cache slot reuse across multiple concurrent generations (e.g. speculative rejection batches, batched-server scenarios). Effective max is LLAMA_MAX_SEQ; consult llama.cpp for the current ceiling.

Version: 2.3.23

Definition at line 232 of file config.h.

◆ n_threads

int entropic::ModelConfig::n_threads = 0

CPU threads (0 = auto-detect)

Definition at line 174 of file config.h.
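One plausible way to resolve the auto-detect case, assuming the standard library's logical CPU count is an acceptable stand-in for entropic's actual heuristic, which is not documented here. resolve_n_threads is hypothetical.

```cpp
#include <cassert>
#include <thread>

// Hypothetical resolver for ModelConfig::n_threads. Positive values are used
// as-is; 0 (and, in this sketch, any negative value) requests auto-detection
// via std::thread::hardware_concurrency(), which counts logical CPUs and may
// return 0 when the count is unknown, in which case one thread is used.
int resolve_n_threads(int n_threads) {
    if (n_threads > 0) {
        return n_threads;
    }
    const unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : 1;
}
```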

◆ n_ubatch

int entropic::ModelConfig::n_ubatch = 0

Physical micro-batch size for prompt processing (gh#23 MVP item 5).

llama.cpp's cparams.n_ubatch. Decoupled from n_batch since llama.cpp v0.4: n_batch is the LOGICAL batch (max tokens queued per llama_decode call) and n_ubatch is the PHYSICAL chunk the kernels actually process. A smaller n_ubatch reduces peak GPU memory for the same n_batch. 0 (default) means "match n_batch" and preserves pre-v2.3.17 behavior bit-for-bit, since llama.cpp's default in that case is min(n_batch, default). Typical productive values: 128, 256, 512 (== n_batch).

Version: 2.3.17

Definition at line 172 of file config.h.
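A sketch of how n_batch and n_ubatch combine. It applies the documented "0 means match n_batch" rule plus a cap at n_batch, matching the min() clamp llama.cpp applies when it builds a context. resolve_n_ubatch and ubatch_passes are hypothetical helpers; the second counts how many physical passes one logical batch needs.

```cpp
#include <algorithm>
#include <cassert>

// Resolve the physical micro-batch from ModelConfig::n_batch / n_ubatch.
// 0 means "match n_batch"; an explicit value is capped at n_batch because
// a physical chunk cannot exceed the logical batch it is cut from.
int resolve_n_ubatch(int n_batch, int n_ubatch) {
    assert(n_batch > 0 && n_ubatch >= 0);
    return std::min(n_batch, n_ubatch == 0 ? n_batch : n_ubatch);
}

// Number of physical kernel passes for one logical batch of n_tokens
// tokens (ceiling division).
int ubatch_passes(int n_tokens, int n_ubatch) {
    assert(n_tokens >= 0 && n_ubatch > 0);
    return (n_tokens + n_ubatch - 1) / n_ubatch;
}
```

With the defaults (n_batch = 512, n_ubatch = 0) a full batch runs in one pass; setting n_ubatch = 128 splits the same 512 tokens into four smaller passes, which is where the peak-memory saving comes from.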

◆ offload_kqv

bool entropic::ModelConfig::offload_kqv = true

Offload KQV ops (incl. KV cache) to the GPU (gh#23 MVP item 8).

llama.cpp's cparams.offload_kqv. true (default) matches llama.cpp's default: KQV runs on the GPU for maximum throughput. Set false to keep KQV on the CPU side; this saves VRAM at a throughput cost and is useful for tight-VRAM single-GPU setups.

Version: 2.3.20

Definition at line 202 of file config.h.
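Turning offload_kqv off moves the KV cache out of VRAM, so a rough size estimate helps decide whether the throughput cost is worth it. The sketch below assumes the usual layout (one K and one V vector of n_kv_heads × head_dim elements per layer per position) and ggml's block sizes for a few values of cache_type_k / cache_type_v. kv_cache_bytes and bytes_per_32_elements are hypothetical, the model dimensions are caller-supplied, and allocator padding and compute buffers are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Bytes used by one block of 32 elements for a few ggml cache types.
// q8_0 stores 32 int8 values plus one fp16 scale (34 bytes);
// q4_0 stores 32 4-bit values plus one fp16 scale (18 bytes).
std::uint64_t bytes_per_32_elements(const std::string& cache_type) {
    if (cache_type == "f32") return 128;
    if (cache_type == "f16" || cache_type == "bf16") return 64;
    if (cache_type == "q8_0") return 34;
    if (cache_type == "q4_0") return 18;
    throw std::invalid_argument("unsupported cache type: " + cache_type);
}

// Rough KV cache size in bytes for the given model dimensions and context.
std::uint64_t kv_cache_bytes(std::uint64_t n_layers, std::uint64_t context_length,
                             std::uint64_t n_kv_heads, std::uint64_t head_dim,
                             const std::string& cache_type_k,
                             const std::string& cache_type_v) {
    const std::uint64_t elems = n_layers * context_length * n_kv_heads * head_dim;
    assert(elems % 32 == 0);  // block-quantized types need whole blocks
    return elems / 32 * (bytes_per_32_elements(cache_type_k) +
                         bytes_per_32_elements(cache_type_v));
}
```

For example, a model with 32 layers, 8 KV heads and head size 128 at the default context_length of 16384 with f16 caches needs 2 GiB for K and V together; switching both caches to q8_0 brings that down to about 1.06 GiB.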

◆ path

std::filesystem::path entropic::ModelConfig::path

Resolved model file path.

Definition at line 149 of file config.h.

◆ reasoning_budget

int entropic::ModelConfig::reasoning_budget = -1

Think token budget (-1 = unlimited)

Definition at line 157 of file config.h.
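A sketch of enforcing the budget during generation. may_emit_think_token is hypothetical, and treating a budget of 0 as "no thinking at all" is an assumption.

```cpp
#include <cassert>

// Hypothetical check against ModelConfig::reasoning_budget.
// Negative values (the default is -1) mean no limit; any non-negative value
// caps the number of think tokens. A budget of 0 therefore allows none.
bool may_emit_think_token(int reasoning_budget, int think_tokens_used) {
    assert(think_tokens_used >= 0);
    if (reasoning_budget < 0) {
        return true;
    }
    return think_tokens_used < reasoning_budget;
}
```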

◆ rope_freq_base

float entropic::ModelConfig::rope_freq_base = 0.0f

RoPE base frequency override (gh#23 MVP item 9).

llama.cpp's cparams.rope_freq_base. 0.0 (default) takes the model's trained value — preserves pre-v2.3.21 behavior bit-for-bit. Positive overrides typically range 10000–10000000; raising it stretches the RoPE period (extends effective context at a quality cost). Pair with rope_freq_scale for YaRN-style context-extension setups.

Version: 2.3.21

Definition at line 212 of file config.h.
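To illustrate why raising the base stretches the RoPE period, the sketch below uses the standard RoPE formulation, in which rotary pair i of a head of size d advances by base^(-2i/d) radians per position. resolve_rope_freq_base and rope_wavelength are hypothetical; the trained base comes from the model's metadata and is passed in by the caller.

```cpp
#include <cassert>
#include <cmath>

// 0.0 keeps the model's trained base (supplied by the caller, since it is
// stored in the model file rather than in ModelConfig).
double resolve_rope_freq_base(double override_base, double trained_base) {
    return override_base > 0.0 ? override_base : trained_base;
}

// Wavelength, in token positions, of rotary pair i for head size head_dim:
// the pair rotates base^(-2i/d) radians per position, so one full turn
// takes 2*pi*base^(2i/d) positions.
double rope_wavelength(double base, int i, int head_dim) {
    assert(head_dim > 0 && i >= 0 && 2 * i < head_dim);
    const double pi = std::acos(-1.0);
    return 2.0 * pi * std::pow(base, 2.0 * i / head_dim);
}
```

The lowest pair (i = 0) always has a wavelength of 2π positions; raising the base only lengthens the slower pairs, which is what extends the positional range.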

◆ rope_freq_scale

float entropic::ModelConfig::rope_freq_scale = 0.0f

RoPE frequency scaling factor (gh#23 MVP item 10).

llama.cpp's cparams.rope_freq_scale. 0.0 (default) takes the model's trained value and preserves pre-v2.3.22 behavior bit-for-bit. Values in (0, 1) extend the effective context (positions are compressed, so more tokens fit in the trained RoPE range); values > 1 shrink it. The context grows by roughly 1/rope_freq_scale, so YaRN-style extension typically uses values like 0.5 (2× context). Pairs with rope_freq_base.

Version: 2.3.22

Definition at line 222 of file config.h.
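A sketch of the linear position scaling behind rope_freq_scale: each position is multiplied by the scale, so the trained window spans about trained_ctx / scale tokens. Both helpers are hypothetical, and YaRN's per-dimension corrections are not modeled.

```cpp
#include <cassert>

// 0.0 keeps the model's trained scale (usually 1.0, read from the model).
double resolve_rope_freq_scale(double override_scale, double trained_scale) {
    return override_scale > 0.0 ? override_scale : trained_scale;
}

// Linear RoPE scaling multiplies each position by `scale`, so the trained
// window covers trained_ctx / scale real positions.
double effective_rope_context(double trained_ctx, double scale) {
    assert(scale > 0.0);
    return trained_ctx / scale;
}
```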

◆ split_mode

std::string entropic::ModelConfig::split_mode

Multi-GPU split mode for model load (gh#23 MVP item 6).

Maps to llama.cpp's enum llama_split_mode. Accepted values:

"" (default): keep llama.cpp's default (LAYER).
"none": single GPU, no split. Use with main_gpu.
"layer": split layers + KV across GPUs (llama.cpp default).
"row": split layers + KV with tensor parallelism.

Unrecognized values fall back to the default with a logged warning. Empty (default) preserves pre-v2.3.18 model load bit-for-bit.

Version: 2.3.18

Definition at line 186 of file config.h.
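A sketch of the string-to-mode mapping described above. SplitMode is a local stand-in for llama.cpp's llama_split_mode (so the example compiles without llama.h), parse_split_mode is hypothetical, and the case-sensitive matching and exact warning text are assumptions.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Local stand-in for llama.cpp's llama_split_mode (NONE, LAYER, ROW).
enum class SplitMode { None, Layer, Row };

// Hypothetical parser following the documented mapping. "" keeps the
// default (Layer); unrecognized strings also fall back to Layer, with a
// warning written to stderr. Matching is case-sensitive in this sketch.
SplitMode parse_split_mode(const std::string& value) {
    if (value.empty() || value == "layer") return SplitMode::Layer;
    if (value == "none") return SplitMode::None;
    if (value == "row") return SplitMode::Row;
    std::cerr << "warning: unrecognized split_mode \"" << value
              << "\", using default (layer)\n";
    return SplitMode::Layer;
}
```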

◆ tensor_split

std::string entropic::ModelConfig::tensor_split

Multi-GPU tensor split ratios (empty = single GPU)

Definition at line 175 of file config.h.
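The format of tensor_split is not documented here. Assuming it follows llama.cpp's --tensor-split flag (comma-separated proportions such as "3,1"), a parser could normalize it into per-GPU fractions as below. parse_tensor_split is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser: assumes a comma-separated list of non-negative
// proportions (e.g. "3,1") and returns each GPU's fraction of the model.
// Empty input returns an empty vector, i.e. a single-GPU load.
std::vector<double> parse_tensor_split(const std::string& spec) {
    std::vector<double> parts;
    if (spec.empty()) return parts;
    std::stringstream ss(spec);
    std::string item;
    double total = 0.0;
    while (std::getline(ss, item, ',')) {
        const double v = std::stod(item);  // throws on non-numeric entries
        if (v < 0.0) throw std::invalid_argument("negative split ratio");
        parts.push_back(v);
        total += v;
    }
    if (total <= 0.0) throw std::invalid_argument("split ratios sum to zero");
    for (double& p : parts) p /= total;
    return parts;
}
```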

◆ use_mlock

bool entropic::ModelConfig::use_mlock = true

Lock model in system RAM.

Definition at line 154 of file config.h.

The documentation for this struct was generated from the following file:

entropic/types/config.h
