Implementation of the target/draft compatibility check.

#include <entropic/inference/speculative_compat.h>
#include <llama.h>
#include <algorithm>
#include <cstring>
#include <optional>
#include <string>

Namespaces
namespace	entropic
	Activate model on GPU (WARM → ACTIVE).

Functions
CompatResult	entropic::speculative::check_compat (const llama_model *target, const llama_model *draft)
	Check whether a draft model can pair with a target for sequential speculative decoding.

Detailed Description

Implementation of the target/draft compatibility check.

Mirrors the file-private common_speculative_are_compatible function from extern/llama.cpp/common/speculative.cpp plus an entropic-side recurrent-architecture gate. See the header for the full rationale.

Version: 2.1.11

Definition in file speculative_compat.cpp.

Function Documentation

◆ check_compat()

CompatResult entropic::speculative::check_compat	(	const llama_model *	target,
		const llama_model *	draft
	)

Check whether a draft model can pair with a target for sequential speculative decoding.

Compatibility orchestrator.

Mirrors the logic of llama.cpp's file-private common_speculative_are_compatible (in common/speculative.cpp) and additionally enforces entropic's architecture gate:

The target model must be neither recurrent (Mamba, RWKV) nor hybrid (Jamba, Granite-Hybrid, Nemotron-H, QWEN35/QWEN35MOE, QWEN3NEXT, KIMI_LINEAR, FALCON_H1, PLAMO2, LFM2/LFM2MOE). Speculative-decoding math assumes a pure-transformer verifier. Upstream does not self-disable for these architectures at the v2.1.11 pin, and hybrid SSM targets produce divergent logits across split-prefill boundaries (Gate A, Session 5).
The vocab type must match between target and draft.
BOS-add behavior and the BOS token id must match.
EOS-add behavior and the EOS token id must match.
The vocab sizes must differ by at most 128 tokens (SPEC_VOCAB_MAX_SIZE_DIFFERENCE in llama.cpp).
Token text must match for every token id in the half-open range [SPEC_VOCAB_CHECK_START_TOKEN_ID = 5, min(n_vocab_tgt, n_vocab_dft)).
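
The rule order above can be illustrated with a minimal, self-contained sketch. It is not the entropic implementation: it does not call llama.cpp at all. `ModelInfo`, `Result`, `check_compat_sketch`, and the diagnostic wording are hypothetical stand-ins for the data the real function reads from the `llama_model` handles. Only the two constants (128 and 5) and the order of the checks are taken from the list above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

namespace rules_sketch {

// Hypothetical result type; the field names follow the "Returns" text.
struct Result {
    bool compatible;
    std::string diagnostic;
};

// Hypothetical plain-data view of a model. The real check queries
// llama_model handles instead of a struct like this one.
struct ModelInfo {
    bool recurrent = false;
    bool hybrid = false;
    int vocab_type = 0;
    bool add_bos = true;
    bool add_eos = false;
    int bos_id = 1;
    int eos_id = 2;
    std::vector<std::string> token_text;  // index == token id
};

constexpr std::size_t kVocabMaxSizeDifference = 128;
constexpr std::size_t kVocabCheckStartTokenId = 5;

// Applies the documented rules in order and reports the first failure.
Result check_compat_sketch(const ModelInfo& tgt, const ModelInfo& dft) {
    if (tgt.recurrent || tgt.hybrid)
        return {false, "target architecture is recurrent or hybrid"};
    if (tgt.vocab_type != dft.vocab_type)
        return {false, "vocab type mismatch"};
    if (tgt.add_bos != dft.add_bos || tgt.bos_id != dft.bos_id)
        return {false, "BOS mismatch"};
    if (tgt.add_eos != dft.add_eos || tgt.eos_id != dft.eos_id)
        return {false, "EOS mismatch"};

    const std::size_t n_tgt = tgt.token_text.size();
    const std::size_t n_dft = dft.token_text.size();
    const std::size_t diff = n_tgt > n_dft ? n_tgt - n_dft : n_dft - n_tgt;
    if (diff > kVocabMaxSizeDifference)
        return {false, "vocab size difference exceeds 128"};

    // Token ids below the start id are not compared.
    const std::size_t n_common = std::min(n_tgt, n_dft);
    for (std::size_t i = kVocabCheckStartTokenId; i < n_common; ++i) {
        if (tgt.token_text[i] != dft.token_text[i])
            return {false, "token text mismatch at id " + std::to_string(i)};
    }
    return {true, ""};
}

}  // namespace rules_sketch
```

Returning at the first failed rule keeps the diagnostic specific, which matches the "first failed rule" behavior described for the return value.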

Parameters

target	Target (verifier) llama_model handle. Must be non-null.
draft	Draft (proposer) llama_model handle. Must be non-null.

Returns: CompatResult. On success, compatible=true and the diagnostic is empty. On failure, compatible=false and the diagnostic string identifies the first rule that failed. @utility

Version: 2.1.11

Definition at line 271 of file speculative_compat.cpp.