Entropic 2.3.8
Local-first agentic inference engine
speculative_compat.cpp File Reference

Implementation of the target/draft compatibility check.

#include <entropic/inference/speculative_compat.h>
#include <llama.h>
#include <algorithm>
#include <cstring>
#include <optional>
#include <string>


Namespaces

namespace  entropic
 Activate model on GPU (WARM → ACTIVE).
 

Functions

CompatResult entropic::speculative::check_compat (const llama_model *target, const llama_model *draft)
 Check whether a draft model can pair with a target for sequential speculative decoding.
 

Detailed Description

Implementation of the target/draft compatibility check.

Mirrors the file-private common_speculative_are_compatible function from extern/llama.cpp/common/speculative.cpp, plus an entropic-side gate that rejects recurrent and hybrid target architectures. See speculative_compat.h for the full rationale.

Version
2.1.11

Definition in file speculative_compat.cpp.

Function Documentation

check_compat()

CompatResult entropic::speculative::check_compat(const llama_model *target, const llama_model *draft)

Check whether a draft model can pair with a target for sequential speculative decoding.

Compatibility orchestrator.

Mirrors the logic of llama.cpp's file-private common_speculative_are_compatible (in common/speculative.cpp) and additionally enforces entropic's architecture gate:

  1. The target model must be neither recurrent (Mamba, RWKV) nor hybrid (Jamba, Granite-Hybrid, Nemotron-H, QWEN35/QWEN35MOE, QWEN3NEXT, KIMI_LINEAR, FALCON_H1, PLAMO2, LFM2/LFM2MOE). The speculative-decoding math assumes a pure-transformer verifier. At the v2.1.11 pin, upstream does not self-disable for these architectures, and hybrid SSM targets produce divergent logits across split-prefill boundaries (Gate A, Session 5).
  2. Vocab type must match between target and draft.
  3. BOS-add behavior and BOS token id must match.
  4. EOS-add behavior and EOS token id must match.
  5. Vocab size difference must be ≤ 128 tokens (SPEC_VOCAB_MAX_SIZE_DIFFERENCE in llama.cpp).
  6. Token text must match for tokens [SPEC_VOCAB_CHECK_START_TOKEN_ID=5, min(n_vocab_tgt, n_vocab_dft)).
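The six rules above form a first-failure chain. The following is a minimal sketch of that ordering, not the entropic implementation: `ModelInfo`, `check_compat_sketch`, `make_failure`, and the diagnostic strings are hypothetical, and the real function reads these properties from the llama_model/llama_vocab handles rather than from a plain struct. The two constants use the documented values (128 and 5), and rules 3 and 4 compare the add flags and token ids literally as worded above.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-data stand-in for what the real check reads from the
// llama_model / llama_vocab handles. Field names are illustrative only.
struct ModelInfo {
    bool is_recurrent = false;            // e.g. Mamba, RWKV
    bool is_hybrid = false;               // e.g. Jamba, Nemotron-H, FALCON_H1
    int vocab_type = 0;                   // tokenizer family
    bool add_bos = false;
    std::int32_t bos_id = -1;
    bool add_eos = false;
    std::int32_t eos_id = -1;
    std::vector<std::string> token_text;  // index == token id
};

// Assumed shape of CompatResult: a flag plus the first failed rule's diagnostic.
struct CompatResult {
    bool compatible = true;
    std::string diagnostic;
};

constexpr int kMaxVocabSizeDifference = 128;  // SPEC_VOCAB_MAX_SIZE_DIFFERENCE
constexpr int kCheckStartTokenId = 5;         // SPEC_VOCAB_CHECK_START_TOKEN_ID

CompatResult make_failure(std::string why) {
    return CompatResult{false, std::move(why)};
}

// Applies rules 1-6 in order and stops at the first failure.
CompatResult check_compat_sketch(const ModelInfo& tgt, const ModelInfo& dft) {
    // Rule 1: only the target (verifier) is gated on architecture.
    if (tgt.is_recurrent || tgt.is_hybrid) {
        return make_failure("target is recurrent or hybrid");
    }
    // Rule 2: same tokenizer family.
    if (tgt.vocab_type != dft.vocab_type) {
        return make_failure("vocab type mismatch");
    }
    // Rules 3 and 4: BOS/EOS add flags and token ids must agree.
    if (tgt.add_bos != dft.add_bos || tgt.bos_id != dft.bos_id) {
        return make_failure("BOS mismatch");
    }
    if (tgt.add_eos != dft.add_eos || tgt.eos_id != dft.eos_id) {
        return make_failure("EOS mismatch");
    }
    // Rule 5: vocab sizes may differ by at most 128 tokens.
    const int n_tgt = static_cast<int>(tgt.token_text.size());
    const int n_dft = static_cast<int>(dft.token_text.size());
    if (std::abs(n_tgt - n_dft) > kMaxVocabSizeDifference) {
        return make_failure("vocab size difference exceeds 128");
    }
    // Rule 6: token text must agree on the half-open range [5, min(n_tgt, n_dft)).
    const int end = std::min(n_tgt, n_dft);
    for (int id = kCheckStartTokenId; id < end; ++id) {
        if (tgt.token_text[id] != dft.token_text[id]) {
            return make_failure("token text mismatch at id " + std::to_string(id));
        }
    }
    return CompatResult{};  // compatible=true, empty diagnostic
}
```

Because rule 1 runs first, a hybrid target is rejected before any vocab comparison, and tokens 0 to 4 (typically special tokens) never take part in the text comparison.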
Parameters
target  Target (verifier) llama_model handle. Must be non-null.
draft  Draft (proposer) llama_model handle. Must be non-null.
Returns
CompatResult: compatible=true with an empty diagnostic on success; compatible=false with a specific diagnostic string identifying the first failed rule on failure.
Version
2.1.11
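A caller typically branches on the returned CompatResult. This sketch is hypothetical: the struct mirrors the member names implied by the Returns wording (`compatible`, `diagnostic`), and `describe_decode_plan` with its target-only fallback is an illustrative caller policy, not something this file defines.

```cpp
#include <string>

// Hypothetical mirror of CompatResult; the member names `compatible` and
// `diagnostic` follow the Returns description and are assumptions.
struct CompatResult {
    bool compatible = false;
    std::string diagnostic;
};

// Hypothetical caller policy: pair the draft with the target only when the
// check passed; otherwise report the first failed rule and plan to decode
// with the target alone.
std::string describe_decode_plan(const CompatResult& r) {
    if (r.compatible) {
        return "speculative";
    }
    return "target-only (" + r.diagnostic + ")";
}
```

Since only the first failed rule is reported, fixing one mismatch and re-running the check may surface a later one.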

Definition at line 271 of file speculative_compat.cpp.