Entropic 2.3.8
Local-first agentic inference engine
Tokenizer/architecture compatibility check for speculative decoding draft pairing.
#include <string>

Classes
| struct | entropic::speculative::CompatResult |
| | Result of a draft/target compatibility check. |
Namespaces
| namespace | entropic |
| | Activate model on GPU (WARM → ACTIVE). |
Functions
| CompatResult | entropic::speculative::check_compat (const llama_model *target, const llama_model *draft) |
| | Check whether a draft model can pair with a target for sequential speculative decoding. |
253ba110b), llama.cpp's vocab compatibility check moved from a public symbol (common_speculative_is_compat, exposed at the older 7f2cbd9a4 pin) to a file-private static bool common_speculative_are_compatible inside extern/llama.cpp/common/speculative.cpp. The check is exercised implicitly by common_speculative_impl_draft_simple's constructor, which throws std::runtime_error on mismatch, so no query-without-commit path remains in the public C++ API.
Mirroring the logic in entropic gives us:
entropic_speculative_compat C ABI entry point for downstream consumers (see entropic/entropic.h).
In addition to the vocab-level checks llama.cpp's static helper performs, entropic adds an explicit architecture gate that refuses speculative pairing for both recurrent (Mamba/RWKV) AND hybrid (Jamba/Granite/Nemotron-H/QWEN35/QWEN35MOE/QWEN3NEXT/KIMI_LINEAR/FALCON_H1/PLAMO2/LFM2/LFM2MOE) target architectures.
Upstream's speculative layer does NOT self-disable for either recurrent or hybrid targets at this pin. The speculative-decoding math assumes a non-recurrent verifier; Gate-A diagnostics in Session 5 (proposal Implementation Log) showed empirically that hybrid-SSM targets produce totally divergent logits at the first speculative-batch boundary because the chunked SSM scan does not carry the recurrent state across ubatch boundaries. The guard refuses both categories so consumers fall back to plain decode cleanly instead of generating garbage.
Definition in file speculative_compat.h.
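The architecture gate described above can be sketched as a small classifier plus a refusal rule. This is a minimal illustration, not the code in speculative_compat.cpp: ArchClass, GateResult, classify_arch and arch_gate are hypothetical names, and matching on the label strings exactly as they appear in the description is an assumption (the real check operates on llama_model handles).

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical three-way classification used only in this sketch.
enum class ArchClass { Attention, Recurrent, Hybrid };

// Hypothetical result type. The field names follow the compatible/diagnostic
// pair described for CompatResult, but this is not the real struct.
struct GateResult {
    bool compatible;
    std::string diagnostic;
};

// Labels are copied from the description above. Matching on these exact
// strings is an assumption made for illustration.
inline ArchClass classify_arch(std::string_view name) {
    static constexpr std::array<std::string_view, 2> recurrent = {
        "Mamba", "RWKV"};
    static constexpr std::array<std::string_view, 11> hybrid = {
        "Jamba", "Granite", "Nemotron-H", "QWEN35", "QWEN35MOE", "QWEN3NEXT",
        "KIMI_LINEAR", "FALCON_H1", "PLAMO2", "LFM2", "LFM2MOE"};
    for (std::string_view r : recurrent) {
        if (r == name) return ArchClass::Recurrent;
    }
    for (std::string_view h : hybrid) {
        if (h == name) return ArchClass::Hybrid;
    }
    return ArchClass::Attention;
}

// Refuse speculative pairing for recurrent and hybrid targets, so the caller
// can fall back to plain decode instead of producing divergent output.
inline GateResult arch_gate(std::string_view target_arch) {
    switch (classify_arch(target_arch)) {
        case ArchClass::Recurrent:
            return {false, "recurrent target architecture: " +
                               std::string(target_arch)};
        case ArchClass::Hybrid:
            return {false, "hybrid target architecture: " +
                               std::string(target_arch)};
        case ArchClass::Attention:
            break;
    }
    return {true, std::string()};
}
```

In this sketch a caller would run arch_gate on the target before any vocab-level rule and skip draft setup entirely when compatible is false.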
CompatResult entropic::speculative::check_compat(const llama_model *target, const llama_model *draft)
Check whether a draft model can pair with a target for sequential speculative decoding.
Compatibility orchestrator.
Mirrors the logic of llama.cpp's file-private common_speculative_are_compatible (in common/speculative.cpp) and additionally enforces entropic's architecture gate:
SPEC_VOCAB_MAX_SIZE_DIFFERENCE in llama.cpp).
[SPEC_VOCAB_CHECK_START_TOKEN_ID=5, min(n_vocab_tgt, n_vocab_dft)).
Parameters
| target | Target (verifier) llama_model handle. Must be non-null. |
| draft | Draft (proposer) llama_model handle. Must be non-null. |
Returns
compatible=true and empty diagnostic on success; compatible=false with a specific diagnostic string identifying the first failed rule on failure.
Definition at line 271 of file speculative_compat.cpp.
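Two of the vocab-level rules named above, the size-difference limit and the token comparison over [SPEC_VOCAB_CHECK_START_TOKEN_ID, min(n_vocab_tgt, n_vocab_dft)), can be sketched on plain vectors of token text. This is an illustration under stated assumptions, not the code at line 271: SketchResult and check_vocab_sketch are hypothetical names, each vocabulary is modeled as a std::vector<std::string> indexed by token id, comparing token text is assumed to be what the range check does, and the size limit is passed in as a parameter because its value is not given here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CompatResult: same two fields, not the real type.
struct SketchResult {
    bool compatible;
    std::string diagnostic;
};

// Start of the compared token-id range, as stated in the description.
constexpr std::size_t kCheckStartTokenId = 5;

// tgt / dft: token text indexed by token id (an assumption of this sketch).
// max_size_diff: stands in for SPEC_VOCAB_MAX_SIZE_DIFFERENCE.
inline SketchResult check_vocab_sketch(const std::vector<std::string>& tgt,
                                       const std::vector<std::string>& dft,
                                       std::size_t max_size_diff) {
    const std::size_t n_tgt = tgt.size();
    const std::size_t n_dft = dft.size();

    // Rule 1: the two vocabularies may differ in size only up to the limit.
    const std::size_t diff = n_tgt > n_dft ? n_tgt - n_dft : n_dft - n_tgt;
    if (diff > max_size_diff) {
        return {false, "vocab size difference " + std::to_string(diff) +
                           " exceeds limit " + std::to_string(max_size_diff)};
    }

    // Rule 2: token text must match on the half-open range
    // [kCheckStartTokenId, min(n_tgt, n_dft)). Report the first mismatch only.
    const std::size_t end = std::min(n_tgt, n_dft);
    for (std::size_t id = kCheckStartTokenId; id < end; ++id) {
        if (tgt[id] != dft[id]) {
            return {false, "token " + std::to_string(id) + " differs: '" +
                               tgt[id] + "' vs '" + dft[id] + "'"};
        }
    }
    return {true, std::string()};
}
```

Returning on the first failed rule matches the documented contract that the diagnostic identifies the first failure. A caller would test compatible and fall back to plain decode when it is false.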