Entropic 2.3.8
Local-first agentic inference engine
Nemotron 3 chat adapter (v2.1.9, gh#47).


Classes
class entropic::Nemotron3Adapter
    Nemotron 3 chat adapter (hybrid Mamba-Transformer family).
Namespaces
namespace entropic
    Activate model on GPU (WARM → ACTIVE).
Nemotron 3 chat adapter (v2.1.9, gh#47).
nemotron_h (variant nemotron_h_moe).

LLM_ARCH_NEMOTRON_H is in the arch enumeration; llm_build_nemotron_h extends llm_build_mamba_base, so state handling is shared with the stable Mamba path and is not experimental.

<think> and </think> are separate special tokens. With the llama.cpp CLI, use --special to surface them; programmatic generation receives the tokens already detokenised, so the adapter's base-class strip_think_blocks / extract_thinking handle them naturally.

[...] qwen3_coder XML parser, but empirical capture (gh#70, v2.3.8) showed the bundled nemotron_h GGUFs actually emit a DSML invoke format at every precision (Q4_K_XL / Q8_0 / BF16), in which the bar character is U+FF5C (｜) and the parameter tags are typed and self-closing. The adapter parses DSML first, then falls back to the qwen XML and tagged-JSON paths for rigged-prompt / mixed-format consumers.

Gate outcome: PASSES. Nemotron3Adapter proceeds.
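The <think> handling described above can be illustrated with a minimal sketch. It is not the base-class strip_think_blocks / extract_thinking code, whose signatures are not shown on this page; the ThinkSplit struct and the split_think_blocks function are hypothetical names introduced for illustration. It assumes the tags arrive as plain detokenised text, and that an unterminated block counts as thinking up to the end of the input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type (not part of Entropic): the reply with think
// blocks removed, plus the contents of those blocks.
struct ThinkSplit {
    std::string visible;   // text outside every <think>...</think> block
    std::string thinking;  // block contents, joined with '\n'
};

// Hypothetical helper: walks the text once, copying everything outside
// <think>...</think> into `visible` and everything inside into `thinking`.
inline ThinkSplit split_think_blocks(std::string_view text) {
    constexpr std::string_view open_tag = "<think>";
    constexpr std::string_view close_tag = "</think>";
    constexpr std::size_t npos = std::string_view::npos;

    ThinkSplit out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t open = text.find(open_tag, pos);
        if (open == npos) {
            // No further block: the rest is visible text.
            out.visible.append(text.substr(pos));
            break;
        }
        out.visible.append(text.substr(pos, open - pos));

        const std::size_t body = open + open_tag.size();
        const std::size_t close = text.find(close_tag, body);
        // Assumption: a block without a closing tag (for example a
        // truncated generation) is thinking up to the end of the text.
        const std::size_t body_end = (close == npos) ? text.size() : close;

        if (!out.thinking.empty()) {
            out.thinking.push_back('\n');
        }
        out.thinking.append(text.substr(body, body_end - body));

        if (close == npos) {
            break;
        }
        pos = close + close_tag.size();
    }
    return out;
}
```

Calling split_think_blocks on a completed generation yields the user-facing reply in visible and the reasoning trace in thinking, so a caller can log or discard the trace separately from the reply.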
Internal to inference .so.
Definition in file nemotron3_adapter.h.