Entropic 2.3.8
Local-first agentic inference engine
nemotron3_adapter.h File Reference

Nemotron 3 chat adapter (v2.1.9, gh#47).

#include <entropic/inference/adapters/adapter_base.h>
#include <unordered_map>


Classes

class  entropic::Nemotron3Adapter
 Nemotron 3 chat adapter (hybrid Mamba-Transformer family).
 

Namespaces

namespace  entropic
 Activate model on GPU (WARM → ACTIVE).
 

Detailed Description

Nemotron 3 chat adapter (v2.1.9, gh#47).

Architecture-verification gate (proposal §gh#47)
  • Hybrid Mamba-Transformer (Mamba-2 + MLP + 4 attention layers), compressed from NVIDIA-Nemotron-Nano-9B-v2 via Nemotron Elastic.
  • GGUF arch tag: nemotron_h (variant nemotron_h_moe).
  • llama.cpp status: fully integrated. LLM_ARCH_NEMOTRON_H is in the arch enumeration, and llm_build_nemotron_h extends llm_build_mamba_base, so state handling is shared with the stable Mamba path rather than an experimental one.
  • Chat template: thinking is enabled by default; <think> and </think> are separate special tokens. With the llama.cpp CLI, pass --special to surface them. Programmatic generation receives the tokens already detokenised, so the base-class strip_think_blocks / extract_thinking helpers handle them without adapter-specific code.
  • Tool-call format: the vLLM docs advertise the qwen3_coder XML parser, but empirical capture (gh#70, v2.3.8) showed the bundled nemotron_h GGUFs actually emit a DSML invoke format at every precision (Q4_K_XL / Q8_0 / BF16):
    <|DSML|function_calls>
    <|DSML|invoke name="tool.name">
    <|DSML|parameter name="key" string="value"/>
    </|DSML|invoke>
    </|DSML|function_calls>
    The pipe character in these tags is the fullwidth vertical line, U+FF5C; parameter tags are self-closing and typed. The adapter parses DSML first, then falls back to the qwen XML and tagged-JSON paths for rigged-prompt / mixed-format consumers.
  • Reasoning trace: yes. Handled by the base-class think-block primitives; no Nemotron-specific override is needed.
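
The DSML invoke format described in the tool-call item can be illustrated with a minimal parsing sketch. This is not the adapter's implementation: ToolCall, parse_dsml, and the dsml helper are hypothetical names, the sketch assumes the type attribute is always string, and it does not handle escaped quotes inside values. It matches only the fullwidth pipe (U+FF5C, written as UTF-8 bytes), so text using the ASCII pipe is not recognised.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not the adapter's real API.
struct ToolCall {
    std::string name;
    std::map<std::string, std::string> args;
};

// U+FF5C (fullwidth vertical line) encoded as UTF-8.
static const std::string kPipe = "\xEF\xBD\x9C";

// Build a DSML tag fragment: lead + pipe + "DSML" + pipe + rest.
static std::string dsml(const std::string& lead, const std::string& rest) {
    return lead + kPipe + "DSML" + kPipe + rest;
}

// Extract every invoke block and its string-typed parameters.
std::vector<ToolCall> parse_dsml(const std::string& text) {
    const std::string invoke_open  = dsml("<", "invoke name=\"");
    const std::string invoke_close = dsml("</", "invoke>");
    const std::string param_open   = dsml("<", "parameter name=\"");
    const std::string value_open   = "\" string=\"";  // assumption: string type only
    const std::string value_close  = "\"/>";          // self-closing parameter tag
    const std::size_t npos = std::string::npos;

    std::vector<ToolCall> calls;
    std::size_t pos = 0;
    for (;;) {
        std::size_t name_begin = text.find(invoke_open, pos);
        if (name_begin == npos) break;
        name_begin += invoke_open.size();
        const std::size_t name_end  = text.find('"', name_begin);
        const std::size_t block_end = text.find(invoke_close, name_begin);
        if (name_end == npos || block_end == npos || name_end > block_end) break;

        ToolCall call;
        call.name = text.substr(name_begin, name_end - name_begin);

        // Scan parameters that lie inside this invoke block only.
        std::size_t p = name_end;
        for (;;) {
            std::size_t key_begin = text.find(param_open, p);
            if (key_begin == npos || key_begin >= block_end) break;
            key_begin += param_open.size();
            const std::size_t key_end = text.find(value_open, key_begin);
            if (key_end == npos || key_end >= block_end) break;
            const std::size_t val_begin = key_end + value_open.size();
            const std::size_t val_end   = text.find(value_close, val_begin);
            if (val_end == npos || val_end >= block_end) break;
            call.args[text.substr(key_begin, key_end - key_begin)] =
                text.substr(val_begin, val_end - val_begin);
            p = val_end + value_close.size();
        }
        calls.push_back(call);
        pos = block_end + invoke_close.size();
    }
    return calls;
}
```

In a DSML-first arrangement like the one described above, an empty result from such a parser would be the signal to try the qwen XML and tagged-JSON fallbacks.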

Gate outcome: PASSES. Nemotron3Adapter proceeds.
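
For reference, the think-block handling that the chat-template and reasoning-trace items delegate to the base class can be sketched as follows. The function names below are hypothetical stand-ins, not the signatures of strip_think_blocks / extract_thinking, and dropping everything after an unterminated <think> is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in: remove every <think>...</think> span from text.
// Assumption: an unterminated <think> discards the remainder of the text.
std::string strip_think_blocks_sketch(const std::string& text) {
    const std::string open = "<think>";
    const std::string close = "</think>";
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t b = text.find(open, pos);
        if (b == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the tail
            break;
        }
        out.append(text, pos, b - pos);  // copy text before the block
        const std::size_t e = text.find(close, b + open.size());
        if (e == std::string::npos) break;  // unterminated block
        pos = e + close.size();
    }
    return out;
}

// Hypothetical stand-in: return the body of the first complete think block,
// or an empty string when there is none.
std::string extract_thinking_sketch(const std::string& text) {
    const std::string open = "<think>";
    const std::string close = "</think>";
    const std::size_t b = text.find(open);
    if (b == std::string::npos) return "";
    const std::size_t body = b + open.size();
    const std::size_t e = text.find(close, body);
    if (e == std::string::npos) return "";
    return text.substr(body, e - body);
}
```

Because the tokens arrive already detokenised, plain substring matching of this kind is sufficient; no token-ID handling is involved in the sketch.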

Internal to inference .so.

Version
2.3.8

Definition in file nemotron3_adapter.h.