Entropic 2.3.8
Local-first agentic inference engine
Loading...
Searching...
No Matches
gemma4_adapter.h File Reference

Gemma 4 chat adapter (v2.1.9, gh#46). More...

Include dependency graph for gemma4_adapter.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Classes

class  entropic::Gemma4Adapter
 Gemma 4 chat adapter (covers A4B / E4B / E2B variants). More...
 

Namespaces

namespace  entropic
 Activate model on GPU (WARM → ACTIVE).
 

Detailed Description

Gemma 4 chat adapter (v2.1.9, gh#46).

Covers the Gemma 4 instruct family: 26B-A4B, E4B, E2B variants. All three share the same chat template structure and channel-based thinking convention, so a single adapter class handles them.

Chat template
The GGUF-embedded template uses Gemma's <|channel>thought\n... <channel|> markers for the reasoning/answer split. We do not replicate the template here — chat_format() returns the empty string so llama.cpp applies the GGUF-stored template directly.
Tool-call format (resolved at v2.3.8, gh#69)
Empirical capture from gemma-4-E2B-it / E4B-it confirmed Gemma 4 emits tool calls inside a ChatML-style channel: opening header <|im_start|>tool_call, JSON body, plain </tool_call> close (an asymmetric pair). Both sizes scored 0/6 completion before the fix because none of the prior open variants matched that header. The adapter parses with this layered strategy:
  1. Primary: tagged JSON via base parse_tagged_tool_calls, which now accepts <tool_call>, <|tool_call>, <|tool_call|>, and the <|im_start|>tool_call channel header (gh#69) — all closed by </tool_call>.
  2. Fallback: bare-JSON lines containing "name" — base class parse_bare_json_tool_calls.
  3. Final fallback: malformed-JSON recovery via the base class.

Surface scrubbing (parse_tool_calls in the .cpp) removes the channel block and any stray <|im_start|>tool_call / <|im_end|> turn markers so they never reach the assistant-visible body.

Reasoning
Gemma 4 emits a channel-tagged thought block. If the GGUF template surfaces those tokens as <think>...</think> (common llama.cpp convention for thinking models), the base-class strip_think_blocks cleans them. If a different marker shape leaks through, that is addressed in the same model-test loop.

Internal to inference .so.

Version
2.1.9

Definition in file gemma4_adapter.h.