Entropic 2.3.8
Local-first agentic inference engine
Loading...
Searching...
No Matches
Entropic

‍Local-first agentic inference engine — your models, your hardware, your control

API reference: tvanfossen.github.io/entropic — auto-generated from doxygen on every release

What Is Entropic?

Entropic is a C inference engine that turns a local GGUF model into a multi-tier, tool-calling AI system. It runs entirely on your hardware — no cloud, no API keys, no telemetry. You control the model, the prompts, the tools, and the data.

The name comes from information theory: every handoff between human intent, prompt, and model is a lossy translation. Information decays at each boundary. Entropic is purpose-built to manage that decay — structured context management, identity-based delegation, grammar-constrained output, and tool-augmented reasoning minimize what gets lost along the way.

Why Entropic?

You want local AI that actually does things, not just answers questions.

Most local inference tools give you a model and a chat loop. Entropic gives you an engine — the infrastructure between your application and the model that handles the hard parts:

Problem How Entropic Solves It
Model just generates text Agentic loop: generate → parse tool calls → execute → re-generate
One model, one personality Identity system: same model serves multiple roles with different prompts, tools, and constraints
Context gets stale Auto-compaction: summarizes old context to stay within window
Output format unpredictable Grammar constraints: GBNF grammars force structured output
Need tools but no cloud MCP tool servers: filesystem, bash, git, web — all local, plugin architecture
Privacy concerns Zero network calls. Everything stays on your machine.

Who Is It For?

Entropic is an engine, not an application. It's for developers building AI-powered software that needs to run locally:

  • CLI/TUI tools that use AI for code generation, analysis, or planning
  • Game engines that need NPC dialogue or decision-making from a local model
  • Embedded systems with on-device inference (CPU-only static build available)
  • Education platforms running student-facing AI without cloud dependencies
  • Privacy-sensitive applications where data cannot leave the device

Architecture

┌─────────────────────────────────────────────────────────┐
│ Your Application │
│ C/C++ (direct linkage) · Python (ctypes wrapper) │
├─────────────────────────────────────────────────────────┤
│ librentropic.so — C API │
│ │
│ ┌─────────┐ ┌────────┐ ┌──────┐ ┌───────┐ ┌────────┐ │
│ │ Engine │ │Inference│ │ MCP │ │Config │ │Storage │ │
│ │ Loop │ │Backend │ │Servers│ │Loader │ │Backend │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ Context │ │ Prompt │ │ Tool │ │ YAML │ │ SQLite │ │
│ │ Routing │ │ Cache │ │ Auth │ │ Layer │ │ Audit │ │
│ │ Delegate│ │ Grammar│ │Plugin│ │ Merge │ │ Session│ │
│ └─────────┘ └────────┘ └──────┘ └───────┘ └────────┘ │
├─────────────────────────────────────────────────────────┤
│ llama.cpp (CUDA / Vulkan / CPU) │
└─────────────────────────────────────────────────────────┘

Pure C at all .so boundaries. No C++ ABI crossing. Any language that can call C functions can use the engine.

Quick Start

For consumer install paths (tarball or pip install entropic-engine), read docs/getting-started.md — it covers C/C++ direct linking and the Python wrapper end-to-end.

For contributors building from source:

Prerequisites

  • Linux (tested on Ubuntu 24.04)
  • cmake 3.21+, C++20 compiler
  • NVIDIA GPU with 16GB+ VRAM (or CPU-only for smaller models)
  • Python 3.10+ (for invoke task runner and pre-commit hooks)

Build

git clone --recurse-submodules https://github.com/tvanfossen/entropic.git
cd entropic
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]" # invoke + pre-commit + gcovr + ruff + mypy
.venv/bin/pre-commit install
# CUDA build (default)
inv build --clean
# CPU-only build
inv build --cpu

Run an Example

inv example -n pychess # Multi-tier chess (C++)
inv example -n explorer # Interactive REPL (C++)
inv example -n headless # Minimal C harness

Usage

C API

entropic_configure_dir(h, ".myapp"); // Layered config resolution
// Streaming generation with full engine pipeline
entropic_run_streaming(h, "What is 2+2?", on_token, NULL, NULL);
// Conversation persists across calls
entropic_run_streaming(h, "Explain your reasoning", on_token, NULL, NULL);
// Manage conversation
size_t count;
entropic_context_count(h, &count); // Message count
entropic_context_clear(h); // New session
Public C API for the Entropic inference engine.
ENTROPIC_EXPORT entropic_error_t entropic_context_count(entropic_handle_t handle, size_t *count)
Get the number of messages in the conversation.
ENTROPIC_EXPORT entropic_error_t entropic_configure_dir(entropic_handle_t handle, const char *project_dir)
Configure engine using layered config resolution.
ENTROPIC_EXPORT entropic_error_t entropic_context_clear(entropic_handle_t handle)
Clear conversation history, starting a new session.
ENTROPIC_EXPORT void entropic_destroy(entropic_handle_t handle)
Destroy an engine instance and free all resources.
ENTROPIC_EXPORT entropic_error_t entropic_create(entropic_handle_t *handle)
Create a new engine instance.
Definition entropic.cpp:181
ENTROPIC_EXPORT entropic_error_t entropic_run_streaming(entropic_handle_t handle, const char *input, void(*on_token)(const char *token, size_t len, void *user_data), void *user_data, int *cancel_flag)
Streaming agentic loop with token callback.
Engine handle struct — owns all subsystems.

Python Wrapper (ctypes)

The Python package is a thin ctypes binding over the C ABI — no OOP wrapper. See docs/getting-started.md for the full walkthrough including entropic install-engine (downloads the matching librentropic.so from GitHub Releases).

import ctypes
from entropic import (
entropic_create, entropic_configure_dir,
entropic_run_streaming, entropic_destroy,
)
handle = ctypes.c_void_p()
entropic_create(ctypes.byref(handle))
entropic_configure_dir(handle, b".myapp")
@ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_size_t, ctypes.c_void_p)
def on_token(tok, n, ud):
print(tok.decode("utf-8", errors="replace"), end="", flush=True)
entropic_run_streaming(handle, b"What is 2 + 2?", on_token, None, None)

Configuration

Configuration loads in layers (highest priority wins):

  1. Compiled defaults — struct initializers built into the engine
  2. Consumer defaults (default_config.yaml in CWD) — shipped with your app
  3. Global (~/.entropic/config.yaml) — user machine-wide settings
  4. Project local ({project_dir}/config.local.yaml) — per-project overrides
  5. Environment (ENTROPIC_* variables)

Minimal config:

models:
lead:
path: primary # Resolves via bundled model registry
adapter: qwen35
context_length: 16384
default: lead
routing:
enabled: false
mcp:
enable_entropic: true # Internal tools (delegate, complete, etc.)
enable_filesystem: false
enable_bash: false
permissions:
auto_approve: true # Skip tool approval prompts

Session Logging

When using entropic_configure_dir(), the engine automatically creates:

File Contents
{project_dir}/session.log Engine operations — config, routing, timing, errors
{project_dir}/session_model.log Full user/assistant exchanges — streamed in real time

Features

Comprehensive engine capability inventory grouped by domain, with the primary source file for each item so the README doubles as a navigation map. For the historical "how we got here" narrative see docs/roadmap.md; for the layering behind these features see docs/architecture-cpp.md.

Agentic Loop

  • Generate → parse tool calls → execute → re-generate, with explicit state machine (IDLE → GENERATING → EXECUTING → VERIFYING → COMPLETE) — src/core/engine.cpp::loop, execute_iteration, include/entropic/core/engine_types.h::AgentState
  • Streaming generation with cancel-token plumbed end-to-end — src/core/response_generator.cpp::generate_streaming
  • Per-iteration hook lifecycle: ON_LOOP_ITERATION, ON_CONTEXT_ASSEMBLE, PRE_GENERATE (cancellable), POST_GENERATE (revisable), PRE_TOOL_CALL, POST_TOOL_CALLsrc/core/engine.cpp::execute_iteration, dispatch_post_generate
  • LoopMetrics surfaced via last_loop_metrics() and per-tier accessors (iterations, tool calls, tokens, duration, errors) — include/entropic/core/engine_types.h::LoopMetrics
  • Iteration count + budget exposed to the model each turn as a system reminder so identity prompts can teach budget-aware rules the model can actually enforce — src/core/response_generator.cpp::inject_engine_state_reminder
  • Anti-spiral primitive: warns the model after N consecutive calls of the same tool; threshold via LoopConfig.max_consecutive_same_toolsrc/mcp/tool_executor.cpp::update_anti_spiral_tracking
  • Synthetic completion forced when iteration cap is hit, with terminal-reason metadata for caller diagnostics — src/core/engine.cpp::loop (the post-while terminal-reason block)
  • Per-identity overrides for max_iterations and max_tool_calls_per_turn from frontmatter — src/facade/entropic.cpp::cache_per_tier_frontmatter, include/entropic/core/engine_types.h::effective_max_iterations

Identity System & Delegation

Validation

Tools / MCP

Built-in MCP servers (each shipped as a plugin):

Server Tools Source
entropic delegate, pipeline, complete, diagnose, inspect, context_inspect src/mcp/servers/entropic_server.cpp
filesystem read_file, write_file, edit_file, glob, grep src/mcp/servers/filesystem.cpp
bash execute src/mcp/servers/bash.cpp
git status, diff, log, commit, branch, checkout src/mcp/servers/git.cpp
diagnostics diagnostics, check_errors src/mcp/servers/diagnostics.cpp
web web_fetch, web_search src/mcp/servers/web.cpp

External MCP servers connect at runtime via stdio or SSE transport:

"{\"command\":\"python3\",\"args\":[\"chess_server.py\"]}");
ENTROPIC_EXPORT entropic_error_t entropic_register_mcp_server(entropic_handle_t handle, const char *name, const char *config_json)
Register an external MCP server at runtime.

Cross-cutting machinery:

Inference

Configuration

  • Layered config resolution (highest priority wins) — src/config/loader.cpp, src/config/env_overrides.cpp:
    1. Compiled defaults (struct initializers in the engine)
    2. Consumer defaults — default_config.yaml next to your binary
    3. Global — ~/.entropic/config.yaml
    4. Project local — {project_dir}/config.local.yaml
    5. Environment — ENTROPIC_* variables
  • Cross-field validation: routing references, default tier consistency, threshold ordering, identity references — src/config/validate.cpp
  • Bundled model registry (path: primary resolves via bundled_models.yaml to a vetted GGUF) — src/config/bundled_models.cpp, data/bundled_models.yaml
  • entropic download primary fetches into ~/.entropic/models/ with resume support — src/cli/download.cpp
  • Bundled data file discovery: compile-time DATA_DIR define, overridable at runtime via config_dirsrc/config/data_dir.cpp
  • Per-identity frontmatter parsing: allowed_tools, validation_rules, relay, max_iterations, max_tool_calls_per_turnsrc/prompts/manager.cpp, include/entropic/prompts/manager.h::IdentityFrontmatter

Storage / Persistence

Hooks (Extensibility)

20+ hook points across the loop lifecycle. All registration / dispatch lives in src/core/hook_registry.cpp and include/entropic/types/hooks.h.

Hook semantics:

  • Pre-hooks return non-zero to cancel the operation
  • Post-hooks return modified content to revise the engine's view
  • Info-level hooks are fire-and-forget
  • Multiple callbacks per hook point with priority ordering
  • Registration / deregistration at any time during engine lifetime — src/facade/entropic_hooks.cpp

External MCP Bridge

  • entropic mcp-bridge — pure stdio↔unix-socket relay (v2.1.7+, gh#34): forwards JSON-RPC bytes between an MCP client (Claude Code, VSCode, etc.) and a running engine's external bridge socket. Owns no engine instance and loads no model; an engine host (TUI / consumer app / future headless server) must already be running for the same project directory. Fails fast with a diagnostic naming the canonical path + socket when no engine is reachable — src/cli/mcp_bridge.cpp, src/facade/external_bridge.cpp
  • Multi-client subscription: TUI + Claude Code can both receive ask_complete / progress events simultaneously — src/facade/external_bridge.cpp::subscribe, broadcast_notification
  • Async ask via push notification + ask_status polling for long-running tasks — src/facade/external_bridge.cpp::run_async_ask, handle_ask_status
  • Phase observer: VERIFYING → "validating" / "revising" sub-phases surfaced to bridge subscribers — src/facade/external_bridge.cpp::attach_phase_observer, phase_observer_cb
  • Generation counter on the phase observer: in-flight stale callbacks are guaranteed no-ops after detach (race-safe under concurrent cancel) — src/facade/external_bridge.cpp::observer_call_is_stale, include/entropic/mcp/external_bridge.h::observer_gen_
  • Cancel-on-clear semantics: entropic.context_clear interrupts and drains in-flight async tasks before returning — src/facade/external_bridge.cpp::cancel_inflight_async_tasks

Distribution

  • Pure C ABI at every .so boundary — opaque handles, error codes via enum, no C++ ABI crossing, no exceptions across the boundary — include/entropic/entropic.h, src/facade/entropic.cpp
  • find_package(entropic 2.1) CMake support with imported target entropic::entropiccmake/entropic-config.cmake.in, CMakeLists.txt
  • Tarball layout: bin/, lib/, include/, share/ — standard Unix prefix — see docs/releasing.md
  • pip install entropic-engine — pure-Python ~50 KB ctypes wrapper + entropic install-engine subcommand that fetches the matching tarball from GitHub Releases (playwright pattern) — python/src/entropic/
  • $ENTROPIC_LIB / $ENTROPIC_HOME env vars for custom install resolution — python/src/entropic/_loader.py
  • entropic CLI subcommands: mcp-bridge, download, version, plus wrapper-side install-enginesrc/cli/main.cpp, python/src/entropic/cli.py
  • Reference examples: headless (C), pychess (C++ multi-tier showcase), explorer (interactive REPL), openai-server (OpenAI-compat HTTP front-end with chat/completions, completions, models, models/{name}, health, SSE streaming) — examples/

Observability

  • Per-subsystem structured logging via spdlog (e.g. mcp.tool_executor, core.response_generator, inference.orchestrator) — src/types/logging.cpp
  • LoopMetrics exposed through last_loop_metrics() and per-tier accessor maps — include/entropic/core/engine.h::last_loop_metrics, per_tier_metrics
  • ThroughputTracker integration on GenerationResultsrc/inference/throughput_tracker.cpp
  • entropic_throughput_tok_per_sec() facade query — src/facade/entropic.cpp
  • 20+ hook points doubling as telemetry consumption points — src/core/hook_registry.cpp
  • session.log + session_model.log per project_dir — src/types/session_logger.cpp
  • doxygen-guard enforces inline documentation on every function (@brief, exemption tag, @version bumped on body change) — .doxygen-guard.yaml, .pre-commit-config.yaml
  • knots enforces code-quality budget per function (cognitive complexity ≤ 15, McCabe ≤ 15, nesting ≤ 4, SLOC ≤ 50, ABC ≤ 10, returns ≤ 3) — .pre-commit-config.yaml
  • Per-library coverage gates via gcovr enforced by inv check-coveragetasks.py::check_coverage, .pre-commit-config.yaml

Quality / Testing

  • 16 pre-commit hooks: trim whitespace, end-of-file, ruff, ruff-format, flake8, knots, doxygen-guard, build, unit tests, per-library coverage, plus standard pre-commit checks — .pre-commit-config.yaml
  • Unit + regression tests (Catch2 v3 BDD style) — tests/unit/
  • Model tests, GPU-recommended (CPU works but is impractically slow), developer-run; results attached to the GitHub Release as model-results-vX.Y.Z.json at each x.y.0 — tests/model/, tasks.py::test
  • tests/distribution-smoke-consumer/ exercises the find_package(entropic) consumer experience end-to-end — tests/distribution-smoke-consumer/
  • Test gating per CLAUDE.md:
Gate When What runs
Pre-commit Every commit Unit tests (CPU, no GPU)
Minor version Each x.y.0 bump Full model/benchmark suite (GPU)
Patch version Each x.y.z bump Unit tests only

Privacy

  • Zero network calls in the inference hot path
  • All processing local; no telemetry collected
  • Conversation data, prompts, and model outputs stay on the host
  • Only outbound network call is the optional web MCP server's web_fetch / web_search tools, which the consumer opts into explicitly via config — src/mcp/servers/web.cpp

Bundled Models

Key Model Size VRAM
primary Qwen3.5-35B-A3B-UD-IQ3_XXS 13.1 GB 15+ GB
mid Qwen3.5-9B-Q8_0 9.5 GB 12+ GB
lightweight Qwen3.5-4B-Q8_0 4.5 GB 8+ GB

Use path: primary (or mid, lightweight) in config — the engine resolves to the full model path via the bundled registry.

Build Presets

Preset Description Use Case
full CUDA, all servers, tests Development workstation
dev CPU, debug, tests Fast iteration
minimal-static CPU, static .a, minimal servers Embedded consumer
game CUDA, minimal MCP servers Game engine integration
coverage CPU, gcov instrumentation Coverage analysis
inv build --clean # full (CUDA)
inv build --cpu # dev (CPU)
inv test --cpu --no-build # unit + regression tests
inv test --model --no-build # model tests (GPU recommended)

Examples

Example Demonstrates Language
headless/main.c Pure-C minimal harness — CI smoke target C
pychess/main.cpp Multi-tier pipeline, grammar, delegation, external MCP C++
explorer/main.cpp Interactive C++ REPL for poking at the engine C++
openai-server/src/main.cpp OpenAI-compat HTTP front-end (chat/completions, models, SSE streaming) C++

Each example has its own default_config.yaml (consumer defaults) and .{name}/ directory (session logs, local config).

Privacy

Entropic runs entirely on your local hardware. No data is sent to external servers. No telemetry is collected. Your prompts, conversations, and model outputs never leave your machine.

Disclaimer

Entropic runs AI models locally on your hardware. AI-generated outputs may be inaccurate, biased, or inappropriate. Users are solely responsible for evaluating and using any generated content. This software does not provide professional, legal, medical, or financial advice.

Documentation

  • docs/getting-started.md — install + first call
  • docs/architecture-cpp.md — library design
  • docs/roadmap.md — version targeting
  • docs/contributing.md — dev setup, gates, branching
  • docs/releasing.md — release workflow
  • docs/security.md — vulnerability reporting

License

Apache-2.0. See [LICENSE](LICENSE) for the canonical text and [NOTICE](NOTICE) for third-party attribution. Contributors retain copyright in their contributions; see `CONTRIBUTING.md` for the DCO sign-off process.

Versions 2.0.0 through 2.2.1 were released under LGPL-3.0-or-later with a linking exception; those releases remain under that license. The relicense to Apache-2.0 takes effect at v2.2.2 and forward, and restores the permissive license used for v1.x.