|
Entropic 2.3.8
Local-first agentic inference engine
|
Local-first agentic inference engine — your models, your hardware, your control
API reference: tvanfossen.github.io/entropic — auto-generated from doxygen on every release
Entropic is a C inference engine that turns a local GGUF model into a multi-tier, tool-calling AI system. It runs entirely on your hardware — no cloud, no API keys, no telemetry. You control the model, the prompts, the tools, and the data.
The name comes from information theory: every handoff between human intent, prompt, and model is a lossy translation. Information decays at each boundary. Entropic is purpose-built to manage that decay — structured context management, identity-based delegation, grammar-constrained output, and tool-augmented reasoning minimize what gets lost along the way.
You want local AI that actually does things, not just answers questions.
Most local inference tools give you a model and a chat loop. Entropic gives you an engine — the infrastructure between your application and the model that handles the hard parts:
| Problem | How Entropic Solves It |
|---|---|
| Model just generates text | Agentic loop: generate → parse tool calls → execute → re-generate |
| One model, one personality | Identity system: same model serves multiple roles with different prompts, tools, and constraints |
| Context gets stale | Auto-compaction: summarizes old context to stay within window |
| Output format unpredictable | Grammar constraints: GBNF grammars force structured output |
| Need tools but no cloud | MCP tool servers: filesystem, bash, git, web — all local, plugin architecture |
| Privacy concerns | Zero network calls. Everything stays on your machine. |
Entropic is an engine, not an application. It's for developers building AI-powered software that needs to run locally:
Pure C at all .so boundaries. No C++ ABI crossing. Any language that can call C functions can use the engine.
For consumer install paths (tarball or pip install entropic-engine), read docs/getting-started.md — it covers C/C++ direct linking and the Python wrapper end-to-end.
For contributors building from source:
The Python package is a thin ctypes binding over the C ABI — no OOP wrapper. See docs/getting-started.md for the full walkthrough including entropic install-engine (downloads the matching librentropic.so from GitHub Releases).
Configuration loads in layers (highest priority wins):
default_config.yaml in CWD) — shipped with your app~/.entropic/config.yaml) — user machine-wide settings{project_dir}/config.local.yaml) — per-project overridesENTROPIC_* variables)Minimal config:
When using entropic_configure_dir(), the engine automatically creates:
| File | Contents |
|---|---|
{project_dir}/session.log | Engine operations — config, routing, timing, errors |
{project_dir}/session_model.log | Full user/assistant exchanges — streamed in real time |
Comprehensive engine capability inventory grouped by domain, with the primary source file for each item so the README doubles as a navigation map. For the historical "how we got here" narrative see docs/roadmap.md; for the layering behind these features see docs/architecture-cpp.md.
src/core/engine.cpp::loop, execute_iteration, include/entropic/core/engine_types.h::AgentStatesrc/core/response_generator.cpp::generate_streamingON_LOOP_ITERATION, ON_CONTEXT_ASSEMBLE, PRE_GENERATE (cancellable), POST_GENERATE (revisable), PRE_TOOL_CALL, POST_TOOL_CALL — src/core/engine.cpp::execute_iteration, dispatch_post_generateLoopMetrics surfaced via last_loop_metrics() and per-tier accessors (iterations, tool calls, tokens, duration, errors) — include/entropic/core/engine_types.h::LoopMetricssrc/core/response_generator.cpp::inject_engine_state_reminderLoopConfig.max_consecutive_same_tool — src/mcp/tool_executor.cpp::update_anti_spiral_trackingsrc/core/engine.cpp::loop (the post-while terminal-reason block)max_iterations and max_tool_calls_per_turn from frontmatter — src/facade/entropic.cpp::cache_per_tier_frontmatter, include/entropic/core/engine_types.h::effective_max_iterationssrc/core/identity_manager.cpp, src/core/delegation.cppsrc/core/response_generator.cpp::lock_tier_if_neededentropic.delegate (single child) and entropic.pipeline (multi-stage) directives — src/mcp/servers/entropic_server.cpp, src/core/directives.cppentropic.complete for explicit child-completion summaries — src/mcp/servers/entropic_server.cppinclude/entropic/core/engine_types.h::delegation_depth, delegation_ancestor_tierssrc/core/engine.cpp::finalize_delegation_result, relay_partial_result[partial — budget_exhausted] prefix and verdict-tagged metadata, instead of silently dropping — src/core/engine.cpp::relay_partial_result, src/core/delegation.cpp::build_child_resultsrc/storage/database.cpp, include/entropic/core/engine_types.h::parent_conversation_idsrc/core/worktree.cpp, include/entropic/core/worktree.hValidationVerdict enum (passed / revised / rejected_max_revisions / skipped) — src/core/constitutional_validator.cpp, include/entropic/types/validation.hvalidation_rules from identity frontmatter; constitution is background context when per-tier rules exist — src/core/constitutional_validator.cpp::build_critique_prompt, set_tier_rulesPOST_GENERATE hook (Path A: structured JSON, Path B: full message-context revision) — src/core/constitutional_validator.cpp::handle_hook, attempt_revisionsrc/core/engine.cpp::capture_validation_feedback, src/core/response_generator.cpp::inject_engine_state_reminderON_COMPLETE hook for summary validation; can reject and inject feedback as a user-role message tagged [CITATION VALIDATION] … — src/core/engine.cpp::fire_complete_hooksrc/core/constitutional_validator.cpp::set_identity_validationBuilt-in MCP servers (each shipped as a plugin):
| Server | Tools | Source |
|---|---|---|
entropic | delegate, pipeline, complete, diagnose, inspect, context_inspect | src/mcp/servers/entropic_server.cpp |
filesystem | read_file, write_file, edit_file, glob, grep | src/mcp/servers/filesystem.cpp |
bash | execute | src/mcp/servers/bash.cpp |
git | status, diff, log, commit, branch, checkout | src/mcp/servers/git.cpp |
diagnostics | diagnostics, check_errors | src/mcp/servers/diagnostics.cpp |
web | web_fetch, web_search | src/mcp/servers/web.cpp |
External MCP servers connect at runtime via stdio or SSE transport:
Cross-cutting machinery:
dlopen + entropic_create_server() factory at runtime — src/mcp/server_manager.cpp, include/entropic/interfaces/i_mcp_server.hsrc/mcp/transport_stdio.cpp, src/mcp/transport_sse.cpp, src/mcp/external_client.cppsrc/mcp/reconnect_policy.cpp, src/mcp/health_monitor.cppallowed_tools authorization from frontmatter — src/facade/engine_handle.h::tier_allowed_tools, src/prompts/manager.cppsrc/mcp/mcp_key_set.cpp, src/mcp/mcp_authorization.cppsrc/mcp/permission_manager.cpp, src/storage/permission_persister.cppsrc/mcp/tool_call_history.cppsrc/mcp/tool_executor.cpp::check_duplicate, handle_duplicateresult_kind on every POST_TOOL_CALL payload (ok / ok_empty / error / rejected_duplicate / rejected_schema / rejected_precondition) — include/entropic/types/tool_result.hok_empty) for pivot-on-empty rules — src/mcp/tool_result_classify.cpp::is_effectively_empty{"error": …}) — src/mcp/tool_result_classify.cpp::looks_like_tool_errormax_tool_result_bytes, default 16 KB; 0 disables) — src/mcp/tool_result_classify.cpp::truncate_to_cap, src/mcp/tool_executor.cpp::apply_result_size_capsrc/mcp/utf8_sanitize.cpp, include/entropic/mcp/utf8_sanitize.hsrc/mcp/servers/filesystem.cppentropic.inspect identity <tier> returns the full system prompt — src/facade/entropic.cppsp_get_identities, build_assembled_prompt_for_tiersrc/mcp/tool_executor.cpp::check_required_fields, check_enumsrc/inference/llama_cpp_backend.cpp, src/inference/orchestrator.cppdocs/dist-README.md for the arch matrixAdapterRegistry — src/inference/adapters/, src/inference/adapter_manager.cppsrc/inference/llama_cpp_backend.cpp, src/inference/orchestrator.cppon_token AND a global set_stream_observer() that fires for every token from every entry point — src/core/response_generator.cpp::stream_token_callback, src/facade/entropic.cppentropic_set_stream_observerentropic_run_streaming() reassignment so consumers don't lose events on inner loop iterations — src/core/response_generator.cpp::stream_observer_GBNF grammar constraints applied per-tier:
Output is structurally correct by construction; no post-processing, no retries — src/inference/grammar_registry.cpp
entropic_adapter_load, _unload, _swap, _state, _info) — src/facade/entropic.cpp (entropic_adapter_*)src/inference/llama_cpp_backend.cppGenerationParams: temperature, top_p, top_k, repeat_penalty, max_tokens, seed, reasoning_budget — src/inference/profile_registry.cpp, src/types/config.cppllama_state_seq_get_data / llama_state_seq_set_data (host-memory prompt cache); identity / constitution / tool prefixes hot-swap on identity change — src/inference/prompt_cache.cppsrc/inference/image_preprocessor.cpp, src/inference/adapters/qwen35_adapter.cppsrc/config/loader.cpp, src/config/env_overrides.cpp:default_config.yaml next to your binary~/.entropic/config.yaml{project_dir}/config.local.yamlENTROPIC_* variablessrc/config/validate.cpppath: primary resolves via bundled_models.yaml to a vetted GGUF) — src/config/bundled_models.cpp, data/bundled_models.yamlentropic download primary fetches into ~/.entropic/models/ with resume support — src/cli/download.cppDATA_DIR define, overridable at runtime via config_dir — src/config/data_dir.cppallowed_tools, validation_rules, relay, max_iterations, max_tool_calls_per_turn — src/prompts/manager.cpp, include/entropic/prompts/manager.h::IdentityFrontmattersrc/storage/database.cpp, src/storage/backend.cppStorageInterface for downstream backends — include/entropic/interfaces/i_storage.h, src/storage/c_interface.cppsrc/core/compaction.cpp, src/core/compactor_registry.cppsrc/storage/audit_logger.cpp, src/storage/audit_entry.cppsession.log (engine ops, routing, timing) + session_model.log (full transcripts streamed live) — src/types/session_logger.cpp20+ hook points across the loop lifecycle. All registration / dispatch lives in src/core/hook_registry.cpp and include/entropic/types/hooks.h.
ON_LOOP_START, ON_LOOP_ITERATION, ON_LOOP_END — fired in src/core/engine.cpp::loopPRE_GENERATE (cancellable), POST_GENERATE (content-revisable), ON_STREAM_TOKEN — src/core/engine.cpp::execute_iteration, src/core/response_generator.cpp::stream_token_callbackPRE_TOOL_CALL, POST_TOOL_CALL, ON_PERMISSION_CHECK — src/mcp/tool_executor.cpp::fire_pre_tool_hook, fire_post_tool_hookON_DELEGATE, ON_COMPLETE, ON_TIER_SELECTED — src/core/engine.cpp::fire_complete_hookON_CONTEXT_ASSEMBLE, ON_ADAPTER_SWAP — src/core/context_manager.cpp, src/inference/adapter_manager.cppHook semantics:
src/facade/entropic_hooks.cppentropic mcp-bridge — pure stdio↔unix-socket relay (v2.1.7+, gh#34): forwards JSON-RPC bytes between an MCP client (Claude Code, VSCode, etc.) and a running engine's external bridge socket. Owns no engine instance and loads no model; an engine host (TUI / consumer app / future headless server) must already be running for the same project directory. Fails fast with a diagnostic naming the canonical path + socket when no engine is reachable — src/cli/mcp_bridge.cpp, src/facade/external_bridge.cppask_complete / progress events simultaneously — src/facade/external_bridge.cpp::subscribe, broadcast_notificationask_status polling for long-running tasks — src/facade/external_bridge.cpp::run_async_ask, handle_ask_statussrc/facade/external_bridge.cpp::attach_phase_observer, phase_observer_cbsrc/facade/external_bridge.cpp::observer_call_is_stale, include/entropic/mcp/external_bridge.h::observer_gen_entropic.context_clear interrupts and drains in-flight async tasks before returning — src/facade/external_bridge.cpp::cancel_inflight_async_tasks.so boundary — opaque handles, error codes via enum, no C++ ABI crossing, no exceptions across the boundary — include/entropic/entropic.h, src/facade/entropic.cppfind_package(entropic 2.1) CMake support with imported target entropic::entropic — cmake/entropic-config.cmake.in, CMakeLists.txtbin/, lib/, include/, share/ — standard Unix prefix — see docs/releasing.mdpip install entropic-engine — pure-Python ~50 KB ctypes wrapper + entropic install-engine subcommand that fetches the matching tarball from GitHub Releases (playwright pattern) — python/src/entropic/$ENTROPIC_LIB / $ENTROPIC_HOME env vars for custom install resolution — python/src/entropic/_loader.pyentropic CLI subcommands: mcp-bridge, download, version, plus wrapper-side install-engine — src/cli/main.cpp, python/src/entropic/cli.pyheadless (C), pychess (C++ multi-tier showcase), explorer (interactive REPL), openai-server (OpenAI-compat HTTP front-end with chat/completions, completions, models, models/{name}, health, SSE streaming) — examples/spdlog (e.g. mcp.tool_executor, core.response_generator, inference.orchestrator) — src/types/logging.cppLoopMetrics exposed through last_loop_metrics() and per-tier accessor maps — include/entropic/core/engine.h::last_loop_metrics, per_tier_metricsThroughputTracker integration on GenerationResult — src/inference/throughput_tracker.cppentropic_throughput_tok_per_sec() facade query — src/facade/entropic.cppsrc/core/hook_registry.cppsession.log + session_model.log per project_dir — src/types/session_logger.cppdoxygen-guard enforces inline documentation on every function (@brief, exemption tag, @version bumped on body change) — .doxygen-guard.yaml, .pre-commit-config.yamlknots enforces code-quality budget per function (cognitive complexity ≤ 15, McCabe ≤ 15, nesting ≤ 4, SLOC ≤ 50, ABC ≤ 10, returns ≤ 3) — .pre-commit-config.yamlgcovr enforced by inv check-coverage — tasks.py::check_coverage, .pre-commit-config.yaml.pre-commit-config.yamltests/unit/model-results-vX.Y.Z.json at each x.y.0 — tests/model/, tasks.py::testtests/distribution-smoke-consumer/ exercises the find_package(entropic) consumer experience end-to-end — tests/distribution-smoke-consumer/| Gate | When | What runs |
|---|---|---|
| Pre-commit | Every commit | Unit tests (CPU, no GPU) |
| Minor version | Each x.y.0 bump | Full model/benchmark suite (GPU) |
| Patch version | Each x.y.z bump | Unit tests only |
web MCP server's web_fetch / web_search tools, which the consumer opts into explicitly via config — src/mcp/servers/web.cpp| Key | Model | Size | VRAM |
|---|---|---|---|
primary | Qwen3.5-35B-A3B-UD-IQ3_XXS | 13.1 GB | 15+ GB |
mid | Qwen3.5-9B-Q8_0 | 9.5 GB | 12+ GB |
lightweight | Qwen3.5-4B-Q8_0 | 4.5 GB | 8+ GB |
Use path: primary (or mid, lightweight) in config — the engine resolves to the full model path via the bundled registry.
| Preset | Description | Use Case |
|---|---|---|
full | CUDA, all servers, tests | Development workstation |
dev | CPU, debug, tests | Fast iteration |
minimal-static | CPU, static .a, minimal servers | Embedded consumer |
game | CUDA, minimal MCP servers | Game engine integration |
coverage | CPU, gcov instrumentation | Coverage analysis |
| Example | Demonstrates | Language |
|---|---|---|
headless/main.c | Pure-C minimal harness — CI smoke target | C |
pychess/main.cpp | Multi-tier pipeline, grammar, delegation, external MCP | C++ |
explorer/main.cpp | Interactive C++ REPL for poking at the engine | C++ |
openai-server/src/main.cpp | OpenAI-compat HTTP front-end (chat/completions, models, SSE streaming) | C++ |
Each example has its own default_config.yaml (consumer defaults) and .{name}/ directory (session logs, local config).
Entropic runs entirely on your local hardware. No data is sent to external servers. No telemetry is collected. Your prompts, conversations, and model outputs never leave your machine.
Entropic runs AI models locally on your hardware. AI-generated outputs may be inaccurate, biased, or inappropriate. Users are solely responsible for evaluating and using any generated content. This software does not provide professional, legal, medical, or financial advice.
Apache-2.0. See [LICENSE](LICENSE) for the canonical text and [NOTICE](NOTICE) for third-party attribution. Contributors retain copyright in their contributions; see `CONTRIBUTING.md` for the DCO sign-off process.
Versions 2.0.0 through 2.2.1 were released under LGPL-3.0-or-later with a linking exception; those releases remain under that license. The relicense to Apache-2.0 takes effect at v2.2.2 and forward, and restores the permissive license used for v1.x.