Entropic 2.3.8
Local-first agentic inference engine
llama_cpp_backend.h File Reference

LlamaCppBackend: llama.cpp C API integration.

#include <entropic/inference/backend.h>
#include "prompt_cache.h"
#include <llama.h>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <vector>


Classes

class  entropic::LlamaCppBackend
 LlamaCppBackend: common llama.cpp patterns (15% layer).
 

Namespaces

namespace  entropic
 Activate model on GPU (WARM → ACTIVE).
 

Detailed Description

LlamaCppBackend — llama.cpp C API integration.

Versioned subclass pattern: LlamaCppBackend provides common llama.cpp patterns (decode loop, sampler chain, tokenization). The pinned-commit subclass (LlamaCppBackend_b8420) overrides API-version-specific calls.
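
The versioned subclass pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual Entropic class: `SketchBackend`, `SketchBackend_b8420`, `run`, `tokenize`, and `decode_one` are hypothetical names, and no llama.cpp functions are called. The base class owns the shared loop, and the pinned subclass overrides only the version-specific hook.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: SketchBackend stands in for LlamaCppBackend and
// SketchBackend_b8420 for the pinned-commit subclass. No llama.cpp calls.
class SketchBackend {
public:
    virtual ~SketchBackend() = default;

    // Shared pattern: tokenize the prompt, then run the decode loop.
    std::vector<int> run(const std::string& prompt) {
        std::vector<int> out;
        for (int token : tokenize(prompt)) {
            out.push_back(decode_one(token));  // version-specific hook
        }
        return out;
    }

protected:
    // Stand-in tokenizer (assumption): one token per byte of input.
    virtual std::vector<int> tokenize(const std::string& text) {
        std::vector<int> tokens;
        for (unsigned char c : text) tokens.push_back(c);
        return tokens;
    }

    // Each pinned version supplies its own implementation.
    virtual int decode_one(int token) = 0;
};

class SketchBackend_b8420 : public SketchBackend {
protected:
    // Pretend this wraps a call whose signature changed between commits.
    int decode_one(int token) override { return token + 1; }
};
```

With this split, moving to a newer llama.cpp commit would mean adding another subclass rather than editing the shared loop.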

VRAM lifecycle mapping
  • COLD: nothing allocated
  • WARM: llama_model loaded (CPU mmap+mlock, n_gpu_layers=0)
  • ACTIVE: model reloaded with gpu_layers, llama_context created
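
The three lifecycle states can be modelled as a small state machine. The sketch below is bookkeeping only and rests on assumptions: `VramState`, `LifecycleSketch`, and its methods are hypothetical names, the `unload` transition back to COLD is assumed rather than documented here, and a real backend would load the model and create the context through llama.cpp at the marked points.

```cpp
#include <cassert>

enum class VramState { Cold, Warm, Active };

// Bookkeeping-only sketch; no model or context is actually allocated.
class LifecycleSketch {
public:
    VramState state() const { return state_; }
    int gpu_layers_in_use() const { return gpu_layers_in_use_; }
    bool has_context() const { return has_context_; }

    // COLD -> WARM: model loaded on the CPU only (n_gpu_layers = 0).
    bool load() {
        if (state_ != VramState::Cold) return false;
        gpu_layers_in_use_ = 0;  // real code would load the model here
        state_ = VramState::Warm;
        return true;
    }

    // WARM -> ACTIVE: model reloaded with GPU layers, context created.
    bool activate(int gpu_layers) {
        if (state_ != VramState::Warm) return false;
        gpu_layers_in_use_ = gpu_layers;  // real code would reload here
        has_context_ = true;              // and create the context here
        state_ = VramState::Active;
        return true;
    }

    // Any state -> COLD (assumed transition): release everything.
    void unload() {
        gpu_layers_in_use_ = 0;
        has_context_ = false;
        state_ = VramState::Cold;
    }

private:
    VramState state_ = VramState::Cold;
    int gpu_layers_in_use_ = 0;
    bool has_context_ = false;
};
```

Rejecting out-of-order transitions (for example, activating a COLD model) keeps the invariant that a context exists only in the ACTIVE state.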

Key differences from Python LlamaCppBackend
  • Direct llama.cpp C API (not llama-cpp-python wrapper)
  • No Python GIL — generation runs natively
  • No asyncio bridge — streaming is synchronous with callback
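
Synchronous streaming with a callback might look like the sketch below. The names `TokenCallback` and `stream_pieces` and the "return false to stop" convention are assumptions made for illustration, not the backend's real interface. The point is that the callback runs on the caller's thread, between generation steps, with no event loop involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type: return false to stop generation early.
using TokenCallback = std::function<bool(const std::string& piece)>;

// Stand-in for a decode loop: "generates" the given pieces one at a time
// and calls on_token on the caller's thread before producing the next one.
// Returns the number of pieces delivered to the callback.
std::size_t stream_pieces(const std::vector<std::string>& pieces,
                          const TokenCallback& on_token) {
    std::size_t emitted = 0;
    for (const std::string& piece : pieces) {
        ++emitted;
        if (!on_token(piece)) break;  // caller asked to stop
    }
    return emitted;
}
```

Because the call blocks until generation finishes or the callback stops it, a caller that needs concurrency would have to run it on its own thread.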

Internal to the inference shared library (.so); not exposed across library boundaries.

Version
1.9.13

Definition in file llama_cpp_backend.h.