Host-memory KV cache with LRU eviction.

#include </home/runner/work/entropic/entropic/src/inference/prompt_cache.h>

Public Member Functions
	PromptCache (size_t max_bytes)
	Construct with maximum RAM budget.

bool	store (const CacheKey &key, std::vector< uint8_t > &&data, int token_count)
	Store a KV cache snapshot.

const CacheEntry *	lookup (const CacheKey &key)
	Retrieve a cached KV snapshot.

void	clear ()
	Evict all entries.

size_t	bytes_used () const
	Current total bytes consumed by cached entries.

size_t	entry_count () const
	Number of cached entries.

CacheStats	stats () const
	Cache hit/miss statistics.

Static Public Member Functions
static CacheKey	make_key (std::string_view prompt_text, std::string_view model_path)
	Compute a cache key from prompt text and model path.

Detailed Description

Host-memory KV cache with LRU eviction.

Stores KV cache snapshots keyed by content hash of the system prompt text concatenated with the model path. Evicts least-recently-used entries when the configured RAM limit is exceeded.

Thread safety: All public methods acquire mutex_. The expensive llama.cpp calls (llama_decode, llama_state_seq_get/set_data) happen OUTSIDE the cache mutex in the caller (LlamaCppBackend).
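The locking discipline described above can be sketched as follows. This is a minimal illustration, not the library's code: `GuardedStore`, `get_copy`, `put`, and `get_or_compute` are hypothetical names, and the sketch returns copies rather than the entry pointer that `lookup()` returns. The point it shows is that the mutex is held only for map bookkeeping, while the expensive step (standing in for `llama_decode` and the state get/set calls) runs with no lock held.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in, NOT entropic::PromptCache: a map guarded by one mutex.
class GuardedStore {
public:
    // Returns a copy so the caller never holds a pointer into the guarded map.
    std::optional<Bytes> get_copy(std::uint64_t key) const {
        std::lock_guard<std::mutex> lock(mutex_);  // held only for the map lookup
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    void put(std::uint64_t key, Bytes data) {
        std::lock_guard<std::mutex> lock(mutex_);  // held only for the insert
        map_[key] = std::move(data);
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, Bytes> map_;
};

// Caller-side pattern: check the store, and on a miss run the expensive
// computation with no lock held, then publish the result.
template <class Expensive>
Bytes get_or_compute(GuardedStore& store, std::uint64_t key, Expensive&& expensive) {
    if (auto hit = store.get_copy(key)) return std::move(*hit);
    Bytes fresh = expensive();  // stands in for the decode + state export
    store.put(key, fresh);      // copy into the store, keep one for the caller
    return fresh;
}
```

With this pattern two threads that miss on the same key may both run the expensive step; the later `put` simply overwrites the earlier one. That trade-off keeps the critical sections short.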

Lifecycle:

    PromptCache cache(max_bytes);
    cache.store(key, std::move(data), token_count);  // after system prompt decode
    auto* entry = cache.lookup(key);                 // before next decode
    cache.clear();                                   // on model unload

Version: 1.8.3

Definition at line 102 of file prompt_cache.h.

Constructor & Destructor Documentation

◆ PromptCache()

entropic::PromptCache::PromptCache ( size_t max_bytes )

explicit

Construct with maximum RAM budget.

Parameters

max_bytes Maximum total bytes across all cached entries. 0 = caching disabled.

Version: 1.8.3

Definition at line 49 of file prompt_cache.cpp.

Member Function Documentation

◆ bytes_used()

size_t entropic::PromptCache::bytes_used ( ) const

Current total bytes consumed by cached entries.

Returns: Byte count.

Version: 1.8.3

Definition at line 198 of file prompt_cache.cpp.

◆ clear()

void entropic::PromptCache::clear ( )

Evict all entries.

Called on model reload.

Version: 1.8.3

Definition at line 183 of file prompt_cache.cpp.

◆ entry_count()

size_t entropic::PromptCache::entry_count ( ) const

Number of cached entries.

Returns: Entry count.

Version: 1.8.3

Definition at line 209 of file prompt_cache.cpp.

◆ lookup()

const CacheEntry * entropic::PromptCache::lookup ( const CacheKey & key )

Retrieve a cached KV snapshot.

Parameters

key	Hash to look up.

Returns: Pointer to cached entry if found, nullptr on miss. Pointer valid until next store() or clear() call. Updates LRU ordering on hit.

Version: 1.8.3

On hit, moves the entry to front of LRU list and increments hit counter. On miss, increments miss counter.

Definition at line 154 of file prompt_cache.cpp.

◆ make_key()

CacheKey entropic::PromptCache::make_key	(	std::string_view	prompt_text,
		std::string_view	model_path
	)

static

Compute a cache key from prompt text and model path.

Parameters

prompt_text	Full system prompt string.
model_path	Model file path string.

Returns: CacheKey with combined hash.

Version: 1.8.3

Concatenates prompt_text + '\0' + model_path and hashes the result with FNV-1a. The null separator prevents prefix collisions: without it, ("ab", "c") and ("a", "bc") would both hash the bytes "abc".
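A minimal sketch of the hashing scheme described above. It assumes the 64-bit FNV-1a variant with its standard offset basis and prime; the source does not state the hash width or the layout of CacheKey, so `SketchKey`, `fnv1a_feed`, and `sketch_make_key` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for CacheKey; the real type's layout is not shown here.
struct SketchKey {
    std::uint64_t hash;
    bool operator==(const SketchKey& other) const { return hash == other.hash; }
};

// Standard 64-bit FNV-1a parameters (assumed width).
constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime  = 1099511628211ULL;

// Folds a run of bytes into a running FNV-1a state: XOR first, then multiply.
inline std::uint64_t fnv1a_feed(std::uint64_t h, std::string_view bytes) {
    for (unsigned char c : bytes) {
        h ^= c;
        h *= kFnvPrime;
    }
    return h;
}

// Hashes prompt_text, a single '\0' byte, then model_path, which is
// equivalent to hashing the concatenation without building a temporary string.
inline SketchKey sketch_make_key(std::string_view prompt_text,
                                 std::string_view model_path) {
    std::uint64_t h = kFnvOffset;
    h = fnv1a_feed(h, prompt_text);
    h = fnv1a_feed(h, std::string_view("\0", 1));  // explicit length keeps the NUL
    h = fnv1a_feed(h, model_path);
    return SketchKey{h};
}
```

Feeding the three pieces in sequence avoids allocating a concatenated buffer for long system prompts. Whether the real implementation does the same is not stated in the source.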

Definition at line 67 of file prompt_cache.cpp.

◆ stats()

CacheStats entropic::PromptCache::stats ( ) const

Cache hit/miss statistics.

Returns: Copy of current stats.

Version: 1.8.3

Definition at line 220 of file prompt_cache.cpp.

◆ store()

bool entropic::PromptCache::store	(	const CacheKey &	key,
		std::vector< uint8_t > &&	data,
		int	token_count
	)

Store a KV cache snapshot.

Parameters

key	Hash of prompt content + model path.
data	Raw KV cache bytes from llama_state_seq_get_data (moved into the cache).
token_count	Number of prompt tokens this entry covers.

Returns: true if stored (may evict older entries), false if entry exceeds max_bytes entirely and cannot be stored.

Version: 1.8.3

If the entry size exceeds max_bytes, returns false without storing. Otherwise evicts LRU entries as needed and stores the new entry.
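The store and eviction rules above, together with the LRU update performed by lookup(), can be sketched with a list plus an index map. This is an illustrative stand-in rather than the library's implementation: `SketchLruCache` is a hypothetical name, keys are plain 64-bit integers, there is no mutex, and replacing an existing entry with the same key is an assumption the source does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical byte-budgeted LRU cache; front of the list = most recently used.
class SketchLruCache {
public:
    using Bytes = std::vector<std::uint8_t>;

    explicit SketchLruCache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    // Returns false if caching is disabled (budget 0) or the entry alone
    // exceeds the budget; otherwise evicts from the back until it fits.
    bool store(std::uint64_t key, Bytes&& data) {
        const std::size_t size = data.size();
        if (max_bytes_ == 0 || size > max_bytes_) return false;
        erase(key);  // assumption: a repeated key replaces the old entry
        while (bytes_used_ + size > max_bytes_) erase(lru_.back().first);
        lru_.emplace_front(key, std::move(data));
        index_[key] = lru_.begin();
        bytes_used_ += size;
        return true;
    }

    // On a hit, moves the entry to the front and counts a hit; else counts a miss.
    const Bytes* lookup(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) { ++misses_; return nullptr; }
        lru_.splice(lru_.begin(), lru_, it->second);  // iterators stay valid
        ++hits_;
        return &it->second->second;
    }

    std::size_t bytes_used() const { return bytes_used_; }
    std::size_t entry_count() const { return index_.size(); }
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    using Node = std::pair<std::uint64_t, Bytes>;

    // Takes the key by value so it stays usable while its node is destroyed.
    void erase(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) return;
        bytes_used_ -= it->second->second.size();
        lru_.erase(it->second);
        index_.erase(it);
    }

    std::size_t max_bytes_;
    std::size_t bytes_used_ = 0;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
    std::list<Node> lru_;
    std::unordered_map<std::uint64_t, std::list<Node>::iterator> index_;
};
```

In this sketch a pointer returned by lookup stays valid until the entry is evicted or replaced, which matches the documented "valid until next store() or clear()" contract only in the conservative sense: callers should copy what they need before storing again.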

Definition at line 91 of file prompt_cache.cpp.

The documentation for this class was generated from the following files:

inference/prompt_cache.h
inference/prompt_cache.cpp
