Entropic 2.3.8
Local-first agentic inference engine
Host-memory KV cache with LRU eviction.
#include </home/runner/work/entropic/entropic/src/inference/prompt_cache.h>
Public Member Functions | |
| PromptCache (size_t max_bytes) | |
| Construct with maximum RAM budget. | |
| bool | store (const CacheKey &key, std::vector< uint8_t > &&data, int token_count) |
| Store a KV cache snapshot. | |
| const CacheEntry * | lookup (const CacheKey &key) |
| Retrieve a cached KV snapshot. | |
| void | clear () |
| Evict all entries. | |
| size_t | bytes_used () const |
| Current total bytes consumed by cached entries. | |
| size_t | entry_count () const |
| Number of cached entries. | |
| CacheStats | stats () const |
| Cache hit/miss statistics. | |
Static Public Member Functions | |
| static CacheKey | make_key (std::string_view prompt_text, std::string_view model_path) |
| Compute a cache key from prompt text and model path. | |
Stores KV cache snapshots keyed by a content hash of the system prompt text concatenated with the model path. Evicts least-recently-used entries when the configured RAM limit is exceeded.
Definition at line 102 of file prompt_cache.h.
| explicit entropic::PromptCache::PromptCache | ( | size_t | max_bytes | ) |
Construct with maximum RAM budget.
| max_bytes | Maximum total bytes across all cached entries. 0 disables caching. |
Definition at line 49 of file prompt_cache.cpp.
| size_t entropic::PromptCache::bytes_used | ( | ) | const |
Current total bytes consumed by cached entries.
Definition at line 198 of file prompt_cache.cpp.
| void entropic::PromptCache::clear | ( | ) |
Evict all entries.
Called on model reload.
Definition at line 183 of file prompt_cache.cpp.
| size_t entropic::PromptCache::entry_count | ( | ) | const |
Number of cached entries.
Definition at line 209 of file prompt_cache.cpp.
| const CacheEntry * entropic::PromptCache::lookup | ( | const CacheKey & | key | ) |
Retrieve a cached KV snapshot.
| key | Hash to look up. |
On a hit, moves the entry to the front of the LRU list and increments the hit counter. On a miss, increments the miss counter.
Definition at line 154 of file prompt_cache.cpp.
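The hit and miss behavior described for lookup can be illustrated with a minimal sketch. It is not the code in prompt_cache.cpp: `SketchKey`, `SketchEntry`, and `LruLookupSketch` are hypothetical stand-ins, the key is assumed to be a 64-bit integer, and returning `nullptr` on a miss is an assumption, since the reference above does not state the miss return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real CacheKey and CacheEntry live in prompt_cache.h.
using SketchKey = std::uint64_t;

struct SketchEntry {
    SketchKey key;
    std::vector<std::uint8_t> data;
    int token_count;
};

class LruLookupSketch {
public:
    // Insert as most recently used. Budget handling is omitted in this sketch.
    void insert(SketchKey key, std::vector<std::uint8_t> data, int token_count) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            lru_.erase(it->second);  // drop the old entry stored under this key
        }
        lru_.push_front(SketchEntry{key, std::move(data), token_count});
        index_[key] = lru_.begin();
    }

    // Hit: move the entry to the front and count a hit.
    // Miss: count a miss and return nullptr (assumed, not documented above).
    const SketchEntry* lookup(SketchKey key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            ++misses_;
            return nullptr;
        }
        // splice relinks the node in place, so the stored iterator stays valid.
        lru_.splice(lru_.begin(), lru_, it->second);
        ++hits_;
        return &*it->second;
    }

    // Key of the most recently used entry; the cache must not be empty.
    SketchKey most_recent() const {
        assert(!lru_.empty());
        return lru_.front().key;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::list<SketchEntry> lru_;  // front = most recent, back = least recent
    std::unordered_map<SketchKey, std::list<SketchEntry>::iterator> index_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

Pairing a `std::list` with a hash map of iterators keeps both the lookup and the move-to-front step constant time on average, which is one common way to build an LRU structure.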
| static CacheKey entropic::PromptCache::make_key | ( | std::string_view | prompt_text, |
| std::string_view | model_path | ||
| ) |
Compute a cache key from prompt text and model path.
| prompt_text | Full system prompt string. |
| model_path | Model file path string. |
Concatenates prompt_text + '\0' + model_path and hashes the result with FNV-1a. The null separator prevents prefix collisions: without it, ("ab", "c") and ("a", "bc") would hash the same bytes.
Definition at line 67 of file prompt_cache.cpp.
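The key derivation described for make_key can be sketched as follows. The reference does not say which FNV-1a width is used or how `CacheKey` is represented, so this example assumes the 64-bit variant and a plain `std::uint64_t` key; `fnv1a64` and `sketch_make_key` are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical key type; the real CacheKey is defined in prompt_cache.h.
using SketchKey = std::uint64_t;

// Standard 64-bit FNV-1a parameters.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// FNV-1a: XOR each byte into the hash, then multiply by the prime.
// Passing a previous result as `hash` continues hashing where it left off.
inline std::uint64_t fnv1a64(std::string_view bytes,
                             std::uint64_t hash = kFnvOffsetBasis) {
    for (unsigned char b : bytes) {
        hash ^= b;
        hash *= kFnvPrime;
    }
    return hash;
}

// Hashes prompt_text + '\0' + model_path without building the joined string.
inline SketchKey sketch_make_key(std::string_view prompt_text,
                                 std::string_view model_path) {
    std::uint64_t hash = fnv1a64(prompt_text);
    hash = fnv1a64(std::string_view("\0", 1), hash);  // the null separator
    return fnv1a64(model_path, hash);
}
```

Because FNV-1a processes one byte at a time, feeding the three pieces in sequence gives the same result as hashing the concatenated buffer, and it avoids a temporary allocation for long prompts.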
| CacheStats entropic::PromptCache::stats | ( | ) | const |
Cache hit/miss statistics.
Definition at line 220 of file prompt_cache.cpp.
| bool entropic::PromptCache::store | ( | const CacheKey & | key, |
| std::vector< uint8_t > && | data, | ||
| int | token_count | ||
| ) |
Store a KV cache snapshot.
| key | Hash of prompt content + model path. |
| data | Raw KV cache bytes from llama_state_seq_get_data. Ownership is moved into the cache. |
| token_count | Number of prompt tokens this entry covers. |
If the entry size exceeds max_bytes, returns false without storing. Otherwise evicts LRU entries as needed and stores the new entry.
Definition at line 91 of file prompt_cache.cpp.
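The store policy described above (reject oversized entries, otherwise evict from the least-recently-used end until the new entry fits) might look roughly like this. It is a sketch under stated assumptions, not the code in prompt_cache.cpp: `LruStoreSketch` and its helper types are hypothetical, only payload bytes are counted against the budget, and replacing an existing entry under the same key is an assumed behavior that the reference does not document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real CacheKey and CacheEntry live in prompt_cache.h.
using SketchKey = std::uint64_t;

struct SketchEntry {
    SketchKey key;
    std::vector<std::uint8_t> data;
    int token_count;
};

class LruStoreSketch {
public:
    explicit LruStoreSketch(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    bool store(SketchKey key, std::vector<std::uint8_t>&& data, int token_count) {
        const std::size_t size = data.size();
        if (size > max_bytes_) {
            return false;  // can never fit, so nothing is evicted or stored
        }
        // Assumption: an existing entry under the same key is replaced.
        auto existing = index_.find(key);
        if (existing != index_.end()) {
            bytes_used_ -= existing->second->data.size();
            lru_.erase(existing->second);
            index_.erase(existing);
        }
        // Evict least-recently-used entries from the back until the entry fits.
        while (bytes_used_ + size > max_bytes_) {
            assert(!lru_.empty());
            const SketchEntry& victim = lru_.back();
            bytes_used_ -= victim.data.size();
            index_.erase(victim.key);
            lru_.pop_back();
        }
        lru_.push_front(SketchEntry{key, std::move(data), token_count});
        index_[key] = lru_.begin();
        bytes_used_ += size;
        return true;
    }

    bool contains(SketchKey key) const { return index_.count(key) != 0; }
    std::size_t bytes_used() const { return bytes_used_; }
    std::size_t entry_count() const { return lru_.size(); }

private:
    std::size_t max_bytes_;
    std::size_t bytes_used_ = 0;
    std::list<SketchEntry> lru_;  // front = most recent, back = least recent
    std::unordered_map<SketchKey, std::list<SketchEntry>::iterator> index_;
};
```

With a 10-byte budget, storing a 6-byte and then a 4-byte entry fills the cache; storing a further 5-byte entry evicts the 6-byte one, which is the oldest, and leaves 9 bytes in use.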