Entropic 2.3.8
Local-first agentic inference engine
entropic::InferenceBackend Class Reference (abstract)

Concrete base class for inference backends (80% logic).

#include <entropic/inference/backend.h>

Inheritance diagram for entropic::InferenceBackend:

Public Member Functions

bool load (const ModelConfig &config)
 Load model into CPU RAM (COLD → WARM).
 
bool activate ()
 Promote to GPU (WARM → ACTIVE).
 
void deactivate ()
 Release GPU layers (ACTIVE → WARM).
 
void unload ()
 Full unload (→ COLD).
 
bool load_and_activate (const ModelConfig &config)
 Convenience: load() + activate().
 
GenerationResult generate (const std::vector< Message > &messages, const GenerationParams &params)
 Generate a complete response.
 
GenerationResult generate_streaming (const std::vector< Message > &messages, const GenerationParams &params, std::function< void(std::string_view token)> on_token, std::atomic< bool > &cancel)
 Generate with per-token streaming callback.
 
GenerationResult generate_speculative (const std::vector< Message > &messages, const GenerationParams &params, std::function< void(std::string_view token)> on_token, std::atomic< bool > &cancel)
 Generate via the speculative-decoding kernel (v2.1.11).
 
GenerationResult complete (const std::string &prompt, const GenerationParams &params)
 Raw text completion without chat template.
 
LogprobResult evaluate_logprobs (const int32_t *tokens, int n_tokens)
 Evaluate per-token log-probabilities for a token sequence.
 
float compute_perplexity (const int32_t *tokens, int n_tokens)
 Compute perplexity for a token sequence.
 
ModelState state () const
 Current lifecycle state (lock-free read).
 
bool is_active () const
 True when state is ACTIVE.
 
bool is_loaded () const
 True when state is WARM or ACTIVE.
 
int count_tokens (const std::string &text) const
 Count tokens using model's tokenizer.
 
virtual std::vector< int32_t > tokenize_text (const std::string &text) const
 Tokenize text to token IDs.
 
int context_length () const
 Model's context window size.
 
virtual void clear_prompt_cache ()
 Invalidate any backend-owned prompt/KV caches.
 
const ModelConfig & config () const
 Stored model config.
 
bool supports (BackendCapability cap) const
 Query whether this backend supports a capability.
 
std::vector< BackendCapability > capabilities () const
 Get all supported capabilities as a vector.
 
BackendInfo info () const
 Get backend metadata.
 
bool save_state (int seq_id, std::vector< uint8_t > &buffer) const
 Save model state to buffer.
 
bool restore_state (int seq_id, const std::vector< uint8_t > &buffer)
 Restore model state from buffer.
 
bool clear_state (int seq_id=-1)
 Clear/reset model state for a sequence.
 
GenerationResult generate_seq (int seq_id, const std::vector< Message > &messages, const GenerationParams &params)
 Generate with explicit sequence ID.
 
GenerationResult generate_streaming_seq (int seq_id, const std::vector< Message > &messages, const GenerationParams &params, std::function< void(std::string_view token)> on_token, std::atomic< bool > &cancel)
 Streaming generation with explicit sequence ID.
 

Protected Member Functions

virtual bool do_load (const ModelConfig &config)=0
 Load model into CPU RAM.
 
virtual bool do_activate ()=0
 Promote loaded model to GPU.
 
virtual void do_deactivate ()=0
 Release GPU, keep CPU.
 
virtual void do_unload ()=0
 Full unload.
 
virtual GenerationResult do_generate (const std::vector< Message > &messages, const GenerationParams &params)=0
 Subclass generation.
 
virtual GenerationResult do_generate_streaming (const std::vector< Message > &messages, const GenerationParams &params, std::function< void(std::string_view token)> on_token, std::atomic< bool > &cancel)=0
 Subclass streaming generation.
 
virtual GenerationResult do_generate_speculative (const std::vector< Message > &messages, const GenerationParams &params, std::function< void(std::string_view token)> on_token, std::atomic< bool > &cancel)
 Subclass speculative-decoding streaming generation.
 
virtual GenerationResult do_complete (const std::string &prompt, const GenerationParams &params)=0
 Subclass raw completion.
 
virtual int do_count_tokens (const std::string &text) const =0
 Subclass token counting.
 
virtual LogprobResult do_evaluate_logprobs (const int32_t *tokens, int n_tokens)=0
 Backend-specific logprob evaluation.
 
virtual bool do_supports (BackendCapability cap) const
 Declare supported capabilities.
 
virtual std::string do_backend_name () const =0
 Return backend name identifier.
 
virtual BackendInfo do_info () const
 Populate backend metadata.
 
virtual bool do_save_state (int seq_id, std::vector< uint8_t > &buffer) const
 Save model state (KV cache or hidden state).
 
virtual bool do_restore_state (int seq_id, const std::vector< uint8_t > &buffer)
 Restore model state.
 
virtual bool do_clear_state (int seq_id)
 Clear/reset model state.
 
virtual GenerationResult do_generate_seq (int seq_id, const std::vector< Message > &messages, const GenerationParams &params)
 Generate with sequence ID.
 
virtual GenerationResult do_generate_streaming_seq (int seq_id, const std::vector< Message > &messages, const GenerationParams &params, std::function< void(std::string_view token)> on_token, std::atomic< bool > &cancel)
 Streaming generation with sequence ID.
 
bool fire_model_load_hook (const ModelConfig &config)
 Fire ON_MODEL_LOAD pre-hook.
 
void set_hooks (const HookInterface &hooks)
 Set the hook dispatch interface.
 

Protected Attributes

std::string last_error_
 Last error message for diagnostics.
 

Detailed Description

Concrete base class for inference backends (80% logic).

Public methods implement the lifecycle state machine, transition locking, timing, and logging. Protected virtual methods are the 20% that subclasses override with backend-specific logic.
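
The split between public wrappers and protected do_* hooks can be illustrated with a miniature example. This is only a sketch of the pattern: `ToyBase`, `EchoBackend`, and `set_active` are hypothetical names, and the real class also handles locking, timing, and logging in its public methods.

```cpp
#include <string>

// Hypothetical miniature of the public-wrapper / protected-hook split.
class ToyBase {
public:
    virtual ~ToyBase() = default;

    // Public and non-virtual: the shared state check lives here, once.
    std::string generate(const std::string& prompt) {
        if (!active_) return "error: not ACTIVE";
        return do_generate(prompt);  // backend-specific part
    }

    void set_active(bool active) { active_ = active; }

protected:
    // The piece a backend overrides.
    virtual std::string do_generate(const std::string& prompt) = 0;

private:
    bool active_ = false;
};

// A trivial backend that only supplies the hook.
class EchoBackend : public ToyBase {
protected:
    std::string do_generate(const std::string& prompt) override {
        return "echo: " + prompt;
    }
};
```

With this shape, a new backend cannot skip the state check, because callers only ever reach the hook through the public wrapper.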

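Transition rules (enforced by base class)

load(): COLD → WARM
activate(): WARM → ACTIVE (loads first if COLD)
deactivate(): ACTIVE → WARM
unload(): any state → COLD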

Invalid transitions are no-ops with INFO log (not errors).
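
The lifecycle transitions named in the member summaries (load, activate, deactivate, unload) can be modeled as a small state machine. The sketch below is illustrative only: `ToyState` and `ToyLifecycle` are hypothetical, the INFO log is a plain vector of strings, and the choice to return true from a no-op is an assumption rather than documented behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for ModelState.
enum class ToyState { COLD, WARM, ACTIVE };

class ToyLifecycle {
public:
    bool load() {  // COLD -> WARM
        if (state_ != ToyState::COLD) return noop("load");
        state_ = ToyState::WARM;
        return true;
    }
    bool activate() {  // WARM -> ACTIVE, loading first if COLD
        if (state_ == ToyState::ACTIVE) return noop("activate");
        if (state_ == ToyState::COLD && !load()) return false;
        state_ = ToyState::ACTIVE;
        return true;
    }
    void deactivate() {  // ACTIVE -> WARM
        if (state_ != ToyState::ACTIVE) { noop("deactivate"); return; }
        state_ = ToyState::WARM;
    }
    void unload() { state_ = ToyState::COLD; }  // any state -> COLD

    ToyState state() const { return state_; }
    const std::vector<std::string>& log() const { return log_; }

private:
    // Invalid transitions are recorded and otherwise ignored.
    bool noop(const std::string& what) {
        log_.push_back("INFO: " + what + " ignored in current state");
        return true;  // assumption: a no-op still reports success
    }
    ToyState state_ = ToyState::COLD;
    std::vector<std::string> log_;
};
```

Calling `deactivate()` on a COLD instance leaves the state unchanged and adds one log entry, which mirrors the "no-op with INFO log" rule.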

Version
1.9.13

Definition at line 69 of file backend.h.

Member Function Documentation

◆ activate()

bool entropic::InferenceBackend::activate ( )

Promote to GPU (WARM → ACTIVE).

Loads first if COLD.

Returns
true on success, false on failure.
Version
1.8.2

Definition at line 88 of file backend.cpp.

◆ capabilities()

std::vector< BackendCapability > entropic::InferenceBackend::capabilities ( ) const

Get all supported capabilities as a vector.

Convenience method. Iterates the BackendCapability enum and calls supports() on each value.

Returns
Vector of capabilities this backend supports.
Version
1.9.13

Definition at line 468 of file backend.cpp.
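
The "iterate the enum, call supports() on each" approach can be sketched as follows. The enumerators of BackendCapability are not listed on this page, so `ToyCapability`, its values, and the `Count_` sentinel are all hypothetical.

```cpp
#include <vector>

// Hypothetical capability enum; Count_ is a sentinel marking the end.
enum class ToyCapability { Streaming, Logprobs, StateSave, Count_ };

struct ToyBackend {
    // Stand-in for supports(): this toy backend claims two capabilities.
    bool supports(ToyCapability cap) const {
        return cap == ToyCapability::Streaming || cap == ToyCapability::Logprobs;
    }

    // Walk every enumerator below the sentinel and keep the supported ones.
    std::vector<ToyCapability> capabilities() const {
        std::vector<ToyCapability> out;
        for (int i = 0; i < static_cast<int>(ToyCapability::Count_); ++i) {
            const auto cap = static_cast<ToyCapability>(i);
            if (supports(cap)) out.push_back(cap);
        }
        return out;
    }
};
```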

◆ clear_prompt_cache()

virtual void entropic::InferenceBackend::clear_prompt_cache ( )
inline virtual

Invalidate any backend-owned prompt/KV caches.

Called when identity or prompt-prefix inputs change, so that stale cached prefixes are never served against the new system prompt. The default is a no-op for backends with no cache. (P1-7, 2.0.6-rc16)

Version
2.0.6-rc16

Reimplemented in entropic::LlamaCppBackend.

Definition at line 270 of file backend.h.

◆ clear_state()

bool entropic::InferenceBackend::clear_state ( int  seq_id = -1)

Clear/reset model state for a sequence.

For transformers: clears the KV cache. For recurrent models: resets hidden state to initial values. Requires WARM or ACTIVE.

Parameters
seq_id: Sequence identifier (-1 for all sequences).
Returns
true on success.
Version
1.9.13

Definition at line 548 of file backend.cpp.

◆ complete()

GenerationResult entropic::InferenceBackend::complete ( const std::string &  prompt,
const GenerationParams &  params 
)

Raw text completion without chat template.

Requires ACTIVE state.

Parameters
prompt: Raw prompt string (no chat formatting).
params: Generation parameters.
Returns
GenerationResult.
Version
1.8.2

Definition at line 308 of file backend.cpp.

◆ compute_perplexity()

float entropic::InferenceBackend::compute_perplexity ( const int32_t *  tokens,
int  n_tokens 
)

Compute perplexity for a token sequence.

Convenience wrapper: calls evaluate_logprobs() and returns only the perplexity value.

Parameters
tokens: Array of token IDs.
n_tokens: Number of tokens (minimum 2).
Returns
Perplexity as exp(-mean(logprobs)).
Version
1.9.10

Definition at line 401 of file backend.cpp.
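
The formula exp(-mean(logprobs)) can be worked through in isolation. The helper below is a hypothetical free function, not part of the library; it assumes natural-log probabilities and uses double where the real method returns float.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Perplexity = exp(-mean(logprobs)), with natural-log probabilities assumed.
inline double perplexity_from_logprobs(const std::vector<double>& logprobs) {
    if (logprobs.empty()) {
        throw std::invalid_argument("need at least one logprob");
    }
    double sum = 0.0;
    for (double lp : logprobs) sum += lp;
    const double mean = sum / static_cast<double>(logprobs.size());
    return std::exp(-mean);
}
```

As a sanity check, if every token has probability 0.5, each logprob is log(0.5) and the perplexity comes out at 2: the model is as uncertain as a fair coin flip per token.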

◆ config()

const ModelConfig & entropic::InferenceBackend::config ( ) const
inline

Stored model config.

Returns
Const reference to the ModelConfig used for this backend.
Version
1.8.2

Definition at line 278 of file backend.h.

◆ context_length()

int entropic::InferenceBackend::context_length ( ) const
inline

Model's context window size.

Returns
Maximum context length in tokens.
Version
1.8.2

Definition at line 257 of file backend.h.

◆ count_tokens()

int entropic::InferenceBackend::count_tokens ( const std::string &  text) const

Count tokens using model's tokenizer.

Exact if the model is loaded; a len/4 estimate if COLD.

Parameters
text: Text to tokenize.
Returns
Exact token count if loaded, len/4 estimate if COLD.
Version
1.8.2

Definition at line 442 of file backend.cpp.
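
The exact-or-estimate behavior might look roughly like the sketch below. `count_tokens_sketch` and its parameters are hypothetical, and reading "len/4" as byte length with integer division is an assumption; the page does not say how the estimate is rounded.

```cpp
#include <functional>
#include <string>

// Hypothetical sketch: use the tokenizer when loaded, else estimate len/4.
inline int count_tokens_sketch(
    const std::string& text,
    bool loaded,
    const std::function<int(const std::string&)>& exact_tokenizer) {
    if (loaded && exact_tokenizer) {
        return exact_tokenizer(text);  // exact count from the model's tokenizer
    }
    // Assumption: byte length divided by 4, truncated toward zero.
    return static_cast<int>(text.size() / 4);
}
```

An estimate of this kind lets callers budget context before a model is loaded, at the cost of accuracy on text that tokenizes unusually.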

◆ deactivate()

void entropic::InferenceBackend::deactivate ( )

Release GPU layers (ACTIVE → WARM).

No-op if not ACTIVE.

Version
1.8.2

Definition at line 117 of file backend.cpp.

◆ do_activate()

virtual bool entropic::InferenceBackend::do_activate ( )
protected pure virtual

Promote loaded model to GPU.

Called under transition_mutex_.

Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_backend_name()

virtual std::string entropic::InferenceBackend::do_backend_name ( ) const
protected pure virtual

Return backend name identifier.

Returns
Short name (e.g. "llama.cpp", "axcl").

Pure virtual — every backend must identify itself.

Version
1.9.13

Implemented in entropic::LlamaCppBackend.

◆ do_clear_state()

bool entropic::InferenceBackend::do_clear_state ( int  seq_id)
protected virtual

Clear/reset model state.

Default: state clear not supported.

Parameters
seq_id: Sequence ID, or -1 for all.
Returns
true on success. Default: returns false (not supported).
Version
1.9.13

Reimplemented in entropic::LlamaCppBackend.

Definition at line 688 of file backend.cpp.

◆ do_complete()

virtual GenerationResult entropic::InferenceBackend::do_complete ( const std::string &  prompt,
const GenerationParams &  params 
)
protected pure virtual

Subclass raw completion.

Called only when ACTIVE.

Parameters
prompt: Raw prompt string (no chat template applied).
params: Generation parameters.
Returns
Generation result populated by the subclass.
Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_count_tokens()

virtual int entropic::InferenceBackend::do_count_tokens ( const std::string &  text) const
protected pure virtual

Subclass token counting.

Called only when the model is loaded.

Parameters
text: Text whose tokens should be counted.
Returns
Token count for the provided text.
Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_deactivate()

virtual void entropic::InferenceBackend::do_deactivate ( )
protected pure virtual

Release GPU, keep CPU.

Called under transition_mutex_.

Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_evaluate_logprobs()

virtual LogprobResult entropic::InferenceBackend::do_evaluate_logprobs ( const int32_t *  tokens,
int  n_tokens 
)
protected pure virtual

Backend-specific logprob evaluation.

Parameters
tokens: Token IDs to evaluate.
n_tokens: Number of tokens.
Returns
LogprobResult with per-token logprobs (N-1 values).

Called by evaluate_logprobs() after state validation and eval_mutex_ acquisition. The base class handles state assertion, minimum token count validation, the mutex, perplexity computation from logprobs, and logging. The implementation handles batch allocation, decode calls, logit extraction, and the temporary seq_id lifecycle.

Version
1.9.10

Implemented in entropic::LlamaCppBackend.

◆ do_generate()

virtual GenerationResult entropic::InferenceBackend::do_generate ( const std::vector< Message > &  messages,
const GenerationParams &  params 
)
protected pure virtual

Subclass generation.

Called only when ACTIVE.

Parameters
messages: Conversation history.
params: Generation parameters.
Returns
Generation result populated by the subclass.
Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_generate_seq()

GenerationResult entropic::InferenceBackend::do_generate_seq ( int  seq_id,
const std::vector< Message > &  messages,
const GenerationParams &  params 
)
protected virtual

Generate with sequence ID.

Default: ignores seq_id, delegates to do_generate().

Parameters
seq_id: Sequence identifier (ignored by the default implementation).
messages: Conversation history.
params: Generation parameters.
Returns
GenerationResult (from do_generate() in the default implementation).
Version
1.9.13

Definition at line 701 of file backend.cpp.

◆ do_generate_speculative()

GenerationResult entropic::InferenceBackend::do_generate_speculative ( const std::vector< Message > &  messages,
const GenerationParams &  params,
std::function< void(std::string_view token)>  on_token,
std::atomic< bool > &  cancel 
)
protected virtual

Subclass speculative-decoding streaming generation.

Same contract as do_generate_streaming (the callback fires once per accepted token, and the cancel flag is honored between accept rounds), but the backend is expected to drive a draft model through common_speculative_* (or equivalent) to propose tokens, then verify them in batch against the target. Output distribution MUST be bit-identical to plain decode on rejection cases (correctness contract from the v2.1.11 proposal).

The default implementation returns a result with ENTROPIC_ERROR_NOT_SUPPORTED, so that the orchestrator falls back to the standard streaming path (plain do_generate_streaming). Backends that implement the speculative kernel override this method. (v2.1.11, gh#36)

Parameters
messages: Conversation history (unused in the default implementation).
params: Generation parameters (unused in the default implementation).
on_token: Callback for each accepted token (unused in the default implementation).
cancel: Atomic cancel flag (unused in the default implementation).
Returns
Generation result. On NOT_SUPPORTED, the caller should fall back to do_generate_streaming.
Version
2.1.11

Reimplemented in entropic::LlamaCppBackend.

Definition at line 286 of file backend.cpp.
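
The fallback described here (try speculative, drop to plain streaming on NOT_SUPPORTED) can be sketched on the caller side. Everything below is hypothetical scaffolding: `ToyError`, `ToyResult`, `ToyBackend`, and `generate_with_fallback` stand in for the real error code, GenerationResult, backend, and orchestrator, whose actual definitions are not shown on this page.

```cpp
#include <functional>
#include <string>
#include <string_view>

enum class ToyError { Ok, NotSupported };  // hypothetical error codes

struct ToyResult {
    ToyError error = ToyError::Ok;
    std::string content;
};

using TokenFn = std::function<void(std::string_view)>;

struct ToyBackend {
    virtual ~ToyBackend() = default;

    // Plain streaming path: emits a fixed token sequence.
    virtual ToyResult streaming(const TokenFn& on_token) {
        static const char* const kTokens[] = {"plain", " ", "decode"};
        ToyResult r;
        for (const char* tok : kTokens) {
            on_token(tok);
            r.content += tok;
        }
        return r;
    }

    // Default speculative path: not implemented, mirrors the documented stub.
    virtual ToyResult speculative(const TokenFn&) {
        ToyResult r;
        r.error = ToyError::NotSupported;
        return r;
    }
};

// Caller-side fallback: try speculative first, then plain streaming.
inline ToyResult generate_with_fallback(ToyBackend& backend, const TokenFn& on_token) {
    ToyResult r = backend.speculative(on_token);
    if (r.error == ToyError::NotSupported) {
        r = backend.streaming(on_token);
    }
    return r;
}
```

Because the stub never invokes the callback, the caller sees tokens from exactly one path, so the fallback is invisible to the consumer of the stream.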

◆ do_generate_streaming()

virtual GenerationResult entropic::InferenceBackend::do_generate_streaming ( const std::vector< Message > &  messages,
const GenerationParams &  params,
std::function< void(std::string_view token)>  on_token,
std::atomic< bool > &  cancel 
)
protected pure virtual

Subclass streaming generation.

Called only when ACTIVE.

Parameters
messages: Conversation history.
params: Generation parameters.
on_token: Callback invoked per emitted token.
cancel: Atomic flag; when true, the subclass must stop streaming.
Returns
Generation result populated by the subclass.
Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_generate_streaming_seq()

GenerationResult entropic::InferenceBackend::do_generate_streaming_seq ( int  seq_id,
const std::vector< Message > &  messages,
const GenerationParams &  params,
std::function< void(std::string_view token)>  on_token,
std::atomic< bool > &  cancel 
)
protected virtual

Streaming generation with sequence ID.

Default: ignores seq_id, delegates to do_generate_streaming().

Parameters
seq_id: Sequence identifier (ignored by the default implementation).
messages: Conversation history.
params: Generation parameters.
on_token: Per-token callback.
cancel: Cancellation flag.
Returns
GenerationResult (from do_generate_streaming() in the default implementation).
Version
1.9.13

Definition at line 720 of file backend.cpp.

◆ do_info()

BackendInfo entropic::InferenceBackend::do_info ( ) const
protected virtual

Populate backend metadata.

Default: returns a BackendInfo with only the name populated, taken from do_backend_name().

Returns
BackendInfo with model-specific details; the default carries only the name from do_backend_name().
Version
1.9.13

Reimplemented in entropic::LlamaCppBackend.

Definition at line 647 of file backend.cpp.

◆ do_load()

virtual bool entropic::InferenceBackend::do_load ( const ModelConfig &  config)
protected pure virtual

Load model into CPU RAM.

Called under transition_mutex_.

Parameters
config: Validated model config.
Returns
true on success. Set last_error_ on failure.
Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ do_restore_state()

bool entropic::InferenceBackend::do_restore_state ( int  seq_id,
const std::vector< uint8_t > &  buffer 
)
protected virtual

Restore model state.

Default: state restore not supported.

Parameters
seq_id: Sequence identifier.
buffer: State data to restore.
Returns
true on success. Default: returns false (not supported).
Version
1.9.13

Definition at line 675 of file backend.cpp.

◆ do_save_state()

bool entropic::InferenceBackend::do_save_state ( int  seq_id,
std::vector< uint8_t > &  buffer 
) const
protected virtual

Save model state (KV cache or hidden state).

Default: state save not supported.

Parameters
seq_id: Sequence identifier.
buffer: Output buffer.
Returns
true on success. Default: returns false (not supported).
Version
1.9.13

Definition at line 661 of file backend.cpp.

◆ do_supports()

bool entropic::InferenceBackend::do_supports ( BackendCapability  cap) const
protected virtual

Declare supported capabilities.

Default: no capabilities supported (returns false for everything).

Parameters
cap: Capability to check.
Returns
true if this backend supports the capability. Default: false.
Version
1.9.13

Reimplemented in entropic::LlamaCppBackend.

Definition at line 637 of file backend.cpp.

◆ do_unload()

virtual void entropic::InferenceBackend::do_unload ( )
protected pure virtual

Full unload.

Called under transition_mutex_.

Version
1.8.2

Implemented in entropic::LlamaCppBackend.

◆ evaluate_logprobs()

LogprobResult entropic::InferenceBackend::evaluate_logprobs ( const int32_t *  tokens,
int  n_tokens 
)

Evaluate per-token log-probabilities for a token sequence.

Requires ACTIVE state.

The 80% logic: state check, input validation, eval_mutex_, perplexity computation from raw logprobs, and logging. Delegates to do_evaluate_logprobs() for backend-specific batch/decode work.

Parameters
tokens: Array of token IDs to evaluate.
n_tokens: Number of tokens in the array (minimum 2).
Returns
LogprobResult with per-token logprobs and perplexity.
Exceptions
std::runtime_error: if the model is not ACTIVE.
std::runtime_error: if n_tokens < 2.
Thread safety
Serialized by eval_mutex_. Does not block generation. Uses a temporary KV cache sequence ID, so generation state is not mutated.
Version
1.9.10

Definition at line 343 of file backend.cpp.

◆ fire_model_load_hook()

bool entropic::InferenceBackend::fire_model_load_hook ( const ModelConfig &  config)
protected

Fire ON_MODEL_LOAD pre-hook.

Parameters
config: Model config being loaded.
Returns
true if the hook cancelled the load.
Version
1.9.1

Definition at line 417 of file backend.cpp.

◆ generate()

GenerationResult entropic::InferenceBackend::generate ( const std::vector< Message > &  messages,
const GenerationParams &  params 
)

Generate a complete response.

Requires ACTIVE state.

Parameters
messages: Conversation history.
params: Generation parameters.
Returns
GenerationResult with content, token count, timing. Returns error result if not ACTIVE.
Version
1.8.2

Definition at line 182 of file backend.cpp.

◆ generate_seq()

GenerationResult entropic::InferenceBackend::generate_seq ( int  seq_id,
const std::vector< Message > &  messages,
const GenerationParams &  params 
)

Generate with explicit sequence ID.

Requires ACTIVE. Default: ignores seq_id, delegates to generate().

Parameters
seq_id: Sequence identifier for multi-sequence backends.
messages: Conversation history.
params: Generation parameters.
Returns
GenerationResult with seq_id set.
Version
1.9.13

Definition at line 571 of file backend.cpp.

◆ generate_speculative()

GenerationResult entropic::InferenceBackend::generate_speculative ( const std::vector< Message > &  messages,
const GenerationParams &  params,
std::function< void(std::string_view token)>  on_token,
std::atomic< bool > &  cancel 
)

Generate via the speculative-decoding kernel (v2.1.11).

Public entry point for speculative-decoding streaming, and a wrapper around do_generate_speculative. Mirrors generate_streaming: validates ACTIVE state, delegates to the subclass override, and stamps generation_time_ms. Backends that do not implement the kernel return ENTROPIC_ERROR_NOT_SUPPORTED; the orchestrator (or other caller) falls back to generate_streaming in that case.

Parameters
messages: Conversation history.
params: Generation parameters.
on_token: Per-accepted-token callback.
cancel: Cancellation flag.
Returns
GenerationResult.
Version
2.1.11

Definition at line 248 of file backend.cpp.

◆ generate_streaming()

GenerationResult entropic::InferenceBackend::generate_streaming ( const std::vector< Message > &  messages,
const GenerationParams &  params,
std::function< void(std::string_view token)>  on_token,
std::atomic< bool > &  cancel 
)

Generate with per-token streaming callback.

Requires ACTIVE state.

Parameters
messages: Conversation history.
params: Generation parameters.
on_token: Called for each token (the token view is valid only during the callback).
cancel: Atomic cancel flag. Set to true to abort. Latency: one token.
Returns
GenerationResult with final content and timing.
Version
1.8.2

Definition at line 211 of file backend.cpp.
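
The callback and cancel-flag contract can be illustrated with a toy loop that uses the same parameter types as the signature above. `toy_stream` is a hypothetical stand-in for a backend's decode loop, and checking the flag once before each token is an assumption consistent with the stated one-token cancel latency.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a streaming decode loop.
inline std::string toy_stream(
    const std::vector<std::string>& tokens,
    const std::function<void(std::string_view token)>& on_token,
    std::atomic<bool>& cancel) {
    std::string content;
    for (const std::string& tok : tokens) {
        if (cancel.load()) break;  // checked once per token
        on_token(tok);             // the view is only valid during this call
        content += tok;
    }
    return content;
}
```

A consumer that wants to keep a token must copy it inside the callback, since the string_view is documented as valid only for the duration of the call.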

◆ generate_streaming_seq()

GenerationResult entropic::InferenceBackend::generate_streaming_seq ( int  seq_id,
const std::vector< Message > &  messages,
const GenerationParams &  params,
std::function< void(std::string_view token)>  on_token,
std::atomic< bool > &  cancel 
)

Streaming generation with explicit sequence ID.

Requires ACTIVE.

Parameters
seq_id: Sequence identifier.
messages: Conversation history.
params: Generation parameters.
on_token: Per-token callback.
cancel: Cancellation flag.
Returns
GenerationResult with seq_id set.
Version
1.9.13

Definition at line 603 of file backend.cpp.

◆ info()

BackendInfo entropic::InferenceBackend::info ( ) const

Get backend metadata.

Delegates to do_info(). The base class returns a default-constructed BackendInfo with the name from do_backend_name(). Subclasses override do_info() to populate architecture, quantization, memory usage, etc.

Returns
BackendInfo populated from model metadata after load(), with at least the name populated. Returns default (empty) info with name only if COLD.
Version
1.9.13

Definition at line 486 of file backend.cpp.

◆ is_active()

bool entropic::InferenceBackend::is_active ( ) const
inline

True when state is ACTIVE.

Returns
true if state is ACTIVE, false otherwise.
Version
1.8.2

Definition at line 224 of file backend.h.

◆ is_loaded()

bool entropic::InferenceBackend::is_loaded ( ) const
inline

True when state is WARM or ACTIVE.

Returns
true if state is WARM or ACTIVE, false if COLD.
Version
1.8.2

Definition at line 232 of file backend.h.

◆ load()

bool entropic::InferenceBackend::load ( const ModelConfig &  config)

Load model into CPU RAM (COLD → WARM).

Acquires transition_mutex_. No-op if already WARM/ACTIVE.

Parameters
config: Validated model configuration (path, context length, GPU layers, etc.).
Returns
true on success, false on failure.
Version
1.8.2

Definition at line 54 of file backend.cpp.

◆ load_and_activate()

bool entropic::InferenceBackend::load_and_activate ( const ModelConfig &  config)

Convenience: load() + activate().

Parameters
config: Model configuration passed through to load().
Returns
true on success (both load and activate succeeded).
Version
1.8.2

Definition at line 165 of file backend.cpp.

◆ restore_state()

bool entropic::InferenceBackend::restore_state ( int  seq_id,
const std::vector< uint8_t > &  buffer 
)

Restore model state from buffer.

Requires ACTIVE.

Parameters
seq_id: Sequence identifier to restore into.
buffer: Previously saved state buffer.
Returns
true on success. false if incompatible or unsupported.
Version
1.9.13

Definition at line 524 of file backend.cpp.

◆ save_state()

bool entropic::InferenceBackend::save_state ( int  seq_id,
std::vector< uint8_t > &  buffer 
) const

Save model state to buffer.

For transformers: saves KV cache state for the sequence. For recurrent models: saves hidden state. Requires ACTIVE.

Parameters
seq_id: Sequence identifier (0 for single-sequence backends).
buffer: Output buffer. Caller owns the returned data.
Returns
true on success. false if not ACTIVE or unsupported.
Version
1.9.13

Definition at line 500 of file backend.cpp.
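
The save, clear, and restore calls are meant to be used as a round trip per sequence. The toy below mimics the three signatures with an in-memory map; `ToySequences` and `append` are hypothetical, the stored bytes are placeholders for real KV-cache or hidden-state data, and the ACTIVE-state requirement is not modeled.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-sequence state holder with the same method shapes.
class ToySequences {
public:
    // Stand-in for generation mutating a sequence's state.
    void append(int seq_id, std::uint8_t value) { seqs_[seq_id].push_back(value); }

    bool save_state(int seq_id, std::vector<std::uint8_t>& buffer) const {
        const auto it = seqs_.find(seq_id);
        if (it == seqs_.end()) return false;  // nothing to save
        buffer = it->second;                  // caller owns the copy
        return true;
    }

    bool restore_state(int seq_id, const std::vector<std::uint8_t>& buffer) {
        seqs_[seq_id] = buffer;
        return true;
    }

    bool clear_state(int seq_id = -1) {
        if (seq_id == -1) {
            seqs_.clear();  // -1 means all sequences
        } else {
            seqs_.erase(seq_id);
        }
        return true;
    }

private:
    std::map<int, std::vector<std::uint8_t>> seqs_;
};
```

A typical use is to snapshot a sequence before a speculative branch of work and restore it if that branch is abandoned.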

◆ set_hooks()

void entropic::InferenceBackend::set_hooks ( const HookInterface &  hooks)
inline protected

Set the hook dispatch interface.

Parameters
hooks: Hook dispatch interface.
Version
1.9.1

Definition at line 627 of file backend.h.

◆ state()

ModelState entropic::InferenceBackend::state ( ) const
inline

Current lifecycle state (lock-free read).

Returns
Current ModelState value.
Version
1.8.2

Definition at line 216 of file backend.h.

◆ supports()

bool entropic::InferenceBackend::supports ( BackendCapability  cap) const

Query whether this backend supports a capability.

Delegates to do_supports(). The base class returns false for all capabilities; subclasses override do_supports() to declare their capabilities. Lock-free: no state transitions are involved.

Parameters
cap: Capability to query.
Returns
true if supported.
Version
1.9.13

Definition at line 458 of file backend.cpp.

◆ tokenize_text()

virtual std::vector< int32_t > entropic::InferenceBackend::tokenize_text ( const std::string &  text) const
inline virtual

Tokenize text to token IDs.

Parameters
text: Input text.
Returns
Token ID vector (empty if COLD or error).
Version
1.10.2

Reimplemented in entropic::LlamaCppBackend.

Definition at line 248 of file backend.h.

◆ unload()

void entropic::InferenceBackend::unload ( )

Full unload (→ COLD).

Releases all RAM + VRAM. Idempotent.

Version
1.8.2

Definition at line 139 of file backend.cpp.

Member Data Documentation

◆ last_error_

std::string entropic::InferenceBackend::last_error_
protected

Last error message for diagnostics.

Definition at line 611 of file backend.h.


The documentation for this class was generated from the following files: