Entropic 2.3.8
Local-first agentic inference engine
Loading...
Searching...
No Matches
i_inference_backend.h File Reference

Pure C interface contract for inference backends. More...

#include <entropic/types/error.h>
#include <stddef.h>
#include <stdint.h>
Include dependency graph for i_inference_backend.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Typedefs

typedef struct entropic_inference_backend * entropic_inference_backend_t
 Opaque handle to an inference backend instance.
 

Functions

entropic_error_t entropic_inference_load (entropic_inference_backend_t backend, const char *config_json)
 Load a model from config (COLD → WARM).
 
entropic_error_t entropic_inference_activate (entropic_inference_backend_t backend)
 Activate model on GPU (WARM → ACTIVE).
 
entropic_error_t entropic_inference_deactivate (entropic_inference_backend_t backend)
 Deactivate model (ACTIVE → WARM).
 
entropic_error_t entropic_inference_unload (entropic_inference_backend_t backend)
 Unload model completely (→ COLD).
 
int entropic_inference_state (entropic_inference_backend_t backend)
 Query model state (lock-free).
 
entropic_error_t entropic_inference_generate (entropic_inference_backend_t backend, const char *messages_json, const char *params_json, char **result_json)
 Generate a response from messages (batch mode).
 
entropic_error_t entropic_inference_generate_streaming (entropic_inference_backend_t backend, const char *messages_json, const char *params_json, void(*on_token)(const char *token, size_t len, void *user_data), void *user_data, int *cancel_flag)
 Generate with streaming token callback.
 
entropic_error_t entropic_inference_complete (entropic_inference_backend_t backend, const char *prompt, const char *params_json, char **result_json)
 Raw text completion without chat template.
 
int entropic_inference_count_tokens (entropic_inference_backend_t backend, const char *text, size_t text_len)
 Count tokens in text using model's tokenizer.
 
void entropic_inference_destroy (entropic_inference_backend_t backend)
 Destroy backend instance and free all resources.
 
void entropic_inference_free (void *ptr)
 Free a string allocated by the inference backend.
 
int entropic_inference_supports (entropic_inference_backend_t backend, int capability)
 Query backend capability.
 
uint32_t entropic_inference_capabilities (entropic_inference_backend_t backend)
 Get all supported capabilities as bitmask.
 
char * entropic_inference_info (entropic_inference_backend_t backend)
 Get backend metadata as JSON.
 
entropic_error_t entropic_inference_save_state (entropic_inference_backend_t backend, int seq_id, void **buffer, size_t *buffer_size)
 Save model state for a sequence.
 
entropic_error_t entropic_inference_restore_state (entropic_inference_backend_t backend, int seq_id, const void *buffer, size_t buffer_size)
 Restore model state for a sequence.
 
entropic_error_t entropic_inference_clear_state (entropic_inference_backend_t backend, int seq_id)
 Clear/reset model state.
 
entropic_error_t entropic_inference_generate_seq (entropic_inference_backend_t backend, int seq_id, const char *messages_json, const char *params_json, char **result_json)
 Generate with explicit sequence ID.
 
entropic_error_t entropic_inference_generate_streaming_seq (entropic_inference_backend_t backend, int seq_id, const char *messages_json, const char *params_json, void(*on_token)(const char *token, size_t len, void *user_data), void *user_data, int *cancel_flag)
 Streaming generation with explicit sequence ID.
 
void entropic_inference_log_to_file (const char *path)
 Redirect llama/ggml logs to a file.
 
void entropic_inference_log_silence (void)
 Silence all llama/ggml log output.
 

Detailed Description

Pure C interface contract for inference backends.

This is the .so boundary for inference. All types are C-safe: opaque handles, error codes, function pointers, const char* JSON strings. No C++ types cross this boundary.

Memory ownership
  • Strings returned by generate/complete (via result_json) are allocated by the backend. Caller must free with entropic_inference_free().
  • Strings passed IN (messages_json, params_json, prompt) are borrowed for the duration of the call only.
  • The on_token callback receives a pointer valid only for the callback duration. Caller must copy if they need to retain the token.
Thread safety
  • State queries (entropic_inference_state) are lock-free.
  • Lifecycle transitions (load/activate/deactivate/unload) are serialized.
  • Generation calls require ACTIVE state and do not acquire the transition lock — concurrent generation is allowed.
Version
1.9.13

Definition in file i_inference_backend.h.

Typedef Documentation

◆ entropic_inference_backend_t

typedef struct entropic_inference_backend* entropic_inference_backend_t

Opaque handle to an inference backend instance.

Version
1.8.2

Definition at line 42 of file i_inference_backend.h.

Function Documentation

◆ entropic_inference_activate()

entropic_error_t entropic_inference_activate ( entropic_inference_backend_t  backend)

Activate model on GPU (WARM → ACTIVE).

If COLD, loads first (convenience path).

Parameters
backendBackend handle.
Returns
ENTROPIC_OK on success.
Version
1.8.2

Activate model on GPU (WARM → ACTIVE).

Parameters
backendOpaque backend handle.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_LOAD_FAILED otherwise. @req REQ-INFER-017
Version
2.0.0

Definition at line 185 of file inference_c_api.cpp.

◆ entropic_inference_capabilities()

uint32_t entropic_inference_capabilities ( entropic_inference_backend_t  backend)

Get all supported capabilities as bitmask.

Parameters
backendBackend handle.
Returns
Bitmask where bit N corresponds to BackendCapability N.
Version
1.9.13

◆ entropic_inference_clear_state()

entropic_error_t entropic_inference_clear_state ( entropic_inference_backend_t  backend,
int  seq_id 
)

Clear/reset model state.

Parameters
backendBackend handle.
seq_idSequence identifier, or -1 for all sequences.
Returns
ENTROPIC_OK on success.
Version
1.9.13

◆ entropic_inference_complete()

entropic_error_t entropic_inference_complete ( entropic_inference_backend_t  backend,
const char *  prompt,
const char *  params_json,
char **  result_json 
)

Raw text completion without chat template.

Used by the router for digit-based classification. The prompt is passed directly to the model without any chat template formatting.

Parameters
backendBackend handle.
promptRaw prompt string.
params_jsonGeneration parameters as JSON.
[out]result_jsonOutput: JSON result string. Caller frees.
Returns
ENTROPIC_OK on success. @req REQ-INFER-004
Version
1.9.13

Raw text completion without chat template.

Parameters
backendOpaque backend handle.
promptNull-terminated prompt string.
params_jsonJSON-serialized GenerationParams.
result_jsonOut-param: newly allocated result JSON.
Returns
ENTROPIC_OK on success, result.error_code or ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-004
Version
2.0.0

Definition at line 330 of file inference_c_api.cpp.

◆ entropic_inference_count_tokens()

int entropic_inference_count_tokens ( entropic_inference_backend_t  backend,
const char *  text,
size_t  text_len 
)

Count tokens in text using model's tokenizer.

Returns exact count when model is loaded (WARM or ACTIVE). Returns len/4 estimate when model is COLD.

Parameters
backendBackend handle.
textText to tokenize.
text_lenLength of text in bytes.
Returns
Token count (exact or estimated). @utility
Version
1.9.13

Count tokens in text using model's tokenizer.

Parameters
backendOpaque backend handle.
textPointer to UTF-8 text bytes.
text_lenLength of the text in bytes.
Returns
Exact token count when backend is WARM/ACTIVE, text_len/4 estimate on error. @req REQ-INFER-019
Version
2.0.0

Definition at line 356 of file inference_c_api.cpp.

◆ entropic_inference_deactivate()

entropic_error_t entropic_inference_deactivate ( entropic_inference_backend_t  backend)

Deactivate model (ACTIVE → WARM).

No-op if not ACTIVE.

Parameters
backendBackend handle.
Returns
ENTROPIC_OK.
Version
1.8.2

Deactivate model (ACTIVE → WARM).

Parameters
backendOpaque backend handle.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_INTERNAL on exception. @req REQ-INFER-017
Version
2.0.0

Definition at line 203 of file inference_c_api.cpp.

◆ entropic_inference_destroy()

void entropic_inference_destroy ( entropic_inference_backend_t  backend)

Destroy backend instance and free all resources.

Parameters
backendBackend handle. NULL is a safe no-op.
Version
1.8.2

Destroy backend instance and free all resources.

Parameters
backendOpaque backend handle (must not be used after this call). @req REQ-INFER-017
Version
2.0.0

Definition at line 374 of file inference_c_api.cpp.

◆ entropic_inference_free()

void entropic_inference_free ( void *  ptr)

Free a string allocated by the inference backend.

Parameters
ptrPointer returned by generate, complete, or similar. NULL is a safe no-op.
Version
1.8.2

Free a string allocated by the inference backend.

Parameters
ptrPointer returned by a previous generate/complete call. @utility
Version
2.0.0

Definition at line 386 of file inference_c_api.cpp.

◆ entropic_inference_generate()

entropic_error_t entropic_inference_generate ( entropic_inference_backend_t  backend,
const char *  messages_json,
const char *  params_json,
char **  result_json 
)

Generate a response from messages (batch mode).

Requires ACTIVE state. Returns ENTROPIC_ERROR_INVALID_STATE otherwise.

Parameters
backendBackend handle.
messages_jsonJSON array of message objects.
params_jsonJSON object of GenerationParams fields.
[out]result_jsonOutput: JSON result string. Caller frees with entropic_inference_free().
Returns
ENTROPIC_OK on success. @req REQ-INFER-001
Version
1.9.13

Generate a response from messages (batch mode).

Parameters
backendOpaque backend handle.
messages_jsonJSON-serialized message list.
params_jsonJSON-serialized GenerationParams.
result_jsonOut-param: newly allocated result JSON (free with entropic_inference_free).
Returns
ENTROPIC_OK on success, result.error_code or ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-003
Version
2.1.8

Definition at line 257 of file inference_c_api.cpp.

◆ entropic_inference_generate_seq()

entropic_error_t entropic_inference_generate_seq ( entropic_inference_backend_t  backend,
int  seq_id,
const char *  messages_json,
const char *  params_json,
char **  result_json 
)

Generate with explicit sequence ID.

Parameters
backendBackend handle.
seq_idSequence identifier (0 for single-sequence backends).
messages_jsonJSON string of messages.
params_jsonJSON string of generation params.
[out]result_jsonOutput: JSON result string. Caller frees.
Returns
ENTROPIC_OK on success. @utility
Version
1.9.13

◆ entropic_inference_generate_streaming()

entropic_error_t entropic_inference_generate_streaming ( entropic_inference_backend_t  backend,
const char *  messages_json,
const char *  params_json,
void(*)(const char *token, size_t len, void *user_data)  on_token,
void *  user_data,
int *  cancel_flag 
)

Generate with streaming token callback.

Requires ACTIVE state. The on_token callback is invoked for each generated token. The token pointer is valid only for the duration of the callback — caller must copy if retention is needed.

Parameters
backendBackend handle.
messages_jsonJSON array of message objects.
params_jsonJSON object of GenerationParams fields.
on_tokenCallback for each token. Must not call back into API.
user_dataOpaque pointer forwarded to on_token.
cancel_flagPointer to int flag. Caller sets to 1 to cancel. Checked between tokens — cancellation latency is one token. May be NULL if cancellation is not needed.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_CANCELLED if cancelled. @req REQ-INFER-003
Version
1.9.13

Generate with streaming token callback.

Parameters
backendOpaque backend handle.
messages_jsonJSON-serialized message list.
params_jsonJSON-serialized GenerationParams.
on_tokenCallback fired per token (token bytes, length, user_data).
user_dataOpaque pointer passed to on_token.
cancel_flagOptional pointer; setting *cancel_flag to non-zero stops generation.
Returns
ENTROPIC_OK on success, result.error_code or ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-003
Version
2.1.8

Definition at line 290 of file inference_c_api.cpp.

◆ entropic_inference_generate_streaming_seq()

entropic_error_t entropic_inference_generate_streaming_seq ( entropic_inference_backend_t  backend,
int  seq_id,
const char *  messages_json,
const char *  params_json,
void(*)(const char *token, size_t len, void *user_data)  on_token,
void *  user_data,
int *  cancel_flag 
)

Streaming generation with explicit sequence ID.

Parameters
backendBackend handle.
seq_idSequence identifier.
messages_jsonJSON string of messages.
params_jsonJSON string of generation params.
on_tokenCallback for each token. Must not call back into API.
user_dataOpaque pointer forwarded to on_token.
cancel_flagPointer to int flag. Set to 1 to cancel. May be NULL.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_CANCELLED if cancelled. @utility
Version
1.9.13

◆ entropic_inference_info()

char * entropic_inference_info ( entropic_inference_backend_t  backend)

Get backend metadata as JSON.

Parameters
backendBackend handle.
Returns
JSON string of BackendInfo. Caller must free with entropic_inference_free().
Version
1.9.13

◆ entropic_inference_load()

entropic_error_t entropic_inference_load ( entropic_inference_backend_t  backend,
const char *  config_json 
)

Load a model from config (COLD → WARM).

Parameters
backendBackend handle.
config_jsonJSON string of ModelConfig fields.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_LOAD_FAILED on failure.
Version
1.8.2

Load a model from config (COLD → WARM).

Parameters
backendOpaque backend handle from entropic_create_inference_backend().
config_jsonJSON-serialized ModelConfig string.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_LOAD_FAILED otherwise. @req REQ-INFER-017
Version
2.0.0

Definition at line 160 of file inference_c_api.cpp.

◆ entropic_inference_log_silence()

void entropic_inference_log_silence ( void  )

Silence all llama/ggml log output.

Version
2.0.1

Silence all llama/ggml log output.

Definition at line 521 of file inference_c_api.cpp.

◆ entropic_inference_log_to_file()

void entropic_inference_log_to_file ( const char *  path)

Redirect llama/ggml logs to a file.

Opens the file (truncating), redirects all llama.cpp and ggml log output to it. Call with NULL to silence logs entirely.

Parameters
pathLog file path (null-terminated), or NULL to silence.
Version
2.0.1

Redirect llama/ggml logs to a file.

First-call-wins under multi-handle (gh#58): a second handle whose canonical path differs is rejected with a warning rather than clobbering the live redirect. Same-path re-call truncates and reopens (preserves pre-v2.2.5 reset-on-recall behavior).

Definition at line 486 of file inference_c_api.cpp.

◆ entropic_inference_restore_state()

entropic_error_t entropic_inference_restore_state ( entropic_inference_backend_t  backend,
int  seq_id,
const void *  buffer,
size_t  buffer_size 
)

Restore model state for a sequence.

Parameters
backendBackend handle.
seq_idSequence identifier.
bufferState data from previous save_state call.
buffer_sizeSize of state data.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_STATE_INCOMPATIBLE if buffer is incompatible. @utility
Version
1.9.13

◆ entropic_inference_save_state()

entropic_error_t entropic_inference_save_state ( entropic_inference_backend_t  backend,
int  seq_id,
void **  buffer,
size_t *  buffer_size 
)

Save model state for a sequence.

Parameters
backendBackend handle.
seq_idSequence identifier (0 for single-sequence).
[out]bufferOutput: pointer to state data. Caller frees with entropic_inference_free().
[out]buffer_sizeOutput: size of state data in bytes.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_NOT_SUPPORTED if backend has neither KV_CACHE nor HIDDEN_STATE capability. @utility
Version
1.9.13

◆ entropic_inference_state()

int entropic_inference_state ( entropic_inference_backend_t  backend)

Query model state (lock-free).

Parameters
backendBackend handle.
Returns
State as int: 0=COLD, 1=WARM, 2=ACTIVE. Maps to entropic_model_state_t values.
Version
1.8.2

Query model state (lock-free).

Parameters
backendOpaque backend handle.
Returns
Integer cast of ModelState (0=COLD, 1=WARM, 2=ACTIVE). @req REQ-INFER-018
Version
2.0.0

Definition at line 241 of file inference_c_api.cpp.

◆ entropic_inference_supports()

int entropic_inference_supports ( entropic_inference_backend_t  backend,
int  capability 
)

Query backend capability.

Parameters
backendBackend handle.
capabilityCapability enum value (see BackendCapability).
Returns
1 if supported, 0 if not.
Version
1.9.13

◆ entropic_inference_unload()

entropic_error_t entropic_inference_unload ( entropic_inference_backend_t  backend)

Unload model completely (→ COLD).

Releases all RAM + VRAM.

Parameters
backendBackend handle.
Returns
ENTROPIC_OK.
Version
1.8.2

Unload model completely (→ COLD).

Parameters
backendOpaque backend handle.
Returns
ENTROPIC_OK on success, ENTROPIC_ERROR_INTERNAL on exception. @req REQ-INFER-017
Version
2.0.0

Definition at line 222 of file inference_c_api.cpp.