Pure C interface contract for inference backends. More...

#include <entropic/types/error.h>
#include <stddef.h>
#include <stdint.h>

Include dependency graph for i_inference_backend.h:

This graph shows which files directly or indirectly include this file:

Typedefs
typedef struct entropic_inference_backend *	entropic_inference_backend_t
	Opaque handle to an inference backend instance.

Functions
entropic_error_t	entropic_inference_load (entropic_inference_backend_t backend, const char *config_json)
	Load a model from config (COLD → WARM).

entropic_error_t	entropic_inference_activate (entropic_inference_backend_t backend)
	Activate model on GPU (WARM → ACTIVE).

entropic_error_t	entropic_inference_deactivate (entropic_inference_backend_t backend)
	Deactivate model (ACTIVE → WARM).

entropic_error_t	entropic_inference_unload (entropic_inference_backend_t backend)
	Unload model completely (→ COLD).

int	entropic_inference_state (entropic_inference_backend_t backend)
	Query model state (lock-free).

entropic_error_t	entropic_inference_generate (entropic_inference_backend_t backend, const char messages_json, const char params_json, char **result_json)
	Generate a response from messages (batch mode).

entropic_error_t	entropic_inference_generate_with_cancel (entropic_inference_backend_t backend, const char messages_json, const char params_json, char *result_json, int cancel_flag)
	Generate a response (batch mode) with mid-decode cancellation.

entropic_error_t	entropic_inference_generate_streaming (entropic_inference_backend_t backend, const char messages_json, const char params_json, void(on_token)(const char token, size_t len, void user_data), void user_data, int *cancel_flag)
	Generate with streaming token callback.

entropic_error_t	entropic_inference_complete (entropic_inference_backend_t backend, const char prompt, const char params_json, char **result_json)
	Raw text completion without chat template.

int	entropic_inference_count_tokens (entropic_inference_backend_t backend, const char *text, size_t text_len)
	Count tokens in text using model's tokenizer.

void	entropic_inference_destroy (entropic_inference_backend_t backend)
	Destroy backend instance and free all resources.

void	entropic_inference_free (void *ptr)
	Free a string allocated by the inference backend.

int	entropic_inference_supports (entropic_inference_backend_t backend, int capability)
	Query backend capability.

uint32_t	entropic_inference_capabilities (entropic_inference_backend_t backend)
	Get all supported capabilities as bitmask.

char *	entropic_inference_info (entropic_inference_backend_t backend)
	Get backend metadata as JSON.

entropic_error_t	entropic_inference_save_state (entropic_inference_backend_t backend, int seq_id, void *buffer, size_t buffer_size)
	Save model state for a sequence.

entropic_error_t	entropic_inference_restore_state (entropic_inference_backend_t backend, int seq_id, const void *buffer, size_t buffer_size)
	Restore model state for a sequence.

entropic_error_t	entropic_inference_clear_state (entropic_inference_backend_t backend, int seq_id)
	Clear/reset model state.

entropic_error_t	entropic_inference_generate_seq (entropic_inference_backend_t backend, int seq_id, const char messages_json, const char params_json, char **result_json)
	Generate with explicit sequence ID.

entropic_error_t	entropic_inference_generate_streaming_seq (entropic_inference_backend_t backend, int seq_id, const char messages_json, const char params_json, void(on_token)(const char token, size_t len, void user_data), void user_data, int *cancel_flag)
	Streaming generation with explicit sequence ID.

void	entropic_inference_log_to_file (const char *path)
	Redirect llama/ggml logs to a file.

void	entropic_inference_log_silence (void)
	Silence all llama/ggml log output.

Detailed Description

Pure C interface contract for inference backends.

This is the .so boundary for inference. All types are C-safe: opaque handles, error codes, function pointers, const char* JSON strings. No C++ types cross this boundary.

Memory ownership

Strings returned by generate/complete (via result_json) are allocated by the backend. Caller must free with entropic_inference_free().
Strings passed IN (messages_json, params_json, prompt) are borrowed for the duration of the call only.
The on_token callback receives a pointer valid only for the callback duration. Caller must copy if they need to retain the token.

Thread safety

State queries (entropic_inference_state) are lock-free.
Lifecycle transitions (load/activate/deactivate/unload) are serialized.
Generation calls require ACTIVE state and do not acquire the transition lock — concurrent generation is allowed.

Version: 1.9.13

Definition in file i_inference_backend.h.

Typedef Documentation

◆ entropic_inference_backend_t

typedef struct entropic_inference_backend* entropic_inference_backend_t

Opaque handle to an inference backend instance.

Version: 1.8.2

Definition at line 42 of file i_inference_backend.h.

Function Documentation

◆ entropic_inference_activate()

entropic_error_t entropic_inference_activate ( entropic_inference_backend_t backend )

Activate model on GPU (WARM → ACTIVE).

If COLD, loads first (convenience path).

Parameters

backend Backend handle.

Returns: ENTROPIC_OK on success.

Version: 1.8.2

Activate model on GPU (WARM → ACTIVE).

Parameters

backend Opaque backend handle.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_LOAD_FAILED otherwise. @req REQ-INFER-017

Version: 2.0.0

Definition at line 238 of file inference_c_api.cpp.

◆ entropic_inference_capabilities()

uint32_t entropic_inference_capabilities ( entropic_inference_backend_t backend )

Get all supported capabilities as bitmask.

Parameters

backend Backend handle.

Returns: Bitmask where bit N corresponds to BackendCapability N.

Version: 1.9.13

◆ entropic_inference_clear_state()

entropic_error_t entropic_inference_clear_state	(	entropic_inference_backend_t	backend,
		int	seq_id
	)

Clear/reset model state.

Parameters

backend	Backend handle.
seq_id	Sequence identifier, or -1 for all sequences.

Returns: ENTROPIC_OK on success.

Version: 1.9.13

◆ entropic_inference_complete()

entropic_error_t entropic_inference_complete	(	entropic_inference_backend_t	backend,
		const char *	prompt,
		const char *	params_json,
		char **	result_json
	)

Raw text completion without chat template.

Used by the router for digit-based classification. The prompt is passed directly to the model without any chat template formatting.

Parameters

	backend	Backend handle.
	prompt	Raw prompt string.
	params_json	Generation parameters as JSON.
[out]	result_json	Output: JSON result string. Caller frees.

Returns: ENTROPIC_OK on success. @req REQ-INFER-004

Version: 1.9.13

Raw text completion without chat template.

Parameters

backend	Opaque backend handle.
prompt	Null-terminated prompt string.
params_json	JSON-serialized GenerationParams.
result_json	Out-param: newly allocated result JSON.

Returns: ENTROPIC_OK on success, result.error_code or ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-004

Version: 2.0.0

Definition at line 519 of file inference_c_api.cpp.

◆ entropic_inference_count_tokens()

int entropic_inference_count_tokens	(	entropic_inference_backend_t	backend,
		const char *	text,
		size_t	text_len
	)

Count tokens in text using model's tokenizer.

Returns exact count when model is loaded (WARM or ACTIVE). Returns len/4 estimate when model is COLD.

Parameters

backend	Backend handle.
text	Text to tokenize.
text_len	Length of text in bytes.

Returns: Token count (exact or estimated). @utility

Version: 1.9.13

Count tokens in text using model's tokenizer.

Parameters

backend	Opaque backend handle.
text	Pointer to UTF-8 text bytes.
text_len	Length of the text in bytes.

Returns: Exact token count when backend is WARM/ACTIVE, text_len/4 estimate on error. @req REQ-INFER-019

Version: 2.0.0

Definition at line 548 of file inference_c_api.cpp.

◆ entropic_inference_deactivate()

entropic_error_t entropic_inference_deactivate ( entropic_inference_backend_t backend )

Deactivate model (ACTIVE → WARM).

No-op if not ACTIVE.

Parameters

backend Backend handle.

Returns: ENTROPIC_OK.

Version: 1.8.2

Deactivate model (ACTIVE → WARM).

Parameters

backend Opaque backend handle.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_INTERNAL on exception. @req REQ-INFER-017

Version: 2.0.0

Definition at line 257 of file inference_c_api.cpp.

◆ entropic_inference_destroy()

void entropic_inference_destroy ( entropic_inference_backend_t backend )

Destroy backend instance and free all resources.

Parameters

backend Backend handle. NULL is a safe no-op.

Version: 1.8.2

Destroy backend instance and free all resources.

Parameters

backend Opaque backend handle (must not be used after this call). @req REQ-INFER-017

Version: 2.0.0

Definition at line 570 of file inference_c_api.cpp.

◆ entropic_inference_free()

void entropic_inference_free ( void * ptr )

Free a string allocated by the inference backend.

Parameters

ptr	Pointer returned by generate, complete, or similar. NULL is a safe no-op.

Version: 1.8.2

Free a string allocated by the inference backend.

Parameters

ptr	Pointer returned by a previous generate/complete call. @utility

Version: 2.0.0

Definition at line 582 of file inference_c_api.cpp.

◆ entropic_inference_generate()

entropic_error_t entropic_inference_generate	(	entropic_inference_backend_t	backend,
		const char *	messages_json,
		const char *	params_json,
		char **	result_json
	)

Generate a response from messages (batch mode).

Requires ACTIVE state. Returns ENTROPIC_ERROR_INVALID_STATE otherwise.

Parameters

	backend	Backend handle.
	messages_json	JSON array of message objects.
	params_json	JSON object of GenerationParams fields.
[out]	result_json	Output: JSON result string. Caller frees with entropic_inference_free().

Returns: ENTROPIC_OK on success. @req REQ-INFER-001

Version: 1.9.13

Generate a response from messages (batch mode).

Parameters

backend	Opaque backend handle.
messages_json	JSON-serialized message list.
params_json	JSON-serialized GenerationParams.
result_json	Out-param: newly allocated result JSON (free with entropic_inference_free).

Returns: ENTROPIC_OK on success, result.error_code or ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-003

Version: 2.1.8

Definition at line 317 of file inference_c_api.cpp.

◆ entropic_inference_generate_seq()

entropic_error_t entropic_inference_generate_seq	(	entropic_inference_backend_t	backend,
		int	seq_id,
		const char *	messages_json,
		const char *	params_json,
		char **	result_json
	)

Generate with explicit sequence ID.

Parameters

	backend	Backend handle.
	seq_id	Sequence identifier (0 for single-sequence backends).
	messages_json	JSON string of messages.
	params_json	JSON string of generation params.
[out]	result_json	Output: JSON result string. Caller frees.

Returns: ENTROPIC_OK on success. @utility

Version: 1.9.13

◆ entropic_inference_generate_streaming()

entropic_error_t entropic_inference_generate_streaming	(	entropic_inference_backend_t	backend,
		const char *	messages_json,
		const char *	params_json,
		void()(const char token, size_t len, void *user_data)	on_token,
		void *	user_data,
		int *	cancel_flag
	)

Generate with streaming token callback.

Requires ACTIVE state. The on_token callback is invoked for each generated token. The token pointer is valid only for the duration of the callback — caller must copy if retention is needed.

Parameters

backend	Backend handle.
messages_json	JSON array of message objects.
params_json	JSON object of GenerationParams fields.
on_token	Callback for each token. Must not call back into API.
user_data	Opaque pointer forwarded to on_token.
cancel_flag	Pointer to int flag. Caller sets to 1 to cancel. Checked between tokens — cancellation latency is one token. May be NULL if cancellation is not needed.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_CANCELLED if cancelled. @req REQ-INFER-003

Version: 1.9.13

Generate with streaming token callback.

Parameters

backend	Opaque backend handle.
messages_json	JSON-serialized message list.
params_json	JSON-serialized GenerationParams.
on_token	Callback fired per token (token bytes, length, user_data).
user_data	Opaque pointer passed to on_token.
cancel_flag	Optional pointer; setting *cancel_flag to non-zero stops generation.

Returns: ENTROPIC_OK on success, result.error_code or ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-003

Version: 2.1.8

Definition at line 476 of file inference_c_api.cpp.

◆ entropic_inference_generate_streaming_seq()

entropic_error_t entropic_inference_generate_streaming_seq	(	entropic_inference_backend_t	backend,
		int	seq_id,
		const char *	messages_json,
		const char *	params_json,
		void()(const char token, size_t len, void *user_data)	on_token,
		void *	user_data,
		int *	cancel_flag
	)

Streaming generation with explicit sequence ID.

Parameters

backend	Backend handle.
seq_id	Sequence identifier.
messages_json	JSON string of messages.
params_json	JSON string of generation params.
on_token	Callback for each token. Must not call back into API.
user_data	Opaque pointer forwarded to on_token.
cancel_flag	Pointer to int flag. Set to 1 to cancel. May be NULL.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_CANCELLED if cancelled. @utility

Version: 1.9.13

◆ entropic_inference_generate_with_cancel()

entropic_error_t entropic_inference_generate_with_cancel	(	entropic_inference_backend_t	backend,
		const char *	messages_json,
		const char *	params_json,
		char **	result_json,
		int *	cancel_flag
	)

Generate a response (batch mode) with mid-decode cancellation.

gh#81 (v2.4.2): same contract as entropic_inference_generate plus a cancel-flag pointer. The pre-v2.4.2 batch path ran to natural stop with no way to honor an interrupt — observed ~60s lag on 13B models at default max_tokens. Setting *cancel_flag non-zero stops the decode within ~10ms + one token.

Requires ACTIVE state.

Parameters

	backend	Backend handle.
	messages_json	JSON array of message objects.
	params_json	JSON object of GenerationParams fields.
[out]	result_json	Output: JSON result string. Caller frees with entropic_inference_free().
	cancel_flag	Pointer to int flag. Caller sets to non-zero to cancel. May be NULL if cancellation is not needed.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_CANCELLED if cancelled. @req REQ-INFER-001

Version: 2.4.2

Generate a response (batch mode) with mid-decode cancellation.

(gh#81, v2.4.2)

See header for the full contract. Bridges the int* cancel flag to the backend's std::atomic<bool> via a CancelFlagBridge poller.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_CANCELLED if the cancel flag was raised mid-decode, ENTROPIC_ERROR_INVALID_ARGUMENT for a null backend/result, or the backend's error_code / ENTROPIC_ERROR_GENERATE_FAILED otherwise. @req REQ-INFER-001

Version: 2.4.2

Definition at line 437 of file inference_c_api.cpp.

◆ entropic_inference_info()

char * entropic_inference_info ( entropic_inference_backend_t backend )

Get backend metadata as JSON.

Parameters

backend Backend handle.

Returns: JSON string of BackendInfo. Caller must free with entropic_inference_free().

Version: 1.9.13

◆ entropic_inference_load()

entropic_error_t entropic_inference_load	(	entropic_inference_backend_t	backend,
		const char *	config_json
	)

Load a model from config (COLD → WARM).

Parameters

backend	Backend handle.
config_json	JSON string of ModelConfig fields.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_LOAD_FAILED on failure.

Version: 1.8.2

Load a model from config (COLD → WARM).

Parameters

backend	Opaque backend handle from entropic_create_inference_backend().
config_json	JSON-serialized ModelConfig string.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_LOAD_FAILED otherwise. @req REQ-INFER-017

Version: 2.0.0

Definition at line 207 of file inference_c_api.cpp.

◆ entropic_inference_log_silence()

void entropic_inference_log_silence ( void )

Silence all llama/ggml log output.

Version: 2.0.1

Silence all llama/ggml log output.

Definition at line 717 of file inference_c_api.cpp.

◆ entropic_inference_log_to_file()

void entropic_inference_log_to_file ( const char * path )

Redirect llama/ggml logs to a file.

Opens the file (truncating), redirects all llama.cpp and ggml log output to it. Call with NULL to silence logs entirely.

Parameters

path	Log file path (null-terminated), or NULL to silence.

Version: 2.0.1

Redirect llama/ggml logs to a file.

First-call-wins under multi-handle (gh#58): a second handle whose canonical path differs is rejected with a warning rather than clobbering the live redirect. Same-path re-call truncates and reopens (preserves pre-v2.2.5 reset-on-recall behavior).

Definition at line 682 of file inference_c_api.cpp.

◆ entropic_inference_restore_state()

entropic_error_t entropic_inference_restore_state	(	entropic_inference_backend_t	backend,
		int	seq_id,
		const void *	buffer,
		size_t	buffer_size
	)

Restore model state for a sequence.

Parameters

backend	Backend handle.
seq_id	Sequence identifier.
buffer	State data from previous save_state call.
buffer_size	Size of state data.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_STATE_INCOMPATIBLE if buffer is incompatible. @utility

Version: 1.9.13

◆ entropic_inference_save_state()

entropic_error_t entropic_inference_save_state	(	entropic_inference_backend_t	backend,
		int	seq_id,
		void **	buffer,
		size_t *	buffer_size
	)

Save model state for a sequence.

Parameters

	backend	Backend handle.
	seq_id	Sequence identifier (0 for single-sequence).
[out]	buffer	Output: pointer to state data. Caller frees with entropic_inference_free().
[out]	buffer_size	Output: size of state data in bytes.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_NOT_SUPPORTED if backend has neither KV_CACHE nor HIDDEN_STATE capability. @utility

Version: 1.9.13

◆ entropic_inference_state()

int entropic_inference_state ( entropic_inference_backend_t backend )

Query model state (lock-free).

Parameters

backend Backend handle.

Returns: State as int: 0=COLD, 1=WARM, 2=ACTIVE. Maps to entropic_model_state_t values.

Version: 1.8.2

Query model state (lock-free).

Parameters

backend Opaque backend handle.

Returns: Integer cast of ModelState (0=COLD, 1=WARM, 2=ACTIVE). @req REQ-INFER-018

Version: 2.0.0

Definition at line 297 of file inference_c_api.cpp.

◆ entropic_inference_supports()

int entropic_inference_supports	(	entropic_inference_backend_t	backend,
		int	capability
	)

Query backend capability.

Parameters

backend	Backend handle.
capability	Capability enum value (see BackendCapability).

Returns: 1 if supported, 0 if not.

Version: 1.9.13

◆ entropic_inference_unload()

entropic_error_t entropic_inference_unload ( entropic_inference_backend_t backend )

Unload model completely (→ COLD).

Releases all RAM + VRAM.

Parameters

backend Backend handle.

Returns: ENTROPIC_OK.

Version: 1.8.2

Unload model completely (→ COLD).

Parameters

backend Opaque backend handle.

Returns: ENTROPIC_OK on success, ENTROPIC_ERROR_INTERNAL on exception. @req REQ-INFER-017

Version: 2.0.0

Definition at line 277 of file inference_c_api.cpp.

Typedefs

Functions

Detailed Description

Typedef Documentation

◆ entropic_inference_backend_t

Function Documentation

◆ entropic_inference_activate()

◆ entropic_inference_capabilities()

◆ entropic_inference_clear_state()

◆ entropic_inference_complete()

◆ entropic_inference_count_tokens()

◆ entropic_inference_deactivate()

◆ entropic_inference_destroy()

◆ entropic_inference_free()

◆ entropic_inference_generate()

◆ entropic_inference_generate_seq()

◆ entropic_inference_generate_streaming()

◆ entropic_inference_generate_streaming_seq()

◆ entropic_inference_generate_with_cancel()

◆ entropic_inference_info()

◆ entropic_inference_load()

◆ entropic_inference_log_silence()

◆ entropic_inference_log_to_file()

◆ entropic_inference_restore_state()

◆ entropic_inference_save_state()

◆ entropic_inference_state()

◆ entropic_inference_supports()

◆ entropic_inference_unload()