LoRA adapter lifecycle manager.

#include <entropic/inference/adapter_manager.h>

Public Member Functions
bool	load (const std::string &name, const std::filesystem::path &adapter_path, llama_model *model, float scale=1.0f)
	Load a LoRA adapter into RAM (COLD -> WARM).

void	unload (const std::string &name, llama_context *ctx)
	Unload adapter (any state -> COLD).

bool	activate (const std::string &name, llama_context *ctx)
	Activate adapter on context (WARM -> HOT).

void	deactivate (llama_context *ctx)
	Deactivate current HOT adapter (HOT -> WARM).

bool	swap (const std::string &name, llama_context *ctx)
	Swap to a different adapter atomically.

void	unload_all_for_model (llama_model *model, llama_context *ctx)
	Unload all adapters for a given base model.

void	unload_all ()
	Free every loaded adapter handle (gh#58 close-out, v2.3.0).

AdapterState	state (const std::string &name) const
	Get adapter state.

AdapterInfo	info (const std::string &name) const
	Get metadata for an adapter.

std::vector< AdapterInfo >	list_adapters () const
	List all known adapters.

std::string	active_adapter () const
	Get the currently HOT adapter name.

void	set_hook_interface (const HookInterface &hooks)
	Set hook interface for ON_ADAPTER_SWAP dispatch.

Detailed Description

LoRA adapter lifecycle manager.

Manages adapter COLD/WARM/HOT states. One active (HOT) adapter per context at a time. Multiple adapters can be WARM simultaneously.

Lifecycle:

COLD ──load()──> WARM ──activate()──> HOT
  ^                ^                    |
  └──unload()──────┘<──deactivate()─────┘

COLD (entropic::ModelState::COLD): On disk only, no RAM consumed.
WARM (entropic::ModelState::WARM): mmap'd + mlock'd in RAM.
HOT (entropic::AdapterState::HOT): Active on context via llama_set_adapter_lora(). Influencing generation.

Version: 1.9.2

Definition at line 58 of file adapter_manager.h.
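The transitions described above can be sketched as a small state table. This is an illustrative toy model only, not the AdapterManager implementation: LifecycleModel and State are hypothetical names, the model tracks adapter names without touching any llama.cpp API, and returning true when the target is already HOT is an assumption.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the adapter states described above.
enum class State { COLD, WARM, HOT };

// Toy model of the lifecycle; it tracks names only.
class LifecycleModel {
public:
    // COLD -> WARM. Fails if the name is already known.
    bool load(const std::string& name) {
        return states_.emplace(name, State::WARM).second;
    }

    // WARM -> HOT. Any previously HOT adapter drops back to WARM first.
    bool activate(const std::string& name) {
        auto it = states_.find(name);
        if (it == states_.end()) return false;      // COLD: never loaded
        if (it->second == State::HOT) return true;  // assumption: already HOT is success
        deactivate();
        it->second = State::HOT;
        hot_ = name;
        return true;
    }

    // HOT -> WARM. No-op if nothing is active.
    void deactivate() {
        if (hot_.empty()) return;
        states_[hot_] = State::WARM;
        hot_.clear();
    }

    // Any state -> COLD. A HOT adapter is deactivated before removal.
    void unload(const std::string& name) {
        if (name == hot_) deactivate();
        states_.erase(name);
    }

    // Unknown names report COLD.
    State state(const std::string& name) const {
        auto it = states_.find(name);
        return it == states_.end() ? State::COLD : it->second;
    }

    const std::string& active() const { return hot_; }

private:
    std::unordered_map<std::string, State> states_;
    std::string hot_;  // empty when no adapter is HOT
};
```

Keeping the HOT name in a single field is one way to make the "one HOT adapter at a time" rule hold by construction for a single context; a manager serving several contexts would need one such field per context.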

Member Function Documentation

◆ activate()

bool entropic::AdapterManager::activate	(	const std::string &	name,
		llama_context *	ctx
	)

Activate adapter on context (WARM -> HOT).

Parameters

name	Adapter identifier.
ctx	llama_context to activate on.

Returns: true on success.

Version: 1.9.2

If another adapter is HOT, deactivates it first. Uses llama_set_adapters_lora() to apply the single adapter.

Definition at line 164 of file adapter_manager.cpp.

◆ active_adapter()

std::string entropic::AdapterManager::active_adapter ( ) const

Get the currently HOT adapter name.

Returns: Adapter name, empty if none active.

Version: 1.9.2

Definition at line 417 of file adapter_manager.cpp.

◆ deactivate()

void entropic::AdapterManager::deactivate ( llama_context * ctx )

Deactivate current HOT adapter (HOT -> WARM).

Parameters

ctx	llama_context to clear adapter from.

Version: 1.9.2

Clears all adapters from the context. No-op if none active.

Definition at line 208 of file adapter_manager.cpp.

◆ info()

AdapterInfo entropic::AdapterManager::info ( const std::string & name ) const

Get metadata for an adapter.

Parameters

name	Adapter identifier.

Returns: AdapterInfo. COLD with empty name if not found.

Version: 1.9.2

Definition at line 386 of file adapter_manager.cpp.
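The not-found behaviour of info() and state() can be illustrated with a lookup that falls back to a default value. This is a sketch under assumptions: Info is a hypothetical two-field subset of AdapterInfo (the real struct's fields are not shown on this page), and lookup() is an illustrative helper, not part of the API.

```cpp
#include <string>
#include <unordered_map>

enum class State { COLD, WARM, HOT };

// Hypothetical subset of AdapterInfo, for illustration only.
struct Info {
    std::string name;           // empty by default
    State state = State::COLD;  // COLD by default
};

// Returns a copy of the stored entry, or a default-constructed Info
// (empty name, COLD) when the name is unknown.
Info lookup(const std::unordered_map<std::string, Info>& table,
            const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? Info{} : it->second;
}
```

Returning a value with a sentinel state means callers never see a null pointer or an exception, but they have to check for an empty name if they need to tell "never loaded" apart from an entry that exists.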

◆ list_adapters()

std::vector< AdapterInfo > entropic::AdapterManager::list_adapters ( ) const

List all known adapters.

Returns: Vector of AdapterInfo snapshots.

Version: 1.9.2

Definition at line 401 of file adapter_manager.cpp.

◆ load()

bool entropic::AdapterManager::load	(	const std::string &	name,
		const std::filesystem::path &	adapter_path,
		llama_model *	model,
		float	scale = 1.0f
	)

Load a LoRA adapter into RAM (COLD -> WARM).

Parameters

name	Unique identifier for this adapter.
adapter_path	Path to the LoRA .gguf file.
model	Base llama_model the adapter targets.
scale	LoRA scaling factor (default 1.0).

Returns: true on success, false on duplicate name or load failure.

Version: 1.9.2

Calls llama_adapter_lora_init() to load the adapter against the base model. The adapter stays in RAM until explicitly unloaded or the base model is unloaded.

Definition at line 73 of file adapter_manager.cpp.

◆ set_hook_interface()

void entropic::AdapterManager::set_hook_interface ( const HookInterface & hooks )

Set hook interface for ON_ADAPTER_SWAP dispatch.

Parameters

hooks	Hook dispatch interface from the facade.

Version: 1.9.2

Definition at line 428 of file adapter_manager.cpp.

◆ state()

AdapterState entropic::AdapterManager::state ( const std::string & name ) const

Get adapter state.

Parameters

name	Adapter identifier.

Returns: AdapterState. COLD if not found.

Version: 1.9.2

Definition at line 370 of file adapter_manager.cpp.

◆ swap()

bool entropic::AdapterManager::swap	(	const std::string &	name,
		llama_context *	ctx
	)

Swap to a different adapter atomically.

Parameters

name	Target adapter (must be WARM).
ctx	llama_context to swap on.

Returns: true on success.

Version: 1.9.2

Deactivates current HOT adapter and activates the target. Fires ON_ADAPTER_SWAP hook which can cancel the operation.
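A cancellable swap of this kind can be sketched as follows. The real HookInterface is not shown on this page, so SwapHook and Swapper are hypothetical stand-ins, and running the hook before any state changes is an assumption about ordering.

```cpp
#include <functional>
#include <string>

// Hypothetical hook type: return false to veto the swap.
using SwapHook =
    std::function<bool(const std::string& from, const std::string& to)>;

// Toy swapper; `active` stands in for the currently HOT adapter name.
struct Swapper {
    std::string active;
    SwapHook on_swap;  // may be empty, in which case nothing can veto

    bool swap(const std::string& target) {
        if (target == active) return true;  // already active, nothing to do
        if (on_swap && !on_swap(active, target)) {
            return false;                   // cancelled: state left untouched
        }
        active = target;                    // deactivate + activate as one step
        return true;
    }
};
```

Consulting the hook before mutating anything keeps a cancelled swap free of side effects, so the previously HOT adapter stays HOT when the hook refuses.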

Definition at line 241 of file adapter_manager.cpp.

◆ unload()

void entropic::AdapterManager::unload	(	const std::string &	name,
		llama_context *	ctx
	)

Unload adapter (any state -> COLD).

Parameters

name	Adapter identifier.
ctx	Context to deactivate from (if HOT). May be nullptr.

Version: 1.9.2

If HOT, clears from context first. Frees via llama_adapter_lora_free(). No-op if not found.

Definition at line 125 of file adapter_manager.cpp.

◆ unload_all()

void entropic::AdapterManager::unload_all ( )

Free every loaded adapter handle (gh#58 close-out, v2.3.0).

Called by ~ModelOrchestrator after backends are torn down, so by the time this runs the llama_contexts that referenced these adapters are already destroyed and there is no HOT-attachment to clear. Safe to call repeatedly.

Without this, the bare AdapterManager destructor (defaulted) dropped raw llama_adapter_lora* handles on engine destroy, leaking each loaded LoRA's VRAM until process exit — the same pattern that caused gh#58's two-handle GPU regression for LlamaCppBackend in v2.2.8.

Version: 2.3.0

Skips the HOT-deactivation path because the contexts that held the adapter references are already gone; calling clear_adapters(ctx) against a destroyed context would be a use-after-free.

Definition at line 343 of file adapter_manager.cpp.
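The "free every handle, safe to call repeatedly" contract can be sketched with a table of owning raw pointers. Everything here is a hypothetical stand-in: FakeAdapter replaces llama_adapter_lora, fake_adapter_free() replaces llama_adapter_lora_free(), and HandleTable is not the real class.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for llama_adapter_lora; not the real llama.cpp type.
struct FakeAdapter {};

static int g_free_calls = 0;  // counts frees so the behaviour is observable

// Stand-in for llama_adapter_lora_free().
static void fake_adapter_free(FakeAdapter* adapter) {
    delete adapter;
    ++g_free_calls;
}

class HandleTable {
public:
    // Registers a new handle; refuses duplicates so no handle is overwritten.
    bool add(const std::string& name) {
        if (handles_.count(name) != 0) return false;
        handles_[name] = new FakeAdapter();
        return true;
    }

    // Frees every handle and empties the table. A second call finds an
    // empty table and does nothing, which is what makes it repeatable.
    void unload_all() {
        for (auto& entry : handles_) fake_adapter_free(entry.second);
        handles_.clear();
    }

    std::size_t size() const { return handles_.size(); }

private:
    std::unordered_map<std::string, FakeAdapter*> handles_;
};
```

Clearing the table in the same call that frees the handles is what prevents a double free on the second invocation. No context is touched at any point, matching the requirement that teardown must not dereference contexts that are already destroyed.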

◆ unload_all_for_model()

void entropic::AdapterManager::unload_all_for_model	(	llama_model *	model,
		llama_context *	ctx
	)

Unload all adapters for a given base model.

Parameters

model	The base model being unloaded.
ctx	Context to deactivate from. May be nullptr.

Version: 1.9.2

Called when base model transitions out of ACTIVE/WARM. Prevents dangling adapter handles.
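Dropping only the adapters tied to one base model amounts to a filtered erase over the adapter table. The sketch below assumes each entry records the model pointer it was loaded against; FakeModel, Entry and drop_for_model() are hypothetical names used for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct FakeModel {};  // hypothetical stand-in for llama_model

struct Entry {
    const FakeModel* base;  // model the adapter was loaded against
};

// Removes every entry whose base model matches `model` and returns the
// names that were removed. Entries for other models are left alone.
std::vector<std::string> drop_for_model(
        std::unordered_map<std::string, Entry>& table,
        const FakeModel* model) {
    std::vector<std::string> removed;
    for (auto it = table.begin(); it != table.end(); ) {
        if (it->second.base == model) {
            removed.push_back(it->first);
            it = table.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
    return removed;
}
```

Matching on the model pointer rather than on adapter names means the caller does not need to know which adapters exist, only which model is going away.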

Definition at line 292 of file adapter_manager.cpp.

The documentation for this class was generated from the following files:

entropic/inference/adapter_manager.h
inference/adapter_manager.cpp
