Entropic 2.3.8
Local-first agentic inference engine
utf8_safe.h File Reference

UTF-8-boundary-aware string truncation helper for the facade.

#include <string>
#include <cstddef>

Namespaces

namespace  entropic

Functions

std::string entropic::facade::utf8_safe_substr (const std::string &s, std::size_t max_bytes)
 Truncate a UTF-8 string at-or-before a byte cap on a codepoint boundary.
 

Detailed Description

UTF-8-boundary-aware string truncation helper for the facade.

A byte-indexed std::string::substr can cut through the middle of a multi-byte UTF-8 codepoint, producing an invalid byte sequence that nlohmann::json::dump() rejects with type_error.316. This header exposes a small helper that rounds the cut down to the previous codepoint boundary by walking back over UTF-8 continuation bytes (0x80..0xBF). It is defined in a header so that it can be unit tested from tests/unit/api/ without exporting a public C symbol. (gh#56)

Definition in file utf8_safe.h.
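
To make the failure mode described above concrete, the following minimal sketch classifies bytes and detects when a byte cut would land inside a codepoint. The names is_continuation_byte and cut_splits_codepoint are hypothetical and are not part of utf8_safe.h; the sketch also assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes, i.e. bytes of the form 10xxxxxx (0x80..0xBF).
// Hypothetical helper, not part of utf8_safe.h.
inline bool is_continuation_byte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0u) == 0x80u;
}

// True if keeping exactly `cut` bytes of `s` would split a codepoint.
// Assumes `s` is valid UTF-8: a cut is mid-codepoint exactly when the first
// dropped byte is a continuation byte.
inline bool cut_splits_codepoint(const std::string& s, std::size_t cut) {
    assert(cut <= s.size());  // a cut past the end is a caller error in this sketch
    return cut < s.size() && is_continuation_byte(s[cut]);
}
```

For the string "h\xC3\xA9llo" ("héllo", 6 bytes), a 2-byte cut keeps the lead byte 0xC3 but drops its continuation byte 0xA9, which is the kind of prefix a plain substr(0, 2) would return. Cuts at 1 or 3 bytes fall on codepoint boundaries.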

Function Documentation

◆ utf8_safe_substr()

std::string entropic::facade::utf8_safe_substr (const std::string &s, std::size_t max_bytes)  [inline]

Truncate a UTF-8 string at-or-before a byte cap on a codepoint boundary.

Parameters
s          Input string (treated as UTF-8 bytes).
max_bytes  Maximum byte length of the returned prefix.
Returns
Prefix of s with length <= max_bytes, ending on a codepoint boundary. If the length of s is already <= max_bytes, s is returned unchanged. If s is valid UTF-8, the returned prefix is also valid UTF-8.
Version
2.2.3

Definition at line 32 of file utf8_safe.h.
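
The walk-back behavior documented above can be sketched as follows. This is an illustrative stand-in written from the description on this page, not the code in utf8_safe.h; the name truncate_utf8_sketch is hypothetical, and the real implementation may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative stand-in for the documented behavior, NOT the code from utf8_safe.h.
// Returns a prefix of `s` of at most `max_bytes` bytes that ends on a codepoint
// boundary, assuming `s` is valid UTF-8.
inline std::string truncate_utf8_sketch(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) {
        return s;  // already short enough: return unchanged
    }
    std::size_t cut = max_bytes;
    assert(cut < s.size());  // so s[cut], the first dropped byte, is in range
    // While the first dropped byte is a continuation byte (0x80..0xBF), the cut
    // sits inside a codepoint; move it back toward that codepoint's lead byte.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0u) == 0x80u) {
        --cut;
    }
    return s.substr(0, cut);
}
```

Under these assumptions, truncating "h\xC3\xA9llo" to 2 bytes yields "h" rather than a dangling lead byte, truncating it to 3 bytes yields "h\xC3\xA9", and truncating a lone 4-byte codepoint to 3 bytes yields the empty string, since no shorter prefix ends on a boundary.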