Entropic 2.3.8
Local-first agentic inference engine
EWMA-based throughput tracker for generation budgeting.
#include <entropic/inference/throughput_tracker.h>
Public Member Functions
| Return type | Member | Description |
|---|---|---|
| void | record (int tokens_generated, int64_t elapsed_ms) | Record a completed generation sample. |
| double | tok_per_sec () const | Current smoothed throughput estimate. |
| int64_t | predict_ms (int token_count) const | Predict wall-clock time for generating N tokens. |
| int | recommend_tokens (int64_t time_budget_ms, float headroom=0.9f, int floor=64) const | Recommend max_tokens to fit within a time budget. |
| int | sample_count () const | Number of recorded samples. |
| void | reset () | Reset all throughput data. |
A single concrete class with no three-layer hierarchy. It records tokens-per-second (tok/s) samples from completed generations and provides smoothed estimates, computed as an exponentially weighted moving average (EWMA), for automatically adapting max_tokens.
Definition at line 43 of file throughput_tracker.h.
int64_t entropic::ThroughputTracker::predict_ms (int token_count) const
Predict wall-clock time for generating N tokens.
| token_count | Desired token count. |
Definition at line 69 of file throughput_tracker.cpp.
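The prediction amounts to dividing the token count by the smoothed rate and converting seconds to milliseconds. The sketch below illustrates that arithmetic only. `predict_ms_sketch` is a hypothetical free function, not the library's implementation, and returning 0 when no positive estimate exists, as well as rounding to the nearest millisecond, are assumptions that this reference does not confirm.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of Entropic): time in ms to emit
// token_count tokens at a smoothed rate of tok_per_sec.
std::int64_t predict_ms_sketch(int token_count, double tok_per_sec) {
    // Assumption: with no usable estimate, report 0 rather than dividing by zero.
    if (tok_per_sec <= 0.0 || token_count <= 0) {
        return 0;
    }
    // tokens / (tokens per second) = seconds; scale to ms and round to nearest.
    const double ms = token_count / tok_per_sec * 1000.0;
    return static_cast<std::int64_t>(std::llround(ms));
}
```

For example, at a smoothed 40 tok/s, 512 tokens would be predicted to take 12800 ms under these assumptions.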
int entropic::ThroughputTracker::recommend_tokens (int64_t time_budget_ms, float headroom = 0.9f, int floor = 64) const
Recommend max_tokens to fit within a time budget.
| time_budget_ms | Available wall-clock time. |
| headroom | Fraction of budget to target (e.g., 0.9 = 90%). |
| floor | Minimum token count to return (never recommend fewer). |
Definition at line 87 of file throughput_tracker.cpp.
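One plausible reading of the three parameters is: scale the budget by headroom, convert the result to tokens at the current smoothed rate, and never go below floor. The sketch below follows that reading. `recommend_tokens_sketch` is a hypothetical free function that takes the rate as an explicit argument. Falling back to floor when no estimate exists, and truncating rather than rounding, are assumptions rather than documented behavior.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Entropic): token budget that should fit in
// headroom * time_budget_ms at a smoothed rate of tok_per_sec.
int recommend_tokens_sketch(std::int64_t time_budget_ms, double tok_per_sec,
                            float headroom = 0.9f, int floor = 64) {
    // Assumption: without a usable estimate or budget, return the floor.
    if (tok_per_sec <= 0.0 || time_budget_ms <= 0) {
        return floor;
    }
    // Usable seconds after reserving (1 - headroom) of the budget as slack.
    const double usable_sec =
        static_cast<double>(time_budget_ms) / 1000.0 * headroom;
    // Clamp before converting so very large budgets cannot overflow int.
    const double raw = std::min(
        usable_sec * tok_per_sec,
        static_cast<double>(std::numeric_limits<int>::max()));
    // Truncate toward zero, then enforce the minimum.
    return std::max(floor, static_cast<int>(raw));
}
```

With a 10000 ms budget, a headroom of 0.5 and 40 tok/s, this yields 200 tokens. With a 1000 ms budget at 10 tok/s, the raw value falls below 64, so the floor is returned.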
void entropic::ThroughputTracker::record (int tokens_generated, int64_t elapsed_ms)
Record a completed generation sample.
| tokens_generated | Number of tokens produced. |
| elapsed_ms | Wall-clock generation time in milliseconds. |
Updates the EWMA. Ignores samples with fewer than kMinTokens tokens or elapsed_ms <= 0 (degenerate generations).
Definition at line 27 of file throughput_tracker.cpp.
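A minimal sketch of an EWMA update with the documented filtering may help make the behavior concrete. `EwmaSketch`, `kMinTokensSketch` and `kAlphaSketch` are hypothetical names. The real value of kMinTokens and the smoothing factor are not stated in this reference, and seeding the average with the first accepted sample and counting only accepted samples are assumptions.

```cpp
#include <cstdint>

// Hypothetical constants: the actual kMinTokens and smoothing factor used by
// ThroughputTracker are not given in this reference.
constexpr int kMinTokensSketch = 8;
constexpr double kAlphaSketch = 0.3;

struct EwmaSketch {
    double tok_per_sec = 0.0;  // smoothed estimate; 0 until the first sample
    int samples = 0;           // accepted samples only (assumption)

    void record(int tokens_generated, std::int64_t elapsed_ms) {
        // Documented filter: skip short generations and non-positive durations.
        if (tokens_generated < kMinTokensSketch || elapsed_ms <= 0) {
            return;
        }
        const double sample =
            tokens_generated * 1000.0 / static_cast<double>(elapsed_ms);
        // Assumption: the first accepted sample seeds the average; later
        // samples are blended in with weight alpha.
        tok_per_sec = (samples == 0)
            ? sample
            : kAlphaSketch * sample + (1.0 - kAlphaSketch) * tok_per_sec;
        ++samples;
    }
};
```

With these assumed constants, recording 100 tokens in 1000 ms followed by 200 tokens in 1000 ms moves the estimate from 100 tok/s to roughly 130 tok/s, while a 4-token sample or a sample with elapsed_ms of 0 leaves it unchanged.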
void entropic::ThroughputTracker::reset ()
Reset all throughput data.
int entropic::ThroughputTracker::sample_count () const
Number of recorded samples.
Definition at line 106 of file throughput_tracker.cpp.
double entropic::ThroughputTracker::tok_per_sec () const
Current smoothed throughput estimate.
Definition at line 58 of file throughput_tracker.cpp.