ThroughputTracker – real-time throughput measurement and prediction.
#include <atomic>
#include <cstdint>
#include <mutex>
| namespace | entropic | Activate model on GPU (WARM → ACTIVE). |
- Responsibilities:
  - Record per-generation throughput samples (tokens generated, wall-clock time)
  - Maintain an exponentially weighted moving average (EWMA) of tokens per second (tok/s)
  - Predict the time required to generate N tokens from recent throughput
  - Recommend max_tokens for a given time budget
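The arithmetic behind the EWMA, prediction and recommendation responsibilities can be sketched as three free functions. This is a minimal sketch, not the tracker's implementation: the free-function signatures, the smoothing factor alpha, seeding the average with the first sample, and millisecond units for the time budget are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the math only; names, the smoothing
// factor, and the seeding rule are assumptions, not the tracker's API.

// Fold one sample (tokens generated in elapsed_ms of wall-clock time) into
// the running EWMA of tokens per second. The first sample seeds the average.
double ewma_update(double current_tps, bool has_samples,
                   std::uint64_t tokens, double elapsed_ms, double alpha) {
    assert(alpha > 0.0 && alpha <= 1.0);
    assert(elapsed_ms > 0.0);
    const double sample_tps = static_cast<double>(tokens) * 1000.0 / elapsed_ms;
    if (!has_samples) {
        return sample_tps;  // no history yet: use the sample as-is
    }
    // New estimate = alpha * newest sample + (1 - alpha) * previous estimate.
    return alpha * sample_tps + (1.0 - alpha) * current_tps;
}

// Predicted wall-clock milliseconds to generate n tokens at tps tok/s.
// Returns a negative value when no throughput estimate exists yet.
double predict_ms(double tps, std::uint64_t n) {
    if (tps <= 0.0) return -1.0;
    return static_cast<double>(n) * 1000.0 / tps;
}

// Largest whole token count expected to fit in budget_ms at tps tok/s.
std::uint64_t recommend_tokens(double tps, double budget_ms) {
    if (tps <= 0.0 || budget_ms <= 0.0) return 0;
    return static_cast<std::uint64_t>(tps * budget_ms / 1000.0);
}
```

For example, with alpha = 0.2, a first sample of 100 tokens in 2000 ms seeds the average at 50 tok/s; a second sample of 200 tokens in 2000 ms (100 tok/s) moves it to 0.2 × 100 + 0.8 × 50 = 60 tok/s. A larger alpha tracks recent changes faster at the cost of more noise.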
- Thread safety:
  - record() acquires mutex_ (it is called from the generation thread)
  - tok_per_sec() and sample_count() are lock-free (they read from atomics)
  - predict_ms() and recommend_tokens() are derived from the lock-free tok_per_sec()
  - There is one tracker per model, not per tier: tiers that share a model also share its throughput data, which is correct because they run the same model on the same hardware
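The locking scheme above (writers serialize on a mutex, readers load atomics) can be illustrated with a reduced class. This is a hedged sketch under assumptions: the class name ThroughputTrackerSketch, storing the EWMA in a std::atomic<double>, the chosen memory orderings, and the default smoothing factor are illustrative, not taken from throughput_tracker.h.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical sketch: record() serializes on a mutex; readers never lock.
class ThroughputTrackerSketch {
public:
    explicit ThroughputTrackerSketch(double alpha = 0.3) : alpha_(alpha) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Called from the generation thread after each generation completes.
    void record(std::uint64_t tokens, double elapsed_ms) {
        if (elapsed_ms <= 0.0) return;  // ignore degenerate samples
        const double sample = static_cast<double>(tokens) * 1000.0 / elapsed_ms;
        std::lock_guard<std::mutex> lock(mutex_);
        // The read-modify-write of the EWMA happens under the mutex, so
        // concurrent record() calls cannot lose updates.
        const std::uint64_t n = count_.load(std::memory_order_relaxed);
        const double prev = tps_.load(std::memory_order_relaxed);
        const double next = (n == 0) ? sample : alpha_ * sample + (1.0 - alpha_) * prev;
        tps_.store(next, std::memory_order_release);
        count_.store(n + 1, std::memory_order_release);
    }

    // Lock-free readers: safe to call from any thread.
    double tok_per_sec() const { return tps_.load(std::memory_order_acquire); }
    std::uint64_t sample_count() const { return count_.load(std::memory_order_acquire); }

    // Derived only from the lock-free tok_per_sec() snapshot.
    double predict_ms(std::uint64_t n) const {
        const double tps = tok_per_sec();
        return tps > 0.0 ? static_cast<double>(n) * 1000.0 / tps : -1.0;
    }

    std::uint64_t recommend_tokens(double budget_ms) const {
        const double tps = tok_per_sec();
        if (tps <= 0.0 || budget_ms <= 0.0) return 0;
        return static_cast<std::uint64_t>(tps * budget_ms / 1000.0);
    }

private:
    const double alpha_;
    std::mutex mutex_;
    std::atomic<double> tps_{0.0};
    std::atomic<std::uint64_t> count_{0};
};
```

Because the rate and the count live in two separate atomics, a reader that calls both accessors may see values written by different record() calls; that is harmless for estimates such as predict_ms(), which read only the rate.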
- Ownership:
  - Owned by ModelOrchestrator, which keeps one ThroughputTracker per loaded model, keyed by model path (the same deduplication the model pool uses).
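The ownership rule (one tracker per model path, shared by every tier that uses that model) amounts to a get-or-create lookup keyed by path. A minimal sketch follows; TrackerRegistry, tracker_for and the placeholder Tracker type are hypothetical, and the sketch assumes the caller serializes access, so no locking is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Placeholder for the real tracker (hypothetical); only identity matters here.
struct Tracker {
    double tok_per_sec = 0.0;
};

// Hypothetical per-model registry: one tracker per model path. Tiers that
// resolve to the same model path receive the same Tracker object.
// Not thread-safe: assumes the caller serializes model loading.
class TrackerRegistry {
public:
    // Get-or-create: constructs a Tracker in place on first use of model_path.
    Tracker& tracker_for(const std::string& model_path) {
        assert(!model_path.empty());
        return trackers_.try_emplace(model_path).first->second;
    }

    std::size_t size() const { return trackers_.size(); }

private:
    // References to unordered_map elements stay valid across insertions and
    // rehashes (only iterators are invalidated), so returned references remain usable.
    std::unordered_map<std::string, Tracker> trackers_;
};
```

Two tiers configured with the same model path therefore read and update the same throughput estimate, while a different path gets its own tracker.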
Version: 1.9.7
Definition in file throughput_tracker.h.