Architecture
Ensemble uses a local-first rate management design optimized for zero hot-path latency, relying on atomic.Int64 for lock-free operation.
Rate Limit Tracking
Per-Endpoint Limits
Each endpoint has RPM (requests per minute) and TPM (tokens per minute) limits.
Local Counter Structure
Counters are packed into a single atomic.Uint64 for cache-line efficiency.
Updates go through CompareAndSwap: no mutex, no contention.
Global View
Background sync publishes local counters to Redis and reads global aggregates.
Mock Endpoint Detection
Endpoints with TPM limits above a configurable threshold (MockEndpointTPMThreshold) are treated as unlimited locally — useful for testing and development.
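The threshold rule can be sketched as a single comparison. In the Go snippet below, the helper isMockEndpoint and the threshold value are hypothetical; only the name MockEndpointTPMThreshold comes from the text, and "above" is read as a strict greater-than.

```go
package main

import "fmt"

// MockEndpointTPMThreshold is named in the text; this value is a placeholder.
const MockEndpointTPMThreshold int64 = 1_000_000_000

// isMockEndpoint reports whether an endpoint's TPM limit is above the
// threshold, in which case local limiting would be skipped.
func isMockEndpoint(tpmLimit, threshold int64) bool {
	return tpmLimit > threshold
}

func main() {
	fmt.Println(isMockEndpoint(90_000, MockEndpointTPMThreshold))        // false
	fmt.Println(isMockEndpoint(2_000_000_000, MockEndpointTPMThreshold)) // true
}
```

Passing the threshold as a parameter keeps the check configurable, as the text describes.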
RateDecision
Every rate limit check produces a RateDecision.
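For illustration only, one possible shape for such a decision is sketched below. The fields Allowed and Reason, the check function, and all of its parameters are assumptions, not Ensemble's actual definition.

```go
package main

import "fmt"

// RateDecision is a hypothetical shape used only for this sketch.
type RateDecision struct {
	Allowed bool
	Reason  string // which limit blocked the request; empty when allowed
}

// check compares current usage against the RPM and TPM limits for a
// request that would consume requestTokens tokens.
func check(usedRPM, limitRPM, usedTPM, limitTPM, requestTokens int64) RateDecision {
	switch {
	case usedRPM+1 > limitRPM:
		return RateDecision{Allowed: false, Reason: "rpm"}
	case usedTPM+requestTokens > limitTPM:
		return RateDecision{Allowed: false, Reason: "tpm"}
	default:
		return RateDecision{Allowed: true}
	}
}

func main() {
	d := check(59, 60, 1_000, 90_000, 500)
	fmt.Println(d.Allowed, d.Reason == "") // true true

	d = check(60, 60, 1_000, 90_000, 500)
	fmt.Println(d.Allowed, d.Reason) // false rpm
}
```

Returning a value instead of a bare boolean lets the caller see which limit was hit without a second lookup.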