OpenTelemetry Tracing

Ensemble provides distributed tracing through OpenTelemetry.

Span Hierarchy

ensemble.request
├── ensemble.auth                    # API key validation
├── ensemble.parameter_validation    # Parameter checking
├── ensemble.routing_decision        # Router evaluation
├── ensemble.provider_call           # Upstream provider request
│   ├── ensemble.stream_start
│   └── ensemble.stream_complete
└── ensemble.response_persistence    # S3 storage (if enabled)
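
The nesting above can be illustrated with a small, standard-library sketch. ToyTracer and handle_request are hypothetical names introduced here for illustration; this is not Ensemble's code and it does not use the OpenTelemetry SDK. It only shows how nested context managers produce the same parent/child relationships as the tree.

```python
import time
from contextlib import contextmanager


class ToyTracer:
    """Hypothetical stand-in for a real tracer: records (name, parent, seconds)."""

    def __init__(self):
        self._stack = []    # names of the spans that are currently open
        self.finished = []  # completed spans, in the order they ended

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            self._stack.pop()
            self.finished.append((name, parent, time.perf_counter() - start))


def handle_request(tracer, persist_response=False):
    """Open spans in the same order and nesting as the hierarchy above."""
    with tracer.span("ensemble.request"):
        with tracer.span("ensemble.auth"):
            pass  # API key validation would happen here
        with tracer.span("ensemble.parameter_validation"):
            pass  # parameter checking
        with tracer.span("ensemble.routing_decision"):
            pass  # router evaluation
        with tracer.span("ensemble.provider_call"):
            with tracer.span("ensemble.stream_start"):
                pass
            with tracer.span("ensemble.stream_complete"):
                pass
        if persist_response:
            with tracer.span("ensemble.response_persistence"):
                pass  # S3 storage, only if enabled


tracer = ToyTracer()
handle_request(tracer, persist_response=True)
for name, parent, _seconds in tracer.finished:
    print(f"{name} (parent: {parent})")
```

Child spans finish before their parents, so the root span ensemble.request is the last entry in tracer.finished.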

Span Attributes

Each span includes:
Attribute                  Description
ensemble.model             Requested model name
ensemble.provider          Selected provider
ensemble.endpoint          Selected endpoint ID
ensemble.session_id        Session ID (for affinity tracking)
ensemble.cost              Request cost (decimal)
ensemble.input_tokens      Input token count
ensemble.output_tokens     Output token count
ensemble.cache_hit         Whether cache was used
ensemble.routing.reason    Routing decision reason
ensemble.error.class       Error classification (rate_limit, permanent, retryable)

W3C Trace Context

Ensemble propagates W3C traceparent and tracestate headers from clients (e.g., Langfuse agents) through to provider calls, enabling end-to-end trace correlation.
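
As a rough illustration of what propagation involves, the sketch below parses an incoming traceparent header in the W3C version 00 layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags) and builds headers for the upstream call. The function names parse_traceparent and outbound_headers are hypothetical, and the behavior of minting a new span ID while keeping the trace ID is an assumption about how a proxy would typically behave, not a description of Ensemble's internals.

```python
import re
import secrets

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


def outbound_headers(incoming):
    """Build headers for the provider call from lowercase incoming headers."""
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    span_id = secrets.token_hex(8)  # 16 hex characters for this hop's span
    if parsed is None:
        # No usable context: start a new trace (assumed behavior), unsampled.
        return {"traceparent": f"00-{secrets.token_hex(16)}-{span_id}-00"}
    headers = {"traceparent": f"00-{parsed['trace_id']}-{span_id}-{parsed['flags']}"}
    if "tracestate" in incoming:
        headers["tracestate"] = incoming["tracestate"]  # passed through unchanged
    return headers


client = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "tracestate": "vendor=value",
}
print(outbound_headers(client))
```

Because the trace ID is unchanged on the outbound request, spans recorded by the client, by Ensemble, and by the provider can all be joined into a single trace.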

Metrics

Metric                            Type         Labels
ensemble.requests.total           Counter      status, provider, model, error_class
ensemble.request.duration         Histogram    provider, model
ensemble.tokens.input             Counter      provider, model
ensemble.tokens.output            Counter      provider, model
ensemble.cost.total               Counter      provider, model
ensemble.cache.hit_rate           Gauge        provider
ensemble.ratelimit.utilization    Gauge        endpoint
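
To make the role of labels concrete, here is a minimal in-memory sketch of a labelled counter. ToyCounter and record_request are hypothetical helpers written for this example, and the label values ("provider-a", "model-x", "none") are made up; a real deployment would use an OpenTelemetry metrics SDK rather than this class. The point is only that each distinct combination of label values is tracked as its own series.

```python
from collections import defaultdict


class ToyCounter:
    """Hypothetical monotonic counter keyed by a fixed set of label names."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def _key(self, labels):
        if set(labels) != set(self.label_names):
            raise ValueError(f"{self.name} expects labels {self.label_names}")
        return tuple(labels[n] for n in self.label_names)

    def add(self, amount, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[self._key(labels)] += amount

    def value(self, **labels):
        return self._values.get(self._key(labels), 0.0)


# Names and label sets taken from the table above.
requests_total = ToyCounter("ensemble.requests.total",
                            ["status", "provider", "model", "error_class"])
tokens_input = ToyCounter("ensemble.tokens.input", ["provider", "model"])


def record_request(status, provider, model, error_class, input_tokens):
    """Update both counters for one finished request."""
    requests_total.add(1, status=status, provider=provider,
                       model=model, error_class=error_class)
    tokens_input.add(input_tokens, provider=provider, model=model)


record_request("ok", "provider-a", "model-x", "none", input_tokens=120)
record_request("ok", "provider-a", "model-x", "none", input_tokens=80)
record_request("error", "provider-a", "model-x", "rate_limit", input_tokens=0)
print(tokens_input.value(provider="provider-a", model="model-x"))
```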

Logging

Ensemble uses an asynchronous, batched logger with:
  • 65,536-message ring buffer (non-blocking)
  • Structured JSON output
  • Request ID correlation on every log line
  • Configurable log levels
  • SQLite-backed log persistence for admin API queries
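
The ring-buffer idea can be sketched with the standard library alone. RingBufferLogger below is a hypothetical class, not Ensemble's logger: it assumes the oldest entry is dropped when the buffer is full (the text above does not say which entry is dropped), it leaves out the background writer and the SQLite persistence, and the field names in the JSON output are illustrative.

```python
import json
import time
from collections import deque

LEVELS = {"debug": 10, "info": 20, "warning": 30, "error": 40}


class RingBufferLogger:
    """Hypothetical sketch: bounded buffer on the hot path, batched drain."""

    def __init__(self, capacity=65536, min_level="info"):
        # deque(maxlen=...) discards the oldest item when full, so append
        # never blocks the caller (assumed drop policy).
        self._buffer = deque(maxlen=capacity)
        self._min_level = LEVELS[min_level]

    def log(self, level, request_id, message, **fields):
        if LEVELS[level] < self._min_level:
            return  # below the configured log level
        self._buffer.append({"ts": time.time(), "level": level,
                             "request_id": request_id,
                             "message": message, **fields})

    def drain(self, max_batch=1024):
        """Remove up to max_batch records and return them as JSON lines."""
        batch = []
        while self._buffer and len(batch) < max_batch:
            batch.append(json.dumps(self._buffer.popleft()))
        return batch


logger = RingBufferLogger(capacity=2)
logger.log("debug", "req-1", "filtered out by level")
logger.log("info", "req-1", "first")
logger.log("info", "req-1", "second")
logger.log("error", "req-2", "third", provider="provider-a")
for line in logger.drain():
    print(line)
```

In a real asynchronous logger, a background thread or task would call something like drain() on an interval and write each batch to its sinks, keeping I/O off the request path.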