Overview
- 50% cost savings on batch requests at all providers that support batching
- Graceful degradation: Queue requests during rate limits instead of immediate failure
- Time-windowed batching: Collect requests over a configurable interval (e.g., 30 seconds)
Supported Providers
| Provider | Platform | Discount |
|---|---|---|
| Anthropic | Direct API | 50% |
| Anthropic | Bedrock | 50% |
| Anthropic | Vertex AI | 50% |
| OpenAI | Direct API | 50% |
| Google Gemini | Direct API | 50% |