Ensemble supports provider batch APIs for asynchronous bulk processing at discounted rates.

Overview

  • 50% cost savings on batch requests at all providers that support batching
  • Graceful degradation: Queue requests during rate limits instead of failing immediately
  • Time-windowed batching: Collect requests over a configurable interval (e.g., 30 seconds)

Supported Providers

Provider        Platform     Discount
Anthropic       Direct API   50%
Anthropic       Bedrock      50%
Anthropic       Vertex AI    50%
OpenAI          Direct API   50%
Google Gemini   Direct API   50%

API

POST /api/v1/batch

Submit a batch of requests. Request:
{
  "requests": [
    {
      "custom_id": "task-001",
      "model": "claude-sonnet-4-20250514",
      "messages": [{"role": "user", "content": "Summarize this document..."}],
      "max_tokens": 1024
    },
    {
      "custom_id": "task-002",
      "model": "claude-sonnet-4-20250514",
      "messages": [{"role": "user", "content": "Translate this text..."}],
      "max_tokens": 512
    }
  ]
}
Response:
{
  "batch_id": "batch_abc123",
  "status": "submitted",
  "request_count": 2,
  "created_at": "2025-01-15T10:30:00Z"
}
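
As a rough illustration of the request shape above, the following Python sketch builds the payload and posts it to the endpoint. The base URL `http://localhost:8080`, the absence of authentication headers, and the helper names `build_batch_payload` and `submit_batch` are assumptions made for this example; they are not part of Ensemble's documented API.

```python
import json
import urllib.request


def build_batch_payload(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Build a request body in the shape shown above, one entry per prompt.

    Hypothetical helper: custom_id values are generated as task-001, task-002, ...
    """
    return {
        "requests": [
            {
                "custom_id": f"task-{i:03d}",
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            }
            for i, prompt in enumerate(prompts, start=1)
        ]
    }


def submit_batch(payload, base_url="http://localhost:8080"):
    """POST the payload to /api/v1/batch and return the decoded JSON response.

    base_url is a placeholder, and auth headers are omitted because the
    source does not specify how requests are authenticated.
    """
    req = urllib.request.Request(
        f"{base_url}/api/v1/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Build the payload only; submit_batch needs a running Ensemble instance.
payload = build_batch_payload(["Summarize this document...", "Translate this text..."])
```

Calling `submit_batch(payload)` against a running instance should return an object like the response above, whose `batch_id` identifies the batch.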

Rate Limit Fallback

When synchronous endpoints are rate-limited, Ensemble can optionally queue requests for batch processing:
batch:
  rate_limit_fallback: true
  collection_window: 30s
  max_batch_size: 100
This turns 429 errors into deferred processing: the response arrives later, but the request is never dropped.
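
To make the interaction between `collection_window` and `max_batch_size` concrete, here is a minimal Python sketch of one way such a collector could behave. It is not Ensemble's implementation. The `BatchCollector` class, its methods, and the rule that the window starts at the first queued request are all assumptions for illustration.

```python
import time


class BatchCollector:
    """Hypothetical collector for requests deferred after a rate limit.

    A batch is released when max_batch_size requests are queued, or when
    collection_window seconds have passed since the first queued request.
    """

    def __init__(self, collection_window=30.0, max_batch_size=100, clock=time.monotonic):
        self.collection_window = collection_window
        self.max_batch_size = max_batch_size
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._pending = []
        self._window_start = None

    def add(self, request):
        """Queue a request; return a full batch if the size limit is hit, else None."""
        if not self._pending:
            self._window_start = self._clock()  # first request opens the window
        self._pending.append(request)
        if len(self._pending) >= self.max_batch_size:
            return self._drain()
        return None

    def poll(self):
        """Return the pending batch if the window has elapsed, else None."""
        if self._pending and self._clock() - self._window_start >= self.collection_window:
            return self._drain()
        return None

    def _drain(self):
        # Hand back everything queued so far and reset for the next window.
        batch, self._pending = self._pending, []
        self._window_start = None
        return batch
```

In this sketch a caller would invoke `add` whenever a synchronous call returns 429 and call `poll` periodically; any non-`None` result is a list of requests ready to submit to `POST /api/v1/batch`.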