POST /api/v1/generate

Synchronous inference: sends a request and blocks until the complete response is returned.

Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer ens_your_api_key |
| X-Session-ID | No | Session identifier for cache affinity routing |
| X-Request-ID | No | Client-provided correlation ID (auto-generated if omitted) |
| Content-Type | Yes | application/json |

Request Body

{
  "model": "claude-sonnet-4-20250514",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ],
  "max_tokens": 4096,
  "temperature": 0.7,
  "tools": [],
  "provider_config": {}
}
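
As a rough illustration of how the headers and body above fit together, the sketch below builds the request with Python's standard library without sending it. The host `https://api.example.com` and the helper name `build_generate_request` are assumptions made for the example; this section does not name the service's base URL.

```python
import json
import urllib.request

# Hypothetical base URL; the source does not name the host.
BASE_URL = "https://api.example.com"


def build_generate_request(api_key, body, session_id=None, request_id=None):
    """Build (but do not send) a POST request for /api/v1/generate."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional headers are only included when the caller supplies a value.
    if session_id is not None:
        headers["X-Session-ID"] = session_id
    if request_id is not None:
        headers["X-Request-ID"] = request_id
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/generate",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


body = {
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "max_tokens": 4096,
}
request = build_generate_request("ens_your_api_key", body, session_id="session-1")
```

To send it, pass `request` to `urllib.request.urlopen` and decode the returned response with `json.load`. Because the endpoint is synchronous, that call blocks until generation finishes, so a generous timeout is likely to be needed.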

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model name (e.g., claude-sonnet-4-20250514, gpt-5, gemini-2.5-pro) |
| messages | Message[] | Yes | Conversation history |
| max_tokens | int | No | Maximum output tokens |
| temperature | float | No | Sampling temperature (0.0–2.0) |
| top_p | float | No | Nucleus sampling threshold |
| stop_sequences | string[] | No | Stop generation sequences |
| tools | ToolDefinition[] | No | Function calling definitions |
| stream | bool | No | Always false for this endpoint |
| provider_config | object | No | Per-request provider overrides |
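
A client can catch some 400 responses before making the call by checking the constraints listed in the table. The following is a minimal sketch of such a pre-flight check; `validate_generate_body` is a hypothetical helper, and the server's actual validation rules may be stricter or different (for instance, rejecting an empty `messages` list is an assumption here).

```python
def validate_generate_body(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")

    # Assumption: an empty conversation is treated the same as a missing one.
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages is required and must be a non-empty list")

    # The table documents a 0.0-2.0 range for temperature.
    temperature = body.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")

    # This endpoint is synchronous, so stream may only be false or omitted.
    if body.get("stream"):
        problems.append("stream must be false for this endpoint")

    return problems
```

Returning a list rather than raising on the first failure lets the caller report every problem at once.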

Message Format

{
  "role": "user",
  "content": "text content",
  "content_blocks": [
    {"type": "text", "text": "..."},
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}},
    {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": "..."}}
  ]
}
Roles: system, user, assistant, tool
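
The sketch below shows one way to assemble a message in this shape, including base64-encoding binary data for an image block. The helper names are hypothetical. This section does not say how the service reconciles `content` with `content_blocks`, so the sketch simply sets both, mirroring the example above; treat that as an assumption.

```python
import base64

VALID_ROLES = {"system", "user", "assistant", "tool"}


def image_block(image_bytes, media_type="image/png"):
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


def build_message(role, text, extra_blocks=()):
    """Build a message dict; content_blocks is added only when needed."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    message = {"role": role, "content": text}
    if extra_blocks:
        # Assumption: the text is repeated as the first block, as in the example.
        message["content_blocks"] = [{"type": "text", "text": text}, *extra_blocks]
    return message


message = build_message("user", "Describe this image.", [image_block(b"\x89PNG")])
```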

Response

{
  "id": "req_abc123",
  "model": "claude-sonnet-4-20250514",
  "provider": "anthropic",
  "endpoint": "anthropic-primary",
  "blocks": [
    {"type": "text", "text": "Hello! How can I help you today?"}
  ],
  "input_tokens": 25,
  "output_tokens": 12,
  "cached_prompt_tokens": 0,
  "cache_creation_tokens": 0,
  "reasoning_tokens": 0,
  "cost": "0.000111",
  "processing_time": "1.234s",
  "finish_reason": "end_turn",
  "performance_metrics": {
    "time_to_first_token": "0.45s",
    "total_latency": "1.234s",
    "tokens_per_second": 9.7
  },
  "rate_limit_info": {
    "requests_remaining": 4999,
    "tokens_remaining": 999975
  }
}
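
Several response fields are strings that need parsing before use: `cost` is a decimal string and the timing fields carry an `s` suffix. The following is a small sketch of how a client might pull out the commonly needed values. The function names are hypothetical, and it assumes the timing strings always have the form shown in the example.

```python
from decimal import Decimal


def parse_seconds(value):
    """Convert a duration string such as "1.234s" to a float of seconds."""
    return float(value[:-1] if value.endswith("s") else value)


def summarize_response(response):
    """Extract text, token totals, cost, and latency from a response dict."""
    text = "".join(
        block["text"]
        for block in response.get("blocks", [])
        if block.get("type") == "text"
    )
    return {
        "text": text,
        "total_tokens": response["input_tokens"] + response["output_tokens"],
        # Decimal avoids float rounding when costs are summed across requests.
        "cost": Decimal(response["cost"]),
        "seconds": parse_seconds(response["processing_time"]),
        "finish_reason": response["finish_reason"],
    }


example = {
    "blocks": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "input_tokens": 25,
    "output_tokens": 12,
    "cost": "0.000111",
    "processing_time": "1.234s",
    "finish_reason": "end_turn",
}
summary = summarize_response(example)
```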

Error Responses

| Status | Meaning |
| --- | --- |
| 400 | Invalid request (bad model, missing messages, parameter validation failure) |
| 401 | Invalid or missing API key |
| 429 | All capacity pools rate-limited |
| 500 | Internal server error |
| 502 | Provider error (upstream failure) |
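
One plausible client-side policy is to retry 429, 500, and 502 with exponential backoff and to surface 400 and 401 immediately, since those indicate a problem with the request itself. That split is an assumption rather than documented behavior. In the sketch below, `send` is a hypothetical callable that performs the HTTP call and returns `(status_code, parsed_body)`.

```python
import random
import time

# Assumption: these statuses are transient and safe to retry.
RETRYABLE_STATUSES = {429, 500, 502}


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter exponential backoff: a random delay up to base * 2**attempt."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    max_attempts is assumed to be at least 1. The sleep parameter is
    injectable so the policy can be exercised without real delays.
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        sleep(backoff_delay(attempt))
```

The jitter spreads retries out so that many clients hitting a rate limit at the same moment do not all return at once.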