## POST /api/v1/generate
Synchronous inference — sends a request and waits for the complete response.

### Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer ens_your_api_key` |
| `X-Session-ID` | No | Session identifier for cache affinity routing |
| `X-Request-ID` | No | Client-provided correlation ID (auto-generated if omitted) |
| `Content-Type` | Yes | `application/json` |
### Request Body

#### Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model name (e.g., `claude-sonnet-4-20250514`, `gpt-5`, `gemini-2.5-pro`) |
| `messages` | `Message[]` | Yes | Conversation history |
| `max_tokens` | int | No | Maximum output tokens |
| `temperature` | float | No | Sampling temperature (0.0–2.0) |
| `top_p` | float | No | Nucleus sampling threshold |
| `stop_sequences` | string[] | No | Sequences that stop generation when emitted |
| `tools` | `ToolDefinition[]` | No | Function-calling definitions |
| `stream` | bool | No | Always `false` for this endpoint |
| `provider_config` | object | No | Per-request provider overrides |
### Message Format

Each message carries a role, one of: `system`, `user`, `assistant`, `tool`.
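A minimal request body, assuming the common `role`/`content` message shape (the model name and parameter values here are illustrative, not recommendations), might look like:

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "temperature": 0.7,
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the request in one sentence."}
  ]
}
```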
### Response

### Error Responses
| Status | Meaning |
|---|---|
| 400 | Invalid request (bad model, missing messages, parameter validation failure) |
| 401 | Invalid or missing API key |
| 429 | All capacity pools rate-limited |
| 500 | Internal server error |
| 502 | Provider error (upstream failure) |
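As a sketch of client-side handling for the statuses above (the base URL is a placeholder and the JSON response shape is an assumption), a caller might retry only the transient statuses — 429 (all pools rate-limited) and 502 (upstream provider failure) — with exponential backoff, while surfacing 400/401 immediately:

```python
import json
import time
import urllib.error
import urllib.request

# Placeholder base URL: substitute your deployment's host.
API_URL = "https://api.example.com/api/v1/generate"

def is_retryable(status: int) -> bool:
    # 429 and 502 are transient per the error table; 400/401 will not
    # succeed on retry, and 500 handling is left to the caller here.
    return status in (429, 502)

def generate(payload: dict, api_key: str, max_retries: int = 3) -> dict:
    """POST to /api/v1/generate, retrying transient failures with backoff."""
    body = json.dumps(payload).encode("utf-8")
    for attempt in range(max_retries + 1):
        req = urllib.request.Request(
            API_URL,
            data=body,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            if is_retryable(err.code) and attempt < max_retries:
                time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s, ...
                continue
            raise
    raise RuntimeError("unreachable")
```

Setting `X-Request-ID` on each attempt (omitted above) would let the server correlate retries of the same logical request.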