Base URL
Authentication
Include your access token in the `Authorization` header. Supported authentication methods:
| Method | Use Case |
|---|---|
| JWT Sessions | Frontend apps (from Stytch B2B auth) |
| M2M OAuth 2.0 | Backend services (client credentials flow) |
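
As a minimal sketch of attaching the token, the helper below builds the `Authorization` header. It assumes the token is sent with the `Bearer` scheme, which is common for JWT and OAuth 2.0 access tokens but is not stated on this page. The function name `auth_headers` is hypothetical.

```python
from typing import Dict


def auth_headers(access_token: str) -> Dict[str, str]:
    """Build the Authorization header for a request.

    Assumption: the API expects the Bearer scheme.
    """
    token = access_token.strip()
    if not token:
        raise ValueError("access_token must not be empty")
    return {"Authorization": f"Bearer {token}"}
```

The same helper would work for both methods in the table, since each one ends with an access token in hand.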
Request Format
All requests use JSON.
Idempotency
For POST requests, include an `Idempotency-Key` header so you can safely retry without creating duplicates.
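One way to apply this is to generate a key once per logical operation and send the same key on every retry. The sketch below prepares the headers and JSON body for a POST. It assumes any unique string is accepted as a key (a UUID4 is used here), and the helper name `prepare_post` and the payload are hypothetical.

```python
import json
import uuid
from typing import Any, Dict, Optional, Tuple


def prepare_post(
    payload: Dict[str, Any],
    idempotency_key: Optional[str] = None,
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a JSON POST request.

    Pass the same idempotency_key again when retrying the same logical
    operation; omit it to have a fresh key generated.
    """
    # Assumption: any unique string works as a key; a UUID4 is used here.
    key = idempotency_key or str(uuid.uuid4())
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": key,
    }
    body = json.dumps(payload).encode("utf-8")
    return headers, body


# Hypothetical payload: the real request fields are not specified here.
headers, body = prepare_post({"example": "value"})
# A retry reuses the key from the first attempt so the server can deduplicate.
retry_headers, retry_body = prepare_post(
    {"example": "value"}, idempotency_key=headers["Idempotency-Key"]
)
```

Generating a new key on each retry would defeat the purpose, because the server would see every attempt as a distinct request.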
Response Format
Success
Error (RFC 7807)
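RFC 7807 defines a "problem details" JSON object with the standard members `type`, `title`, `status`, `detail`, and `instance`, plus optional extension members. The sketch below parses such a body into a small dataclass. The sample error is made up for illustration, and the names `Problem` and `parse_problem` are hypothetical. The exact fields this API returns are not shown here.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Standard members defined by RFC 7807.
STANDARD_MEMBERS = ("type", "title", "status", "detail", "instance")


@dataclass
class Problem:
    type: str = "about:blank"  # RFC 7807 default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: Dict[str, Any] = field(default_factory=dict)


def parse_problem(body: str) -> Problem:
    """Parse an RFC 7807 problem details JSON document."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("problem details must be a JSON object")
    known = {k: data[k] for k in STANDARD_MEMBERS if k in data}
    extra = {k: v for k, v in data.items() if k not in STANDARD_MEMBERS}
    return Problem(extensions=extra, **known)


# Made-up example body, not an actual response from this API.
sample = '{"title": "Too Many Requests", "status": 429, "detail": "Rate limit exceeded."}'
problem = parse_problem(sample)
```

Keeping unknown members in `extensions` means the parser does not break if the API adds fields beyond the standard five.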
Rate Limits
| Scope | Limit |
|---|---|
| Per organization | 300 requests/minute |
| Per user | 60 requests/minute |
Requests over the limit receive a 429 response with a `Retry-After` header.
Pagination
List endpoints use cursor-based pagination.
Endpoints
| Endpoint | Description |
|---|---|
| POST /v1/chats | Create chat completion |
| GET /v1/models | List models |
| POST /v1/models | Create model |
| GET /v1/datasets | List datasets |
| POST /v1/datasets | Create dataset |
| GET /v1/usage | Get usage metrics |
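
For list endpoints such as `GET /v1/models`, cursor-based pagination usually means passing the cursor from one page to request the next until no cursor is returned. The sketch below shows that loop in isolation. The response keys `data` and `next_cursor` are assumptions, since the actual field names are not given on this page, and `fetch_page` is a hypothetical callable standing in for the HTTP request.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_items(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Any]:
    """Yield every item from a cursor-paginated list endpoint.

    fetch_page(cursor) performs one request and returns the decoded JSON.
    Assumed response shape: {"data": [...], "next_cursor": "..." or None}.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break  # no further pages
```

Passing the request function in as an argument keeps the loop independent of any HTTP library and makes it easy to exercise with canned pages.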
FAQ
Is the API RESTful?
Yes. The Cuadra AI API follows REST conventions with resource-based URLs, standard HTTP methods (GET, POST, PATCH, DELETE), and JSON payloads.
What's the latency?
Latency depends on the LLM provider and response length. Typical first-token latency is 200-500 ms. Use `stream: true` for faster perceived responses.
Is there a sandbox environment?
No, there is no separate sandbox. Use the Free plan for testing.
How do I handle rate limits?
Implement exponential backoff, and check the `Retry-After` header on 429 responses. See Errors for retry logic examples.
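
The following is a rough sketch of that approach, not the retry logic from the Errors page. It honors `Retry-After` when the header carries a number of seconds and otherwise falls back to exponential backoff with full jitter. The base delay, cap, attempt count, and the names `retry_delay` and `call_with_retry` are all assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple


def retry_delay(
    attempt: int,
    retry_after: Optional[str],
    base: float = 0.5,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before the next try (attempt is 0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(
    send: Callable[[], Tuple[int, Mapping[str, str]]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-429 status or attempts run out.

    send() is a hypothetical callable returning (status_code, headers).
    """
    status = 0
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    return status
```

The jitter spreads retries from many clients over time, which avoids synchronized bursts against the per-organization limit.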