Cuadra AI - Connect, Train, and Deploy Your Custom AI Assistant

Base URL

https://api.cuadra.ai/v1

Authentication

Include your access token in the Authorization header:

Authorization: Bearer YOUR_TOKEN

Method	Use Case
JWT Sessions	Frontend apps (from Stytch B2B auth)
M2M OAuth 2.0	Backend services (client credentials flow)

See Authentication for setup details.

Request Format

All requests use JSON:

curl -X POST https://api.cuadra.ai/v1/chats \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"modelId": "model_abc", "messages": [{"role": "user", "content": "Hello"}]}'
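The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper name `build_chat_request` is hypothetical, and the payload fields simply mirror the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://api.cuadra.ai/v1"


def build_chat_request(token: str, model_id: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chats request mirroring the curl example."""
    payload = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chats",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request needs a valid token, so it is left commented out:
# with urllib.request.urlopen(build_chat_request("YOUR_TOKEN", "model_abc", "Hello")) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any network call is made.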

Idempotency

For POST requests, include an Idempotency-Key header so that you can retry safely without creating duplicates:

Idempotency-Key: unique-request-id-123
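One way to use the header is to generate a key once per logical operation and reuse that same key on every retry of it. The sketch below assumes a UUID is an acceptable key value (the source does not specify a required format), and both helper names are hypothetical.

```python
import uuid


def new_idempotency_key() -> str:
    """Return a fresh key for one logical operation (UUID format is an assumption)."""
    return str(uuid.uuid4())


def with_idempotency(headers: dict, key: str) -> dict:
    """Return a copy of the headers with the Idempotency-Key header added."""
    return {**headers, "Idempotency-Key": key}


# Generate the key once, then reuse it for the first attempt and for every retry.
key = new_idempotency_key()
first_attempt = with_idempotency({"Content-Type": "application/json"}, key)
retry_attempt = with_idempotency({"Content-Type": "application/json"}, key)
```

Generating a new key inside the retry loop would defeat the purpose, since each attempt would then look like a distinct request.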

Response Format

Success

{
  "id": "chat_xyz789",
  "message": {
    "role": "assistant",
    "content": "Hello! How can I help?"
  },
  "usage": {
    "inputTokens": 15,
    "outputTokens": 8,
    "totalTokens": 23
  }
}

Error (RFC 7807)

{
  "type": "about:blank",
  "title": "Unauthorized",
  "status": 401,
  "detail": "Invalid or expired token."
}
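A client might parse such an error body into a small typed structure before deciding how to react. This is a sketch covering only the four fields shown above; the `ProblemDetails` class and `parse_problem` function are hypothetical names, and the default for a missing `type` follows RFC 7807.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemDetails:
    type: str
    title: Optional[str]
    status: Optional[int]
    detail: Optional[str]


def parse_problem(body: str) -> ProblemDetails:
    """Parse an RFC 7807 problem document from a JSON string."""
    data = json.loads(body)
    return ProblemDetails(
        # RFC 7807 treats an absent "type" as "about:blank".
        type=data.get("type", "about:blank"),
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
    )


problem = parse_problem(
    '{"type": "about:blank", "title": "Unauthorized", '
    '"status": 401, "detail": "Invalid or expired token."}'
)
```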

Rate Limits

Scope	Limit
Per organization	300 requests/minute
Per user	60 requests/minute

Rate-limited requests return HTTP 429 with a Retry-After header.

Pagination

List endpoints use cursor-based pagination:

GET /v1/models?limit=20

{
  "data": [...],
  "nextCursor": "cursor_abc123",
  "hasMore": true
}
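Walking through all pages can be sketched as a generator that keeps requesting until `hasMore` is false. The transport is abstracted behind a hypothetical `fetch_page` callable, because the source does not say how `nextCursor` is passed back to the API; a `cursor` query parameter would be one guess, but that is an assumption.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages; fetch_page(cursor) returns one parsed page."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page.get("hasMore"):
            break
        cursor = page.get("nextCursor")
        if cursor is None:
            # Defensive stop: hasMore was true but no cursor was provided.
            break


# Usage with an in-memory stand-in for the API (two fake pages):
fake_pages = {
    None: {"data": [{"id": "m1"}, {"id": "m2"}], "nextCursor": "cursor_abc123", "hasMore": True},
    "cursor_abc123": {"data": [{"id": "m3"}], "nextCursor": None, "hasMore": False},
}
all_ids = [item["id"] for item in iter_items(lambda c: fake_pages[c])]
```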

Endpoints

Endpoint	Description
`POST /v1/chats`	Create chat completion
`GET /v1/models`	List models
`POST /v1/models`	Create model
`GET /v1/datasets`	List datasets
`POST /v1/datasets`	Create dataset
`GET /v1/usage`	Get usage metrics

Chat API: Completions with streaming

Authentication: JWT and M2M setup

FAQ

Is the API RESTful?

Yes. The Cuadra AI API follows REST conventions with resource-based URLs, standard HTTP methods (GET, POST, PATCH, DELETE), and JSON payloads.

What’s the latency?

Latency depends on the LLM provider and the response length. Typical first-token latency is 200-500 ms. Set stream: true so that responses start arriving sooner and feel faster.

Is there a sandbox environment?

No, there is no separate sandbox. Use the Free plan for testing.

How do I handle rate limits?

Implement exponential backoff. Check the Retry-After header on 429 responses. See Errors for retry logic examples.
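A delay calculation along these lines could be used between retries. This is a sketch, not the retry logic from the Errors page: it assumes Retry-After carries a number of seconds (HTTP also allows a date form, which falls through to backoff here), and the function name, `base`, and `cap` defaults are hypothetical.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Return seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not numeric (for example an HTTP date): fall back to backoff.
    # Exponential backoff with full jitter: random wait in [0, min(cap, base * 2^attempt)].
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

In a retry loop, a caller would sleep for `retry_delay(attempt, response_headers.get("Retry-After"))` after each 429 and give up after a fixed number of attempts. The jitter spreads out retries from many clients so they do not all hit the API at the same moment.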
