Quick Start
curl -X POST https://api.cuadra.ai/v1/chats \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"modelId": "model_abc123",
"messages": [{"role": "user", "content": "Hello!"}]
}'
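The same request from Python, using the requests library (a minimal sketch; the shape of the response body beyond what this guide shows is not specified here, so it is simply printed):

import requests

API_URL = "https://api.cuadra.ai/v1/chats"
HEADERS = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json",
}

resp = requests.post(API_URL, headers=HEADERS, json={
    "modelId": "model_abc123",
    "messages": [{"role": "user", "content": "Hello!"}],
})
resp.raise_for_status()
print(resp.json())  # inspect the completion payload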
Streaming Responses
Enable stream: true for real-time responses via Server-Sent Events:
curl -X POST https://api.cuadra.ai/v1/chats \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"modelId": "model_abc", "messages": [...], "stream": true}'
Stream format:
data: {"id":"chat_xyz","delta":"Once","finished":false}
data: {"id":"chat_xyz","delta":" upon","finished":false}
data: {"id":"chat_xyz","delta":"","finished":true,"usage":{...}}
data: [DONE]
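A minimal Python sketch for consuming this stream (assumes the requests library; the event fields follow the delta/finished format shown above):

import json
import requests

resp = requests.post(
    "https://api.cuadra.ai/v1/chats",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "modelId": "model_abc",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True,
    },
    stream=True,  # keep the connection open and read incrementally
)

full_text = []
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank lines between events
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    event = json.loads(payload)
    full_text.append(event["delta"])   # incremental text
    if event["finished"]:
        usage = event.get("usage")     # final chunk carries token usage
print("".join(full_text))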
For Vercel AI SDK compatibility, add the corresponding request header. The stream then uses the AI SDK event types: start, text-delta, source-document, reasoning-delta, tool-input-delta, finish.
Reasoning (Extended Thinking)
Set enableReasoning: true to see the model’s thinking process. Supported by Claude (Sonnet/Opus), OpenAI o1/o3, and Gemini thinking models.
{
"modelId": "model_claude",
"messages": [...],
"enableReasoning": true,
"reasoningBudget": 10000
}
Reasoning tokens are billed separately. Use reasoningBudget to cap costs.
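A sketch of a reasoning-enabled request in Python (assumes the requests library; the exact shape of reasoning data in the response is not specified in this guide, so only usage is inspected, and a top-level usage field is an assumption):

import requests

resp = requests.post(
    "https://api.cuadra.ai/v1/chats",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "modelId": "model_claude",
        "messages": [{"role": "user", "content": "Plan a 3-city rail trip."}],
        "enableReasoning": True,
        "reasoningBudget": 10000,  # cap on reasoning tokens to control cost
    },
)
data = resp.json()
print(data.get("usage"))  # reasoning tokens are billed separately, so check usage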
Structured Outputs (JSON Mode)
Force JSON schema compliance with responseFormat:
{
"modelId": "model_abc",
"messages": [{"role": "user", "content": "Extract: iPhone 15 Pro costs $999"}],
"responseFormat": {
"type": "json_schema",
"json_schema": {
"name": "product",
"strict": true,
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"}
},
"required": ["name", "price"]
}
}
}
}
Response content will be valid JSON: {"name": "iPhone 15 Pro", "price": 999}
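Because strict: true guarantees schema-valid output, the returned content can be parsed directly. A Python sketch (the path used to pull the assistant content out of the response envelope is an assumption, since the full response shape is not shown here):

import json
import requests

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}

resp = requests.post(
    "https://api.cuadra.ai/v1/chats",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "modelId": "model_abc",
        "messages": [{"role": "user", "content": "Extract: iPhone 15 Pro costs $999"}],
        "responseFormat": {
            "type": "json_schema",
            "json_schema": {"name": "product", "strict": True, "schema": schema},
        },
    },
)
content = resp.json()["content"]   # assumed field; adjust to the actual response envelope
product = json.loads(content)      # e.g. {"name": "iPhone 15 Pro", "price": 999}
print(product["name"], product["price"])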
Tool Calling
Define tools the model can invoke:
{
"modelId": "model_abc",
"messages": [{"role": "user", "content": "Weather in Paris?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"]
}
}
}]
}
When the model calls a tool, execute it and send the result back in a tool message:
{
"chatId": "chat_xyz",
"messages": [
{"role": "user", "content": "Weather in Paris?"},
{"role": "assistant", "toolCalls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}}]},
{"role": "tool", "toolCallId": "call_1", "content": "{\"temp\": 18, \"conditions\": \"sunny\"}"}
]
}
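A sketch of the full tool-calling round trip in Python. It assumes the API returns the assistant's tool calls as toolCalls and the chat id as id (both are assumptions matching the shapes echoed back above and in the stream events); get_weather is a stand-in for your own function:

import json
import requests

API_URL = "https://api.cuadra.ai/v1/chats"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

def get_weather(location: str) -> dict:
    # Stand-in implementation; call a real weather service here.
    return {"temp": 18, "conditions": "sunny"}

# 1. Ask the question with the tool available (same definition as above).
first = requests.post(API_URL, headers=HEADERS, json={
    "modelId": "model_abc",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
}).json()

# 2. Run the requested tool locally and send the result back.
tool_calls = first["toolCalls"]                    # assumed response field
call = tool_calls[0]
args = json.loads(call["function"]["arguments"])   # e.g. {"location": "Paris"}
result = get_weather(**args)

second = requests.post(API_URL, headers=HEADERS, json={
    "chatId": first["id"],                         # assumed response field
    "messages": [
        {"role": "user", "content": "Weather in Paris?"},
        {"role": "assistant", "toolCalls": tool_calls},
        {"role": "tool", "toolCallId": call["id"], "content": json.dumps(result)},
    ],
}).json()
print(second)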
Continuing Conversations
Use chatId to continue an existing chat:
{
"chatId": "chat_xyz789",
"messages": [{"role": "user", "content": "Tell me more"}]
}
Previous messages are automatically included in context.
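A short multi-turn sketch in Python (assumes the chat id is returned as id, matching the stream events):

import requests

API_URL = "https://api.cuadra.ai/v1/chats"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

first = requests.post(API_URL, headers=HEADERS, json={
    "modelId": "model_abc123",
    "messages": [{"role": "user", "content": "Summarize the French Revolution."}],
}).json()

# Follow-up: only the new message is sent; earlier turns are pulled in server-side.
follow_up = requests.post(API_URL, headers=HEADERS, json={
    "chatId": first["id"],  # assumed field name for the chat id
    "messages": [{"role": "user", "content": "Tell me more"}],
}).json()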
FAQ
How does streaming work?
The API sends Server-Sent Events (SSE) with incremental content. Each data: line contains a JSON object with delta (new text) and finished (boolean). Parse events as they arrive for real-time display.
What’s the max conversation length?
Limited by the model’s context window. GPT-4o supports 128K tokens. The API automatically truncates old messages if needed.
Are responses cached?
No. Each request generates a fresh completion. For idempotent behavior, send the same Idempotency-Key header with each retry of the request.
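For example, a simple retry loop can reuse one key per logical request (the Idempotency-Key header is named above; the retry policy here is just an illustration):

import uuid
import requests

key = str(uuid.uuid4())  # one key per logical request, reused on every retry
body = {"modelId": "model_abc123", "messages": [{"role": "user", "content": "Hello!"}]}

for attempt in range(3):
    try:
        resp = requests.post(
            "https://api.cuadra.ai/v1/chats",
            headers={"Authorization": "Bearer YOUR_TOKEN", "Idempotency-Key": key},
            json=body,
            timeout=30,
        )
        resp.raise_for_status()
        break
    except requests.RequestException:
        if attempt == 2:
            raise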
How do I count tokens before sending?
Use a tokenizer library (tiktoken for OpenAI, anthropic-tokenizer for Claude) to estimate. The response includes actual token counts in usage.
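For example, with tiktoken (o200k_base is the encoding used by GPT-4o; other models may need a different encoding, and the per-message overhead below is only a rough assumption):

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o's encoding

messages = [{"role": "user", "content": "Hello!"}]
# Estimate: token count of each message's text plus a few tokens of
# per-message overhead; the exact overhead varies by model and provider.
estimate = sum(len(enc.encode(m["content"])) + 4 for m in messages)
print(estimate)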