POST /v1/chats
Basic request
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Hello!" } ]
  }'
"<string>"

Authorizations

Authorization
string
header
required

JWT token from Stytch B2B authentication (magic link, SSO, or M2M)

Headers

X-Stream-Format
enum<string>

Stream format for SSE responses. Set to ai-sdk to enable Vercel AI SDK UI Message Stream Protocol compatibility. When enabled, the response includes the x-vercel-ai-ui-message-stream: v1 header.

Available options:
ai-sdk
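
For example, a streaming request that opts into the AI SDK stream format might look like the following sketch (it reuses the token and modelId from the basic request above; curl's -N flag disables output buffering so SSE chunks print as they arrive):

Streaming request with AI SDK format
curl -N -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Stream-Format: ai-sdk" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Hello!" } ],
    "stream": true
  }'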

Body

application/json

Request schema for chat completions.

messages
MessageCreate · object[]
required

Messages to send to the model

Minimum array length: 1
chatId
string | null

Existing chat ID to continue conversation

Example:

"chat_abc123"

modelId
string | null

Identifier of the AI model for this request. If omitted and chatId is provided, the chat's existing model is used. Must match a model 'id' from the /v1/models API.
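
Valid model IDs can be discovered via the /v1/models API referenced above; a minimal sketch, assuming the endpoint accepts a plain authenticated GET:

List available models
curl "https://api.cuadra.ai/v1/models" \
  -H "Authorization: Bearer $TOKEN"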

system_prompt
string | null

System-level instructions for the AI model.

ephemeral
boolean
default:false

Create a temporary chat for testing. Ephemeral chats are automatically deleted.

Example:

false

stream
boolean
default:false

Enable a streaming (Server-Sent Events) response

Example:

true

maxTokens
integer | null

Maximum number of tokens to generate for this response

Required range: 1 <= x <= 32000
Example:

128

temperature
any

Sampling temperature for the model; higher values produce more varied output.
responseFormat
Responseformat · object

Structured output format specification (OpenAI-compatible json_schema format). Enforces the AI response to match the specified JSON schema.

Example:
{
  "json_schema": {
    "name": "response",
    "schema": {
      "additionalProperties": false,
      "properties": {
        "summary": { "type": "string" },
        "confidence": {
          "maximum": 1,
          "minimum": 0,
          "type": "number"
        }
      },
      "required": ["summary"],
      "type": "object"
    },
    "strict": true
  },
  "type": "json_schema"
}
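
A request that enforces this schema might look like the following sketch (it reuses the example schema above; key order inside the JSON is not significant):

Structured output request
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Summarize this document." } ],
    "responseFormat": {
      "type": "json_schema",
      "json_schema": {
        "name": "response",
        "strict": true,
        "schema": {
          "type": "object",
          "additionalProperties": false,
          "required": ["summary"],
          "properties": {
            "summary": { "type": "string" },
            "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
          }
        }
      }
    }
  }'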
enableReasoning
boolean
default:false

Enable reasoning/thinking tokens for supported models. When enabled, the model will expose its thinking process in the response. Supported by: Anthropic Claude (extended thinking), OpenAI o1/o3, and Google Gemini thinking models. Note: Reasoning tokens are billed separately and may significantly increase costs.

Examples:

true

false

reasoningBudget
integer | null

Maximum tokens for reasoning/thinking (only used when enableReasoning=true). Higher budgets allow deeper reasoning but increase latency and cost. Default: 10000; varies by provider.

Required range: 1000 <= x <= 128000
Example:

10000
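
A reasoning-enabled request might look like the following sketch (it reuses the example values above; note that reasoning tokens are billed separately):

Request with reasoning enabled
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Plan the migration in three steps." } ],
    "enableReasoning": true,
    "reasoningBudget": 10000
  }'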

tools
ToolSchema · object[] | null

List of tools the model may call. Pass-through to AI provider. Tools are defined and executed by the client, not the server.

Example:
[
  {
    "function": {
      "description": "Get current weather for a location",
      "name": "get_weather",
      "parameters": {
        "properties": {
          "location": { "type": "string" },
          "unit": {
            "enum": ["celsius", "fahrenheit"],
            "type": "string"
          }
        },
        "required": ["location"],
        "type": "object"
      }
    },
    "type": "function"
  }
]
toolChoice

Controls tool selection. Options: 'auto' (model decides), 'none' (no tools), 'required' (must use a tool), or a specific tool object.

Example:

"auto"

parallelToolCalls
boolean | null

Whether to allow parallel tool calls (OpenAI-specific; default true)
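
Putting the tool fields together, a request offering the example get_weather tool might look like the following sketch (the user message is illustrative; because tools are executed by the client, any tool call the model returns must be run locally and its result sent back in a follow-up message):

Request with a client-defined tool
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "What is the weather in Lisbon?" } ],
    "toolChoice": "auto",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "required": ["location"],
            "properties": {
              "location": { "type": "string" },
              "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
            }
          }
        }
      }
    ]
  }'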

Response

Server-Sent Events stream of chat completion chunks (when stream=true)

The response is of type string.