POST /v1/chats
Basic request
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Hello!" } ]
  }'
"<string>"

Authorizations

Authorization
string
header
required

JWT token from Stytch B2B authentication (magic link, SSO, or M2M)

Headers

X-Stream-Format
enum<string>

Stream format for SSE responses. Set to ai-sdk to enable Vercel AI SDK UI Message Stream Protocol compatibility. When enabled, the response includes the x-vercel-ai-ui-message-stream: v1 header.

Available options:
ai-sdk
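
For example, a streaming request that opts into the AI SDK stream format might look like the following sketch (it reuses the token and modelId from the basic request above; curl's -N flag disables output buffering so SSE chunks print as they arrive):

Streaming request with AI SDK format
curl -N -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Stream-Format: ai-sdk" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Hello!" } ],
    "stream": true
  }'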

Body

application/json

Request schema for chat completions.

messages
MessageCreate · object[]
required

Messages to send to the model

Minimum array length: 1
chatId
string | null

Existing chat ID to continue conversation

Example:

"chat_abc123"

modelId
string | null

Identifier of the AI model for this request. If omitted and chatId is provided, the chat's existing model is used. Must match a model 'id' from the /v1/models API.
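
Valid model IDs can be discovered via the /v1/models API referenced above; a minimal sketch, assuming the endpoint accepts a plain authenticated GET:

List available models
curl "https://api.cuadra.ai/v1/models" \
  -H "Authorization: Bearer $TOKEN"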

system_prompt
string | null

System-level instructions for the AI model.

ephemeral
boolean
default:false

Create a temporary chat for testing. Ephemeral chats are automatically deleted.

Example:

false

stream
boolean
default:false

Enable a streaming (Server-Sent Events) response

Example:

true

maxTokens
integer | null

Maximum number of tokens to generate for this response

Required range: 1 <= x <= 32000
Example:

128

temperature
any

Sampling temperature for the model; higher values produce more varied output.
responseFormat
Responseformat · object

Structured output format specification (OpenAI-compatible json_schema format). Enforces the AI response to match the specified JSON schema.

Example:
{
  "json_schema": {
    "name": "response",
    "schema": {
      "additionalProperties": false,
      "properties": {
        "summary": { "type": "string" },
        "confidence": {
          "maximum": 1,
          "minimum": 0,
          "type": "number"
        }
      },
      "required": ["summary"],
      "type": "object"
    },
    "strict": true
  },
  "type": "json_schema"
}
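
A request that enforces this schema might look like the following sketch (it reuses the example schema above; key order inside the JSON is not significant):

Structured output request
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Summarize this document." } ],
    "responseFormat": {
      "type": "json_schema",
      "json_schema": {
        "name": "response",
        "strict": true,
        "schema": {
          "type": "object",
          "additionalProperties": false,
          "required": ["summary"],
          "properties": {
            "summary": { "type": "string" },
            "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
          }
        }
      }
    }
  }'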
enableReasoning
boolean
default:false

Enable reasoning/thinking tokens for supported models. When enabled, the model will expose its thinking process in the response. Supported by: Anthropic Claude (extended thinking), OpenAI o1/o3, and Google Gemini thinking models. Note: Reasoning tokens are billed separately and may significantly increase costs.

Examples:

true

false

reasoningBudget
integer | null

Maximum tokens for reasoning/thinking (only used when enableReasoning=true). Higher budgets allow deeper reasoning but increase latency and cost. Default: 10000; varies by provider.

Required range: 1000 <= x <= 128000
Example:

10000
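
A reasoning-enabled request might look like the following sketch (it reuses the example values above; note that reasoning tokens are billed separately):

Request with reasoning enabled
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Plan the migration in three steps." } ],
    "enableReasoning": true,
    "reasoningBudget": 10000
  }'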

tools
ToolSchema · object[] | null

List of tools the model may call. Pass-through to AI provider. Tools are defined and executed by the client, not the server.

Example:
[
  {
    "function": {
      "description": "Get current weather for a location",
      "name": "get_weather",
      "parameters": {
        "properties": {
          "location": { "type": "string" },
          "unit": {
            "enum": ["celsius", "fahrenheit"],
            "type": "string"
          }
        },
        "required": ["location"],
        "type": "object"
      }
    },
    "type": "function"
  }
]
toolChoice

Controls tool selection. Options: 'auto' (model decides), 'none' (no tools), 'required' (must use a tool), or a specific tool object.

Example:

"auto"

parallelToolCalls
boolean | null

Whether to allow parallel tool calls (OpenAI-specific; default true)
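
Putting the tool fields together, a request offering the example get_weather tool might look like the following sketch (the user message is illustrative; because tools are executed by the client, any tool call the model returns must be run locally and its result sent back in a follow-up message):

Request with a client-defined tool
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "What is the weather in Lisbon?" } ],
    "toolChoice": "auto",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "required": ["location"],
            "properties": {
              "location": { "type": "string" },
              "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
            }
          }
        }
      }
    ]
  }'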

Response

Server-Sent Events stream of chat completion chunks (when stream=true)

The response is of type string.