| Type | Extensions | Max Size |
|---|---|---|
| PDF | .pdf | 50 MB |
| Word | .docx | 50 MB |
| Text | .txt, .md | 50 MB |
| Data | .csv, .json | 50 MB |
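The limits in the table can be checked locally before uploading. The following is a minimal sketch, not part of the API: `check_upload` is a hypothetical helper, reading "50 MB" as 50 × 10^6 bytes is an assumption (the stricter interpretation), and matching extensions case-insensitively is also an assumption.

```python
from pathlib import Path

# Limits copied from the table above. Treating "50 MB" as 50 * 10^6 bytes is an
# assumption; it is the stricter reading, so a file that passes here also
# passes if the service actually means 50 MiB.
MAX_BYTES = 50 * 1000 * 1000
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".json"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Case-insensitive matching is an assumption, not documented behavior.
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For a file on disk, call it as `check_upload(path.name, path.stat().st_size)` and skip the upload if the returned list is non-empty.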
Quick Start
1. Create a Dataset
curl -X POST https://api.cuadra.ai/v1/datasets \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: create-ds-001" \
-d '{"name": "Support KB", "description": "FAQs and guides"}'
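The same request can be built from Python using only the standard library. This is a sketch under assumptions: `build_create_dataset_request` and `send` are hypothetical helper names, generating a UUID when no idempotency key is supplied is an illustrative choice, and the response body is returned as parsed JSON because its shape is not described here.

```python
import json
import urllib.request
import uuid

API_BASE = "https://api.cuadra.ai/v1"


def build_create_dataset_request(token, name, description, idempotency_key=None):
    """Build (but do not send) the POST /datasets request from the curl example."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Assumption: one key per logical operation. Pass the same key again
        # when retrying the same creation after a network error.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        f"{API_BASE}/datasets", data=body, headers=headers, method="POST"
    )


def send(request):
    """Send the request and return the parsed JSON response (shape not assumed)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the headers and body easy to inspect, and lets a retry resend the identical request with the same key.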
2. Upload Documents
Use the Files API to upload documents, then associate them with the dataset.
3. Link to Model
curl -X POST https://api.cuadra.ai/v1/models/model_abc/datasets \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"datasetId": "ds_xyz", "usageType": "rag"}'
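A Python equivalent of the link request might look like the sketch below. `build_link_request` is a hypothetical helper; the path and the `datasetId` and `usageType` fields come from the curl example, and `"rag"` is the only usage type shown there, so it is used as the default.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.cuadra.ai/v1"


def build_link_request(token, model_id, dataset_id, usage_type="rag"):
    """Build (but do not send) the POST /models/{model_id}/datasets request."""
    # Percent-encode the model ID so unexpected characters cannot alter the path.
    url = f"{API_BASE}/models/{quote(model_id, safe='')}/datasets"
    body = json.dumps({"datasetId": dataset_id, "usageType": usage_type}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Pass the result to `urllib.request.urlopen` to send it.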
Document Processing
Documents are processed asynchronously after upload:
| Status | Description |
|---|---|
| processing | Chunking and embedding in progress |
| ready | Available for queries |
| failed | Processing error (check file integrity) |
Processing time depends on document size. PDFs with complex layouts take longer. Poll the file status or use webhooks to know when processing completes.
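Polling can be sketched as a loop with capped exponential backoff. The status values come from the table above; everything else is an assumption: `wait_until_processed` is a hypothetical helper, and `fetch_status` is a function you supply that returns the current status string, since the status endpoint itself is not described in this section.

```python
import time

# Statuses after which polling should stop (from the status table).
TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_processed(fetch_status, timeout_s=300.0, initial_delay_s=1.0,
                         max_delay_s=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns "ready" or "failed".

    fetch_status is a caller-supplied function returning the current status
    string; how it queries the file status is left to the caller.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

The caller still has to handle a returned `"failed"`, for example by checking file integrity and uploading again. The default timeout and delays are placeholders; large PDFs with complex layouts may need a longer timeout.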
Best Practices
Organize by topic
Create separate datasets for different knowledge domains (e.g., “Product Docs”, “Legal”, “HR Policies”). This improves retrieval relevance and lets you control which knowledge each model can access.
Keep documents focused
Prefer multiple focused documents over one large document. The chunking algorithm works best with well-structured content.
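One way to break a large Markdown file into focused documents is to split it at headings before uploading. This is an illustrative sketch only: `split_by_heading` is a hypothetical helper, it assumes `#`-style headings, it discards any text before the first matching heading, and it does not recognize headings inside fenced code blocks.

```python
import re


def split_by_heading(markdown_text, level=2):
    """Split Markdown into (title, body) pairs at headings of the given level."""
    # Matches lines such as "## Title"; deeper headings stay inside the body.
    pattern = re.compile(rf"^{'#' * level} +(.+?)[ \t]*$", re.MULTILINE)
    matches = list(pattern.finditer(markdown_text))
    sections = []
    for i, match in enumerate(matches):
        # A section runs until the next heading of the same level, or the end.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown_text)
        body = markdown_text[match.end():end].strip()
        sections.append((match.group(1), body))
    return sections
```

Each `(title, body)` pair can then be written to its own file and uploaded as a separate document.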
Use descriptive filenames
Filenames appear in source citations. Use descriptive names like password-reset-guide.pdf instead of doc123.pdf.
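If documents are generated or renamed in bulk, a small helper can derive descriptive filenames from their titles. This is a sketch: `descriptive_filename` is a hypothetical helper, and the lowercase, hyphen-separated style simply mirrors the password-reset-guide.pdf example rather than any requirement of the API.

```python
import re
import unicodedata


def descriptive_filename(title, extension):
    """Turn a human-readable title into a lowercase, hyphenated filename."""
    # Drop accents, then keep only ASCII letters and digits separated by hyphens.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if not slug:
        raise ValueError("title contains no usable characters")
    return f"{slug}.{extension.lstrip('.').lower()}"
```

For example, `descriptive_filename("Password Reset Guide", ".pdf")` returns `password-reset-guide.pdf`.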