Managing Documents
Documents are the raw content that powers your Knowledge Base. After you upload a file, the platform extracts its text, splits it into chunks, generates vector embeddings, and stores everything for fast retrieval. This page covers the full document lifecycle.
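The four stages (extract, chunk, embed, store) can be illustrated with a minimal, self-contained Python sketch. It is not the platform's implementation. The paragraph-based splitting, the hashed bag-of-words `embed` function, and the in-memory `VectorStore` class are hypothetical stand-ins for a real chunker, embedding model, and vector database.

```python
import math
import re
import zlib

DIM = 64  # dimensionality of the toy vectors (hypothetical; real models use far more)


def extract_text(raw: bytes) -> str:
    """Stage 1, extraction. Only decodes UTF-8 here; real extractors parse PDF, DOCX, and so on."""
    return raw.decode("utf-8")


def split_into_chunks(text: str) -> list[str]:
    """Stage 2, chunking. Hypothetical rule: each blank-line-separated paragraph is one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> list[float]:
    """Stage 3, embedding. A toy hashed bag-of-words vector standing in for a real model."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so a dot product equals cosine similarity


class VectorStore:
    """Stage 4, storage and retrieval. Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, list[float]]] = []  # (doc_id, chunk text, vector)

    def add(self, doc_id: str, chunks: list[str]) -> None:
        for chunk in chunks:
            self.rows.append((doc_id, chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        q = embed(query)
        # Rank stored chunks by cosine similarity to the query vector, highest first.
        ranked = sorted(self.rows, key=lambda row: sum(a * b for a, b in zip(q, row[2])), reverse=True)
        return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]
```

A real embedding model captures meaning rather than shared words, but the flow is the same: every chunk is embedded once at upload time, and each query is embedded at search time and compared against the stored vectors.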
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text-based PDFs are fully supported. Scanned PDFs require OCR-capable processing. |
| Word | .docx | Microsoft Word documents. Legacy .doc format is not supported. |
| Plain Text | .txt | Any UTF-8 encoded text file. |
| HTML | .html | HTML tags are stripped; text content is extracted. |
| CSV | .csv | Each row is treated as a separate passage. Headers are preserved as context. |
| Markdown | .md | Headings and structure are preserved during chunking. |
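If you upload whole folders, a small client-side pre-check can set aside files that the table above does not list, such as legacy `.doc` files. This is a hypothetical helper, not a platform feature. The `SUPPORTED` mapping simply mirrors the table, the case-insensitive extension match is an assumption, and the platform remains the authority on what it accepts (for example, whether a scanned PDF can be processed).

```python
from pathlib import Path
from typing import Optional

# Mirrors the Supported Formats table; keys are lowercase extensions (hypothetical helper data).
SUPPORTED = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".txt": "Plain Text",
    ".html": "HTML",
    ".csv": "CSV",
    ".md": "Markdown",
}


def classify(filename: str) -> Optional[str]:
    """Return the format name for a listed extension, or None if the file is likely to be rejected."""
    # Assumes extensions are matched case-insensitively, so "Report.PDF" counts as a PDF.
    return SUPPORTED.get(Path(filename).suffix.lower())


def partition(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split filenames into (accepted, rejected) according to the table."""
    accepted = [f for f in filenames if classify(f) is not None]
    rejected = [f for f in filenames if classify(f) is None]
    return accepted, rejected
```

Converting rejected files (for example, saving a `.doc` as `.docx`) before uploading avoids a round trip through the Error status.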
TIP
Markdown and HTML files tend to produce the best chunking results because their structure (headings, sections) provides natural boundaries for splitting text.
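To see why headings make good boundaries, here is a minimal sketch that splits Markdown at ATX headings (lines starting with one to six `#` characters followed by a space) and keeps each heading together with the text beneath it. The platform's own chunker is not described on this page. The `split_markdown` function is hypothetical, and it ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then heading text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_markdown(text: str) -> list[str]:
    """Start a new section at each heading; each section keeps its heading line."""
    sections: list[list[str]] = [[]]
    for line in text.splitlines():
        # Open a new section at a heading, unless the current section is still empty.
        if HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    joined = ["\n".join(s).strip() for s in sections]
    return [s for s in joined if s]
```

Because every section starts with its heading, each piece carries its own topic label, which is the natural boundary the tip refers to. Sections longer than the configured chunk size would still need further splitting.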
Uploading Documents
- Open the Knowledge Base detail view by clicking its name in the list.
- Navigate to the Documents tab.
- Click Upload Documents in the top-right corner.
- Drag and drop files onto the upload area, or click to browse your file system.
- You can upload multiple files at once. Each file is processed independently.
- Click Upload to begin processing.
Document upload interface showing drag-and-drop area with file list, file sizes, and Upload button
File Size Limits
| Constraint | Limit |
|---|---|
| Maximum file size | 50 MB per file |
| Maximum batch upload | 20 files at once |
| Maximum total KB storage | 2 GB per Knowledge Base |
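The three limits can be checked before a batch is uploaded. This sketch is a hypothetical client-side helper, not part of the platform. It assumes binary units (1 MB = 1024 × 1024 bytes), which this page does not specify, and that `used_kb_bytes` is the storage the Knowledge Base already uses, obtained however you track it.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB
MAX_FILE_BYTES = 50 * MB   # per-file limit from the table
MAX_BATCH_FILES = 20       # files per upload batch
MAX_KB_BYTES = 2 * GB      # total storage per Knowledge Base


def check_batch(sizes: dict[str, int], used_kb_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means the batch fits all three limits."""
    problems = []
    if len(sizes) > MAX_BATCH_FILES:
        problems.append(f"batch has {len(sizes)} files; the limit is {MAX_BATCH_FILES}")
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is {size / MB:.1f} MB; the per-file limit is {MAX_FILE_BYTES // MB} MB")
    # Counts every file in the batch toward the Knowledge Base total.
    total = used_kb_bytes + sum(sizes.values())
    if total > MAX_KB_BYTES:
        problems.append(f"batch would bring the Knowledge Base to {total / GB:.2f} GB; the limit is {MAX_KB_BYTES // GB} GB")
    return problems
```

Splitting a large upload into batches of at most 20 files, and dropping or splitting files over 50 MB, keeps every batch within the limits.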
Document Processing Status
After upload, each document moves through a processing pipeline. The Status column in the documents table shows its current stage:
| Status | Icon | Description |
|---|---|---|
| Queued | Clock | The document is waiting to be processed. |
| Processing | Spinner | Text extraction, chunking, and embedding are in progress. |
| Indexed | Green check | Processing is complete. The document's chunks are searchable. |
| Error | Red alert | Processing failed. Click the status to view the error message. |
Documents table showing status column with Queued (clock icon), Processing (spinner), Indexed (green check), and Error (red alert) indicators
WARNING
Documents in Error status are not included in search results. Review the error details, fix the source file, and re-upload it. Common errors include corrupt PDF files, unsupported encodings, and files that exceed the size limit.
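The status lifecycle behaves like a small state machine. The sketch below encodes it with a Python `Enum`. The transitions are inferred from this page and are an assumption, not a documented API: Queued leads to Processing, Processing ends in Indexed or Error, and re-indexing moves an Indexed or Error document back to Processing. Whether re-indexed documents pass through Queued first is not stated, so the sketch omits that path.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    INDEXED = "Indexed"
    ERROR = "Error"


# Allowed moves (assumed). INDEXED/ERROR -> PROCESSING models a re-index.
TRANSITIONS = {
    Status.QUEUED: {Status.PROCESSING},
    Status.PROCESSING: {Status.INDEXED, Status.ERROR},
    Status.INDEXED: {Status.PROCESSING},
    Status.ERROR: {Status.PROCESSING},
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the move is not in TRANSITIONS."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def is_searchable(status: Status) -> bool:
    """Only Indexed documents contribute chunks to search results."""
    return status is Status.INDEXED
```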
Viewing Document Details
Click any document row to open its detail panel. The panel shows:
- Metadata -- File name, upload date, file size, MIME type.
- Processing Info -- Time taken to process, number of chunks generated, embedding model used.
- Chunks Preview -- A paginated list of all text chunks extracted from the document, with their chunk index and token count.
Viewing Chunks
The Chunks tab within the document detail panel lets you browse individual chunks. Each chunk displays:
| Field | Description |
|---|---|
| Index | The sequential position of the chunk within the document |
| Content | The raw text content of the chunk |
| Tokens | Token count for this chunk |
| Metadata | Source page number (for PDFs), section heading, or row number (for CSVs) |
This view is useful for verifying that your chunking settings produce segments that capture complete, meaningful passages.
Chunks viewer showing a paginated list of text chunks with index number, content preview, token count, and source metadata for each chunk
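If you review chunks outside the UI (for example, after copying them out), a few heuristics can flag splits worth a closer look. This sketch assumes a hypothetical `Chunk` record holding the four fields from the Chunks tab and flags chunks that end mid-sentence or are much shorter than a typical chunk. The thresholds are arbitrary, and this page does not document any export format.

```python
from dataclasses import dataclass, field

SENTENCE_END = (".", "!", "?", ":", '"', ")")


@dataclass
class Chunk:
    """Hypothetical record mirroring the Index, Content, Tokens, and Metadata fields."""
    index: int
    content: str
    tokens: int
    metadata: dict = field(default_factory=dict)


def flag_chunks(chunks: list[Chunk], min_ratio: float = 0.25) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for chunks that may be split awkwardly."""
    if not chunks:
        return []
    typical = sorted(c.tokens for c in chunks)[len(chunks) // 2]  # (upper) median token count
    flags = []
    for c in chunks:
        if not c.content.rstrip().endswith(SENTENCE_END):
            flags.append((c.index, "ends mid-sentence"))
        if c.tokens < min_ratio * typical:
            flags.append((c.index, "much shorter than typical"))
    return flags
```

Many flagged chunks usually point to a chunk size that is too small or an overlap that is too low for the material.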
Re-Indexing Documents
You may need to re-index documents when:
- You change the chunking settings (chunk size or overlap) in RAG Settings.
- You switch to a different embedding model.
- A document previously failed processing and the issue has been resolved.
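Changing chunk size or overlap moves the chunk boundaries, which is why stored chunks must be regenerated. The sketch below is a simple sliding window that assumes size and overlap are counted in words. The platform's settings may count tokens instead, and its exact splitting algorithm is not described on this page; `sliding_chunks` is a hypothetical illustration.

```python
def sliding_chunks(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks
```

On the ten words `a` through `j`, `size=4, overlap=1` yields `a-d`, `d-g`, `g-j`, while `size=4, overlap=0` yields `a-d`, `e-h`, `i-j`. Chunks and embeddings stored under the old settings no longer match what the new settings would produce.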
To re-index:
- Select one or more documents using the checkboxes.
- Click Re-Index Selected from the actions menu.
- Confirm the operation. Existing chunks for those documents will be deleted and regenerated.
Re-indexing runs in the background. The document status will change to Processing and return to Indexed when complete.
TIP
If you change the embedding model on the Knowledge Base, use the Re-Index All button to re-process every document at once rather than selecting them individually.
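The re-index behavior (delete a document's existing chunks, then regenerate them) can be modeled with an in-memory store. Everything here is hypothetical: `KnowledgeBase`, `reindex`, `reindex_all`, and the `chunker` callback are illustrative names, not the platform's API, and the real operation runs in the background rather than inline.

```python
from typing import Callable


class KnowledgeBase:
    """Hypothetical in-memory model of a Knowledge Base's documents and chunks."""

    def __init__(self) -> None:
        self.sources: dict[str, str] = {}       # doc_id -> extracted text
        self.chunks: dict[str, list[str]] = {}  # doc_id -> current chunks
        self.status: dict[str, str] = {}        # doc_id -> status label

    def reindex(self, doc_ids: list[str], chunker: Callable[[str], list[str]]) -> None:
        for doc_id in doc_ids:
            self.chunks.pop(doc_id, None)  # existing chunks are deleted first
            self.status[doc_id] = "Processing"
            try:
                self.chunks[doc_id] = chunker(self.sources[doc_id])
                self.status[doc_id] = "Indexed"
            except Exception:
                self.status[doc_id] = "Error"  # no chunks remain, so it is not searchable

    def reindex_all(self, chunker: Callable[[str], list[str]]) -> None:
        """Counterpart of Re-Index All: re-process every document."""
        self.reindex(list(self.sources), chunker)
```

Note that in this model a document has no chunks between deletion and regeneration, which is consistent with only Indexed documents appearing in search results.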
Deleting Documents
To remove documents from a Knowledge Base:
- Select one or more documents using the checkboxes.
- Click Delete Selected from the actions menu.
- Confirm the deletion.
Deleting a document permanently removes its text chunks and embeddings from the vector store and removes the original file from storage. This action cannot be undone.
Best Practices
- Keep documents focused. A single document covering one topic produces better retrieval results than a large document covering many topics.
- Update rather than append. If content changes, delete the old document and upload the new version instead of keeping both. Duplicate content can dilute search results.
- Review chunks after upload. Spot-check the chunks preview to confirm that important passages are not split awkwardly across chunk boundaries. Adjust chunk size and overlap if needed.
- Use descriptive file names. File names appear in the documents list and in chunk metadata, making it easier to trace search results back to their source.
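To avoid uploading duplicate content, you can check a local folder for identical files before uploading. This sketch hashes file contents with SHA-256 and groups byte-identical files. It does not catch near-duplicates or reworded versions, and `find_duplicates` is a hypothetical helper that assumes your source documents live in one folder tree.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder: Path) -> list[list[Path]]:
    """Group files under `folder` (recursively) whose bytes are identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Only groups with more than one file are duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Keeping one file from each group, under a descriptive name, follows both the "update rather than append" and "descriptive file names" practices.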
