Managing Documents

Documents are the raw content that powers your Knowledge Base. After you upload a file, the platform extracts its text, splits it into chunks, generates vector embeddings, and stores everything for fast retrieval. This page covers the full document lifecycle.

Supported Formats

Format	Extension	Notes
PDF	`.pdf`	Text-based PDFs are fully supported. Scanned PDFs require OCR-capable processing.
Word	`.docx`	Microsoft Word documents. Legacy `.doc` format is not supported.
Plain Text	`.txt`	Any UTF-8 encoded text file.
HTML	`.html`	HTML tags are stripped; text content is extracted.
CSV	`.csv`	Each row is treated as a separate passage. Headers are preserved as context.
Markdown	`.md`	Headings and structure are preserved during chunking.
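
The exact passage format the platform produces for CSV rows is not documented here. As a minimal sketch, assuming each row is rendered as `header: value` pairs (the function name `csv_to_passages` and the `; ` separator are illustrative assumptions), row-per-passage extraction with headers kept as context could look like this:

```python
import csv
import io


def csv_to_passages(text: str) -> list[str]:
    """Turn each CSV data row into one passage, pairing every value with its header."""
    reader = csv.DictReader(io.StringIO(text))
    passages = []
    for row in reader:
        # Assumed rendering: "header: value" pairs joined by "; ".
        passages.append("; ".join(f"{key}: {value}" for key, value in row.items()))
    return passages


sample = "product,price\nWidget,9.99\nGadget,19.50\n"
for passage in csv_to_passages(sample):
    print(passage)
```

Repeating the header next to every value is what lets a single row make sense when it is retrieved in isolation.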

TIP

Markdown and HTML files tend to produce the best chunking results because their structure (headings, sections) provides natural boundaries for splitting text.
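
To illustrate why headings make good split points, here is a rough sketch of heading-based splitting for Markdown. It is not the platform's chunker; `split_on_headings` is a hypothetical helper, and it deliberately ignores edge cases such as `#` characters inside fenced code blocks.

```python
import re


def split_on_headings(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) sections at ATX headings (#, ##, ...)."""
    sections = []
    heading, body = "", []

    def flush():
        # Keep a section only if it has a heading or some non-blank body text.
        if heading or any(line.strip() for line in body):
            sections.append((heading, "\n".join(body).strip()))

    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return sections


doc = "# Intro\nHello.\n\n## Setup\nInstall it.\n"
for title, text in split_on_headings(doc):
    print(f"{title!r}: {text!r}")
```

Each section stays attached to its heading, so a chunk never starts in the middle of a topic.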

Uploading Documents

Open the Knowledge Base detail view by clicking its name in the list.
Navigate to the Documents tab.
Click Upload Documents in the top-right corner.
Drag and drop files onto the upload area, or click to browse your file system.
You can upload multiple files at once. Each file is processed independently.
Click Upload to begin processing.

Figure: The document upload interface, showing the drag-and-drop area with file list, file sizes, and the Upload button.

File Size Limits

Constraint	Limit
Maximum file size	50 MB per file
Maximum files per batch upload	20 files
Maximum total storage	2 GB per Knowledge Base
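
If you prepare uploads with a script, it can help to check a batch against these limits before uploading. The sketch below is a local pre-check only and does not call any platform API. The function name `check_batch` is hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which this page does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".html", ".csv", ".md"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILES_PER_BATCH = 20


def check_batch(files: dict[str, int]) -> list[str]:
    """Return a list of problems for a batch given as {file name: size in bytes}."""
    problems = []
    if len(files) > MAX_FILES_PER_BATCH:
        problems.append(f"batch has {len(files)} files; limit is {MAX_FILES_PER_BATCH}")
    for name, size in files.items():
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported extension {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 50 MB limit")
    return problems


for problem in check_batch({"guide.md": 1200, "old.doc": 10, "big.pdf": 60 * 1024 * 1024}):
    print(problem)
```

The 2 GB per Knowledge Base limit is not checked here because it depends on what is already stored.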

Document Processing Status

After upload, each document moves through a processing pipeline. The status column in the documents table shows each document's current stage:

Status	Icon	Description
Queued	Clock	The document is waiting to be processed.
Processing	Spinner	Text extraction, chunking, and embedding are in progress.
Indexed	Green check	Processing is complete. The document's chunks are searchable.
Error	Red alert	Processing failed. Click the status to view the error message.

Figure: Document processing status indicators. The status column of the documents table shows Queued (clock icon), Processing (spinner), Indexed (green check), and Error (red alert).

WARNING

Documents in Error status are not included in search results. Review the error details, fix the source file, and re-upload it. Common errors include corrupt PDF files, unsupported encodings, and files that exceed the size limit.
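
The four statuses can be modeled as a small enumeration. The sketch below is illustrative only: `DocStatus`, `searchable`, and `needs_attention` are hypothetical names, and it assumes that only Indexed documents contribute to search, which follows from the table above but is not spelled out for Queued and Processing.

```python
from enum import Enum


class DocStatus(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    INDEXED = "Indexed"
    ERROR = "Error"


def searchable(docs: dict[str, DocStatus]) -> list[str]:
    """Names of documents whose chunks can appear in search results."""
    return sorted(name for name, status in docs.items() if status is DocStatus.INDEXED)


def needs_attention(docs: dict[str, DocStatus]) -> list[str]:
    """Names of documents that failed processing and need a fix."""
    return sorted(name for name, status in docs.items() if status is DocStatus.ERROR)


docs = {
    "faq.md": DocStatus.INDEXED,
    "scan.pdf": DocStatus.ERROR,
    "notes.txt": DocStatus.QUEUED,
}
print("searchable:", searchable(docs))
print("needs attention:", needs_attention(docs))
```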

Viewing Document Details

Click any document row to open its detail panel. The panel shows:

Metadata -- File name, upload date, file size, MIME type.
Processing Info -- Time taken to process, number of chunks generated, embedding model used.
Chunks Preview -- A paginated list of all text chunks extracted from the document, with their chunk index and token count.

Viewing Chunks

The Chunks tab within the document detail panel lets you browse individual chunks. Each chunk displays:

Field	Description
Index	The sequential position of the chunk within the document
Content	The raw text content of the chunk
Tokens	Token count for this chunk
Metadata	Source page number (for PDFs), section heading, or row number (for CSVs)

This view is useful for verifying that your chunking settings produce segments that capture complete, meaningful passages.
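
To build intuition for how chunk size and overlap shape what you see in this view, here is a minimal sliding-window sketch. It is not the platform's chunker: `chunk_tokens` is a hypothetical helper, and whitespace-separated words stand in for real tokens, whose counts depend on the tokenizer in use.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


text = "Deleting a document permanently removes its chunks and embeddings"
words = text.split()  # assumption: one word counts as one token
for index, chunk in enumerate(chunk_tokens(words, chunk_size=4, overlap=1)):
    print(index, len(chunk), " ".join(chunk))
```

A larger overlap repeats more text between neighboring chunks, which reduces the chance that a passage is cut at a boundary but increases the total number of chunks.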

Figure: Browsing document chunks. The chunks viewer shows a paginated list of text chunks with the index number, content preview, token count, and source metadata for each chunk.

Re-Indexing Documents

You may need to re-index documents when:

You change the chunking settings (chunk size or overlap) in RAG Settings.
You switch to a different embedding model.
A document previously failed processing and the issue has been resolved.

To re-index:

Select one or more documents using the checkboxes.
Click Re-Index Selected from the actions menu.
Confirm the operation. Existing chunks for those documents will be deleted and regenerated.

Re-indexing runs in the background. The document status will change to Processing and return to Indexed when complete.

TIP

If you change the embedding model on the Knowledge Base, use the Re-Index All button to re-process every document at once rather than selecting them individually.
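
The conditions for re-indexing come down to one comparison: do the settings a document was indexed with still match the current ones? The sketch below expresses that check. `IndexSettings` and `needs_reindex` are hypothetical names for illustration, not part of the platform, and the model names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexSettings:
    chunk_size: int
    overlap: int
    embedding_model: str


def needs_reindex(indexed_with: Optional[IndexSettings], current: IndexSettings) -> bool:
    """True if the document was never indexed successfully or its settings are stale."""
    return indexed_with is None or indexed_with != current


current = IndexSettings(chunk_size=512, overlap=64, embedding_model="model-b")
documents = {
    "faq.md": IndexSettings(512, 64, "model-b"),    # up to date
    "guide.md": IndexSettings(512, 64, "model-a"),  # indexed with the old embedding model
    "scan.pdf": None,                               # previously failed processing
}
stale = [name for name, used in documents.items() if needs_reindex(used, current)]
print(stale)
```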

Deleting Documents

To remove documents from a Knowledge Base:

Select one or more documents using the checkboxes.
Click Delete Selected from the actions menu.
Confirm the deletion.

Deleting a document permanently removes its text chunks and embeddings from the vector store and removes the original file from storage. This action cannot be undone.

Best Practices

Keep documents focused. A single document covering one topic typically produces better retrieval results than a large document covering many topics.
Update rather than append. If content changes, replace the old document rather than uploading a new version alongside it. Duplicate content can dilute search results.
Review chunks after upload. Spot-check the chunks preview to confirm that important passages are not split awkwardly across chunk boundaries. Adjust chunk size and overlap if needed.
Use descriptive file names. File names appear in the documents list and in chunk metadata, making it easier to trace search results back to their source.
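
One way to catch accidental duplicates before uploading is to compare content hashes locally. The sketch below uses SHA-256 from Python's standard library. `find_duplicates` is a hypothetical helper, and it only detects byte-for-byte identical files, not near-duplicates or revised versions.

```python
import hashlib


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    # Only groups with more than one file are duplicates.
    return [names for names in by_digest.values() if len(names) > 1]


files = {
    "pricing.md": b"# Pricing\nPlans start at the basic tier.\n",
    "pricing-copy.md": b"# Pricing\nPlans start at the basic tier.\n",
    "faq.md": b"# FAQ\n",
}
print(find_duplicates(files))
```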
