Managing Documents

Documents are the raw content that powers your Knowledge Base. After you upload a file, the platform extracts its text, splits it into chunks, generates vector embeddings, and stores everything for fast retrieval. This page covers the full document lifecycle.

Supported Formats

Format | Extension | Notes
PDF | .pdf | Text-based PDFs are fully supported. Scanned PDFs require OCR-capable processing.
Word | .docx | Microsoft Word documents. Legacy .doc format is not supported.
Plain Text | .txt | Any UTF-8 encoded text file.
HTML | .html | HTML tags are stripped; text content is extracted.
CSV | .csv | Each row is treated as a separate passage. Headers are preserved as context.
Markdown | .md | Headings and structure are preserved during chunking.
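
To illustrate what "each row is treated as a separate passage" with headers kept as context can look like, here is a minimal sketch. It is not the platform's actual extraction code: the function name `csv_rows_to_passages` and the "header: value" layout are assumptions made for illustration.

```python
import csv
import io


def csv_rows_to_passages(csv_text: str) -> list[str]:
    """Turn each CSV row into one passage, pairing every value with its header.

    Hypothetical helper: the platform's real passage format is not documented here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    passages = []
    for row in reader:
        # Keep the header next to each value so the passage is self-describing.
        # Empty cells and cells without a matching header are skipped.
        parts = [
            f"{header}: {value}"
            for header, value in row.items()
            if header is not None and value
        ]
        passages.append("; ".join(parts))
    return passages


sample = "product,price,region\nWidget,9.99,EU\nGadget,24.50,US\n"
for passage in csv_rows_to_passages(sample):
    print(passage)
```

Because every passage carries its own column names, a retrieved row stays readable without the rest of the file.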

TIP

Markdown and HTML files tend to produce the best chunking results because their structure (headings, sections) provides natural boundaries for splitting text.

Uploading Documents

  1. Open the Knowledge Base detail view by clicking its name in the list.
  2. Navigate to the Documents tab.
  3. Click Upload Documents in the top-right corner.
  4. Drag and drop files onto the upload area, or click to browse your file system.
  5. You can upload multiple files at once. Each file is processed independently.
  6. Click Upload to begin processing.
Figure: The document upload interface, showing the drag-and-drop area with file list, file sizes, and Upload button.

File Size Limits

Constraint | Limit
Maximum file size | 50 MB per file
Maximum batch upload | 20 files at once
Maximum total storage | 2 GB per Knowledge Base
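
If you prepare uploads with a script, it can help to check a batch against these limits before opening the upload dialog. The sketch below is a local pre-check only, not part of the platform. The function name `validate_batch` is hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which this page does not specify.

```python
from pathlib import Path

# Extensions and limits copied from the tables on this page.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".html", ".csv", ".md"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB; assumes 1 MB = 1024 * 1024 bytes
MAX_BATCH_FILES = 20


def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """Return a list of problems for a batch of (file_name, size_in_bytes) pairs.

    An empty list means the batch fits the documented limits.
    """
    problems = []
    if len(files) > MAX_BATCH_FILES:
        problems.append(
            f"batch has {len(files)} files; the limit is {MAX_BATCH_FILES}"
        )
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported extension {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 50 MB limit")
    return problems


print(validate_batch([("handbook.pdf", 12_000_000), ("notes.doc", 4_096)]))
```

The check does not cover the 2 GB per Knowledge Base limit, because that depends on what is already stored.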

Document Processing Status

After upload, each document moves through a processing pipeline. The status column in the documents table shows where it is:

Status | Icon | Description
Queued | Clock | The document is waiting to be processed.
Processing | Spinner | Text extraction, chunking, and embedding are in progress.
Indexed | Green check | Processing is complete. The document's chunks are searchable.
Error | Red alert | Processing failed. Click the status to view the error message.
Figure: Document processing status indicators in the documents table: Queued (clock icon), Processing (spinner), Indexed (green check), and Error (red alert).

WARNING

Documents in Error status are not included in search results. Review the error details and re-upload or fix the source file. Common errors include corrupt PDF files, unsupported encodings, and files that exceed the size limit.

Viewing Document Details

Click any document row to open its detail panel. The panel shows:

  • Metadata: file name, upload date, file size, and MIME type.
  • Processing Info: time taken to process, number of chunks generated, and embedding model used.
  • Chunks Preview: a paginated list of all text chunks extracted from the document, with their chunk index and token count.

Viewing Chunks

The Chunks tab within the document detail panel lets you browse individual chunks. Each chunk displays:

Field | Description
Index | The sequential position of the chunk within the document
Content | The raw text content of the chunk
Tokens | Token count for this chunk
Metadata | Source page number (for PDFs), section heading, or row number (for CSVs)

This view is useful for verifying that your chunking settings produce segments that capture complete, meaningful passages.

Figure: Browsing document chunks. The viewer shows a paginated list of text chunks with index number, content preview, token count, and source metadata for each chunk.
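
To build intuition for how chunk size and overlap shape what you see in this view, the sketch below splits text with a simple sliding window. It is an illustration under stated assumptions, not the platform's chunker: it counts whitespace-separated words instead of tokens, it ignores headings and other structure, and the name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words. Words stand in for tokens here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "Refunds are issued within ten days. Contact support to start a claim."
for index, chunk in enumerate(chunk_words(text, chunk_size=6, overlap=2)):
    print(index, len(chunk.split()), chunk)
```

A larger overlap repeats more words between neighbouring chunks, which lowers the chance that a sentence is cut in half at a boundary but raises the total number of chunks.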

Re-Indexing Documents

You may need to re-index documents when:

  • You change the chunking settings (chunk size or overlap) in RAG Settings.
  • You switch to a different embedding model.
  • A document previously failed processing and the issue has been resolved.

To re-index:

  1. Select one or more documents using the checkboxes.
  2. Click Re-Index Selected from the actions menu.
  3. Confirm the operation. Existing chunks for those documents will be deleted and regenerated.

Re-indexing runs in the background. The document status will change to Processing and return to Indexed when complete.
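
The status changes described on this page can be summarised as a small state model. The sketch below is inferred from the status table and the re-indexing behaviour above; the platform's internal rules may differ, and the names `Status`, `ALLOWED`, `can_transition`, and `is_searchable` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    INDEXED = "Indexed"
    ERROR = "Error"


# Transitions inferred from this page (an assumption, not a documented contract).
ALLOWED = {
    Status.QUEUED: {Status.PROCESSING},
    Status.PROCESSING: {Status.INDEXED, Status.ERROR},
    Status.INDEXED: {Status.PROCESSING},  # re-indexing an indexed document
    Status.ERROR: {Status.PROCESSING},    # re-indexing after the issue is fixed
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if the page describes a move from `current` to `new`."""
    return new in ALLOWED[current]


def is_searchable(status: Status) -> bool:
    """Only Indexed documents have chunks that appear in search results."""
    return status is Status.INDEXED


print(can_transition(Status.INDEXED, Status.PROCESSING))
print(is_searchable(Status.ERROR))
```

One practical consequence: because existing chunks are deleted when re-indexing starts, a document is treated as not searchable from the moment it enters Processing until it returns to Indexed.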

TIP

If you change the embedding model on the Knowledge Base, use the Re-Index All button to re-process every document at once rather than selecting them individually.

Deleting Documents

To remove documents from a Knowledge Base:

  1. Select one or more documents using the checkboxes.
  2. Click Delete Selected from the actions menu.
  3. Confirm the deletion.

Deleting a document permanently removes its text chunks and embeddings from the vector store and removes the original file from storage. This action cannot be undone.

Best Practices

  • Keep documents focused. A single document covering one topic produces better retrieval results than a large document covering many topics.
  • Update rather than append. If content changes, replace the old document rather than uploading a new version alongside it. Duplicate content can dilute search results.
  • Review chunks after upload. Spot-check the chunks preview to confirm that important passages are not split awkwardly across chunk boundaries. Adjust chunk size and overlap if needed.
  • Use descriptive file names. File names appear in the documents list and in chunk metadata, making it easier to trace search results back to their source.

OmniBots AI Bot Platform