External Document Sources

Instead of uploading files manually, you can connect an external storage provider and have documents synced into your Knowledge Base automatically. This keeps your KB content up to date as files change in your organization's storage systems.

Supported Providers

Provider | Authentication | Typical Use Case
SharePoint | OAuth 2.0 (Microsoft Graph) | Corporate document libraries and intranet content
Google Drive | OAuth 2.0 (Google API) | Shared drives and team folders
Amazon S3 | Access Key + Secret Key | Document archives and data lake exports
Dropbox | OAuth 2.0 | Team file sharing and collaboration
OneDrive | OAuth 2.0 (Microsoft Graph) | Personal and shared business files
Box | OAuth 2.0 (Box API) | Enterprise content management
Azure Blob Storage | Connection String or SAS Token | Cloud-native document storage

Prerequisites

Before connecting an external source to a Knowledge Base, you must create a storage integration in your tenant settings:

  1. Navigate to Settings > Integrations.
  2. Click Add Integration and select the Storage category.
  3. Choose your provider (e.g., SharePoint, Google Drive).
  4. Enter the required credentials for your provider.
  5. Click Test Connection to verify access.
  6. Save the integration.

WARNING

Storage integration credentials are encrypted at rest using your tenant's KMS key. However, ensure that the service account or OAuth token you provide has read-only access scoped to only the folders you intend to sync. Never use administrative credentials with broad write permissions.

Adding an External Source

  1. Open the Knowledge Base detail view.
  2. Navigate to the External Sources tab.
  3. Click Add Source.
  4. Select the storage integration you configured in Settings.
  5. Browse to or enter the root folder path, that is, the folder in the external system that contains the documents you want to sync.
  6. Configure file filters and sync schedule (see below).
  7. Click Save Source.

Figure: Configuring an external document source. The external source configuration form shows the integration selector, root folder path browser, file extension filters, and sync schedule dropdown.

The first sync begins immediately after saving. Subsequent syncs follow the schedule you configure.

File Filters

File filters let you control which documents are pulled from the external source. This prevents irrelevant or oversized files from cluttering your Knowledge Base.

Filter | Description | Example
File Extensions | Only sync files with these extensions | .pdf, .docx, .md
Include Paths | Only sync files within these subfolder paths | /policies, /product-docs
Exclude Paths | Skip files within these subfolder paths | /archive, /drafts
Max File Size | Skip files larger than this threshold | 50 MB

TIP

Start with a narrow set of extensions (e.g., .pdf, .docx, .md) and a specific subfolder path. You can always broaden the filters later once you have confirmed that the sync produces good results.
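
To reason about how the four filters combine, the following is a minimal Python sketch of one plausible evaluation order. It is not the platform's implementation: the `FileFilter` class, its field names, and the `should_sync` function are hypothetical, and it assumes that paths are relative to the root folder, that exclude paths take precedence over include paths, and that 50 MB means 50 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Optional, Set


@dataclass
class FileFilter:
    # Hypothetical field names mirroring the filter table above.
    extensions: Set[str] = field(default_factory=set)       # e.g. {".pdf", ".md"}
    include_paths: List[str] = field(default_factory=list)  # e.g. ["/policies"]
    exclude_paths: List[str] = field(default_factory=list)  # e.g. ["/archive"]
    max_size_bytes: Optional[int] = None                    # None means no limit


def _is_under(path: PurePosixPath, folder: str) -> bool:
    """Return True if `path` lies inside `folder`, compared per path component."""
    return PurePosixPath(folder) in path.parents


def should_sync(path: str, size_bytes: int, f: FileFilter) -> bool:
    """Apply the extension, include, exclude, and size filters in that order."""
    p = PurePosixPath(path)
    # An empty extension set is treated as "allow all extensions" (assumption).
    if f.extensions and p.suffix.lower() not in f.extensions:
        return False
    # An empty include list is treated as "include everything" (assumption).
    if f.include_paths and not any(_is_under(p, inc) for inc in f.include_paths):
        return False
    # Exclude paths win over include paths (assumption).
    if any(_is_under(p, exc) for exc in f.exclude_paths):
        return False
    if f.max_size_bytes is not None and size_bytes > f.max_size_bytes:
        return False
    return True


narrow = FileFilter(
    extensions={".pdf", ".docx", ".md"},
    include_paths=["/policies", "/product-docs"],
    exclude_paths=["/policies/drafts"],
    max_size_bytes=50 * 1024 * 1024,
)
print(should_sync("/policies/hr/leave.pdf", 2_000_000, narrow))
print(should_sync("/policies/drafts/new.md", 1_000, narrow))
```

Matching on path components rather than string prefixes means a folder such as /policies-old is not mistaken for a subfolder of /policies.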

Sync Schedules

Schedule | Behavior
Manual | Sync only when you click the Sync Now button.
Hourly | Runs every 60 minutes. Best for frequently changing content.
Daily | Runs once per day at midnight UTC. Suitable for most use cases.
Weekly | Runs every Sunday at midnight UTC. Good for stable, infrequently updated document sets.

You can trigger a manual sync at any time regardless of the configured schedule by clicking Sync Now on the source row.
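
As a rough illustration of the schedule semantics, the sketch below computes when the next scheduled run would fall after a given sync. The `next_sync` function and the lowercase schedule names are hypothetical, and the sketch assumes that the hourly interval is measured from the previous sync, which the schedule descriptions do not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_sync(schedule: str, last_sync: datetime) -> Optional[datetime]:
    """Return the next scheduled run in UTC, or None for manual sources.

    `last_sync` must be a timezone-aware datetime.
    """
    last = last_sync.astimezone(timezone.utc)
    if schedule == "manual":
        return None  # runs only when Sync Now is clicked
    if schedule == "hourly":
        # Assumption: the 60-minute interval starts at the previous sync.
        return last + timedelta(minutes=60)

    midnight = last.replace(hour=0, minute=0, second=0, microsecond=0)
    if schedule == "daily":
        return midnight + timedelta(days=1)  # next midnight UTC
    if schedule == "weekly":
        # weekday(): Monday is 0 and Sunday is 6. If today is already Sunday,
        # the next run is a full week away.
        days_until_sunday = (6 - last.weekday()) % 7 or 7
        return midnight + timedelta(days=days_until_sunday)
    raise ValueError(f"unknown schedule: {schedule!r}")


last = datetime(2024, 1, 3, 15, 30, tzinfo=timezone.utc)  # a Wednesday
print(next_sync("daily", last))   # 2024-01-04 00:00:00+00:00
print(next_sync("weekly", last))  # 2024-01-07 00:00:00+00:00
```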

Monitoring Sync Status

The External Sources tab displays the following for each connected source:

Column | Description
Provider | The storage provider type and integration name
Root Path | The folder being synced
Schedule | The configured sync frequency
Last Sync | Timestamp of the most recent completed sync
Files Synced | Total number of files currently indexed from this source
Status | Syncing, Idle, or Error

Click a source row to view detailed sync history, including per-file status and any errors encountered.

Figure: Monitoring sync status and schedule. The External Sources tab shows source rows with provider icon, root path, schedule, last sync timestamp, files synced count, and status badge.

URL Mapping

When documents are synced from an external source, the platform stores a URL mapping that links each chunk back to the original file's location in the provider. When the bot retrieves a passage, it can therefore include a link to the source document in its response, so end users can open the full document directly.

URL mappings are generated automatically based on the provider's file URLs (e.g., SharePoint document links, Google Drive share URLs, S3 pre-signed URLs).
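
Conceptually, a URL mapping is a lookup from a chunk to the document it came from. The sketch below shows one way such a lookup could be used to attach a source link to a retrieved passage. The `ChunkSource` class, its fields, the `with_source_link` function, and the example URL are all hypothetical and do not describe the platform's stored format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChunkSource:
    # Hypothetical record linking one indexed chunk to its original file.
    chunk_id: str
    document_name: str
    source_url: str


def with_source_link(passage: str, chunk_id: str,
                     url_map: Dict[str, ChunkSource]) -> str:
    """Append a source link to a passage when a mapping exists for its chunk."""
    source = url_map.get(chunk_id)
    if source is None:
        # No mapping (for example, a manually uploaded file): return as-is.
        return passage
    return f"{passage}\n\nSource: {source.document_name} ({source.source_url})"


url_map = {
    "chunk-001": ChunkSource(
        chunk_id="chunk-001",
        document_name="leave-policy.pdf",
        source_url="https://example.com/policies/leave-policy.pdf",  # placeholder
    )
}
print(with_source_link("Employees accrue leave monthly.", "chunk-001", url_map))
```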

Updating and Removing Sources

  • Edit -- Click the edit icon on a source row to change filters, schedule, or root path. Changes take effect on the next sync.
  • Delete -- Click the delete icon to remove a source. All documents that were synced from this source will be deleted from the Knowledge Base, including their chunks and embeddings. This action cannot be undone.

WARNING

Deleting an external source removes all of its synced documents immediately. If the Knowledge Base is actively used by a bot, this may cause the bot to lose access to content it was previously able to reference. Verify that the source is no longer needed before deleting.

OmniBots AI Bot Platform