RAG Settings

RAG settings control how the platform retrieves and ranks document chunks when a bot queries a Knowledge Base. Tuning these parameters affects the relevance, accuracy, and verbosity of your bot's answers. You can configure them per Knowledge Base under the Settings tab of the KB detail view.

[Image: The RAG Settings panel, showing the top_k slider, similarity threshold slider, chunk size and overlap fields, embedding model display, and search mode toggle]

Retrieval Parameters

top_k -- Number of Chunks Returned

The top_k setting determines how many matching chunks are returned from the vector search and included as context in the LLM prompt.

| Value | Behavior |
| --- | --- |
| 3 | Returns the 3 most relevant chunks. Good for focused, concise answers. |
| 5 | Default. Balances breadth and precision for most use cases. |
| 10 | Returns more context. Useful for complex questions that span multiple sections. |

Higher values provide more context but consume more tokens in the LLM prompt, which can increase response time and cost. Lower values are faster but may miss relevant information.
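As a rough illustration of what top_k controls, the sketch below ranks a few chunks by cosine similarity to a query vector and keeps only the best top_k. The `(text, vector)` layout, the function names, and the toy two-dimensional vectors are assumptions made for the example; they do not describe the platform's internal implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, top_k=5):
    """Return the top_k (score, text) pairs, highest score first.

    `chunks` is a list of (text, vector) pairs; this layout is hypothetical.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy 2-dimensional vectors, for illustration only.
chunks = [
    ("reset your password", [1.0, 0.0]),
    ("billing overview", [0.0, 1.0]),
    ("change account password", [0.9, 0.1]),
    ("password policy", [0.7, 0.7]),
]

results = top_k_chunks([1.0, 0.0], chunks, top_k=3)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

With top_k=3, the unrelated "billing overview" chunk is dropped; raising top_k to 4 would include it in the prompt even though its score is 0.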

Similarity Threshold

The similarity threshold sets the minimum cosine similarity score a chunk must have to be included in results. Scores range from 0.0 (no similarity) to 1.0 (exact match).

| Value | Effect |
| --- | --- |
| 0.0 | No filtering. All top_k chunks are returned regardless of relevance. |
| 0.5 | Moderate filtering. Excludes loosely related chunks. |
| 0.7 | Default. Only clearly relevant chunks are returned. |
| 0.85 | Strict filtering. Only highly relevant chunks pass. May return fewer than top_k results. |

TIP

Start with the default threshold of 0.7. If the bot frequently says "I don't have information about that" for questions you know are covered, lower the threshold. If it returns irrelevant passages, raise it.
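To make the effect of the threshold concrete, the sketch below filters a made-up list of already-scored chunks at each of the values from the table. It assumes a chunk passes when its score is greater than or equal to the threshold; the scores and the `apply_threshold` helper are hypothetical.

```python
def apply_threshold(scored_chunks, threshold=0.7):
    """Keep only (score, text) pairs whose score meets the threshold.

    Assumption: the comparison is inclusive (score >= threshold).
    """
    return [(score, text) for score, text in scored_chunks if score >= threshold]

# Hypothetical scores for the top 5 chunks of one query, best first.
scored = [
    (0.91, "chunk A"),
    (0.78, "chunk B"),
    (0.66, "chunk C"),
    (0.52, "chunk D"),
    (0.31, "chunk E"),
]

for threshold in (0.0, 0.5, 0.7, 0.85):
    kept = apply_threshold(scored, threshold)
    print(threshold, [text for _, text in kept])
```

In this example the default of 0.7 keeps two of the five chunks and 0.85 keeps only one. If a well-covered question leaves nothing above the threshold, the bot has no context to answer from, which is the symptom the tip above describes.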

Chunking Settings

Chunking settings are configured at Knowledge Base creation time but can be adjusted later. Changing these values requires re-indexing all documents.

Chunk Size

The maximum number of tokens per chunk. This controls how much text each chunk contains.

| Size (tokens) | Best For |
| --- | --- |
| 256 | Short, precise passages such as FAQ answers or glossary entries |
| 512 | General-purpose documentation and articles (default) |
| 1024 | Long-form content like legal contracts, research papers, or technical manuals |
| 2048 | Very large context windows where each chunk should capture a full section |

Smaller chunks improve precision -- the retrieved passage closely matches the question. Larger chunks improve recall -- more surrounding context is included, which helps the LLM understand nuance.

Chunk Overlap

The number of tokens shared between adjacent chunks. Overlap reduces the risk that a sentence at a chunk boundary is split in a way that loses meaning.

| Overlap (tokens) | Guideline |
| --- | --- |
| 0 | No overlap. Fastest indexing but may split sentences. |
| 50 | Default. Provides a reasonable boundary buffer. |
| 100 | Good for dense technical content where context spans paragraphs. |
| 200 | High overlap. Increases storage but maximizes boundary coverage. |
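The relationship between chunk size and overlap can be sketched as a sliding window over a token list. This is a minimal illustration, not the platform's chunker: it assumes the document is already tokenized, and the `chunk_tokens` helper is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts (chunk_size - overlap) tokens after the previous one,
    so adjacent chunks share `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks

# A stand-in document of 1,200 placeholder tokens.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, chunk_size=512, overlap=50)

print([len(c) for c in chunks])          # [512, 512, 276]
print(chunks[0][-50:] == chunks[1][:50])  # True: 50 tokens are shared
```

With these defaults each new chunk advances 462 tokens, so higher overlap values produce more chunks for the same document, which is why the table notes increased storage.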

WARNING

Changing chunk size or overlap requires a full re-index of all documents in the Knowledge Base. During re-indexing, the existing chunks remain searchable until the new chunks replace them, so there is no downtime.

Embedding Model

The embedding model converts text into vector representations for similarity search. It is selected when the Knowledge Base is created; changing it later requires a full re-index.

| Provider | Model | Dimensions | Notes |
| --- | --- | --- | --- |
| OpenAI | text-embedding-3-small | 1536 | Cost-effective, strong general performance |
| OpenAI | text-embedding-3-large | 3072 | Higher accuracy, higher cost |
| Vertex AI | text-embedding-005 | 768 | Google Cloud native, good multilingual support |
| Custom | Self-hosted | Varies | Bring your own model via a compatible API endpoint |

The embedding model is tied to the AI integration configured in Settings > Integrations. To use a different model, you may need to add a new integration first.
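The dimensions listed in the table above also affect how much space the stored vectors take. The back-of-the-envelope sketch below assumes each dimension is stored as a 4-byte float and ignores index overhead and metadata; both the storage format and the 10,000-chunk figure are assumptions for illustration, not documented platform behavior.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

# Dimensions taken from the embedding model table.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-005": 768,
}

def raw_vector_bytes(num_chunks, dimensions):
    """Raw size of the stored vectors, excluding any index overhead."""
    return num_chunks * dimensions * BYTES_PER_FLOAT32

# Hypothetical Knowledge Base with 10,000 chunks.
for model, dims in MODEL_DIMENSIONS.items():
    mib = raw_vector_bytes(10_000, dims) / (1024 * 1024)
    print(f"{model}: {mib:.1f} MiB")
```

Under these assumptions, doubling the dimensions doubles the raw vector storage, which is one part of the cost trade-off between the small and large models.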

Search Mode

OmniBots supports two search modes for querying the vector store:

| Mode | How It Works | When to Use |
| --- | --- | --- |
| Similarity | Pure vector cosine similarity search. Fast and effective for semantic matching. | Default. Works well for most natural language questions. |
| Hybrid | Combines vector similarity with keyword (BM25) matching. Results are fused using reciprocal rank fusion. | Use when your content contains domain-specific terms, product codes, or identifiers that benefit from exact keyword matching. |

TIP

Hybrid mode is particularly effective for technical support bots where users reference specific error codes, model numbers, or configuration keys. The keyword component ensures exact matches rank highly even if the semantic similarity score alone would not surface them.
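Reciprocal rank fusion, the technique Hybrid mode uses to merge the two result lists, can be sketched in a few lines. Each chunk earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The constant k = 60 is a commonly used value and is an assumption here, as are the chunk IDs; the platform's actual constant and weighting are not documented in this section.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional choice (assumption).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for a query that mentions a specific error code.
vector_ranking = ["chunk-a", "chunk-b", "chunk-c"]   # semantic matches
keyword_ranking = ["chunk-d", "chunk-a", "chunk-e"]  # chunk-d contains the exact code

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused[:3])  # ['chunk-a', 'chunk-d', 'chunk-b']
```

Here chunk-d never appears in the vector results, yet its first place in the keyword list lifts it to second overall, while chunk-a wins because both searches agree on it.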

[Image: How parameter tuning affects retrieval quality. Low top_k with a high threshold produces precise but narrow results, while high top_k with a low threshold produces broad but potentially noisy results.]

Tuning Recommendations

| Scenario | top_k | Threshold | Chunk Size | Search Mode |
| --- | --- | --- | --- | --- |
| FAQ bot with short answers | 3 | 0.75 | 256 | Similarity |
| General documentation assistant | 5 | 0.7 | 512 | Similarity |
| Technical support with error codes | 5 | 0.6 | 512 | Hybrid |
| Legal/compliance document search | 8 | 0.65 | 1024 | Hybrid |
| Research paper Q&A | 5 | 0.7 | 1024 | Similarity |
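A quick way to compare these scenarios is to estimate the largest amount of retrieved context each one can add to the LLM prompt, which is top_k multiplied by chunk size. The sketch below is plain arithmetic on the values from the table; it gives an upper bound, since chunks can be shorter than the maximum and the threshold can return fewer than top_k results.

```python
# (top_k, chunk size in tokens) for each scenario in the table.
SCENARIOS = {
    "FAQ bot with short answers": (3, 256),
    "General documentation assistant": (5, 512),
    "Technical support with error codes": (5, 512),
    "Legal/compliance document search": (8, 1024),
    "Research paper Q&A": (5, 1024),
}

def max_context_tokens(top_k, chunk_size):
    """Upper bound on retrieved tokens added to the prompt."""
    return top_k * chunk_size

for name, (top_k, chunk_size) in SCENARIOS.items():
    print(f"{name}: up to {max_context_tokens(top_k, chunk_size)} tokens")
```

The FAQ configuration tops out at 768 retrieved tokens per query, while the legal/compliance configuration can reach 8,192, so the latter needs a model and budget that can absorb that much context.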

Overriding Settings in the KB Search Node

The settings configured here serve as defaults for this Knowledge Base. When you add a KB Search node to a flow, you can override top_k, similarity threshold, and search mode on a per-node basis. This allows the same Knowledge Base to be queried with different parameters depending on the conversation context.

TIP

Use the KB-level settings as sensible defaults, then override at the node level only when a specific flow step needs different behavior -- for example, a summarization step that retrieves 10 chunks versus a quick lookup that retrieves 3.
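Conceptually, a node-level override is a merge of the node's values on top of the Knowledge Base defaults. The sketch below models that with plain dictionaries; the key names and the `effective_settings` helper are hypothetical and do not reflect the platform's configuration format. Chunking settings are rejected because only top_k, similarity threshold, and search mode can be overridden per node.

```python
# Hypothetical key names; the defaults mirror the values documented above.
KB_DEFAULTS = {
    "top_k": 5,
    "similarity_threshold": 0.7,
    "search_mode": "similarity",
    "chunk_size": 512,
}

# Only these settings can be overridden on a KB Search node.
OVERRIDABLE = {"top_k", "similarity_threshold", "search_mode"}

def effective_settings(kb_defaults, node_overrides):
    """Return the KB defaults with node-level overrides applied on top."""
    unknown = set(node_overrides) - OVERRIDABLE
    if unknown:
        raise ValueError(f"cannot override at node level: {sorted(unknown)}")
    return {**kb_defaults, **node_overrides}

# A summarization step that needs broader context.
summarize = effective_settings(KB_DEFAULTS, {"top_k": 10})
# A quick lookup that needs only the best few chunks.
lookup = effective_settings(KB_DEFAULTS, {"top_k": 3, "similarity_threshold": 0.75})

print(summarize["top_k"], summarize["similarity_threshold"])  # 10 0.7
print(lookup["top_k"], lookup["similarity_threshold"])        # 3 0.75
```

Anything a node does not override falls back to the KB-level value, so changing a default later updates every node that has not set its own.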
