RAG Settings

RAG settings control how the platform retrieves and ranks document chunks when a bot queries a Knowledge Base. Tuning these parameters affects the relevance, accuracy, and verbosity of your bot's answers. You can configure them per Knowledge Base under the Settings tab of the KB detail view.

[Image: RAG settings panel showing the top_k slider, similarity threshold slider, chunk size and overlap fields, embedding model display, and search mode toggle]

The RAG Settings panel

Retrieval Parameters

top_k -- Number of Chunks Returned

The top_k setting determines the maximum number of matching chunks returned from the vector search and included as context in the LLM prompt. Fewer chunks may be returned if the similarity threshold filters some out.

Value	Behavior
`3`	Returns the 3 most relevant chunks. Good for focused, concise answers.
`5`	Default. Balances breadth and precision for most use cases.
`10`	Returns more context. Useful for complex questions that span multiple sections.

Higher values provide more context but consume more tokens in the LLM prompt, which can increase response time and cost. Lower values are faster but may miss relevant information.
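
As an illustration of what top_k does, the sketch below scores a handful of chunks against a query vector using cosine similarity and keeps the best top_k. The function names (`cosine_similarity`, `retrieve_top_k`) and the two-dimensional toy vectors are hypothetical and chosen for readability; the platform's actual retrieval code is not shown in this documentation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunks, top_k=5):
    """Return the top_k (chunk_id, score) pairs, best score first.

    `chunks` is a list of (chunk_id, vector) pairs (illustrative shape).
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy data: three chunks with 2-dimensional "embeddings".
chunks = [
    ("intro", [1.0, 0.0]),
    ("pricing", [0.8, 0.6]),
    ("legal", [0.0, 1.0]),
]
results = retrieve_top_k([1.0, 0.0], chunks, top_k=2)
for chunk_id, score in results:
    print(chunk_id, round(score, 2))
```

With top_k set to 2, only the two highest-scoring chunks reach the prompt; raising it to 3 would also include the unrelated "legal" chunk.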

Similarity Threshold

The similarity threshold sets the minimum cosine similarity score a chunk must have to be included in results. Scores range from 0.0 (no similarity) to 1.0 (exact match).

Value	Effect
`0.0`	No filtering. All `top_k` chunks are returned regardless of relevance.
`0.5`	Moderate filtering. Excludes loosely related chunks.
`0.7`	Default. Only clearly relevant chunks are returned.
`0.85`	Strict filtering. Only highly relevant chunks pass. May return fewer than `top_k` results.

TIP

Start with the default threshold of 0.7. If the bot frequently says "I don't have information about that" for questions you know are covered, lower the threshold. If it returns irrelevant passages, raise it.
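
The effect of the threshold can be pictured as a filter applied to the scored chunks. The sketch below is illustrative only: `apply_threshold` is a hypothetical helper, the scores are made up, and it assumes a chunk passes when its score is greater than or equal to the threshold (the platform's exact comparison is not documented here).

```python
def apply_threshold(scored_chunks, threshold=0.7):
    """Keep only (chunk_id, score) pairs whose score meets the threshold.

    Assumption: the comparison is inclusive (score >= threshold).
    """
    return [(cid, score) for cid, score in scored_chunks if score >= threshold]

# Made-up similarity scores for three retrieved chunks.
scored = [("refund-policy", 0.91), ("shipping", 0.72), ("careers", 0.55)]

strict = apply_threshold(scored, threshold=0.85)    # only the strongest match
default = apply_threshold(scored)                   # default threshold of 0.7
unfiltered = apply_threshold(scored, threshold=0.0) # no filtering

print(len(strict), len(default), len(unfiltered))
```

If the filtered list is empty, the bot has no context to answer from, which is the situation the tip above describes.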

Chunking Settings

Chunking settings are configured at Knowledge Base creation time but can be adjusted later. Changing these values requires re-indexing all documents.

Chunk Size

The maximum number of tokens per chunk. This controls how much text each chunk contains.

Size	Best For
`256`	Short, precise passages such as FAQ answers or glossary entries
`512`	General-purpose documentation and articles (default)
`1024`	Long-form content like legal contracts, research papers, or technical manuals
`2048`	Models with very large context windows, where each chunk should capture a full section

Smaller chunks improve precision -- the retrieved passage closely matches the question. Larger chunks improve recall -- more surrounding context is included, which helps the LLM understand nuance.

Chunk Overlap

The number of tokens shared between adjacent chunks. Overlap ensures that sentences at chunk boundaries are not split in a way that loses meaning.

Overlap	Guideline
`0`	No overlap. Fastest indexing but may split sentences.
`50`	Default. Provides a reasonable boundary buffer.
`100`	Good for dense technical content where context spans paragraphs.
`200`	High overlap. Increases storage but maximizes boundary coverage.
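
To make the interaction between chunk size and overlap concrete, here is a minimal sliding-window sketch. It assumes the text has already been tokenized into a list; the platform's tokenizer and any sentence-aware splitting it performs are not specified in this page, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into windows of at most chunk_size tokens,
    where each window repeats the last `overlap` tokens of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

# Ten stand-in tokens, small numbers so the windows are easy to read.
tokens = list(range(10))
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
print(chunks)
```

Each window shares its first token with the end of the previous window. A larger overlap shortens the step, so the same document produces more chunks, which is why higher overlap increases storage.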

WARNING

Changing chunk size or overlap requires a full re-index of all documents in the Knowledge Base. During re-indexing, the existing chunks remain searchable until the new chunks replace them, so there is no downtime.

Embedding Model

The embedding model converts text into vector representations for similarity search. This is selected when the Knowledge Base is created and can be changed later with a full re-index.

Provider	Model	Dimensions	Notes
OpenAI	`text-embedding-3-small`	1536	Cost-effective, strong general performance
OpenAI	`text-embedding-3-large`	3072	Higher accuracy, higher cost
Vertex AI	`text-embedding-005`	768	Google Cloud native, good multilingual support
Custom	Self-hosted	Varies	Bring your own model via a compatible API endpoint

The embedding model is tied to the AI integration configured in Settings > Integrations. To use a different model, you may need to add a new integration first.
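
The dimension count also affects how much vector data a Knowledge Base holds. The back-of-the-envelope sketch below uses the dimensions from the table above and assumes each value is stored as a 4-byte float; index overhead, metadata, and any compression the platform applies are ignored, so treat the result as a rough lower bound rather than a billing figure. `estimate_vector_bytes` is a hypothetical helper.

```python
# Dimensions taken from the embedding model table in this section.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-005": 768,
}

def estimate_vector_bytes(num_chunks, model, bytes_per_value=4):
    """Raw vector payload: chunks x dimensions x bytes per value.

    Assumption: 4-byte (float32) values; actual storage format may differ.
    """
    return num_chunks * EMBEDDING_DIMENSIONS[model] * bytes_per_value

small = estimate_vector_bytes(10_000, "text-embedding-3-small")
large = estimate_vector_bytes(10_000, "text-embedding-3-large")
print(f"small: {small / 1_000_000:.1f} MB, large: {large / 1_000_000:.1f} MB")
```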

Search Mode

OmniBots supports two search modes for querying the vector store:

Mode	How It Works	When to Use
Similarity	Pure vector cosine similarity search. Fast and effective for semantic matching.	Default. Works well for most natural language questions.
Hybrid	Combines vector similarity with keyword (BM25) matching. Results are fused using reciprocal rank fusion.	Use when your content contains domain-specific terms, product codes, or identifiers that benefit from exact keyword matching.

TIP

Hybrid mode is particularly effective for technical support bots where users reference specific error codes, model numbers, or configuration keys. The keyword component ensures exact matches rank highly even if the semantic similarity score alone would not surface them.
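
Reciprocal rank fusion can be sketched in a few lines. The example below is a generic version of the technique, not the platform's implementation: each chunk receives 1 / (k + rank) from every ranking it appears in, and the sums decide the final order. The constant k = 60 is a commonly used default and an assumption here, and the chunk IDs are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each ranking contributes 1 / (k + rank) per chunk, with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)

# The vector search ranks the error-code chunk last; the keyword search ranks it first.
vector_ranking = ["overview", "troubleshooting", "err-4012"]
keyword_ranking = ["err-4012", "changelog", "troubleshooting"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

The chunk containing the exact error code rises to the top of the fused list because it appears in both rankings, even though vector similarity alone placed it last.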

[Image: Diagram showing how different parameter combinations affect retrieval quality: low top_k with high threshold produces precise but narrow results, while high top_k with low threshold produces broad but potentially noisy results]

How parameter tuning affects retrieval quality

Tuning Recommendations

Scenario	top_k	Threshold	Chunk Size	Search Mode
FAQ bot with short answers	3	0.75	256	Similarity
General documentation assistant	5	0.7	512	Similarity
Technical support with error codes	5	0.6	512	Hybrid
Legal/compliance document search	8	0.65	1024	Hybrid
Research paper Q&A	5	0.7	1024	Similarity

Overriding Settings in the KB Search Node

The settings configured here serve as defaults for this Knowledge Base. When you add a KB Search node to a flow, you can override top_k, similarity threshold, and search mode on a per-node basis. This allows the same Knowledge Base to be queried with different parameters depending on the conversation context.

TIP

Use the KB-level settings as sensible defaults, then override at the node level only when a specific flow step needs different behavior -- for example, a summarization step that retrieves 10 chunks versus a quick lookup that retrieves 3.
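
One way to picture the override behavior is as a merge of node-level values over KB-level defaults. The sketch below is a conceptual model only: `RetrievalSettings` and `resolve_settings` are hypothetical names, and the field names mirror the three overridable settings described above rather than any documented API.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class RetrievalSettings:
    """KB-level defaults for the settings a KB Search node may override."""
    top_k: int = 5
    similarity_threshold: float = 0.7
    search_mode: str = "similarity"

def resolve_settings(kb_defaults, node_overrides):
    """Return the effective settings for one node.

    Only values explicitly set on the node (not None) replace the defaults.
    """
    provided = {key: value for key, value in node_overrides.items() if value is not None}
    return replace(kb_defaults, **provided)

kb = RetrievalSettings()
summarize = resolve_settings(kb, {"top_k": 10})
lookup = resolve_settings(kb, {"top_k": 3, "search_mode": "hybrid"})

print(summarize)
print(lookup)
```

The KB defaults stay unchanged, and each node sees only the fields it overrides replaced. Chunk size and overlap are deliberately absent from this model because they are fixed at indexing time and are not among the per-node overrides.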
