RAG Settings
RAG settings control how the platform retrieves and ranks document chunks when a bot queries a Knowledge Base. Tuning these parameters affects the relevance, accuracy, and verbosity of your bot's answers. You can configure them per Knowledge Base under the Settings tab of the KB detail view.
RAG settings panel showing top_k slider, similarity threshold slider, chunk size and overlap fields, embedding model display, and search mode toggle
Retrieval Parameters
top_k -- Number of Chunks Returned
The top_k setting determines how many matching chunks are returned from the vector search and included as context in the LLM prompt.
| Value | Behavior |
|---|---|
| 3 | Returns the 3 most relevant chunks. Good for focused, concise answers. |
| 5 | Default. Balances breadth and precision for most use cases. |
| 10 | Returns more context. Useful for complex questions that span multiple sections. |
Higher values provide more context but consume more tokens in the LLM prompt, which can increase response time and cost. Lower values are faster but may miss relevant information.
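To make the token cost concrete, the following is a rough sketch of the worst-case context size for a given top_k. It assumes every returned chunk is filled to the configured chunk size; the function name `max_context_tokens` is illustrative and not part of the platform.

```python
def max_context_tokens(top_k: int, chunk_size: int) -> int:
    """Upper bound on prompt tokens contributed by retrieved chunks.

    Assumes each of the top_k chunks is filled to chunk_size tokens;
    real chunks are often shorter, so actual usage may be lower.
    """
    if top_k < 0 or chunk_size < 0:
        raise ValueError("top_k and chunk_size must be non-negative")
    return top_k * chunk_size


if __name__ == "__main__":
    # Compare the three top_k values from the table at a chunk size of 512.
    for top_k in (3, 5, 10):
        print(f"top_k={top_k}: up to {max_context_tokens(top_k, 512)} tokens")
```

The estimate covers retrieved context only; the system prompt, conversation history, and the user's question add to the total.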
Similarity Threshold
The similarity threshold sets the minimum cosine similarity score a chunk must have to be included in results. Scores range from 0.0 (no similarity) to 1.0 (exact match).
| Value | Effect |
|---|---|
| 0.0 | No filtering. All top_k chunks are returned regardless of relevance. |
| 0.5 | Moderate filtering. Excludes loosely related chunks. |
| 0.7 | Default. Only clearly relevant chunks are returned. |
| 0.85 | Strict filtering. Only highly relevant chunks pass. May return fewer than top_k results. |
TIP
Start with the default threshold of 0.7. If the bot frequently says "I don't have information about that" for questions you know are covered, lower the threshold. If it returns irrelevant passages, raise it.
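The interaction between top_k and the similarity threshold can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `retrieve` function, the in-memory list of `(text, vector)` pairs, and the toy two-dimensional vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, top_k=5, threshold=0.7):
    """Return up to top_k (score, text) pairs whose score meets the threshold.

    `chunks` is a hypothetical list of (text, vector) pairs standing in
    for the vector store.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the best top_k candidates, then drop any below the threshold.
    return [(score, text) for score, text in scored[:top_k] if score >= threshold]


if __name__ == "__main__":
    store = [
        ("reset your password", [1.0, 0.0]),
        ("change your email", [0.8, 0.6]),
        ("shipping rates", [0.0, 1.0]),
    ]
    for threshold in (0.0, 0.7, 0.85):
        hits = retrieve([1.0, 0.0], store, top_k=5, threshold=threshold)
        print(threshold, [text for _, text in hits])
```

With a threshold of 0.0 every candidate within top_k is returned; raising it to 0.85 leaves only the closest match, which is why strict thresholds can return fewer than top_k results.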
Chunking Settings
Chunking settings are configured at Knowledge Base creation time but can be adjusted later. Changing these values requires re-indexing all documents.
Chunk Size
The maximum number of tokens per chunk. This controls how much text each chunk contains.
| Size | Best For |
|---|---|
| 256 | Short, precise passages such as FAQ answers or glossary entries |
| 512 | General-purpose documentation and articles (default) |
| 1024 | Long-form content like legal contracts, research papers, or technical manuals |
| 2048 | Very large context windows where each chunk should capture a full section |
Smaller chunks improve precision -- the retrieved passage closely matches the question. Larger chunks improve recall -- more surrounding context is included, which helps the LLM understand nuance.
Chunk Overlap
The number of tokens shared between adjacent chunks. Overlap ensures that sentences at chunk boundaries are not split in a way that loses meaning.
| Overlap | Guideline |
|---|---|
| 0 | No overlap. Fastest indexing but may split sentences. |
| 50 | Default. Provides a reasonable boundary buffer. |
| 100 | Good for dense technical content where context spans paragraphs. |
| 200 | High overlap. Increases storage but maximizes boundary coverage. |
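How chunk size and overlap work together can be illustrated with a simple sliding window. This is a sketch under stated assumptions: the function name `chunk_tokens` is hypothetical, tokens are represented as a plain list, and the platform's actual tokenizer and boundary handling are not documented here and may differ.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts (chunk_size - overlap) tokens after the previous
    one, so adjacent chunks share `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing
        # chunk that only repeats already-covered tokens.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Tiny numbers so the overlap is visible: size 4, overlap 1.
    for chunk in chunk_tokens(list(range(10)), chunk_size=4, overlap=1):
        print(chunk)
```

In the small example, each chunk begins with the last token of the previous one. Larger overlap values shrink the step, which produces more chunks for the same document and therefore more storage.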
WARNING
Changing chunk size or overlap requires a full re-index of all documents in the Knowledge Base. During re-indexing, the existing chunks remain searchable until the new chunks replace them, so there is no downtime.
Embedding Model
The embedding model converts text into vector representations for similarity search. You select the model when you create the Knowledge Base. Changing it later requires a full re-index, because vectors produced by different models are not comparable.
| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | Cost-effective, strong general performance |
| OpenAI | text-embedding-3-large | 3072 | Higher accuracy, higher cost |
| Vertex AI | text-embedding-005 | 768 | Google Cloud native, good multilingual support |
| Custom | Self-hosted | Varies | Bring your own model via a compatible API endpoint |
The embedding model is tied to the AI integration configured in Settings > Integrations. To use a different model, you may need to add a new integration first.
Search Mode
OmniBots supports two search modes for querying the vector store:
| Mode | How It Works | When to Use |
|---|---|---|
| Similarity | Pure vector cosine similarity search. Fast and effective for semantic matching. | Default. Works well for most natural language questions. |
| Hybrid | Combines vector similarity with keyword (BM25) matching. Results are fused using reciprocal rank fusion. | Use when your content contains domain-specific terms, product codes, or identifiers that benefit from exact keyword matching. |
TIP
Hybrid mode is particularly effective for technical support bots where users reference specific error codes, model numbers, or configuration keys. The keyword component ensures exact matches rank highly even if the semantic similarity score alone would not surface them.
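Reciprocal rank fusion combines ranked lists by summing `1 / (k + rank)` for each chunk across the lists. The sketch below shows the general technique only: the function name, the chunk IDs, and the constant `k = 60` (a commonly used value) are assumptions, and the constant OmniBots uses is not stated here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per chunk, with rank starting
    at 1. Chunks that appear near the top of several lists accumulate
    the highest fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


if __name__ == "__main__":
    vector_ranking = ["c1", "c2", "c3"]   # hypothetical vector search order
    keyword_ranking = ["c3", "c1", "c4"]  # hypothetical BM25 order
    print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
```

In this example `c3` ranks last in the vector list but first in the keyword list, so fusion lifts it above `c2`, which only the vector search found. That is the behavior that helps exact identifiers such as error codes surface.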
Diagram showing how different parameter combinations affect retrieval quality: low top_k with high threshold produces precise but narrow results, while high top_k with low threshold produces broad but potentially noisy results
Tuning Recommendations
| Scenario | top_k | Threshold | Chunk Size | Search Mode |
|---|---|---|---|---|
| FAQ bot with short answers | 3 | 0.75 | 256 | Similarity |
| General documentation assistant | 5 | 0.7 | 512 | Similarity |
| Technical support with error codes | 5 | 0.6 | 512 | Hybrid |
| Legal/compliance document search | 8 | 0.65 | 1024 | Hybrid |
| Research paper Q&A | 5 | 0.7 | 1024 | Similarity |
Overriding Settings in the KB Search Node
The settings configured here serve as defaults for this Knowledge Base. When you add a KB Search node to a flow, you can override top_k, similarity threshold, and search mode on a per-node basis. This allows the same Knowledge Base to be queried with different parameters depending on the conversation context.
TIP
Use the KB-level settings as sensible defaults, then override at the node level only when a specific flow step needs different behavior -- for example, a summarization step that retrieves 10 chunks versus a quick lookup that retrieves 3.
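The relationship between KB-level defaults and node-level overrides can be modeled as a simple merge. The class and function below are hypothetical and only illustrate the precedence rule; the field names and the `"similarity"` / `"hybrid"` strings are assumptions, not the platform's configuration schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalSettings:
    """Hypothetical container for the three overridable settings."""
    top_k: int = 5
    similarity_threshold: float = 0.7
    search_mode: str = "similarity"


def resolve_settings(kb_defaults, node_overrides):
    """Apply node-level overrides on top of KB-level defaults.

    Keys set to None are treated as "not overridden". Settings that
    cannot be overridden per node (for example chunk size) are rejected.
    """
    allowed = {"top_k", "similarity_threshold", "search_mode"}
    unknown = set(node_overrides) - allowed
    if unknown:
        raise ValueError(f"not overridable per node: {sorted(unknown)}")
    supplied = {key: value for key, value in node_overrides.items() if value is not None}
    return replace(kb_defaults, **supplied)


if __name__ == "__main__":
    kb = RetrievalSettings()
    print(resolve_settings(kb, {"top_k": 10}))  # summarization step
    print(resolve_settings(kb, {"top_k": 3}))   # quick lookup
```

Keeping the defaults immutable and producing a new settings object per node mirrors the idea that an override affects only that flow step, never the Knowledge Base itself.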
