Search settings¶
Retrieving relevant documents from a Knowledge Bank for RAG can be customized through several groups of settings: Filtering, Query Strategy, Retrieval, Ranking and Reranking, and Response Generation. They can be configured in Augmented LLMs and Knowledge Bank Search Tools.
Filtering¶
The Filtering settings define which subset of the Knowledge Bank is available for retrieval. They can be used to narrow down the scope of the search to a specific segment of the stored knowledge.
Static filtering: Restricts the search to a fixed subset of the Knowledge Bank. The filter condition is defined once and remains the same for all queries.
Dynamic filtering: Allows the filter condition to vary for each query.
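The distinction can be illustrated outside DSS with a toy retriever that applies a metadata filter before scoring. The document structure, filter shape, and scoring below are assumptions for illustration, not DSS internals.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

def matches(doc: Document, filter_: dict) -> bool:
    """A document matches when every filter key equals its metadata value."""
    return all(doc.metadata.get(k) == v for k, v in filter_.items())

def search(docs, query, filter_=None, k=3):
    """Toy retrieval: restrict to the filtered subset, then rank by a score."""
    candidates = [d for d in docs if filter_ is None or matches(d, filter_)]
    # Placeholder relevance score: number of words shared with the query.
    score = lambda d: len(set(query.lower().split()) & set(d.text.lower().split()))
    return sorted(candidates, key=score, reverse=True)[:k]

docs = [
    Document("Quarterly revenue grew 12%", {"department": "finance", "year": 2024}),
    Document("New onboarding policy", {"department": "hr", "year": 2024}),
]

# Static filtering: the same condition applies to every query.
STATIC_FILTER = {"department": "finance"}
print(search(docs, "revenue growth", STATIC_FILTER))

# Dynamic filtering: the condition is built per query, e.g. from user input.
user_department = "hr"
print(search(docs, "onboarding policy", {"department": user_department}))
```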
Query Strategy¶
The Query Strategy defines when and how DSS queries the Knowledge Bank. It is mainly useful in chat-based contexts, where the system decides dynamically whether to retrieve new information for each user message.
Raw mode: Always queries the Knowledge Bank, using the full query history when in chat mode.
Smart mode: First evaluates whether retrieval would add value and can automatically reformulate the query to optimize results.
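As a rough sketch of the two modes (not the prompts or logic DSS actually uses), with `llm` standing in for any prompt-to-completion call:

```python
from typing import Callable

def raw_mode_query(chat_history: list[str]) -> str:
    # Raw mode: always retrieve, sending the full query history as-is.
    return "\n".join(chat_history)

def smart_mode_query(chat_history: list[str], llm: Callable[[str], str]) -> str | None:
    """Smart mode: decide whether to retrieve, then reformulate the query.

    The prompts below are illustrative assumptions.
    """
    history = "\n".join(chat_history)
    decision = llm(
        "Does answering the last message require looking up documents? "
        "Answer YES or NO.\n\n" + history
    )
    if decision.strip().upper().startswith("NO"):
        return None  # skip retrieval and answer from the conversation alone
    return llm(
        "Rewrite the last user message as a self-contained search query, "
        "resolving pronouns from the conversation:\n\n" + history
    )
```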
Retrieval¶
The Retrieval settings define how DSS selects documents from the Knowledge Bank. They include several search types such as similarity-based search, threshold filtering, hybrid search, and diversity enhancement.
DSS leverages each vector store’s default/preferred similarity metric for vector similarity search:
Euclidean distance for Chroma, FAISS, OpenSearch
Cosine similarity for Milvus, Qdrant, Pinecone, Azure AI Search, Elasticsearch
Dot product for Vertex Vector Search
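For reference, the three metrics can be computed over an embedded query `q` and document vector `d` as follows (NumPy sketch):

```python
import numpy as np

def euclidean_distance(q: np.ndarray, d: np.ndarray) -> float:
    # Smaller is more similar.
    return float(np.linalg.norm(q - d))

def cosine_similarity(q: np.ndarray, d: np.ndarray) -> float:
    # Ranges from -1 to 1; larger is more similar.
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

def dot_product(q: np.ndarray, d: np.ndarray) -> float:
    # Equivalent to cosine similarity when vectors are L2-normalized.
    return float(q @ d)

q = np.array([0.1, 0.7, 0.2])
d = np.array([0.2, 0.6, 0.1])
print(euclidean_distance(q, d), cosine_similarity(q, d), dot_product(q, d))
```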
When diversity is enabled, DSS uses the MMR (Maximal Marginal Relevance) algorithm to balance relevance and variety among retrieved documents.
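MMR iteratively picks the document that is most relevant to the query while being least redundant with the documents already picked. A minimal NumPy sketch, where `lam` is the relevance/diversity trade-off (the parameter name and default are assumptions; DSS exposes its own diversity settings):

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=4, lam=0.5):
    """Return indices of k documents balancing relevance and diversity.

    lam=1.0 is pure relevance; lam=0.0 is pure diversity.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    relevance = [cos(query_vec, d) for d in doc_vecs]
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```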
Hybrid search combines similarity and keyword retrieval to improve coverage.
Note
Hybrid search is supported only by Azure AI Search, Elasticsearch, and Milvus (local), and is not compatible with the diversity option.
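Conceptually, hybrid search runs a vector query and a keyword query in parallel and merges the two result lists. The weighted-sum fusion below is a toy assumption; the supported stores apply their own fusion mechanisms, as described under Native Rankers below.

```python
def hybrid_search(vector_hits: dict[str, float],
                  keyword_hits: dict[str, float],
                  alpha: float = 0.5) -> list[str]:
    """Merge doc_id -> score maps from the two retrievers.

    Scores are min-max normalized per retriever, then blended:
    alpha weights the vector side, (1 - alpha) the keyword side.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_hits), normalize(keyword_hits)
    combined = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
                for doc in set(v) | set(k)}
    return sorted(combined, key=combined.get, reverse=True)
```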
Ranking and Reranking¶
To improve search relevance, you can refine and filter retrieved documents before they are sent to the LLM. DSS supports two methods for this: Native Rankers (specific to certain vector stores) and Model-Based Rerankers (using a dedicated model).
Native Rankers¶
Native rankers rely on the internal capabilities of specific vector stores to improve result ordering. This feature is only available when Hybrid search is the selected search type, and is supported by Azure AI Search and Elasticsearch with a compatible subscription, as well as Milvus (local).
| Vector Store | Ranker Type |
|---|---|
| Azure AI Search | Uses Azure’s proprietary Semantic Ranker. |
| Elasticsearch | Uses RRF (Reciprocal Rank Fusion). This accepts two parameters: Rank constant and Rank window size. |
| Milvus (local) | Uses Milvus’s implementation of RRF (Reciprocal Rank Fusion) with the recommended value k=60. |
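RRF works on rank positions rather than raw scores: each document receives 1 / (k + rank) from every result list it appears in, and the contributions are summed. A minimal sketch using the k=60 default mentioned above (Elasticsearch’s Rank constant plays the role of k; Rank window size caps how many results per list are considered):

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; ranks are 1-based."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a vector-search ranking with a keyword-search ranking.
print(reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]]))
```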
Model-Based Rerankers¶
Model-based reranking uses a specialized machine learning model to re-score and re-order the results returned by your retriever. Unlike native rankers, these are not tied to a specific vector store; they work by passing your retrieved documents through an additional model.
Supported connections for reranking:
| Connection | Reranker Capability |
|---|---|
| Hugging Face | Connects to the Hugging Face Hub, allowing you to run compatible open-source reranker models (such as BGE-Reranker). |
| Amazon Bedrock | Provides access to fully managed reranking models, including Cohere Rerank. |
| Microsoft AI Foundry | Provides access to Model-as-a-Service (MaaS) endpoints, including support for Cohere Rerank models. |
Note
Since model-based rerankers process results after retrieval, they may add latency to the pipeline but often provide higher accuracy for complex queries.
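Outside DSS, the same idea can be sketched with a cross-encoder from the sentence-transformers library; the model choice and `top_n` below are assumptions:

```python
from sentence_transformers import CrossEncoder

def rerank(query: str, documents: list[str], top_n: int = 5) -> list[str]:
    """Re-score retrieved documents with a cross-encoder and keep the best."""
    model = CrossEncoder("BAAI/bge-reranker-base")  # assumed model choice
    scores = model.predict([(query, doc) for doc in documents])
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```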
Response Generation¶
The Response Generation settings control how the LLM synthesizes the final answer using the documents retrieved and ranked in the previous steps.
Custom Generation Prompt: Allows you to define the instructions sent to the LLM. You can specify the tone, style, and strictness of the response (e.g., instructing the model to say “I don’t know” if the context is insufficient).
Sources: Lets you include metadata from the retrieved documents (such as file names, URLs, or page numbers) in the final output. This helps users verify the information provided by the model.
Standard: Appends the selected metadata field to the output “as is.”
With role: Assigns a specific semantic role (e.g., Title, URL) to the metadata, allowing the UI or client application to format it appropriately (for example, rendering a clickable link).
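To make these settings concrete, here is an assumed shape of a generation prompt and source list built from retrieved chunks; the actual templates and metadata roles used by DSS are configured in the UI and not reproduced here.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a generation prompt from retrieved chunks.

    Each chunk is assumed to look like {"text": ..., "metadata": {...}}.
    """
    context = "\n\n".join(
        f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer using only the context below. "
        'If the context is insufficient, say "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def sources(chunks: list[dict]) -> list[dict]:
    # "Standard" would return the raw metadata values; "With role" tags each
    # value with a semantic role so a client can render it (e.g. as a link).
    return [
        {"role": "URL", "value": c["metadata"].get("url")}
        for c in chunks if c["metadata"].get("url")
    ]
```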