Dataiku DSS
You are viewing the documentation for version 13 of DSS.

Advanced Search¶

At its core, searching a Knowledge Bank for relevant documents (in order to perform RAG) is an information retrieval problem.

While searching with text embeddings (sometimes called “Semantic Search”) is a modern way of doing this, other information retrieval techniques and options exist.

These options can be configured:

  • In the Augmented LLM settings

  • When creating a Vector Search Query Tool in an agent

Diversity of documents¶

When enabled, this option uses the MMR (Maximal Marginal Relevance) algorithm to improve the diversity of the retrieved documents. It is not supported by Azure AI Search.

  • Diversity selection documents: The total number of candidate documents pre-selected before the more diverse final documents are chosen from them.

  • Diversity vs Relevancy factor: A value from 0 to 1 that sets the trade-off between the diversity and the relevance of the results. Lower values favor more diverse documents.
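To show how the two diversity settings interact, here is a minimal sketch of MMR in plain Python. It assumes cosine similarity between embedding vectors; the function names and the parameters fetch_k (standing in for Diversity selection documents) and lambda_mult (standing in for the Diversity vs Relevancy factor) are hypothetical, and this is not Dataiku's internal implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: Sequence[float], docs: List[Sequence[float]],
               k: int, fetch_k: int, lambda_mult: float) -> List[int]:
    """Pick k document indices using Maximal Marginal Relevance.

    fetch_k     -- hypothetical stand-in for "Diversity selection documents"
    lambda_mult -- hypothetical stand-in for "Diversity vs Relevancy factor"
                   (0 to 1; lower values favor diversity)
    """
    # Step 1: pre-select the fetch_k candidates most similar to the query.
    relevance = [cosine(query, d) for d in docs]
    candidates = sorted(range(len(docs)), key=lambda i: relevance[i],
                        reverse=True)[:fetch_k]

    # Step 2: greedily add the candidate with the best relevance/novelty balance.
    selected: List[int] = []
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2, fetch_k=3, lambda_mult=1.0))  # [0, 1]
    print(mmr_select(query, docs, k=2, fetch_k=3, lambda_mult=0.3))  # [0, 2]
```

With lambda_mult=1.0 the ranking is pure relevance, so the two near-duplicates win; lowering it to 0.3 swaps the second pick for the more distinct document. The candidate pool also bounds the effect: with fetch_k=2, the distinct document is never a candidate, so even a low lambda_mult returns [0, 1].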

Hybrid Search¶

Combines similarity search (the default behaviour) with keyword search to retrieve more relevant documents. Hybrid search is only supported by Azure AI Search and Elasticsearch, and it is not compatible with the diversity option.

Additionally, both of these vector stores offer advanced reranking capabilities to improve the mix of retrieved documents. Each has its own specific configuration, and advanced reranking requires a compatible subscription with the provider.

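| Vector Store    | Advanced Reranking                                                                                      |
|-----------------|---------------------------------------------------------------------------------------------------------|
| Azure AI Search | Uses the proprietary Azure AI Search Semantic Ranker.                                                   |
| Elasticsearch   | Uses RRF (Reciprocal Rank Fusion), which accepts two parameters: Rank constant and Rank window size.    |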
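RRF scores each document by summing 1 / (rank constant + rank) over every result list it appears in, with ranks starting at 1. The minimal sketch below applies that published formula to a keyword ranking and a similarity ranking, to show what the Rank constant and Rank window size parameters control. The function name rrf_fuse and the document IDs are hypothetical, and this is an illustration of the formula rather than Elasticsearch's own code.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(result_lists: List[List[str]], rank_constant: int = 60,
             rank_window_size: int = 10) -> List[str]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rank_constant    -- added to each rank; higher values give lower-ranked
                        documents relatively more influence
    rank_window_size -- only the top N entries of each list are considered
    """
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        # Ranks start at 1 for the best hit of each list.
        for rank, doc_id in enumerate(results[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword-search ranking
    vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical similarity-search ranking
    print(rrf_fuse([keyword_hits, vector_hits], rank_constant=60, rank_window_size=3))
    # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents ranked well by both searches (doc_a, doc_c) rise to the top. Shrinking rank_window_size to 2 drops doc_d entirely, since it only appears third in one list.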


© Copyright 2025, Dataiku

Built with Sphinx using a theme provided by Read the Docs.