Knowledge Bank Search tool

This tool searches for relevant documents in a Knowledge Bank.

Importantly, this tool is a search/retrieval tool. It does not perform “Retrieval-Augmented Generation”: it does not “generate an answer” but simply returns matching documents. Generating the answer is the responsibility of the calling agent.

Core configuration

You select the Knowledge Bank that the tool searches. For more information on how to build Knowledge Banks, see the Knowledge Bank documentation.

Retrieval settings

The tool supports a variety of search options, some of which depend on the underlying vector store of the Knowledge Bank. The Knowledge Bank Search tool has the same retrieval options as the built-in RAG. See Advanced Search for details about the retrieval settings.

Sources

So that the Chat UIs or your own application can properly display documents, the tool returns rich source items, which can include a title, text, a URL (for building a link to the document, for example in your SharePoint site), and a thumbnail URL (for displaying an image next to the result).

Each of these is configured by optionally selecting which metadata stored in the Knowledge Bank holds the information. Stored metadata is configured in the Embedding recipe.
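
As an illustration of how a calling application might display such source items, here is a minimal sketch. The `SourceItem` class and its field names (`title`, `text`, `url`, `thumbnail_url`) are hypothetical and chosen for readability; they are not the tool's actual response schema, so adapt them to what your tool returns.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class SourceItem:
    """Hypothetical shape of one source item; every field is optional."""
    title: Optional[str] = None
    text: Optional[str] = None
    url: Optional[str] = None
    thumbnail_url: Optional[str] = None


def render_source(item: SourceItem) -> str:
    """Render a source item as a small HTML snippet, skipping missing fields."""
    parts = []
    if item.thumbnail_url:
        parts.append(f'<img src="{escape(item.thumbnail_url)}" alt="">')
    label = escape(item.title or "Untitled document")
    if item.url:
        # Link the title to the document when a URL is available.
        parts.append(f'<a href="{escape(item.url)}">{label}</a>')
    else:
        parts.append(f"<span>{label}</span>")
    if item.text:
        parts.append(f"<p>{escape(item.text)}</p>")
    return "\n".join(parts)
```

Because every field is optional, the rendering code falls back gracefully when a given piece of metadata was not selected in the tool settings.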

Global filter

The “Perform filtering” option allows the tool creator to define a filter that applies to the documents of the Knowledge Bank, so that the tool is restricted to a subset of the Knowledge Bank.

Document-level security

The Embedding recipe, Knowledge Bank Search tool, and LLM Mesh caller (for example, your own application or Dataiku Chat UIs) can work together to provide document-level security: the ability to only return documents that the user “ultimately performing the call” is allowed to see.

This works through the concept of “security tokens”. Security tokens are arrays of strings that are attached to both documents and users. A user can view a document if the user and the document have at least one security token in common.

The most common kind of security tokens is user groups:

  • the security tokens of documents are the groups allowed to view them

  • the security tokens of users are the groups they belong to

Thus, a user can view a document if one of the user’s groups is allowed to see the document.
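
The matching rule can be sketched in a few lines. This is a conceptual illustration only: the function names and the `security_tokens` key are hypothetical, and in practice the filtering is performed by the tool itself, not by your code.

```python
from typing import Iterable


def can_view(user_tokens: Iterable[str], document_tokens: Iterable[str]) -> bool:
    """Return True if the user and the document share at least one security token."""
    return not set(user_tokens).isdisjoint(document_tokens)


def visible_documents(user_tokens, documents):
    """Keep only the documents the user may see.

    `documents` is a list of dicts with a hypothetical "security_tokens" key.
    """
    user_set = set(user_tokens)
    return [d for d in documents if can_view(user_set, d["security_tokens"])]
```

For example, a user in groups `hr` and `sales` can view a document whose tokens are `hr` and `finance`, because `hr` appears on both sides. A user with no tokens can view nothing.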

In order for document-level security to work, you need to:

  • Before embedding, create a column on your documents containing the security tokens, as a JSON array of strings

  • In the Embedding recipe, select which column contains the security tokens

  • Dataiku automatically indexes the security tokens in the Knowledge Bank

  • In the Knowledge Bank Search Tool, enable security token filtering

  • Calls to the Knowledge Bank Search Tool now require caller security tokens to be passed. If no security tokens are passed, the search fails.
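
The first step above, preparing the security tokens column, could look like the following sketch in a Python preparation step. The column names (`doc_id`, `allowed_groups`, `security_tokens`) and the sample rows are hypothetical; the only requirement taken from this page is that the column holds a JSON array of strings.

```python
import json


def to_security_tokens(allowed_groups):
    """Serialize allowed groups as a JSON array of strings, de-duplicated and sorted."""
    cleaned = sorted({str(g).strip() for g in allowed_groups if str(g).strip()})
    return json.dumps(cleaned)


# Hypothetical input rows: one per document, with the groups allowed to view it.
rows = [
    {"doc_id": "handbook.pdf", "allowed_groups": ["hr", "managers"]},
    {"doc_id": "pricing.pdf", "allowed_groups": ["sales"]},
]

for row in rows:
    # This is the column to select in the Embedding recipe.
    row["security_tokens"] = to_security_tokens(row["allowed_groups"])
```

Sorting and de-duplicating are not required; they only make the column easier to inspect and compare.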

Passing security tokens to the tool is done via the “context” parameter, in a key called “callerSecurityTokens”. Dataiku Chat UIs do this automatically. Importantly, you must make sure to pass tokens of the “final” end-user, not the technical user simply calling the agent or tool.
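
If you call the tool from your own application, the context could be assembled as in the sketch below. Only the `callerSecurityTokens` key comes from this page; the helper name `build_tool_context` and the choice to raise an error early are assumptions made for illustration, and the way the context is handed to the tool depends on the client API you use.

```python
def build_tool_context(end_user_groups):
    """Build the context for a tool call made on behalf of the final end-user.

    `end_user_groups` must be the groups of the person asking the question,
    not those of the technical account that runs the application.
    """
    tokens = [g for g in end_user_groups if g]
    if not tokens:
        # The search fails without tokens, so fail early with a clear message.
        raise ValueError("No security tokens for the end-user; the search would fail")
    return {"callerSecurityTokens": tokens}
```

Failing early in the application gives a clearer error than letting the search itself fail, but it is a design choice rather than a requirement.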