Dataiku Answers

Overview

Dataiku Answers is a packaged, scalable web application that enables enterprise-ready Large Language Model (LLM) chat and Retrieval Augmented Generation (RAG) to be deployed at scale across business processes and teams.

Homepage

Key Features

  • Simple and Scalable
    Connect Dataiku Answers to your choice of LLM, Knowledge Bank, or Dataset in a few clicks, and start sharing.
  • Customizable
    Set parameters and filters specific to your needs. Additionally, you can customize the visual web application.
  • Governed
    Monitor conversations and user feedback to control and optimize LLM impact in your organization.
  • Mobile-Responsive

The visual web application is fully responsive, ensuring optimal usability on mobile devices. For seamless operation, it must be accessed directly.

Whether you need to develop an Enterprise LLM Chat in minutes or deploy RAG at scale, Dataiku Answers is a powerful value accelerator with broad customization options to embed LLM chat usage fully across business processes.

Getting Access

The Dataiku Answers plugin is available on demand through your Dataiku counterparts (please contact your Dataiku Customer Success Manager or Sales Engineer). It is distributed as a zip file; once installed, it gives you access to a fully built Visual Webapp that can be used within the Dataiku Projects of your choice:

VisualWebapp

Configuration

Introduction

This guide details the setup of Dataiku Answers, outlining the steps to configure conversation logging, document management, and interactive chat functionalities using a Large Language Model (LLM).

Requirements

Dataiku version

  • Dataiku 12.4.1 and above; the minimum recommended version is 12.5. The latest Dataiku version is always the best choice to fully leverage the latest plugin capabilities.

  • Available for both Dataiku Cloud and Self-Managed.

Python version

Dataiku Answers is compatible with Python versions 3.8 through 3.11, under the following requirements:

General dependencies

Flask==3.0.1
Flask-Cors==4.0.0
flask-socketio==5.3.6
langchain
langchain-community
pydantic<2
chromadb<0.5.4
pysqlite3-binary; platform_system == "Linux"
faiss-cpu
pinecone-client>3,<=4.1
qdrant_client
protobuf==3.20.*
lingua-language-detector==2.0.2
  • Dependencies for specific Python versions

    protobuf==3.20.*;  python_version < '3.11'
    
    grpcio-tools==1.49.0;  python_version >= '3.11'
    protobuf==4.21.3;  python_version >= '3.11'
    
  • Infrastructure

    • SQL Datasets: Logging and feedback datasets must be SQL datasets for compatibility with the plugin’s storage mechanisms.

      • PostgreSQL

      • Snowflake

      • Redshift

      • MS SQL Server

      • BigQuery

      • Databricks

  • Knowledge Bank Configuration: If a Knowledge Bank is used, the web application must run locally on Dataiku DSS, which does not affect scalability despite the shift from a containerized environment.

  • Streaming: The plugin seamlessly enables answers to be streamed when supported by the configured LLM, requiring only DSS version 12.5.0 or higher with no additional setup.

    Streaming currently works with the OpenAI GPT family and, on Bedrock, with Anthropic Claude and Amazon Titan.

Conversations Store Configuration

Dataiku Answers allows you to store all conversations for oversight and usage analysis. Flexible options let you define the storage approach and mechanism.

Conversation History Dataset

Create a new or select an existing SQL dataset for logging queries, responses, and associated metadata (LLM used, Knowledge Bank, feedback, filters, etc.).

ConversationHistory

Index the chat history dataset

Add an index to the conversation history dataset to optimize the performance of the plugin. Indexing is only beneficial for specific database types; consult your database documentation and change this only if you are certain it will improve performance.
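As a purely illustrative sketch (the table and column names are placeholders, and the exact DDL syntax depends on your database), an index on the conversation identifier could be built like this:

```python
from typing import Optional

def build_index_statement(table: str, column: str, index_name: Optional[str] = None) -> str:
    """Compose a CREATE INDEX statement; identifiers are assumed pre-validated."""
    name = index_name or f"idx_{table}_{column}"
    return f"CREATE INDEX {name} ON {table} ({column})"

# Hypothetical table/column names -- adapt to your conversation history dataset.
print(build_index_statement("conversation_history", "conversation_id"))
# CREATE INDEX idx_conversation_history_conversation_id ON conversation_history (conversation_id)
```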

Conversation Deletion

Toggle ‘Permanent Delete’ to permanently delete conversations or keep them marked as deleted, maintaining a recoverable archive.

PermanentDelete

Feedback Choices

Configure positive and negative feedback options, enabling end-users to interact and rate their experience.

FeedbackChoices

Document Folder

Choose a folder to store user-uploaded documents and LLM-generated images.

DocumentUploads

Overall Feedback collection feature

As you roll out chat applications in your organization, you can include a feedback option to improve understanding of feedback, enablement needs, and enhancements.

General Feedback Dataset

In addition to conversation-specific feedback, configure a dataset to capture general feedback from users. This dataset can provide valuable insights into the overall user experience with the plugin.

GeneralFeedback

LLM configuration

Connect each instance of Dataiku Answers to your choice of LLM, powered by Dataiku’s LLM Mesh.

LLM Selection

Select from the LLMs configured in Dataiku DSS Connections.

Maximum number of LLM output tokens

Set the maximum number of output tokens that the LLM can generate for each query. To set this value correctly, consult the documentation of your LLM provider. Setting the value too low can cause answers to be cut off, while setting it too high can lead to increased costs.

Configure your LLM when no knowledge bank or table retrieval is required

Tailor the prompt that will guide the behavior of the underlying LLM. For example, if the LLM is to function as a life sciences analyst, the prompt could instruct it not to use external knowledge and to structure the responses in a clear and chronological order, with bullet points for clarity where possible. This prompt is only used when no retrieval is performed.

ConfigureLLMNoRetrieval

Advanced Prompt Setting

Configure your Conversation system prompt

For more advanced configuration of the LLM prompt, you can provide a custom system prompt or override the prompt that guides the LLM when generating code. You need to enable the advanced settings option as shown below.

ConfigureConversationSystemPrompt

Enable Image generation for users

This checkbox allows you to activate the image generation feature for users. Once enabled, additional settings will become available.

Note

Important Requirements:
  • An upload folder is necessary for this feature to function, as generated images will be stored there.

  • This feature works only with DSS version >= 13.0.0

Users can adjust the following settings through the UI
  • Image Height

  • Image Width

  • Image Quality

  • Number of Images to Generate

The user settings will be passed to the image generation model. If the selected model does not support certain settings, the image generation will fail. Any error messages generated by the model will be forwarded to the user in English, as we do not translate the model’s responses.

Image generation LLM

The language model to use for image generation. This is mandatory when the image generation feature is enabled.

ImageGeneration

Note

Image generation is available with image generation models supported in Dataiku LLM Mesh; this includes:
  1. OpenAI (DALL-E 3)

  2. Azure OpenAI (DALL-E 3)

  3. Google Vertex (Imagen 1 and Imagen 2)

  4. Stability AI (Stable Image Core, Stable Diffusion 3.0, Stable Diffusion 3.0 Turbo)

  5. Bedrock Titan Image Generator

  6. Bedrock Stable Diffusion XL 1

Configure the query builder prompt for image generation

Image generation begins with the main chat model creating an image generation query based on the user’s input and conversation history. You can include a prompt with guidelines and instructions on building this query. Only modify this if you fully understand the process.

Document Upload

You can upload multiple files of different types, enabling you to ask questions about each using the answers interface.

DocumentUploadUi

The two main methods that LLMs can use to understand the documents are:

  1. Viewing as an image (multimodal).

  2. Reading the extracted text (no images).

Note

Important Requirements:
  • Dataiku >= 13.0.2 is required for method 1 support of Anthropic models.

  • Dataiku >= 12.5.0 is required for method 1 support of all other supported models.

Method 1 is only available for multimodal LLMs such as OpenAI Vision or Gemini Pro, and can be used for image files or PDFs. Method 2 is supported on all LLMs and all file types that contain text. With both methods, take care not to exceed the context window of the LLM you are using. The following parameters will help you manage this.
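To reason about these limits, it helps to have a rough sense of how much of the context window extracted text consumes. A crude heuristic (an approximation only; real tokenizers vary by model) is about four characters per token for English text:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# A 60,000-character extracted document is on the order of 15,000 tokens.
print(rough_token_estimate("x" * 60_000))  # 15000
```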

Maximum upload file size in MB

Allows you to set the file size limit for each uploaded file. The default value is 15 MB; however, some service providers may have lower limits.

Maximum number of files that can be uploaded at once

This parameter controls the number of documents that the LLM can interact with simultaneously using both methods.

UploadFileLimits

Send PDF pages as images instead of extracting text

This parameter allows the LLM to view each page using Method 1. It is most useful when the pages contain visual information such as charts, images, tables, diagrams, etc. This will increase the quality of the answers that the LLM can provide but may lead to higher latency and cost.

Maximum number of PDF pages to send as images

This parameter sets the threshold number of pages to be sent as images; the default value is 5. For example, if 5 concurrent files are allowed and each has a maximum of 5 pages sent as images, then up to 25 images are sent to the LLM (5 files x 5 pages each = 25 images). If any document exceeds this threshold, the default behavior is to use text extraction alone for that document. Sending pages as images increases the cost of each query but can be necessary when asking questions about visual information.

DocsAsImages
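The worst-case image count is simply the product of the two limits, as in the 5 x 5 = 25 example above. The per-document fallback can be sketched like this (a hypothetical helper, not the plugin’s actual code):

```python
def pages_to_send(page_counts, max_pages_as_images=5):
    """For each uploaded PDF (given its page count), send its pages as images
    when the document fits under the threshold; otherwise fall back to text
    extraction, mirroring the default behavior described above."""
    return ["images" if n <= max_pages_as_images else "text" for n in page_counts]

# Three files of 3, 5, and 12 pages against the default 5-page threshold:
print(pages_to_send([3, 5, 12]))  # ['images', 'images', 'text']
```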

Retrieval Method

In this section, you can decide how you will augment the LLM’s current knowledge with your external sources of information.

No retrieval. LLM answer only

No external sources of information will be provided to the LLM (default value). Any settings related to the knowledge bank are preserved; to use them, select the knowledge bank retrieval mode.

Use knowledge bank retrieval (for searches within text)

The LLM will be provided with information taken from the Dataiku Knowledge Bank.

Use dataset retrieval (for specific answers from a table)

A SQL query will be crafted to provide information to the LLM.

RetrievalMethodSelection

Knowledge Bank Configuration

If you connect a Knowledge Bank to your Dataiku Answers, the following settings allow you to refine KB usage to optimize results.

Customize Knowledge Bank’s Name

This feature enables you to assign a specific name to the Knowledge Bank, which will be displayed to users within the web application whenever the Knowledge Bank is mentioned.

Use Knowledge Bank by default

With this setting, you can determine whether the Knowledge Bank should be enabled (‘Active’) or disabled (‘Not active’) by default.

KnowledgeBankSettings

Configure your LLM in the context of a Knowledge Bank

This functionality allows you to define a custom prompt that will be utilized when the Knowledge Bank is active.

ConfigureLLMWithKnowledgeBank

Configure your Retrieval System Prompt

You can provide a custom system prompt for a more advanced retrieval prompt configuration in a knowledge bank. To do so, you must enable the advanced settings option, as shown below.

ConfigureKnowledgeBankSystemPrompt

Let ‘Answers’ decide when to use the Knowledge Bank

Enabled by default, this option turns the smart usage of the knowledge bank on or off. If enabled, the LLM will decide when to use the knowledge bank based on its description and the user’s input; if disabled, the LLM will always use the knowledge bank when one is selected. We recommend keeping this option enabled for optimal results.

SmartKnowledgeBankUsageSetting

Knowledge bank description

Adding a description helps the LLM assess whether accessing the Knowledge Bank is relevant for adding the necessary context to answer the question accurately. If the knowledge is not required, it will not be used.

KnowledgeBankDescription

Number of Documents to Retrieve

Set how many documents the LLM should reference to generate responses.

Search type

You can choose among three prioritization techniques to determine which documents augment the LLM’s knowledge.

KnowledgeBankRetrievalSettings

Similarity score only

Provides the top n documents based on their similarity to the user question, ranked by similarity score alone.

Similarity score with threshold

Provides documents to the LLM only if they meet a predetermined similarity-score threshold in [0, 1]. Be aware that this can lead to all documents being excluded and none given to the LLM.
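The effect of the threshold can be illustrated as follows (a hypothetical helper with made-up document names and scores):

```python
def filter_by_threshold(scored_docs, threshold):
    """Keep only documents whose similarity score (in [0, 1]) meets the
    threshold, highest scores first. A strict threshold can return nothing."""
    kept = [(doc, score) for doc, score in scored_docs if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

docs = [("a.pdf", 0.92), ("b.pdf", 0.55), ("c.pdf", 0.78)]
print(filter_by_threshold(docs, 0.7))   # [('a.pdf', 0.92), ('c.pdf', 0.78)]
print(filter_by_threshold(docs, 0.95))  # [] -- every document excluded
```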

Improve Diversity of Documents

Enable this to have the LLM pull from a broader range of documents. Specify the ‘Diversity Selection Documents’ number and adjust the ‘Diversity Factor’ to manage the diversity of retrieved documents.

KnowledgeBankDiversitySettings
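A common technique for trading relevance against redundancy is maximal marginal relevance (MMR). Whether Answers uses exactly this algorithm is not specified here, but it illustrates how a diversity factor can reshape the retrieved set:

```python
def mmr_select(candidates, pairwise_sim, k, diversity_factor):
    """Simplified maximal marginal relevance selection (a conceptual sketch,
    not the plugin's implementation).
    candidates: list of (doc_id, relevance); pairwise_sim: {(i, j): similarity}."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = candidates[i][1]
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (pairwise_sim.get((i, j), pairwise_sim.get((j, i), 0.0)) for j in selected),
                default=0.0,
            )
            return (1 - diversity_factor) * relevance - diversity_factor * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i][0] for i in selected]

docs = [("a", 0.90), ("b", 0.85), ("c", 0.50)]
sims = {(0, 1): 0.95, (0, 2): 0.10, (1, 2): 0.10}
# "b" is nearly a duplicate of "a", so a diversity factor of 0.5 skips it:
print(mmr_select(docs, sims, k=2, diversity_factor=0.5))  # ['a', 'c']
```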

Filter logged sources

Enable this option to control the number of data chunks recorded in the logging dataset. Note that users can only access as many chunks as are logged.

FilterLoggedSources

Display source extracts

Display or hide source extracts to the end user when using a knowledge bank. This option is enabled by default. Disable to hide them.

Select metadata to include in the context

If selected, the chosen metadata will be added to the retrieved context along with the document chunks.

IncludeMetadataInContext

Enable LLM citations

The checkbox is available when you use a Knowledge Bank for RAG. Enabling this option allows you to get citations in the answers provided by the LLM during the text generation process. These citations will reference the IDs of the linked sources and quote the relevant part from these sources that allowed the text generation.

EnableLLMCitations

Filters and Metadata Parameters

All metadata stem from the configuration of the embed recipe that constructed the Knowledge Bank. Set filters, display options, and identify metadata for source URLs and titles.

  • Metadata Filters

    Choose which metadata tags can be used as filters.

  • Metadata Display:

    Select metadata to display alongside source materials.

  • URLs and Titles

    Determine which metadata fields should contain the URLs for source access and the titles for displayed sources.

    FiltersAndMetadataSettings

Dataset Retrieval Parameters

If you connect a Dataiku dataset to your Dataiku Answers, the following settings allow you to refine how this information is handled.

Note

It is strongly advised to use LLMs intended for code generation; LLMs whose primary focus is creative writing will perform poorly on this task.

Choose connection

Choose the SQL connection containing datasets you would like to use to enrich the LLM responses. You can choose from all the connections used in the current Dataiku Project.

Customize how the connection is displayed

This feature enables you to assign a specific, user-friendly name for the connection. This name is displayed to users within the web application whenever the dataset is mentioned.

DatasetConnectionSettings

Choose dataset(s)

Select the datasets you would like the web application to access. You can choose among all the datasets on the connection selected previously, which means that all the datasets must be on the same connection.

Define Column Mappings

Here you can choose to suggest column mappings that the LLM can decide to follow. For example, in the mapping below, the LLM may choose to create a JOIN like this: LEFT JOIN Orders o ON o.EmployeeID = e.EmployeeID

DefineColumnMappings
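Conceptually, each suggested mapping is just a join condition the LLM may adopt. A hypothetical helper (names mirror the example above, not plugin code) that renders one mapping:

```python
def build_left_join(right_table, right_alias, right_col, left_alias, left_col):
    """Render one suggested column mapping as a LEFT JOIN fragment."""
    return (f"LEFT JOIN {right_table} {right_alias} "
            f"ON {right_alias}.{right_col} = {left_alias}.{left_col}")

print(build_left_join("Orders", "o", "EmployeeID", "e", "EmployeeID"))
# LEFT JOIN Orders o ON o.EmployeeID = e.EmployeeID
```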

Add a description to the dataset and the columns so the retrieval works effectively. This can be done in the following way:

  • For the dataset

    Select the dataset, click the information icon in the right panel, and click edit. Add the description in either text box.

    Warning

    The LLM can only run effective queries if it knows about the data it is querying. You should provide as much detail as possible to clarify what is available.

    AddDatasetDescription

  • For the columns

    Explore the dataset, then click settings and schema. Add a description for each column.

    Warning

    The LLM will not be able to view the entire dataset before creating the query, so you must describe the contents of the column in detail. For example, if defining a categorical variable, then describe the possible values (“Pass,” “Fail,” “UNKNOWN”) and any acronyms (e.g., “US” is used for the United States).

    Warning

    Ensure that data types match the type of questions that you expect to ask the LLM. For example, a datetime column should not be stored as a string. Adding the column descriptions here means the descriptions are tied to the data. As a result, changes to the dataset could cause the LLM to provide inaccurate information.

    AddColumnDescriptions

Configure your LLM in the context of the dataset

This functionality allows you to define a custom prompt that will be utilized when the dataset retrieval is active.

ConfigureLLMWithDataset

Configure your Retrieval System Prompt

You can provide a custom system prompt for a more advanced configuration of the retrieval prompt in a dataset. To do so, you must enable the advanced settings option, as shown below.

ConfigureDatasetSystemPrompt

Hard limit on SQL queries

By default, all queries are limited to 100 rows to avoid excessive data retrieval. However, it may be necessary to adapt this to the type of data being queried.
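As a simplified sketch of how such a limit can be enforced on a generated query (illustrative only; real SQL dialects differ, e.g. SELECT TOP n on MS SQL Server):

```python
def enforce_row_limit(query: str, limit: int = 100) -> str:
    """Append a LIMIT clause unless the generated query already has one."""
    if "limit" in query.lower():
        return query
    return f"{query.rstrip().rstrip(';')} LIMIT {limit}"

print(enforce_row_limit("SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
# SELECT customer, SUM(amount) FROM orders GROUP BY customer LIMIT 100
```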

Display SQL in sources

Selecting this checkbox will add the SQL query to the source information displayed below the LLM’s answers.

DisplaySQLInSources

End User Interface Configuration

Adjust the web app to your business objectives and accelerate user value.

Titles and Headings

Set the title and subheading for clarity and context in the web app.

Placeholder Text

Enter a question prompt in the input field to guide users.

Example Questions

Provide example questions to illustrate the type of inquiries the chatbot can handle. You can add as many questions as you want.

ExampleQuestions

User Profile

This allows you to configure a list of settings, excluding language, that users can fill out within the web app. You must set up an SQL user profile dataset (mandatory even if no settings are configured).

  • The language setting will be available by default for all users, initially set to the web app’s chosen language.

  • The language selected by the user will determine the language in which the LLM responses are provided.

  • Once the user has configured their settings, these will be included in the LLM prompt to provide more personalized responses.

  • You can define the settings using a list, where each setting consists of a key (the name of the setting) and a description (a brief explanation of the setting).

  • All settings will be in the form of strings for the time being.
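How the settings reach the LLM is internal to the plugin, but the idea can be sketched as folding key/value pairs into a prompt preamble (the setting names here are hypothetical):

```python
def profile_to_prompt(settings):
    """Fold user-profile settings (key -> string value) into a prompt preamble,
    illustrating how configured settings could personalize responses."""
    lines = [f"- {key}: {value}" for key, value in settings.items()]
    return "User profile:\n" + "\n".join(lines)

print(profile_to_prompt({"Role": "Data analyst", "Region": "EMEA"}))
```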

Enable custom rebranding

If checked, the web app will apply your custom styling based on the theme name and different image files you specify in your setup. For more details, check the UI Rebranding capability section.

  • Theme name: The theme name you want to apply. CSS, images, and fonts will be fetched from the folder answers/YOUR_THEME.

  • Logo file name: The file name of the logo that you added to answers/YOUR_THEME/images/image_name.extension_name and want to use as the logo in the web app.

  • Icon file name: Same as for the logo file name.

WebApplication Configuration

Language

You can choose the default language for the web application from the available options (currently English and Korean; more options to come).

HTTP Headers

Define HTTP headers for the application’s HTTP responses to ensure compatibility and security.

HTTPHeadersConfiguration

UI Rebranding capability

You can rebrand the web app by applying a custom style without changing the code by following these steps:

  • Navigate to ᎒᎒᎒ > Global Shared Code > Static Web Resources, create a folder named answers, and within this folder, create a subfolder corresponding to the theme that the web application settings will reference. The structure should be as follows:

answers
   └── YOUR_THEME_NAME
       ├── custom.css
       ├── fonts
       │   └── fonts.css
       └── images
           ├── answer-icon.png
           └── logo.png
  • Example with fonts and images

    RebrandingExample

CSS changes

Add a custom.css file inside your theme folder (answers/YOUR_THEME_NAME); you can find an example below:

:root {
   /* Colors */
   --brand: #e8c280; /* Primary color for elements other than action buttons */
   --bg-examples-brand: rgba(255, 173, 9, 0.1); /* Examples background color (visible on landing page/new chat) */
   --bg-examples-brand-hover: rgba(255, 173, 9, 0.4); /* Examples background color on hover */
   --bg-examples-borders: #e8a323; /* Examples border color */
   --examples-question-marks: rgb(179, 124, 15); /* Color of question marks in the examples */
   --examples-text: #422a09; /* Color of the text in the examples */
   --text-brand: #57380c; /* Text color for the question card */
   --bg-query: rgba(245, 245, 245, 0.7); /* Background color for the question card */
   --bg-query-avatar: #F28C37; /* Background color for the question card avatar */
}

.logo-container .logo-img {
   height: 70%;
   width: 70%;
}

Fonts customization

  • First, create the fonts subfolder inside your theme folder.

  • Second, add fonts.css and define your font like below depending on the format you can provide (we support base64 or external URL):

    @font-face {
       font-family: "YourFontName";
       src: url(data:application/octet-stream;base64,your_font_base64);
    }
    
    @font-face {
       font-family: "YourFontName";
       src: url("yourFontPublicUrl") format("yourFontFormat");
    }
    
  • Finally, declare the font in your custom.css file:

    body,
    div {
       font-family: "YourFontName" !important;
    }
    

Images customization

Create an images folder where you can add a logo.* file to change the logo image on the landing page, and an answer-icon.* file to change the icon of the AI answer.

Examples of current customizations

CustomizationExample1

CustomizationExample2

Final Steps

After configuring the settings, thoroughly review them to ensure they match your operational requirements. Conduct tests to verify that the chat solution operates as intended, documenting any issues or FAQs that arise during this phase.

Mobile Compatibility

The web application is designed to be responsive and fully compatible with mobile devices. To target mobile users effectively, configure the application as a Dataiku public web application and distribute the link to the intended users.

Dataiku Answers User Guide

Introduction

Dataiku Answers provides a powerful interface for querying a Large Language Model (LLM) capable of serving a wide array of domains and specialties. Tailored to your needs, it can deliver insights and answers by leveraging a configured Knowledge Bank for context-driven responses or directly accessing the LLM’s extensive knowledge base.

The application supports multimodal queries if configured with compatible LLMs.

Home Page Functionality

  • Query Input: The home page is centered around the query input box. Enter your question here, and the system will either:

    • Perform a semantic search within an active Knowledge Bank to provide the LLM with contextual data related to your query, enhancing the relevance and precision of the answer. Remember that queries need to be as precise as possible to maximize the quality of answers; don’t hesitate to ask for query-writing guidelines if you need support.

    • Send your question directly to the LLM if no Knowledge Bank is configured or activated, relying on the model’s inbuilt knowledge to provide an answer.

Setting Context with Filters

Setting filters can provide a more efficient and relevant search experience in a knowledge base, maximizing the focus and relevance of the query. This is particularly relevant for knowledge bases with large or diverse content types. To do so:

Metadata Filter Configuration

If metadata filters have been enabled, select your criteria from the available options. These filters pre-define the context, enabling more efficient retrieval from the Knowledge Bank, resulting in answers more aligned with your specific domain or area of interest.

MetadataFilterConfiguration

Conducting Conversations

Engaging with the LLM

To start a conversation with the LLM

  • Set any desired filters first to establish the context for your query.

  • Enter your question in the query box.

  • Review the provided information from the contextual data retrieved by the Knowledge Bank or the LLM.

    Remember, when a Knowledge Bank is activated and configured with your filters, it will enrich the LLM’s response with specific context, making your results more targeted and relevant. If part of the configuration, Dataiku Answers will allow you to see all sources and metadata for each response item, maximizing trust and understanding. This will include:

    • A thumbnail image.

    • A link to the original source.

    • A title for context.

    • An excerpt from the Knowledge Bank.

    • A list of associated metadata tags as set in the settings.

    • Interact with the LLM to refine the answer, translate, summarize, and more.

Interaction with Filters and Metadata

  • Filters in Action

    If you’ve set filters before starting the conversation, they’ll be displayed alongside your question. This helps to preserve the context in the LLM’s response.

  • Filter Indicators

    A visual cue next to the ‘Settings’ icon indicates the presence and number of active filters, allowing you to keep track of the context parameters currently influencing the search results. FilterIndicators

Providing Feedback

We encourage users to contribute their experiences:

  • Feedback Button: Visible if general feedback collection is enabled; this feature allows you to express your thoughts on the plugin’s functionality and the quality of interactions. Feedback will be collected in a General Feedback Dataset and analyzed by your Answer set-up team. GeneralFeedbackButton

Conclusion

Dataiku Answers is designed to be user-centric, providing a seamless experience whether you’re seeking detailed responses with the help of a curated Knowledge Bank or Dataset or directly interfacing with the LLM. For additional support, please contact industry-solutions@dataiku.com.