Answers

Overview

Dataiku Answers is a packaged, scalable web application that enables enterprise-ready Large Language Model (LLM) chat and Retrieval Augmented Generation (RAG) to be deployed at scale across business processes and teams.

Homepage

Key Features

  • Simple and Scalable
    Connect Dataiku Answers to your choice of LLM, Knowledge Bank, or Dataset in a few clicks, and start sharing.
  • Customizable
    Set parameters and filters specific to your needs. Additionally, you can customize the visual web application.
  • Governed
    Monitor conversations and user feedback to control and optimize LLM impact in your organization.
  • Mobile-Responsive

    The visual web application is fully responsive, ensuring optimal usability on mobile devices. For seamless operation on mobile, access it directly via its link (for example, as a Dataiku public web application).

Whether you need to develop an Enterprise LLM Chat in minutes or deploy RAG at scale, Dataiku Answers is a powerful value accelerator with broad customization options to embed LLM chat usage fully across business processes.

Getting Access

The Dataiku Answers plugin is available on demand through the Dataiku plugin store. Once installed, it gives you access to a fully built Visual Webapp that can be used within your chosen Dataiku Projects. Versions of Dataiku Answers prior to 2.0.0 can be provided by your Dataiku counterparts (please contact your Dataiku Customer Success Manager or Sales Engineer).

VisualWebapp

Configuration

Introduction

This guide details the setup of Dataiku Answers, outlining the steps to configure conversation logging, document management, and interactive chat functionality using a Large Language Model (LLM).

Requirements

Dataiku version

  • Dataiku 12.4.1 or above is required for Dataiku Answers versions prior to 2.0.0. The minimum recommended Dataiku version is 12.5; the latest Dataiku version is always the best choice to fully leverage the latest plugin capabilities.

  • Dataiku 13.4.0 or above is required for Dataiku Answers 2.0.0 and above.

  • Available for both Dataiku Cloud and Self-Managed.

Infrastructure

  • Web Application Backend Settings:

    • The number of Processes must always be set to 0

    • Container: None - Use backend to execute

  • SQL Datasets: All datasets used by Dataiku Answers must be SQL datasets for compatibility with the plugin’s storage mechanisms.

    • PostgreSQL

    • Snowflake

    • Redshift

    • MS SQL Server

    • BigQuery

    • Databricks

  • Knowledge Bank Configuration: If a Knowledge Bank is used, the web application must run locally on Dataiku DSS. This does not affect scalability despite the shift from a containerized environment.

  • Streaming: The plugin seamlessly enables responses to be streamed when supported by the configured LLM, requiring only a DSS version of 12.5.0 or higher with no additional setup.

    Streaming currently works with the OpenAI GPT family, Bedrock Anthropic Claude, and Amazon Titan.

Mandatory Settings

MandatoryConfig

Conversation History Dataset

Create a new or select an existing SQL dataset for logging queries, responses, and associated metadata (LLM used, Knowledge Bank, feedback, filters, etc.).

User Profile Dataset

This allows you to configure a list of settings that users can customize within the web app. User language choice is included by default. You must set up an SQL user profile dataset (mandatory even if no settings are configured).

LLM

Connect each instance of Dataiku Answers to your choice of LLM, powered by Dataiku’s LLM Mesh. Select from the LLMs configured in Dataiku DSS Connections.

Other Settings

Conversations Store Configuration

ConversationStoreConfig

Dataiku Answers lets you store all conversations for oversight and usage analysis. Flexible options allow you to define the storage approach and mechanism.

Index the chat history dataset

Adds an index to the conversation history dataset to optimize plugin performance. Indexing is only beneficial for specific database types; consult your database documentation for more information, and only change this if you are certain it will improve performance.
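
For context, an index of this kind might resemble the following SQL, shown with hypothetical table and column names for illustration only (the plugin manages the actual index when this option is enabled):

-- Hypothetical table and column names, for illustration only
CREATE INDEX idx_answers_conversation_id
   ON conversation_history (conversation_id);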

Conversation Deletion

Toggle ‘Permanent Delete’ to permanently delete conversations or keep them marked as deleted, maintaining a recoverable archive.

Feedback Choices

Configure positive and negative feedback options, enabling end-users to interact and rate their experience.

Document Folder

Choose a folder to store user-uploaded documents and LLM-generated media.

Allow User Feedback

As you roll out chat applications in your organization, you can include a feedback option to better understand user sentiment, enablement needs, and potential enhancements.

General Feedback Dataset

In addition to conversation-specific feedback, configure a dataset to capture general feedback from users. This dataset can provide valuable insights into the overall user experience with the plugin.

LLM Configuration

LlmConfig

Maximum Number of LLM Output Tokens

Set the maximum number of output tokens that the LLM can generate for each query. To set this value correctly, consult the documentation of your LLM provider. Setting the value too low can mean that answers are cut off, while setting it too high can lead to increased costs or errors.

Configure your LLM when no knowledge bank or table retrieval is required

Tailor the prompt that will guide the behavior of the underlying LLM. For example, if the LLM is to function as a life sciences analyst, the prompt could instruct it not to use external knowledge and to structure the responses in a clear and chronological order, with bullet points for clarity where possible. This prompt is only used when no retrieval is performed.

Advanced Prompt Setting

Configure your Conversation system prompt

For more advanced configuration of the LLM prompt, you can provide a custom system prompt or override the prompt in charge of guiding the LLM when generating code. You need to enable the advanced settings option as shown below.

Force Streaming Mode

When enabled, the selected model is treated as capable of streaming. This is particularly useful when working with custom models whose capabilities Dataiku Answers cannot automatically detect. Enabling this setting on a model that does not support streaming will result in errors.

Force Multi Modal Mode

When enabled, the selected model is treated as able to accept multi-modal queries. This is particularly useful when working with custom models whose capabilities Dataiku Answers cannot automatically detect. Enabling this setting on a model that does not support multi-modal queries will result in errors.

LLM For Title Generation

Set an alternative LLM to generate the title for each conversation. Leaving it as None defaults to the main LLM. As this task is less demanding, you can use a smaller model to generate titles.

LLM For Decisions Generation

Set an alternative LLM to generate decision objects. As this task is better suited to models that are good at generating structured data, you can choose a model specialized for the task. Leaving it as None defaults to the main LLM.

Note

The task of generating SQL queries is among the most demanding tasks for an LLM. It is recommended to use a higher-performance model for decisions generation when performing dataset retrieval.

Enable Image Generation for Users

This checkbox allows you to activate the image generation feature for users. Once enabled, additional settings will become available.

Note

Important Requirements:
  • An upload folder is necessary for this feature to function, as generated images will be stored there.

  • This feature works only with DSS version >= 13.0.0

Users can adjust the following settings through the UI:
  • Image Height

  • Image Width

  • Image Quality

  • Number of Images to Generate

The user settings will be passed to the image generation model. If the selected model does not support certain settings, image generation will fail. Any error messages generated by the model will be forwarded to the user in English, as we do not translate the model’s responses.

Image Generation LLM

The language model used for image generation. This is mandatory when the image generation feature is enabled.

Note

Image generation is available with image generation models supported in Dataiku LLM Mesh; this includes:
  1. OpenAI (DALL-E 3)

  2. Azure OpenAI (DALL-E 3)

  3. Google Vertex (Imagen 1 and Imagen 2)

  4. Stability AI (Stable Image Core, Stable Diffusion 3.0, Stable Diffusion 3.0 Turbo)

  5. Bedrock Titan Image Generator

  6. Bedrock Stable Diffusion XL 1

Configure the Query Builder Prompt for Image Generation

Image generation begins with the main chat model creating an image generation query based on the user’s input and history. You can include a prompt with guidelines and instructions for building this query. Only modify this if you fully understand the process.

Weekly Image Generation Limit Per User

Set the number of images that each user can generate per week.

Document Upload

You can upload multiple files of different types, enabling you to ask questions about each one through the Answers interface.

DocumentUploadUi

The two main methods that LLMs can use to understand the documents are:

  1. Viewing as an image (multi-modal).

  2. Reading the extracted text (no images).

Note

Important Requirements:
  • Dataiku >= 13.0.2 is required for method 1 support of Anthropic models.

  • Dataiku >= 12.5.0 is required for method 1 support of all other supported models.

Method 1 is only available for multi-modal LLMs such as OpenAI Vision or Gemini Pro. It can be used for image files or PDFs. Method 2 is supported on all LLMs and files containing plain text. With both methods, take care to avoid exceeding the context window of the LLM you are using. The following parameters will help you manage this.

DocUploadConfig

Maximum upload file size in MB

Allows you to set the file size limit for each uploaded file. The default value is 15 MB; however, some service providers may have lower limits.

Maximum number of files that can be uploaded at once

This parameter controls the number of documents that the LLM can interact with simultaneously using both methods.

Send PDF pages as images instead of extracting text

This parameter allows the LLM to view each page using Method 1. It is most useful when the pages contain visual information such as charts, images, tables, diagrams, etc. This will increase the quality of the answers that the LLM can provide but may lead to higher latency and cost.

Maximum number of PDF pages to send as images

This parameter sets the threshold number of pages to be sent as images. The default value is 5. For example, if 5 concurrent files are allowed and each has a maximum of 5 pages sent as images, then 25 images are sent to the LLM (5 files x 5 pages each = 25 images). If any document exceeds this threshold, the default behavior is to use text extraction alone for that document. Understandably, this increases the cost of each query but can be necessary when asking questions about visual information.

Retrieval Method

RetrievalMethodSelection

In this section, you can decide how you will augment the LLM’s current knowledge with your external sources of information.

No Retrieval. LLM Answer Only: No external sources of information will be provided to the LLM. (Default value).

Use Knowledge Bank Retrieval (for searches within text): The LLM will be provided with information taken from the Dataiku Knowledge Bank.

Use Dataset Retrieval (for specific answers from a table): A SQL query will be crafted to provide information to the LLM.

Knowledge Bank Configuration

If you connect a Knowledge Bank to your Dataiku Answers, the following settings allow you to refine its usage to optimize results. Currently, Dataiku Answers supports the use of:

  • Pinecone

  • ChromaDB

  • Qdrant

  • Azure AI search

  • ElasticSearch

Using FAISS with Dataiku Answers is no longer recommended but is still supported.

KBRetrievalConfig

Customize Knowledge Bank’s Name

This feature enables you to assign a specific name to the Knowledge Bank, which will be displayed to users within the web application whenever the Knowledge Bank is mentioned.

Activate the Knowledge Bank By Default

With this setting, you can determine whether the Knowledge Bank should be enabled (‘Active’) or disabled (‘Not active’) by default.

Let ‘Answers’ Decide When to Use the Knowledge Bank

Enabled by default, this option allows you to turn the smart use of the Knowledge Bank on or off. If enabled, the LLM will decide when to use the Knowledge Bank based on its description and the user’s input. If disabled, the LLM will always use the Knowledge Bank when one is selected. We recommend keeping this option enabled for optimal results.

Knowledge Bank Description

Adding a description helps the LLM assess whether accessing the Knowledge Bank is relevant for adding the necessary context to answer the question accurately. For example, when Let ‘Answers’ decide when to use the Knowledge Bank is enabled and the Knowledge Bank is not required for a query, it will not be used. The LLM also uses the description when crafting a retrieval query.

Configure your LLM in the Context of a Knowledge Bank

This functionality allows you to define a custom prompt that will be utilized when the Knowledge Bank is active.

Configure your Retrieval System Prompt

You can provide a custom system prompt for a more advanced retrieval prompt configuration in a knowledge bank. To do so, you must enable the advanced settings option, as shown below.

Number of Documents to Retrieve

Set how many documents the LLM should reference to generate responses. The value is a maximum but can be less if other settings (e.g. a similarity threshold) reduce the final number of returned documents.

Search Type

You can choose between one of three prioritization techniques to determine which documents augment the LLM’s knowledge.

Note

Incorrectly setting these values can lead to suboptimal results or no results being returned.

  • Similarity score only: provides the top n documents ranked by their similarity score to the user question.

  • Similarity score with threshold: only provides documents to the LLM if their similarity score meets a predetermined threshold in [0, 1]. Be aware that this can lead to all documents being excluded and none given to the LLM (see the sketch after this list).

  • Improve Diversity of Documents: enable this to have the LLM pull from a broader range of documents. Specify the ‘Diversity Selection Documents’ number and adjust the ‘Diversity Factor’ to manage the diversity of retrieved documents.
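
To make the first two modes concrete, below is a minimal Python sketch (illustrative only, not plugin internals) showing how top-n selection and threshold filtering differ over (document, score) pairs:

# Illustrative sketch: not plugin internals.
docs = [("doc_a", 0.92), ("doc_b", 0.78), ("doc_c", 0.41)]

# Similarity score only: keep the top n documents by score.
top_n = sorted(docs, key=lambda d: d[1], reverse=True)[:2]

# Similarity score with threshold: keep only documents scoring at or
# above the threshold; a high threshold may exclude every document.
threshold = 0.8
above_threshold = [d for d in docs if d[1] >= threshold]

print(top_n)             # [('doc_a', 0.92), ('doc_b', 0.78)]
print(above_threshold)   # [('doc_a', 0.92)]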

Filter Logged Sources

Enable this option to control the number of data chunks recorded in the logging dataset. It is important to note that users can access only as many chunks as are logged.

Display Source Extracts

Display or hide source extracts shown to the end user when using a Knowledge Bank. This option is enabled by default; disable it to hide extracts.

Select Metadata to Include in the Context

When enabled, the selected metadata will be added to the retrieved context along with the document chunks.

Enable LLM citations

The checkbox is available when you use a Knowledge Bank for RAG. Enabling this option allows you to get citations in the answers provided by the LLM during the text generation process. These citations reference the IDs of the linked sources and quote the relevant passages from those sources that supported the generated text.

Filters and Metadata Parameters

All metadata stems from the configuration of the embed recipe that constructed the Knowledge Bank. Set filters and display options, and identify which metadata fields hold source URLs and titles.

FilterAndMetadataConfig

Metadata Filters: Choose which metadata tags can be used as filters. This feature allows you to run the vector query on a subset of the documents in the Knowledge Bank, meaning conditional logic and vector search can be combined in a single query.

  • Development of this feature is ongoing, so it is currently only available for the following vector databases:

    • ChromaDB, Qdrant, and Pinecone.

    • Metadata filtering is not supported with FAISS.

  • Auto filtering: Enabling LLM auto filtering means the LLM will decide whether a query would benefit from narrowing the document corpus with a conditional-logic filter alongside the regular vector query. If enabled, the LLM crafts the query that creates this filter.

Metadata Display: Select metadata to display alongside source materials.

URLs and Titles: Determine which metadata fields should contain the URLs for source access and the titles for displayed sources.

Dataset Retrieval Parameters

DBRetrievalConfig

If you connect a Dataiku dataset to your Dataiku Answers, the following settings allow you to refine how this information is handled.

Note

It is strongly advised to use LLMs intended for code generation. LLMs whose primary focus is creative writing will perform poorly on this task. The specific LLM used for query generation can be specified in the LLM For Decisions Generation setting.

Choose Connection

Choose the SQL connection containing the datasets you would like to use to enrich the LLM responses. You can choose from all the connections used in the current Dataiku Project, but only one connection per Dataiku Answers web application.

Customize How the Connection is Displayed

This feature enables you to assign a specific, user-friendly name for the connection. This name is displayed to users within the web application whenever the dataset is mentioned.

Choose Dataset(s)

Select the datasets you would like the web application to access. You can choose among all the datasets from the connection you have selected previously. This means that all the datasets must be on the same connection.

Define Column Mappings

Here you can choose to suggest column mappings that the LLM can decide to follow. For example, in the mapping below, the LLM may choose to create a JOIN like this: LEFT JOIN Orders o ON o.EmployeeID = e.EmployeeID

DefineColumnMappings

Add a description to the dataset and the columns so the retrieval works effectively. This can be done in the following way:

For the dataset

Select the dataset, click the information icon in the right panel, and click edit. Add the description in either text box.

Warning

The LLM can only generate effective queries if it knows about the data it is querying. You should provide as much detail as possible to clarify what is available.

AddDatasetDescription

For the columns

Explore the dataset, then click settings and schema. Add a description for each column.

Warning

The LLM will not be able to view the entire dataset before creating the query, so you must describe the contents of the column in detail. For example, if defining a categorical variable, then describe the possible values (“Pass,” “Fail,” “UNKNOWN”) and any acronyms (e.g., “US” is used for the United States).

Warning

Ensure that data types match the type of questions that you expect to ask the LLM. For example, a datetime column should not be stored as a string. Adding the column descriptions here means the descriptions are tied to the data. As a result, changes to the dataset could cause the LLM to provide inaccurate information.

AddColumnDescriptions

Configure your LLM in the context of the dataset

This functionality allows you to define a custom prompt that will be utilized when the dataset retrieval is active.

Warning

This prompt is not used by the LLM that creates the SQL, so it is important not to make SQL suggestions here, as this will only lead to confusion. Instead, use the Questions and their Expected SQL Queries section to add examples, or give clear descriptions of how to handle the data in the column and dataset descriptions.

ConfigureLLMWithDataset

Configure your Retrieval System Prompt

You can provide a custom system prompt for a more advanced configuration of the retrieval prompt in a dataset. To do so, you must enable the advanced settings option, as shown below.

Questions and their Expected SQL Queries

When using dataset retrieval, you can provide examples of questions and their expected SQL queries. This will help the LLM understand how to interact with the dataset. The LLM will use these examples to generate SQL queries when the user asks questions about the dataset. This is particularly useful if there is a specific way of querying the dataset that the LLM should follow: for example, a common way of handling dates, a specific way of joining tables, or typical CTEs (common table expressions).

-- Key: question: 'What is the rolling sum of products sold on Mondays?'
-- Value: answer:
WITH parsed_sales AS (
   SELECT
      TO_DATE(sale_date, 'YYYYMMDD') AS sale_date_parsed,
      product_sold
   FROM sales
),
mondays_sales AS (
   SELECT
      sale_date_parsed,
      product_sold
   FROM parsed_sales
   WHERE EXTRACT(DOW FROM sale_date_parsed) = 1  -- 1 = Monday
)
SELECT
   sale_date_parsed,
   product_sold,
   SUM(product_sold) OVER (
      ORDER BY sale_date_parsed
      ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
   ) AS rolling_sum
FROM mondays_sales
ORDER BY sale_date_parsed;
Hard Limit on SQL Queries

By default, all queries are limited to 100 rows to avoid excessive data retrieval. However, you may need to adapt this limit to the type of data being queried.
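
Conceptually, the cap behaves as if a row limit were appended to every generated query; a hypothetical example:

SELECT sale_date, product_sold
FROM sales
ORDER BY sale_date
LIMIT 100;  -- default hard limit; adjust to the data being queried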

Display SQL in Sources

Selecting this checkbox will add the SQL query to the source information displayed below the LLM’s answers.

Answers API Configuration

The Dataiku Answers API allows instances of Dataiku Agent Connect and other sources to make requests to a Dataiku Answers web application without using the Dataiku Answers UI. More information about how to make requests to the Dataiku Answers API can be found in the Dataiku Answers API section of this documentation.

AnswersApiConfig

Messages History Dataset

Conversations consist of a series of messages; each message is a user query and the response from the LLM. The messages history dataset is used to store these messages. As with all Dataiku Answers datasets, this dataset must be an SQL dataset.

Description

This text description is used by the Dataiku Agent Connect LLM when deciding whether to use this instance of Dataiku Answers to answer a user query in Agent Connect. The description should be a brief summary of the purpose of the Dataiku Answers instance and its capabilities. More details and examples of how to use Agent Connect can be found in the documentation.

End User Interface Configuration

Adjust the web app to your business objectives and accelerate user value.

EndUserConfig

Titles and Headings

Set the title and subheading for clarity and context in the web app.

Displayed Placeholder Text in the ‘Question Input’ Field

Enter a question prompt in the input field to guide users.

Example Questions

Provide example questions to illustrate the type of inquiries the chatbot can handle. You can add as many questions as you want.

Enable Custom Rebranding

If checked, the web app will apply your custom styling based on the theme name and different image files you specify in your setup. For more details, check the UI Rebranding capability section.

  • Theme name: The theme name you want to apply. CSS, images, and fonts will be fetched from the folder answers/YOUR_THEME.

  • Logo file name: The file name of the logo you added to answers/YOUR_THEME/images/ (as image_name.extension_name) that you want to use as the logo in the web app.

  • Icon file name: Same as for the logo file name.

User Profile Settings

UserProfileConfig

User Profile Languages

  • The language setting will be available by default for all users, initially set to the web app’s chosen language.

  • The language selected by the user will determine the language in which the LLM responses are provided.

  • You can define the settings using a list, where each setting consists of a key (the name of the setting) and a description (a brief explanation of the setting).
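
For illustration, a settings list might look like the following (hypothetical keys and descriptions):

Key: tone        Description: Preferred tone for LLM responses (e.g., formal, casual)
Key: expertise   Description: The user's level of domain expertise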

User Profile Settings

  • Once the user has configured their settings, these will be included in the LLM prompt to provide more personalized responses.

  • All settings will be in the form of strings for the time being.

Add Profile Information to LLM Context

User profile information can be included in the query that the LLM receives, allowing the LLM to provide more personalized responses based on the user’s settings.

WebApplication Configuration

WebApplicationConfig

Language

You can choose the default language for the web application from the available options (English, French, Spanish, German, Japanese and Korean).

HTTP Headers

Define HTTP headers for the application’s HTTP responses to ensure compatibility and security.
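
For example, you might define security-related headers such as the following (illustrative values; choose headers appropriate to your deployment):

X-Frame-Options: SAMEORIGIN
Content-Security-Policy: frame-ancestors 'self'
Cache-Control: no-store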

UI Rebranding capability

You can rebrand the web app by applying a custom style without changing the code by following these steps:

  • Navigate to ᎒᎒᎒ > Global Shared Code > Static Web Resources, create a folder named answers, and within this folder, create a subfolder corresponding to the theme that the web application settings will reference. The structure should be as follows:

answers
   └── YOUR_THEME_NAME
       ├── custom.css
       ├── fonts
          └── fonts.css
       └── images
           ├── answer-icon.png
           └── logo.png

CSS Changes

Add a custom.css file inside your theme folder (answers/YOUR_THEME_NAME); you can find an example below:

:root {
   /* Colors */
   --brand: #e8c280; /* Primary color for elements other than action buttons */
   --bg-examples-brand: rgba(255, 173, 9, 0.1); /* Examples background color (visible on landing page/new chat) */
   --bg-examples-brand-hover: rgba(255, 173, 9, 0.4); /* Examples background color on hover */
   --bg-examples-borders: #e8a323; /* Examples border color */
   --examples-question-marks: rgb(179, 124, 15); /* Color of question marks in the examples */
   --examples-text: #422a09; /* Color of the text in the examples */
   --text-brand: #57380c; /* Text color for the question card */
   --bg-query: rgba(245, 245, 245, 0.7); /* Background color for the question card */
   --bg-query-avatar: #F28C37; /* Background color for the question card avatar */
}

.logo-container .logo-img {
   height: 70%;
   width: 70%;
}

Fonts Customization

  • First, create the fonts subfolder inside your theme folder.

  • Second, add fonts.css and define your font as below, depending on the format you can provide (base64 and external URLs are supported):

    @font-face {
       font-family: "YourFontName";
       src: url(data:application/octet-stream;base64,your_font_base64);
    }
    
    @font-face {
       font-family: "YourFontName";
       src: url("yourFontPublicUrl") format("yourFontFormat");
    }
    
  • Finally, declare the font in your custom.css file:

    body,
    div {
       font-family: "YourFontName" !important;
    }
    

Images customization

Create an images folder where you can add a logo.* file to change the logo image on the landing page, and an answer-icon.* file to change the icon of the AI answer.

Examples of Current Customizations

CustomizationExample1

CustomizationExample2

Final Steps

After configuring the settings, thoroughly review them to ensure they match your operational requirements. Conduct tests to verify that the chat solution operates as intended, documenting any issues or FAQs that arise during this phase.

Mobile Compatibility

The web application is designed to be responsive and fully compatible with mobile devices. To target mobile users effectively, configure the application as a Dataiku public web application and distribute the link to the intended users.

Dataiku Answers API

The Dataiku Answers API allows you to query a Dataiku Answers web app without using the Dataiku Answers UI. There are several important considerations when enabling the use of the Dataiku Answers API.

Warning

It is highly recommended that authentication be required for everyone accessing Dataiku Answers.

  • Require authentication: To ensure that your Dataiku Answers webapp cannot be accessed without being logged into Dataiku, make sure Require authentication is checked (instance administrators may override this setting and require authentication for all webapps).

RequireAuth

  • Public Webapp: The Dataiku instance admin can manage public access to specific web apps via:

Administration > Settings > (Security & Audit) Other security settings > (Webapps) Authenticated webapps

EnableApi

Call Paths

All Dataiku Answers call paths start with /api/. They can be accessed via the following URL pattern:

"<INSTANCE_URI>/public-webapps/<PROJECT_KEY>/<WEBAPP_ID>/api/<ENDPOINT>"

For example, if your Dataiku server is available at http://mymachine:10000/ and you want to make a call to the webapp’s ask endpoint with the project key MY_PROJECT and the webapp ID ABCDE, you make requests to http://mymachine:10000/public-webapps/MY_PROJECT/ABCDE/api/ask

Endpoints

POST /api/ask

Processes a user query and returns an AI-generated response.

Headers

Content-Type Header: used to specify the media type of the request body.

Accept Header: used to specify the media type of the response body. Can be either text/event-stream or application/json.

  • application/json: (default) Returns a JSON response.

  • text/event-stream: Enables a streaming response using Server-Sent Events (SSE).
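
When streaming is enabled, the response arrives as a sequence of SSE events. The shape below is inferred from the parsing example later in this section; exact payload fields may vary:

event: completion-chunk
data: {"text": "The weather"}

event: completion-chunk
data: {"text": " looks sunny today."}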

Request Body (JSON)

Field              Type     Required   Description
user               string   Yes        Identifier for the user sending the query.
query              string   Yes        The user's query.
context            object   No         Additional context related to the conversation.
conversationId     string   No         Identifier for the ongoing conversation. Can be null.
selectedRetrieval  object   No         Object to specify the retrieval parameters.
files              array    No         List of uploaded files to use with multi-modal queries.
chatSettings       object   No         Additional chat parameters which can be used to specify the preferred query response.
userPreferences    object   No         User preferences to be used in the query response.
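
For illustration, a minimal request body using only the documented fields (the nested objects are shown empty because their exact schemas depend on your configuration):

{
   "user": "<DATAIKU_USER_ID>",
   "query": "What is the weather like today?",
   "conversationId": null,
   "context": {},
   "chatSettings": {},
   "userPreferences": {}
}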

Response Body (JSON)

Field   Type     Required   Description
status  string   Yes        API response status (ok or ko).
data    object   Yes        Contains the query response object (see below).

The query response object:

Field             Type      Required   Description
id                string    Yes        Unique identifier for the response.
messageIndex      integer   Yes        Index of the message in the conversation.
answer            string    Yes        The generated response.
query             string    Yes        The original user query.
timestamp         float     Yes        Unix timestamp of the response.
context           object    No         Additional information regarding the source of the request.
namedTraces       array     No         List of named LLM step traces following the standard LLM trace format.
usedRetrieval     object    No         Information regarding the retrieval parameters that were used in the query.
conversationInfo  object    No         Conversation metadata.
generatedMedia    array     No         List of generated media.
files             array     No         List of files associated with the response.
llmContext        object    No         Additional context from the LLM.
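
For illustration, a response containing the required fields with placeholder values (optional fields omitted):

{
   "status": "ok",
   "data": {
      "id": "<RESPONSE_ID>",
      "messageIndex": 0,
      "answer": "<GENERATED_ANSWER>",
      "query": "What is the weather like today?",
      "timestamp": 1718000000.0
   }
}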

Examples

New Independent Query

A basic one-off query with no context, history, or media, and without streaming.

import requests

url = "<INSTANCE_URI>/public-webapps/<PROJECT_KEY>/<WEBAPP_ID>/api/ask"
payload = {
   "user": "<DATAIKU_USER_ID>",
   "query": "What is the weather like today?",
}
headers = {
   "Content-Type": "application/json",
   "Accept": "application/json",
}
try:
   response = requests.post(url, json=payload, headers=headers)
   if response.status_code == 200:
      response_data = response.json()["data"]
      print(response_data["answer"])  # the generated answer text
   else:
      print("Error:", response.status_code, response.text)
except requests.exceptions.RequestException as e:
   print(f"Request failed: {e}")

Independent Query Using Streaming

As above but with streaming enabled.

import requests
import re
import json

url = "<INSTANCE_URI>/public-webapps/<PROJECT_KEY>/<WEBAPP_ID>/api/ask"
payload = {
   "user": "<DATAIKU_USER_ID>",
   "query": "What is the weather like today?",
}
headers = {
   "Content-Type": "application/json",
   "Accept": "text/event-stream",
}
final_message = b""
try:
   # stream=True lets the Server-Sent Events be consumed as they arrive
   with requests.post(url, json=payload, headers=headers, stream=True) as response:
      for chunk in response.iter_content(chunk_size=None):
         final_message += chunk
   # Each completion-chunk event carries a JSON payload with a "text" field
   matches = re.findall(
      r'event: completion-chunk\s+data: (\{[^{}]*"text"[^{}]*\})',
      final_message.decode(),
      re.DOTALL,
   )
   extracted_text = "".join(json.loads(match).get("text", "") for match in matches)
   print(extracted_text)
except json.JSONDecodeError as e:
   print(f"Error decoding JSON: {e}")
except requests.exceptions.RequestException as e:
   print(f"Request failed: {e}")

Dataiku Answers User Guide

Introduction

Dataiku Answers provides a powerful interface for querying a Large Language Model (LLM) capable of serving a wide array of domains and specialties. Tailored to your needs, it can deliver insights and answers by leveraging a configured Knowledge Bank for context-driven responses or directly accessing the LLM’s extensive knowledge base.

The application supports multi-modal queries if configured with compatible LLMs.

Home Page Functionality

  • Query Input: The home page is centered around the query input box. Enter your question here, and the system will either:

    • Perform a semantic search within an active Knowledge Bank to provide the LLM with contextual data related to your query, enhancing the relevance and precision of the answer. Remember that queries need to be as precise as possible to maximize the quality of answers. Don’t hesitate to ask for query-writing guidelines if you need support.

    • Send your question directly to the LLM if no Knowledge Bank is configured or activated, relying on the model’s inbuilt knowledge to provide an answer.

Setting Context with Filters

Setting filters can provide a more efficient and relevant search experience in a knowledge base, maximizing the focus and relevance of the query. This is particularly relevant for knowledge bases with large or diverse content types. To do so:

Metadata Filter Configuration

If metadata filters have been enabled, select your criteria from the available options. These filters pre-define the context, enabling more efficient retrieval from the Knowledge Bank, resulting in answers more aligned with your specific domain or area of interest. Currently metadata filters are only available for ChromaDB, Qdrant and Pinecone.

MetadataFilterConfiguration

Conducting Conversations

Engaging with the LLM

To start a conversation with the LLM:

  • Set any desired filters first to establish the context for your query.

  • Enter your question in the query box.

  • Review the provided information from the contextual data retrieved by the Knowledge Bank or the LLM.

    Remember, when a Knowledge Bank is activated and configured with your filters, it will enrich the LLM’s response with specific context, making your results more targeted and relevant. If configured, Dataiku Answers will allow you to see all sources and metadata for each response item, maximizing trust and understanding. This will include:

    • A thumbnail image.

    • A link to the original source.

    • A title for context.

    • An excerpt from the Knowledge Bank.

    • A list of associated metadata tags as set in the settings.

    You can then interact with the LLM to refine the answer, translate, summarize, and more.

Interaction with Filters and Metadata

  • Filters in Action

    If you’ve set filters before starting the conversation, they’ll be displayed alongside your question. This helps to preserve the context in the LLM’s response.

  • Filter Indicators

    A visual cue next to the ‘Settings’ icon indicates the presence and number of active filters, allowing you to keep track of the context parameters currently influencing the search results.

    FilterIndicators

Providing Feedback

We encourage users to contribute their experiences:

  • Feedback Button: Visible if general feedback collection is enabled; this feature allows you to express your thoughts on the plugin’s functionality and the quality of interactions. Feedback will be collected in a General Feedback Dataset and analyzed by your Answers setup team.

    GeneralFeedbackButton

Conclusion

Dataiku Answers is designed to be user-centric, providing a seamless experience whether you’re seeking detailed responses with the help of a curated Knowledge Bank or Dataset or directly interfacing with the LLM. For additional support, please contact industry-solutions@dataiku.com.