Dataiku Answers¶
Overview¶
Dataiku Answers is a packaged, scalable web application that enables enterprise-ready Large Language Model (LLM) chat and Retrieval Augmented Generation (RAG) to be deployed at scale across business processes and teams.
Key Features
- Simple and Scalable: Connect Dataiku Answers to your choice of LLM, Knowledge Bank, or Dataset in a few clicks, and start sharing.
- Customizable: Set parameters and filters specific to your needs. Additionally, you can customize the visual web application.
- Governed: Monitor conversations and user feedback to control and optimize LLM impact in your organization.
Mobile-Responsive
The visual web application is fully responsive, ensuring optimal usability on mobile devices. For seamless operation, it must be accessed directly.
Whether you need to develop an Enterprise LLM Chat in minutes or deploy RAG at scale, Dataiku Answers is a powerful value accelerator with broad customization options to embed LLM chat usage fully across business processes.
Getting Access¶
The Dataiku Answers plugin is available on demand through your Dataiku counterparts (please contact your Dataiku Customer Success Manager or Sales Engineer). It is delivered as a zip file; once installed, it gives you access to a fully built Visual Webapp that can be used within your choice of Dataiku Projects:
Configuration¶
Introduction¶
This guide details the setup of Dataiku Answers, outlining the steps to configure conversation logging, document management, and interactive chat functionalities using a Large Language Model (LLM).
Requirements¶
Dataiku version¶
Dataiku 12.4.1 and above. The minimum recommended Dataiku version is 12.5. The latest Dataiku version is always the best choice to fully leverage the latest plugin capabilities.
Available for both Dataiku Cloud and Self-Managed.
Python version¶
Dataiku Answers is compatible with Python versions between 3.8 and 3.11 under the following requirements:
General dependencies¶
Flask==3.0.1
Flask-Cors==4.0.0
flask-socketio==5.3.6
langchain
langchain-community
pydantic<2
chromadb<0.5.4
pysqlite3-binary; platform_system == "Linux"
faiss-cpu
pinecone-client>3,<=4.1
qdrant_client
protobuf==3.20.*
lingua-language-detector==2.0.2
Dependencies for specific Python versions
protobuf==3.20.*; python_version < '3.11'
grpcio-tools==1.49.0; python_version >= '3.11'
protobuf==4.21.3; python_version >= '3.11'
Infrastructure¶
SQL Datasets: Logging and feedback datasets must be SQL datasets for compatibility with the plugin’s storage mechanisms.
PostgreSQL
Snowflake
Redshift
MS SQL Server
BigQuery
Databricks
Knowledge Bank Configuration: If a Knowledge Bank is used, the web application must run locally on Dataiku DSS, which does not affect scalability despite the shift from a containerized environment.
Streaming: The plugin seamlessly enables answers to be streamed when supported by the configured LLM, requiring only a DSS version of 12.5.0 or higher with no additional setup.
Streaming currently works with the OpenAI GPT family, Bedrock Anthropic Claude, and Amazon Titan.
Conversations Store Configuration¶
Dataiku Answers allows you to store all conversations for oversight and usage analysis. Flexible options allow you to define storage approach and mechanism.
Conversation History Dataset¶
Create a new or select an existing SQL dataset for logging queries, responses, and associated metadata (LLM used, Knowledge Bank, feedback, filters, etc.).
Index the chat history dataset¶
Add an index to the conversation history dataset to optimize the performance of the plugin. Indexing is only beneficial for specific database types; consult the database documentation for more information, and only change this if you are certain it will improve performance.
Conversation Deletion¶
Toggle ‘Permanent Delete’ to permanently delete conversations or keep them marked as deleted, maintaining a recoverable archive.
Feedback Choices¶
Configure positive and negative feedback options, enabling end-users to interact and rate their experience.
Document Folder¶
Choose a folder to store user-uploaded documents and LLM-generated images.
Overall Feedback collection feature¶
As you roll out chat applications in your organization, you can include a feedback option to improve understanding of feedback, enablement needs, and enhancements.
General Feedback Dataset¶
In addition to conversation-specific feedback, configure a dataset to capture general feedback from users. This dataset can provide valuable insights into the overall user experience with the plugin.
LLM configuration¶
Connect each instance of Dataiku Answers to your choice of LLM, powered by Dataiku’s LLM Mesh
LLM Selection¶
Select from the LLMs configured in Dataiku DSS Connections.
Maximum number of LLM output tokens¶
Set the maximum number of output tokens that the LLM can generate for each query. To set this value correctly, consult the documentation of your LLM provider. Setting the value too low can mean that answers are cut off, while setting it too high can lead to increased costs.
Configure your LLM when no knowledge bank or table retrieval is required¶
Tailor the prompt that will guide the behavior of the underlying LLM. For example, if the LLM is to function as a life sciences analyst, the prompt could instruct it not to use external knowledge and to structure the responses in a clear and chronological order, with bullet points for clarity where possible. This prompt is only used when no retrieval is performed.
Advanced Prompt Setting¶
Configure your Conversation system prompt¶
For more advanced configuration of the LLM prompt, you can provide a custom system prompt or override the prompt in charge of guiding the LLM when generating code. You need to enable the advanced settings option as shown below.
Enable Image generation for users¶
This checkbox allows you to activate the image generation feature for users. Once enabled, additional settings will become available.
Note
- Important Requirements:
An upload folder is necessary for this feature to function, as generated images will be stored there.
This feature works only with DSS version >= 13.0.0
- Users can adjust the following settings through the UI
Image Height
Image Width
Image Quality
Number of Images to Generate
The user settings will be passed to the image generation model. If the selected model does not support certain settings, the image generation will fail. Any error messages generated by the model will be forwarded to the user in English, as we do not translate the model’s responses.
Image generation LLM¶
The language model to use for image generation. This is mandatory when the image generation feature is enabled.
Note
- Image generation is available with image generation models supported in Dataiku LLM Mesh; this includes:
OpenAI (DALL-E 3)
Azure OpenAI (DALL-E 3)
Google Vertex (Imagen 1 and Imagen 2)
Stability AI (Stable Image Core, Stable Diffusion 3.0, Stable Diffusion 3.0 Turbo)
Bedrock Titan Image Generator
Bedrock Stable Diffusion XL 1
Configure the query builder prompt for image generation¶
Image generation begins with the main chat model creating an image generation query based on the user's input and history. You can include a prompt for guidelines and instructions on building this query. Only modify this if you fully understand the process.
Document Upload¶
You can upload multiple files of different types, enabling you to ask questions about each using the answers interface.
The two main methods that LLMs can use to understand the documents are:
Viewing as an image (multimodal).
Reading the extracted text (no images).
Note
- Important Requirements:
Dataiku >= 13.0.2 required for method 1 support of Anthropic models.
Dataiku >= 12.5.0 required for method 1 support of all other supported models.
Method 1 is only available for multimodal LLMs such as OpenAI Vision or Gemini Pro. It can be used for image files or PDFs. Method 2 is supported on all LLMs and all file types that contain text. Care must be taken with both methods to avoid exceeding the context window of the LLM you are using. The following parameters will help you manage this.
Maximum upload file size in MB
Allows you to set the file size limit for each uploaded file. The default value is 15 MB; however, some service providers may have lower limits.
Maximum number of files that can be uploaded at once
This parameter controls the number of documents that the LLM can interact with simultaneously using both methods.
Send PDF pages as images instead of extracting text
This parameter allows the LLM to view each page using Method 1. It is most useful when the pages contain visual information such as charts, images, tables, diagrams, etc. This will increase the quality of the answers that the LLM can provide but may lead to higher latency and cost.
Maximum number of PDF pages to send as images
This parameter sets the threshold number of pages to be sent as images. The default value is 5. For example, if 5 concurrent files are allowed and each has a maximum of 5 pages sent as images, then 25 images are sent to the LLM (5 files x 5 pages each = 25 images). If any document exceeds this threshold, the default behavior is to use text extraction alone for that document. Understandably, this increases the cost of each query but can be necessary when asking questions about visual information.
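The threshold logic above can be sketched as follows. This is a minimal illustration of the arithmetic, not the plugin's actual code; the function and parameter names are hypothetical:

```python
def images_to_send(page_counts, max_pages_as_images=5):
    """For each uploaded PDF, send its pages as images only if the page
    count is within the threshold; otherwise fall back to text extraction."""
    total_images = 0
    text_only = []
    for name, pages in page_counts.items():
        if pages <= max_pages_as_images:
            total_images += pages
        else:
            text_only.append(name)  # exceeds threshold: text extraction only
    return total_images, text_only

# Five 5-page files stay under the threshold: 5 files x 5 pages = 25 images.
total, fallback = images_to_send({f"doc{i}.pdf": 5 for i in range(5)})
```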
Retrieval Method¶
In this section, you can decide how you will augment the LLM’s current knowledge with your external sources of information.
No retrieval. LLM answer only
No external sources of information will be provided to the LLM (default value). Any existing Knowledge Bank settings are preserved; to use them again, select the knowledge bank retrieval mode.
Use knowledge bank retrieval (for searches within text)
The LLM will be provided with information taken from the Dataiku Knowledge Bank.
Use dataset retrieval (for specific answers from a table)
A SQL query will be crafted to provide information to the LLM.
Knowledge Bank Configuration¶
If you connect a Knowledge Bank to your Dataiku Answers, the following settings allow you to refine KB usage to optimize results.
Customize Knowledge Bank’s Name¶
This feature enables you to assign a specific name to the Knowledge Bank, which will be displayed to users within the web application whenever the Knowledge Bank is mentioned.
Use Knowledge Bank by default¶
With this setting, you can determine whether the Knowledge Bank should be enabled (‘Active’) or disabled (‘Not active’) by default.
Configure your LLM in the context of a Knowledge Bank¶
This functionality allows you to define a custom prompt that will be utilized when the Knowledge Bank is active.
Configure your Retrieval System Prompt¶
You can provide a custom system prompt for a more advanced retrieval prompt configuration in a knowledge bank. To do so, you must enable the advanced settings option, as shown below.
Let ‘Answers’ decide when to use the Knowledge Bank¶
Enabled by default, this option turns the smart usage of the Knowledge Bank on or off. If enabled, the LLM decides when to use the Knowledge Bank based on its description and the user's input. If disabled, the LLM always uses the Knowledge Bank when one is selected. We recommend keeping this option enabled for optimal results.
Knowledge bank description¶
Adding a description helps the LLM assess whether accessing the Knowledge Bank is relevant for adding the necessary context to answer the question accurately. If the Knowledge Bank is not required for a given question, it will not be used.
Number of Documents to Retrieve¶
Set how many documents the LLM should reference to generate responses.
Search type¶
You can choose one of three prioritization techniques to determine which documents augment the LLM's knowledge.
Similarity score only¶
Provides the top n documents ranked by their similarity to the user question, using the similarity score alone.
Similarity score with threshold¶
Provides documents to the LLM only if they meet a predetermined similarity score threshold in [0,1]. Be aware that this can result in all documents being excluded, leaving the LLM with none.
Improve Diversity of Documents¶
Enable this to have the LLM pull from a broader range of documents. Specify the ‘Diversity Selection Documents’ number and adjust the ‘Diversity Factor’ to manage the diversity of retrieved documents.
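The diversity option follows the spirit of maximal marginal relevance: trade off relevance to the query against redundancy with already-selected documents. The sketch below is illustrative only; the plugin's actual selection logic and parameter names may differ:

```python
def mmr_select(query_sim, doc_sims, k, diversity_factor=0.5):
    """Pick k documents balancing relevance against redundancy.

    query_sim: similarity score between each document and the query.
    doc_sims:  doc_sims[i][j] = similarity between documents i and j.
    diversity_factor: 0 = pure relevance, 1 = pure diversity.
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize documents similar to those already chosen.
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return (1 - diversity_factor) * query_sim[i] - diversity_factor * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a nonzero diversity factor, a document nearly identical to one already selected can lose out to a less similar but more novel document.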
Filter logged sources¶
Enable this option to control the number of data chunks recorded in the logging dataset. Note that users can only access as many chunks as are logged.
Display source extracts¶
Display or hide source extracts to the end user when using a knowledge bank. This option is enabled by default. Disable to hide them.
Select metadata to include in the context¶
If selected, the chosen metadata will be added to the retrieved context along with the document chunks.
Enable LLM citations¶
The checkbox is available when you use a Knowledge Bank for RAG. Enabling this option allows you to get citations in the answers provided by the LLM during the text generation process. These citations will reference the IDs of the linked sources and quote the relevant part from these sources that allowed the text generation.
Filters and Metadata Parameters¶
All metadata stem from the configuration in the embed recipe that constructed the Knowledge Bank. Set filters, display options, and identify metadata for source URLs and titles.
Metadata Filters
Choose which metadata tags can be used as filters.
Metadata Display:
Select metadata to display alongside source materials.
URLs and Titles
Determine which metadata fields should contain the URLs for source access and the titles for displayed sources.
Dataset Retrieval Parameters¶
If you connect a Dataiku dataset to your Dataiku Answers, the following settings allow you to refine how this information is handled.
Note
It is strongly advised to use LLMs that are intended for code generation. LLMs whose primary focus is creative writing will perform poorly on this task.
Choose connection¶
Choose the SQL connection containing datasets you would like to use to enrich the LLM responses. You can choose from all the connections used in the current Dataiku Project.
Customize how the connection is displayed¶
This feature enables you to assign a specific, user-friendly name for the connection. This name is displayed to users within the web application whenever the dataset is mentioned.
Choose dataset(s)¶
Select the datasets you would like the web application to access. You can choose among all the datasets on the connection selected previously; this means that all the datasets must be on the same connection.
Define Column Mappings¶
Here you can choose to suggest column mappings that the LLM can decide to follow.
For example, in the mapping below, the LLM may choose to create a JOIN like this:
LEFT JOIN Orders o ON o.EmployeeID = e.EmployeeID
Add a description to the dataset and the columns so the retrieval works effectively. This can be done in the following way:
For the dataset
Select the dataset, click the information icon in the right panel, and click edit. Add the description in either text box.
Warning
The LLM can only run effective queries if it knows about the data it is querying. You should provide as much detail as possible to clarify what is available.
For the columns
Explore the dataset, then click settings and schema. Add a description for each column.
Warning
The LLM will not be able to view the entire dataset before creating the query, so you must describe the contents of the column in detail. For example, if defining a categorical variable, then describe the possible values (“Pass,” “Fail,” “UNKNOWN”) and any acronyms (e.g., “US” is used for the United States).
Warning
Ensure that data types match the type of questions that you expect to ask the LLM. For example, a datetime column should not be stored as a string. Adding the column descriptions here means the descriptions are tied to the data. As a result, changes to the dataset could cause the LLM to provide inaccurate information.
Configure your LLM in the context of the dataset¶
This functionality allows you to define a custom prompt that will be utilized when the dataset retrieval is active.
Configure your Retrieval System Prompt¶
You can provide a custom system prompt for a more advanced configuration of the retrieval prompt in a dataset. To do so, you must enable the advanced settings option, as shown below.
Hard limit on SQL queries¶
By default, all queries are limited to 100 rows to avoid excessive data retrieval. However, it may be necessary to adapt this to the type of data being queried.
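As an illustration of how such a cap works, a guardrail of this kind can be sketched as appending a LIMIT clause when one is absent. This is a hypothetical helper, not the plugin's implementation:

```python
import re

def enforce_row_limit(sql, max_rows=100):
    """Append a LIMIT clause to a SELECT query unless one is already present."""
    stripped = sql.rstrip().rstrip(";")
    if re.search(r"\bLIMIT\s+\d+\s*$", stripped, flags=re.IGNORECASE):
        return stripped  # query already limits its own row count
    return f"{stripped} LIMIT {max_rows}"
```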
Display SQL in sources¶
Selecting this checkbox will add the SQL query to the source information displayed below the LLM’s answers.
End User Interface Configuration¶
Adjust the web app to your business objectives and accelerate user value.
Titles and Headings¶
Set the title and subheading for clarity and context in the web app.
Placeholder Text¶
Enter a question prompt in the input field to guide users.
Example Questions¶
Provide example questions to illustrate the type of inquiries the chatbot can handle. You can add as many questions as you want.
User Profile¶
The user profile allows you to configure a list of settings, excluding language, that users can fill out within the web app. You must set up an SQL user profile dataset (mandatory even if no settings are configured).
The language setting will be available by default for all users, initially set to the web app’s chosen language.
The language selected by the user will determine the language in which the LLM responses are provided.
Once the user has configured their settings, these will be included in the LLM prompt to provide more personalized responses.
You can define the settings using a list, where each setting consists of a key (the name of the setting) and a description (a brief explanation of the setting).
All settings will be in the form of strings for the time being.
Enable custom rebranding¶
If checked, the web app will apply your custom styling based on the theme name and different image files you specify in your setup. For more details, check the UI Rebranding capability section.
- Theme name: The theme name you want to apply. CSS, images, and fonts will be fetched from the folder answers/YOUR_THEME.
- Logo file name: The file name of the logo you added to answers/YOUR_THEME/images/image_name.extension_name and want to use as the logo in the web app.
- Icon file name: Same as for the logo file name.
WebApplication Configuration¶
Language¶
You can choose the default language for the web application from the available options (currently English and Korean; more options to come).
HTTP Headers¶
Define HTTP headers for the application’s HTTP responses to ensure compatibility and security.
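As an example, headers such as the following are commonly set for security; the values shown are illustrative assumptions, and the exact set depends on your environment and requirements:

```python
# Hypothetical example of headers you might configure for the web app's
# HTTP responses; adapt names and values to your own security policy.
headers = {
    "X-Frame-Options": "SAMEORIGIN",      # control embedding in iframes
    "X-Content-Type-Options": "nosniff",  # disable MIME-type sniffing
    "Referrer-Policy": "strict-origin-when-cross-origin",
}
```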
UI Rebranding capability¶
You can rebrand the web app by applying a custom style without changing the code by following these steps:
Navigate to ᎒᎒᎒ > Global Shared Code > Static Web Resources, create a folder named
answers
, and within this folder, create a subfolder corresponding to the theme that the web application settings will reference. The structure should be as follows:
answers
└── YOUR_THEME_NAME
├── custom.css
├── fonts
│ └── fonts.css
└── images
├── answer-icon.png
└── logo.png
Example with fonts and images
CSS changes¶
Add a custom.css file inside the answers folder; you can find an example below:

:root {
  /* Colors */
  --brand: #e8c280; /* Primary color for elements other than action buttons */
  --bg-examples-brand: rgba(255, 173, 9, 0.1); /* Examples background color (visible on landing page/new chat) */
  --bg-examples-brand-hover: rgba(255, 173, 9, 0.4); /* Examples background color on hover */
  --bg-examples-borders: #e8a323; /* Examples border color */
  --examples-question-marks: rgb(179, 124, 15); /* Color of question marks in the examples */
  --examples-text: #422a09; /* Color of the text in the examples */
  --text-brand: #57380c; /* Text color for the question card */
  --bg-query: rgba(245, 245, 245, 0.7); /* Background color for the question card */
  --bg-query-avatar: #F28C37; /* Background color for the question card avatar */
}

.logo-container .logo-img {
  height: 70%;
  width: 70%;
}
Fonts customization¶
First, create the fonts subfolder inside the answers folder. Second, add fonts.css and define your font as below, depending on the format you can provide (we support base64 or external URL):

@font-face {
  font-family: "YourFontName";
  src: url(data:application/octet-stream;base64,your_font_base64);
}

@font-face {
  font-family: "YourFontName";
  src: url("yourFontPublicUrl") format("yourFontFormat");
}

Finally, declare the font in your custom.css file:

body, div {
  font-family: "YourFontName" !important;
}
Images customization¶
Create an images folder where you can import a logo.* file to change the logo image on the landing page, and an answer-icon.* file to change the icon of the AI answer.
Examples of current customizations¶
Final Steps¶
After configuring the settings, thoroughly review them to ensure they match your operational requirements. Conduct tests to verify that the chat solution operates as intended, documenting any issues or FAQs that arise during this phase.
Mobile Compatibility
The web application is designed to be responsive and fully compatible with mobile devices. To target mobile users effectively, configure the application as a Dataiku public web application and distribute the link to the intended users.
Dataiku Answers User Guide¶
Introduction¶
Dataiku Answers provides a powerful interface for querying a Large Language Model (LLM) capable of serving a wide array of domains and specialties. Tailored to your needs, it can deliver insights and answers by leveraging a configured Knowledge Bank for context-driven responses or directly accessing the LLM’s extensive knowledge base.
The application supports multimodal queries if configured with compatible LLMs.
Home Page Functionality¶
Query Input: The home page is centered around the query input box. Enter your question here, and the system will either:
Perform a semantic search within an active Knowledge Bank to provide the LLM with contextual data related to your query, enhancing the relevance and precision of the answer. Remember that queries need to be as precise as possible to maximize the quality of answers; don't hesitate to ask your team for query guiding principles for support.
Send your question directly to the LLM if no Knowledge Bank is configured or activated, relying on the model’s inbuilt knowledge to provide an answer.
Setting Context with Filters¶
Setting filters can provide a more efficient and relevant search experience in a knowledge base, maximizing the focus and relevance of the query. This is particularly relevant for knowledge bases with large or diverse content types. To do so:
Metadata Filter Configuration¶
If metadata filters have been enabled, select your criteria from the available options. These filters pre-define the context, enabling more efficient retrieval from the Knowledge Bank, resulting in answers more aligned with your specific domain or area of interest.
Conducting Conversations¶
Engaging with the LLM¶
To start a conversation with the LLM:
Set any desired filters first to establish the context for your query.
Enter your question in the query box.
Review the provided information from the contextual data retrieved by the Knowledge Bank or the LLM.
Remember, when a Knowledge Bank is activated and configured with your filters, it will enrich the LLM’s response with specific context, making your results more targeted and relevant. If part of the configuration, Dataiku Answers will allow you to see all sources and metadata for each response item, maximizing trust and understanding. This will include:
A thumbnail image.
A link to the original source.
A title for context.
An excerpt from the Knowledge Bank.
A list of associated metadata tags as set in the settings.
Interact with the LLM to refine the answer, translate, summarize, or more.
Interaction with Filters and Metadata¶
Filters in Action
If you’ve set filters before starting the conversation, they’ll be displayed alongside your question. This helps to preserve the context in the LLM’s response.
Filter Indicators
A visual cue next to the ‘Settings’ icon indicates the presence and number of active filters, allowing you to keep track of the context parameters currently influencing the search results.
Providing Feedback¶
We encourage users to contribute their experiences:
Feedback Button: Visible if general feedback collection is enabled; this feature allows you to express your thoughts on the plugin's functionality and the quality of interactions. Feedback will be collected in a General Feedback Dataset and analyzed by your Answers set-up team.
Conclusion¶
Dataiku Answers is designed to be user-centric, providing a seamless experience whether you’re seeking detailed responses with the help of a curated Knowledge Bank or Dataset or directly interfacing with the LLM. For additional support, please contact industry-solutions@dataiku.com.