DSS 11.0 Release notes
Migration notes
Migration paths to DSS 11.0
From DSS 10.0: Automatic migration is supported, with the restrictions and warnings described in Limitations and warnings
From DSS 9.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 9.0 -> 10.0
From DSS 8.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 8.0 -> 9.0, 9.0 -> 10.0
From DSS 7.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 6.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 5.1: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 5.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 4.3: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 4.2: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.2 -> 4.3, 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 4.1: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.1 -> 4.2, 4.2 -> 4.3, 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
From DSS 4.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.0 -> 4.1, 4.1 -> 4.2, 4.2 -> 4.3, 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0, 8.0 -> 9.0, 9.0 -> 10.0
Migration from DSS 3.1 and below is not supported. You must first upgrade to 5.0. See DSS 5.0 Release notes
How to upgrade
It is strongly recommended that you perform a full backup of your DSS data directory prior to starting the upgrade procedure.
For automatic upgrade information, see Upgrading a DSS instance.
Pay attention to the warnings described in Limitations and warnings.
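Below is a minimal sketch of the recommended backup. It assumes that DSS has been stopped first (for instance with `bin/dss stop` from the data directory), so that no file changes while the archive is written. It uses only the Python standard library and archives the whole data directory into a timestamped `.tar.gz` file. The helper name `backup_data_dir` and the paths are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Archive the whole DSS data directory into a timestamped .tar.gz file.

    Assumes DSS is stopped, so that no file changes while the archive is written.
    """
    data_path = Path(data_dir).resolve()
    backup_path = Path(backup_root).resolve()
    # Writing the archive inside the directory being archived would include it in itself
    if backup_path == data_path or data_path in backup_path.parents:
        raise ValueError("backup_root must be outside the data directory")
    backup_path.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_path / f"{data_path.name}-backup-{stamp}"
    # make_archive appends ".tar.gz" itself and stores paths relative to root_dir
    archive = shutil.make_archive(
        str(base_name),
        "gztar",
        root_dir=str(data_path.parent),
        base_dir=data_path.name,
    )
    return Path(archive)
```

For example, `backup_data_dir("/data/dataiku/dss_data", "/backups/dss")` would write a file such as `dss_data-backup-<timestamp>.tar.gz` under `/backups/dss`. Both paths are placeholders for your own installation.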
Limitations and warnings
Automatic migration from previous versions (see above) is supported. Please pay attention to the following removal and deprecation notices.
Support removal
Some features that were previously announced as deprecated are now removed or unsupported.
Support for MapR
Support for ElasticSearch 1.x and 2.x
Deprecation notice
DSS 11.0 deprecates support for some features and versions. Support for these will be removed in a later release.
Support for SuSE 15 and SuSE 15 SP1 is deprecated
Support for CentOS 7.3 to 7.8, RedHat 7.3 to 7.8 and Oracle Linux 7.3 to 7.8 is deprecated
As a reminder from DSS 10.0, the “Build missing datasets” build mode is deprecated and will be removed in a future release. This mode only worked in very specific cases and was never fully operational.
As a reminder from DSS 10.0, support for training Machine Learning models with H2O Sparkling Water is deprecated and will be removed in a future release.
As a reminder from DSS 9.0, support for EMR below 5.30 is deprecated and will be removed in a future release.
As a reminder from DSS 7.0, support for “Hive CLI” execution modes for Hive is deprecated and will be removed in a future release. We recommend that you switch to HiveServer2. Please note that “Hive CLI” execution modes are already incompatible with User Isolation Framework.
Version 11.0.0 - July 12th, 2022
DSS 11.0.0 is a major upgrade to DSS, with significant new features.
Major new features
Visual Time Series Forecasting
Time Series Forecasting is now natively available in DSS Visual ML. Visual Time Series Forecasting features many capabilities:
Single or multiple series
Multiple horizon forecasting
Multiple algorithms, including deep learning algorithms
Time Series Forecasting models are fully deployable and governable like other DSS Visual Models.
For more details, please see Time Series Forecasting
Code Studios, including Visual Studio Code, JupyterLab and RStudio
Code Studios allow DSS users to harness the power and versatility of many Web-based IDEs and web application building frameworks.
Code Studios allow you, for example, to:
Edit and debug Python, R, SQL, … recipes and libraries in Visual Studio Code
Edit and debug Python or R recipes, notebooks, libraries, … in JupyterLab
Edit and debug R recipes and libraries in RStudio Server
For more details, please see Code Studios
Image Labeling
In order to create and fine-tune image models (classification and object detection), you first need labeled images. Labeling is often a tedious task.
DSS now features a native Image Labeling capability, with the following features:
Support for image classification and object detection use cases
Ability to invite annotators (people who label the images)
Efficient interface for annotators with keyboard shortcuts
Ability to request annotations from multiple annotators
Annotations review process with management of conflicts between annotators
This new capability allows you to perform even more of the entire Machine Learning cycle for computer vision in DSS.
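To make the idea of a conflict between annotators concrete, here is a minimal Python sketch. It assumes that image classification annotations are available as a mapping from each image path to the label chosen by each annotator. This data layout and the `find_conflicts` helper are hypothetical and do not reflect how DSS stores annotations. An image is flagged for review when its annotators disagree.

```python
from collections import Counter
from typing import Dict


def find_conflicts(annotations: Dict[str, Dict[str, str]]) -> Dict[str, Counter]:
    """Return, for each image whose annotators disagree, the vote count per label."""
    conflicts = {}
    for image, labels_by_annotator in annotations.items():
        votes = Counter(labels_by_annotator.values())
        if len(votes) > 1:  # more than one distinct label means a conflict
            conflicts[image] = votes
    return conflicts


annotations = {
    "img/001.jpg": {"alice": "cat", "bob": "cat"},
    "img/002.jpg": {"alice": "cat", "bob": "dog", "carol": "dog"},
}
print(find_conflicts(annotations))
# {'img/002.jpg': Counter({'dog': 2, 'cat': 1})}
```

A reviewer would then resolve only the flagged images. The vote counts show, for example, that a majority label exists for `img/002.jpg`.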
MLOps: Experiment Tracking
DSS now includes an experiment tracker for logging parameters, performance metrics, models, and other metadata when running your machine learning code, and for visualizing results of such experiments.
The DSS Experiment Tracker leverages the well-known MLflow Tracking API, which allows you to seamlessly port existing or 3rd party experiment tracking code and get all DSS benefits.
For more details, please see Experiment Tracking
MLOps: Feature Store
A Feature Store helps Data Scientists build, find, and use relevant data for models, in order to build efficient models faster.
Most key components of a Feature Store are native capabilities of DSS:
Feature Storage is handled by Dataiku's extensive Connections Library
Data Ingestion and Curation is performed using Recipes in the Flow
Offline serving for batch processing is done using Join Recipes in projects deployed on an Automation node
Online serving for realtime processing is done using Dataset Lookups in API services
Data monitoring is implemented using Metrics & Checks
Automated building and maintenance is managed by Scenarios and Triggers
DSS 11 adds a new Feature Store section, which acts as the central registry of all Feature Groups, a Feature Group being a curated and promoted Dataset containing valuable Features.
For more details, please see Feature Store
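Offline and online serving both come down to looking up a Feature Group's values by an entity key. The following plain-Python sketch illustrates only the left-join semantics; in DSS, the Join recipe or a Dataset Lookup does this work. The column names `customer_id`, `avg_basket` and `visits_30d` are hypothetical, and lists of dictionaries stand in for datasets.

```python
from typing import Any, Dict, List, Optional


def enrich(records: List[Dict[str, Any]], feature_group: List[Dict[str, Any]],
           key: str, features: List[str]) -> List[Dict[str, Any]]:
    """Left-join `features` from `feature_group` onto `records` using `key`.

    Records with no matching key get None for every feature. If the feature
    group holds several rows for one key, the last one wins.
    """
    index = {row[key]: row for row in feature_group}
    enriched = []
    for record in records:
        match: Optional[Dict[str, Any]] = index.get(record[key])
        out = dict(record)
        for name in features:
            out[name] = match.get(name) if match is not None else None
        enriched.append(out)
    return enriched


feature_group = [
    {"customer_id": 1, "avg_basket": 42.0, "visits_30d": 3},
    {"customer_id": 2, "avg_basket": 17.5, "visits_30d": 9},
]
requests = [{"customer_id": 2}, {"customer_id": 3}]
print(enrich(requests, feature_group, "customer_id", ["avg_basket", "visits_30d"]))
# [{'customer_id': 2, 'avg_basket': 17.5, 'visits_30d': 9}, {'customer_id': 3, 'avg_basket': None, 'visits_30d': None}]
```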
Data Visualization: New Pivot Table
The Pivot Table has been thoroughly overhauled. It now supports:
Multiple dimensions on rows and columns, with subtotal support
Excel Export of multiple dimensions and multiple measures
For more details, please see Charts
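To illustrate what subtotals over multiple row dimensions mean, here is a minimal Python sketch. It sums a measure over two hypothetical row dimensions (`region` and `product`), with one subtotal per region and a grand total. It only illustrates the aggregation and is not how DSS computes or renders the Pivot Table.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pivot_with_subtotals(rows: List[dict], dims: Tuple[str, str],
                         measure: str) -> Dict[Tuple[str, str], float]:
    """Sum `measure` per (outer, inner) cell, per outer subtotal and overall.

    The collapsed dimension is labeled "Total" in subtotal and grand-total keys.
    """
    outer, inner = dims
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in rows:
        value = row[measure]
        totals[(row[outer], row[inner])] += value  # detail cell
        totals[(row[outer], "Total")] += value     # subtotal of the outer dimension
        totals[("Total", "Total")] += value        # grand total
    return dict(totals)


sales = [
    {"region": "EU", "product": "A", "amount": 10.0},
    {"region": "EU", "product": "B", "amount": 5.0},
    {"region": "US", "product": "A", "amount": 7.0},
]
table = pivot_with_subtotals(sales, ("region", "product"), "amount")
print(table[("EU", "Total")])     # 15.0
print(table[("Total", "Total")])  # 22.0
```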
Quick Sharing
Project administrators can now enable “Quick Sharing”, which allows any user who has read access to the project to share a dataset to their own project, without having to ask the project administrator first.
Quick Sharing can be globally disabled by instance administrators.
For more details, please see Shared objects
Access & Sharing requests
Project administrators can now choose to make their project “discoverable”, which allows users who don’t have access to the project to still discover its existence and basic information about it (name, description, …), and then to request access to it.
Project administrators receive notifications about access requests, and can manage them, grant them or reject them.
Similarly, users who have access to a project can now request that datasets be shared with their own projects, and project administrators can manage these sharing requests (if they don’t have Quick Sharing enabled).
These mechanisms can be globally disabled by instance administrators.
For more details, please see Requests
Create if, then, else processor
This new visual data preparation processor performs actions or calculations based on conditional statements defined using an “if, then, else” syntax.
It can be used notably to create new columns based on conditions on the values of other columns. While this was previously feasible using formulas or the Switch case processor, the new Create if, then, else statements processor can provide much more flexibility, without having to write complex formulas.
For more details, please see Create if, then, else statements
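The following minimal Python equivalent illustrates the logic that such a step expresses, not the processor's own syntax or implementation. It assumes hypothetical columns `country` and `amount`. If the country is "FR" and the amount exceeds 100, then the new column `segment` is "priority"; otherwise it is "standard".

```python
from typing import List


def add_segment(rows: List[dict]) -> List[dict]:
    """Create a `segment` column from an if / then / else rule on other columns."""
    out = []
    for row in rows:
        new_row = dict(row)
        if row.get("country") == "FR" and (row.get("amount") or 0) > 100:
            new_row["segment"] = "priority"  # "then" branch
        else:
            new_row["segment"] = "standard"  # "else" branch
        out.append(new_row)
    return out


rows = [
    {"country": "FR", "amount": 250},
    {"country": "FR", "amount": 80},
    {"country": "US", "amount": 500},
]
print([r["segment"] for r in add_segment(rows)])
# ['priority', 'standard', 'standard']
```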
Flow Document Generator
In regulated industries, it is often required to document flows, at creation and after every change for traceability. This is often tedious. DSS now features the ability to automatically generate a DOCX document from a Flow, which documents the whole flow, including datasets and recipes details.
For more details, please see Flow Document Generator.
Govern: Projects and bundles governance
The Govern Node now supports managing, governing, and controlling deployment of Project Bundles in the Deployer
Dataiku Cloud Stacks on GCP
Dataiku Cloud Stacks is now available on GCP.
For more details, please see Dataiku Cloud Stacks for GCP
Other notable enhancements and features
Outcome Optimization for regression
The “What-If” feature now supports Outcome Optimization for regression problems. Outcome Optimization allows you to start from a given record, and to explore the neighborhood of this record to find the changes to input features that would lead to changes in the predicted value, towards either the largest, smallest, or a specific value. You can select which features can be modified and which can’t.
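The idea can be illustrated with a deliberately simple Python sketch. Starting from one record, it tries a small grid of changes on the features that are allowed to be modified and keeps the others fixed. It then returns the candidate whose prediction is the largest, the smallest, or the closest to a target. The model, feature names and step grid are hypothetical. These notes do not describe the search strategy that DSS actually uses, so treat this as an illustration of the concept only.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]


def optimize_outcome(record: Record, predict: Callable[[Record], float],
                     modifiable: List[str], deltas: List[float],
                     goal: str = "max", target: Optional[float] = None
                     ) -> Tuple[Record, float]:
    """Grid-search the neighborhood of `record`, changing only `modifiable` features.

    goal is "max" (largest prediction), "min" (smallest) or "target"
    (closest to `target`). Features not listed in `modifiable` stay fixed.
    """
    if goal not in ("max", "min", "target"):
        raise ValueError(f"unknown goal {goal!r}")
    if goal == "target" and target is None:
        raise ValueError("goal='target' requires a target value")

    def score(pred: float) -> float:
        # Higher score means closer to the goal
        if goal == "max":
            return pred
        if goal == "min":
            return -pred
        return -abs(pred - target)

    best, best_pred = dict(record), predict(record)
    # Try every combination of deltas on the modifiable features
    for combo in itertools.product(deltas, repeat=len(modifiable)):
        candidate = dict(record)
        for name, delta in zip(modifiable, combo):
            candidate[name] = record[name] + delta
        pred = predict(candidate)
        if score(pred) > score(best_pred):
            best, best_pred = candidate, pred
    return best, best_pred


def toy_model(r: Record) -> float:
    # Hypothetical regression model standing in for a trained DSS model
    return 3.0 * r["x1"] - 2.0 * r["x2"] + r["x3"]


start = {"x1": 1.0, "x2": 1.0, "x3": 0.0}
print(optimize_outcome(start, toy_model, ["x1", "x2"], [-1.0, 0.0, 1.0], goal="max"))
# ({'x1': 2.0, 'x2': 0.0, 'x3': 0.0}, 6.0)
```

An exhaustive grid keeps the example easy to follow but grows exponentially with the number of modifiable features. A real tool would need a smarter search.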
Nested filters
In locations where visual filters can be used, it is now possible to nest complex boolean conditions, such as:
If col1 is 2 AND (col2 is 3 OR col3 is 4)
This applies to:
The Filter visual recipe
The “Create if, then, else statements” prepare processor
The “Pre/Post filters” of all visual recipes
Filters in Explore and Charts sampling
Filters in Visual ML
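A nested condition such as “col1 is 2 AND (col2 is 3 OR col3 is 4)” can be represented as a small tree of AND/OR groups. The following Python sketch evaluates that condition on a few rows. It uses a hypothetical dictionary encoding of the tree, which is not the format in which DSS stores filters.

```python
from typing import Any, Dict


def matches(row: Dict[str, Any], node: Dict[str, Any]) -> bool:
    """Recursively evaluate a filter tree node against one row."""
    if "column" in node:  # leaf: "<column> is <value>"
        return row.get(node["column"]) == node["value"]
    results = (matches(row, child) for child in node["conditions"])
    if node["op"] == "AND":
        return all(results)
    if node["op"] == "OR":
        return any(results)
    raise ValueError(f"unknown operator {node['op']!r}")


# col1 is 2 AND (col2 is 3 OR col3 is 4)
condition = {
    "op": "AND",
    "conditions": [
        {"column": "col1", "value": 2},
        {"op": "OR", "conditions": [
            {"column": "col2", "value": 3},
            {"column": "col3", "value": 4},
        ]},
    ],
}

rows = [
    {"col1": 2, "col2": 3, "col3": 0},  # kept: col1 and col2 match
    {"col1": 2, "col2": 0, "col3": 4},  # kept: col1 and col3 match
    {"col1": 1, "col2": 3, "col3": 4},  # dropped: col1 does not match
]
print([row for row in rows if matches(row, condition)])
```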
OIDC authentication
In addition to SAMLv2, OIDC can now be used as an SSO protocol for logging in to DSS
For more details, please see Single Sign-On
SSO support for Fleet Manager
It is now possible to log in through SSO on Fleet Manager
For more details, please see Installing and setting up
“List folder content” recipe
This new visual recipe takes a managed folder as input, a dataset as output, and writes in the dataset the listing of files in the managed folder.
This recipe is especially useful for image labeling and computer vision use cases.
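To show the kind of listing such a recipe produces, here is a rough equivalent for a local directory in Python. It walks the directory and returns one row per file. The column names (`path`, `size` and `last_modified`) are assumptions for illustration and may differ from the recipe's actual output schema. The recipe itself works on managed folders, which need not be local directories.

```python
import os
from datetime import datetime, timezone
from typing import Dict, List, Union


def list_folder_content(root: str) -> List[Dict[str, Union[str, int]]]:
    """Return one row per file found under `root`, sorted by relative path."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            stat = os.stat(full_path)
            rows.append({
                # Path relative to the folder root, always with "/" separators
                "path": os.path.relpath(full_path, root).replace(os.sep, "/"),
                "size": stat.st_size,
                # Modification time as an ISO 8601 UTC timestamp
                "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            })
    return sorted(rows, key=lambda row: str(row["path"]))
```

For image labeling, each row's `path` would typically identify one image to annotate.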
Workspace discussions
Discussions are now available on workspaces
Data Visualization: Count Distinct and Count Not Null aggregations
All aggregated charts (columns, bars, pies, lines, areas, pivot table, …) now support the “Count Distinct” and “Count Not Null” aggregation functions for measures.
This also makes it possible to use non-numerical measures
For more details, please see Charts
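The semantics of the two new aggregations can be summarized with a short Python sketch over a hypothetical non-numerical column. “Count Not Null” counts the values that are present, and “Count Distinct” counts the unique values among them. The sketch assumes that missing values are excluded from the distinct count. This matches the usual SQL `COUNT(DISTINCT ...)` behavior, but these notes do not state it.

```python
from typing import Iterable, Optional


def count_not_null(values: Iterable[Optional[str]]) -> int:
    """Number of values that are not missing (None)."""
    return sum(1 for v in values if v is not None)


def count_distinct(values: Iterable[Optional[str]]) -> int:
    """Number of unique non-missing values (missing values are ignored)."""
    return len({v for v in values if v is not None})


cities = ["Paris", "Lyon", None, "Paris", "Nantes", None]
print(count_not_null(cities))   # 4
print(count_distinct(cities))   # 3
```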
Data Visualization: multiple layers on Geo Map
It is now possible to draw multiple layers with different geometries on the Geo Map chart
For more details, please see Geographic data
Data Visualization: additional customization options
The following customization options are now available:
Ability to change the name of a measure in the legend and tooltip
Ability to change the name of a dimension in the legend and tooltip
Ability to reformat numbers on axis and in cells of the pivot table
For more details, please see Charts
Georouting and Isochrones
DSS now has capabilities for computing itineraries between geopoints and isochrones around geopoints.
For more details, please see Geographic data
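An isochrone is the area that can be reached from a starting point within a given travel time. On a road network, it can be sketched as a shortest-path search that stops expanding once the time budget is exceeded. The Python example below runs Dijkstra's algorithm on a tiny hypothetical graph whose edge weights are travel times in minutes. It illustrates the concept only and says nothing about the routing engine that DSS uses.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def reachable_within(graph: Graph, start: str, budget: float) -> Dict[str, float]:
    """Return every node reachable from `start` within `budget`, with its travel time."""
    best: Dict[str, float] = {start: 0.0}
    queue: List[Tuple[float, str]] = [(0.0, start)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry: a faster route was already found
        for neighbor, cost in graph.get(node, []):
            arrival = elapsed + cost
            # Only keep arrivals within the budget that improve on the best known time
            if arrival <= budget and arrival < best.get(neighbor, float("inf")):
                best[neighbor] = arrival
                heapq.heappush(queue, (arrival, neighbor))
    return best


roads: Graph = {
    "A": [("B", 4.0), ("C", 2.0)],
    "C": [("B", 1.0), ("D", 8.0)],
    "B": [("E", 5.0)],
}
print(reachable_within(roads, "A", 10.0))
# {'A': 0.0, 'B': 3.0, 'C': 2.0, 'D': 10.0, 'E': 8.0}
```

On a real map, the reachable nodes would then be turned into a polygon to draw the isochrone.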
Machine Learning: multiple custom metrics
You can now define multiple custom metrics for a single Visual ML model.
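A custom metric is written as a Python scoring function that receives the true values and the predictions for the validation set and returns a float. The sketch below assumes this array-like form. Check the code template that DSS shows when adding a custom metric for the exact function name and signature. It defines two hypothetical regression metrics that could now be set up side by side on the same model.

```python
from typing import Sequence


def mape(y_valid: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error (lower is better), ignoring zero targets."""
    pairs = [(t, p) for t, p in zip(y_valid, y_pred) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every target is zero")
    return sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


def max_abs_error(y_valid: Sequence[float], y_pred: Sequence[float]) -> float:
    """Largest absolute error over the validation set (lower is better)."""
    return max(abs(t - p) for t, p in zip(y_valid, y_pred))


y_true = [100.0, 200.0, 50.0]
y_hat = [110.0, 190.0, 50.0]
print(round(mape(y_true, y_hat), 4))  # 0.05
print(max_abs_error(y_true, y_hat))   # 10.0
```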
Streamlit webapps through Code Studios
Through the Code Studios mechanism, you can now create and run Streamlit applications in DSS.
For more details, please see Code Studios
Govern: new permissions experience
A new permissions editor has been introduced in Govern
Govern: History
You can now view the history and timeline of individual govern objects
Govern: Sign off editor
Sign-off processes for Govern can now be edited for more sign-off flexibility
Other enhancements and fixes
Machine Learning
Added Traditional Chinese stop words
Code-based Deep Learning: TensorFlow 2 can now be used
Fixed display on some screens when sample weights are used
Fixed display of the “customize code” box for text features
Fixed potential model display failure for models trained with K-fold-cross-test and sample weights
Fixed bad behavior when trying to use custom metrics without code writing permissions
Fixed display issue for axis legend on the partial dependence distribution chart
Fixed training failure with MLLib engine when “cumulative lift” metric is used
Properly ask users to rebuild train/test set if number of folds changed
Various small UI fixes
Code-based Deep Learning: made unused columns optional in scoring recipe
Fixed display issues with blue information boxes in result screens
Removed display of sample weights options when unsupported
Fixed “Needs probabilities” checkbox for custom metrics
Fixed estimated number of estimators to train when using time ordering
Computer Vision: Fixed training failures when number of epochs is 2
Fixed evaluation of ensemble models with text features
Code-based Deep Learning: added ability to use a custom text preprocessor returning a tensor with more than 3 dimensions
MLOps
Added support for partitioning in model evaluations
Prevented non-functional usage of a foreign model evaluation store in evaluation recipe
Added ability to use a foreign model for an evaluation recipe
Small UI fixes
Govern
Fixed various issues in DSS/Govern sync
Fixed redirect to URL after login
Fixed various UI issues
Fixed filtering by project on model registry
Fixed display of archived artifacts
Visual Statistics
Fixed display issue for dataset selector in “duplicate worksheet” modal
Univariate card: Added placeholder instead of empty chart when the histogram is empty
Small UI fixes
Explore & Datasets
Fixed flickering error that could appear on Explore screen
Fixed inability to explore when a bad regular expression was entered in a filter
Fixed multiple issues in listing of buckets and containers for S3, Azure Blob and Google Blob datasets
BigQuery: Added ability to read external tables and materialized views with the native driver
BigQuery: Enabled fast read of tables by default with the native driver
BigQuery: Fixed flooding of logs with Simba driver 1.2.22.1026 and above
Snowflake to cloud: disabled broken ability to use fast path when input is a SQL query dataset
Fixed ability to resize columns in foreign dataset explore
Dataiku Applications
New user experience for the “Edit SQL datasets” action, with ability to browse very large databases
Added ability to restrict connection type in the CONNECTION parameter type
Flow & Jobs
Improved wrapping of long dataset names
Fixed display of “Python only” logs for containerized recipes
The “Tags” flow view now shows tags from foreign datasets
Added link to parent recipes on managed folders
Visual recipes
Fixed autocompletion of formula with non-ASCII column names
Fixed storage of date filters when day is the 31st
Fixed “Increment date” processor in SQL mode when using the “Increment by: value in column” mode
Added automatic regrouping of multiple “clear cells with this value” steps from the Analyze box
Fixed handling of variables in formula editor
Prepare recipe: Improved searching for processors
Fixed ability to use variables in computed columns with DSS engine
Prepare recipe: fixed “filter rows on date” processor on Oracle
Prepare recipe: fixed “concat columns” step failure on Spark 3
Data Visualization
Pivot Table: Excel export now exports multiple measures
Pivot Table: Excel export now respects coloring
Fixed issues when reordering charts via drag & drop
Fixed “one tick per bin” wrongfully applying to hexagon charts
Fixed log scale on binned scatter plots
Fixed UI issue on manual axis range edition
Dashboards
Improved UI for filter tiles with filter summary and ability to reset filters
Fixed search for existing insights
Added ability to change the dataset of a filters tile
Fixed various issues with filter tiles
API
Fixed ability to write chunks of more than 2 gigabytes when using ManagedFolderWriter.write()
Fixed inability to edit some code env parameters through API
Scenarios
Propagate warnings from steps to the outcome of the scenario
Added missing timezones in the temporal trigger timezone selector
Collaboration
Fixed sending of the “you have been granted access to project” notification when your grant does not actually give you access to the project
Fixed download of .ipynb attached files in Wiki
Cloud Stacks
Upgraded kubectl version in order to deploy the latest Kubernetes versions
Fixed renaming of automation node breaking the deployer
Added display of DSS URL directly in Fleet Manager
Plugins & Extensibility
Allowed custom model views to be restricted to some prediction types
Forbidden presets are now hidden
Performance & Scalability
Fixed API node memory overconsumption when passing huge payloads as inputs or outputs of API services
Made project deletion much faster, especially with a large number of datasets
Improved performance of home page with many projects
Misc
Added better categorization for admin settings page
Fixed wrong navigation bar when going to the Deployer
Direct webapp access will properly redirect back to the webapp after login
Fixed Streaming Scala recipes with Avro on Kafka
Added API key id in the API node audit log
Improved Industry Solutions creation modal
Fixed ability to modify or delete empty todo list
Fixed custom requests and limits in containerized execution
Fixed “Certification” link on home page with Safari
Fixed missing cleanup of Kubernetes objects for containerized continuous Python recipes
Known issues
When using Elastic AI / “standalone” mode for Spark, writing Avro files does not work. We advise you to use Parquet or ORC. Please get in touch with Dataiku Support for workarounds.