DSS 5.0 Release notes
- Migration notes
- Version 5.0.2 - October 1st, 2018
- Version 5.0.1 - August 27th, 2018
- Version 5.0.0 - July 25th, 2018
- From DSS 4.3: Automatic migration is supported, with the restrictions and warnings described in Limitations and warnings
- From DSS 4.2: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.2 -> 4.3
- From DSS 4.1: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.1 -> 4.2 and 4.2 -> 4.3
- From DSS 4.0: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.0 -> 4.1, 4.1 -> 4.2 and 4.2 -> 4.3
- From DSS 3.1: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 3.1 -> 4.0 and 4.0 -> 4.1 and 4.1 -> 4.2 and 4.2 -> 4.3
- From DSS 3.0: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 3.0 -> 3.1, 3.1 -> 4.0 and 4.0 -> 4.1 and 4.1 -> 4.2 and 4.2 -> 4.3
- From DSS 2.X: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 2.0 -> 2.1, 2.1 -> 2.2, 2.2 -> 2.3, 2.3 -> 3.0, 3.0 -> 3.1, 3.1 -> 4.0 and 4.0 -> 4.1 and 4.1 -> 4.2 and 4.2 -> 4.3
- Migration from DSS 1.X is not supported. You must first upgrade to 2.0. See DSS 2.0 Release notes
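The list above follows a simple rule: starting from your current version, every intermediate set of migration notes applies, ending with the Limitations and warnings of this release. As an illustration only, the rule can be sketched in Python as follows. The `RELEASES` list and the `migration_steps` function are hypothetical helpers written for this example, not part of DSS.

```python
# Release sequence taken from the migration notes above (2.X covers 2.0 to 2.3).
RELEASES = ["2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1", "4.2", "4.3", "5.0"]


def migration_steps(current):
    """Return the 'A -> B' hops whose notes apply when upgrading from current to 5.0."""
    if current.startswith("1."):
        raise ValueError("Migration from DSS 1.X is not supported; upgrade to 2.0 first")
    if current not in RELEASES[:-1]:
        raise ValueError("Unknown or already up-to-date version: " + current)
    start = RELEASES.index(current)
    # Pair each release with its successor, from the current one up to 5.0.
    return [f"{a} -> {b}" for a, b in zip(RELEASES[start:], RELEASES[start + 1:])]


print(migration_steps("4.1"))
```

The final hop (`4.3 -> 5.0`) corresponds to the Limitations and warnings described in these release notes.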
As previously announced, DSS 5.0 removes support for some OS and Hadoop distributions.
Support for the following OS versions is now removed:
- Redhat/Centos/Oracle Linux 6 versions strictly below 6.8
- Redhat/Centos/Oracle Linux 7 versions strictly below 7.3
- Ubuntu 14.04
- Debian 7
- Amazon Linux versions strictly below 2017.03
Support for the following Hadoop distribution versions is now removed:
- Cloudera distribution for Hadoop versions strictly below 5.9
- HDP versions strictly below 2.5
- EMR versions strictly below 5.7
As previously announced, support for the following R versions is now removed:
- R versions strictly below 3.4
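To check an environment against the "strictly below" thresholds listed above, version strings need to be compared numerically rather than as text. The following is a minimal sketch under stated assumptions: the dictionary keys (`rhel6`, `cdh`, and so on) are hypothetical labels chosen for this example, and Ubuntu 14.04 and Debian 7, which are removed entirely, are not covered.

```python
# Minimum supported versions from the lists above; keys are hypothetical labels.
MINIMUMS = {
    "rhel6": (6, 8),         # Redhat/Centos/Oracle Linux 6
    "rhel7": (7, 3),         # Redhat/Centos/Oracle Linux 7
    "amazon-linux": (2017, 3),
    "cdh": (5, 9),
    "hdp": (2, 5),
    "emr": (5, 7),
    "r": (3, 4),
}


def parse_version(text):
    """Turn '6.10' into (6, 10) so that components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_supported(component, version):
    """Return True when version is not strictly below the minimum for component."""
    return parse_version(version) >= MINIMUMS[component]


print(is_supported("rhel6", "6.10"))
print(is_supported("r", "3.3.2"))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where `"6.10"` would sort before `"6.8"`.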
As previously announced, support for Java 7 is now deprecated and will be removed in a later release.
As of DSS 5.0, some features are not available anymore when running Java 7:
- Reading of GeoJSON files
- Reading of Shapefiles
- Geographic charts (all types)
It is strongly recommended that you perform a full backup of your DSS data directory prior to starting the upgrade procedure.
For automatic upgrade information, see Upgrading a DSS instance.
Pay attention to the warnings described in Limitations and warnings.
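The backup recommended above can be taken with any tool you already use. As one possible sketch, the Python function below archives a directory into a timestamped `.tar.gz` file. The function name and the paths in the usage comment are hypothetical, and the sketch assumes that DSS is stopped while the archive is being written so that the copy is consistent.

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir, dest_dir):
    """Archive data_dir into a timestamped .tar.gz under dest_dir; return its path."""
    data_dir = Path(data_dir).resolve()
    dest_dir = Path(dest_dir).resolve()
    if not data_dir.is_dir():
        raise NotADirectoryError(str(data_dir))
    # Writing the archive inside the directory being archived would corrupt it.
    if dest_dir == data_dir or data_dir in dest_dir.parents:
        raise ValueError("dest_dir must be outside data_dir")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{data_dir.name}-backup-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths in the archive relative to the directory name.
        tar.add(str(data_dir), arcname=data_dir.name)
    return archive


# Hypothetical usage; replace both paths with your own:
# backup_data_dir("/home/dataiku/dss_data", "/backups")
```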
Automatic migration from previous versions is supported, but there are a few points that need manual attention.
- Models trained with prior versions of DSS should be retrained when upgrading to 5.0 (the usual limitations on retraining models and regenerating API node packages apply; see Upgrading a DSS instance). This includes models deployed to the flow (re-run the training recipe), models in analysis (retrain them before deploying) and API package models (retrain the flow saved model and build a new package)
DSS 5.0.2 is a release containing both bug fixes and new features
- New feature: Experimental support for HDP3 (See Hortonworks HDP)
- New feature: Support for CDH 5.15
- Fixed Spark fast-path for Hive datasets in notebooks and recipes
- New feature: Support for dataset exports using a Unicode separator
- New feature: Per-user credentials for generic JDBC connections
- Fixed export of datasets for non-CSV formats
- Fixed “download all” button for managed folders with no name
- Fixed managed folders when a file name is in uppercase
- Improved support for multi-sheet Excel files
- Added support for Zip files with uppercase extension in filename (.ZIP)
- Added new nicer default images for projects
- Added “loading” status on homepage
- Added search for Wiki articles in quick-go
- Discussions are now included when exporting and importing a project
- Fixed multi selection on Flow on Windows
- Fixed navigator on foreign datasets
- Added support for containers (Docker and Kubernetes) on the “Recipe engines” Flow view
- Fixed the deploy button in the ‘predicted data’ tab of a model in an analysis
- Fixed ineffective early stopping for XGBoost regression and classification
- Experimental Python 3 support for custom models in visual machine learning
- Fixed error when saving an evaluate recipe without a metrics dataset
- New feature: Support for non-equijoins on Impala
- New feature: Best-effort support for window recipes on MySQL 8.
- New feature: Capabilities to retrieve authentication info for plugin recipes
- Filter recipe: don’t lose operator when changing column
- Improved autocompletion for Python and R recipe code editors
- Fixed PySpark recipes when using inline UDF
- New feature: New APIs to retrieve authentication information about the current user. This can be used by plugins to identify which user is running them, and by webapps to perform user authentication and authorization.
- New feature: Added ability to retrieve credentials for a connection directly (if allowed) and improved “location info” on datasets
- New feature: New mechanism for “per-user secrets” that can be used in plugins
- Fixed possible leak of FEK processes leading to their accumulation
- Added ability to test retrieval of user information for LDAP configuration
- Fixed creation of insights on foreign datasets
- Fixed possible memory excursion when reading full datasets in webapps
- Fixed ability to pass multiple arguments for code envs (Fixes ability to use several Conda channels)
- Improved error message when DSS fails to start because of an internal database corruption
- Fixed LDAP login failure when encountering a referral (referrals are now ignored)
- Various performance improvements
- Prevented ability for login page to redirect outside DSS
- Fixed information disclosure through timing attack that could reveal whether a username was valid
- Added CSRF protection to DSS notifications websocket
- Fixed missing code permission check for code steps, triggers and custom variables in scenarios
- Redacted possibly sensitive information in job and scenario diagnosis when downloaded by non-admin users
- Added support for AES-256 for password encryption
DSS 5.0.1 is a bugfix release
- New feature: added support of “SQL Query” datasets when using Redshift-to-S3 fast path
- Do not try to save the sampling settings in explore view if user is not allowed to
- Fixed table import from Hive stored in CSV format with no escaping character
- Fixed occasional failure reading Redshift datasets
- Fixed creation of plugin datasets when schema is not explicitly set by the plugin
- Fixed HDFS connection selection in mass import screen
- Prepare: Added more available time zones to the date processors
- Prepare: Fixed stemming processors on Spark engine
- Sync: Fixed Azure Blob Storage to Azure Data Warehouse fast path if ‘container’ field is empty in Blob storage connection
- Sync: Fixed Redshift-to-S3 fast path with non-equals partitioning dependencies
- Fixed import of a project’s discussions when importing a project created with a previous DSS version
- Fixed broken link when mentioning a user with a ‘.’ in their name
- Preserved comment dates when migrating to discussions
- Fixed inbox when number of watched objects is above 1024
- After migration, a project level discussion is now markable as read
- Enabled direct Parquet reading and writing in Spark when the Parquet files have the “spark_schema” type
- Fixed Hadoop installation script on Redhat 6
- Fixed usage of advanced properties in Impala connection
- In the “tags” flow view, show colors for nodes that have multiple tags but only one of the selected ones
- Properly highlight managed folders in the “Connections” flow view
- Fixed model resuming when using grid search and a maximum number of iterations
- Restore grid search parameters when reverting the design to a specific model
- Fixed ‘View origin analysis’ link of saved models after importing a project with a different project key
- Fixed error in documentation of custom prediction API endpoints
- Added automatic update of the detected type when changing the processing engine
- Fixed color palette in scatter chart when using logarithmic scale and diverging mode
- Fixed total record counts display on 2D distribution and boxplot charts filters
- Fixed quantiles mode in 2D distribution charts
- New feature: “Edit in safe mode” does not load the webapp frontend or backend, in order to be able to fix crashing issues
- Fixed truncated display in RMarkdown reports view
- Fixed ‘Create RMarkdown export step’ scenario step when the view format is the same as the download format
- Fixed RMarkdown attachments in scenario mails that could send stale versions of reports
- Multi-user-security: add ability for regular users (i.e. without “Write unsafe code”) to write RMarkdown reports
- Multi-user-security: Fixed RMarkdown reports snapshots
- Fixed ‘New snapshot’ button on RMarkdown insight
- Fixed scrolling issue in dashboards
- Preserve tile size when copying a tile to another slide
- Sort groups of a user in the user edit page
- Fixed SMTP channel authentication when the SMTP server configuration does not allow login and password to be provided
- Fixed broken ‘Advanced search’ link in the search side panel
- Fixed ‘list_articles’ method of public api python wrapper when using it on an empty wiki
- Fixed dataset types filtering in catalog
- Fixed long description editing of notebooks metadata
- Fixed various display issues of items lists
- Fixed built-in links to the DSS documentation
- Fixed support for Dutch and Portuguese stop words in Analyze box
- Allowed regular users (i.e. without “Write unsafe code”) to edit project-level Python libraries
- Allowed passing the desired type of output to the ‘dkuManagedFolderDownloadPath’ R API function
- Prevent possible memory overflow when computing metrics
DSS 5.0.0 is a major upgrade to DSS with significant new features. For a summary of these features, see: https://www.dataiku.com/learn/whatsnew
DSS now fully integrates deep learning capabilities to build powerful deep-learning models within the DSS visual machine learning component.
Deep learning in DSS is “semi-visual”:
- You write the code that defines the architecture of your deep learning model
- DSS handles all the rest (preprocessing data, feeding model, training, showing charts, integrating Tensorboard, …)
DSS Deep Learning is based on Keras with TensorFlow. You will mostly write Keras code to define your deep learning models.
DSS Deep Learning supports training on CPU and GPU, including multiple GPUs. Through container deployment capabilities, you can train and deploy models on cloud-enabled dynamic GPU clusters.
Please see Deep Learning for more information.
You can now run parts of the processing tasks of the DSS design and automation nodes on one or several hosts, powered by Docker or Kubernetes:
- Python and R recipes
- Plugin recipes
- In-memory machine-learning
This is fully compatible with cloud managed serverless Kubernetes stacks.
Please see Running in containers for more information.
Each DSS project now contains a Wiki. You can use the Wiki for documentation, organization, sharing, … purposes.
The DSS wiki is based on the well-known Markdown language.
In addition to writing Wiki pages, the DSS wiki features powerful capabilities like attachments and hierarchical taxonomy.
Please see Wikis for more information.
You can now have full discussions on any DSS object (dataset, recipe, …). Discussions feature rich editing capabilities, notifications, integrations, …
Discussions replace the old “comments” feature.
Please see Discussions for more information.
You can now organize projects on the projects list into hierarchical folders.
Please see Project folders for more information.
Dashboards can now be exported to PDF or image files in order to propagate information inside your organization more easily.
Dashboard exports can be:
- Created and downloaded manually from the dashboard interface
- Created automatically and sent by mail using the “mail reporters” mechanism in a scenario
- Created automatically and stored in a managed folder using a dedicated scenario step
See Exporting dashboards to PDF or images for more information.
It is now possible for administrators to automatically stop Jupyter notebooks that have been running or idle for too long, in order to conserve resources.