DSS 5.0 Release notes

Migration notes

Migration paths to DSS 5.0

OS and Hadoop deprecations

As previously announced, DSS 5.0 removes support for some OS and Hadoop distributions.

Support for the following OS versions is now removed:

  • Redhat/Centos/Oracle Linux 6 versions strictly below 6.8
  • Redhat/Centos/Oracle Linux 7 versions strictly below 7.3
  • Ubuntu 14.04
  • Debian 7
  • Amazon Linux versions strictly below 2017.03

Support for the following Hadoop distribution versions is now removed:

  • Cloudera distribution for Hadoop versions strictly below 5.9
  • HDP versions strictly below 2.5
  • EMR versions strictly below 5.7
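
The "strictly below" thresholds in the two lists above can be checked with a simple tuple comparison. The following is a minimal sketch, not part of DSS: the platform labels and function names are hypothetical, and only the threshold numbers are taken from these notes.

```python
# Minimum versions that remain supported, copied from the lists above.
# The dictionary keys are illustrative labels, not identifiers used by DSS.
MINIMUM_SUPPORTED = {
    "rhel6": (6, 8),
    "rhel7": (7, 3),
    "amazon-linux": (2017, 3),
    "cdh": (5, 9),
    "hdp": (2, 5),
    "emr": (5, 7),
}


def parse_version(text):
    """Turn a dotted version string such as '7.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_supported(platform, version):
    """Return True unless `version` is strictly below the platform minimum."""
    return parse_version(version) >= MINIMUM_SUPPORTED[platform]


if __name__ == "__main__":
    for platform, version in [("rhel7", "7.2"), ("rhel7", "7.10"), ("cdh", "5.9")]:
        print(platform, version, is_supported(platform, version))
```

Comparing tuples of integers rather than strings or floats matters here: 7.10 is a later release than 7.3, which a textual or floating-point comparison would get wrong.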

R deprecation

As previously announced, support for the following R versions is now removed:

  • R versions strictly below 3.4

Java 7 deprecation notice and feature restrictions

As previously announced, support for Java 7 is now deprecated and will be removed in a later release.

As of DSS 5.0, the following features are no longer available when DSS runs on Java 7:

  • Reading of GeoJSON files
  • Reading of Shapefiles
  • Geographic charts (all types)

How to upgrade

It is strongly recommended that you perform a full backup of your DSS data directory prior to starting the upgrade procedure.
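
One way to take such a backup is to archive the whole data directory before upgrading. The sketch below is only an illustration using the Python standard library: the function name and both paths are hypothetical, and it assumes that nothing is writing to the data directory while the archive is created.

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir, backup_root):
    """Write a timestamped .tar.gz of `data_dir` into `backup_root`.

    Both arguments are placeholders for your own paths. The backup
    location must be outside the data directory, otherwise the archive
    would try to include itself.
    """
    data_dir = Path(data_dir).resolve()
    backup_root = Path(backup_root).resolve()
    if backup_root == data_dir or data_dir in backup_root.parents:
        raise ValueError("backup_root must not be inside data_dir")

    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_root / f"{data_dir.name}-{stamp}.tar.gz"

    # arcname keeps paths in the archive relative to the directory name.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(data_dir), arcname=data_dir.name)
    return archive
```

For example, `backup_data_dir("/path/to/dss_data", "/path/to/backups")` would return the path of the archive it created; both paths here are placeholders.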

For automatic upgrade information, see Upgrading a DSS instance.

Pay attention to the warnings described in Limitations and warnings.

Limitations and warnings

Automatic migration from previous versions is supported, but there are a few points that need manual attention.

Java 7 restrictions

See the Java 7 deprecation notice in the migration notes above.

Retrain of machine-learning models

  • Models trained with prior versions of DSS should be retrained when upgrading to 5.0. The usual limitations on retraining models and regenerating API node packages apply (see Upgrading a DSS instance). This includes models deployed to the flow (re-run the training recipe), models in analysis (retrain them before deploying) and API package models (retrain the flow saved model and build a new package).

Version 5.0.0 - July 25th, 2018

DSS 5.0.0 is a major upgrade to DSS that brings significant new features. For a summary of these features, see: https://www.dataiku.com/learn/whatsnew

New features

Deep learning

DSS now integrates deep learning: you can build deep learning models within the DSS visual machine learning component.

Deep learning in DSS is “semi-visual”:

  • You write the code that defines the architecture of your deep learning model
  • DSS handles the rest (preprocessing data, feeding the model, training, showing charts, integrating Tensorboard, …)

DSS Deep Learning is based on Keras with TensorFlow. You will mostly write Keras code to define your deep learning models.

DSS Deep Learning supports training on CPU and GPU, including multiple GPUs. Through container deployment capabilities, you can train and deploy models on cloud-enabled dynamic GPU clusters.

Please see Deep Learning for more information.

Containerized execution on Docker and Kubernetes

You can now run parts of the processing tasks of the DSS design and automation nodes on one or several hosts, powered by Docker or Kubernetes:

  • Python and R recipes
  • Plugin recipes
  • In-memory machine-learning

This is fully compatible with cloud-managed serverless Kubernetes stacks.

Please see Running in containers for more information.

Wiki

Each DSS project now contains a Wiki. You can use the Wiki for purposes such as documentation, organization and sharing.

The DSS wiki is based on the well-known Markdown language.

Beyond plain Wiki pages, the DSS wiki offers capabilities such as attachments and a hierarchical taxonomy.

Please see Wikis for more information.

Discussions

You can now have full discussions on any DSS object (dataset, recipe, …). Discussions feature rich editing capabilities, notifications, integrations, …

Discussions replace the old “comments” feature.

Please see Discussions for more information.

New homepage and navigation

The homepage of DSS has been revamped to show each user the most relevant items.

The homepage will show recently used and favorite items first. It shows projects, dashboards and wikis, but also individual items (recipes, datasets, …) for quick deep links.

In addition, the global navigation of DSS has been overhauled, with new menus and better organization.

Grouping projects into folders

You can now organize projects on the projects list into hierarchical folders.

Please see Project folders for more information.

Dashboard exports

Dashboards can now be exported to PDF or image files in order to propagate information inside your organization more easily.

Dashboard exports can be:

  • Created and downloaded manually from the dashboard interface
  • Created automatically and sent by mail using the “mail reporters” mechanism in a scenario
  • Created automatically and stored in a managed folder using a dedicated scenario step

See Exporting dashboards to PDF or images for more information.

Resource control

DSS now integrates with the Linux cgroups functionality in order to restrict resource usage per project, user, category, … and to protect DSS against memory overruns.

See Using cgroups for resource control for more information.
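
To give an idea of the kernel mechanism involved, the sketch below creates a memory cgroup and caps it, assuming the cgroup v1 layout. It does not show how DSS is configured: the function name and group name are hypothetical. Only the control file names (`memory.limit_in_bytes`, `cgroup.procs`) come from the Linux cgroup v1 interface.

```python
from pathlib import Path


def create_memory_cgroup(controller_root, name, limit_bytes, pid=None):
    """Create a cgroup v1 memory group capped at `limit_bytes`.

    `controller_root` is where the memory controller is mounted, commonly
    /sys/fs/cgroup/memory on cgroup v1 systems (an assumption to check on
    your host). Writing there normally requires root privileges.
    """
    group = Path(controller_root) / name
    # On a real cgroup filesystem, creating the directory creates the group
    # and the kernel populates its control files.
    group.mkdir(parents=True, exist_ok=True)
    (group / "memory.limit_in_bytes").write_text(str(limit_bytes))
    if pid is not None:
        # Moving a process into the group makes the limit apply to it.
        (group / "cgroup.procs").write_text(str(pid))
    return group
```

A call such as `create_memory_cgroup("/sys/fs/cgroup/memory", "example-group", 4 * 1024**3, pid=12345)` would cap that process at 4 GiB; the group name and PID are placeholders.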

Other notable enhancements

Support for culling of idle Jupyter notebooks

Administrators can now automatically stop Jupyter notebooks that have been running or idle for too long, in order to conserve resources.
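
The culling rule described above, which stops a notebook once it has been running or idle for too long, can be sketched as follows. This illustrates the logic only, not the DSS implementation: the class, field and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NotebookSession:
    name: str
    started_at: float     # epoch seconds when the kernel was started
    last_activity: float  # epoch seconds of the last recorded activity


def sessions_to_stop(sessions, now, max_age_s, max_idle_s):
    """Return names of sessions that exceed either the age or the idle limit."""
    return [
        s.name
        for s in sessions
        if now - s.started_at > max_age_s or now - s.last_activity > max_idle_s
    ]


if __name__ == "__main__":
    now = 100_000.0
    sessions = [
        NotebookSession("fresh", started_at=now - 600, last_activity=now - 60),
        NotebookSession("idle", started_at=now - 7_200, last_activity=now - 7_000),
        NotebookSession("old", started_at=now - 90_000, last_activity=now - 10),
    ]
    # Limits of one day of running time and one hour of idleness.
    print(sessions_to_stop(sessions, now, max_age_s=86_400, max_idle_s=3_600))
```

The two limits are independent: a notebook that is still active but has run longer than the age limit is selected, as is a recently started one that has gone idle.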

Support for XGBoost on GPU

With an additional setup step, it is now possible for models trained with XGBoost to use GPUs for faster training.