DSS 2.2 Release notes
- Migration notes
- Version 2.2.3 - January 22nd 2016
- Version 2.2.2 - December 10th 2015
- Version 2.2.1 - November 17th 2015
- Version 2.2.0 - November 11th 2015
Migration notes
Migration to DSS 2.2 from DSS 1.X is not supported. You should first migrate to the latest 2.0.X version. See DSS 2.0 Release notes.
- Automatic migration from Data Science Studio 2.1.X is fully supported.
- Automatic migration from Data Science Studio 2.0.X is supported, subject to the notes and limitations outlined in DSS 2.1 Release notes.
For automatic upgrade information, see Upgrading a DSS instance.
Version 2.2.3 - January 22nd 2016
DSS 2.2.3 is a bugfix release. For a summary of the major new features in the 2.2 series, see: https://www.dataiku.com/learn/whatsnew
- Fixed handling of timezones in Impala recipes (when running in “stream” mode) by working around a JDBC driver bug
- Added support for new S3 signature algorithms, enabling support for the newest AWS regions
- Removed excessive debug logging in ElasticSearch datasets
- Fixed the “Create recipe” button being unclickable on foreign datasets
- Fixed the ability to use Unicode column names in Python recipes
Version 2.2.2 - December 10th 2015
DSS 2.2.2 contains both bug fixes and new features. For a summary of the major new features in the 2.2 series, see: https://learn.dataiku.com/whatsnew
Experimental Spark-on-S3 and EMR support
DSS 2.2.2 adds experimental support for running Spark on S3 datasets and for running on Amazon EMR.
Experimental SFTP support
You can now create datasets over SFTP connections. These datasets are available through the “RemoteFiles” (i.e. with local cache) mode.
- S3 datasets can now target a custom endpoint
- Fixed several issues handling complex types, especially on Avro datasets
- Fixed case-sensitivity issues on Parquet datasets.
- Fixed support for timestamp columns in Impala notebooks and recipes
- Fixed issue with Kerberos connectivity
- Fixed mappings not being correctly propagated in ElasticSearch
- S3 now properly ignores hidden and special files
- Fixed S3 support for single-file datasets
- Fixed partitioned SQL query datasets
- Fixed computation of output probability centiles when working with K-Fold cross-test
- Fixed the ability to remove input datasets in the Join recipe
- Fixed the “Run” button in the R recipe editor
- Fixed “Push to editable” recipes always having the same name
- Fixed the ability to create recipes on foreign datasets
- Fixed the “Save” button misbehaving in SQL recipes
- Unavailable mass actions are no longer displayed
- Fixed MAX aggregation
- Fixed storage types for custom aggregations
- Window recipe: fixed time-range bounding on Vertica
Visual data preparation
- Fixed columns sometimes displaying incorrectly when using mass removal
- Fixed issue with numerical columns used as categorical
- Fixed a few issues on Safari
- Fixed “Add description” button on home page
Version 2.2.1 - November 17th 2015
DSS 2.2.1 is a bugfix release. For a summary of the new features in DSS 2.2.0, see: https://learn.dataiku.com/whatsnew
- Improved support for partitioning and fixed some related bugs
- Fixed bugs with boolean values in plugin configuration
- Variables are now expanded in plugin configuration
- New plugins are now immediately recognized
Version 2.2.0 - November 11th 2015
DSS 2.2.0 is a significant upgrade that brings major new features to DSS.
For a summary of the major new features, see: https://learn.dataiku.com/whatsnew
Prediction API server
DSS now comes with a full-featured API server, called DSS API Node, for real-time prediction of records.
With DSS alone, you can compute predictions for all records of an unlabeled dataset. Using the REST API of the DSS API node, you can request predictions for new, previously unseen records in real time.
The DSS API node provides high availability and scalability for scoring records.
For more information about the API node, see Real-time prediction and services.
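To make the real-time scoring flow more concrete, here is a minimal Python sketch that builds (but does not send) a prediction request with the standard library only. The endpoint URL, the feature names and the `{"features": {...}}` body shape are assumptions for illustration, not the documented API node contract; `build_prediction_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, features):
    """Build a POST request asking an API node to score one record.

    ASSUMPTION: the {"features": {...}} body shape is illustrative only;
    check the API node documentation for the real request format.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint URL and feature names, for illustration only.
req = build_prediction_request(
    "http://localhost:12000/hypothetical/predict",
    {"age": 42, "plan": "pro"},
)
# Sending it requires a running API node:
# with urllib.request.urlopen(req) as resp:
#     prediction = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to check without a live server.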
Window function recipe
DSS now has a new visual recipe to compute SQL-99 style analytic functions (also called window functions).
This visual recipe makes it easy to create moving averages, ranks, quantiles, and more.
It gives you the full power of your engine’s analytic support: multiple windows, unlimited sorting and partitioning, and more.
This recipe is available on all engines that support it:
- Most SQL databases
- Cloudera Impala
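For readers new to window functions, the kind of query this recipe is built around can be illustrated with SQLite (version 3.25 or later, as bundled with recent Python builds). The `sales` table and its columns are hypothetical, and this is not the SQL that DSS generates; it only shows two windows in one query (a 3-row moving average and a rank), each partitioned by store.

```python
import sqlite3

# Hypothetical sample data: (store, day, amount)
SAMPLE_ROWS = [
    ("A", 1, 10.0), ("A", 2, 20.0), ("A", 3, 30.0), ("A", 4, 40.0),
    ("B", 1, 5.0), ("B", 2, 15.0),
]


def window_query(rows):
    """Compute a moving average and a rank per store with SQL window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, day INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT store, day, amount,
               -- average of the current row and the two previous days
               AVG(amount) OVER (
                   PARTITION BY store ORDER BY day
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS moving_avg_3,
               -- 1 = highest amount within the store
               RANK() OVER (PARTITION BY store ORDER BY amount DESC) AS amount_rank
        FROM sales
        ORDER BY store, day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in window_query(SAMPLE_ROWS):
    print(row)
```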
Plugins
The user response to our plugins feature has been overwhelming. For DSS 2.2, we have listened to your feedback and made many enhancements to the plugins system.
- Activate plugin development tools directly within DSS.
- Edit all plugin files directly within DSS. No command line or vi required!
- Plugins can now retrieve many DSS configuration details: whether Spark or Impala is enabled, proxy settings, ...
- Plugin-level configuration, retrievable by all datasets and recipes of the plugin. This is useful for storing access credentials, for example.
- Support for partitioned datasets. See our partitioning tutorial (https://learn.dataiku.com/howto/other/partitioning/partitioning-redispatch.html) for more information.
- View the logs directly in the DSS UI
- The recipe UIs will now properly obey the role definitions
- The new APIs make it much easier to write recipes that automatically dispatch to one of several engines, depending on the dataset configuration and which features are enabled.
- Recipes can now read much more information about datasets (paths, files, DB details, ...). This makes it easy to submit connection details directly to a third-party execution engine instead of having the data go through the DSS engine.
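The engine-dispatch pattern described in the last two items can be sketched in plain Python. Everything here is hypothetical: the dictionary keys (`type`, `spark_enabled`, `impala_enabled`) and the engine names are placeholders, not the structures actually returned by `dataiku.get_dss_settings()` or the dataset APIs.

```python
from typing import Mapping

# Hypothetical set of dataset types that can run computation in-database.
SQL_TYPES = {"PostgreSQL", "Vertica", "Oracle", "MySQL"}


def choose_engine(dataset_info: Mapping, dss_settings: Mapping) -> str:
    """Pick an execution engine for a plugin recipe.

    All keys used here are hypothetical placeholders.
    """
    dataset_type = dataset_info.get("type")
    if dataset_type == "HDFS":
        # Prefer in-cluster engines when they are enabled.
        if dss_settings.get("impala_enabled"):
            return "impala"
        if dss_settings.get("spark_enabled"):
            return "spark"
    if dataset_type in SQL_TYPES:
        # Push the computation down to the database instead of streaming rows.
        return "sql"
    # Fallback: stream the data through the DSS engine.
    return "dss"


print(choose_engine({"type": "HDFS"}, {"spark_enabled": True}))
```

Keeping the decision in one pure function makes each branch easy to test without a running DSS instance.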
Long-running tasks infrastructure
DSS now has a new multi-process infrastructure for handling long-running tasks.
The key benefits are:
- DSS is now more resilient against various data issues that could previously cause crashes
- Aborting some long-running tasks, like computing a random sample on a SQL database, is now faster and rock-solid
- Each user can now list and abort all of their own tasks from a centralized screen. Administrators can do the same for all tasks
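The general idea behind running tasks out of process can be sketched with Python's `subprocess` module: a task that crashes or hangs in a child process does not take the parent down, and it can be aborted cleanly. This illustrates the technique only, not the actual DSS implementation; `run_task_with_abort` is a hypothetical helper.

```python
import subprocess
import sys


def run_task_with_abort(task_code, timeout_s):
    """Run Python code in a child process and abort it after timeout_s seconds.

    Returns the child's exit code: 0 on success, the child's own code if it
    failed, or a non-zero code (a negative signal number on POSIX) if aborted.
    """
    proc = subprocess.Popen([sys.executable, "-c", task_code])
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # ask the child to stop (SIGTERM on POSIX)
        return proc.wait()  # reap the child so no zombie process is left


# A crashing task only affects its own process...
print(run_task_with_abort("import sys; sys.exit(3)", timeout_s=30))
# ...and a hung task can be aborted without stopping the parent.
print(run_task_with_abort("import time; time.sleep(60)", timeout_s=0.5))
```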
SQL: dates without timezone
DSS now has full support for date columns without timezone in all SQL databases. Previously, support and handling differed depending on the database engine.
For all databases, dates without timezone can now be handled as strings, as server-local dates, or as dates in a user-specified timezone.
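The three handling modes can be illustrated with Python's `datetime` module: a naive (timezone-less) timestamp kept as text, read as server-local time, or read in a user-chosen timezone. The mode names and the `interpret_naive_timestamp` helper are hypothetical, and this shows the semantics only, not DSS code. A fixed UTC offset stands in for a user-specified timezone to avoid depending on a timezone database.

```python
from datetime import datetime, timedelta, timezone


def interpret_naive_timestamp(value, mode, user_tz=None):
    """Interpret a timezone-less SQL timestamp in one of three ways.

    mode: "string", "server_local" or "user_tz" (hypothetical names).
    """
    if value.tzinfo is not None:
        raise ValueError("expected a naive (timezone-less) datetime")
    if mode == "string":
        # Keep the value as text: no instant in time is implied.
        return value.isoformat(sep=" ")
    if mode == "server_local":
        # astimezone() on a naive datetime treats it as the machine's local time.
        return value.astimezone()
    if mode == "user_tz":
        if user_tz is None:
            raise ValueError("user_tz is required for mode='user_tz'")
        # Attach the user's timezone without shifting the wall-clock value.
        return value.replace(tzinfo=user_tz)
    raise ValueError(f"unknown mode: {mode!r}")


naive = datetime(2015, 11, 11, 10, 0)
print(interpret_naive_timestamp(naive, "string"))
print(interpret_naive_timestamp(naive, "user_tz", timezone(timedelta(hours=1))))
```

The same stored value maps to different instants depending on the mode, which is why the choice has to be explicit.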
Public API
The public API now includes a set of methods to interact with managed folders.
Internal Python API
- dataiku.get_dss_settings() (see the plugins section)
- Dataset.get_files_info() for new kinds of interaction with your datasets (for example, submitting connection details directly to a third-party execution engine instead of having the data go through the DSS engine)
- Dataset.get_formatted_data() to retrieve a dataset as a stream of bytes in a specific format
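Consuming a dataset delivered as a stream of bytes usually means decoding it incrementally. The sketch below assumes a CSV byte stream and uses `io.BytesIO` as a stand-in for whatever `Dataset.get_formatted_data()` returns; its real parameters and return type are not shown here, and `read_csv_stream` is a hypothetical helper.

```python
import csv
import io


def read_csv_stream(byte_stream, encoding="utf-8"):
    """Decode a binary CSV stream and return its rows as lists of strings."""
    # Wrap the binary stream so csv.reader can consume decoded text lazily.
    text = io.TextIOWrapper(byte_stream, encoding=encoding, newline="")
    try:
        return list(csv.reader(text))
    finally:
        # Detach so the wrapper does not close the caller's stream.
        text.detach()


# io.BytesIO stands in for the byte stream returned by the real API.
print(read_csv_stream(io.BytesIO(b"id,name\r\n1,caf\xc3\xa9\r\n")))
```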
Multi-connection SQL recipes
SQL query recipes can now optionally use datasets from multiple connections (of the same type) as inputs. For example, this lets you create a recipe that uses two databases on the same database server.
It is still your responsibility to write the proper SQL to actually do the cross-database lookups.
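SQLite's `ATTACH DATABASE` offers a small, self-contained analogy for such a cross-database query: two databases reachable from one connection, with table names qualified by database name. The `billing` database, the tables and the data are hypothetical. Real servers use their own qualification syntax (for example `database.schema.table` on SQL Server), which is why writing the cross-database SQL remains your responsibility.

```python
import sqlite3


def cross_database_totals():
    """Join tables that live in two different databases on one connection."""
    conn = sqlite3.connect(":memory:")  # the "main" database
    conn.execute("ATTACH DATABASE ':memory:' AS billing")  # a second database
    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 20.0)]
    )
    # The query itself must qualify each table with its database name.
    rows = conn.execute(
        """
        SELECT c.name, SUM(i.total)
        FROM main.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


print(cross_database_totals())
```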
- Fixed issues with partitioned VisualSQL recipes
- Fixed the vstack recipe on Oracle
- Fixed the grouping recipe on the DSS engine with custom aggregations
- Default format of S3 datasets is now compatible with Redshift
- Properly update the Hive metastore when the dataset schema changes
- Fixed SQL script on PostgreSQL with non-standard port
- Fixed Hive error reporting (line numbers were not propagated properly)
- Fixed hidden tooltips on published charts
- Fixed saving of “one tick per bin” option
- Fixed an error in clustering when columns are empty
- Fixed error when trying to upload a plugin without selecting a file.
- Fixed incorrect detection of the “Decimal french format”
- Fixed display of custom date formats on Firefox