DSS 1.2 Release notes¶

Version 1.2.4 - September 25th, 2014¶

Bug fixes¶

Data preparation¶

Fix filter returning invalid results on first call in some rare cases
Added Indian holidays to the holidays processor
Fix an out of memory condition that could happen after several days of usage
Fix overflow with long column names in “Remove columns” processor
Fix caching issues for charts with Live Processing

Flow¶

Fix scrolling in recipe IO screens
Hive recipes and notebooks can now use “pre-init” and “post-init” scripts to deal with authorization issues
Fix date blacklist on non-UTC machines
Fix failure of Hive recipes whenever an Avro dataset existed
Add support for SQL schemas when syncing to Vertica

Notebooks¶

Fix scrolling issues
Improve handling of logs for Hive

Modeling¶

Fix the cutoff selector

Version 1.2.3 - August 26th, 2014¶

Please see the 1.2.0 release notes for information about migrations.

New features¶

New processor to extract information about bank holidays, school holidays and weekends in several countries

Enhancements¶

Improved performance for prediction/scoring/clustering recipes

Bug fixes¶

Core¶

Fixed installation on CentOS following IUS repository location change
Fixed SQL datasets when name contains some reserved keywords
Fixed various scrolling issues
Fixed Community Edition on OpenVZ machines
Improved startup scripts

Data preparation and visualization¶

Live Processing is no longer used for charts on partitioned datasets
Fixed “Analyse” in explore view for numerical columns containing NULL values
Fixed Stemming option in Tokenizer processor
Fixed Fuzzy matching when used in Recipe

Recipes/Flow¶

Fixed compatibility issues with Pig 0.13
Fixed “View in Flow” in Chrome
Fixed “Run” button in Recipes Actions

Machine Learning¶

Fixed “Display outliers” in Clustering results scatter plot

Version 1.2.2 - August 8th, 2014¶

Please see the 1.2.0 release notes for information about migrations.

Enhancements¶

Prediction / Clustering scoring recipe works with datasets of any size.
Added write_dataframe in dataset writers.

Bug fixes¶

Fixed Hive script validator compatibility with Hive 0.13
Fixed abnormally high UI memory usage in some cases.
Fixed incorrect SQL types mapping (PostgreSQL/Vertica/Redshift).
Fixed issue in SQL-backed charts when the Impala JDBC driver cannot be loaded.
Fixed a bug preventing file selection in connection’s root folder (S3/HDFS).
Fixed “raw values” numerical axis in charts.
Unicode characters are now allowed in target values of prediction models.
Worked around Pandas bug when loading large datasets.
Fixed the profiling feature functionality in clustering.
Scoring recipes now append all columns to the output rather than only INPUT columns.

Version 1.2.1 - July 29th, 2014¶

Please see the 1.2.0 release notes for information about migrations.

Bug fixes¶

Fixed various UI issues.
Fixed charts in exported project.
Fixed various Unicode issues in Python and models.
Temporary files generated by R recipes are now cleaned up.

Enhancements¶

Switched to FreshDesk.
Better support of Hive 0.13.
Experimental Oracle and Sybase support.
Improved support of MapR and CDH5.
Dataiku’s Hive UDF jar is loaded in recipes and SQL notebooks by default.

Version 1.2.0 - July 21st, 2014¶

Important notes about migration¶

The automatic data migration procedure is documented in Upgrading a DSS instance.

As usual, we strongly recommend that you perform a full backup of your Data Science Studio data directory prior to starting the upgrade procedure.

Automatic migration of data from Data Science Studio 1.1.X is supported, with the following restrictions and warnings:

Notebooks created from trained models using Impact Coding will not work anymore. Please remove the impact coding section.
SQL identifiers are now quoted and case-preserving across the studio. This can cause issues with pre-existing mixed-case tables and columns.

If you have mixed-case managed SQL datasets, it is recommended to lowercase the table name in the dataset configuration to avoid another table being created.

If you have mixed-case column names, you might need to recompute your table.
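To illustrate why quoted, case-preserving identifiers can miss pre-existing tables, here is a minimal Python sketch. It assumes a database that folds unquoted identifiers to lowercase, as PostgreSQL does (other databases fold differently, for example to uppercase). The helper names and the table name are hypothetical and are not part of DSS.

```python
def quote_identifier(name: str) -> str:
    """Quote an identifier the standard SQL way: wrap it in double
    quotes and double any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def fold_unquoted(name: str) -> str:
    """Mimic a database that folds unquoted identifiers to lowercase
    (assumption: PostgreSQL-style folding)."""
    return name.lower()


table = "CustomerOrders"  # hypothetical mixed-case table name

# A table created without quotes is stored under the folded name.
print(fold_unquoted(table))      # customerorders

# A quoted reference keeps its case, so it targets a different table.
print(quote_identifier(table))   # "CustomerOrders"
```

Under this assumption, a table that was created unquoted exists as `customerorders`, while the quoted name refers to `CustomerOrders`. Lowercasing the table name in the dataset configuration makes both forms resolve to the same table.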

For migrations from Data Science Studio 1.0.X, please also see the release notes of version 1.1.0.

New features¶

Please also see our Blog Post for more information.

General¶

DSS now features a free Community Edition which allows you to use the Studio without any time limit
DSS now integrates a scheduler for automatic dataset builds. The scheduler leverages incremental build support so that predictions are recomputed and rebuilt only on new or changed data.
Support for complex types in dataset schemas has been improved. Complex types (including nested maps, nested arrays and nested objects) are now properly represented in Hive, MongoDB, ElasticSearch and JSON
Many non-fatal errors that can occur while a job is processing are now reported in a “Warnings” tab at the end of the job, so that they are easy to review

Insights¶

A whole new web app editor has been introduced. It features much improved ergonomics, integrated code templates and snippets, and the ability to write custom Python-based backends for even more interactive web apps.
Various improvements have been made to the visual appearance of the pinboard
It is now possible to choose where to display the title of insights, and whether to display the insight description

Data Preparation¶

The date parser processor now supports multiple formats. Formats are tested in order until a matching format is found
New processor: Geocoder, based on the MapQuest or Bing API.
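The in-order format matching described for the date parser can be sketched in Python as follows. This is only an illustration of the "try each format until one matches" idea. It uses Python's `strptime` directives rather than the processor's own format syntax, and the function name and format list are hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def parse_with_formats(value: str, formats: Sequence[str]) -> Optional[datetime]:
    """Return the result of the first format that parses `value`,
    or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
    return None


# Hypothetical format list; order matters when a value is ambiguous.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]

print(parse_with_formats("2014-07-21", FORMATS))
print(parse_with_formats("25/09/2014", FORMATS))
print(parse_with_formats("09/25/2014", FORMATS))
print(parse_with_formats("not a date", FORMATS))
```

Because formats are tried in order, an ambiguous value such as `03/04/2014` is read with the first matching format (day first in this list), so the most specific or most likely format should come first.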

Machine Learning¶

Multiclass classification has been introduced
Some Machine Learning algorithms can take quite a long time to run. Data Science Studio now features integration with H2O, a distributed machine learning framework. This integration brings distributed implementations of a selected set of machine learning algorithms (Random Forest, Deep Learning, Gradient Boosting Methods).

Flow and recipes¶

A new “Sampling” recipe lets you create a dataset which is a sample extracted from an input dataset.
The Python recipe now comes with a large number of fully documented code samples

Datasets and connection¶

New experimental support for Teradata
Schemas are now supported in SQL databases. All parts of the Studio that deal with SQL tables are now schema-aware
Support for Apache Cassandra datasets (both read and write, partitioned and unpartitioned)

Notebooks¶

The SQL / Hive / Impala notebook has been completely overhauled for better ergonomics

Visual analytics¶

Better information and more choices for switching between the internal DSS engine and the native SQL engine for charts
When visualizing an HDFS dataset, if you use the whole dataset and Impala is installed, Impala is used to compute the aggregations
Charts based on SQL now support Greenplum and Redshift