DSS 1.1 Release notes¶

Version 1.1.5 - July 14th, 2014¶

Please see the 1.1.0 release notes for information about migration from 1.0.X.

Bug fixes¶

Fixed an issue in the timestamps system (introduced in 1.1.3) that caused the Flow to mistakenly consider datasets out of date and in need of a rebuild.
Improved the way timestamps are computed for managed SQL datasets. This avoids possible OutOfMemory errors with complex SQL-based flows, especially with N->1 partition dependencies (such as sliding days).
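To illustrate what an N->1 partition dependency such as "sliding days" means, here is a minimal sketch in which one output partition depends on several consecutive daily input partitions. The function name, the window size and the YYYY-MM-DD partition identifier format are assumptions made for this example; they are not taken from DSS.

```python
from datetime import date, timedelta
from typing import List


def sliding_days_inputs(output_day: date, window: int) -> List[str]:
    """Return the input partition ids that one output partition depends on.

    The window ends on output_day and covers `window` consecutive days.
    (Hypothetical helper, for illustration only.)
    """
    if window < 1:
        raise ValueError("window must be at least 1 day")
    days = [output_day - timedelta(days=offset) for offset in range(window)]
    # Oldest first, formatted as YYYY-MM-DD (an assumed partition id format).
    return [d.isoformat() for d in reversed(days)]


# One output partition with a 3-day window depends on 3 input partitions.
print(sliding_days_inputs(date(2014, 7, 14), window=3))
# prints ['2014-07-12', '2014-07-13', '2014-07-14']
```

With many output partitions and a wide window, the number of (input, output) pairs to track grows quickly, which is the kind of situation the fix above refers to.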

Version 1.1.4 - July 9th, 2014¶

Please see the 1.1.0 release notes for information about migration from 1.0.X.

New features¶

Support for Apache Cassandra

Enhancements¶

Each MapReduce job started from a Pig recipe will now have a proper name (recipe name + output partition)

Bug fixes¶

Fix various issues with pinlets layout on pinboard
Fix encoding issue with Python recipe in “write_tuples” mode
Don’t fail ElasticSearch synchronization if a column of type “map” contains bad data (invalid JSON / duplicate keys)
Fix custom Unicode characters in CSV separators
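The two kinds of bad "map" data mentioned in the ElasticSearch fix (invalid JSON and duplicate keys) can be detected with the Python standard library alone. The sketch below is not the DSS implementation, and the names parse_map_cell and DuplicateKeyError are hypothetical. It also assumes that a bad cell is simply skipped (returned as None); the release note does not say what DSS does with such values.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key."""


def _reject_duplicates(pairs):
    # json calls this hook with the (key, value) pairs of each object, in order.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def parse_map_cell(raw):
    """Return the parsed map, or None if the cell holds bad data."""
    try:
        parsed = json.loads(raw, object_pairs_hook=_reject_duplicates)
    except ValueError:
        # Covers json.JSONDecodeError (invalid JSON) and DuplicateKeyError.
        return None
    # A "map" cell must be a JSON object, not a list or a scalar.
    return parsed if isinstance(parsed, dict) else None


print(parse_map_cell('{"a": 1}'))          # prints {'a': 1}
print(parse_map_cell('{"a": 1, "a": 2}'))  # prints None
print(parse_map_cell('{not json'))         # prints None
```

Returning None instead of raising lets a caller carry on with the remaining rows rather than aborting the whole job.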

Version 1.1.3 - July 1st, 2014¶

Please see the 1.1.0 release notes for information about migration from 1.0.X.

Enhancements¶

New storage system for Flow states. Recomputation of Flow dependencies is now much faster in the presence of a large number of partitions.

Bug fixes¶

Fix publication to pinboard of “web app” insights
Excel extractor: Fix formatting of dates in formula-computed cells
Fix scroll in various list screens
Improve data preparation behaviour on Firefox
Fix various interface bugs

Version 1.1.2 - June 17th, 2014¶

Please see the 1.1.0 release notes for information about migration from 1.0.X.

Bug fixes¶

Python processes for models now automatically select their ports to avoid possible port conflicts
Scrollbar in SQL notebook history has been fixed
SQL script recipes that contain multi-line PL/pgSQL stored procedures now work properly
The Redetect button of models now always works properly
The Hive recipe validator now properly handles numerical ${hiveconf:} expansions
The “Create insight” button in dataset menu has been fixed
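The first fix in this list concerns processes that select their own ports. One common way to do this, shown here as a sketch and not necessarily what DSS does, is to bind to port 0 and let the operating system choose a free port. The function name pick_free_port is hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to choose any free ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = pick_free_port()
print(f"selected port {port}")
```

The port is released when the socket closes, so another process could claim it before it is reused. A server that needs a firm guarantee would keep the bound socket open and listen on it directly.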

Version 1.1.1 - June 6th, 2014¶

Please see the 1.1.0 release notes for information about migration from 1.0.X.

Bug fixes¶

Fix an issue when recipes in different projects share the same name
Fix Hive schema detection in SELECT DISTINCT queries
Fix issues when editing preparation script titles
Fix custom JDBC properties

Version 1.1.0 - May 23rd, 2014¶

Important notes about migration¶

Automatic migration of data from Data Science Studio 1.0.X is supported, with the exception of the “Models” part. Models created with Data Science Studio 1.0.X cannot be used with DSS 1.1.

The automatic data migration procedure is documented in Upgrading a DSS instance

After upgrading, the modification dates of datasets and recipes will not be correct until you edit them.

After upgrading, your datasets will be considered out of date and will need to be rebuilt.

The “dku build” command now requires a “fully qualified” dataset name, i.e. PROJECTKEY.datasetName
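Scripts that used to pass a bare dataset name can build the new PROJECTKEY.datasetName form with a small helper. The sketch below only handles the name itself. Both function names are hypothetical, and the rule that a project key contains no dot is an assumption, not something stated in these notes.

```python
def qualify_dataset_name(project_key, dataset_name):
    """Build a PROJECTKEY.datasetName string from its two parts."""
    if not project_key or not dataset_name:
        raise ValueError("both project key and dataset name are required")
    if "." in project_key:
        # Assumption: project keys never contain a dot.
        raise ValueError("project key must not contain a dot")
    return f"{project_key}.{dataset_name}"


def split_qualified_name(qualified):
    """Split on the first dot; reject names that are not fully qualified."""
    project_key, sep, dataset_name = qualified.partition(".")
    if not sep or not project_key or not dataset_name:
        raise ValueError(f"not a fully qualified dataset name: {qualified!r}")
    return project_key, dataset_name


print(qualify_dataset_name("PROJECTKEY", "datasetName"))
# prints PROJECTKEY.datasetName
```

The resulting string is the value to pass wherever the command previously accepted the dataset name alone.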

The following deprecated features have been removed: “simple partition deps” in recipes JSON and “schemaInherits” in datasets JSON

As usual, we strongly recommend that you perform a full backup of your Data Science Studio data directory prior to starting the upgrade procedure.

New features¶

General¶

A new projects system was introduced to ease working and collaborating on multiple projects within Data Science Studio
New collaboration features (tags, descriptions, timelines & activity feeds, notifications and comments) let you easily collaborate with your team
A brand new navigation system provides a smoother user experience and better productivity

Insights¶

You can now create “insights” from visual charts, datasets, IPython notebooks and custom HTML visualizations
Insights can be published on the “Pinboard”, where collaborators can view them and interact with the data analysts

Data preparation¶

A new rendering system makes it much faster to work with datasets with hundreds of columns.
The custom formula editor is much simpler to use
New UX and documentation for all processors

Machine Learning¶

Our guided machine learning has been completely overhauled. It supports a vast choice of algorithms and parameters. You now have the ability to perform many runs and easily compare them. When you are satisfied with a model that you created, you are just one click away from using it in your Data Flow for semi-automatic prediction or even re-training.

Notebooks¶

SQL notebooks can now use Impala. Impala databases are automatically detected without any configuration. Furthermore, Impala databases are automatically refreshed when an HDFS dataset is built by Data Science Studio

Flow and recipes¶

New rebuild modes have been introduced: a recursive force-rebuild of all datasets, and a “lenient rebuild” in which out-of-date datasets are not recomputed and only non-existing datasets are built
Much faster Hive metastore synchronization with many partitions: metastore synchronization is now incremental. The synchronization process also detects schema changes and performs a full resync in that case
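The difference between the rebuild modes can be summarised as a selection rule over dataset states. The sketch below is an illustration only: the State and Mode enums and the function name are hypothetical, and the DEFAULT mode (rebuild datasets that are missing or out of date) is an assumed baseline that these notes do not spell out. The input is assumed to be the target dataset together with its upstream datasets, with dependencies already resolved.

```python
from enum import Enum


class State(Enum):
    MISSING = "missing"
    OUT_OF_DATE = "out_of_date"
    UP_TO_DATE = "up_to_date"


class Mode(Enum):
    DEFAULT = "default"  # assumed baseline: rebuild missing and out-of-date
    FORCE = "force"      # rebuild every dataset
    LENIENT = "lenient"  # rebuild only datasets that do not exist yet


def datasets_to_rebuild(states, mode):
    """Return the sorted names of the datasets selected by the given mode."""
    if mode is Mode.FORCE:
        wanted = set(State)
    elif mode is Mode.LENIENT:
        wanted = {State.MISSING}
    else:
        wanted = {State.MISSING, State.OUT_OF_DATE}
    return sorted(name for name, state in states.items() if state in wanted)


states = {
    "raw": State.UP_TO_DATE,
    "cleaned": State.OUT_OF_DATE,
    "report": State.MISSING,
}
print(datasets_to_rebuild(states, Mode.FORCE))    # prints ['cleaned', 'raw', 'report']
print(datasets_to_rebuild(states, Mode.LENIENT))  # prints ['report']
```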

Visual analytics¶

When visualizing a SQL dataset (provided that you use the whole dataset rather than a sample), the visualization natively uses the SQL database for optimized computations.
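The general idea of computing a chart inside the database can be sketched with Python's built-in sqlite3 module: the aggregation runs as a SQL query, so only one row per chart bar leaves the database. This shows the principle only and is not how DSS generates its queries. The table, its columns and the sample rows are made up for the example.

```python
import sqlite3

# In-memory database with a few made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0)],
)


def chart_series_in_database(conn):
    """Average amount per region, aggregated by the database itself."""
    query = (
        "SELECT region, AVG(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    # Only the aggregated rows are fetched, not the whole table.
    return conn.execute(query).fetchall()


print(chart_series_in_database(conn))
# prints [('north', 20.0), ('south', 5.0)]
```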

Major bugs fixed¶

Several SQL type mapping issues were fixed