DSS 1.3 Release notes

Version 1.3.2 - November 12th, 2014

For information about migration, please see the 1.3.0 release notes.

Bug fixes

Core

  • Support for Red Hat Enterprise Linux 6.6 and above, and CentOS 6.6 and above

Datasets

  • Fixed various issues with Cassandra datasets

Flow

  • Fixed the UI for the sliding_days dependency

  • Fixed the splitting of statements on ‘;’ in Hive recipes

  • Fixed the handling of multi-dimensional partitioning in the scheduler

  • Fixed “explicit” rebuild of datasets

Version 1.3.1 - October 27th, 2014

For information about migration, please see the 1.3.0 release notes.

Bug fixes

Datasets

  • Fixed writing to S3 datasets when target data did not already exist

  • Fixed writing CSV datasets with Unicode-encoded separators

Machine learning

  • Fixed logistic regression in multiclass mode

  • Fixed a target remapping issue that could lead to incomplete result screens for multiclass models with null values

  • Fixed switching a model from regression to multiclass mode

  • Fixed the model comparison screen

Misc

  • Fixed a sizing issue that could lead to unreadable categorical analysis on numerical columns

Version 1.3.0 - October 23rd, 2014

Important notes about migration

The automatic data migration procedure is documented in Upgrading a DSS instance.

As usual, we strongly recommend that you perform a full backup of your Data Science Studio data directory prior to starting the upgrade procedure.

Automatic migration of data from Data Science Studio 1.2.X is supported, with the following restrictions and warnings:

  • It is strongly recommended to retrain all machine learning models, as new features have been added. Models that are not retrained might fail to display or be unusable for scoring.

  • JSON datasets now have built-in protections against huge files. You might need to adjust the nested array limits in the Dataset settings screen.

  • After migration, old job logs are no longer available in the UI. They remain available on disk.

For migrations from Data Science Studio 1.1.X, please also see the release notes of version 1.2.0.

New features

Please also see our Blog Post for more information.

General

  • New R support:

    • R recipe to read and write datasets using our new R API

    • R notebook for interactive work

  • Visual editor for complex types

Datasets and connections

  • DSS now supports many new Hadoop-related formats:

    • Parquet

    • Sequence File

    • RC File

    • ORC File

    All of these can be used in Hive recipes and notebooks, and some of them also in Pig recipes and Impala notebooks.

  • Avro support has been greatly expanded and now supports nested complex types. Avro datasets can now be used in Hive recipes.

  • Complex types are now properly preserved when writing to MongoDB.

Machine Learning

  • We have introduced brand new clustering results screens. With new visualizations, they help you understand in much more detail what differentiates your clusters.

  • Support for Text features has been added. Text features are processed using the Bag of Words model.

  • Random Forest models now automatically adjust the number of trees to the optimal value.

  • You can now write your own models in Python and use them in the visual interface

  • Large performance enhancements for scoring recipes.
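The Text features bullet above says text is processed using the Bag of Words model. As a generic illustration only (these notes do not describe DSS's actual tokenization, vocabulary limits or weighting), a minimal Python sketch of the technique maps each text to a vector of word counts over a vocabulary built from the training texts. The helpers `tokenize`, `build_vocabulary` and `bag_of_words` are hypothetical names, not part of any DSS API.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on runs of non-word characters,
    # drop empty tokens produced by leading/trailing punctuation.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def build_vocabulary(documents):
    # Assign a stable column index to every distinct token seen in training.
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(text, vocabulary):
    # Count token occurrences; tokens unseen during training are ignored.
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        idx = vocabulary.get(tok)
        if idx is not None:
            vector[idx] = n
    return vector


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat on the cat"]
    vocab = build_vocabulary(docs)
    print(vocab)  # {'cat': 0, 'dog': 1, 'on': 2, 'sat': 3, 'the': 4}
    print(bag_of_words("the dog sat on the cat", vocab))  # [1, 1, 1, 1, 2]
```

Word order is discarded by design: each text becomes a fixed-length numeric vector that a standard model can consume alongside other features.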

Flow and recipes

  • New “Preview” mode for jobs:

    • You can now see which datasets will be rebuilt and why

    • You can disable parts of a job that you want to skip

  • From the Flow screen, you can now see which datasets are being built, with links to the latest and current jobs

  • You can now edit schemas and check their consistency directly from the recipe screens

  • The new Time Range dependency function brings much more flexibility for time-partitioned datasets.

  • You can now “write-protect” a dataset so that its already-computed partitions are never automatically recomputed. This is especially useful for slow recipes or when you use partitioning for historization.

  • Python recipes can now read and write datasets of any size. Writing was previously limited by the disk size of the DSS host.

Notebooks

  • New R notebook for interactive work on DSS datasets with R

  • In IPython notebooks, you can now request a sample of the dataset; only the sample is loaded.

Insights

  • “Dataset” insights now feature a preview of the dataset directly from within the Pinboard.

  • New JavaScript API for Web app insights: you can now request subsamples of datasets from your Web apps.

Enhancements

Flow

  • Better validation and checks on variable substitutions

Major bug fixes

  • Fixed an issue where, under certain circumstances, training recipes produced invalid results