DSS 9.0 Release notes
From DSS 8.0: Automatic migration is supported, with the restrictions and warnings described in Limitations and warnings
From DSS 7.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 7.0 -> 8.0
From DSS 6.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 6.0 -> 7.0, 7.0 -> 8.0
From DSS 5.1: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0
From DSS 5.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0
From DSS 4.3: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0
From DSS 4.2: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.2 -> 4.3, 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0
From DSS 4.1: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.1 -> 4.2, 4.2 -> 4.3, 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0
From DSS 4.0: Automatic migration is supported. In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.0 -> 4.1, 4.1 -> 4.2, 4.2 -> 4.3, 4.3 -> 5.0, 5.0 -> 5.1, 5.1 -> 6.0, 6.0 -> 7.0, 7.0 -> 8.0
Migration from DSS 3.1 and below is not supported. You must first upgrade to 5.0. See DSS 5.0 Release notes
It is strongly recommended that you perform a full backup of your DSS data directory prior to starting the upgrade procedure.
For automatic upgrade information, see Upgrading a DSS instance.
Pay attention to the warnings described in Limitations and warnings.
Automatic migration from previous versions (see above) is supported. Please pay attention to the following removal and deprecation notices.
dataikuapi.dss.apideployer.DSSAPIDeployerService.import_version() no longer takes version_id as a parameter.
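For illustration, a minimal sketch of uploading a version under the changed signature (host, API key, service ID and package path are hypothetical; presumably the version identifier now comes from the package itself rather than being passed as an argument):

```python
import dataikuapi

# Hypothetical connection details, for illustration only
client = dataikuapi.DSSClient("https://dss.example.com:11200", "my_api_key")
service = client.get_apideployer().get_service("my_service_id")

# Pass the version package as a file object; version_id is no longer a parameter
with open("my_service_v4.zip", "rb") as fp:
    service.import_version(fp)
```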
Some features that were previously announced as deprecated are now removed or unsupported.
Support for RedHat 6, CentOS 6 and Oracle Linux 6 is removed
Support for Amazon Linux 2017.XX is removed
Support for Spark 1 (1.6) is removed. We strongly advise you to migrate to Spark 2. All supported Hadoop distributions can use Spark 2.
Support for Pig is removed
Support for machine learning through Vertica Advanced Analytics is removed. We recommend that you switch to in-memory machine learning models. In-database scoring of in-memory-trained models remains available
Support for Hive SequenceFile and RCFile formats is removed
DSS 9.0 deprecates support for some features and versions. Support for these will be removed in a later release.
Support for Ubuntu 16.04 LTS is deprecated and will be removed in a future release
Support for Debian 9 is deprecated and will be removed in a future release
Support for SuSE 12 SP2, SP3 and SP4 is deprecated and will be removed in a future release. SuSE 12 SP5 remains supported
Support for Amazon Linux 1 is deprecated and will be removed in a future release.
Support for Hortonworks HDP 2.5 and 2.6 is deprecated and will be removed in a future release. These platforms are no longer supported by Cloudera.
Support for Cloudera CDH 5 is deprecated and will be removed in a future release. This platform is no longer supported by Cloudera.
Support for EMR below 5.30 is deprecated and will be removed in a future release.
Support for Elasticsearch 1.x and 2.x is deprecated and will be removed in a future release.
As a reminder from DSS 7.0, support for “Hive CLI” execution modes for Hive is deprecated and will be removed in a future release. We recommend that you switch to HiveServer2. Please note that “Hive CLI” execution modes are already incompatible with User Isolation Framework.
As a reminder from DSS 7.0, support for Microsoft HDInsight is deprecated and will be removed in a future release. We recommend that users plan a migration toward a Kubernetes-based infrastructure.
DSS 9.0.4 is a significant new release with new features, performance enhancements, and bugfixes.
New feature Experimental support for leveraging Snowflake Java UDF for faster (up to 3x) in-database scoring of ML models. Requires Snowflake Java UDF (preview) on the Snowflake side.
New feature Experimental support for leveraging Snowflake Java UDF for exporting ML models to Snowflake functions that can be reused by any Snowflake user or client application. Requires Snowflake Java UDF (preview) on the Snowflake side.
New feature Experimental support for leveraging Snowflake Java UDF for data preparation, allowing push down of the following processors: String transformer, Currency extractor, Filter on bad meaning, Query string parsing, Holidays flagging, GeoIP resolution
New feature Experimental support for direct fast write into Snowflake from any recipe, without having to sync through cloud storage anymore
New feature Fast load and fast unload from/to Google Cloud Storage
New feature Fast load from Azure Blob to Snowflake in Parquet format
Increased the maximum column name length to the maximum supported by Snowflake (251)
Redshift: New feature Experimental support for direct fast write into Redshift from any recipe, without having to sync through S3 anymore
Synapse: New feature Experimental support for direct fast write into Synapse from any recipe, without having to sync through Azure Blob anymore
Synapse: New feature Ability to set distribution policy in the UI
Synapse: Fixed issue with table creation on partitioned datasets
BigQuery: New feature Added ability to read and write nested and repeated fields
BigQuery: Experimental alternate mode to interact with BigQuery, providing ability to read data samples without incurring the cost of a full scan
BigQuery: Experimental support for displaying query cost estimation in notebook
ElasticSearch: New feature Added ability to use a custom query DSL filter
ElasticSearch: New feature Experimental support for index patterns (for reading)
S3: Fixed issue with partitioned datasets with spaces in partition names
New feature: Added ability to import table definitions from an external Hive-compatible metastore, such as a Databricks metastore
Improved “skip first line” detection for CSV files
Improved detection of schema for CSV files with empty column names
Added ability to override Parquet message style when DSS fails to recognize it
Fixed ability to create a dataset from the files of a shared managed folder
Fixed display of query in “SQL query” datasets
New feature: Added stop words for Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Estonian, Finnish, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Yoruba - These are available in “Simplify Text”, “Tokenize text”, “Analyze text” and Text feature handling
New plugin: Plugin Model error analysis adds a Saved model custom view highlighting the samples that contribute most to a predictive model’s errors
Fixed class weights with XGBoost
The available code samples are now updated when changing the prediction type
Fixed possible breakage of models when the preparation script contained a “filter on date range” processor
Fixed issue with duplicating ML Tasks
Fixed wrong result of SQL scoring with numerical columns stored as strings
Fixed various small numerical inconsistencies in SQL scoring
Fixed issues with colors in clustering models reports
Fixed display of custom scores in binary classification model reports
Added ability to specify strings that pandas should consider as “NA” (i.e., making it possible to not treat the string “NA” itself as a missing value)
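DSS exposes this as a setting; the pandas-level equivalent looks like the following minimal sketch (file name hypothetical):

```python
import pandas as pd

# Keep the literal string "NA" as a real value (e.g. the country code for
# Namibia) by replacing pandas' default NA markers with an explicit list
df = pd.read_csv("countries.csv", keep_default_na=False, na_values=["", "NULL"])
```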
Added ability to autodetect ElasticSearch dataset settings
Added API for Model Documentation Generator
Fixed creation of values-based meanings with Python 3
Fixed the dataiku.Folder.upload_file method with binary files and Python 3
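For reference, a minimal sketch of the fixed call inside a Python recipe (folder name and paths are hypothetical):

```python
import dataiku

folder = dataiku.Folder("my_folder")

# Upload a local binary file into the managed folder; this call previously
# failed for binary content under Python 3
folder.upload_file("/images/logo.png", "/tmp/logo.png")
```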
Fixed mangling of name when importing a notebook with CJK characters in the name
Fixed bad interaction between “Edit recipe in notebook” and notebooks imported from Git
Copying notebooks will now clear the “associated recipe”
Fixed link to editor settings from library editor
Fixed ability to use scenario-level variables in code recipes
Fixed issues with “Git references” in code libraries
Fixed conversion of notebooks with Markdown cells to recipes
Improved the API for setting values of numerical hyperparameters on ML tasks
When creating a code environment, you are now taken directly to its page
Fixed display issue when deleting a code env while on the code env page
Added search in all code env dropdowns
Performance: Strongly improved display performance of Flow page for very large flows with many zones and many projects in the instance
Performance: Strongly improved display performance for “Jobs” page with very large flows
Fixed display of Flow if you enter a wrongful pattern in a Flow filter
Fixed recipes being moved to the default zone when moving them
Fixed bad error display when trying to rebuild a write-protected dataset
Removed bogus ability to edit tags on shared datasets
Removed bogus ability to edit tags in the quick Flow navigator
Fixed display of “collapsed” Flow zones
Fixed failures copying flows or subflows with SQL recipes
Pattern generator: Improved support for detecting non-ASCII text
Fixed support for val(“column”, default_value) in formula
Formula editor: fixed support for regular expressions
Formula editor: fixed support for datePart function
Fixed support for default value on strval for SQL engine
Fixed handling of null values in “min” and “max” formula functions
Fixed the “real Python” mode of the Python processor when running on Spark
Fixed issue in the “Impute missing values” processor
Fixed the help tooltip for “Force numerical range” processor
Fixed duplicate columns appearing in “column name” fields
Fixed various other issues with column name autocompletion
Fixed dynamic select choices for custom views (managed folders and models)
Fixed dynamic select choices for “create cluster” scenario step
Fixed dynamic select choices for PRESET fields
Fixed dynamic select choices in custom Kubernetes exposition plugins
Added ability to use presets in custom Kubernetes exposition plugins
Automatically commit plugin.json at first commit
Fixed typo when reverting plugin to a previous revision
Fixed disappearing “users”, “creation” and “last modification” fields in catalog
Strongly increased maximum character limit of Wiki pages
Fixed missing scroll in profile page
Experimental ability to hide unwanted recipes (legacy Hadoop, R, Scala, …)
Fixed Wiki export when working on a machine without user namespaces enabled
Fixed Dataiku Applications flooding the logs
Dataiku Applications: added the “is a Dataiku application” visual indicator in all project listing pages
Added ability to ignore SSL certificate validation for design-node-to-deployer communication
Various UI fixes
Fixed wrongful display of “Created on” in the “Triggers” page of Automation monitoring
Fixed small display issues in Triggers page
Better default volume sizes and volume resizing strategies for high-activity and high-volumetry instances
Added ability to define tags at fleet creation, which are propagated at both the instance and network levels
AWS: Added ability to encrypt the root EBS volume. Default to encrypting both root and data EBS volumes
AWS: Added ability to use a custom CMK for encrypting root and data EBS volumes on Fleet Manager instance
AWS: Install the AWS Systems Manager agent on both Fleet Manager and DSS images
AWS: Default to automatically creating the security groups
AWS: Upgrade eksctl for compatibility with latest EKS versions
AWS: Fixed startup failures after too many reprovisionings of an instance
Additional hardening of the runtime images following CIS Benchmark guidelines
Fixed support for continuous Python recipes when UIF is enabled
Fixed ability to create a continuous sync recipe directly from the streaming endpoint
Fixed dashboards PDF export sometimes being clipped
Fixed display of preview of files in “file from managed folder” insight
Fixed TLS termination with nginx ingress
Added more transient errors that are recognized as non-fatal while monitoring Kubernetes jobs
Fixed startup failure with custom Kubernetes exposition plugins
Fixed support for webapps on Kubernetes
Added an audit event when opening a Jupyter notebook
Added encryption of client secret fields in Azure Blob, SQL Server and Synapse connections
Fixed bad redirect to HTTP when fetching credentials for 3rd-party services with OAuth
Fixed display of error upon failure to acquire an OAuth2 authorization code
Fixed typo when switching project to another branch
Fixed UI issue in export recipe
Fixed migration issue when a DSS 9 project had been imported into a DSS 8 instance and that instance was then migrated to DSS 9
Fixed bug when multiple DSS instances use the same PostgreSQL database and schema for runtime databases
Fixed failure to display data after migration to DSS 9 when some kinds of date filters were present
Fixed Excel export of charts when some kinds of date filters were present
Fixed default settings for “Push to editable” recipe
Fixed eventserver not refreshing the token when using an S3 connection with “Assume role”
DSS 9.0.3 is a bugfix release. We recommend that you upgrade to DSS 9.0.3.
Fixed inability to create recipes based on shared datasets
Fixed various errors in recipe editing screens
DSS 9.0.2 is a significant new release with new features, performance enhancements, and many bugfixes. Note that we recommend upgrading to 9.0.3 rather than 9.0.2.
New feature Added OAuth2 login for Snowflake
New feature Added OAuth2 login for Azure Blob
Azure Blob: Made the “client secret” field hidden
MongoDB: Removed connection details from logs
BigQuery: Fixed metrics computation on BigQuery “SQL Query” datasets
SCP: Fixed write to managed folders based on SCP connections
Google Cloud Storage: Fixed PDF preview in managed folders
Fixed preview of images with special characters in their file names in managed folders
New feature: Prepare: Added support for SQL pushdown of “inc” formula function (add to dates) to BigQuery and Snowflake
New feature: Prepare: Added support for SQL pushdown of “coalesce” formula function to BigQuery and Snowflake
New feature: Prepare: Added support for SQL pushdown of “rand” formula function to BigQuery and Snowflake
New feature: Prepare: Added support for SQL pushdown of trigonometric functions to Snowflake
Performance: Sync and Prepare: Strongly improved performance on large partitioned datasets (notably S3 / Azure Blob / Google Cloud Storage)
Performance: Join: Improved performance of DSS engine with non-equijoin conditions
Prepare: Fixed issue with SQL pushdown of “concat” formula function with Snowflake and NULL values
Prepare: Fixed possible SQL pushdown issue with formula processor
Prepare: Added ability to trim white spaces in “Find and replace”
Prepare: Fixed issue with SQL pushdown of date parsing on Snowflake with numeric columns
Prepare: Fixed issue when setting ‘cast output’ to None in Formula step
Prepare: Fixed formula validation issue with column names starting with numbers
Prepare: UX enhancements on “Pattern detector” and “Smart date”
New feature Added support for Cloudera Data Platform CDP Private Cloud Base (CDH 7)
New feature Added support to direct writes from Spark to S3 with SSE-KMS encryption
Performance: Improved performance of Deployer dashboards with large number of deployments
Fixed deployment dialog being stuck when a warning happens during bundle activation
Fixed sticky tooltip for performance charts in API deployer
New feature Central tracking of project reporters in admin monitoring
Performance: Improved performance of home page on Firefox
Fixed a bug when importing a project containing API services that use a code environment with remapping
Fixed wrongful URLs in navigation bar when duplicating projects
New feature The labels of the ‘run button’ and ‘edit variables application’ tiles are now customizable
Added possibility to mass delete application instances
By default, application instances are now hidden in the ‘all projects’ list
The application designer now always prompts to save when updating the test instance
Improved error message for ‘download file from folder’ tile
Improved error handling for application-as-recipes
Fixed ‘Append instead of overwrite’ in application-as-recipes
Performance: Strongly improved network and UI performance when creating or opening recipes in projects with large flows or large number of columns
Performance: Strongly improved performance of “Computing job dependencies” for very large flows and flows with large number of “branches”
Fixed possible crash when using flow zones
When copying a recipe, the new recipe now appears in the same zone as the original recipe
Fixed “Set auto count of records” action
Performance: Individual explanations: Improved performance with large number of categories and for text features
Performance: Improved memory usage for ML training
Individual explanations: Fixed scoring recipe with computation of Individual explanations and ‘output probabilities’ disabled
Preprocessing: Fixed “MINMAX” mode of feature rescaling
Preprocessing: Fixed display of feature generation
Preprocessing: Fixed wrong stop words usage for Saved models training
SQL scoring: fixed issue with rejected features
Interactive scoring: Fixed empty categorical dropdown for some preprocessing
Interactive scoring: Fixed first loading of threshold on Firefox
Interactive scoring: Fixed issue with UIF
Custom algorithms: Added ability to display regression coefficients for custom linear models
Custom algorithms: Fixed possible failure when scoring with explanations
Custom views: Made Saved model custom views exportable in the dashboard
Custom views: Made available for analysis models (in addition to saved models)
Partitioned models: Fixed race condition for partitioned training recipe
Partitioned models: Fixed detection of unused partitions of partitioned models when partition name contains extended charsets
Partitioned models: Fixed display of insight for partitioned models
Partitioned models: Fixed duplicated tabs
Notebook export: Added support for instance weights
Model Document Generation: Fixed issue with models coming from imported or duplicated projects
Fixed training edge cases for numeric features with few values, including invalid values, on Python 3
Fixed discrepancy between Java scoring and SQL batch scoring on models trained with Python 3
Made the seed of the hyper-parameter search independent from the seed of the train-test split
Rounded display of threshold when evaluating a binary classification model
Fixed scoring recipe with multiclass prediction and python scoring if “Output probabilities” is disabled
Removed the incompatible exponential loss training option for Gradient Boosted Trees on multiclass problems
PMML export: Added back support for dummy-encoded categorical features
PMML export: Improved consistency between PMML models and DSS scoring
PMML export: Added support for models with “treat missing as regular” for categorical features
PMML export: Added support for Extra Trees algorithm
PMML export: Explicitly list “drop rows” preprocessing as incompatible with PMML
API: Enforced PMML compatibility check
API: Added helpers to manage time-based prediction
New feature Central tracking of scenario reporters on automation monitoring
New feature: Ability to configure the max number of results for “Execute SQL”
Fixed infinite loop with monthly triggers running “on first week”
Made webhooks reporter appear as failed if the webhook gets a non-2XX HTTP return code
Fixed connection remapping for “Execute SQL” steps during bundle activation
Fixed Python API methods ‘add_monthly_trigger’ and ‘add_periodic_trigger’
Fixed addition of newly-created scenarios in catalog
New feature: Support for installing Jupyter nbextensions
When deleting a notebook, unload it first
Fixed adding new tags to notebooks
Fixed unloading notebooks of users with a dot in their name
Fixed “Explain” in SQL notebooks with very large queries
API: Added API to get project creation and last modification dates
API: Fixed “DSSFuture” API in Python client
Webapps: Fixed Bokeh webapps behind a reverse proxy
SQL recipes: Added ability to access recipe inputs by position rather than name
Added support for static private IP for nodes
Fixed display of the “Clusters” tab in Deployer node
Fixed support of special characters in passwords
Added the ‘require authentication’ option at the webapp level for plugin webapps
Fixed “admin-connection-save” audit log entry
Upgraded the Nginx version in container images to avoid an Nginx 1.16 vulnerability (CVE-2019-20372)
Large update of the administrative boundaries for reverse geocoding and administrative charts (notably fixing an issue with some US states)
Performance: Performance enhancement for the “Line chart” with “Interrupt line” mode
Fixed UI issue in “Enrichments” page of API designer
Improved handling of errors in statistics screen
Fixed leak of folders in /tmp when UIF is enabled
DSS 9.0.1 is a significant new release with both performance enhancements and bugfixes
Azure Synapse: Fixed “contains” formula function and visual operator
Snowflake: Added support for explain plans
Azure Blob: Fixed issue with restrictive ACLs on parent folders of datasets
Delta Lake: Fixed preview of large Delta datasets
Fixed failure when re-deploying API services from pre-existing infrastructures and deployments from DSS 7.0
Fixed project list search in Project Deployer
When bundle preload fails, keep the failure logs visible
Improved error message readability in the health status of a deployment
Improved Deployer integration in the Global Finder
Fixed inability to import projects in case of failure during code env remapping
Performance: Improved performance when a very large number of scenarios start at the same time
Performance: Improved performance of automation home page with high number of projects and scenario runs
Performance: Improved performance of project home page with high number of scenario runs
Performance: Improved performance of scenario page with high number of runs
Performance: Improved performance of automation monitoring pages with high number of runs
Performance: Reduced resource consumption of backend with very large number of triggers
Performance: Reduced resource consumption of backend with very large number of “Build” scenario steps
Performance: Reduced resource consumption of backend with very large number of connected users
UI and UX improvements on the smart pattern generator
UI and UX improvements on the smart date modal
Made public IP optional on Fleet Manager CloudFormation template
Added EBS encryption for the Fleet Manager EBS
DSS 9.0.0 is a major upgrade to DSS with significant new features.
The DSS Deployer provides a unified environment for fully-managed production deployments of both projects and API services. It allows you to have a central view of all of your production assets, to manage CI/CD pipelines with testing/preproduction/production stages, and is fully API-drivable.
For more details, please see Production deployments and bundles.
Interactive scoring is a simulator that enables any AI builder or consumer to run “what-if” analyses (i.e., qualitative sensitivity analyses) to better understand the impact that changing a given feature value has on the prediction, by displaying the resulting prediction and the individual prediction explanations in real time.
For more details, please see Interactive scoring.
Dash by Plotly is a framework for easily building rich web applications. DSS now includes the ability to write, deploy and manage Dash webapps. Dash joins Flask, Bokeh and Shiny as webapp-building frameworks that help data scientists go much further than simple dashboards and provide full interactivity to users.
For more details, please see Dash web apps.
A very frequent data wrangling use case is to join datasets with “almost equal” data. The new “fuzzy join” recipe is dedicated to joins between two datasets when join keys don’t match exactly. It handles inner, left, right and outer fuzzy joins, and handles text, numerical and geographic fuzziness.
For more details, please see Fuzzy join: joining two datasets
In Data Preparation, you can now highlight a part of a cell in order to automatically generate suggestions to extract information “similar” to the one you highlighted. You can then add other examples to guide the automated pattern builder of DSS, and choose the pattern that provides you with the best results.
ML Diagnostics help you detect common pitfalls while training models, such as overfitting, leakage, or insufficient learning, and can suggest possible improvements.
For more details, please see ML Diagnostics
Model assertions streamline and accelerate the model evaluation process, by automatically checking that predictions for specified subpopulations meet certain conditions. You can automatically compare “expected predictions” on segments of your test data with the model’s output. DSS will check that the model’s predictions are aligned with your business judgment.
For more details, please see ML Assertions
It is now possible to distribute the training of a single model over multiple containers. Dataiku will automatically distribute all the points of the hyperparameter search. The distribution happens transparently, leveraging Kubernetes. No additional setup is required.
Distributed hyperparameter search permits vastly increased depth and precision of the hyperparameter search while keeping training time acceptable.
It is now possible to fetch Jupyter notebooks from existing Git repositories, and to push them back to their origin. Pulls and pushes can be made notebook-per-notebook or for a group of notebooks.
For more details, please see Importing Jupyter Notebooks from Git
Wikis can now be exported to PDF, either on a per-article basis or globally.
For more details, please see Wikis
Evaluating the fairness of machine learning models has been a topic of both academic and business interest in recent years. Before prescribing any resolution to the problem of model bias, it is crucial to learn more about how biased a model is, by measuring some fairness metrics. The model fairness report provides you with assistance in this measurement task.
For more details, please see Model fairness report
DSS now features an experimental real-time processing framework, notably targeting Kafka and Spark Structured Streaming.
For more details, please see Streaming data
DSS now officially supports Azure Synapse (dedicated SQL pools)
For more details, please see Azure Synapse
DSS brings a lot of new capabilities for date preparation:
New visual prepare processors for incrementing or truncating dates, and for finding differences between dates
New ability to delete, keep or flag rows based on various time intervals
Better date filtering capabilities for Explore view
For more details, please see Managing dates
The formula editor has been strongly enhanced with better code completion, inline help for all functions and features, and better examples.
For more details, please see Formula language
DSS now supports Spark 3.
It is also now possible to use SparkSession in PySpark code.
DSS now supports Python 3.7
You can now create Python 3.7 code envs. In addition, on Linux distributions where Python 3.7 is the default, DSS will automatically use it.
In addition, new DSS setups will now use Python 3.6 or Python 3.7 as the default builtin environment.
In Python 3.7, async is promoted to a reserved keyword and thus can no longer be used as a keyword argument in a method or a function. As a consequence, the DSS Scenario API replaces the async keyword argument, formerly used in some methods, with the asynchronous keyword argument. Please make sure to update uses of the Scenario class accordingly if you run Python scenarios or Python scenario steps with Python 3.7.
Impacted methods are:
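A minimal sketch of the rename, using Scenario.build_dataset as one method that accepts this keyword argument (see the Scenario API documentation for the full list):

```python
from dataiku.scenario import Scenario

s = Scenario()

# Before (Python <= 3.6):  s.build_dataset("mydataset", async=True)
# After (Python 3.7+), async being a reserved keyword:
s.build_dataset("mydataset", asynchronous=True)
```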
DSS now comes with the Snowflake JDBC driver and native Spark connector builtin. You do not need to install JDBC drivers for Snowflake anymore.
The time-based trigger in scenario has been strongly enhanced with the following capabilities:
Ability to show and handle triggering times in all timezones, not only server timezone
Ability to run every X hours instead of only every hour
Ability to run every X days instead of only every day
Ability to run every X weeks instead of only every week
Ability to run every X months instead of only every month
For once-every-X-months triggers, ability to run on the “last Monday” or “third Tuesday”
Ability to set a starting date for a trigger
SQL recipes can now work without an input dataset. The recipe will run in the connection of the output dataset.
For SQL recipes with both inputs and outputs, it is now possible to enable “cross-connection” handling while using the connection of the output (previously, only inputs could be selected).
You can now grant access to projects to individual users, in addition to groups.
You can now zoom and pan on the flow with the keyboard, and zoom and reset the zoom with dedicated buttons.
The “Build” dialog now supports variables expansion for partitioned datasets
The “Explicit Values” partition dependency function now supports variables expansion
In many situations, it is expected that the schema of a Flow input dataset will change frequently, and that these changes should be accepted and their impacts propagated without further manual intervention.
In order to ease these situations, DSS 9 introduces two new scenario steps:
“Reload dataset schema” to reload the schema of an input dataset from the underlying data source
“Propagate schema” to perform an automated schema propagation across the Flow.
These steps should usually be used before a recursive Build step.
Snowflake: the JDBC driver and Spark connector are now preinstalled and do not need manual installation anymore
Snowflake: added post-connect statements
Snowflake: added support for Snowflake -> S3 fast-path when the target bucket mandates encryption
Vertica: fixed partitioning outside of the default schema
PostgreSQL: the builtin PostgreSQL driver has been updated to a more recent version, which notably fixes issues with importing tables on PostgreSQL 12
S3: It is now possible to force “path-style” rather than “virtualhost-style” S3 access. This is mainly useful for “S3-compatible” storages.
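For reference, the two addressing styles differ in where the bucket name appears:

```
Virtualhost-style (default):  https://mybucket.s3.amazonaws.com/path/to/object
Path-style:                   https://s3.amazonaws.com/mybucket/path/to/object
```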
BigQuery: fixed ability to use “high throughput” mode for the JDBC driver
Added detection of changes in editable datasets, which will now properly trigger rebuilds
Fixed missing refresh of “Building” indicator with flow zones
Fixed wrong “current” flow zone remembered when browsing
Prepare on Snowflake: fixed handling of accented column names
Fixed handling of “contains” formula operator on Impala when the string to match contains
Fixed “Use an existing folder” on download recipe
Added variables expansion on “Flag rows where formula matches” processor
The Evaluation recipe can now output the cost matrix gain
PMML export now supports dummy-encoded variables
Custom models can now access the list of feature names
Fixed scoring failure on SQL with numerical features stored as text
Text features: fixed stop words when training in containers
Fixed warning in Jupyter when exporting a model to a Jupyter notebook
Added ability to define a class inline for a custom model
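A minimal sketch of what this enables, assuming the usual custom-model contract of assigning a scikit-learn-compatible estimator to a variable named clf (the class itself is illustrative):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

# Toy estimator defined inline in the custom model code panel
class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Always predicts the majority class seen during training."""
    def fit(self, X, y):
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

# DSS picks up the estimator through the clf variable
clf = MajorityClassifier()
```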
Switched XGBoost feature importances to use the “gain” method (library default since version 0.82)
AKS: fixed node pool creation with a zero minimum number of nodes
AKS: added ability to select the system node pool
Disabling an already-disabled Kubernetes-based API deployment will not fail anymore
Fixed webapps on Kubernetes leaking “Deployment” objects in Kubernetes
Fixed possible failures deploying webapps due to invalid Kubernetes labels
Fixed possible failures running Spark pipelines due to invalid Kubernetes labels
Added support for CUDA 11 when building base images
Fixed validation of Hive recipes containing “UNION ALL” on HDP 3 and EMR
Fixed “Back” button when going to the catalog
Fixed tags filtering with spaces in tag names
Fixed links to DSS items when putting a wiki page on the home page
Fixed display of Scala notebooks in Catalog
Display an indicator on the project home page when triggers are disabled
Added ability for administrators to force the SMTP sender, preventing users from setting it
Performance improvements on “Automation monitoring” pages
Fixed handling of records containing \r in Python when using
Fixed code env rebuilding if the code env folder had been removed
Fixed “with_default_env” on project settings class
Fixed ability to delete a code env if a broken dataset exists
Added a safety against potential memory overruns when requesting too high a number of bins
Fixed sort with null values on PostgreSQL
SQL notebooks: added explain plans directly in SQL notebooks
Jupyter: Fixed “File > New” and “File > Copy” actions
Fixed renamed notebooks not appearing in “recent elements”
Fixed icon of SQL notebooks in “recent elements”