DSS 13 Release notes

Migration notes

How to upgrade

Pay attention to the warnings described in Limitations and warnings.

Migration paths to DSS 13

Limitations and warnings

Automatic migration from previous versions is supported (see above). Please pay attention to the following cautions, removal and deprecation notices.

Cautions

XGBoost models migration

DSS 13.0 now uses XGBoost 1.5 in the default VisualML setup.

Existing models can still be used for scoring without retraining if Optimized scoring is used. Note that in particular, row-level explanations cannot use Optimized scoring.

If Optimized scoring cannot be used, you will need to either:

Python 2.7 builtin env removal

Note

If you are using Dataiku Cloud or Dataiku Cloud Stacks, you do not need to pay attention to this.

Very few Dataiku Custom customers are affected by this, as this was a legacy setup.

Python 2.7 support for the builtin env of Dataiku was deprecated years ago and is now fully removed. If your builtin env was still Python 2.7, it will automatically migrate to Python 3. This may affect:

  • Existing code running on the builtin env, that may need adaptations to work in Python 3.

  • Machine Learning models, that will usually need to be retrained
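As an illustration of the kind of adaptations such code may need, here is a minimal sketch of Python 3 forms for constructs that commonly break when moving from Python 2.7. The function names and data are hypothetical and are not taken from any Dataiku code; the actual changes required depend on your own code.

```python
# Python 3 forms of constructs that commonly break when code written for
# Python 2.7 is moved to a Python 3 environment. All names are illustrative.


def describe(counts):
    """Return one 'name=share' string per entry, sorted by name."""
    total = sum(counts.values())
    lines = []
    # Python 2 code often used counts.iteritems(); Python 3 only has items().
    for name, count in sorted(counts.items()):
        # In Python 2, int / int truncated the result. In Python 3, / is true
        # division; use // where truncation is really intended.
        share = count / total
        lines.append("{}={:.2f}".format(name, share))
    return lines


def decode_payload(raw):
    """Turn bytes (for example read from a file or socket) into text."""
    # Python 2 mixed str and unicode freely; Python 3 needs an explicit decode.
    return raw.decode("utf-8")


# Python 2 allowed `print "..."` as a statement; Python 3 requires print().
print(describe({"a": 1, "b": 3}))
print(7 // 2, 7 / 2)
print(len(decode_payload(b"caf\xc3\xa9")))
```

Running the existing code on a test instance before upgrading remains the most reliable way to find the places that need changes.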

Support removal

Some features that were previously announced as deprecated are now removed or unsupported

  • Hadoop distributions support

    • Support for Cloudera CDH 6

    • Support for Cloudera HDP 3

    • Support for Amazon EMR

  • OS support

    • Support for Red Hat Enterprise Linux before 7.9

    • Support for CentOS 7 before 7.9

    • Support for Oracle Linux before 7.9

    • Support for SUSE Linux Enterprise Server 15, 15 SP1, 15 SP2

    • Support for CentOS 8

  • Support for Java 8

  • Support for Python 2.7

Deprecation notices

DSS 13 deprecates support for some features and versions. Support for these will be removed in a later release.

  • Support for Ubuntu 18.04

  • Support for RedHat 7

  • Support for CentOS 7

  • Support for Oracle Linux 7

  • Support for SuSE Linux 12

  • Support for SuSE Linux 15 SP3

  • Support for Scala notebook for Spark

  • Support for multiple Hadoop clusters

Version 13.1.2 - August 29th, 2024

DSS 13.1.2 is a bugfix release

Coding

  • Fixed authentication failure when the Python client running inside DSS connects to another DSS instance running 13.0 or below.

Spark

  • Fixed a failure on Spark jobs that need to retrieve credentials

Version 13.1.1 - August 26th, 2024

DSS 13.1.1 is a security and bugfix release

Recipes

  • Prepare recipe: Fixed failure when executing a “Compute difference between dates” step using SQL engine

Coding and API

  • Fixed as_langchain_* methods in a non-containerized kernel on Knowledge Banks built by another user

Security

Version 13.1.0 - August 14th, 2024

DSS 13.1.0 is a significant new release with new features, performance enhancements and bugfixes.

New feature: Managed LLM fine-tuning

Note

This feature is available in Private Preview as part of the Advanced LLM Mesh Early Adopter Program

LLM Fine-tuning allows you to fine-tune LLMs using your data.

Fine-tuning is available:

  • Using a visual recipe for local models (HuggingFace) and OpenAI models

  • Using Python recipes for local models (HuggingFace)

For more information, please see Model fine-tuning

New feature: Gauge chart

The Gauge chart, also known as speedometer, is used to display data along a circular axis to demonstrate performance or progress. This axis can be colored to offer better segmentation and clarity.

New feature: Chart median and percentile aggregations

Charts (and pivot tables) can now display median, as well as arbitrary percentiles of numerical values
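To make the two aggregations concrete, the sketch below computes a median and arbitrary percentiles on a small list of values. It assumes linear interpolation between ranks, which is only one of several common percentile definitions; the release notes do not say which definition the charts use, and the `percentile` helper and sample values are hypothetical.

```python
import statistics


def percentile(values, p):
    """Return the p-th percentile (0-100), interpolating linearly between ranks.

    Assumption: linear interpolation; other definitions give slightly
    different results on small samples.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


latencies_ms = [120, 80, 200, 95, 150]  # illustrative data

print(statistics.median(latencies_ms))  # the median is the 50th percentile
print(percentile(latencies_ms, 50))
print(percentile(latencies_ms, 90))
```

With this definition, the median and the 50th percentile coincide, while higher percentiles move toward the largest values.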

New feature: enhanced Python dataset read API

The Python API to read datasets has been enhanced with numerous new capabilities and performance improvements.

The new fast-path reading Dataset.get_native_dataframe method performs direct read from data sources. This provides massive performance improvements, especially when reading only a few columns out of a wide dataset. Fast-path reading is available for:

  • Parquet files stored in S3

  • Snowflake tables/views

For regular reading, the following have been added:

  • Ability to disable some thorough data checking, yielding performance improvements up to 50%

  • Ability to read some columns as categoricals to reduce memory usage (depending on the data, can be up to 10-100 times lower)

  • Ability to use pandas “nullable integers”, so that integer columns with missing values can be read as integers (rather than floating-point values)

  • Ability to precisely match integer types to reduce memory usage (up to 8x for columns containing only tinyints)

  • Added ability to completely override dtypes when reading

For samples and documentation, please see the Developer Guide
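The memory effects described above come from pandas dtypes, and can be illustrated with pandas alone. The following sketch does not call the Dataiku API (whose exact parameters are documented in the Developer Guide); it only shows, on a synthetic DataFrame with hypothetical column names, what categoricals, exact integer types and nullable integers change.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for a dataset read with default dtypes.
df = pd.DataFrame({
    "country": ["FR", "US", "DE", "US"] * 25_000,
    "small": np.tile(np.array([1, 2, 3, 4], dtype="int64"), 25_000),
    # A missing value forces pandas to use float64 by default.
    "maybe_missing": [1, None, 3, 4] * 25_000,
})

optimized = df.astype({
    "country": "category",     # repeated strings stored once, plus small codes
    "small": "int8",           # values fit in one byte instead of eight
    "maybe_missing": "Int64",  # pandas nullable integer: keeps ints, uses <NA>
})

before = df.memory_usage(deep=True, index=False)
after = optimized.memory_usage(deep=True, index=False)

print(df.dtypes)
print(optimized.dtypes)
print(before["small"] // after["small"])         # int64 -> int8 ratio
print(after["country"] < before["country"])      # categoricals use less memory
```

The gain from categoricals depends heavily on how many distinct values a column contains; columns with mostly unique values benefit little.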

New feature: Builtin Git merging

In addition to the existing ability to push projects and branches to remote Git repositories and perform merges there, you can now perform Git merges directly within Dataiku, including the ability to view and resolve merge conflicts

Behavior change: handling of schema mismatch on SQL datasets

DSS will now by default refuse to drop SQL tables for managed datasets when the parent recipe is in append mode. In case of schema mismatch, the recipe now fails. This behavior can be reverted in the advanced settings of the output dataset

LLM Mesh

  • New feature: Added local models for toxicity detection (This feature is available in Private Preview as part of the Advanced LLM Mesh Early Adopter Program)

  • New feature: Added support for Tools calling (sometimes called “function calling”) in LLM API and Langchain wrapper. This is available for OpenAI, Azure OpenAI, Bedrock (for Claude 3 & 3.5), Anthropic, and Mistral AI connections

  • New feature: Added support for Gemma, Phi 3, Llama 3.1 8B & 70B, and Mistral NeMo 12B models on local HuggingFace connection

  • Pinecone: Added support for Pinecone serverless indices

  • In API, added support for presencePenalty and frequencyPenalty for OpenAI, Azure OpenAI and Vertex

  • In API, added support for logProbs and topLogProbs for OpenAI, Azure OpenAI and Vertex (PaLM only)

  • In API, added support for logitBias for OpenAI and Azure OpenAI

  • In API, added finishReason to LLM responses, for LLMs/providers that support it

  • Added Langchain wrappers for embedding models in the public Python API (was already available in the internal Python API). Using the API client, you can now use the LLM Mesh APIs on embedding models with Langchain from outside Dataiku.

  • Added support for Embedding models in Snowflake Cortex connection

  • Improved API support for stop sequences on local models run with vLLM

  • Fixed issue in complete prompt display for RAG LLMs in Prompt Studio

Machine Learning

  • Isolation Forest: Made training up to ~4 times faster (using parallelism and sparse inputs)

  • Isolation Forest: Added support for “auto” contamination

  • Model Documentation Export: Added support for “Feature effects” chart from feature importance

  • Added ability to not specify an image input feature in What-if

  • Improved performance for training of partitioned models with large number of partitions

  • Improved cleanup of temporary data when retraining partitioned models (reduce disk consumption)

  • Improved pre-training validation of ML Overrides and Assertions

  • Fixed computation of optimal threshold on binary classification models using k-fold cross-test

  • Fixed inability to upload 2 different images as input features in What-if

  • Fixed possible broken forecasting models when a model forecasts NaN values

  • Fixed a possible issue when deleting a partitioned model’s version while it was being retrained

  • Fixed some notebook model exports when using scikit-learn 1.2

MLOps

  • Added the possibility to do a full update in “Update API deployment” scenario step

  • Added the possibility to choose whether to include editable datasets when creating bundles

  • Improved MLflow import code-environment errors reporting

  • Fixed the sorting on metrics in Model Evaluation Stores

  • Fixed the Monitoring Wizard to take into account deployment level auto logging settings

Charts and Dashboards

  • Dashboards: Added background opacity settings for chart, text and metrics tiles

  • Dashboards: Added border and title styling options to tiles

  • Dashboards: Added title styling options to dashboard pages

  • Dashboards: Added the ability to hide dashboard pages

  • Dashboards: Improved loading performance

  • Dashboards: Fixed dashboard’s save button wrongly becoming active when selecting a tile

  • Filters: Added support for alphanum filter facets on numerical columns in SQL, and the possibility to include/exclude null values

  • Scatter plots: Improved axis format for dates by displaying time when range is less than a single day

  • Scatter plots: Increased max scale limit when zooming with rectangle selection

  • Pivot tables: Persist column sizes, as well as folded state of rows or columns

  • Line charts: Fixed the “show X axis” option in line charts with a date axis

  • Added support for numeric custom aggregations used in the chart in the aggregations displayed by reference lines

  • Added an “auto” mode for the “one tick per bin” option, automatically switching to the most appropriate mode depending on the number of bins

  • Fixed locked tick options (interval/number) after switching between charts

  • Fixed the “Add insight (Add to dashboard)” action for chart insights

  • Fixed Y axis title options disappearing in vertical bar charts when there are 2 or more measures

  • Fixed broken X axis when switching to a dimension that doesn’t support log scale from a dimension where it was supported and activated

  • Fixed empty dashboard wrongly considered as modified

  • Fixed dashboard’s insights associated to deleted datasets loading forever

Governance

  • New feature: New Global Timeline: “Instance Timeline” page tracking all items’ events

  • New feature: Custom filters are now available on all pages and various improvements were brought:

    • Added ability to filter on application template and application instance flags

    • Added support for search on reference fields

    • Added ability to filter on node type and node ID

    • Added ability to filter on DSS tags

    • Added ability to filter Model versions and Bundles on deployment stages

    • Added text search filter for all types of fields

  • Added execution of hooks on govern action

  • Added ability to copy/paste view components in the Blueprint Designer

  • Added an option in the Blueprint Designer to allow only selection, only creation, or both, on reference fields

  • Added visual indicators of settings validation in the Blueprint Designer

  • Added validation of blueprint versions forked from the standard to detect issues that could break standard govern features

  • Added the synchronization of DSS project’s “short description” field and the ability to search on it

  • Fixed history of deleted signoff

  • Fixed sticky error panel on next user action

  • Fixed artifact create permission to not imply read permission anymore

Datasets and Connections

  • Fixed jobs writing multiple partitions on an SQL dataset failing when executed in containerized mode

  • Fixed an issue when navigating away from an ElasticSearch dataset before the sample is displayed

Data Quality

  • Added ability to publish Data Quality status of a dataset or a project to a dashboard

  • Added multi-column support to column validity, aggregation in range/set & unique rules

  • Added ability to create, view and edit Data Quality templates

  • Fixed Metrics computed with spark on HDFS partitioned datasets producing incorrect results

Flow

  • Added ability to rename a recipe directly from the Flow

  • Added ability to export the Flow documentation (without screenshots) when the graphics-export feature is not installed.

  • Added support for Spanish and Portuguese languages to AI Explain

Recipes

  • New feature: Prepare: val / strval / numval formula functions now support an additional argument to specify an offset. This allows retrieving values from previous rows to compute for example sliding averages or cumulative sums. This feature is only available on the DSS engine.

  • New feature: Prepare: The new “Split into chunks” step can split a text into multiple chunks, with one new row for each chunk.

  • Prepare: Added a warning on recipes containing both Filter and Empty values steps, which might lead to unexpected output

  • Prepare: Fixed date difference step returning incorrect results on the Hive engine
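The idea behind the offset argument of val / strval / numval can be sketched in plain Python. This is not DSS formula syntax: the `val_at` helper, the convention that a positive offset looks back at earlier rows, and the sample data are all assumptions made for illustration.

```python
def val_at(rows, index, column, offset=0, default=None):
    """Return the value of `column` `offset` rows before row `index`.

    Assumption: a positive offset looks back at previous rows; `default`
    is returned when that row does not exist.
    """
    target = index - offset
    if 0 <= target < len(rows):
        return rows[target][column]
    return default


def sliding_average(rows, index, column, window):
    """Average of the current row and up to window - 1 previous rows."""
    values = [val_at(rows, index, column, offset=k) for k in range(window)]
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def cumulative_sums(rows, column):
    """Running total: each row adds its value to the previous row's total."""
    totals = []
    for i, row in enumerate(rows):
        previous_total = totals[i - 1] if i > 0 else 0
        totals.append(previous_total + row[column])
    return totals


rows = [{"sales": 10}, {"sales": 20}, {"sales": 60}, {"sales": 30}]

print([sliding_average(rows, i, "sales", window=3) for i in range(len(rows))])
print(cumulative_sums(rows, "sales"))
```

Both computations depend on row order, which is consistent with the feature being limited to the DSS engine.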

Scenario and automation

  • New feature: Ability to send datasets with conditional formatting, directly inline in email body

  • Added a “Build flow outputs” option in scenarios

  • Added ability to build a flow zone in scenarios

Deployer

  • New feature: added support for Snowflake Snowpark external endpoints in Unified Monitoring

  • Added governance status in Unified Monitoring

  • Added the possibility to define a specific connection for the monitoring of a managed infrastructure

  • Added the possibility to define an “API monitoring user” to support “per-user” connections in Unified Monitoring

  • Added support for labels and annotations in API deployer K8S infrastructure, optionally overridable in related deployments

  • Fixed the status of endpoints of external scopes in Unified Monitoring when there is an authentication issue

  • Fixed external scopes being monitored even when disabled

Coding

  • Added methods to interact with SQL notebooks (DSSProject.list_sql_notebooks, DSSProject.get_sql_notebook, …)

Code Studios

  • Streamlit: Fixed forwarding of query parameters

Notebooks

  • Fixed HTML export of Jupyter notebooks with Python 3.7

Security

  • Added ability to authenticate on the API using a Bearer token (in addition to Basic authentication)

  • Added the ability to store API keys in irreversible hashed form

  • Fixed refresh tokens being requested too often
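The difference between the two authentication schemes comes down to the `Authorization` header that the client sends. The sketch below builds both headers with the standard library only. It assumes that Basic authentication passes the API key as the user name with an empty password; the function names and the key value are hypothetical.

```python
import base64


def basic_auth_header(api_key):
    """Build an HTTP Basic header.

    Assumption: the API key is the user name and the password is empty,
    so the encoded credential is "<api_key>:".
    """
    credential = "{}:".format(api_key).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credential).decode("ascii")}


def bearer_auth_header(api_key):
    """Build an HTTP Bearer header: the token is sent as-is, without encoding."""
    return {"Authorization": "Bearer " + api_key}


example_key = "my-secret-key"  # placeholder, not a real key
print(basic_auth_header(example_key))
print(bearer_auth_header(example_key))
```

Either dictionary can be passed as request headers by an HTTP client. Neither scheme encrypts the key, so both should only be used over HTTPS.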

Cloud Stacks

  • Fixed HTTP proxy setup action not properly encoding passwords containing special characters

  • HTTP proxy setup action now sets the following environment variables: http_proxy, https_proxy and no_proxy, in addition to their uppercase equivalents

  • AWS: Switched to IMDSv2 to access instance metadata

  • Added ability to change the internal ports for DSS (not recommended, for very specific cases only)
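The two HTTP proxy items above can be illustrated with a short sketch: percent-encoding credentials before embedding them in a proxy URL, and producing both lowercase and uppercase variable names. This is not the Cloud Stacks setup action itself; the helper names, host and credentials are hypothetical.

```python
from urllib.parse import quote


def proxy_url(host, port, user=None, password=None):
    """Build a proxy URL, percent-encoding the credentials if present."""
    if user is None:
        return "http://{}:{}".format(host, port)
    # safe="" encodes every reserved character, including "@", ":" and "/",
    # which would otherwise be misread as URL delimiters.
    credentials = "{}:{}".format(quote(user, safe=""), quote(password or "", safe=""))
    return "http://{}@{}:{}".format(credentials, host, port)


def proxy_environment(url, no_proxy):
    """Return the proxy variables in both lowercase and uppercase forms."""
    env = {}
    for name, value in (("http_proxy", url), ("https_proxy", url), ("no_proxy", no_proxy)):
        env[name] = value
        env[name.upper()] = value
    return env


url = proxy_url("proxy.example.com", 3128, "svc", "p@ss:w/rd")
env = proxy_environment(url, "localhost,127.0.0.1")

print(url)
for name in sorted(env):
    print(name)
```

Setting both spellings matters because some tools only read the lowercase names while others only read the uppercase ones.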

Misc

  • Reduced the number of notifications enabled by default for new users

  • Fixed AI services when using authenticated proxies

  • Fixed trial seats when using authenticated proxies

Version 13.0.3 - August 1st, 2024

DSS 13.0.3 is a bugfix release

Dataiku Applications

  • Fixed the “Download file” tile

Charts

  • Fixed rectangle zoom when log scale option is enabled

Spark and Kubernetes

  • Fixed Spark engine on Azure datasets when DSS is installed with Java 17

Version 13.0.2 - July 25th, 2024

DSS 13.0.2 is a feature and bugfix release

LLM Mesh

  • New feature: AWS Bedrock: Added support for Claude 3.5 Sonnet

  • New feature: AWS Bedrock: Added support for Mistral models (Small, 7B, 8x7B, Large)

  • New feature: AWS Bedrock: Added support for Llama3 models (8B, 70B)

  • New feature: AWS Bedrock: Added support for Cohere Command R & R+

  • New feature: AWS Bedrock: Added support for Titan Embedding V2 and Titan Text Premier

  • New feature: AWS Bedrock: Added support for image input on Claude 3 and Claude 3.5

  • New feature: OpenAI: Added support for GPT-4o mini

  • New feature: Added support for generic chat and embedding models on AzureML

  • Added ability to Test custom LLM connections

  • Added ability to clear Knowledge Banks

  • Improved performance of builtin RAG LLMs

  • Improved performance of PII detection

  • HuggingFace: Improved performance of HuggingFace models download

  • HuggingFace: Increased default number of output tokens when using vLLM

  • Gemini: Fixed spaces wrongfully inserted in some LLM responses when using Gemini

  • Snowflake: Fixed Snowflake LLM models listed even when not enabled in the Snowflake Cortex connection

  • Limited ChromaDB version to prevent issues with ChromaDB 0.5.4

Dataset and Connections

  • New feature: Added support for YXDB file format

  • Fixed error message not displayed when previewing an indexed table on which users have no permission

  • Fixed scientific numbers written using the French format (example: “1,23e12”) not properly detected as “Decimal (Comma)” meaning

  • Disabled unimplemented normalization mode for regular expression matching custom column filter

  • Added statistics about length of alphanumerical columns in the Analyze dialog

  • Sharepoint built-in connection: Fixed UnsupportedOperationException returned for some lists

  • BigQuery: Added ability to configure connection timeouts

  • BigQuery: Added ability to include BigQuery datasets when importing/exporting projects or bundles.

  • BigQuery: Fixed error happening when parsing dates with timezone written using the short format (ex: “+0200”)

  • Athena: Fixed wrongful escaping of underscores in table names

Flow

  • When building downstream, correctly skip Flow datasets or models that are marked as “Explicit build” or “Write protected”

Recipes

  • Prepare: Improved wording of summary of empty values step when configured with multiple columns

  • Prepare: Fixed casting issue in Synapse/SQLServer when using a Filter by Value step on a Date column with SQL engine

  • Window: Disabled concat aggregation on Redshift as it is not supported by this database

Charts and Dashboards

  • Fixed Scatter Multi-Pair chart with DSS engine for some combinations of sample size and “Number of points” setting

  • Fixed incorrectly enabled save button in unmodified chart insight

  • Fixed dataset insight creation from the insight page

  • Fixed filtering in dashboard insights from workspace

  • Fixed reference lines sometimes getting doubled on scatter charts

Machine Learning

  • Multimodal models: Improved image embedding performance

  • Fixed serialization of very big models (>4GB)

  • Fixed possible UI slowness when a partitioned model has many partitions in its versions

  • Fixed possible UI issues when creating a clustering model with a hashed text feature

  • Fixed incorrect median prediction value for classification models with sample weights

Coding and API

  • Added ability to retrieve the trial status of users with the Python API

  • Fixed DSSDataset.iter_rows() not correctly returning an error in case of underlying failure

  • Fixed x0b and x0c characters in data producing incorrect results when reading datasets using Python API

  • Fixed DeprecationWarning: invalid escape sequence warnings reported by Python 3.7/3.8/3.9 when importing dataiku package

Code studios

  • Fixed Gradio block as webapp wrongly reported as timed-out after initial start

  • Fixed IDE block failing if python 3.7 is not available in the base image

  • Fixed Streamlit block failing with manual base image with AlmaLinux 8.10 when R is not installed

MLOps

  • Fixed interactive drift score computation

  • Fixed endpoint listing on Azure ML external models when in “environment” authentication mode

  • Fixed text drift section when using interactive drift computation buttons

  • Lowered the log level for too verbose External Models on AzureML

  • Fixed support for “Trust all certificates” when querying the MLflow artifact repository

  • Fixed code environment remapping for Model Evaluations

Webapps

  • Unified direct-access URL of webapps to /webapps/

Deployer & Automation

  • Fixed inability to edit additional code env settings in automation node

  • Fixed failure installing plugins with code env without a requirements.txt file on automation node

  • In Unified Monitoring, added support for new monitoring metrics available on Databricks external scope

  • Fixed API service error when switching from “multiple generation” with hash-based strategy to “single generation”

  • Added output of the logs to apimain.log file for containerized deployments even when using the “redirect logs to stdout” setting

  • Fixed error notification after a successful retry of an API service deployment

  • Fixed API deployer infrastructure creation when there are missing parameters

  • Fixed support for “Trust all certificates” settings in deployer hooks

Governance

  • Added the ability for an admin to invalidate the configuration cache

  • Prevented creation of items from backreference with blueprints that are not compliant with the backreference

  • Removed the “Open creation page” button for the creation of items from backreferences

  • Prevented the creation of Business Initiatives or Governed projects from inactive blueprint versions

  • Improved performance of table pages, especially when there is a matrix or kanban view

  • Fixed the typing of external deployments

  • Fixed disappearance of artifact table header when toggling edit mode

Performance & Scalability

  • Fixed possible hang when changing connections on a non-responsive data source

  • Fixed possible failures starting Jupyter notebooks when the Kubernetes cluster has no resources available

Security

  • Fixed DSS printing in the logs the whole authorization header (which might contain sensitive data) in case of unsupported authorization method

  • Fixed printing of the “token” field when using Snowpark with OAuth authentication

Miscellaneous

  • Fixed deletion of API keys in the API Designer that could delete the wrong key

  • Added support for CDP 7.1.9 with Java 17

Version 13.0.1 - July 16th, 2024

DSS 13.0.1 is a bugfix and security update. (12.6.5) denotes fixes that were also released in 12.6.5, which was published after 13.0.0

LLM Mesh

  • Improved parallelism and performance of locally-running HuggingFace models

Recipes

  • Join: Fixed loss of pre and post filter when replacing dataset in join (12.6.5)

  • Join: Fixed issue when doing a self-join with computed columns (12.6.5)

  • Prepare: Fixed help for “Flag rows with formula” (12.6.5)

  • Prepare: Fixed failure to save a recipe when it contains certain types of invalid processors (12.6.5)

  • Stack: Fixed addition of datasets in manual remapping mode that caused issues with columns selection (12.6.5)

Charts & Dashboards

  • Re-added ability to view page titles in dashboards view mode (12.6.5)

  • Fixed filtering in dashboard on charts with zoom capability (12.6.5)

  • Fixed possible migration issue with date filters (12.6.5)

  • Fixed migration issue with alphanum filters filtering on “No value” (12.6.5)

  • Fixed filtering on “No value” with SQL engine (12.6.5)

  • Restored larger font size for metric tiles (12.6.5)

  • Fixed display of Jupyter notebooks in dashboards (12.6.5)

  • Added safety limit on number of different values returned for numerical filters treated as alphanumerical (12.6.5)

  • Fixed migration of MIN/MAX aggregation on alphanumerical measures

Scenarios and automation

  • Added support for Microsoft Teams Workflows webhooks (Power Automate) (12.6.5)

Code Studios

  • Fixed Code Studios with encrypted RPC

Cloud Stacks

  • Fixed Ansible module dss_group

Elastic AI

  • Re-added missing Git binary on container images

Performance

  • Fixed performance issue with most activities in projects containing a very large number of managed folders (thousands) (12.6.5)

  • Improved short bursts of backend CPU consumption when dealing with large jobs database (12.6.5)

  • Fixed possible unbounded CPU consumption when renaming a dataset and a code recipe contains extremely long lines (megabytes) (12.6.5)

  • Visual ML: Clustering: Fixed very slow computation of silhouette when there are too many clusters (12.6.5)

Security

Misc

  • Fixed Dataset.get_location_info API

  • Fixed sometimes-irrelevant data quality warning when renaming a dataset (12.6.5)

  • Fixed EKS plugin with Python 2.7 (12.6.5)

  • Fixed wrongful typing of data when exporting SQL notebook results to Excel file (12.6.5)

Version 13.0.0 - June 25th, 2024

DSS 13.0.0 is a major upgrade to DSS with major new features.

Major new feature: Multimodal embeddings

In Visual ML, features can now leverage the LLM Mesh to use embeddings of images and text features

Major new feature: Deploy models to Snowflake Snowpark Container Services

In the API deployer, you can now deploy API services to Snowpark Container Services

Major new feature: Databricks Serving in Unified Monitoring

Databricks Serving endpoints can now be monitored from Dataiku Unified Monitoring

LLM Mesh

  • New feature: Added support for token streaming on local models (when using vLLM inference engine)

  • Added Langchain wrappers in the public Python API (was already available in the internal Python API). Using the API client, you can now use the LLM Mesh APIs from Langchain from outside Dataiku.

  • Added ability to share a Knowledge Bank to another project

  • Added ability to use a custom endpoint URL for OpenAI connections

  • Added ability to deep-link to a prompt inside a prompt studio

  • Added support for embedding models in SageMaker connections

  • Improved error reporting when a call to a RAG-augmented model fails

  • Faster local inference for Llama3 on HuggingFace connections

  • Misc improvements to the prompt studio UI

  • Show a job warning when there were errors on some rows of a prompt recipe

  • Fixed erroneous accumulation of metadata when rebuilding a Qdrant Knowledge Bank

  • Fixed Flow propagation when it passes through a Knowledge Bank

  • Fixed RAG failure when using Llama2 on SageMaker

  • Fixed raw prompt display on custom LLM connections

Machine Learning

  • New feature: Added the HDBSCAN clustering algorithm.

  • Improved Feature effects chart (in feature importance) by coloring the top 6 modalities of categorical features.

  • Sped up computation of individual prediction explanations and feature importance.

  • Sped up retrieval of the active version of a Saved Model with many versions.

  • Fixed possible hang when creating an automation bundle including a Saved Model with many versions.

  • Fixed unclear error message in scoring recipe when the input dataset is too small to use as background rows for prediction explanation.

  • Fixed incorrect number of clusters for some AutoML clustering models.

  • Fixed incorrect filtering of time series when a multi-series forecasting model is published to a dashboard.

  • Fixed a rare breakage in feature importances on some models.

Charts & Dashboards

  • New feature: Added MAX and MIN aggregations for dates (as measures in KPI and pivot table charts, in tooltips and in custom aggregations)

  • New feature: Added the option to connect the points on scatter plot and multi-pair scatter plot

  • Added grid lines in Excel export

  • Added grid lines for cartesian charts

  • Added ability to configure max number of points in scatter plots

  • Added ability to customize the display of empty values in pivot tables

  • Added ability to set insight name for charts

  • Improved loading performance of charts with date dimensions

  • Fixed update of points size in scatter plots

  • Fixed rendering of charts when collapsing / expanding the help center

  • Fixed dimensions labels on treemaps

  • Fixed cache for COUNT aggregation

  • Fixed “link neighbors” option in line charts with SQL engine

  • Fixed “show y=x” option on scatter plot

  • Fixed dashboard’s filters when added directly after a dataset

  • Fixed “all values” filter option with SQL engine

  • Fixed dashboard filters when using mixed-case column names on a database that is case-insensitive for column names

  • Fixed excluding cross-filters for numerical dimensions using “Treat as alphanumerical”

  • Fixed link to insight from dashboards included into workspaces

  • Improved Scatter plot performance

  • Fixed filtering on “No value” in alphanumerical filters with in-database engine

  • Fixed dashboard’s filters migration script

  • Fixed intermittent issue on Chrome browser which prevents rendering of Jupyter notebook in dashboards

  • Fixed error when disabling force inclusion of zero option in time series chart

Datasets

  • New feature: Sharepoint Online connector. DSS can now connect to Microsoft Sharepoint Online (lists and files) without requiring an additional plugin

  • Updated MongoDB support to handle versions from 3.6 up to 7.0, including Atlas and CosmosDB

  • Added read support for CSV and Parquet files compressed with Zstandard (zstd)

  • Added experimental support for Yellowbrick in JDBC connection

Data Quality

  • New feature: Added ability to create templates of Data Quality rules to reuse them across multiple datasets

MLOps

  • New feature: Added text input data drift analysis (standalone evaluation recipe only), relying on LLM Mesh embeddings

  • New feature: Added model export to Databricks Registry

  • Added the ability to create dashboard insights from the latest Model Evaluation in a Model Evaluation Store

  • Added the possibility to use plugins code environments in MLflow imported models

  • Added support for global proxy settings in Databricks managed model deployment connections

  • Added support for MLflow 2.13

  • Fixed incorrect ‘python_version’ field in MLflow exported models

  • Fixed listing of versions on Databricks registries when the model has a quote in its name

  • Fixed incorrect warnings in Evaluation recipe’s dataset diagnosis

Flow

  • Added ability to build Flows even if they contain loops

Recipes

  • Stack: Fixed wrong schema when stacking two datasets both containing a column of type string but with different maximum length

Deployer

  • API Deployer: Added a ‘run_test_queries’ endpoint in the public API to execute the test queries associated with a deployment.

  • Projects Deployer: Added the ability to define “additional content” also in the default configuration of bundles (not just directly on existing bundles)

  • Unified Monitoring: Added support for Unified Monitoring on automation nodes

  • Unified Monitoring: Added Data Quality status in Unified Monitoring

  • Unified Monitoring: Endpoint latency now displays 95th percentile

  • Unified Monitoring: Display project names rather than keys

  • Unified Monitoring: Fixed possible issue when opening project details

  • API designer: Fixed API designer test queries hanging in case of test server bootstrap failure

  • Added the ability to define environment variables for Kubernetes deployments

  • Added an “External URL” option for Project & API deployer infrastructures.

  • API Node: Added new commands to apinode-admin to clean disabled services (services-clean) and unused code environment (__clean-code-env-cache).

Governance

  • New feature: Added ability to set filters on workflow and sign-off statuses

  • New feature: Added ability to use “negate” conditions in filters

  • New feature: Added visibility conditions based on a field for views

  • New feature: Added ability to add additional role assignment rules at the artifact level

  • Removed the workflow step prefix to use only the step name defined in the blueprint version

  • Improved the display of the Dataiku instance information

  • Added project’s cost rating to the overview

  • Fixed multi-selector search filters

  • Fixed possible deadlock in hooks

  • Fixed artifact creation to be possible with just creation permission

  • Fixed file upload being cancelled on browser tab change

  • Fixed password reset for Cloud Stacks deployments

Statistics

  • Time series: when using Quarter or Year granularity, added ability to select on which month to align

Coding

  • Added support for Pandas 2.0, 2.1 and 2.2

  • Added support for conda for Python 3.11 code environments

  • Fixed write_dataframe failing in continuous Python for pandas >= 1.1

  • Upgraded Jupyter notebooks to version 6

Code studios

  • Improved performance when syncing a large number of files at once

  • Added support for ggplot2 in RStudio running inside Code Studios

Elastic AI

  • EKS: Added support for defining nodegroup-level taints

Cloud Stacks

  • Azure: Fixed deploying a new instance from a snapshot if the disk size was different from 50GB

  • Added more information (Ansible Facts) for use in Ansible setup actions

Dataiku Custom

Note: this only concerns Dataiku Custom customers

  • Added support for the following OS

    • RedHat Enterprise Linux 9

    • AlmaLinux 9

    • Rocky Linux 9

    • Oracle Linux 9

    • Amazon Linux 2, 2023

    • Ubuntu 22.04 LTS

    • Debian 11

    • SUSE Linux Enterprise Server 15 SP5

Security

  • Disabled HTTP TRACE verb

  • Fixed LDAP synchronization correctly denying access to DSS to a user that is no longer in the required LDAP groups but failing to synchronize the DSS groups for this user.

Misc

  • Switched default base OS for container images to AlmaLinux 8

  • Fixed a rare failure to restart DSS after a hard restart/crash occurring during a configuration transaction

  • Plugin usage now takes shared datasets into account

  • Added audit message for users dismissing the Alert banner

  • Fixed relative redirect for standard webapps

  • Fixed failure with non-ascii characters in plugin configuration and local UIF execution