Delta Lake

Delta Lake is a file storage format built on top of Parquet. It augments Parquet with the ability to perform updates and deletes, along with other database-oriented features.
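As a rough illustration of how a format layered on Parquet can support updates and deletes: Delta Lake keeps a transaction log (the `_delta_log` directory) alongside the Parquet data files, and each commit records which files were added to or removed from the table. The sketch below replays a simplified log to find the files that make up the current table. The `live_files` helper, the file names, and the stripped-down commit contents are assumptions made for this example; real commits carry more fields and other action types.

```python
import json


def live_files(commits):
    """Replay simplified Delta-style commits (oldest first).

    Each commit is a string of newline-delimited JSON actions.
    Returns the sorted list of data files still part of the table.
    """
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


# Hypothetical log: commit 0 writes two files; commit 1 rewrites one of them,
# which is how an update or delete shows up at the file level.
commits = [
    '{"add": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0001.parquet"}}',
    '{"remove": {"path": "part-0001.parquet"}}\n'
    '{"add": {"path": "part-0002.parquet"}}',
]

print(live_files(commits))  # ['part-0000.parquet', 'part-0002.parquet']
```

The Parquet files themselves are never modified in place; a reader only needs the log to know which files to load.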

Dataiku can read Delta Lake files and process them, either with Spark or with any other recipe.

Warning

Experimental: Support for Delta Lake is experimental.

Support for Delta Lake requires Spark integration to be set up on the DSS instance. Once it is, Delta Lake appears as a file format.

Datasets in Delta Lake format can be stored on S3, Azure Blob Storage, Google Cloud Storage, or HDFS.

While Delta Lake datasets can be processed with any recipe, we strongly recommend processing them with Spark recipes.
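A minimal sketch of what a PySpark recipe reading a Delta Lake dataset might look like, following the usual DSS PySpark recipe pattern. The function name `copy_delta_dataset` and the dataset names are hypothetical, and the code only runs inside a DSS instance with Spark integration. Nothing here is specific to Delta Lake: the dataset is read like any other DSS dataset.

```python
def copy_delta_dataset(input_name, output_name):
    """Read a DSS dataset through Spark and write it to an output dataset."""
    # Imports are kept inside the function because they are only
    # available in a DSS Spark environment.
    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sql_context = SQLContext(sc)

    # Input dataset stored in Delta Lake format (name is an assumption).
    source = dataiku.Dataset(input_name)
    df = dkuspark.get_dataframe(sql_context, source)

    # Any Spark transformation on df would go here.

    # Output dataset declared on the recipe (name is an assumption).
    target = dataiku.Dataset(output_name)
    dkuspark.write_with_schema(target, df)
```

In a recipe, this would be called with the recipe's actual input and output names, for example `copy_delta_dataset("events_delta", "events_prepared")`.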

Delta Lake datasets that have underlying partitioning will be read unpartitioned.