Recipes for partitioned datasets

When a recipe computes a partitioned dataset, reads from a partitioned dataset, or both, it only processes the partitions involved and does not access the full datasets.

If a recipe computes several datasets:

  • All output datasets must have the same partitioning schema.
  • The same partition is computed for every output dataset.

A single invocation of a recipe will therefore:

  • Read one or several partitions of the input datasets
  • Write exactly one partition for each output dataset
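The two rules for multi-output recipes and the "one partition per output" invocation model can be illustrated with a small sketch. It assumes a hypothetical `Dataset` record that only carries a name and a tuple of partitioning dimension names; `Dataset` and `plan_outputs` are illustrative names, not part of any DSS API. The function rejects outputs whose partitioning differs and otherwise assigns the same partition to every output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset; not a DSS API object."""
    name: str
    partitioning: Tuple[str, ...]  # partitioning dimension names, e.g. ("day",)


def plan_outputs(outputs: List[Dataset], partition_id: str) -> Dict[str, str]:
    """Return the partition each output dataset receives in one recipe run.

    Enforces the two rules: all outputs share one partitioning schema,
    and the same partition is written to every output.
    """
    if not outputs:
        raise ValueError("a recipe needs at least one output dataset")
    schema = outputs[0].partitioning
    for ds in outputs[1:]:
        if ds.partitioning != schema:
            raise ValueError(
                f"{ds.name} is partitioned by {ds.partitioning}, expected {schema}"
            )
    # Exactly one partition per output, and it is the same one for all outputs.
    return {ds.name: partition_id for ds in outputs}


if __name__ == "__main__":
    clean = Dataset("orders_clean", ("day",))
    stats = Dataset("orders_stats", ("day",))
    print(plan_outputs([clean, stats], "2024-01-15"))
```

Adding an output partitioned by a different dimension, for example `("country",)`, makes `plan_outputs` raise `ValueError`, mirroring the constraint that all outputs share one partitioning schema.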

Dataiku DSS automatically determines which partitions of the input datasets to read from the requested output partition, using the partition dependencies mechanism. For more information, see DSS concepts and Specifying partition dependencies.
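Conceptually, a partition dependency is a function from the requested output partition to the list of input partitions to read. The sketch below shows two such shapes for day-partitioned data: an identity mapping (output day X reads input day X) and a sliding window (output day X reads the N preceding days plus X). The function names and signatures are hypothetical illustrations, not DSS APIs; in DSS the dependency is configured on the recipe and resolved by DSS itself.

```python
from datetime import date, timedelta
from typing import List


def identity_dependency(output_partition: str) -> List[str]:
    """Hypothetical: output partition X reads exactly input partition X."""
    return [output_partition]


def sliding_window_dependency(output_day: str, days_before: int, days_after: int = 0) -> List[str]:
    """Hypothetical: return the input day partitions (ISO dates) needed to
    build one output day partition, from days_before days earlier through
    days_after days later, inclusive."""
    if days_before < 0 or days_after < 0:
        raise ValueError("window bounds must be non-negative")
    day = date.fromisoformat(output_day)
    start = day - timedelta(days=days_before)
    count = days_before + days_after + 1
    return [(start + timedelta(days=i)).isoformat() for i in range(count)]


if __name__ == "__main__":
    # Building output partition 2024-03-02 with a 2-day look-back window
    # reads three input partitions (2024 is a leap year, so Feb 29 is included).
    print(sliding_window_dependency("2024-03-02", days_before=2))
```

Either way, one invocation still writes a single output partition; only the set of input partitions it reads changes with the dependency.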

See Partitioned Hive recipes and Partitioned SQL recipes for how to read only the required input partitions and write to the output partition.