Partitioning variables substitutions¶

When a recipe involves partitioned datasets, some variables are made available to the code that you write for this recipe, to help you manage partitions.

Substituting variables¶

SQL¶

Variables are replaced in your code using the $VARIABLE_NAME syntax. For example, if you have the following code:

SELECT * from mytable WHERE condition='$DKU_DST_country';

with a variable DKU_DST_country which has value France, the following query will actually be executed:

SELECT * from mytable WHERE condition='France';
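
The effect of this substitution can be imitated outside DSS. The sketch below is only an illustration, not how DSS implements it: it uses Python's standard `string.Template`, which happens to share the `$NAME` placeholder syntax, and the `variables` dictionary is a hypothetical stand-in for the values DSS would supply.

```python
from string import Template

# Illustration only: DSS performs the real substitution before running the query.
query = Template("SELECT * from mytable WHERE condition='$DKU_DST_country';")

# Hypothetical variable values for a single activity.
variables = {"DKU_DST_country": "France"}

# substitute() raises KeyError if a placeholder has no value.
expanded = query.substitute(variables)
print(expanded)
```

Note that the value is inserted as plain text, so the surrounding quotes in the query are what make it a SQL string literal.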

Hive¶

Variables are replaced in your code using the ${hiveconf:VARIABLE_NAME} syntax. For example, if you have the following code:

SELECT * from mytable WHERE condition='${hiveconf:DKU_DST_date}';

with a variable DKU_DST_date which has value 2020-12-21, the following query will actually be executed:

SELECT * from mytable WHERE condition='2020-12-21';
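
To preview what a Hive script will look like after substitution, the `${hiveconf:VARIABLE_NAME}` placeholders can be expanded with a small regular expression. This is a minimal sketch under stated assumptions: `expand_hiveconf` is a hypothetical helper, not part of DSS or Hive, and it only mimics the textual result.

```python
import re

# Matches ${hiveconf:NAME} and captures NAME.
HIVECONF = re.compile(r"\$\{hiveconf:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_hiveconf(query, variables):
    """Replace each ${hiveconf:NAME} with variables[NAME] (KeyError if missing)."""
    return HIVECONF.sub(lambda match: variables[match.group(1)], query)


expanded = expand_hiveconf(
    "SELECT * from mytable WHERE condition='${hiveconf:DKU_DST_date}';",
    {"DKU_DST_date": "2020-12-21"},  # hypothetical value for one activity
)
print(expanded)
```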

Python¶

Because reading and writing go through Dataiku DSS, you don’t need to specify the source or destination partitions in your code: calling get_dataframe() automatically returns only the relevant partitions.

For purposes other than reading or writing dataframes, all variables are available in a dictionary called dku_flow_variables in the dataiku module. Example:

import dataiku
print("I am working for year %s" % (dataiku.dku_flow_variables["DKU_DST_YEAR"]))
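
Not every variable exists for every recipe (for example, DKU_DST_YEAR is only defined for time-partitioned targets), so it can help to read the dictionary defensively. The sketch below assumes dku_flow_variables behaves like a regular Python dictionary; `describe_target` is a hypothetical helper, and a plain dict stands in for the real variables so the example runs outside DSS.

```python
def describe_target(flow_variables):
    """Return a short description of the target partition, if time-partitioned."""
    year = flow_variables.get("DKU_DST_YEAR")
    if year is None:
        # The variable is absent when the target has no time dimension.
        return "Target is not time-partitioned"
    return "I am working for year %s" % year


# Inside a DSS recipe you would pass dataiku.dku_flow_variables instead.
sample_variables = {"DKU_DST_YEAR": "2020"}  # hypothetical values
message = describe_target(sample_variables)
print(message)
```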

R¶

Flow variables are retrieved using the dkuFlowVariable(variableName) function:

library(dataiku)
dkuFlowVariable("DKU_DST_country")

Available variables¶

Related to the target datasets¶

Variable name	Available if	Value	Examples
DKU_DST_dimensionName	For each dimension	Value of the dimension “dimensionName” for the current activity. For time dimensions, given using time partition identifier syntax.	France, 2020-01-22
DKU_DST_YEAR	time partitioned	Value of the year (4 digits) for the time dimension.	2020
DKU_DST_MONTH	time partitioned (month, day or hour)	Value of the month (2 digits, from 01 to 12) for the time dimension	01
DKU_DST_DAY	time partitioned (day or hour)	Value of the day of month (2 digits, from 01 to 31) for the time dimension	22
DKU_DST_DATE	time partitioned (day or hour)	Date for the time dimension, in yyyy-MM-dd format	2020-01-22
DKU_DST_HOUR	time partitioned (hour)	Value of the hour of day (2 digits, from 00 to 23) for the time dimension.	21
DKU_DST_YEAR_1DAYAFTER …	The corresponding unshifted variable is available	Value of each date component variable for the day FOLLOWING the dimension value.	2020-01-23
DKU_DST_YEAR_1DAYBEFORE …	Same as above	Value of each date component variable for the day PRECEDING the dimension value.	2020-01-21
… _7DAYSBEFORE … _7DAYSAFTER	Same as above	Value of each date component variable for the date 7 days PRECEDING or FOLLOWING the dimension value.	2020-01-15
… _1HOURBEFORE … _1HOURAFTER	Same as above	Value of each date component variable for the hour PRECEDING or FOLLOWING the dimension value.
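
The relationship between the date components and their time-shifted variants can be reproduced with the standard `datetime` module. This is a sketch of the values only, not of how DSS computes them; `date_variables` is a hypothetical helper, and the shifted names (such as DKU_DST_DATE_7DAYSBEFORE) are assumed from the naming pattern in the table.

```python
from datetime import date, timedelta


def date_variables(day, suffix=""):
    """Build the date component variables for one day, with an optional shift suffix."""
    return {
        "DKU_DST_YEAR" + suffix: "%04d" % day.year,
        "DKU_DST_MONTH" + suffix: "%02d" % day.month,
        "DKU_DST_DAY" + suffix: "%02d" % day.day,
        "DKU_DST_DATE" + suffix: day.isoformat(),  # yyyy-MM-dd
    }


current = date(2020, 1, 22)  # the partition from the table's examples
variables = date_variables(current)

# Add the shifted variants for the day-based time shifts.
shifts = [("_1DAYBEFORE", -1), ("_1DAYAFTER", 1), ("_7DAYSBEFORE", -7), ("_7DAYSAFTER", 7)]
for suffix, days in shifts:
    variables.update(date_variables(current + timedelta(days=days), suffix))

for name in sorted(variables):
    print(name, variables[name])
```

Shifts that cross a month or year boundary change the month and year components as well, which is why each shifted variant carries its own full set of components.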

Related to the source datasets¶

Variable name	Available if	Value	Examples
DKU_SRC_datasetName_dimensionName	For each dimension of each dataset; there is only one source partition for this dataset	The value of the dimension dimensionName for input dataset datasetName	2020-01-23
DKU_SRC_dimensionName	There is only one source dataset; there is only one source partition for this dataset	The value of the dimension dimensionName for the single input dataset	2020-01-23
DKU_PARTITION_FILTER_datasetName	The recipe is an SQL recipe	Filter for the partitions used by the recipe
DKU_PARTITION_FILTER	The recipe is an SQL recipe; there is only one source dataset	Filter for the partitions used by the recipe
DKU_SRC_FIRST_DATE	There is only one source dataset; the dataset is time-partitioned	Smallest partition id
DKU_SRC_LAST_DATE	There is only one source dataset; the dataset is time-partitioned	Biggest partition id

Additionally, if the source dataset has time dimensions, all variables DKU_SRC_datasetName_(date/year/month/day/hour) and DKU_SRC_datasetName_(DATE/YEAR/MONTH/DAY/HOUR)_(timeshift) will be available, subject to the same rules as for the DKU_DST_ variables.

If there is only one input dataset, all DKU_SRC_datasetName_variable variables are also available in the DKU_SRC_variable shortcut.
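
The shortcut rule can be pictured as a simple renaming step. The sketch below only illustrates the rule as stated and is not DSS code: `add_single_source_shortcuts` and the dataset name `orders` are hypothetical.

```python
def add_single_source_shortcuts(variables, dataset_names):
    """Copy DKU_SRC_<dataset>_<variable> to DKU_SRC_<variable> when there is one input."""
    result = dict(variables)
    if len(dataset_names) == 1:
        prefix = "DKU_SRC_" + dataset_names[0] + "_"
        for name, value in variables.items():
            if name.startswith(prefix):
                result["DKU_SRC_" + name[len(prefix):]] = value
    return result


# Hypothetical recipe with a single input dataset called "orders".
single = add_single_source_shortcuts({"DKU_SRC_orders_DATE": "2020-01-23"}, ["orders"])
print(single)

# With two inputs, no shortcut is added.
double = add_single_source_shortcuts(
    {"DKU_SRC_orders_DATE": "2020-01-23", "DKU_SRC_customers_DATE": "2020-01-23"},
    ["orders", "customers"],
)
print(double)
```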