Partitioning variable substitutions

When a recipe involves partitioned datasets, some variables are made available to the code that you write for this recipe, to help you manage partitions.

Substituting variables

SQL

Variables are replaced in your code using the $VARIABLE_NAME syntax. For example, if you have the following code:

SELECT * from mytable WHERE condition='$DKU_DST_country';

with a variable DKU_DST_country which has value France, the following query will actually be executed:

SELECT * from mytable WHERE condition='France';
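
The textual effect of this substitution can be reproduced outside DSS for illustration. The following is a minimal sketch, not how DSS itself performs the replacement: it assumes Python's standard string.Template class, which happens to use the same $NAME placeholder syntax, and a hypothetical flow_variables dictionary standing in for the values DSS would supply.

```python
from string import Template

# Hypothetical flow variables; in a real recipe DSS supplies these values.
flow_variables = {"DKU_DST_country": "France"}

query = "SELECT * from mytable WHERE condition='$DKU_DST_country';"

# Template.substitute replaces each $NAME placeholder and raises KeyError
# if a placeholder has no matching entry in the mapping.
expanded = Template(query).substitute(flow_variables)
print(expanded)
```

Because the substitution is purely textual, the quotes around the placeholder in the query are what make the value a SQL string literal.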

Hive

Variables are replaced in your code using the ${hiveconf:VARIABLE_NAME} syntax. For example, if you have the following code:

SELECT * from mytable WHERE condition='${hiveconf:DKU_DST_date}';

with a variable DKU_DST_date which has value 2020-12-21, the following query will actually be executed:

SELECT * from mytable WHERE condition='2020-12-21';
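
To make the ${hiveconf:VARIABLE_NAME} form concrete, here is a small sketch that mimics its textual effect with a regular expression. It is only an illustration: in a real recipe the substitution is not done by your code, and the names flow_variables, HIVECONF_PATTERN and expand_hiveconf are hypothetical.

```python
import re

# Hypothetical flow variables; in a real recipe DSS supplies these values.
flow_variables = {"DKU_DST_date": "2020-12-21"}

# Matches ${hiveconf:NAME} and captures NAME.
HIVECONF_PATTERN = re.compile(r"\$\{hiveconf:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_hiveconf(query, variables):
    """Replace each ${hiveconf:NAME} with variables[NAME].

    Raises KeyError if NAME is not defined in variables.
    """
    return HIVECONF_PATTERN.sub(lambda match: variables[match.group(1)], query)


query = "SELECT * from mytable WHERE condition='${hiveconf:DKU_DST_date}';"
expanded = expand_hiveconf(query, flow_variables)
print(expanded)
```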

Python

Since reading and writing are done through Dataiku DSS, you don't need to specify the source or destination partitions in your code. Calling get_dataframe() automatically gives you only the relevant partitions.

For purposes other than reading or writing dataframes, all variables are available in a dictionary called dku_flow_variables in the dataiku module. Example:

import dataiku
print("I am working for year %s" % (dataiku.dku_flow_variables["DKU_DST_YEAR"]))
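
Since the variables are exposed as a dictionary, looking up a name that is not defined for the recipe raises a bare KeyError. A possible way to get a more helpful message is sketched below. The helper require_flow_variable is hypothetical and not part of the dataiku module, and sample_variables is a stand-in dictionary so that the sketch runs outside DSS; inside a recipe you would pass dataiku.dku_flow_variables instead.

```python
def require_flow_variable(variables, name):
    """Return variables[name], or raise a KeyError listing the available names."""
    try:
        return variables[name]
    except KeyError:
        available = ", ".join(sorted(variables)) or "(none)"
        raise KeyError(
            "Flow variable %r not found; available: %s" % (name, available)
        ) from None


# Stand-in for dataiku.dku_flow_variables (hypothetical sample value).
sample_variables = {"DKU_DST_YEAR": "2020"}

year = require_flow_variable(sample_variables, "DKU_DST_YEAR")
print("I am working for year %s" % year)
```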

R

Flow variables are retrieved using the dkuFlowVariable(variableName) function:

library(dataiku)
dkuFlowVariable("DKU_DST_country")

Available variables