Partitioning variable substitutions

When a recipe involves partitioned datasets, some variables are made available to the code that you write for this recipe, to help you properly manage partitions.

This applies to the following recipe types:

  • Python
  • Hive
  • Pig
  • SQL query
  • SQL script

Substituting variables

Pig, SQL

Variables are replaced in your code using the $VARIABLE_NAME syntax. For example, if you have the following code:

SELECT * from mytable WHERE condition='$DKU_DST_ctry';

with a variable DKU_DST_ctry which has value France, the following query will actually be executed:

SELECT * from mytable WHERE condition='France';
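To see what the replacement does to the query text, it can be mimicked in plain Python with string.Template, whose $NAME placeholder syntax happens to match. This is only a sketch of the textual effect, not how DSS performs the substitution internally; the variables dictionary is a stand-in for the values DSS supplies at run time.

```python
from string import Template

# Stand-in for the values DSS would supply for this recipe run (assumption).
variables = {"DKU_DST_ctry": "France"}

query = "SELECT * from mytable WHERE condition='$DKU_DST_ctry';"

# Replace every $NAME placeholder with its value.
# substitute() raises KeyError if a placeholder has no matching variable.
resolved = Template(query).substitute(variables)
print(resolved)
```

Note that the value is pasted into the code as-is, so any quoting around the placeholder must already be present in your query.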

Hive

Variables are replaced in your code using the ${hiveconf:VARIABLE_NAME} syntax. For example, if you have the following code:

SELECT * from mytable WHERE condition='${hiveconf:DKU_DST_date}';

with a variable DKU_DST_date which has value 2013-12-21, the following query will actually be executed:

SELECT * from mytable WHERE condition='2013-12-21';
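The same textual effect for the ${hiveconf:VARIABLE_NAME} form can be sketched in Python with a regular expression. The resolve_hiveconf helper and the variables dictionary are hypothetical names introduced for illustration; Hive performs the real substitution itself.

```python
import re

# Stand-in for the values DSS would supply for this recipe run (assumption).
variables = {"DKU_DST_date": "2013-12-21"}

query = "SELECT * from mytable WHERE condition='${hiveconf:DKU_DST_date}';"


def resolve_hiveconf(text, values):
    """Replace each ${hiveconf:NAME} with values[NAME]; unknown names raise KeyError."""
    return re.sub(r"\$\{hiveconf:(\w+)\}", lambda m: values[m.group(1)], text)


resolved = resolve_hiveconf(query, variables)
print(resolved)
```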

Python

All variables are available in a Python dictionary called dku_flow_variables in the dataiku module.

For example, you can write:

import dataiku
print("I am working for year %s" % (dataiku.dku_flow_variables["DKU_DST_YEAR"]))
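Because the variables live in an ordinary dictionary, asking for a name that does not exist for the current recipe raises a bare KeyError. A small wrapper can make that failure easier to diagnose. The sketch below is one possible approach: require_flow_variable is a hypothetical helper, not part of the dataiku module, and a plain dictionary with an example value stands in for dataiku.dku_flow_variables so the code runs outside DSS.

```python
def require_flow_variable(flow_variables, name):
    """Return flow_variables[name], or raise a KeyError listing the names that exist."""
    try:
        return flow_variables[name]
    except KeyError:
        available = ", ".join(sorted(flow_variables)) or "(none)"
        raise KeyError("%s is not defined; available: %s" % (name, available))


# Inside a recipe you would pass dataiku.dku_flow_variables instead.
# The value "2013" is an illustrative assumption.
example_variables = {"DKU_DST_YEAR": "2013"}

year = require_flow_variable(example_variables, "DKU_DST_YEAR")
print("I am working for year %s" % year)
```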

R

Flow variables are retrieved using the dkuFlowVariable(variableName) function:

library(dataiku)
dkuFlowVariable("DKU_DST_ctry")

Available variables