Executing partial recipes

It is possible to execute a “partial recipe” from a Python recipe, in order to run a Hive, Pig, Impala or SQL query.

This allows you to use Python to dynamically generate a SQL (resp. Hive, Pig, Impala) query and have DSS execute it, as if your recipe were a SQL query recipe.

This is useful when you need complex business logic to generate the final SQL query and can’t do it with only SQL constructs.

Note

Partial recipes are only possible when running a Python recipe. They are not available in notebooks.

The partial recipe behaves like the corresponding SQL (resp. Hive, Pig, Impala) recipe with respect to its inputs and outputs. Notably, a Python recipe in which a partial Hive or Pig recipe is executed can only have HDFS datasets as inputs and outputs. Likewise, since an Impala or SQL partial recipe has exactly one output, that output dataset has to be specified when executing the partial recipe.

import dataiku
from dataiku.core.sql import SQLExecutor2

# get the needed data to prepare the query
# for example, load from another table
my_auxiliary_dataset = dataiku.Dataset("my_auxiliary_dataset")
e = SQLExecutor2(dataset=my_auxiliary_dataset)
words = e.query_to_df("SELECT word FROM word_frequency WHERE frequency > 0.01 AND frequency < 0.99")

# prepare a query dynamically
sql = "SELECT id"
for word in words["word"]:
    # count the occurrences of each word in the text column
    sql += ", (length(text) - length(regexp_replace(text, '%s', ''))) / %d AS count_%s" % (word, len(word), word)
sql += " FROM reviews"

# execute it
# no need to instantiate an executor object, the method is static
my_output_dataset = dataiku.Dataset("my_output_dataset")
SQLExecutor2.exec_recipe_fragment(my_output_dataset, sql)
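
The same pattern applies to the other engines. Below is a minimal sketch of the Hive variant, assuming (as the note above implies) that the output dataset does not need to be passed, since a partial Hive recipe writes to the recipe’s HDFS output directly. The query and table names are hypothetical.

from dataiku.core.sql import HiveExecutor

# sketch: run a partial Hive recipe; the result goes to the recipe's
# HDFS output dataset, so no output dataset argument is passed
hql = "SELECT word, COUNT(*) AS cnt FROM reviews GROUP BY word"
HiveExecutor.exec_recipe_fragment(hql)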

API doc

class dataiku.core.sql.SQLExecutor2(connection=None, dataset=None)
class dataiku.core.sql.HiveExecutor(dataset=None, database=None, connection=None)
class dataiku.core.pig.PigExecutor
class dataiku.core.sql.ImpalaExecutor(dataset=None, database=None, connection=None)
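
As the constructors above show, an executor can be instantiated either from a connection name or from a dataset, in which case queries run on that dataset’s connection. A minimal sketch, with a hypothetical connection name and reusing the dataset names from the example above:

import dataiku
from dataiku.core.sql import SQLExecutor2, ImpalaExecutor

# from a connection name ("my-postgres" is a hypothetical connection)
e1 = SQLExecutor2(connection="my-postgres")

# from a dataset: the query runs on the dataset's connection
ds = dataiku.Dataset("word_frequency")
e2 = ImpalaExecutor(dataset=ds)
df = e2.query_to_df("SELECT COUNT(*) AS n FROM word_frequency")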