Recipes

This page lists usage examples for performing various operations with recipes through the Dataiku Python API. In all examples, project is a dataikuapi.dss.project.DSSProject handle, obtained using client.get_project() or client.get_default_project().

Basic operations

Listing recipes

recipes = project.list_recipes()
# Returns a list of DSSRecipeListItem

for recipe in recipes:
        # Quick access to main information in the recipe list item
        print("Name: %s" % recipe.name)
        print("Type: %s" % recipe.type)
        print("Tags: %s" % recipe.tags) # Returns a list of strings

        # You can also use the list item as a dict of all available recipe information
        print("Raw: %s" % recipe)

Deleting a recipe

recipe = project.get_recipe('myrecipe')
recipe.delete()

Modifying tags for a recipe

recipe = project.get_recipe('myrecipe')
settings = recipe.get_settings()

print("Current tags are %s" % settings.tags)

# Change the tags
settings.tags = ["newtag1", "newtag2"]

# If we changed the settings, we must save
settings.save()

Recipe status

You can compute the status of the recipe, which also provides you with the engine information.

Find the engine used to run a recipe

recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_selected_engine_details())

Check if a recipe is valid

get_status() calls the validation code of the recipe.

recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_status_severity())

Find the engines for all recipes of a certain type

This example shows how to filter a list, obtain DSSRecipe objects for the list items, and get their status.

for list_item in project.list_recipes():
        if list_item.type == "grouping":
                recipe = list_item.to_recipe()
                engine = recipe.get_status().get_selected_engine_details()["type"]
                print("Recipe %s uses engine %s" % (recipe.name, engine))

Recipe settings

When you use get_settings() on a recipe, you receive a settings object whose class depends on the recipe type. Please see below for the possible types.
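
For example, you can check which settings class you received before using type-specific helpers (a minimal sketch; the recipe name is hypothetical):

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

# Prints e.g. GroupingRecipeSettings for a grouping recipe
print(settings.__class__.__name__)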

Checking if a recipe uses a particular dataset as input

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
print("Recipe %s uses input:%s" % (recipe.name, settings.has_input("mydataset"))

Replacing an input of a recipe

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

settings.replace_input("old_input", "new_input")
settings.save()

Setting the code env of a code recipe

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

# Use this to set the recipe to inherit the project's code env
settings.set_code_env(inherit=True)

# Use this to set the recipe to use a specific code env
settings.set_code_env(code_env="myenv")

settings.save()

Reference documentation

class dataikuapi.dss.recipe.DSSRecipeListItem(client, data)

An item in a list of recipes. Do not instantiate this class, use dataikuapi.dss.project.DSSProject.list_recipes()

to_recipe()

Gets the DSSRecipe corresponding to this list item

name
id
type
class dataikuapi.dss.recipe.DSSRecipe(client, project_key, recipe_name)

A handle to an existing recipe on the DSS instance. Do not create this directly, use dataikuapi.dss.project.DSSProject.get_recipe()

name

The name of the recipe

compute_schema_updates()

Computes which updates are required to the outputs of this recipe. The required updates are returned as a RequiredSchemaUpdates object, which then allows you to apply() the changes.

Usage example:

required_updates = recipe.compute_schema_updates()
if required_updates.any_action_required():
    print("Some schemas will be updated")

# Note that you can call apply() even if no changes are required. This will be a no-op
required_updates.apply()
run(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)

Starts a new job to run this recipe and waits for it to complete. Raises an exception if the job fails.

job = recipe.run()
print("Job %s done" % job.id)
Parameters:
  • job_type – The job type. One of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD or RECURSIVE_FORCED_BUILD
  • partitions – If the outputs are partitioned, a list of partition ids to build
  • wait – if True, the call blocks until the job completes
  • no_fail – if True, does not raise if the job fails
Returns:

the dataikuapi.dss.job.DSSJob job handle corresponding to the built job

Return type:

dataikuapi.dss.job.DSSJob

delete()

Delete the recipe

get_settings()

Gets the settings of the recipe, as a DSSRecipeSettings or one of its subclasses.

Some recipes have a dedicated class for the settings, with additional helpers to read and modify the settings

Once you are done modifying the returned settings object, you can call save() on it in order to save the modifications to the DSS recipe

get_definition_and_payload()

Deprecated. Use get_settings()

set_definition_and_payload(definition)

Deprecated. Use get_settings() and DSSRecipeSettings.save()

get_status()

Gets the status of this recipe (status messages, engines status, …)

Returns: a dataikuapi.dss.recipe.DSSRecipeStatus object to interact with the status
Return type: dataikuapi.dss.recipe.DSSRecipeStatus
get_metadata()

Get the metadata attached to this recipe. The metadata contains the label, description, checklists, tags and custom metadata of the recipe.

Returns: a dict. For more information on available metadata, please see https://doc.dataiku.com/dss/api/8.0/rest/

Return type: dict

set_metadata(metadata)

Set the metadata on this recipe.

Parameters: metadata (dict) – the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata() call.
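
For example, to update the description while keeping the rest of the metadata intact (a minimal sketch; the description text is illustrative):

recipe = project.get_recipe("myrecipe")

metadata = recipe.get_metadata()
metadata["description"] = "Updated through the API"
recipe.set_metadata(metadata)
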
get_object_discussions()

Get a handle to manage discussions on the recipe

Returns: the handle to manage discussions
Return type: dataikuapi.discussion.DSSObjectDiscussions
get_continuous_activity()

Returns a handle on the continuous activity associated with this recipe

class dataikuapi.dss.recipe.DSSRecipeStatus(client, data)

Status of a recipe. Do not create this directly, use DSSRecipe.get_status()

get_selected_engine_details()

Gets the selected engine for this recipe (for recipes that support engines)

Returns: a dict of the details of the selected engine. The dict contains at least a “type” field indicating which engine it is, and a “statusWarnLevel” field indicating whether the engine status is OK / WARN / ERROR
Return type: dict
get_engines_details()

Gets details about all possible engines for this recipe (for recipes that support engines)

Returns: a list of dicts with the details of each possible engine. Each dict contains at least a “type” field indicating which engine it is, and a “statusWarnLevel” field indicating whether the engine status is OK / WARN / ERROR
Return type: list
get_status_severity()

Returns whether the recipe is in SUCCESS, WARNING or ERROR status

Return type: string
get_status_messages()

Returns status messages for this recipe.

Returns: a list of dicts, one per status message. Each dict represents a single message and contains at least a “severity” field (SUCCESS, WARNING or ERROR) and a “message” field
Return type: list
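
For instance, to print every validation message of a recipe (a minimal sketch using the fields documented above):

recipe = project.get_recipe("myrecipe")
status = recipe.get_status()

for message in status.get_status_messages():
    print("%s: %s" % (message["severity"], message["message"]))
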
class dataikuapi.dss.recipe.DSSRecipeSettings(recipe, data)

Settings of a recipe. Do not create this directly, use DSSRecipe.get_settings()

save()

Saves back the recipe in DSS.

type
str_payload

The raw “payload” of the recipe, as a string

obj_payload

The raw “payload” of the recipe, as a dict

raw_params

The raw ‘params’ field of the recipe settings, as a dict

get_recipe_raw_definition()

Get the recipe definition as a raw dict

Return type: dict

get_recipe_inputs()

Get a structured dict of inputs to this recipe

Return type: dict

get_recipe_outputs()

Get a structured dict of outputs of this recipe

Return type: dict

get_recipe_params()

Get the parameters of this recipe, as a dict

Return type: dict

get_payload()

Get the payload or script of this recipe, as a string

Return type: string

get_json_payload()

Get the payload or script of this recipe, parsed from JSON, as a dict

Return type: dict

set_payload(payload)

Set the payload of this recipe

Parameters: payload (str) – the payload, as a string

set_json_payload(payload)

Set the payload of this recipe

Parameters: payload (dict) – the payload, as a dict. The payload will be converted to a JSON string internally
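
For example, to read, tweak and write back the payload as a dict (a minimal sketch; the payload keys depend on the recipe type, and the key below is hypothetical):

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

payload = settings.get_json_payload()
payload["someSetting"] = "new_value"  # hypothetical key
settings.set_json_payload(payload)
settings.save()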

has_input(input_ref)

Returns whether this recipe has a given ref as input

has_output(output_ref)

Returns whether this recipe has a given ref as output

replace_input(current_input_ref, new_input_ref)

Replaces an object reference as input of this recipe by another

replace_output(current_output_ref, new_output_ref)

Replaces an object reference as output of this recipe by another

add_input(role, ref, partition_deps=None)
add_output(role, ref, append_mode=False)
get_flat_input_refs()

Returns a list of all input refs of this recipe, regardless of the input role

Return type: list of strings

get_flat_output_refs()

Returns a list of all output refs of this recipe, regardless of the output role

Return type: list of strings
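
For instance, to list everything a recipe reads from and writes to, regardless of roles (a minimal sketch):

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

print("Inputs: %s" % settings.get_flat_input_refs())
print("Outputs: %s" % settings.get_flat_output_refs())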

class dataikuapi.dss.recipe.DSSRecipeDefinitionAndPayload(recipe, data)

Deprecated. Settings of a recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.DSSRecipeCreator(type, name, project)

Helper to create new recipes

Parameters:
  • type (str) – type of the recipe
  • name (str) – name for the recipe
  • project (dataikuapi.dss.project.DSSProject) – project in which the recipe will be created

set_name(name)
with_input(dataset_name, project_key=None, role='main')

Add an existing object as input to the recipe-to-be-created

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • project_key – project containing the object, if different from the one where the recipe is created
  • role (str) – the role of the recipe in which the input should be added
with_output(dataset_name, append=False, role='main')

The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
  • role (str) – the role of the recipe in which the output should be added
build()

Deprecated. Use create()

create()

Creates the new recipe in the project, and returns a handle to interact with it.

Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
set_raw_mode()
class dataikuapi.dss.recipe.SingleOutputRecipeCreator(type, name, project)

Create a recipe that has a single output

with_existing_output(dataset_name, append=False)

Add an existing object as output to the recipe-to-be-created

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET', overwrite=False)

Create a new dataset as output to the recipe-to-be-created. The dataset is not created immediately, but when the recipe is created (i.e. in the create() method)

Parameters:
  • name (str) – name of the dataset or identifier of the managed folder
  • connection_id (str) – name of the connection to create the dataset on
  • typeOptionId (str) – sub-type of dataset, for connections where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connections
  • format_option_id (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC
  • override_sql_schema – schema to force on the dataset, for SQL datasets. If left empty, the schema will be autodetected
  • partitioning_option_id (str) – to copy the partitioning schema of an existing dataset ‘foo’, pass a value of ‘copy:dataset:foo’
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
  • object_type (str) – DATASET or MANAGED_FOLDER
  • overwrite – If the dataset being created already exists, overwrite it (and delete data)
with_output(dataset_name, append=False)

Alias of with_existing_output
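
As an illustration, creating a Sync recipe with a new managed output could look like this (a minimal sketch, assuming SyncRecipeCreator follows this single-output pattern; the dataset and connection names are hypothetical):

from dataikuapi.dss.recipe import SyncRecipeCreator

builder = SyncRecipeCreator("my_sync_recipe", project)
builder.with_input("input_dataset")
builder.with_new_output("synced_dataset", "filesystem_managed")
recipe = builder.create()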

class dataikuapi.dss.recipe.VirtualInputsSingleOutputRecipeCreator(type, name, project)

Create a recipe that has a single output and several inputs

with_input(dataset_name, project_key=None)

Add an existing object as input to the recipe-to-be-created

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • project_key – project containing the object, if different from the one where the recipe is created
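
For example, a Stack recipe combines several inputs into a single output (a minimal sketch, assuming StackRecipeCreator follows this pattern; the dataset names are hypothetical):

from dataikuapi.dss.recipe import StackRecipeCreator

builder = StackRecipeCreator("my_stack_recipe", project)
builder.with_input("dataset_a")
builder.with_input("dataset_b")
builder.with_existing_output("stacked_dataset")
recipe = builder.create()
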
class dataikuapi.dss.recipe.GroupingRecipeSettings(recipe, data)

Settings of a grouping recipe. Do not create this directly, use DSSRecipe.get_settings()

clear_grouping_keys()

Removes all grouping keys from this grouping recipe

add_grouping_key(column)

Adds grouping on a column

Parameters: column (str) – Column to group on

set_global_count_enabled(enabled)
get_or_create_column_settings(column)

Gets a dict representing the aggregations to perform on a column. Creates it and adds it to the potential aggregations if it does not already exist

Parameters: column (str) – The column name

Return type: dict

set_column_aggregations(column, type, min=False, max=False, count=False, count_distinct=False, sum=False, concat=False, stddev=False, avg=False)

Sets the basic aggregations on a column. Returns the dict representing the aggregations on the column

Parameters:
  • column (str) – The column name
  • type (str) – The type of the column (as a DSS schema type name)

Return type: dict
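
Putting these together, a minimal sketch that reconfigures an existing grouping recipe (the recipe name, column names and column type are hypothetical):

recipe = project.get_recipe("my_grouping_recipe")
settings = recipe.get_settings()  # a GroupingRecipeSettings

settings.clear_grouping_keys()
settings.add_grouping_key("customer_id")
settings.set_column_aggregations("amount", "double", min=True, max=True, avg=True)
settings.save()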

class dataikuapi.dss.recipe.GroupingRecipeCreator(name, project)

Create a Group recipe

with_group_key(group_key)

Set a column as the first grouping key. Only a single grouping key may be set at recipe creation time. For additional groupings, get the recipe settings

Parameters: group_key (str) – name of a column in the input dataset
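
A minimal creation sketch (assuming GroupingRecipeCreator follows the single-output pattern above; names are hypothetical, and the output dataset must already exist):

from dataikuapi.dss.recipe import GroupingRecipeCreator

builder = GroupingRecipeCreator("my_grouping_recipe", project)
builder.with_input("input_dataset")
builder.with_existing_output("grouped_dataset")
builder.with_group_key("customer_id")
recipe = builder.create()
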
class dataikuapi.dss.recipe.WindowRecipeSettings(recipe, data)

Settings of a window recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.WindowRecipeCreator(name, project)

Create a Window recipe

class dataikuapi.dss.recipe.SyncRecipeSettings(recipe, data)

Settings of a sync recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.SyncRecipeCreator(name, project)

Create a Sync recipe

class dataikuapi.dss.recipe.SortRecipeSettings(recipe, data)

Settings of a sort recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.SortRecipeCreator(name, project)

Create a Sort recipe

class dataikuapi.dss.recipe.TopNRecipeSettings(recipe, data)

Settings of a topn recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.TopNRecipeCreator(name, project)

Create a TopN recipe

class dataikuapi.dss.recipe.DistinctRecipeSettings(recipe, data)

Settings of a distinct recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.DistinctRecipeCreator(name, project)

Create a Distinct recipe

class dataikuapi.dss.recipe.PrepareRecipeSettings(recipe, data)

Settings of a prepare recipe. Do not create this directly, use DSSRecipe.get_settings()

raw_steps

Returns a raw list of the steps of this prepare recipe. You can modify the returned list.

Each step is a dict of settings. The precise settings for each step are not documented

add_processor_step(type, params)
add_filter_on_bad_meaning(meaning, columns)
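
For instance, to inspect the steps of a prepare recipe and append one (a minimal sketch; since step settings are not documented, the processor type and params below are hypothetical):

recipe = project.get_recipe("my_prepare_recipe")
settings = recipe.get_settings()  # a PrepareRecipeSettings

# Each step is a raw dict of settings
for step in settings.raw_steps:
    print(step.get("type"))

# Hypothetical processor type and params, for illustration only
settings.add_processor_step("ColumnRenamer", {"renamings": [{"from": "old_name", "to": "new_name"}]})
settings.save()
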
class dataikuapi.dss.recipe.PrepareRecipeCreator(name, project)

Create a Prepare recipe

class dataikuapi.dss.recipe.JoinRecipeSettings(recipe, data)

Settings of a join recipe. Do not create this directly, use DSSRecipe.get_settings()

In order to enable self-joins, join recipes are based on a concept of “virtual inputs”. Every join, computed pre-join column, pre-join filter, … is based on one virtual input, and each virtual input references an input of the recipe, by index

For example, if a recipe has inputs A and B and declares two joins:
  • A->B
  • A->A (based on a computed column)
There are 3 virtual inputs:
  • 0: points to recipe input 0 (i.e. dataset A)
  • 1: points to recipe input 1 (i.e. dataset B)
  • 2: points to recipe input 0 (i.e. dataset A) and includes the computed column
The first join is between virtual inputs 0 and 1, and the second join is between virtual inputs 0 and 2.
raw_virtual_inputs

Returns the raw list of virtual inputs

Return type: list of dict

raw_joins

Returns the raw list of joins

Return type: list of dict

add_virtual_input(input_dataset_index)

Adds a virtual input pointing to the specified input dataset of the recipe (referenced by index in the inputs list)

add_pre_join_computed_column(virtual_input_index, computed_column)

Adds a computed column to a virtual input

Use dataikuapi.dss.utils.DSSComputedColumn to build the computed_column object

add_join(join_type='LEFT', input1=0, input2=1)

Adds a join between two virtual inputs. The join is initialized with no condition.

Use add_condition_to_join() on the return value to add a join condition (for example column equality) to the join

Returns: the newly added join as a dict

Return type: dict

add_condition_to_join(join, type='EQ', column1=None, column2=None)

Adds a condition to a join

Parameters:
  • column1 (str) – Name of “left” column
  • column2 (str) – Name of “right” column
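
For example, to add a left join on column equality between the first two virtual inputs (a minimal sketch; the column names are hypothetical):

recipe = project.get_recipe("my_join_recipe")
settings = recipe.get_settings()  # a JoinRecipeSettings

join = settings.add_join(join_type="LEFT", input1=0, input2=1)
settings.add_condition_to_join(join, type="EQ", column1="customer_id", column2="customer_id")
settings.save()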

add_post_join_computed_column(computed_column)

Adds a post-join computed column

Use dataikuapi.dss.utils.DSSComputedColumn to build the computed_column object

set_post_filter(postfilter)
class dataikuapi.dss.recipe.JoinRecipeCreator(name, project)

Create a Join recipe

class dataikuapi.dss.recipe.StackRecipeSettings(recipe, data)

Settings of a stack recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.StackRecipeCreator(name, project)

Create a Stack recipe

class dataikuapi.dss.recipe.SamplingRecipeSettings(recipe, data)

Settings of a sampling recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.SamplingRecipeCreator(name, project)

Create a Sample/Filter recipe

class dataikuapi.dss.recipe.SplitRecipeSettings(recipe, data)

Settings of a split recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.SplitRecipeCreator(name, project)

Create a Split recipe

class dataikuapi.dss.recipe.DownloadRecipeSettings(recipe, data)

Settings of a download recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.CodeRecipeSettings(recipe, data)

Settings of a code recipe. Do not create this directly, use DSSRecipe.get_settings()

get_code()

Returns the code of the recipe as a string

Return type: string

set_code(code)

Updates the code of the recipe

Parameters: code (str) – The new code as a string

get_code_env_settings()

Returns the code env settings for this recipe

Return type: dict

set_code_env(code_env=None, inherit=False, use_builtin=False)

Sets the code env to use for this recipe.

Exactly one of code_env, inherit or use_builtin must be passed

Parameters:
  • code_env (str) – The name of a code env
  • inherit (bool) – Use the project’s default code env
  • use_builtin (bool) – Use the builtin code env
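
For instance, to patch the code of an existing code recipe (a minimal sketch; the string replacement is purely illustrative):

recipe = project.get_recipe("my_python_recipe")
settings = recipe.get_settings()  # a CodeRecipeSettings

code = settings.get_code()
settings.set_code(code.replace("old_table", "new_table"))
settings.save()
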
class dataikuapi.dss.recipe.CodeRecipeCreator(name, type, project)
with_script(script)

Set the code of the recipe

Parameters:script (str) – the script of the recipe
with_new_output_dataset(name, connection, type=None, format=None, copy_partitioning_from='FIRST_INPUT', append=False, overwrite=False)

Create a new managed dataset as output to the recipe-to-be-created. The dataset is created immediately

Parameters:
  • name (str) – name of the dataset to create
  • connection (str) – name of the connection to create the dataset on
  • type (str) – type of dataset, for connections where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connections
  • format (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC. If None, uses the default
  • copy_partitioning_from (str) – where to copy the partitioning from. Use None to leave the output unpartitioned, “FIRST_INPUT” to copy from the first input of the recipe, “dataset:XXX” to copy from a dataset name, or “folder:XXX” to copy from a folder id
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
  • overwrite – If the dataset being created already exists, overwrite it (and delete data)
class dataikuapi.dss.recipe.PythonRecipeCreator(name, project)

Creates a Python recipe. A Python recipe can be defined either by its complete code, like a normal Python recipe, or by a function signature.

When using a function, the function must take as arguments:
  • A list of dataframes corresponding to the dataframes of the input datasets
  • Optional named arguments corresponding to arguments passed to the creator
DEFAULT_RECIPE_CODE_TMPL = '\n# This code is autogenerated by PythonRecipeCreator function mode\nimport dataiku, dataiku.recipe, json\nfrom {module_name} import {fname}\ninput_datasets = dataiku.recipe.get_inputs_as_datasets()\noutput_datasets = dataiku.recipe.get_outputs_as_datasets()\nparams = json.loads(\'{params_json}\')\n\nlogging.info("Reading %d input datasets as dataframes" % len(input_datasets))\ninput_dataframes = [ds.get_dataframe() for ds in input_datasets]\n\nlogging.info("Calling user function {fname}")\nfunction_input = input_dataframes if len(input_dataframes) > 1 else input_dataframes[0]\noutput_dataframes = {fname}(function_input, **params)\n\nif not isinstance(output_dataframes, list):\n output_dataframes = [output_dataframes]\n\nif not len(output_dataframes) == len(output_datasets):\n raise Exception("Code function {fname}() returned %d dataframes but recipe expects %d output datasets", \\\n (len(output_dataframes), len(output_datasets)))\noutput = list(zip(output_datasets, output_dataframes))\nfor ds, df in output:\n logging.info("Writing function result to dataset %s" % ds.name)\n ds.write_with_schema(df)\n'
with_function_name(module_name, function_name, custom_template=None, **function_args)

Defines this recipe as being a functional recipe calling a function name from a module name

with_function(fn, custom_template=None, **function_args)
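
A minimal sketch of function mode (the function and dataset names are hypothetical; the input dataframe is passed to the function, and its return value is written to the output dataset):

from dataikuapi.dss.recipe import PythonRecipeCreator

def deduplicate(df):
    return df.drop_duplicates()

builder = PythonRecipeCreator("my_python_recipe", project)
builder.with_input("input_dataset")
builder.with_output("output_dataset")  # the output dataset must already exist
builder.with_function(deduplicate)
recipe = builder.create()
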
class dataikuapi.dss.recipe.SQLQueryRecipeCreator(name, project)

Create a SQL query recipe

class dataikuapi.dss.recipe.PredictionScoringRecipeCreator(name, project)

Builder for the creation of a new “Prediction scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new prediction scoring recipe outputting to a new dataset

project = client.get_project("MYPROJECT")
builder = PredictionScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.create()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.ClusteringScoringRecipeCreator(name, project)

Builder for the creation of a new “Clustering scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new clustering scoring recipe outputting to a new dataset

project = client.get_project("MYPROJECT")
builder = ClusteringScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.create()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.DownloadRecipeCreator(name, project)

Create a Download recipe

class dataikuapi.dss.recipe.RequiredSchemaUpdates(recipe, data)

Representation of the updates required to the schema of the outputs of a recipe. Do not create this class directly, use DSSRecipe.compute_schema_updates()

any_action_required()
apply()