Managing recipes

Recipes belong to a given project, so any access to a recipe, including creating one, requires getting a handle on the project first.
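
For instance, a minimal sketch of getting a project handle (the host URL, API key and project key below are placeholders); the resulting project object is the one used in the examples below:

import dataikuapi

# connect to the DSS instance and retrieve the project holding the recipes
client = dataikuapi.DSSClient("http://localhost:11200", "my_api_key")
project = client.get_project("MY_PROJECT_KEY")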

Recipe creation

Helpers for creating common recipes are provided in the dataikuapi package. They follow a builder pattern: you create a builder object, add settings to it, then call build() to actually create the recipe object. The builder objects reproduce the functionality available in the recipe creation modals of the UI, so for more control over the recipe's setup, you need to retrieve its definition and payload after creation, modify them, and save them back.

Example: creating a Python recipe

A Python recipe belongs to the category of code recipes, and can have several inputs and several outputs.

from dataikuapi.dss.recipe import CodeRecipeCreator
builder = CodeRecipeCreator("test_recipe_name", "python", project)
builder = builder.with_input("input1_name")
builder = builder.with_input("input2_name")
builder = builder.with_output("some_existing_output_dataset")
builder = builder.with_script("print('hello world')")
recipe = builder.build()

Example: creating a Sync recipe

Sync recipes belong to the category of recipes that have a single output. For these recipes, the builder object offers an additional method, dataikuapi.dss.recipe.SingleOutputRecipeCreator.with_new_output(), which creates the output dataset along with the recipe, removing the need to pre-create the recipe's outputs.

from dataikuapi.dss.recipe import SyncRecipeCreator
builder = SyncRecipeCreator("sync_to_parquet", project)
builder = builder.with_input("input_dataset_name")
# create the output dataset on the "hdfs_managed" connection, stored in Parquet format
builder = builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE")
recipe = builder.build()

Example: creating a Group recipe and modifying the aggregates

Recipe creation mostly handles setting up the inputs and outputs of the recipe, so most of the recipe's setup has to be done by retrieving its definition and payload, altering them, then saving them back.

from dataikuapi.dss.recipe import GroupingRecipeCreator
builder = GroupingRecipeCreator('test_group', project)
builder = builder.with_input("input_dataset_name")
builder = builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE")
builder = builder.with_group_key("quantity") # the recipe is created with one grouping key
recipe = builder.build()

# once the recipe exists, it's possible to modify its settings
# 1. get the settings
recipe_def = recipe.get_definition_and_payload()
recipe_payload = recipe_def.get_json_payload()
# 2. modify them: activate the count aggregate on each aggregated column
for v in recipe_payload["values"]:
    v["count"] = True
# 3. save the changes back
recipe_def.set_json_payload(recipe_payload)
recipe.set_definition_and_payload(recipe_def)

Reference documentation

class dataikuapi.dss.recipe.DSSRecipe(client, project_key, recipe_name)

A handle to an existing recipe on the DSS instance

delete()

Delete the recipe

get_definition_and_payload()

Get the definition of the recipe

Returns:
the definition, as a DSSRecipeDefinitionAndPayload object, containing the recipe definition itself and its payload
set_definition_and_payload(definition)

Set the definition of the recipe

Args:
definition: the definition, as a DSSRecipeDefinitionAndPayload object. You should only set a definition object that has been retrieved using the get_definition_and_payload call.
get_metadata()

Get the metadata attached to this recipe. The metadata contains the label, description, checklists, tags and custom metadata of the recipe

Returns:
a dict object. For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest
set_metadata(metadata)

Set the metadata on this recipe.

Args:
metadata: the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata call.
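
For example, a minimal sketch of a read-modify-write cycle on the metadata, tagging a recipe (the presence and shape of the "tags" field is an assumption for illustration):

# fetch the current metadata, add a tag, and save the whole object back
metadata = recipe.get_metadata()
metadata["tags"] = metadata.get("tags", []) + ["reviewed"]  # "tags" field assumed
recipe.set_metadata(metadata)
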
class dataikuapi.dss.recipe.DSSRecipeDefinitionAndPayload(data)

Definition for a recipe, that is, the recipe definition itself and its payload

get_recipe_raw_definition()

Get the recipe definition as a raw JSON object

get_recipe_inputs()

Get the list of inputs of this recipe

get_recipe_outputs()

Get the list of outputs of this recipe

get_recipe_params()

Get the parameters of this recipe, as a raw JSON object

get_payload()

Get the payload or script of this recipe, as a raw string

get_json_payload()

Get the payload or script of this recipe, as a JSON object

set_payload(payload)

Set the raw payload of this recipe

Parameters:
  • payload (str) – the payload, as a string
set_json_payload(payload)

Set the payload of this recipe as a JSON object

Parameters:
  • payload (dict) – the payload, as a dict. The payload will be converted to a JSON string internally
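
These accessors support the read-modify-write cycle shown in the Group recipe example above. For instance, a sketch appending a line to the script of an existing code recipe (recipe is a DSSRecipe handle obtained as in the examples above):

# read the current script, extend it, and save it back
definition = recipe.get_definition_and_payload()
script = definition.get_payload()
definition.set_payload(script + "\nprint('done')")
recipe.set_definition_and_payload(definition)
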
class dataikuapi.dss.recipe.DSSRecipeCreator(type, name, project)

Helper to create new recipes

Parameters:
  • type (str) – type of the recipe
  • name (str) – name for the recipe
  • project (dataikuapi.dss.project.DSSProject) – project in which the recipe will be created
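
DSSRecipeCreator can also be used directly for recipe types that have no dedicated helper; a sketch, assuming "shaker" is the internal type name of the Prepare recipe:

from dataikuapi.dss.recipe import DSSRecipeCreator

# "shaker" is assumed to be the internal type of the Prepare recipe
builder = DSSRecipeCreator("shaker", "my_prepare_recipe", project)
builder = builder.with_input("input_dataset_name")
builder = builder.with_output("existing_output_dataset")
recipe = builder.build()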

with_input(dataset_name, project_key=None, role='main')

Add an existing object as input to the recipe-to-be-created

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • project_key – project containing the object, if different from the one where the recipe is created
  • role (str) – the role under which the input should be added to the recipe
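
For instance, to take an input from another project (the project key is a placeholder):

builder.with_input("customers", project_key="OTHER_PROJECT")
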
with_output(dataset_name, append=False, role='main')

The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output instead.

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
  • role (str) – the role under which the output should be added to the recipe
build()

Create a new recipe in the project, and return a handle to interact with it.

Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
class dataikuapi.dss.recipe.SingleOutputRecipeCreator(type, name, project)

Create a recipe that has a single output

with_existing_output(dataset_name, append=False)

Add an existing object as output to the recipe-to-be-created

Parameters:
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')

Create a new dataset as output to the recipe-to-be-created. The dataset is not created immediately, but when the recipe is created (i.e. in the build() method).

Parameters:
  • name (str) – name of the dataset or identifier of the managed folder
  • connection_id (str) – name of the connection to create the dataset on
  • typeOptionId (str) – sub-type of dataset, for connections where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connections
  • format_option_id (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC
  • override_sql_schema – schema to force on the dataset, for SQL datasets. If left empty, the schema will be autodetected
  • partitioning_option_id (str) – to copy the partitioning schema of an existing dataset ‘foo’, pass a value of ‘copy:foo’
  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
  • object_type (str) – DATASET or MANAGED_FOLDER
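
For instance, a sketch of an output that copies the partitioning scheme of an existing dataset (dataset and connection names are placeholders):

builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE", partitioning_option_id="copy:input_dataset_name")
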
with_output(dataset_name, append=False)

Alias of with_existing_output

class dataikuapi.dss.recipe.VirtualInputsSingleOutputRecipeCreator(type, name, project)

Create a recipe that has a single output and several inputs

with_input(dataset_name, project_key=None)

Add an existing dataset as input to the recipe-to-be-created

class dataikuapi.dss.recipe.WindowRecipeCreator(name, project)

Create a Window recipe

class dataikuapi.dss.recipe.SyncRecipeCreator(name, project)

Create a Sync recipe

class dataikuapi.dss.recipe.SortRecipeCreator(name, project)

Create a Sort recipe

class dataikuapi.dss.recipe.TopNRecipeCreator(name, project)

Create a TopN recipe

class dataikuapi.dss.recipe.DistinctRecipeCreator(name, project)

Create a Distinct recipe

class dataikuapi.dss.recipe.GroupingRecipeCreator(name, project)

Create a Group recipe

with_group_key(group_key)

Set a column as grouping key

Parameters:group_key (str) – name of a column in the input
class dataikuapi.dss.recipe.JoinRecipeCreator(name, project)

Create a Join recipe

class dataikuapi.dss.recipe.StackRecipeCreator(name, project)

Create a Stack recipe
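
Stack recipes combine several inputs into a single output. A minimal sketch, assuming StackRecipeCreator follows the multi-input pattern of VirtualInputsSingleOutputRecipeCreator above (dataset and connection names are placeholders):

from dataikuapi.dss.recipe import StackRecipeCreator

# stack two datasets into a new managed dataset on the "filesystem_managed" connection
builder = StackRecipeCreator("stack_datasets", project)
builder = builder.with_input("dataset_a")
builder = builder.with_input("dataset_b")
builder = builder.with_new_output("stacked_output", "filesystem_managed", format_option_id="CSV_ESCAPING_NOGZIP_FORHIVE")
recipe = builder.build()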

class dataikuapi.dss.recipe.SamplingRecipeCreator(name, project)

Create a Sample/Filter recipe

class dataikuapi.dss.recipe.CodeRecipeCreator(name, type, project)

Create a code recipe of a given type (e.g. python)

with_script(script)

Set the code of the recipe

Parameters:script (str) – the script of the recipe
class dataikuapi.dss.recipe.SQLQueryRecipeCreator(name, project)

Create a SQL query recipe

class dataikuapi.dss.recipe.SplitRecipeCreator(name, project)

Create a Split recipe

class dataikuapi.dss.recipe.PredictionScoringRecipeCreator(name, project)

Builder for the creation of a new “Prediction scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new prediction scoring recipe outputting to a new dataset

from dataikuapi.dss.recipe import PredictionScoringRecipeCreator

project = client.get_project("MYPROJECT")
builder = PredictionScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.build()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')

Create a new dataset as output to the recipe-to-be-created (see dataikuapi.dss.recipe.SingleOutputRecipeCreator.with_new_output())

with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.ClusteringScoringRecipeCreator(name, project)

Builder for the creation of a new “Clustering scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new clustering scoring recipe outputting to a new dataset

from dataikuapi.dss.recipe import ClusteringScoringRecipeCreator

project = client.get_project("MYPROJECT")
builder = ClusteringScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.build()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')

Create a new dataset as output to the recipe-to-be-created (see dataikuapi.dss.recipe.SingleOutputRecipeCreator.with_new_output())

with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.DownloadRecipeCreator(name, project)

Create a Download recipe