Managing recipes

Recipes belong to a given project, so all access to recipes, including recipe creation, starts by getting a handle on the project.
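For instance, assuming a reachable DSS instance and a valid API key (the URL, key and project key below are placeholders), a project handle is obtained from a DSSClient:

import dataikuapi

# Connect to the DSS instance (placeholder URL and API key)
client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
# Get a handle on the project that holds, or will hold, the recipes
project = client.get_project("MYPROJECT")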

Recipe creation

Helpers for creating common recipes are provided in the dataikuapi package. They follow a builder pattern: you create a builder object, add settings to it, then call build() to actually create the recipe. The builders reproduce the functionality available in the recipe creation modals of the UI, so for finer control over the recipe's setup you need to retrieve its definition and payload after creation, modify them, and save them back (see the Group recipe example below).

Example: creating a Python recipe

A Python recipe belongs to the category of code recipes, and can have several inputs and several outputs.

from dataikuapi.dss.recipe import CodeRecipeCreator
builder = CodeRecipeCreator("test_recipe_name", "python", project)
builder = builder.with_input("input1_name")
builder = builder.with_input("input2_name")
builder = builder.with_output("some_existing_output_dataset")
builder = builder.with_script("print('hello world')")
recipe = builder.build()

Example: creating a Sync recipe

Sync recipes belong to the category of recipes that have only one output. For these recipes, the builder object offers an additional method dataikuapi.dss.recipe.SingleOutputRecipeCreator.with_new_output() to create the output dataset along with the recipe, thus removing the need to pre-create the outputs of the recipe.

from dataikuapi.dss.recipe import SyncRecipeCreator
builder = SyncRecipeCreator("sync_to_parquet", project)
builder = builder.with_input("input_dataset_name")
builder = builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE")
recipe = builder.build()

Example: creating a Group recipe and modifying the aggregates

Recipe creation mostly handles setting up the inputs and outputs of the recipe, so most of its setup has to be done by retrieving its definition and payload, altering them, then saving them back:

from dataikuapi.dss.recipe import GroupingRecipeCreator
builder = GroupingRecipeCreator('test_group', project)
builder = builder.with_input("input_dataset_name")
builder = builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE")
builder = builder.with_group_key("quantity") # the recipe is created with one grouping key
recipe = builder.build()

# once the recipe exists, it's possible to modify its settings
# 1. get the settings
recipe_def = recipe.get_definition_and_payload()
recipe_payload = recipe_def.get_json_payload()
# 2. modify them: activate the count aggregate on each grouped column
for v in recipe_payload["values"]:
    v["count"] = True
# 3. save the changes back
recipe_def.set_json_payload(recipe_payload)
recipe.set_definition_and_payload(recipe_def)
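
To double-check that the change was persisted, the settings can be fetched again; this is a minimal sketch reusing the same calls as above:

# re-read the settings and verify the aggregates were saved
recipe_def = recipe.get_definition_and_payload()
for v in recipe_def.get_json_payload()["values"]:
    assert v["count"] is True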

Reference documentation

class dataikuapi.dss.recipe.DSSRecipe(client, project_key, recipe_name)

A handle to an existing recipe on the DSS instance

delete()

Delete the recipe

get_definition_and_payload()

Get the definition of the recipe

Returns:
the definition, as a DSSRecipeDefinitionAndPayload object, containing the recipe definition itself and its payload

set_definition_and_payload(definition)

Set the definition of the recipe

Args:
definition: the definition, as a DSSRecipeDefinitionAndPayload object. You should only set a definition object that has been retrieved using the get_definition_and_payload call.

get_metadata()

Get the metadata attached to this recipe. The metadata contains the label, description, checklists, tags and custom metadata of the recipe.

Returns:
a dict object. For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest

set_metadata(metadata)

Set the metadata on this recipe.

Args:
metadata: the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata call.
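
As an illustration of this get/modify/set round-trip, the sketch below adds a tag to a recipe; the tag name is a placeholder, and the presence of a "tags" list in the metadata dict is assumed from the description above:

# fetch the current metadata, add a tag, and save it back
metadata = recipe.get_metadata()
metadata["tags"].append("needs-review")  # placeholder tag name
recipe.set_metadata(metadata)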
class dataikuapi.dss.recipe.DSSRecipeDefinitionAndPayload(data)

Definition for a recipe, that is, the recipe definition itself and its payload

get_recipe_raw_definition()
get_recipe_inputs()
get_recipe_outputs()
get_recipe_params()
get_payload()
get_json_payload()
set_payload(payload)
set_json_payload(payload)
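
The two payload accessors differ in their return type: get_payload() returns the payload as a raw string, while get_json_payload() parses it into a dict. A minimal sketch, assuming a recipe handle obtained as in the examples above:

recipe_def = recipe.get_definition_and_payload()
raw_payload = recipe_def.get_payload()        # payload as a raw string
json_payload = recipe_def.get_json_payload()  # payload parsed into a dict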
class dataikuapi.dss.recipe.DSSRecipeCreator(type, name, project)

Helper to create new recipes

with_input(dataset_name, project_key=None, role='main')
with_output(dataset_name, append=False, role='main')

The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output

build()

Create a new recipe in the project, and return a handle to interact with it.

Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
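
For recipe types that have no dedicated creator class, DSSRecipeCreator can be used directly with the recipe type string; "shaker" below (assumed here to be the internal type of a Prepare recipe) is an illustrative example:

from dataikuapi.dss.recipe import DSSRecipeCreator

# "shaker" is assumed here as the internal type of a Prepare recipe
builder = DSSRecipeCreator("shaker", "my_prepare_recipe", project)
builder = builder.with_input("input_dataset_name")
builder = builder.with_output("some_existing_output_dataset")
recipe = builder.build()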
class dataikuapi.dss.recipe.SingleOutputRecipeCreator(type, name, project)
with_existing_output(dataset_name, append=False)
with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_output(dataset_name, append=False)
class dataikuapi.dss.recipe.VirtualInputsSingleOutputRecipeCreator(type, name, project)
with_input(dataset_name, project_key=None)
class dataikuapi.dss.recipe.WindowRecipeCreator(name, project)
class dataikuapi.dss.recipe.SyncRecipeCreator(name, project)
class dataikuapi.dss.recipe.SortRecipeCreator(name, project)
class dataikuapi.dss.recipe.TopNRecipeCreator(name, project)
class dataikuapi.dss.recipe.DistinctRecipeCreator(name, project)
class dataikuapi.dss.recipe.GroupingRecipeCreator(name, project)
with_group_key(group_key)
class dataikuapi.dss.recipe.JoinRecipeCreator(name, project)
class dataikuapi.dss.recipe.StackRecipeCreator(name, project)
class dataikuapi.dss.recipe.SamplingRecipeCreator(name, project)
class dataikuapi.dss.recipe.CodeRecipeCreator(name, type, project)
with_script(script)
class dataikuapi.dss.recipe.SQLQueryRecipeCreator(name, project)
class dataikuapi.dss.recipe.SplitRecipeCreator(name, project)
class dataikuapi.dss.recipe.PredictionScoringRecipeCreator(name, project)

Builder for the creation of a new “Prediction scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new prediction scoring recipe outputting to a new dataset
from dataikuapi.dss.recipe import PredictionScoringRecipeCreator

project = client.get_project("MYPROJECT")
builder = PredictionScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.build()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.ClusteringScoringRecipeCreator(name, project)

Builder for the creation of a new “Clustering scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new clustering scoring recipe outputting to a new dataset
from dataikuapi.dss.recipe import ClusteringScoringRecipeCreator

project = client.get_project("MYPROJECT")
builder = ClusteringScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.build()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.DownloadRecipeCreator(name, project)