Recipes¶
This page lists usage examples for performing various operations with recipes through the Dataiku Python API. In all examples, project is a dataikuapi.dss.project.DSSProject
handle, obtained using client.get_project() or client.get_default_project().
Basic operations¶
Listing recipes¶
recipes = project.list_recipes()
# Returns a list of DSSRecipeListItem
for recipe in recipes:
    # Quick access to main information in the recipe list item
    print("Name: %s" % recipe.name)
    print("Type: %s" % recipe.type)
    print("Tags: %s" % recipe.tags) # Returns a list of strings
    # You can also use the list item as a dict of all available recipe information
    print("Raw: %s" % recipe)
Deleting a recipe¶
recipe = project.get_recipe('myrecipe')
recipe.delete()
Modifying tags for a recipe¶
recipe = project.get_recipe('myrecipe')
settings = recipe.get_settings()
print("Current tags are %s" % settings.tags)
# Change the tags
settings.tags = ["newtag1", "newtag2"]
# If we changed the settings, we must save
settings.save()
Recipe creation¶
Please see Flow creation and management
Recipe status¶
You can compute the status of the recipe, which also provides you with the engine information.
Find the engine used to run a recipe¶
recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_selected_engine_details())
Check if a recipe is valid¶
get_status() calls the validation code of the recipe.
recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_status_severity())  # SUCCESS, WARNING or ERROR
Find the engines for all recipes of a certain type¶
This example shows how to filter a list, obtain DSSRecipe
objects for the list items, and get their status.
for list_item in project.list_recipes():
    if list_item.type == "grouping":
        recipe = list_item.to_recipe()
        engine = recipe.get_status().get_selected_engine_details()["type"]
        print("Recipe %s uses engine %s" % (recipe.name, engine))
Recipe settings¶
When you use get_settings()
on a recipe, you receive a settings object whose class depends on the recipe type. Please see below for the possible types.
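For example, a grouping recipe returns a GroupingRecipeSettings object with dedicated helpers. A minimal sketch (the recipe and column names are placeholders):
from dataikuapi.dss.recipe import GroupingRecipeSettings

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
if isinstance(settings, GroupingRecipeSettings):
    # Grouping recipes expose grouping-specific helpers (documented below)
    settings.add_grouping_key("customer_id")
    settings.save()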
Checking if a recipe uses a particular dataset as input¶
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
print("Recipe %s uses input:%s" % (recipe.name, settings.has_input("mydataset"))
Replacing an input of a recipe¶
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
settings.replace_input("old_input", "new_input")
settings.save()
Setting the code env of a code recipe¶
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
# Use this to set the recipe to inherit the project's code env
settings.set_code_env(inherit=True)
# Use this to set the recipe to use a specific code env
settings.set_code_env(code_env="myenv")
settings.save()
Reference documentation¶
-
class
dataikuapi.dss.recipe.
DSSRecipe
(client, project_key, recipe_name)¶ A handle to an existing recipe on the DSS instance. Do not create this directly, use
dataikuapi.dss.project.DSSProject.get_recipe()
-
property
id
¶ The id of the recipe
-
property
name
¶ The name of the recipe
-
compute_schema_updates
()¶ Computes which updates are required to the outputs of this recipe. The required updates are returned as a
RequiredSchemaUpdates
object, which then allows you to apply() the changes.
Usage example:
required_updates = recipe.compute_schema_updates()
if required_updates.any_action_required():
    print("Some schemas will be updated")
# Note that you can call apply even if no changes are required. This will be a no-op
required_updates.apply()
-
run
(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)¶ Starts a new job to run this recipe and waits for it to complete. Raises if the job failed.
job = recipe.run()
print("Job %s done" % job.id)
- Parameters
job_type – The job type. One of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD or RECURSIVE_FORCED_BUILD
partitions – If the outputs are partitioned, a list of partition ids to build
no_fail – if True, does not raise if the job failed.
- Returns
the
dataikuapi.dss.job.DSSJob
job handle corresponding to the built job
- Return type
dataikuapi.dss.job.DSSJob
-
delete
()¶ Delete the recipe
-
get_settings
()¶ Gets the settings of the recipe, as a
DSSRecipeSettings
or one of its subclasses.
Some recipes have a dedicated class for the settings, with additional helpers to read and modify the settings.
Once you are done modifying the returned settings object, you can call
save()
on it in order to save the modifications to the DSS recipe.
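Usage example (a sketch; the recipe name and tag are placeholders):
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
# Inspect the settings, tweak them, then persist the changes
print(settings.type)
settings.tags = ["audited"]
settings.save()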
-
get_definition_and_payload
()¶ Deprecated. Use
get_settings()
-
set_definition_and_payload
(definition)¶ Deprecated. Use
get_settings()
and DSSRecipeSettings.save()
-
get_status
()¶ Gets the status of this recipe (status messages, engines status, …)
- Returns
a
dataikuapi.dss.recipe.DSSRecipeStatus
object to interact with the status
- Return type
dataikuapi.dss.recipe.DSSRecipeStatus
-
get_metadata
()¶ Get the metadata attached to this recipe. The metadata contains the label, description, checklists, tags and custom metadata of the recipe
- Returns
a dict. For more information on available metadata, please see https://doc.dataiku.com/dss/api/8.0/rest/
- Return type
dict
-
set_metadata
(metadata)¶ Set the metadata on this recipe. :param dict metadata: the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata call.
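Usage example of a metadata round-trip (a sketch; the "description" key name is assumed from the fields listed in get_metadata):
recipe = project.get_recipe("myrecipe")
metadata = recipe.get_metadata()
metadata["description"] = "Documented via the API"  # key name assumed
recipe.set_metadata(metadata)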
-
get_object_discussions
()¶ Get a handle to manage discussions on the recipe
- Returns
the handle to manage discussions
- Return type
dataikuapi.discussion.DSSObjectDiscussions
-
get_continuous_activity
()¶ Return a handle on the associated continuous activity
-
move_to_zone
(zone)¶ Moves this object to a flow zone
- Parameters
zone (object) – a
dataikuapi.dss.flow.DSSFlowZone
where to move the object
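Usage example (a sketch; assumes a flow zone with id "myzone" and that the zone handle is obtained through project.get_flow().get_zone()):
zone = project.get_flow().get_zone("myzone")  # zone id is a placeholder
recipe = project.get_recipe("myrecipe")
recipe.move_to_zone(zone)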
-
class
dataikuapi.dss.recipe.
DSSRecipeListItem
(client, data)¶ An item in a list of recipes. Do not instantiate this class, use
dataikuapi.dss.project.DSSProject.list_recipes()
-
property
name
¶
-
property
id
¶
-
property
type
¶
-
class
dataikuapi.dss.recipe.
DSSRecipeStatus
(client, data)¶ Status of a recipe. Do not create this directly, use
DSSRecipe.get_status()
-
get_selected_engine_details
()¶ Gets the selected engine for this recipe (for recipes that support engines)
- Returns
a dict of the details of the selected engine. The dict will contain at least a ‘type’ field indicating which engine it is, and a “statusWarnLevel” field indicating whether the engine status is OK / WARN / ERROR
- Return type
dict
-
get_engines_details
()¶ Gets details about all possible engines for this recipe (for recipes that support engines)
- Returns
a list of dicts, one per possible engine. Each dict will contain at least a ‘type’ field indicating which engine it is, and a “statusWarnLevel” field indicating whether the engine status is OK / WARN / ERROR
- Return type
list
-
get_status_severity
()¶ Returns whether the recipe is in SUCCESS, WARNING or ERROR status
- Return type
string
-
get_status_messages
()¶ Returns status messages for this recipe.
- Returns
a list of dicts, one per status message. Each dict represents a single message, and contains at least a “severity” field (SUCCESS, WARNING or ERROR) and a “message” field
- Return type
list
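Usage example combining the status helpers (the recipe name is a placeholder):
status = project.get_recipe("myrecipe").get_status()
if status.get_status_severity() != "SUCCESS":
    for message in status.get_status_messages():
        # Each message carries at least "severity" and "message" fields
        print("%s: %s" % (message["severity"], message["message"]))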
-
-
class
dataikuapi.dss.recipe.
RequiredSchemaUpdates
(recipe, data)¶ Representation of the updates required to the schema of the outputs of a recipe. Do not create this class directly, use
DSSRecipe.compute_schema_updates()
-
any_action_required
()¶
-
apply
()¶
-
Settings¶
-
class
dataikuapi.dss.recipe.
DSSRecipeSettings
(recipe, data)¶ Settings of a recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
save
()¶ Saves back the recipe in DSS.
-
property
type
¶
-
property
str_payload
¶ The raw “payload” of the recipe, as a string
-
property
obj_payload
¶ The raw “payload” of the recipe, as a dict
-
property
raw_params
¶ The raw ‘params’ field of the recipe settings, as a dict
-
get_recipe_raw_definition
()¶ Get the recipe definition as a raw dict :rtype dict
-
get_recipe_inputs
()¶ Get a structured dict of inputs to this recipe :rtype dict
-
get_recipe_outputs
()¶ Get a structured dict of outputs of this recipe :rtype dict
-
get_recipe_params
()¶ Get the parameters of this recipe, as a dict :rtype dict
-
get_payload
()¶ Get the payload or script of this recipe, as a string :rtype string
-
get_json_payload
()¶ Get the payload or script of this recipe, parsed from JSON, as a dict :rtype dict
-
set_payload
(payload)¶ Set the payload of this recipe :param str payload: the payload, as a string
-
set_json_payload
(payload)¶ Set the payload of this recipe :param dict payload: the payload, as a dict. The payload will be converted to a JSON string internally
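Usage example of a payload round-trip (a sketch; "maxRecords" is a hypothetical key, since actual payload keys depend on the recipe type and are not documented):
settings = project.get_recipe("myrecipe").get_settings()
payload = settings.get_json_payload()
payload["maxRecords"] = 1000  # hypothetical key, for illustration only
settings.set_json_payload(payload)
settings.save()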
-
has_input
(input_ref)¶ Returns whether this recipe has a given ref as input
-
has_output
(output_ref)¶ Returns whether this recipe has a given ref as output
-
replace_input
(current_input_ref, new_input_ref)¶ Replaces an object reference as input of this recipe by another
-
replace_output
(current_output_ref, new_output_ref)¶ Replaces an object reference as output of this recipe by another
-
add_input
(role, ref, partition_deps=None)¶
-
add_output
(role, ref, append_mode=False)¶
-
get_flat_input_refs
()¶ Returns a list of all input refs of this recipe, regardless of the input role :rtype list of strings
-
get_flat_output_refs
()¶ Returns a list of all output refs of this recipe, regardless of the output role :rtype list of strings
-
property
custom_fields
¶ The custom fields of the object as a dict. Returns None if there are no custom fields
-
property
description
¶ The description of the object as a string
-
property
short_description
¶ The short description of the object as a string
-
property
tags
¶ The tags of the object, as a list of strings
-
-
class
dataikuapi.dss.recipe.
DSSRecipeDefinitionAndPayload
(recipe, data)¶ Deprecated. Settings of a recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
CodeRecipeSettings
(recipe, data)¶ Settings of a code recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
get_code
()¶ Returns the code of the recipe as a string :rtype string
-
set_code
(code)¶ Updates the code of the recipe :param str code: The new code as a string
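Usage example (a sketch; assumes "my_python_recipe" is an existing code recipe):
settings = project.get_recipe("my_python_recipe").get_settings()
code = settings.get_code()
# Append a line to the recipe script and save it back
settings.set_code(code + "\n# reviewed by automation\n")
settings.save()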
-
get_code_env_settings
()¶ Returns the code env settings for this recipe :rtype dict
-
set_code_env
(code_env=None, inherit=False, use_builtin=False)¶ Sets the code env to use for this recipe.
Exactly one of code_env, inherit or use_builtin must be passed
- Parameters
code_env (str) – The name of a code env
inherit (bool) – Use the project’s default code env
use_builtin (bool) – Use the builtin code env
-
-
class
dataikuapi.dss.recipe.
SyncRecipeSettings
(recipe, data)¶ Settings of a sync recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
PrepareRecipeSettings
(recipe, data)¶ Settings of a prepare recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
property
raw_steps
¶ Returns a raw list of the steps of this prepare recipe. You can modify the returned list.
Each step is a dict of settings. The precise settings for each step are not documented.
-
add_processor_step
(type, params)¶
-
add_filter_on_bad_meaning
(meaning, columns)¶
-
class
dataikuapi.dss.recipe.
SamplingRecipeSettings
(recipe, data)¶ Settings of a sampling recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
GroupingRecipeSettings
(recipe, data)¶ Settings of a grouping recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
clear_grouping_keys
()¶ Removes all grouping keys from this grouping recipe
-
add_grouping_key
(column)¶ Adds grouping on a column :param str column: Column to group on
-
set_global_count_enabled
(enabled)¶
-
get_or_create_column_settings
(column)¶ Gets a dict representing the aggregations to perform on a column. Creates it and adds it to the potential aggregations if it does not already exist :param str column: The column name :rtype dict
-
set_column_aggregations
(column, type, min=False, max=False, count=False, count_distinct=False, sum=False, concat=False, stddev=False, avg=False)¶ Sets the basic aggregations on a column. Returns the dict representing the aggregations on the column
- Parameters
column (str) – The column name
type (str) – The type of the column (as a DSS schema type name)
:rtype dict
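Usage example configuring a grouping recipe end to end (a sketch; the recipe name, column names and the "double" schema type are placeholders):
settings = project.get_recipe("my_grouping_recipe").get_settings()
settings.clear_grouping_keys()
settings.add_grouping_key("customer_id")
# Aggregate the "amount" column with sum and average
settings.set_column_aggregations("amount", "double", sum=True, avg=True)
settings.save()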
-
-
class
dataikuapi.dss.recipe.
SortRecipeSettings
(recipe, data)¶ Settings of a sort recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
TopNRecipeSettings
(recipe, data)¶ Settings of a topn recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
DistinctRecipeSettings
(recipe, data)¶ Settings of a distinct recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
WindowRecipeSettings
(recipe, data)¶ Settings of a window recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
JoinRecipeSettings
(recipe, data)¶ Settings of a join recipe. Do not create this directly, use
DSSRecipe.get_settings()
In order to enable self-joins, join recipes are based on a concept of “virtual inputs”. Every join, computed pre-join column, pre-join filter, … is based on one virtual input, and each virtual input references an input of the recipe, by index
- For example, if a recipe has inputs A and B and declares two joins:
A->B
A->A (based on a computed column)
- There are 3 virtual inputs:
0: points to recipe input 0 (i.e. dataset A)
1: points to recipe input 1 (i.e. dataset B)
2: points to recipe input 0 (i.e. dataset A) and includes the computed column
The first join is between virtual inputs 0 and 1
The second join is between virtual inputs 0 and 2
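For example, adding a left join between the two default virtual inputs (a sketch; the recipe and column names are placeholders):
settings = project.get_recipe("my_join_recipe").get_settings()
# Join virtual inputs 0 and 1, then add an equality condition
join = settings.add_join(join_type="LEFT", input1=0, input2=1)
settings.add_condition_to_join(join, type="EQ", column1="customer_id", column2="customer_id")
settings.save()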
-
property
raw_virtual_inputs
¶ Returns the raw list of virtual inputs :rtype list of dict
-
property
raw_joins
¶ Returns the raw list of joins :rtype list of dict
-
add_virtual_input
(input_dataset_index)¶ Adds a virtual input pointing to the specified input dataset of the recipe (referenced by index in the inputs list)
-
add_pre_join_computed_column
(virtual_input_index, computed_column)¶ Adds a computed column to a virtual input
Use
dataikuapi.dss.utils.DSSComputedColumn
to build the computed_column object
-
add_join
(join_type='LEFT', input1=0, input2=1)¶ Adds a join between two virtual inputs. The join is initialized with no condition.
Use
add_condition_to_join()
on the return value to add a join condition (for example column equality) to the join.
Returns the newly added join as a dict :rtype dict
-
add_condition_to_join
(join, type='EQ', column1=None, column2=None)¶ Adds a condition to a join :param str column1: Name of “left” column :param str column2: Name of “right” column
-
add_post_join_computed_column
(computed_column)¶ Adds a post-join computed column
Use
dataikuapi.dss.utils.DSSComputedColumn
to build the computed_column object
-
set_post_filter
(postfilter)¶
-
class
dataikuapi.dss.recipe.
DownloadRecipeSettings
(recipe, data)¶ Settings of a download recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
SplitRecipeSettings
(recipe, data)¶ Settings of a split recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
StackRecipeSettings
(recipe, data)¶ Settings of a stack recipe. Do not create this directly, use
DSSRecipe.get_settings()
Creation¶
-
class
dataikuapi.dss.recipe.
DSSRecipeCreator
(type, name, project)¶ Helper to create new recipes
- Parameters
type (str) – type of the recipe
name (str) – name for the recipe
:param
dataikuapi.dss.project.DSSProject
project: project in which the recipe will be created
-
set_name
(name)¶
-
with_input
(dataset_name, project_key=None, role='main')¶ Add an existing object as input to the recipe-to-be-created
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
project_key – project containing the object, if different from the one where the recipe is created
role (str) – the role of the recipe in which the input should be added
-
with_output
(dataset_name, append=False, role='main')¶ The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
role (str) – the role of the recipe in which the output should be added
-
build
()¶ Deprecated. Use create()
-
create
()¶ Creates the new recipe in the project, and returns a handle to interact with it.
- Returns
A
dataikuapi.dss.recipe.DSSRecipe
recipe handle
-
set_raw_mode
()¶
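Usage example (a sketch; assumes SyncRecipeCreator inherits the single-output helpers documented below, and that the input dataset already exists):
from dataikuapi.dss.recipe import SyncRecipeCreator

builder = SyncRecipeCreator("sync_mydataset", project)
builder.with_input("mydataset")
builder.with_new_output("mydataset_copy", "filesystem_managed")
recipe = builder.create()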
-
class
dataikuapi.dss.recipe.
SingleOutputRecipeCreator
(type, name, project)¶ Create a recipe that has a single output
-
with_existing_output
(dataset_name, append=False)¶ Add an existing object as output to the recipe-to-be-created
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
-
with_new_output
(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET', overwrite=False)¶ Create a new dataset as output to the recipe-to-be-created. The dataset is not created immediately, but when the recipe is created (i.e. in the create() method)
- Parameters
name (str) – name of the dataset or identifier of the managed folder
connection_id (str) – name of the connection to create the dataset on
typeOptionId (str) – sub-type of dataset, for connection where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connection
format_option_id (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC
override_sql_schema – schema to force dataset, for SQL dataset. If left empty, will be autodetected
partitioning_option_id (str) – to copy the partitioning schema of an existing dataset ‘foo’, pass a value of ‘copy:dataset:foo’
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
object_type (str) – DATASET or MANAGED_FOLDER
overwrite – If the dataset being created already exists, overwrite it (and delete data)
-
with_output
(dataset_name, append=False)¶ Alias of with_existing_output
-
-
class
dataikuapi.dss.recipe.
VirtualInputsSingleOutputRecipeCreator
(type, name, project)¶ Create a recipe that has a single output and several inputs
-
with_input
(dataset_name, project_key=None)¶ Add an existing object as input to the recipe-to-be-created
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
project_key – project containing the object, if different from the one where the recipe is created
-
-
class
dataikuapi.dss.recipe.
CodeRecipeCreator
(name, type, project)¶ -
with_script
(script)¶ Set the code of the recipe
- Parameters
script (str) – the script of the recipe
-
with_new_output_dataset
(name, connection, type=None, format=None, copy_partitioning_from='FIRST_INPUT', append=False, overwrite=False)¶ Create a new managed dataset as output to the recipe-to-be-created. The dataset is created immediately
- Parameters
name (str) – name of the dataset to create
connection (str) – name of the connection to create the dataset on
type (str) – type of dataset, for connection where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connection
format (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC. If None, uses the default
copy_partitioning_from (str) – Whether to copy the partitioning from another thing. Use None for not partitioning the output, “FIRST_INPUT” to copy from the first input of the recipe, “dataset:XXX” to copy from a dataset name, or “folder:XXX” to copy from a folder id
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
overwrite – If the dataset being created already exists, overwrite it (and delete data)
-
-
class
dataikuapi.dss.recipe.
PythonRecipeCreator
(name, project)¶ Creates a Python recipe. A Python recipe can be defined either by its complete code, like a normal Python recipe, or by a function signature.
- When using a function, the function must take as arguments:
A list of dataframes corresponding to the dataframes of the input datasets
Optional named arguments corresponding to arguments passed to the creator
-
DEFAULT_RECIPE_CODE_TMPL
= '\n# This code is autogenerated by PythonRecipeCreator function mode\nimport dataiku, dataiku.recipe, json\nfrom {module_name} import {fname}\ninput_datasets = dataiku.recipe.get_inputs_as_datasets()\noutput_datasets = dataiku.recipe.get_outputs_as_datasets()\nparams = json.loads(\'{params_json}\')\n\nlogging.info("Reading %d input datasets as dataframes" % len(input_datasets))\ninput_dataframes = [ds.get_dataframe() for ds in input_datasets]\n\nlogging.info("Calling user function {fname}")\nfunction_input = input_dataframes if len(input_dataframes) > 1 else input_dataframes[0]\noutput_dataframes = {fname}(function_input, **params)\n\nif not isinstance(output_dataframes, list):\n output_dataframes = [output_dataframes]\n\nif not len(output_dataframes) == len(output_datasets):\n raise Exception("Code function {fname}() returned %d dataframes but recipe expects %d output datasets", \\\n (len(output_dataframes), len(output_datasets)))\noutput = list(zip(output_datasets, output_dataframes))\nfor ds, df in output:\n logging.info("Writing function result to dataset %s" % ds.name)\n ds.write_with_schema(df)\n'¶
-
with_function_name
(module_name, function_name, custom_template=None, **function_args)¶ Defines this recipe as being a functional recipe calling a function name from a module name
-
with_function
(fn, custom_template=None, **function_args)¶
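Usage example of function mode (a sketch; the process function and dataset names are illustrative, and both datasets are assumed to already exist):
from dataikuapi.dss.recipe import PythonRecipeCreator

def process(input_df):
    # Hypothetical user function: receives the input dataframe, returns the output one
    return input_df.head(100)

builder = PythonRecipeCreator("my_python_recipe", project)
builder.with_input("input_dataset")
builder.with_output("output_dataset")  # must already exist
builder.with_function(process)
recipe = builder.create()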
-
class
dataikuapi.dss.recipe.
SQLQueryRecipeCreator
(name, project)¶ Create a SQL query recipe
-
class
dataikuapi.dss.recipe.
PrepareRecipeCreator
(name, project)¶ Create a Prepare recipe
-
class
dataikuapi.dss.recipe.
SyncRecipeCreator
(name, project)¶ Create a Sync recipe
-
class
dataikuapi.dss.recipe.
SamplingRecipeCreator
(name, project)¶ Create a Sample/Filter recipe
-
class
dataikuapi.dss.recipe.
DistinctRecipeCreator
(name, project)¶ Create a Distinct recipe
-
class
dataikuapi.dss.recipe.
GroupingRecipeCreator
(name, project)¶ Create a Group recipe
-
with_group_key
(group_key)¶ Set a column as the first grouping key. Only a single grouping key may be set at recipe creation time. For additional groupings, get the recipe settings
- Parameters
group_key (str) – name of a column in the input dataset
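Usage example (a sketch; assumes GroupingRecipeCreator inherits the single-output helpers, with placeholder dataset and column names):
builder = GroupingRecipeCreator("group_by_customer", project)
builder.with_input("transactions")
builder.with_new_output("transactions_by_customer", "filesystem_managed")
builder.with_group_key("customer_id")
recipe = builder.create()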
-
-
class
dataikuapi.dss.recipe.
SortRecipeCreator
(name, project)¶ Create a Sort recipe
-
class
dataikuapi.dss.recipe.
TopNRecipeCreator
(name, project)¶ Create a TopN recipe
-
class
dataikuapi.dss.recipe.
WindowRecipeCreator
(name, project)¶ Create a Window recipe
-
class
dataikuapi.dss.recipe.
JoinRecipeCreator
(name, project)¶ Create a Join recipe
-
class
dataikuapi.dss.recipe.
FuzzyJoinRecipeCreator
(name, project)¶ Create a FuzzyJoin recipe
-
class
dataikuapi.dss.recipe.
GeoJoinRecipeCreator
(name, project)¶ Create a GeoJoin recipe
-
class
dataikuapi.dss.recipe.
SplitRecipeCreator
(name, project)¶ Create a Split recipe
-
class
dataikuapi.dss.recipe.
StackRecipeCreator
(name, project)¶ Create a Stack recipe
-
class
dataikuapi.dss.recipe.
DownloadRecipeCreator
(name, project)¶ Create a Download recipe
-
class
dataikuapi.dss.recipe.
PredictionScoringRecipeCreator
(name, project)¶ Builder for the creation of a new “Prediction scoring” recipe, from an input dataset, with an input saved model identifier
# Create a new prediction scoring recipe outputting to a new dataset
project = client.get_project("MYPROJECT")
builder = PredictionScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")
# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")
new_recipe = builder.build()
-
with_new_output
(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')¶
-
with_input_model
(model_id)¶ Sets the input model
-
-
class
dataikuapi.dss.recipe.
ClusteringScoringRecipeCreator
(name, project)¶ Builder for the creation of a new “Clustering scoring” recipe, from an input dataset, with an input saved model identifier
# Create a new clustering scoring recipe outputting to a new dataset
project = client.get_project("MYPROJECT")
builder = ClusteringScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")
# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")
new_recipe = builder.build()
-
with_new_output
(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')¶
-
with_input_model
(model_id)¶ Sets the input model
-
-
class
dataikuapi.dss.recipe.
EvaluationRecipeCreator
(name, project)¶ Builder for the creation of a new “Evaluate” recipe, from an input dataset, with an input saved model identifier
# Create a new evaluation recipe outputting to a new dataset, to a metrics dataset and/or to a model evaluation store
project = client.get_project("MYPROJECT")
builder = project.new_recipe("evaluation")
builder.with_input_model(saved_model_id)
builder.with_input("dataset_to_evaluate")
builder.with_output("output_scored")
builder.with_output_metrics("output_metrics")
builder.with_output_evaluation_store(evaluation_store_id)
new_recipe = builder.build()

# Access the settings
er_settings = new_recipe.get_settings()
payload = er_settings.obj_payload

# Change the settings
payload['dontComputePerformance'] = True
payload['outputProbabilities'] = False
payload['metrics'] = ["precision", "recall", "auc", "f1", "costMatrixGain"]

# Manage evaluation labels
payload['labels'] = [dict(key="label_1", value="value_1"), dict(key="label_2", value="value_2")]

# Save the settings and run the recipe
er_settings.save()
new_recipe.run()
Outputs must exist. They can be created using the following:
builder = project.new_managed_dataset("output_scored")
builder.with_store_into(connection)
dataset = builder.create()

builder = project.new_managed_dataset("output_metrics")
builder.with_store_into(connection)
dataset = builder.create()

evaluation_store_id = project.create_model_evaluation_store("output_model_evaluation").mes_id
-
with_input_model
(model_id)¶ Sets the input model
-
with_output
(name)¶ Sets the output dataset containing the scored input
-
with_output_metrics
(name)¶ Sets the output dataset containing the metrics
-
with_output_evaluation_store
(mes_id)¶ Sets the output model evaluation store
-
-
class
dataikuapi.dss.recipe.
StandaloneEvaluationRecipeCreator
(name, project)¶ Builder for the creation of a new “Standalone Evaluate” recipe, from an input dataset
# Create a new standalone evaluation of a scored dataset
project = client.get_project("MYPROJECT")
builder = project.new_recipe("standalone_evaluation")
builder.with_input("scored_dataset_to_evaluate")
builder.with_output_evaluation_store(evaluation_store_id)

# Add a reference dataset (optional) to compute data drift
builder.with_reference_dataset("reference_dataset")

# Finish creation of the recipe
new_recipe = builder.create()

# Modify the model parameters in the SER settings
ser_settings = new_recipe.get_settings()
payload = ser_settings.obj_payload
payload['predictionType'] = "BINARY_CLASSIFICATION"
payload['targetVariable'] = "Survived"
payload['predictionVariable'] = "prediction"
payload['isProbaAware'] = True
payload['dontComputePerformance'] = False

# For a classification model with probabilities, the 'probas' section can be filled with the mapping of the class and the probability column
# e.g. for a binary classification model with 2 columns: proba_0 and proba_1
class_0 = dict(key=0, value="proba_0")
class_1 = dict(key=1, value="proba_1")
payload['probas'] = [class_0, class_1]

# Change the 'features' settings for this standalone evaluation
# e.g. reject the features that you do not want to use in the evaluation
feature_passengerid = dict(name="Passenger_Id", role="REJECT", type="TEXT")
feature_ticket = dict(name="Ticket", role="REJECT", type="TEXT")
feature_cabin = dict(name="Cabin", role="REJECT", type="TEXT")
payload['features'] = [feature_passengerid, feature_ticket, feature_cabin]

# To set the cost matrix properly, access the 'metricParams' section of the payload and set the cost matrix weights:
payload['metricParams'] = dict(costMatrixWeights=dict(tpGain=0.4, fpGain=-1.0, tnGain=0.2, fnGain=-0.5))

# Save the recipe and run the recipe
# Note that with this method, all the settings that were not explicitly set are instead set to their default value.
ser_settings.save()
new_recipe.run()
Output model evaluation store must exist. It can be created using the following:
evaluation_store_id = project.create_model_evaluation_store("output_model_evaluation").mes_id
-
with_output_evaluation_store
(mes_id)¶ Sets the output model evaluation store
-
with_reference_dataset
(dataset_name)¶ Sets the dataset to use as a reference in data drift computation (optional).
-
Utilities¶
-
class
dataikuapi.dss.utils.
DSSFilter
¶ Helper class to build filter objects for use in visual recipes
-
static
of_single_condition
(column, operator, string=None, num=None, date=None, time=None, date2=None, time2=None, unit=None)¶
-
static
of_and_conditions
(conditions)¶
-
static
of_or_conditions
(conditions)¶
-
static
of_formula
(formula)¶
-
static
of_sql_expression
(sql_expression)¶
-
static
condition
(column, operator, string=None, num=None, date=None, time=None, date2=None, time2=None, unit=None)¶
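Usage example (a sketch; assumes operators are passed as their string values, and that the filter is applied through a visual recipe setting such as set_post_filter, shown above for join recipes):
from dataikuapi.dss.utils import DSSFilter, DSSFilterOperator

# Keep only rows where "age" is strictly greater than 18
flt = DSSFilter.of_single_condition("age", DSSFilterOperator.GREATER_NUMBER.value, num=18)

settings = project.get_recipe("my_join_recipe").get_settings()
settings.set_post_filter(flt)
settings.save()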
-
class
dataikuapi.dss.utils.
DSSFilterOperator
(value)¶ An enumeration.
-
EMPTY_ARRAY
= 'empty array'¶
-
NOT_EMPTY_ARRAY
= 'not empty array'¶
-
CONTAINS_ARRAY
= 'array contains'¶
-
NOT_EMPTY
= 'not empty'¶
-
EMPTY
= 'is empty'¶
-
NOT_EMPTY_STRING
= 'not empty string'¶
-
EMPTY_STRING
= 'empty string'¶
-
IS_TRUE
= 'true'¶
-
IS_FALSE
= 'false'¶
-
EQUALS_STRING
= '== [string]'¶
-
EQUALS_CASE_INSENSITIVE_STRING
= '== [string]i'¶
-
NOT_EQUALS_STRING
= '!= [string]'¶
-
SAME
= '== [NaNcolumn]'¶
-
DIFFERENT
= '!= [NaNcolumn]'¶
-
EQUALS_NUMBER
= '== [number]'¶
-
NOT_EQUALS_NUMBER
= '!= [number]'¶
-
GREATER_NUMBER
= '> [number]'¶
-
LESS_NUMBER
= '< [number]'¶
-
GREATER_OR_EQUAL_NUMBER
= '>= [number]'¶
-
LESS_OR_EQUAL_NUMBER
= '<= [number]'¶
-
EQUALS_DATE
= '== [date]'¶
-
GREATER_DATE
= '> [date]'¶
-
GREATER_OR_EQUAL_DATE
= '>= [date]'¶
-
LESS_DATE
= '< [date]'¶
-
LESS_OR_EQUAL_DATE
= '<= [date]'¶
-
BETWEEN_DATE
= '>< [date]'¶
-
EQUALS_COL
= '== [column]'¶
-
NOT_EQUALS_COL
= '!= [column]'¶
-
GREATER_COL
= '> [column]'¶
-
LESS_COL
= '< [column]'¶
-
GREATER_OR_EQUAL_COL
= '>= [column]'¶
-
LESS_OR_EQUAL_COL
= '<= [column]'¶
-
CONTAINS_STRING
= 'contains'¶
-
REGEX
= 'regex'¶
-