Recipes

This page lists usage examples for performing various operations with recipes through the Dataiku Python API. In all examples, project is a dataikuapi.dss.project.DSSProject handle, obtained using client.get_project() or client.get_default_project().

Basic operations

Listing recipes

recipes = project.list_recipes()
# Returns a list of DSSRecipeListItem

for recipe in recipes:
        # Quick access to main information in the recipe list item
        print("Name: %s" % recipe.name)
        print("Type: %s" % recipe.type)
        print("Tags: %s" % recipe.tags) # Returns a list of strings

        # You can also use the list item as a dict of all available recipe information
        print("Raw: %s" % recipe)

Deleting a recipe

recipe = project.get_recipe('myrecipe')
recipe.delete()

Modifying tags for a recipe

recipe = project.get_recipe('myrecipe')
settings = recipe.get_settings()

print("Current tags are %s" % settings.tags)

# Change the tags
settings.tags = ["newtag1", "newtag2"]

# If we changed the settings, we must save
settings.save()

Recipe status

You can compute the status of the recipe, which also provides you with the engine information.

Find the engine used to run a recipe

recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_selected_engine_details())

Check if a recipe is valid

get_status() calls the validation code of the recipe. The resulting severity indicates whether the recipe is valid.

recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_status_severity())

Find the engines for all recipes of a certain type

This example shows how to filter a list, obtain DSSRecipe objects for the list items, and get their status.

for list_item in project.list_recipes():
        if list_item.type == "grouping":
                recipe = list_item.to_recipe()
                engine = recipe.get_status().get_selected_engine_details()["type"]
                print("Recipe %s uses engine %s" % (recipe.name, engine))

Recipe settings

When you use get_settings() on a recipe, you receive a settings object whose class depends on the recipe type. Please see below for the possible types.
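
For example, a quick way to see which settings class you got back (assuming "myrecipe" is a grouping recipe):

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
print(type(settings))  # e.g. <class 'dataikuapi.dss.recipe.GroupingRecipeSettings'>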

Checking if a recipe uses a particular dataset as input

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
print("Recipe %s uses input:%s" % (recipe.name, settings.has_input("mydataset"))

Replacing an input of a recipe

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

settings.replace_input("old_input", "new_input")
settings.save()

Setting the code env of a code recipe

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()

# Use this to set the recipe to inherit the project's code env
settings.set_code_env(inherit=True)

# Use this to set the recipe to use a specific code env
settings.set_code_env(code_env="myenv")

settings.save()

Reference documentation

class dataikuapi.dss.recipe.DSSRecipe(client, project_key, recipe_name)

A handle to an existing recipe on the DSS instance. Do not create this directly, use dataikuapi.dss.project.DSSProject.get_recipe()

property id

The id of the recipe

property name

The name of the recipe

compute_schema_updates()

Computes which updates are required to the outputs of this recipe. The required updates are returned as a RequiredSchemaUpdates object, which then allows you to apply() the changes.

Usage example:

required_updates = recipe.compute_schema_updates()
if required_updates.any_action_required():
    print("Some schemas will be updated")

# Note that you can call apply() even if no changes are required. This will be a no-op
required_updates.apply()

run(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)

Starts a new job to run this recipe and waits for it to complete. Raises an exception if the job fails (unless no_fail is True).

job = recipe.run()
print("Job %s done" % job.id)
Parameters
  • job_type – The job type. One of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD or RECURSIVE_FORCED_BUILD

  • partitions – If the outputs are partitioned, a list of partition ids to build

  • wait – if True, the call waits for the job to complete before returning

  • no_fail – if True, does not raise if the job fails.

Returns

the dataikuapi.dss.job.DSSJob job handle corresponding to the built job

Return type

dataikuapi.dss.job.DSSJob

delete()

Delete the recipe

get_settings()

Gets the settings of the recipe, as a DSSRecipeSettings or one of its subclasses.

Some recipes have a dedicated class for the settings, with additional helpers to read and modify the settings

Once you are done modifying the returned settings object, you can call save() on it in order to save the modifications to the DSS recipe

get_definition_and_payload()

Deprecated. Use get_settings()

set_definition_and_payload(definition)

Deprecated. Use get_settings() and DSSRecipeSettings.save()

get_status()

Gets the status of this recipe (status messages, engines status, …)

Returns

a dataikuapi.dss.recipe.DSSRecipeStatus object to interact with the status

Return type

dataikuapi.dss.recipe.DSSRecipeStatus

get_metadata()

Get the metadata attached to this recipe. The metadata contains label, description, checklists, tags and custom metadata of the recipe

Returns

a dict. For more information on available metadata, please see https://doc.dataiku.com/dss/api/8.0/rest/

:rtype dict

set_metadata(metadata)

Set the metadata on this recipe. :param dict metadata: the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata call.
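
A minimal sketch of reading and updating the metadata (the "description" field is one of the metadata fields mentioned above):

recipe = project.get_recipe("myrecipe")
metadata = recipe.get_metadata()
metadata["description"] = "Aggregates raw events per customer"
recipe.set_metadata(metadata)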

get_object_discussions()

Get a handle to manage discussions on the recipe

Returns

the handle to manage discussions

Return type

dataikuapi.discussion.DSSObjectDiscussions

get_continuous_activity()

Return a handle on the continuous activity associated with this recipe

move_to_zone(zone)

Moves this object to a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to move the object
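
A minimal sketch, assuming a flow zone with id "zone_id" already exists in the project:

zone = project.get_flow().get_zone("zone_id")
project.get_recipe("myrecipe").move_to_zone(zone)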

class dataikuapi.dss.recipe.DSSRecipeListItem(client, data)

An item in a list of recipes. Do not instantiate this class, use dataikuapi.dss.project.DSSProject.list_recipes()

to_recipe()

Gets the DSSRecipe corresponding to this list item

property name
property id
property type
property tags
class dataikuapi.dss.recipe.DSSRecipeStatus(client, data)

Status of a recipe. Do not create this directly, use DSSRecipe.get_status()

get_selected_engine_details()

Gets the selected engine for this recipe (for recipes that support engines)

Returns

a dict of the details of the selected engine. The dict will contain at least a 'type' field indicating which engine it is, and a 'statusWarnLevel' field indicating whether the engine is OK / WARN / ERROR

Return type

dict

get_engines_details()

Gets details about all possible engines for this recipe (for recipes that support engines)

Returns

a list of dicts, one per possible engine. Each dict will contain at least a 'type' field indicating which engine it is, and a 'statusWarnLevel' field indicating whether the engine is OK / WARN / ERROR

Return type

list

get_status_severity()

Returns whether the recipe is in SUCCESS, WARNING or ERROR status

Return type

string

get_status_messages()

Returns status messages for this recipe.

Returns

a list of dict, for each status message. Each dict represents a single message, and contains at least a “severity” field (SUCCESS, WARNING or ERROR) and a “message” field

Return type

list
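
For example, a sketch combining the two methods above to surface validation problems:

status = project.get_recipe("myrecipe").get_status()
if status.get_status_severity() != "SUCCESS":
    for message in status.get_status_messages():
        print("%s: %s" % (message["severity"], message["message"]))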

class dataikuapi.dss.recipe.RequiredSchemaUpdates(recipe, data)

Representation of the updates required to the schema of the outputs of a recipe. Do not create this class directly, use DSSRecipe.compute_schema_updates()

any_action_required()
apply()

Settings

class dataikuapi.dss.recipe.DSSRecipeSettings(recipe, data)

Settings of a recipe. Do not create this directly, use DSSRecipe.get_settings()

save()

Saves back the recipe in DSS.

property type
property str_payload

The raw “payload” of the recipe, as a string

property obj_payload

The raw “payload” of the recipe, as a dict

property raw_params

The raw ‘params’ field of the recipe settings, as a dict

get_recipe_raw_definition()

Get the recipe definition as a raw dict :rtype dict

get_recipe_inputs()

Get a structured dict of inputs to this recipe :rtype dict

get_recipe_outputs()

Get a structured dict of outputs of this recipe :rtype dict

get_recipe_params()

Get the parameters of this recipe, as a dict :rtype dict

get_payload()

Get the payload or script of this recipe, as a string :rtype string

get_json_payload()

Get the payload or script of this recipe, parsed from JSON, as a dict :rtype dict

set_payload(payload)

Set the payload of this recipe :param str payload: the payload, as a string

set_json_payload(payload)

Set the payload of this recipe :param dict payload: the payload, as a dict. The payload will be converted to a JSON string internally
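
A minimal sketch of reading and writing the payload (the "myOption" key is hypothetical; the actual keys depend on the recipe type):

settings = project.get_recipe("myrecipe").get_settings()
payload = settings.get_json_payload()
payload["myOption"] = True
settings.set_json_payload(payload)
settings.save()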

has_input(input_ref)

Returns whether this recipe has a given ref as input

has_output(output_ref)

Returns whether this recipe has a given ref as output

replace_input(current_input_ref, new_input_ref)

Replaces an object reference as input of this recipe by another

replace_output(current_output_ref, new_output_ref)

Replaces an object reference as output of this recipe by another

add_input(role, ref, partition_deps=None)
add_output(role, ref, append_mode=False)
get_flat_input_refs()

Returns a list of all input refs of this recipe, regardless of the input role :rtype list of strings

get_flat_output_refs()

Returns a list of all output refs of this recipe, regardless of the output role :rtype list of strings
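
For example, to print all inputs and outputs of a recipe regardless of role:

settings = project.get_recipe("myrecipe").get_settings()
print("Inputs: %s" % settings.get_flat_input_refs())
print("Outputs: %s" % settings.get_flat_output_refs())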

property custom_fields

The custom fields of the object as a dict. Returns None if there are no custom fields

property description

The description of the object as a string

property short_description

The short description of the object as a string

property tags

The tags of the object, as a list of strings

class dataikuapi.dss.recipe.DSSRecipeDefinitionAndPayload(recipe, data)

Deprecated. Settings of a recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.CodeRecipeSettings(recipe, data)

Settings of a code recipe. Do not create this directly, use DSSRecipe.get_settings()

get_code()

Returns the code of the recipe as a string :rtype string

set_code(code)

Updates the code of the recipe :param str code: The new code as a string
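
A minimal sketch, assuming "my_python_recipe" is a code recipe (so get_settings() returns a CodeRecipeSettings):

settings = project.get_recipe("my_python_recipe").get_settings()
code = settings.get_code()
settings.set_code(code + "\n# reviewed")
settings.save()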

get_code_env_settings()

Returns the code env settings for this recipe :rtype dict

set_code_env(code_env=None, inherit=False, use_builtin=False)

Sets the code env to use for this recipe.

Exactly one of code_env, inherit or use_builtin must be passed

Parameters
  • code_env (str) – The name of a code env

  • inherit (bool) – Use the project’s default code env

  • use_builtin (bool) – Use the builtin code env

class dataikuapi.dss.recipe.SyncRecipeSettings(recipe, data)

Settings of a sync recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.PrepareRecipeSettings(recipe, data)

Settings of a prepare recipe. Do not create this directly, use DSSRecipe.get_settings()

property raw_steps

Returns a raw list of the steps of this prepare recipe. You can modify the returned list.

Each step is a dict of settings. The precise settings for each step are not documented

add_processor_step(type, params)
add_filter_on_bad_meaning(meaning, columns)
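
A sketch of appending a step to a prepare recipe. The step type and params layout shown here are an assumption, since the precise settings for each step are not documented:

settings = project.get_recipe("my_prepare_recipe").get_settings()
settings.add_processor_step("ColumnRenamer", {"renamings": [{"from": "old_name", "to": "new_name"}]})
settings.save()
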
class dataikuapi.dss.recipe.SamplingRecipeSettings(recipe, data)

Settings of a sampling recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.GroupingRecipeSettings(recipe, data)

Settings of a grouping recipe. Do not create this directly, use DSSRecipe.get_settings()

clear_grouping_keys()

Removes all grouping keys from this grouping recipe

add_grouping_key(column)

Adds grouping on a column :param str column: Column to group on

set_global_count_enabled(enabled)
get_or_create_column_settings(column)

Gets a dict representing the aggregations to perform on a column. Creates it and adds it to the potential aggregations if it does not already exist :param str column: The column name :rtype dict

set_column_aggregations(column, type, min=False, max=False, count=False, count_distinct=False, sum=False, concat=False, stddev=False, avg=False)

Sets the basic aggregations on a column. Returns the dict representing the aggregations on the column

Parameters
  • column (str) – The column name

  • type (str) – The type of the column (as a DSS schema type name)

:rtype dict
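
A minimal sketch, assuming a grouping recipe with a "customer_id" key and a numeric "amount" column:

settings = project.get_recipe("my_grouping_recipe").get_settings()
settings.clear_grouping_keys()
settings.add_grouping_key("customer_id")
settings.set_column_aggregations("amount", "double", min=True, max=True, avg=True)
settings.save()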

class dataikuapi.dss.recipe.SortRecipeSettings(recipe, data)

Settings of a sort recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.TopNRecipeSettings(recipe, data)

Settings of a topn recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.DistinctRecipeSettings(recipe, data)

Settings of a distinct recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.WindowRecipeSettings(recipe, data)

Settings of a window recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.JoinRecipeSettings(recipe, data)

Settings of a join recipe. Do not create this directly, use DSSRecipe.get_settings()

In order to enable self-joins, join recipes are based on a concept of “virtual inputs”. Every join, computed pre-join column, pre-join filter, … is based on one virtual input, and each virtual input references an input of the recipe, by index

For example, if a recipe has inputs A and B and declares two joins:
  • A->B

  • A->A (based on a computed column)

There are 3 virtual inputs:
  • 0: points to recipe input 0 (i.e. dataset A)

  • 1: points to recipe input 1 (i.e. dataset B)

  • 2: points to recipe input 0 (i.e. dataset A) and includes the computed column

The joins are then declared between virtual inputs:
  • The first join is between virtual inputs 0 and 1

  • The second join is between virtual inputs 0 and 2

property raw_virtual_inputs

Returns the raw list of virtual inputs :rtype list of dict

property raw_joins

Returns the raw list of joins :rtype list of dict

add_virtual_input(input_dataset_index)

Adds a virtual input pointing to the specified input dataset of the recipe (referenced by index in the inputs list)

add_pre_join_computed_column(virtual_input_index, computed_column)

Adds a computed column to a virtual input

Use dataikuapi.dss.utils.DSSComputedColumn to build the computed_column object

add_join(join_type='LEFT', input1=0, input2=1)

Adds a join between two virtual inputs. The join is initialized with no condition.

Use add_condition_to_join() on the return value to add a join condition (for example column equality) to the join

:returns the newly added join as a dict :rtype dict

add_condition_to_join(join, type='EQ', column1=None, column2=None)

Adds a condition to a join :param str column1: Name of “left” column :param str column2: Name of “right” column
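
A minimal sketch of joining the first two inputs of an existing join recipe (column names are hypothetical):

settings = project.get_recipe("my_join_recipe").get_settings()
join = settings.add_join(join_type="LEFT", input1=0, input2=1)
settings.add_condition_to_join(join, type="EQ", column1="customer_id", column2="customer_id")
settings.save()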

add_post_join_computed_column(computed_column)

Adds a post-join computed column

Use dataikuapi.dss.utils.DSSComputedColumn to build the computed_column object

set_post_filter(postfilter)
class dataikuapi.dss.recipe.DownloadRecipeSettings(recipe, data)

Settings of a download recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.SplitRecipeSettings(recipe, data)

Settings of a split recipe. Do not create this directly, use DSSRecipe.get_settings()

class dataikuapi.dss.recipe.StackRecipeSettings(recipe, data)

Settings of a stack recipe. Do not create this directly, use DSSRecipe.get_settings()

Creation

class dataikuapi.dss.recipe.DSSRecipeCreator(type, name, project)

Helper to create new recipes

Parameters
  • type (str) – type of the recipe

  • name (str) – name for the recipe

:param dataikuapi.dss.project.DSSProject project: project in which the recipe will be created

set_name(name)
with_input(dataset_name, project_key=None, role='main')

Add an existing object as input to the recipe-to-be-created

Parameters
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model

  • project_key – project containing the object, if different from the one where the recipe is created

  • role (str) – the role of the recipe in which the input should be added

with_output(dataset_name, append=False, role='main')

The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output

Parameters
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model

  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)

  • role (str) – the role of the recipe in which the output should be added

build()

Deprecated. Use create()

create()

Creates the new recipe in the project, and returns a handle to interact with it.

Returns:

A dataikuapi.dss.recipe.DSSRecipe recipe handle
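
A minimal creation sketch using project.new_recipe() (dataset names are hypothetical; the output dataset must already exist):

builder = project.new_recipe("sync", "sync_my_dataset")
builder.with_input("input_dataset")
builder.with_output("output_dataset")
recipe = builder.create()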

set_raw_mode()
class dataikuapi.dss.recipe.SingleOutputRecipeCreator(type, name, project)

Create a recipe that has a single output

with_existing_output(dataset_name, append=False)

Add an existing object as output to the recipe-to-be-created

Parameters
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model

  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET', overwrite=False)

Create a new dataset as output to the recipe-to-be-created. The dataset is not created immediately, but when the recipe is created (i.e. in the create() method)

Parameters
  • name (str) – name of the dataset or identifier of the managed folder

  • connection_id (str) – name of the connection to create the dataset on

  • typeOptionId (str) – sub-type of dataset, for connections where the type could be ambiguous. Typically, this is SCP or SFTP for SSH connections

  • format_option_id (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC

  • override_sql_schema – schema to force on the dataset, for SQL datasets. If left empty, it will be autodetected

  • partitioning_option_id (str) – to copy the partitioning schema of an existing dataset ‘foo’, pass a value of ‘copy:dataset:foo’

  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)

  • object_type (str) – DATASET or MANAGED_FOLDER

  • overwrite – If the dataset being created already exists, overwrite it (and delete data)

with_output(dataset_name, append=False)

Alias of with_existing_output

class dataikuapi.dss.recipe.VirtualInputsSingleOutputRecipeCreator(type, name, project)

Create a recipe that has a single output and several inputs

with_input(dataset_name, project_key=None)

Add an existing object as input to the recipe-to-be-created

Parameters
  • dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model

  • project_key – project containing the object, if different from the one where the recipe is created

class dataikuapi.dss.recipe.CodeRecipeCreator(name, type, project)
with_script(script)

Set the code of the recipe

Parameters

script (str) – the script of the recipe

with_new_output_dataset(name, connection, type=None, format=None, copy_partitioning_from='FIRST_INPUT', append=False, overwrite=False)

Create a new managed dataset as output to the recipe-to-be-created. The dataset is created immediately

Parameters
  • name (str) – name of the dataset to create

  • connection (str) – name of the connection to create the dataset on

  • type (str) – type of dataset, for connections where the type could be ambiguous. Typically, this is SCP or SFTP for SSH connections

  • format (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC. If None, uses the default

  • copy_partitioning_from (str) – Whether to copy the partitioning from another object. Use None for not partitioning the output, “FIRST_INPUT” to copy from the first input of the recipe, “dataset:XXX” to copy from a dataset name, or “folder:XXX” to copy from a folder id

  • append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)

  • overwrite – If the dataset being created already exists, overwrite it (and delete data)
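
A minimal sketch of creating a Python code recipe with a new managed output (connection and dataset names are hypothetical):

builder = project.new_recipe("python", "compute_my_output")
builder.with_input("input_dataset")
builder.with_new_output_dataset("my_output", "filesystem_managed")
builder.with_script("""
import dataiku
df = dataiku.Dataset("input_dataset").get_dataframe()
dataiku.Dataset("my_output").write_with_schema(df)
""")
recipe = builder.create()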

class dataikuapi.dss.recipe.PythonRecipeCreator(name, project)

Creates a Python recipe. A Python recipe can be defined either by its complete code, like a normal Python recipe, or by a function signature.

When using a function, the function must take as arguments:
  • A list of dataframes corresponding to the dataframes of the input datasets

  • Optional named arguments corresponding to arguments passed to the creator

DEFAULT_RECIPE_CODE_TMPL = '\n# This code is autogenerated by PythonRecipeCreator function mode\nimport dataiku, dataiku.recipe, json\nfrom {module_name} import {fname}\ninput_datasets = dataiku.recipe.get_inputs_as_datasets()\noutput_datasets = dataiku.recipe.get_outputs_as_datasets()\nparams = json.loads(\'{params_json}\')\n\nlogging.info("Reading %d input datasets as dataframes" % len(input_datasets))\ninput_dataframes = [ds.get_dataframe() for ds in input_datasets]\n\nlogging.info("Calling user function {fname}")\nfunction_input = input_dataframes if len(input_dataframes) > 1 else input_dataframes[0]\noutput_dataframes = {fname}(function_input, **params)\n\nif not isinstance(output_dataframes, list):\n output_dataframes = [output_dataframes]\n\nif not len(output_dataframes) == len(output_datasets):\n raise Exception("Code function {fname}() returned %d dataframes but recipe expects %d output datasets", \\\n (len(output_dataframes), len(output_datasets)))\noutput = list(zip(output_datasets, output_dataframes))\nfor ds, df in output:\n logging.info("Writing function result to dataset %s" % ds.name)\n ds.write_with_schema(df)\n'
with_function_name(module_name, function_name, custom_template=None, **function_args)

Defines this recipe as being a functional recipe calling a function name from a module name

with_function(fn, custom_template=None, **function_args)
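
A sketch of function mode: the function receives the input dataframe(s) and must return the output dataframe(s). Names are hypothetical, and the output dataset must already exist:

def keep_positive(df):
    return df[df["amount"] > 0]

builder = PythonRecipeCreator("compute_filtered", project)
builder.with_input("input_dataset")
builder.with_output("output_dataset")
builder.with_function(keep_positive)
recipe = builder.create()
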
class dataikuapi.dss.recipe.SQLQueryRecipeCreator(name, project)

Create a SQL query recipe

class dataikuapi.dss.recipe.PrepareRecipeCreator(name, project)

Create a Prepare recipe

class dataikuapi.dss.recipe.SyncRecipeCreator(name, project)

Create a Sync recipe

class dataikuapi.dss.recipe.SamplingRecipeCreator(name, project)

Create a Sample/Filter recipe

class dataikuapi.dss.recipe.DistinctRecipeCreator(name, project)

Create a Distinct recipe

class dataikuapi.dss.recipe.GroupingRecipeCreator(name, project)

Create a Group recipe

with_group_key(group_key)

Set a column as the first grouping key. Only a single grouping key may be set at recipe creation time. For additional groupings, get the recipe settings

Parameters

group_key (str) – name of a column in the input dataset
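
A minimal creation sketch (dataset and column names are hypothetical; the output dataset must already exist):

builder = GroupingRecipeCreator("group_by_customer", project)
builder.with_input("input_dataset")
builder.with_existing_output("grouped_dataset")
builder.with_group_key("customer_id")
recipe = builder.create()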

class dataikuapi.dss.recipe.SortRecipeCreator(name, project)

Create a Sort recipe

class dataikuapi.dss.recipe.TopNRecipeCreator(name, project)

Create a TopN recipe

class dataikuapi.dss.recipe.WindowRecipeCreator(name, project)

Create a Window recipe

class dataikuapi.dss.recipe.JoinRecipeCreator(name, project)

Create a Join recipe

class dataikuapi.dss.recipe.FuzzyJoinRecipeCreator(name, project)

Create a FuzzyJoin recipe

class dataikuapi.dss.recipe.GeoJoinRecipeCreator(name, project)

Create a GeoJoin recipe

class dataikuapi.dss.recipe.SplitRecipeCreator(name, project)

Create a Split recipe

class dataikuapi.dss.recipe.StackRecipeCreator(name, project)

Create a Stack recipe

class dataikuapi.dss.recipe.DownloadRecipeCreator(name, project)

Create a Download recipe

class dataikuapi.dss.recipe.PredictionScoringRecipeCreator(name, project)

Builder for the creation of a new “Prediction scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new prediction scoring recipe outputting to a new dataset

project = client.get_project("MYPROJECT")
builder = PredictionScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset, "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.create()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.ClusteringScoringRecipeCreator(name, project)

Builder for the creation of a new “Clustering scoring” recipe, from an input dataset, with an input saved model identifier

# Create a new clustering scoring recipe outputting to a new dataset

project = client.get_project("MYPROJECT")
builder = ClusteringScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")

# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")

new_recipe = builder.create()

with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_input_model(model_id)

Sets the input model

class dataikuapi.dss.recipe.EvaluationRecipeCreator(name, project)

Builder for the creation of a new “Evaluate” recipe, from an input dataset, with an input saved model identifier

# Create a new evaluation recipe outputting to a new dataset, to a metrics dataset and/or to a model evaluation store

project = client.get_project("MYPROJECT")
builder = project.new_recipe("evaluation")
builder.with_input_model(saved_model_id)
builder.with_input("dataset_to_evaluate")

builder.with_output("output_scored")
builder.with_output_metrics("output_metrics")
builder.with_output_evaluation_store(evaluation_store_id)

new_recipe = builder.create()

# Access the settings

er_settings = new_recipe.get_settings()
payload = er_settings.obj_payload

# Change the settings

payload['dontComputePerformance'] = True
payload['outputProbabilities'] = False
payload['metrics'] = ["precision", "recall", "auc", "f1", "costMatrixGain"]

# Manage evaluation labels

payload['labels'] = [dict(key="label_1", value="value_1"), dict(key="label_2", value="value_2")]

# Save the settings and run the recipe

er_settings.save()

new_recipe.run()

Outputs must exist. They can be created using the following:

builder = project.new_managed_dataset("output_scored")
builder.with_store_into(connection)
dataset = builder.create()

builder = project.new_managed_dataset("output_metrics")
builder.with_store_into(connection)
dataset = builder.create()

evaluation_store_id = project.create_model_evaluation_store("output_model_evaluation").mes_id

with_input_model(model_id)

Sets the input model

with_output(name)

Sets the output dataset containing the scored input

with_output_metrics(name)

Sets the output dataset containing the metrics

with_output_evaluation_store(mes_id)

Sets the output model evaluation store

class dataikuapi.dss.recipe.StandaloneEvaluationRecipeCreator(name, project)

Builder for the creation of a new “Standalone Evaluate” recipe, from an input dataset

# Create a new standalone evaluation of a scored dataset

project = client.get_project("MYPROJECT")
builder = project.new_recipe("standalone_evaluation")
builder.with_input("scored_dataset_to_evaluate")
builder.with_output_evaluation_store(evaluation_store_id)

# Add a reference dataset (optional) to compute data drift

builder.with_reference_dataset("reference_dataset")

# Finish creation of the recipe

new_recipe = builder.create()

# Modify the model parameters in the SER settings

ser_settings = new_recipe.get_settings()
payload = ser_settings.obj_payload

payload['predictionType'] = "BINARY_CLASSIFICATION"
payload['targetVariable'] = "Survived"
payload['predictionVariable'] = "prediction"
payload['isProbaAware'] = True
payload['dontComputePerformance'] = False

# For a classification model with probabilities, the 'probas' section can be filled with the mapping of the class and the probability column
# e.g. for a binary classification model with 2 columns: proba_0 and proba_1

class_0 = dict(key=0, value="proba_0")
class_1 = dict(key=1, value="proba_1")
payload['probas'] = [class_0, class_1]

# Change the 'features' settings for this standalone evaluation
# e.g. reject the features that you do not want to use in the evaluation

feature_passengerid = dict(name="Passenger_Id", role="REJECT", type="TEXT")
feature_ticket = dict(name="Ticket", role="REJECT", type="TEXT")
feature_cabin = dict(name="Cabin", role="REJECT", type="TEXT")

payload['features'] = [feature_passengerid, feature_ticket, feature_cabin]

# To set the cost matrix properly, access the 'metricParams' section of the payload and set the cost matrix weights:

payload['metricParams'] = dict(costMatrixWeights=dict(tpGain=0.4, fpGain=-1.0, tnGain=0.2, fnGain=-0.5))

# Save the recipe and run the recipe
# Note that with this method, all the settings that were not explicitly set are instead set to their default value.

ser_settings.save()

new_recipe.run()

Output model evaluation store must exist. It can be created using the following:

evaluation_store_id = project.create_model_evaluation_store("output_model_evaluation").mes_id

with_output_evaluation_store(mes_id)

Sets the output model evaluation store

with_reference_dataset(dataset_name)

Sets the dataset to use as a reference in data drift computation (optional).

Utilities

class dataikuapi.dss.utils.DSSComputedColumn
static formula(name, formula, type='double')
class dataikuapi.dss.utils.DSSFilter

Helper class to build filter objects for use in visual recipes

static of_single_condition(column, operator, string=None, num=None, date=None, time=None, date2=None, time2=None, unit=None)
static of_and_conditions(conditions)
static of_or_conditions(conditions)
static of_formula(formula)
static of_sql_expression(sql_expression)
static condition(column, operator, string=None, num=None, date=None, time=None, date2=None, time2=None, unit=None)
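
A sketch of building a computed column and a filter for a visual recipe (column names are hypothetical; the operator values come from the DSSFilterOperator enumeration below):

from dataikuapi.dss.utils import DSSComputedColumn, DSSFilter, DSSFilterOperator

computed_column = DSSComputedColumn.formula("amount_with_tax", "amount * 1.2", type="double")
flt = DSSFilter.of_single_condition("amount", DSSFilterOperator.GREATER_NUMBER.value, num=0)
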
class dataikuapi.dss.utils.DSSFilterOperator(value)

An enumeration.

EMPTY_ARRAY = 'empty array'
NOT_EMPTY_ARRAY = 'not empty array'
CONTAINS_ARRAY = 'array contains'
NOT_EMPTY = 'not empty'
EMPTY = 'is empty'
NOT_EMPTY_STRING = 'not empty string'
EMPTY_STRING = 'empty string'
IS_TRUE = 'true'
IS_FALSE = 'false'
EQUALS_STRING = '== [string]'
EQUALS_CASE_INSENSITIVE_STRING = '== [string]i'
NOT_EQUALS_STRING = '!= [string]'
SAME = '== [NaNcolumn]'
DIFFERENT = '!= [NaNcolumn]'
EQUALS_NUMBER = '== [number]'
NOT_EQUALS_NUMBER = '!= [number]'
GREATER_NUMBER = '> [number]'
LESS_NUMBER = '< [number]'
GREATER_OR_EQUAL_NUMBER = '>= [number]'
LESS_OR_EQUAL_NUMBER = '<= [number]'
EQUALS_DATE = '== [date]'
GREATER_DATE = '> [date]'
GREATER_OR_EQUAL_DATE = '>= [date]'
LESS_DATE = '< [date]'
LESS_OR_EQUAL_DATE = '<= [date]'
BETWEEN_DATE = '>< [date]'
EQUALS_COL = '== [column]'
NOT_EQUALS_COL = '!= [column]'
GREATER_COL = '> [column]'
LESS_COL = '< [column]'
GREATER_OR_EQUAL_COL = '>= [column]'
LESS_OR_EQUAL_COL = '<= [column]'
CONTAINS_STRING = 'contains'
REGEX = 'regex'