Recipes¶
This page lists usage examples for performing various operations with recipes through the Dataiku Python API. In all examples, project is a dataikuapi.dss.project.DSSProject
handle, obtained using client.get_project() or client.get_default_project().
Basic operations¶
Listing recipes¶
recipes = project.list_recipes()
# Returns a list of DSSRecipeListItem
for recipe in recipes:
    # Quick access to main information in the recipe list item
    print("Name: %s" % recipe.name)
    print("Type: %s" % recipe.type)
    print("Tags: %s" % recipe.tags) # Returns a list of strings
    # You can also use the list item as a dict of all available recipe information
    print("Raw: %s" % recipe)
Deleting a recipe¶
recipe = project.get_recipe('myrecipe')
recipe.delete()
Modifying tags for a recipe¶
recipe = project.get_recipe('myrecipe')
settings = recipe.get_settings()
print("Current tags are %s" % settings.tags)
# Change the tags
settings.tags = ["newtag1", "newtag2"]
# If we changed the settings, we must save
settings.save()
Recipe creation¶
Please see Flow creation and management
Recipe status¶
You can compute the status of the recipe, which also provides you with the engine information.
Find the engine used to run a recipe¶
recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_selected_engine_details())
Check if a recipe is valid¶
get_status() calls the validation code of the recipe.
recipe = project.get_recipe("myrecipe")
status = recipe.get_status()
print(status.get_status_severity())  # SUCCESS, WARNING or ERROR
Find the engines for all recipes of a certain type¶
This example shows how to filter a list, obtain DSSRecipe
objects for the list items, and get their status.
for list_item in project.list_recipes():
    if list_item.type == "grouping":
        recipe = list_item.to_recipe()
        engine = recipe.get_status().get_selected_engine_details()["type"]
        print("Recipe %s uses engine %s" % (recipe.name, engine))
Recipe settings¶
When you use get_settings()
on a recipe, you receive a settings object whose class depends on the recipe type. Please see below for the possible types.
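For example, a grouping recipe returns a GroupingRecipeSettings object with dedicated helpers. A minimal sketch (the recipe and column names are placeholders):
from dataikuapi.dss.recipe import GroupingRecipeSettings

recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
if isinstance(settings, GroupingRecipeSettings):
    # Grouping recipes expose grouping-specific helpers (documented below)
    settings.add_grouping_key("customer_id")
    settings.save()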
Checking if a recipe uses a particular dataset as input¶
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
print("Recipe %s uses input:%s" % (recipe.name, settings.has_input("mydataset"))
Replacing an input of a recipe¶
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
settings.replace_input("old_input", "new_input")
settings.save()
Setting the code env of a code recipe¶
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
# Use this to set the recipe to inherit the project's code env
settings.set_code_env(inherit=True)
# Use this to set the recipe to use a specific code env
settings.set_code_env(code_env="myenv")
settings.save()
Reference documentation¶
-
class
dataikuapi.dss.recipe.
DSSRecipe
(client, project_key, recipe_name)¶ A handle to an existing recipe on the DSS instance. Do not create this directly, use
dataikuapi.dss.project.DSSProject.get_recipe()
-
property
id
¶ The id of the recipe
-
property
name
¶ The name of the recipe
-
compute_schema_updates
()¶ Computes which updates are required to the outputs of this recipe. The required updates are returned as a
RequiredSchemaUpdates
object, which then allows you to apply() the changes.
Usage example:
required_updates = recipe.compute_schema_updates()
if required_updates.any_action_required():
    print("Some schemas will be updated")
# Note that you can call apply even if no changes are required. This will be a no-op
required_updates.apply()
-
run
(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)¶ Starts a new job to run this recipe and waits for it to complete. Raises if the job failed.
job = recipe.run()
print("Job %s done" % job.id)
- Parameters
job_type – The job type. One of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD or RECURSIVE_FORCED_BUILD
partitions – If the outputs are partitioned, a list of partition ids to build
no_fail – if True, does not raise if the job failed.
- Returns
the
dataikuapi.dss.job.DSSJob
job handle corresponding to the built job
- Return type
dataikuapi.dss.job.DSSJob
-
delete
()¶ Delete the recipe
-
get_settings
()¶ Gets the settings of the recipe, as a
DSSRecipeSettings
or one of its subclasses.
Some recipes have a dedicated class for the settings, with additional helpers to read and modify the settings.
Once you are done modifying the returned settings object, you can call
save()
on it in order to save the modifications to the DSS recipe.
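Usage example (a sketch; the recipe name and tag are placeholders):
recipe = project.get_recipe("myrecipe")
settings = recipe.get_settings()
# Inspect the settings, tweak them, then persist the changes
print(settings.type)
settings.tags = ["audited"]
settings.save()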
-
get_definition_and_payload
()¶ Deprecated. Use
get_settings()
-
set_definition_and_payload
(definition)¶ Deprecated. Use
get_settings()
and DSSRecipeSettings.save()
-
get_status
()¶ Gets the status of this recipe (status messages, engines status, …)
- Returns
a
dataikuapi.dss.recipe.DSSRecipeStatus
object to interact with the status
- Return type
dataikuapi.dss.recipe.DSSRecipeStatus
-
get_metadata
()¶ Get the metadata attached to this recipe. The metadata contains the label, description, checklists, tags and custom metadata of the recipe
- Returns
a dict. For more information on available metadata, please see https://doc.dataiku.com/dss/api/8.0/rest/
- Return type
dict
-
set_metadata
(metadata)¶ Set the metadata on this recipe. :param dict metadata: the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata call.
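Usage example of a metadata round-trip (a sketch; the "description" key name is assumed from the fields listed in get_metadata):
recipe = project.get_recipe("myrecipe")
metadata = recipe.get_metadata()
metadata["description"] = "Documented via the API"  # key name assumed
recipe.set_metadata(metadata)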
-
get_object_discussions
()¶ Get a handle to manage discussions on the recipe
- Returns
the handle to manage discussions
- Return type
dataikuapi.discussion.DSSObjectDiscussions
-
get_continuous_activity
()¶ Return a handle on the associated continuous activity
-
move_to_zone
(zone)¶ Moves this object to a flow zone
- Parameters
zone (object) – a
dataikuapi.dss.flow.DSSFlowZone
where to move the object
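Usage example (a sketch; assumes a flow zone with id "myzone" and that the zone handle is obtained through project.get_flow().get_zone()):
zone = project.get_flow().get_zone("myzone")  # zone id is a placeholder
recipe = project.get_recipe("myrecipe")
recipe.move_to_zone(zone)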
-
class
dataikuapi.dss.recipe.
DSSRecipeListItem
(client, data)¶ An item in a list of recipes. Do not instantiate this class, use
dataikuapi.dss.project.DSSProject.list_recipes()
-
property
name
¶
-
property
id
¶
-
property
type
¶
-
class
dataikuapi.dss.recipe.
DSSRecipeStatus
(client, data)¶ Status of a recipe. Do not create this directly, use
DSSRecipe.get_status()
-
get_selected_engine_details
()¶ Gets the selected engine for this recipe (for recipes that support engines)
- Returns
a dict of the details of the selected engine. The dict will contain at least a ‘type’ field indicating which engine it is, and a “statusWarnLevel” field indicating whether the engine status is OK / WARN / ERROR
- Return type
dict
-
get_engines_details
()¶ Gets details about all possible engines for this recipe (for recipes that support engines)
- Returns
a list of dicts, one per possible engine. Each dict will contain at least a ‘type’ field indicating which engine it is, and a “statusWarnLevel” field indicating whether the engine status is OK / WARN / ERROR
- Return type
list
-
get_status_severity
()¶ Returns whether the recipe is in SUCCESS, WARNING or ERROR status
- Return type
string
-
get_status_messages
()¶ Returns status messages for this recipe.
- Returns
a list of dicts, one per status message. Each dict represents a single message, and contains at least a “severity” field (SUCCESS, WARNING or ERROR) and a “message” field
- Return type
list
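Usage example combining the status helpers (the recipe name is a placeholder):
status = project.get_recipe("myrecipe").get_status()
if status.get_status_severity() != "SUCCESS":
    for message in status.get_status_messages():
        # Each message carries at least "severity" and "message" fields
        print("%s: %s" % (message["severity"], message["message"]))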
-
-
class
dataikuapi.dss.recipe.
RequiredSchemaUpdates
(recipe, data)¶ Representation of the updates required to the schema of the outputs of a recipe. Do not create this class directly, use
DSSRecipe.compute_schema_updates()
-
any_action_required
()¶
-
apply
()¶
-
Settings¶
-
class
dataikuapi.dss.recipe.
DSSRecipeSettings
(recipe, data)¶ Settings of a recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
save
()¶ Saves back the recipe in DSS.
-
property
type
¶
-
property
str_payload
¶ The raw “payload” of the recipe, as a string
-
property
obj_payload
¶ The raw “payload” of the recipe, as a dict
-
property
raw_params
¶ The raw ‘params’ field of the recipe settings, as a dict
-
get_recipe_raw_definition
()¶ Get the recipe definition as a raw dict :rtype dict
-
get_recipe_inputs
()¶ Get a structured dict of inputs to this recipe :rtype dict
-
get_recipe_outputs
()¶ Get a structured dict of outputs of this recipe :rtype dict
-
get_recipe_params
()¶ Get the parameters of this recipe, as a dict :rtype dict
-
get_payload
()¶ Get the payload or script of this recipe, as a string :rtype string
-
get_json_payload
()¶ Get the payload or script of this recipe, parsed from JSON, as a dict :rtype dict
-
set_payload
(payload)¶ Set the payload of this recipe :param str payload: the payload, as a string
-
set_json_payload
(payload)¶ Set the payload of this recipe :param dict payload: the payload, as a dict. The payload will be converted to a JSON string internally
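Usage example of a payload round-trip (a sketch; "maxRecords" is a hypothetical key, since actual payload keys depend on the recipe type and are not documented):
settings = project.get_recipe("myrecipe").get_settings()
payload = settings.get_json_payload()
payload["maxRecords"] = 1000  # hypothetical key, for illustration only
settings.set_json_payload(payload)
settings.save()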
-
has_input
(input_ref)¶ Returns whether this recipe has a given ref as input
-
has_output
(output_ref)¶ Returns whether this recipe has a given ref as output
-
replace_input
(current_input_ref, new_input_ref)¶ Replaces an object reference as input of this recipe by another
-
replace_output
(current_output_ref, new_output_ref)¶ Replaces an object reference as output of this recipe by another
-
add_input
(role, ref, partition_deps=None)¶
-
add_output
(role, ref, append_mode=False)¶
-
get_flat_input_refs
()¶ Returns a list of all input refs of this recipe, regardless of the input role :rtype list of strings
-
get_flat_output_refs
()¶ Returns a list of all output refs of this recipe, regardless of the output role :rtype list of strings
-
property
custom_fields
¶ The custom fields of the object as a dict. Returns None if there are no custom fields
-
property
description
¶ The description of the object as a string
-
property
short_description
¶ The short description of the object as a string
-
property
tags
¶ The tags of the object, as a list of strings
-
-
class
dataikuapi.dss.recipe.
DSSRecipeDefinitionAndPayload
(recipe, data)¶ Deprecated. Settings of a recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
CodeRecipeSettings
(recipe, data)¶ Settings of a code recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
get_code
()¶ Returns the code of the recipe as a string :rtype string
-
set_code
(code)¶ Updates the code of the recipe :param str code: The new code as a string
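Usage example (a sketch; assumes "my_python_recipe" is an existing code recipe):
settings = project.get_recipe("my_python_recipe").get_settings()
code = settings.get_code()
# Append a line to the recipe script and save it back
settings.set_code(code + "\n# reviewed by automation\n")
settings.save()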
-
get_code_env_settings
()¶ Returns the code env settings for this recipe :rtype dict
-
set_code_env
(code_env=None, inherit=False, use_builtin=False)¶ Sets the code env to use for this recipe.
Exactly one of code_env, inherit or use_builtin must be passed
- Parameters
code_env (str) – The name of a code env
inherit (bool) – Use the project’s default code env
use_builtin (bool) – Use the builtin code env
-
-
class
dataikuapi.dss.recipe.
SyncRecipeSettings
(recipe, data)¶ Settings of a sync recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
PrepareRecipeSettings
(recipe, data)¶ Settings of a prepare recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
property
raw_steps
¶ Returns a raw list of the steps of this prepare recipe. You can modify the returned list.
Each step is a dict of settings. The precise settings for each step are not documented.
-
add_processor_step
(type, params)¶
-
add_filter_on_bad_meaning
(meaning, columns)¶
-
class
dataikuapi.dss.recipe.
SamplingRecipeSettings
(recipe, data)¶ Settings of a sampling recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
GroupingRecipeSettings
(recipe, data)¶ Settings of a grouping recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
clear_grouping_keys
()¶ Removes all grouping keys from this grouping recipe
-
add_grouping_key
(column)¶ Adds grouping on a column :param str column: Column to group on
-
set_global_count_enabled
(enabled)¶
-
get_or_create_column_settings
(column)¶ Gets a dict representing the aggregations to perform on a column. Creates it and adds it to the potential aggregations if it does not already exist :param str column: The column name :rtype dict
-
set_column_aggregations
(column, type, min=False, max=False, count=False, count_distinct=False, sum=False, concat=False, stddev=False, avg=False)¶ Sets the basic aggregations on a column. Returns the dict representing the aggregations on the column
- Parameters
column (str) – The column name
type (str) – The type of the column (as a DSS schema type name)
:rtype dict
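Usage example configuring a grouping recipe end to end (a sketch; the recipe name, column names and the "double" schema type are placeholders):
settings = project.get_recipe("my_grouping_recipe").get_settings()
settings.clear_grouping_keys()
settings.add_grouping_key("customer_id")
# Aggregate the "amount" column with sum and average
settings.set_column_aggregations("amount", "double", sum=True, avg=True)
settings.save()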
-
-
class
dataikuapi.dss.recipe.
SortRecipeSettings
(recipe, data)¶ Settings of a sort recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
TopNRecipeSettings
(recipe, data)¶ Settings of a topn recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
DistinctRecipeSettings
(recipe, data)¶ Settings of a distinct recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
WindowRecipeSettings
(recipe, data)¶ Settings of a window recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
JoinRecipeSettings
(recipe, data)¶ Settings of a join recipe. Do not create this directly, use
DSSRecipe.get_settings()
In order to enable self-joins, join recipes are based on a concept of “virtual inputs”. Every join, computed pre-join column, pre-join filter, … is based on one virtual input, and each virtual input references an input of the recipe, by index
- For example, if a recipe has inputs A and B and declares two joins:
A->B
A->A (based on a computed column)
- There are 3 virtual inputs:
0: points to recipe input 0 (i.e. dataset A)
1: points to recipe input 1 (i.e. dataset B)
2: points to recipe input 0 (i.e. dataset A) and includes the computed column
The first join is between virtual inputs 0 and 1
The second join is between virtual inputs 0 and 2
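For example, adding a left join between the two default virtual inputs (a sketch; the recipe and column names are placeholders):
settings = project.get_recipe("my_join_recipe").get_settings()
# Join virtual inputs 0 and 1, then add an equality condition
join = settings.add_join(join_type="LEFT", input1=0, input2=1)
settings.add_condition_to_join(join, type="EQ", column1="customer_id", column2="customer_id")
settings.save()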
-
property
raw_virtual_inputs
¶ Returns the raw list of virtual inputs :rtype list of dict
-
property
raw_joins
¶ Returns the raw list of joins :rtype list of dict
-
add_virtual_input
(input_dataset_index)¶ Adds a virtual input pointing to the specified input dataset of the recipe (referenced by index in the inputs list)
-
add_pre_join_computed_column
(virtual_input_index, computed_column)¶ Adds a computed column to a virtual input
Use
dataikuapi.dss.utils.DSSComputedColumn
to build the computed_column object
-
add_join
(join_type='LEFT', input1=0, input2=1)¶ Adds a join between two virtual inputs. The join is initialized with no condition.
Use
add_condition_to_join()
on the return value to add a join condition (for example column equality) to the join.
Returns the newly added join as a dict :rtype dict
-
add_condition_to_join
(join, type='EQ', column1=None, column2=None)¶ Adds a condition to a join :param str column1: Name of “left” column :param str column2: Name of “right” column
-
add_post_join_computed_column
(computed_column)¶ Adds a post-join computed column
Use
dataikuapi.dss.utils.DSSComputedColumn
to build the computed_column object
-
set_post_filter
(postfilter)¶
-
class
dataikuapi.dss.recipe.
DownloadRecipeSettings
(recipe, data)¶ Settings of a download recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
SplitRecipeSettings
(recipe, data)¶ Settings of a split recipe. Do not create this directly, use
DSSRecipe.get_settings()
-
class
dataikuapi.dss.recipe.
StackRecipeSettings
(recipe, data)¶ Settings of a stack recipe. Do not create this directly, use
DSSRecipe.get_settings()
Creation¶
-
class
dataikuapi.dss.recipe.
DSSRecipeCreator
(type, name, project)¶ Helper to create new recipes
- Parameters
type (str) – type of the recipe
name (str) – name for the recipe
:param
dataikuapi.dss.project.DSSProject
project: project in which the recipe will be created
-
set_name
(name)¶
-
with_input
(dataset_name, project_key=None, role='main')¶ Add an existing object as input to the recipe-to-be-created
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
project_key – project containing the object, if different from the one where the recipe is created
role (str) – the role of the recipe in which the input should be added
-
with_output
(dataset_name, append=False, role='main')¶ The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
role (str) – the role of the recipe in which the output should be added
-
build
()¶ Deprecated. Use create()
-
create
()¶ Creates the new recipe in the project, and returns a handle to interact with it.
- Returns
A
dataikuapi.dss.recipe.DSSRecipe
recipe handle
-
set_raw_mode
()¶
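Usage example (a sketch; assumes SyncRecipeCreator inherits the single-output helpers documented below, and that the input dataset already exists):
from dataikuapi.dss.recipe import SyncRecipeCreator

builder = SyncRecipeCreator("sync_mydataset", project)
builder.with_input("mydataset")
builder.with_new_output("mydataset_copy", "filesystem_managed")
recipe = builder.create()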
-
class
dataikuapi.dss.recipe.
SingleOutputRecipeCreator
(type, name, project)¶ Create a recipe that has a single output
-
with_existing_output
(dataset_name, append=False)¶ Add an existing object as output to the recipe-to-be-created
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
-
with_new_output
(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET', overwrite=False)¶ Create a new dataset as output to the recipe-to-be-created. The dataset is not created immediately, but when the recipe is created (i.e. in the create() method)
- Parameters
name (str) – name of the dataset or identifier of the managed folder
connection_id (str) – name of the connection to create the dataset on
typeOptionId (str) – sub-type of dataset, for connection where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connection
format_option_id (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC
override_sql_schema – schema to force dataset, for SQL dataset. If left empty, will be autodetected
partitioning_option_id (str) – to copy the partitioning schema of an existing dataset ‘foo’, pass a value of ‘copy:dataset:foo’
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
object_type (str) – DATASET or MANAGED_FOLDER
overwrite – If the dataset being created already exists, overwrite it (and delete data)
-
with_output
(dataset_name, append=False)¶ Alias of with_existing_output
-
-
class
dataikuapi.dss.recipe.
VirtualInputsSingleOutputRecipeCreator
(type, name, project)¶ Create a recipe that has a single output and several inputs
-
with_input
(dataset_name, project_key=None)¶ Add an existing object as input to the recipe-to-be-created
- Parameters
dataset_name – name of the dataset, or identifier of the managed folder or identifier of the saved model
project_key – project containing the object, if different from the one where the recipe is created
-
-
class
dataikuapi.dss.recipe.
CodeRecipeCreator
(name, type, project)¶ -
with_script
(script)¶ Set the code of the recipe
- Parameters
script (str) – the script of the recipe
-
with_new_output_dataset
(name, connection, type=None, format=None, copy_partitioning_from='FIRST_INPUT', append=False, overwrite=False)¶ Create a new managed dataset as output to the recipe-to-be-created. The dataset is created immediately
- Parameters
name (str) – name of the dataset to create
connection (str) – name of the connection to create the dataset on
type (str) – type of dataset, for connection where the type could be ambiguous. Typically, this is SCP or SFTP, for SSH connection
format (str) – name of a format preset relevant for the dataset type. Possible values are: CSV_ESCAPING_NOGZIP_FORHIVE, CSV_UNIX_GZIP, CSV_EXCEL_GZIP, CSV_EXCEL_GZIP_BIGQUERY, CSV_NOQUOTING_NOGZIP_FORPIG, PARQUET_HIVE, AVRO, ORC. If None, uses the default
copy_partitioning_from (str) – Whether to copy the partitioning from another thing. Use None for not partitioning the output, “FIRST_INPUT” to copy from the first input of the recipe, “dataset:XXX” to copy from a dataset name, or “folder:XXX” to copy from a folder id
append – whether the recipe should append or overwrite the output when running (note: not available for all dataset types)
overwrite – If the dataset being created already exists, overwrite it (and delete data)
-
-
class
dataikuapi.dss.recipe.
PythonRecipeCreator
(name, project)¶ Creates a Python recipe. A Python recipe can be defined either by its complete code, like a normal Python recipe, or by a function signature.
- When using a function, the function must take as arguments:
A list of dataframes corresponding to the dataframes of the input datasets
Optional named arguments corresponding to arguments passed to the creator
-
DEFAULT_RECIPE_CODE_TMPL
= '\n# This code is autogenerated by PythonRecipeCreator function mode\nimport dataiku, dataiku.recipe, json\nfrom {module_name} import {fname}\ninput_datasets = dataiku.recipe.get_inputs_as_datasets()\noutput_datasets = dataiku.recipe.get_outputs_as_datasets()\nparams = json.loads(\'{params_json}\')\n\nlogging.info("Reading %d input datasets as dataframes" % len(input_datasets))\ninput_dataframes = [ds.get_dataframe() for ds in input_datasets]\n\nlogging.info("Calling user function {fname}")\nfunction_input = input_dataframes if len(input_dataframes) > 1 else input_dataframes[0]\noutput_dataframes = {fname}(function_input, **params)\n\nif not isinstance(output_dataframes, list):\n output_dataframes = [output_dataframes]\n\nif not len(output_dataframes) == len(output_datasets):\n raise Exception("Code function {fname}() returned %d dataframes but recipe expects %d output datasets", \\\n (len(output_dataframes), len(output_datasets)))\noutput = list(zip(output_datasets, output_dataframes))\nfor ds, df in output:\n logging.info("Writing function result to dataset %s" % ds.name)\n ds.write_with_schema(df)\n'¶
-
with_function_name
(module_name, function_name, custom_template=None, **function_args)¶ Defines this recipe as being a functional recipe calling a function name from a module name
-
with_function
(fn, custom_template=None, **function_args)¶
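Usage example of function mode (a sketch; the process function and dataset names are illustrative, and both datasets are assumed to already exist):
from dataikuapi.dss.recipe import PythonRecipeCreator

def process(input_df):
    # Hypothetical user function: receives the input dataframe, returns the output one
    return input_df.head(100)

builder = PythonRecipeCreator("my_python_recipe", project)
builder.with_input("input_dataset")
builder.with_output("output_dataset")  # must already exist
builder.with_function(process)
recipe = builder.create()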
-
class
dataikuapi.dss.recipe.
SQLQueryRecipeCreator
(name, project)¶ Create a SQL query recipe
-
class
dataikuapi.dss.recipe.
PrepareRecipeCreator
(name, project)¶ Create a Prepare recipe
-
class
dataikuapi.dss.recipe.
SyncRecipeCreator
(name, project)¶ Create a Sync recipe
-
class
dataikuapi.dss.recipe.
SamplingRecipeCreator
(name, project)¶ Create a Sample/Filter recipe
-
class
dataikuapi.dss.recipe.
DistinctRecipeCreator
(name, project)¶ Create a Distinct recipe
-
class
dataikuapi.dss.recipe.
GroupingRecipeCreator
(name, project)¶ Create a Group recipe
-
with_group_key
(group_key)¶ Set a column as the first grouping key. Only a single grouping key may be set at recipe creation time. For additional groupings, get the recipe settings
- Parameters
group_key (str) – name of a column in the input dataset
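Usage example (a sketch; assumes GroupingRecipeCreator inherits the single-output helpers, with placeholder dataset and column names):
builder = GroupingRecipeCreator("group_by_customer", project)
builder.with_input("transactions")
builder.with_new_output("transactions_by_customer", "filesystem_managed")
builder.with_group_key("customer_id")
recipe = builder.create()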
-
-
class
dataikuapi.dss.recipe.
SortRecipeCreator
(name, project)¶ Create a Sort recipe
-
class
dataikuapi.dss.recipe.
TopNRecipeCreator
(name, project)¶ Create a TopN recipe
-
class
dataikuapi.dss.recipe.
WindowRecipeCreator
(name, project)¶ Create a Window recipe
-
class
dataikuapi.dss.recipe.
JoinRecipeCreator
(name, project)¶ Create a Join recipe
-
class
dataikuapi.dss.recipe.
FuzzyJoinRecipeCreator
(name, project)¶ Create a FuzzyJoin recipe
-
class
dataikuapi.dss.recipe.
GeoJoinRecipeCreator
(name, project)¶ Create a GeoJoin recipe
-
class
dataikuapi.dss.recipe.
SplitRecipeCreator
(name, project)¶ Create a Split recipe
-
class
dataikuapi.dss.recipe.
StackRecipeCreator
(name, project)¶ Create a Stack recipe
-
class
dataikuapi.dss.recipe.
DownloadRecipeCreator
(name, project)¶ Create a Download recipe
-
class
dataikuapi.dss.recipe.
PredictionScoringRecipeCreator
(name, project)¶ Builder for the creation of a new “Prediction scoring” recipe, from an input dataset, with an input saved model identifier
# Create a new prediction scoring recipe outputting to a new dataset
project = client.get_project("MYPROJECT")
builder = PredictionScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")
# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")
new_recipe = builder.build()
-
with_new_output
(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')¶
-
with_input_model
(model_id)¶ Sets the input model
-
-
class
dataikuapi.dss.recipe.
ClusteringScoringRecipeCreator
(name, project)¶ Builder for the creation of a new “Clustering scoring” recipe, from an input dataset, with an input saved model identifier
# Create a new clustering scoring recipe outputting to a new dataset
project = client.get_project("MYPROJECT")
builder = ClusteringScoringRecipeCreator("my_scoring_recipe", project)
builder.with_input_model("saved_model_id")
builder.with_input("dataset_to_score")
builder.with_new_output("my_output_dataset", "myconnection")
# Or for a filesystem output connection
# builder.with_new_output("my_output_dataset", "filesystem_managed", format_option_id="CSV_EXCEL_GZIP")
new_recipe = builder.build()
-
with_new_output
(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')¶
-
with_input_model
(model_id)¶ Sets the input model
-
-
class
dataikuapi.dss.recipe.
EvaluationRecipeCreator
(name, project)¶ Builder for the creation of a new “Evaluate” recipe, from an input dataset, with an input saved model identifier
# Create a new evaluation recipe outputting to a new dataset, to a metrics dataset and/or to a model evaluation store
project = client.get_project("MYPROJECT")
builder = project.new_recipe("evaluation")
builder.with_input_model(saved_model_id)
builder.with_input("dataset_to_evaluate")
builder.with_output("output_scored")
builder.with_output_metrics("output_metrics")
builder.with_output_evaluation_store(evaluation_store_id)
new_recipe = builder.build()

# Access the settings
er_settings = new_recipe.get_settings()
payload = er_settings.obj_payload

# Change the settings
payload['dontComputePerformance'] = True
payload['outputProbabilities'] = False
payload['metrics'] = ["precision", "recall", "auc", "f1", "costMatrixGain"]

# Manage evaluation labels
payload['labels'] = [dict(key="label_1", value="value_1"), dict(key="label_2", value="value_2")]

# Save the settings and run the recipe
er_settings.save()
new_recipe.run()
Outputs must exist. They can be created using the following:
builder = project.new_managed_dataset("output_scored")
builder.with_store_into(connection)
dataset = builder.create()

builder = project.new_managed_dataset("output_metrics")
builder.with_store_into(connection)
dataset = builder.create()

evaluation_store_id = project.create_model_evaluation_store("output_model_evaluation").mes_id
-
with_input_model
(model_id)¶ Sets the input model
-
with_output
(name)¶ Sets the output dataset containing the scored input
-
with_output_metrics
(name)¶ Sets the output dataset containing the metrics
-
with_output_evaluation_store
(mes_id)¶ Sets the output model evaluation store
-
-
class
dataikuapi.dss.recipe.
StandaloneEvaluationRecipeCreator
(name, project)¶ Builder for the creation of a new “Standalone Evaluate” recipe, from an input dataset
# Create a new standalone evaluation of a scored dataset
project = client.get_project("MYPROJECT")
builder = project.new_recipe("standalone_evaluation")
builder.with_input("scored_dataset_to_evaluate")
builder.with_output_evaluation_store(evaluation_store_id)

# Add a reference dataset (optional) to compute data drift
builder.with_reference_dataset("reference_dataset")

# Finish creation of the recipe
new_recipe = builder.create()

# Modify the model parameters in the SER settings
ser_settings = new_recipe.get_settings()
payload = ser_settings.obj_payload
payload['predictionType'] = "BINARY_CLASSIFICATION"
payload['targetVariable'] = "Survived"
payload['predictionVariable'] = "prediction"
payload['isProbaAware'] = True
payload['dontComputePerformance'] = False

# For a classification model with probabilities, the 'probas' section can be filled with the mapping of the class and the probability column
# e.g. for a binary classification model with 2 columns: proba_0 and proba_1
class_0 = dict(key=0, value="proba_0")
class_1 = dict(key=1, value="proba_1")
payload['probas'] = [class_0, class_1]

# Change the 'features' settings for this standalone evaluation
# e.g. reject the features that you do not want to use in the evaluation
feature_passengerid = dict(name="Passenger_Id", role="REJECT", type="TEXT")
feature_ticket = dict(name="Ticket", role="REJECT", type="TEXT")
feature_cabin = dict(name="Cabin", role="REJECT", type="TEXT")
payload['features'] = [feature_passengerid, feature_ticket, feature_cabin]

# To set the cost matrix properly, access the 'metricParams' section of the payload and set the cost matrix weights:
payload['metricParams'] = dict(costMatrixWeights=dict(tpGain=0.4, fpGain=-1.0, tnGain=0.2, fnGain=-0.5))

# Save the recipe and run the recipe
# Note that with this method, all the settings that were not explicitly set are instead set to their default value.
ser_settings.save()
new_recipe.run()
Output model evaluation store must exist. It can be created using the following:
evaluation_store_id = project.create_model_evaluation_store("output_model_evaluation").mes_id
-
with_output_evaluation_store
(mes_id)¶ Sets the output model evaluation store
-
with_reference_dataset
(dataset_name)¶ Sets the dataset to use as a reference in data drift computation (optional).
-
Utilities¶
-
class
dataikuapi.dss.utils.
DSSFilter
¶ Helper class to build filter objects for use in visual recipes
-
static
of_single_condition
(column, operator, string=None, num=None, date=None, time=None, date2=None, time2=None, unit=None)¶
-
static
of_and_conditions
(conditions)¶
-
static
of_or_conditions
(conditions)¶
-
static
of_formula
(formula)¶
-
static
of_sql_expression
(sql_expression)¶
-
static
condition
(column, operator, string=None, num=None, date=None, time=None, date2=None, time2=None, unit=None)¶
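Usage example (a sketch; assumes operators are passed as their string values, and that the filter is applied through a visual recipe setting such as set_post_filter, shown above for join recipes):
from dataikuapi.dss.utils import DSSFilter, DSSFilterOperator

# Keep only rows where "age" is strictly greater than 18
flt = DSSFilter.of_single_condition("age", DSSFilterOperator.GREATER_NUMBER.value, num=18)

settings = project.get_recipe("my_join_recipe").get_settings()
settings.set_post_filter(flt)
settings.save()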
-
class
dataikuapi.dss.utils.
DSSFilterOperator
(value)¶ An enumeration.
-
EMPTY_ARRAY
= 'empty array'¶
-
NOT_EMPTY_ARRAY
= 'not empty array'¶
-
CONTAINS_ARRAY
= 'array contains'¶
-
NOT_EMPTY
= 'not empty'¶
-
EMPTY
= 'is empty'¶
-
NOT_EMPTY_STRING
= 'not empty string'¶
-
EMPTY_STRING
= 'empty string'¶
-
IS_TRUE
= 'true'¶
-
IS_FALSE
= 'false'¶
-
EQUALS_STRING
= '== [string]'¶
-
EQUALS_CASE_INSENSITIVE_STRING
= '== [string]i'¶
-
NOT_EQUALS_STRING
= '!= [string]'¶
-
SAME
= '== [NaNcolumn]'¶
-
DIFFERENT
= '!= [NaNcolumn]'¶
-
EQUALS_NUMBER
= '== [number]'¶
-
NOT_EQUALS_NUMBER
= '!= [number]'¶
-
GREATER_NUMBER
= '> [number]'¶
-
LESS_NUMBER
= '< [number]'¶
-
GREATER_OR_EQUAL_NUMBER
= '>= [number]'¶
-
LESS_OR_EQUAL_NUMBER
= '<= [number]'¶
-
EQUALS_DATE
= '== [date]'¶
-
GREATER_DATE
= '> [date]'¶
-
GREATER_OR_EQUAL_DATE
= '>= [date]'¶
-
LESS_DATE
= '< [date]'¶
-
LESS_OR_EQUAL_DATE
= '<= [date]'¶
-
BETWEEN_DATE
= '>< [date]'¶
-
EQUALS_COL
= '== [column]'¶
-
NOT_EQUALS_COL
= '!= [column]'¶
-
GREATER_COL
= '> [column]'¶
-
LESS_COL
= '< [column]'¶
-
GREATER_OR_EQUAL_COL
= '>= [column]'¶
-
LESS_OR_EQUAL_COL
= '<= [column]'¶
-
CONTAINS_STRING
= 'contains'¶
-
REGEX
= 'regex'¶
-