Scenarios (in a scenario)¶
This is the documentation of the API for use in scenarios.
Warning
This API can only be used within a scenario in order to run steps and report on progress of the current scenario.
If you want to control scenarios, please see Scenarios.
These functions can be used both for “Execute Python code” steps in steps-based scenarios and for full Python scenarios.
A quick description of Python scenarios can be found in Definitions. More details and usage samples are also available in Custom scenarios.
The Scenario is the main class you’ll use to interact with DSS in your “Execute Python code” steps and Python scenarios.
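For example, a minimal “Execute Python code” step or Python scenario can look like the following sketch (the dataset name is a placeholder):

from dataiku.scenario import Scenario

# Get a handle on the currently running scenario
scenario = Scenario()

# Run a single step: build a (placeholder) dataset
scenario.build_dataset("my_dataset")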
class dataiku.scenario.Scenario¶
Handle to the current (running) scenario.
- add_report_item(object_ref, partition, report_item)¶
  When used in the code of a custom step, adds a report item to the current step run.
- get_message_sender(channel_id, type=None)¶
  Gets a sender for reporting messages, using one of DSS’s Messaging channels.
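A minimal sketch of using a sender; the channel id is a placeholder, and the keyword arguments accepted by the sender’s send() method depend on the channel type (subject/message below are assumptions for a mail-like channel):

from dataiku.scenario import Scenario

scenario = Scenario()

# "mail-alerts" is a placeholder for a Messaging channel configured in DSS
sender = scenario.get_message_sender("mail-alerts")

# Keyword arguments depend on the channel type (subject/message assumed here)
sender.send(subject="Scenario update", message="The nightly build has started")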
- get_build_state()¶
  Gets a handle to query previous builds.
- get_trigger_type()¶
  Returns the type of the trigger that launched this scenario run.
- get_trigger_name()¶
  Returns the name (if defined) of the trigger that launched this scenario run.
- get_trigger_params()¶
  Returns a dictionary of the params set by the trigger that launched this scenario run.
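A minimal sketch combining the three trigger accessors to log how the run was launched:

from dataiku.scenario import Scenario

scenario = Scenario()

trigger_type = scenario.get_trigger_type()
trigger_name = scenario.get_trigger_name()  # may be None if the trigger has no name
trigger_params = scenario.get_trigger_params()

print("Run launched by a %s trigger (name: %s, params: %s)" % (trigger_type, trigger_name, trigger_params))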
- set_scenario_variables(**kwargs)¶
  Defines additional variables in this scenario run.
- get_previous_steps_outputs()¶
  Returns the results of the steps previously executed in this scenario run. For example, if a SQL step named ‘the_sql’ ran earlier in the scenario, the list returned by this function will look like:

  [
    ...
    {
      'stepName': 'the_sql',
      'result': {
        'success': True,
        'hasResultset': True,
        'columns': [
          {'type': 'int8', 'name': 'a'},
          {'type': 'varchar', 'name': 'b'}
        ],
        'totalRows': 2,
        'rows': [
          ['1000', 'min'],
          ['2500', 'max']
        ],
        'log': '',
        'endedOn': 0,
        'totalRowsClipped': False
      }
    },
    ...
  ]

  Important note: the exact structure of each type of step run output is not precisely defined, and may vary from one DSS release to another.
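A minimal sketch that locates the output of the ‘the_sql’ step from the sample above (field names follow the sample output, which may vary across DSS releases):

from dataiku.scenario import Scenario

scenario = Scenario()

# Find the output of the step named 'the_sql' among the previous steps
outputs = scenario.get_previous_steps_outputs()
sql_output = next((o for o in outputs if o.get("stepName") == "the_sql"), None)

if sql_output is not None:
    result = sql_output["result"]
    print("Rows returned: %s" % result.get("totalRows"))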
- get_all_variables()¶
  Returns a dictionary of all variables (including the scenario-specific values).
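set_scenario_variables() and get_all_variables() are typically used together to pass values between steps of the same run. A minimal sketch (the variable names are placeholders):

from dataiku.scenario import Scenario

scenario = Scenario()

# In an early step: define variables for the rest of this scenario run
scenario.set_scenario_variables(batch_date="2021-06-01", row_limit=1000)

# In a later step: read them back, together with all other defined variables
variables = scenario.get_all_variables()
print(variables["batch_date"], variables["row_limit"])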
- run_step(step, asynchronous=False, fail_fatal=True, **kwargs)¶
  Runs a step in this scenario.

  Parameters:
  - step (BuildFlowItemsStepDefHelper) – must be a step definition returned by dataiku.scenario.BuildFlowItemsStepDefHelper.get_step() (see code sample below)
  - asynchronous (bool) – if True, the function launches the step run and immediately returns a dataiku.scenario.step.StepHandle object, on which the user will need to call dataiku.scenario.step.StepHandle.is_done() or dataiku.scenario.step.StepHandle.wait_for_completion(). Otherwise, the function waits until the step has finished running and returns the result of the step.
  - fail_fatal (bool) – if True, raises an exception if the step fails
Code sample:

# Code sample to build several datasets in a scenario step
from dataiku.scenario import Scenario
from dataiku.scenario import BuildFlowItemsStepDefHelper

# The Scenario object is the main handle from which you initiate steps
scenario = Scenario()

# Create a 'Build Flow Items' step
step = BuildFlowItemsStepDefHelper(scenario, "build_datasets_step")

# Add each dataset / folder / model to build
step.add_dataset("dataset_name_1", "project_key")
step.add_dataset("dataset_name_2", "project_key")
step.add_dataset("dataset_name_3", "project_key")

# Run the scenario step. The dependencies engine will parallelize what can be parallelized.
scenario.run_step(step.get_step())
- new_build_flowitems_step(step_name=None, build_mode='RECURSIVE_BUILD')¶
  Creates and returns a helper to prepare a multi-item “build” step.

  Returns: a BuildFlowItemsStepDefHelper object
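This is an alternative to instantiating BuildFlowItemsStepDefHelper directly, as in the code sample above. A minimal sketch (the step and dataset names are placeholders):

from dataiku.scenario import Scenario

scenario = Scenario()

# Get a helper already bound to this scenario
step = scenario.new_build_flowitems_step("build_datasets_step")
step.add_dataset("dataset_name_1")
step.add_dataset("dataset_name_2")

scenario.run_step(step.get_step())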
- build_dataset(dataset_name, project_key=None, build_mode='RECURSIVE_BUILD', partitions=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Executes the build of a dataset.

  Parameters:
  - dataset_name – name of the dataset to build
  - project_key – optional, project key of the project in which the dataset is built
  - build_mode – one of “RECURSIVE_BUILD” (default), “NON_RECURSIVE_FORCED_BUILD”, “RECURSIVE_FORCED_BUILD”, “RECURSIVE_MISSING_ONLY_BUILD”
  - partitions – can be given as a partitions spec; variables expansion is supported
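A minimal sketch of a forced rebuild and a partitioned build (the dataset names and partition value are placeholders):

from dataiku.scenario import Scenario

scenario = Scenario()

# Rebuild just this dataset, ignoring upstream dependencies
scenario.build_dataset("my_dataset", build_mode="NON_RECURSIVE_FORCED_BUILD")

# Build a single partition; a variable reference could be used instead,
# since variables expansion is supported in the partition spec
scenario.build_dataset("my_partitioned_dataset", partitions="2021-06-01")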
- build_folder(folder_id, project_key=None, build_mode='RECURSIVE_BUILD', partitions=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Executes the build of a folder.

  Parameters:
  - folder_id – the identifier of the folder (not its name)
  - partitions – can be given as a partitions spec; variables expansion is supported
- train_model(model_id, project_key=None, build_mode='RECURSIVE_BUILD', step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Executes the training of a saved model.

  Parameters:
  - model_id – the identifier of the model (not its name)
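A minimal sketch of the asynchronous form, which behaves as described for run_step() and applies to the other step methods on this page as well (the model id is a placeholder):

from dataiku.scenario import Scenario

scenario = Scenario()

# Launch the training without blocking; a StepHandle is returned
handle = scenario.train_model("8cqLsLXx", asynchronous=True)

# ... do other work here, then block until the training step is finished
handle.wait_for_completion()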
- build_evaluation_store(evaluation_store_id, project_key=None, build_mode='RECURSIVE_BUILD', step_name=None, asynchronous=False, fail_fatal=True)¶
  Executes the build of a model evaluation store, to produce a model evaluation.

  Parameters:
  - evaluation_store_id – the identifier of the model evaluation store (not its name)
- invalidate_dataset_cache(dataset_name, project_key=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Invalidates the caches of a dataset.
- clear_dataset(dataset_name, project_key=None, partitions=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Executes a ‘clear’ operation on a dataset.

  Parameters:
  - partitions – can be given as a partitions spec; variables expansion is supported
- clear_folder(folder_id, project_key=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Executes a ‘clear’ operation on a managed folder.
- run_dataset_checks(dataset_name, project_key=None, partitions=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Runs the checks defined on a dataset.

  Parameters:
  - partitions – can be given as a partitions spec; variables expansion is supported
- compute_dataset_metrics(dataset_name, project_key=None, partitions=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Computes the metrics defined on a dataset.

  Parameters:
  - partitions – can be given as a partitions spec; variables expansion is supported
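Since checks are often defined on metric values, a common pattern is to recompute metrics right before running the checks. A minimal sketch (the dataset name is a placeholder):

from dataiku.scenario import Scenario

scenario = Scenario()

# Refresh the metrics first, then run the checks defined on them
scenario.compute_dataset_metrics("my_dataset")
scenario.run_dataset_checks("my_dataset")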
- synchronize_hive_metastore(dataset_name, project_key=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Synchronizes the Hive metastore from the dataset definition for a single dataset (all partitions).
- update_from_hive_metastore(dataset_name, project_key=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Updates a single dataset definition (all partitions) from its table in the Hive metastore.
- execute_sql(connection, sql, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Executes a SQL query.

  Parameters:
  - connection – name of the DSS connection to run the query on
  - sql – the query to run
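A minimal sketch (the connection name and query are placeholders; the shape of the returned result follows the step output format shown under get_previous_steps_outputs(), which may vary across DSS releases):

from dataiku.scenario import Scenario

scenario = Scenario()

# Run a query on a (placeholder) DSS connection and inspect the step result
result = scenario.execute_sql("my_connection", "SELECT COUNT(*) AS n FROM my_table")
print(result)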
- set_project_variables(project_key=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Sets variables on the project. The variables are passed as named parameters to this function. For example:

  s.set_project_variables('PROJ', var1='value1', var2=True)

  will add two variables var1 and var2 to the project’s variables, with values 'value1' and True respectively.
- set_global_variables(step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Sets variables on the DSS instance. The variables are passed as named parameters to this function. For example:

  s.set_global_variables(var1='value1', var2=True)

  will add two variables var1 and var2 to the instance’s variables, with values 'value1' and True respectively.
- run_global_variables_update(update_code=None, step_name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Runs the code updating the DSS instance’s variables, as defined in the global settings.

  Parameters:
  - update_code – custom code to run instead of the one defined in the global settings
- run_scenario(scenario_id, project_key=None, name=None, asynchronous=False, fail_fatal=True, **kwargs)¶
  Runs a scenario.

  Parameters:
  - scenario_id – identifier of the scenario (can be different from its name)
  - project_key – optional project key of the project where the scenario is defined (defaults to the current project)
  - name – optional name of the step
  - asynchronous (bool) – if True, immediately returns a handle on the launched step instead of waiting for its result; see dataiku.scenario.Scenario.run_step() for details
  - fail_fatal (bool) – if True, raises an exception if the step fails; see dataiku.scenario.Scenario.run_step() for details
Code sample:

# Code sample to run another scenario without failing the current one
from dataiku.scenario import Scenario

scenario = Scenario()

# fail_fatal=False: a failure of ANOTHER_SCENARIO does not fail this scenario
result = scenario.run_scenario("ANOTHER_SCENARIO", asynchronous=False, fail_fatal=False)
print(result.get_outcome())
- create_jupyter_export(notebook_id, execute_notebook=False, name=None, asynchronous=False, **kwargs)¶
  Creates a new export from a Jupyter notebook.

  Parameters:
  - notebook_id – identifier of the notebook
  - execute_notebook – whether the notebook should be executed prior to the export
- package_api_service(service_id, package_id, transmogrify=False, name=None, asynchronous=False, **kwargs)¶
  Makes a package for an API service.

  Parameters:
  - service_id – identifier of the API service
  - package_id – identifier for the created package
  - transmogrify – if True, make the package_id unique by appending a number (if not already unique)
class dataiku.scenario.BuildFlowItemsStepDefHelper(scenario, step_name=None, build_mode='RECURSIVE_BUILD')¶
Helper to build the definition of a ‘Build Flow Items’ step. Multiple items can be added.
- add_dataset(dataset_name, project_key=None, partitions=None)¶
  Adds a dataset to build.

  Parameters:
  - dataset_name – name of the dataset
  - partitions – partition spec
- add_folder(folder_id, project_key=None, partitions=None)¶
  Adds a folder to build.

  Parameters:
  - folder_id – identifier of the folder (not its name)
- add_model(model_id, project_key=None)¶
  Adds a saved model to build.

  Parameters:
  - model_id – identifier of the saved model (not its name)
- add_evaluation_store(evaluation_store_id, project_key=None)¶
  Adds a model evaluation store to build.

  Parameters:
  - evaluation_store_id – identifier of the model evaluation store (not its name)
- get_step()¶
  Gets the step definition.