Scenarios (in a scenario)

This is the documentation of the API for use in scenarios.

Warning

This API can only be used within a scenario in order to run steps and report on progress of the current scenario.

If you want to control scenarios from outside a scenario run, please see Scenarios

These functions can be used both in “Execute Python code” steps of step-based scenarios, and in full Python scenarios

A quick description of Python scenarios can be found in Definitions. More details and usage samples are also available in Custom scenarios.

The Scenario is the main class you’ll use to interact with DSS in your “Execute Python code” steps and Python scenarios.

class dataiku.scenario.Scenario

Handle to the current (running) scenario.

add_report_item(object_ref, partition, report_item)

When used in the code of a custom step, adds a report item to the current step run

get_message_sender(channel_id, type=None)

Gets a sender for reporting messages, using one of DSS’s Messaging channels

get_build_state()

Gets a handle to query previous builds

get_trigger_type()

Returns the type of the trigger that launched this scenario run

get_trigger_name()

Returns the name (if defined) of the trigger that launched this scenario run

get_trigger_params()

Returns a dictionary of the params set by the trigger that launched this scenario run

set_scenario_variables(**kwargs)

Defines additional variables in this scenario run
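As a sketch of how the trigger getters and set_scenario_variables can combine, the helper below exposes trigger information to later steps. The helper name and variable names are illustrative, not part of the API; inside DSS, the `scenario` argument would be a dataiku.scenario.Scenario() instance:

```python
def tag_run_with_trigger(scenario):
    """Expose trigger information as scenario variables for later steps.

    `scenario` is any object behaving like dataiku.scenario.Scenario();
    this helper is illustrative, not part of the DSS API.
    """
    info = {
        'triggerType': scenario.get_trigger_type(),
        'triggerName': scenario.get_trigger_name(),
    }
    # Merge in any params set by the trigger (may be empty)
    info.update(scenario.get_trigger_params() or {})
    # Make everything available to subsequent steps as scenario variables
    scenario.set_scenario_variables(**info)
    return info
```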

get_previous_steps_outputs()

Returns the results of the steps previously executed in this scenario run. For example, if a SQL step named 'the_sql' ran earlier in the scenario, the list returned by this function will contain an entry like:

[
    ...
    {
        'stepName': 'the_sql',
        'result': {
            'success': True,
            'hasResultset': True,
            'columns': [ {'type': 'int8', 'name': 'a'}, {'type': 'varchar', 'name': 'b'} ],
            'totalRows': 2,
            'rows': [
                        ['1000', 'min'],
                        ['2500', 'max']
                    ],
            'log': '',
            'endedOn': 0,
            'totalRowsClipped': False
        }
    },
    ...
]

Important note: the exact structure of each type of step run output is not precisely defined, and may vary from one DSS release to another
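A later step can scan this list for a particular step's result. A minimal pure-Python sketch (the helper name is illustrative, not part of the API), using data shaped like the documented sample above:

```python
def find_step_output(outputs, step_name):
    """Return the 'result' dict of the first entry named step_name, or None.

    outputs: the list returned by Scenario().get_previous_steps_outputs().
    """
    for entry in outputs:
        if entry.get('stepName') == step_name:
            return entry.get('result')
    return None

# Shaped like the documented sample above
outputs = [
    {'stepName': 'the_sql',
     'result': {'success': True, 'rows': [['1000', 'min'], ['2500', 'max']]}},
]
result = find_step_output(outputs, 'the_sql')
```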

get_all_variables()

Returns a dictionary of all variables (including the scenario-specific values)

run_step(step, async=False, fail_fatal=True)

Run a step in this scenario.

Parameters:
  • step (BuildFlowItemsStepDefHelper) – Must be a step definition returned by dataiku.scenario.BuildFlowItemsStepDefHelper.get_step(). (See code sample below)
  • async (bool) – If True, the function launches a step run and returns immediately a dataiku.scenario.step.StepHandle object, on which the user will need to call dataiku.scenario.step.StepHandle.is_done() or dataiku.scenario.step.StepHandle.wait_for_completion(). Otherwise the function waits until the step has finished running and returns the result of the step.
  • fail_fatal (bool) – If True, raises an Exception if the step fails.

Code sample:

# Code sample to build several datasets in a scenario step
from dataiku.scenario import Scenario
from dataiku.scenario import BuildFlowItemsStepDefHelper

# The Scenario object is the main handle from which you initiate steps
scenario = Scenario()

# Create a 'Build Flow Items' step.
step = BuildFlowItemsStepDefHelper("build_datasets_step")

# Add each dataset / folder / model to build
step.add_dataset("dataset_name_1", "project_key")
step.add_dataset("dataset_name_2", "project_key")
step.add_dataset("dataset_name_3", "project_key")

# Run the scenario step. The dependencies engine will parallelize what can be parallelized.
scenario.run_step(step.get_step())

new_build_flowitems_step(step_name=None, build_mode='RECURSIVE_BUILD')

Creates and returns a helper to prepare a multi-item “build” step.

Returns: a BuildFlowItemsStepDefHelper object

build_dataset(dataset_name, project_key=None, build_mode='RECURSIVE_BUILD', partitions=None, step_name=None, async=False, fail_fatal=True)

Executes the build of a dataset

Parameters:
  • dataset_name – name of the dataset to build
  • project_key – optional, project key of the project in which the dataset is built
  • build_mode – one of “RECURSIVE_BUILD” (default), “NON_RECURSIVE_FORCED_BUILD”, “RECURSIVE_FORCED_BUILD”, “RECURSIVE_MISSING_ONLY_BUILD”
  • partitions – can be given as a partitions spec, variables expansion is supported
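For instance, several datasets can be built in sequence while collecting results instead of aborting on the first failure (fail_fatal=False). A sketch under stated assumptions: the helper and dataset names are illustrative, and inside DSS the `scenario` argument would be a dataiku.scenario.Scenario() instance:

```python
def build_in_sequence(scenario, dataset_names, project_key=None):
    """Build datasets one after another with fail_fatal=False, so that one
    failed build does not abort the scenario; returns per-dataset results.

    `scenario` is any object behaving like dataiku.scenario.Scenario().
    """
    results = {}
    for name in dataset_names:
        results[name] = scenario.build_dataset(
            name,
            project_key=project_key,
            build_mode='RECURSIVE_BUILD',
            fail_fatal=False,
        )
    return results
```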

build_folder(folder_id, project_key=None, build_mode='RECURSIVE_BUILD', partitions=None, step_name=None, async=False, fail_fatal=True)

Executes the build of a folder

Parameters:
  • folder_id – the identifier of the folder (!= its name)
  • partitions – Can be given as a partitions spec. Variables expansion is supported

train_model(model_id, project_key=None, build_mode='RECURSIVE_BUILD', step_name=None, async=False, fail_fatal=True)

Executes the training of a saved model

Parameters: model_id – the identifier of the model (!= its name)

invalidate_dataset_cache(dataset_name, project_key=None, step_name=None, async=False, fail_fatal=True)

Invalidate the caches of a dataset

clear_dataset(dataset_name, project_key=None, partitions=None, step_name=None, async=False, fail_fatal=True)

Executes a ‘clear’ operation on a dataset

Parameters: partitions – Can be given as a partitions spec. Variables expansion is supported

clear_folder(folder_id, project_key=None, step_name=None, async=False, fail_fatal=True)

Executes a ‘clear’ operation on a managed folder

run_dataset_checks(dataset_name, project_key=None, partitions=None, step_name=None, async=False, fail_fatal=True)

Runs the checks defined on a dataset

Parameters: partitions – Can be given as a partitions spec. Variables expansion is supported

compute_dataset_metrics(dataset_name, project_key=None, partitions=None, step_name=None, async=False, fail_fatal=True)

Computes the metrics defined on a dataset

Parameters: partitions – Can be given as a partitions spec. Variables expansion is supported
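A common pattern is to compute a dataset's metrics and then run its checks on the same partitions. A minimal sketch, assuming it runs inside DSS where the `scenario` argument would be a dataiku.scenario.Scenario() instance; the helper name is illustrative, not part of the API:

```python
def refresh_and_check(scenario, dataset_name, partitions=None):
    """Compute a dataset's metrics, then run its checks on the same partitions.

    `scenario` is any object behaving like dataiku.scenario.Scenario().
    Returns the two step results as a tuple.
    """
    metrics = scenario.compute_dataset_metrics(dataset_name, partitions=partitions)
    checks = scenario.run_dataset_checks(dataset_name, partitions=partitions)
    return metrics, checks
```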

synchronize_hive_metastore(dataset_name, project_key=None, step_name=None, async=False, fail_fatal=True)

Synchronizes the Hive metastore from the dataset definition for a single dataset (all partitions).

update_from_hive_metastore(dataset_name, project_key=None, step_name=None, async=False, fail_fatal=True)

Updates a single dataset definition (all partitions) from its table in the Hive metastore.

execute_sql(connection, sql, step_name=None, async=False, fail_fatal=True)

Executes a SQL query

Parameters:
  • connection – name of the DSS connection to run the query on
  • sql – the query to run
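When run synchronously, the result of a SQL step follows the structure shown for get_previous_steps_outputs() above (column descriptors plus rows of strings). A small pure-Python sketch for turning such a result into dictionaries keyed by column name (the helper name is illustrative, not part of the API):

```python
def resultset_to_dicts(result):
    """Convert a SQL step result (structured as documented above) into a
    list of dicts keyed by column name. Values stay as the strings DSS returns.
    """
    names = [col['name'] for col in result['columns']]
    return [dict(zip(names, row)) for row in result['rows']]

# Shaped like the documented sample result
result = {
    'columns': [{'type': 'int8', 'name': 'a'}, {'type': 'varchar', 'name': 'b'}],
    'rows': [['1000', 'min'], ['2500', 'max']],
}
records = resultset_to_dicts(result)
```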

set_project_variables(project_key=None, step_name=None, async=False, fail_fatal=True, **kwargs)

Sets variables on the project. The variables are passed as named parameters to this function. For example:

s.set_project_variables('PROJ', var1='value1', var2=True)

will add 2 variables var1 and var2 in the project’s variables, with values ‘value1’ and True respectively

set_global_variables(step_name=None, async=False, fail_fatal=True, **kwargs)

Sets variables on the DSS instance. The variables are passed as named parameters to this function. For example:

s.set_global_variables(var1='value1', var2=True)

will add 2 variables var1 and var2 in the instance’s variables, with values ‘value1’ and True respectively

run_global_variables_update(update_code=None, step_name=None, async=False, fail_fatal=True)

Runs the code that updates the DSS instance’s variables, as defined in the global settings.

Parameters: update_code – custom code to run instead of the one defined in the global settings

run_scenario(scenario_id, project_key=None, name=None, async=False, fail_fatal=True)

Runs a scenario

Parameters:
  • scenario_id – identifier of the scenario (can be different from its name)
  • project_key – optional project key of the project where the scenario is defined (defaults to current project)
  • name – optional name of the step
  • async (bool) – If True, immediately returns a future; otherwise waits until the scenario run finishes and returns its result. See dataiku.scenario.Scenario.run_step() for details.
  • fail_fatal (bool) – If True, raises an Exception if the scenario run fails. See dataiku.scenario.Scenario.run_step() for details.

Code sample:

# Code sample to run another scenario and check its outcome, without
# failing the current scenario if the other one fails
from dataiku.scenario import Scenario

scenario = Scenario()

result = scenario.run_scenario("ANOTHER_SCENARIO", async=False, fail_fatal=False)
print(result.get_outcome())

create_jupyter_export(notebook_id, execute_notebook=False, name=None, async=False)

Creates a new export from a Jupyter notebook

Parameters:
  • notebook_id – identifier of the notebook
  • execute_notebook – should the notebook be executed prior to the export

package_api_service(service_id, package_id, transmogrify=False, name=None, async=False)

Make a package for an API service.

Parameters:
  • service_id – identifier of the API service
  • package_id – identifier for the created package
  • transmogrify – if True, make the package_id unique by appending a number (if not unique already)

class dataiku.scenario.BuildFlowItemsStepDefHelper(scenario, step_name=None, build_mode='RECURSIVE_BUILD')

Helper to build the definition of a ‘Build Flow Items’ step. Multiple items can be added

add_dataset(dataset_name, project_key=None, partitions=None)

Add a dataset to build

Parameters:
  • dataset_name – name of the dataset
  • partitions – partition spec

add_folder(folder_id, project_key=None, partitions=None)

Add a folder to build

Parameters: folder_id – identifier of a folder (!= its name)

add_model(model_id, project_key=None)

Add a saved model to build

Parameters: model_id – identifier of a saved model (!= its name)

get_step()

Get the step definition