Scenario steps

The following steps can be executed by a scenario.

Build / Train

This step builds elements from the Dataiku Flow:

  • Datasets (or dataset partitions in the case of partitioned datasets)
  • Managed folders
  • Saved models

Clear

This step clears the contents of elements from the Dataiku Flow:

  • Datasets (or dataset partitions in the case of partitioned datasets): the corresponding data is deleted
  • Managed folders: all the contents of the folder are deleted

Run checks

This step runs the checks defined on elements from the Dataiku Flow:

  • Datasets (or dataset partitions in the case of partitioned datasets)
  • Managed folders
  • Saved models

The checks are those defined on the Status tab of the element. The outcomes of the checks are collected and the step fails if at least one check on the selected elements fails.

The outcome of each check (OK, ERROR, or WARNING), along with the optional message defined by the check, is available to subsequent steps as variables (see Variables in scenarios).
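The failure rule described above can be sketched in plain Python. This is only an illustration: run_checks_step_outcome is a hypothetical helper, not part of the Dataiku API, and it assumes that only an ERROR outcome counts as a failed check (a WARNING does not fail the step in this sketch).

```python
# Hypothetical helper, not part of the Dataiku API.
# Assumption: only an ERROR outcome counts as a failed check.
def run_checks_step_outcome(check_results):
    """Return 'FAILED' if any check outcome is ERROR, else 'SUCCESS'.

    check_results is a list of (outcome, message) tuples, where outcome
    is 'OK', 'WARNING' or 'ERROR' and message is an optional string.
    """
    for outcome, _message in check_results:
        if outcome == "ERROR":
            return "FAILED"
    return "SUCCESS"


# Illustrative outcomes collected from the selected elements.
results = [
    ("OK", None),
    ("WARNING", "row count is low"),
    ("ERROR", "column is empty"),
]
print(run_checks_step_outcome(results))
```

A single ERROR among the collected outcomes is enough to fail the step, whatever the other checks return.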

Compute metrics

This step runs the probes defined on elements from the Dataiku Flow:

  • Datasets (or dataset partitions in the case of partitioned datasets)
  • Managed folders

The probes are those defined on the Status tab of the element.

The values of the metrics are available to subsequent steps as variables (see Variables in scenarios).

Synchronize Hive table

DSS can associate HDFS datasets with external Hive tables. The synchronization happens automatically in jobs launched in DSS, can be triggered from the dataset’s advanced settings pane, and can be scheduled in a scenario using a Synchronize Hive table step. The step regenerates the schema of the Hive table so that it matches the schema of the dataset.

Refresh notebook insight

Python notebooks can be made into insights by publishing them. The insight is then disconnected from the notebook. The Refresh notebook insight step re-publishes a Python notebook to the insight it was previously published to, and optionally runs the notebook before publishing it.

Checking Execute the notebook in this step therefore refreshes the data used by the published insight.

Execute SQL

This step executes one or more SQL statements on a DSS connection. Both plain SQL connections (e.g. a PostgreSQL connection) and HiveQL connections (Hive and Impala) can be used.

The output of the query, if there is one, is available to subsequent steps as variables (see Variables in scenarios).
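As a rough illustration of "several statements, with the query output kept for later steps", the sketch below uses Python's built-in sqlite3 module as a stand-in for a DSS connection. The table, the statements, and the shape of step_output are all hypothetical. The sketch does not reproduce how DSS actually exposes the output as variables.

```python
import sqlite3

# Stand-in for the statements configured in the step (hypothetical).
statements = [
    "CREATE TABLE orders (id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (1, 10.0), (2, 32.5)",
    "SELECT COUNT(*) AS n_orders, SUM(amount) AS total FROM orders",
]

# In-memory SQLite database used in place of a DSS connection.
conn = sqlite3.connect(":memory:")
step_output = None
for sql in statements:
    cursor = conn.execute(sql)
    # cursor.description is None for statements that return no rows.
    if cursor.description is not None:
        columns = [col[0] for col in cursor.description]
        step_output = {"columns": columns, "rows": cursor.fetchall()}
conn.close()

print(step_output)
```

Only the statement that returns rows produces an output. The CREATE and INSERT statements run for their side effects.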

Execute Python code

This step runs a chunk of Python code in the context of the scenario. The Dataiku API available in Python notebooks is available in this step, as well as the scenario-specific API.

To get the parameters of the trigger that started the scenario, for example, one can use:

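import dataiku.scenario
# Handle on the scenario run that is currently executing this step
s = dataiku.scenario.Scenario()
# Parameters of the trigger that started the scenario
params = s.get_trigger_params()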

Define variables, Set project variables, Set global variables

In DSS, there are instance-level variables, project-level variables, and scenario-level variables. Scenario-level variables are only visible for the duration of the scenario run, and are defined either programmatically (in Python) or using a Define variables step. All three types of variables are available to the recipes of jobs started in a scenario run.

The variables can be specified either as a JSON object, in which case the variable values are fixed, or as a list of key-value pairs, where the values are DSS formulas and can depend on pre-existing variables.
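The three levels of variables and the JSON form can be pictured with a small plain-Python sketch. The variable names and values are invented for illustration. The lookup order used here (scenario-level first, then project-level, then instance-level) is an assumption of this sketch and is not stated in the text above. DSS formulas are not evaluated.

```python
import json
from collections import ChainMap

# Hypothetical instance-level and project-level variables.
instance_vars = {"env": "prod", "region": "eu"}
project_vars = {"region": "us", "threshold": 10}

# Scenario-level variables given as a JSON object: the values are fixed.
scenario_vars = json.loads('{"threshold": 25, "run_label": "nightly"}')

# Assumption: the most specific level wins when a name is defined twice.
visible = ChainMap(scenario_vars, project_vars, instance_vars)

for name in ("env", "region", "threshold", "run_label"):
    print(name, "=", visible[name])
```

With this lookup order, a recipe started by the scenario would see the scenario-level threshold and the project-level region, and would fall back to the instance level for env.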

Run global variables update

This step runs the update of the DSS instance’s variables as defined in the administration section, or a given update code if specified in the step.

Send message

This step sends a message, as the reporters do at the end of a scenario run. The setup is the same. The only difference is that the scenario is not yet finished when the step runs, so not all variables created during the run are available.

Run another scenario

This step starts a run of a scenario and waits for its completion. The step’s outcome is the outcome of that scenario run.

Only scenarios of projects the user has access to can be used. The scenario started by this step runs as the user running this step, regardless of that scenario’s Run as user setting.

Package API service

This step generates a package of the specified API service. The package identifier must be unique. It can be specified using an expression with variables, and can be automatically padded with a number to ensure uniqueness.
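The idea of padding an identifier with a number until it is unique can be sketched as follows. The function unique_package_id and the "-N" suffix format are hypothetical. The exact format DSS uses is not specified here.

```python
# Hypothetical helper, not part of the Dataiku API.
# Assumption: the padding is a "-N" suffix with N starting at 1.
def unique_package_id(base_id, existing_ids):
    """Return base_id, or base_id with the first free numeric suffix."""
    if base_id not in existing_ids:
        return base_id
    suffix = 1
    while f"{base_id}-{suffix}" in existing_ids:
        suffix += 1
    return f"{base_id}-{suffix}"


# Illustrative identifiers of packages that already exist.
existing = {"v1", "v1-1"}
print(unique_package_id("v1", existing))
print(unique_package_id("v2", existing))
```

An identifier that is already free is returned unchanged, so the padding only applies when there is a collision.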