Shell recipes

To automate certain operations, DSS provides a “Shell” recipe, which executes a script in a shell.

Parameters to the script

Parameters cannot be passed to the script on the command line. Instead, DSS sets a handful of environment variables before running the script:

  • The usual flow variables: input and output partitioning information (see “Partitioning variables substitutions”).
  • For each input and output dataset: its identifier and, when relevant, its filesystem path or JDBC URL. Variables named “DKU_INPUT...” and “DKU_OUTPUT...” correspond to the inputs and outputs respectively. The zero-based index of the input or output in the recipe's list of inputs or outputs is part of the variable name. For example, “DKU_INPUT_1_DATASET_ID” contains the identifier of the second input to the recipe.

The list of all variables given by DSS to the script is accessible in the “Variables” tab next to the script pane.
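
As a minimal sketch of how a script might inspect these variables, the snippet below lists everything whose name starts with “DKU_” and reads the identifier of the second input. The helper name list_dku_vars and the “<not set>” fallback are illustrative choices, not part of DSS, and only the variable name quoted above is taken from this page.

```shell
#!/bin/sh
# Print every environment variable whose name starts with DKU_, sorted by name.
# (list_dku_vars is a hypothetical helper name, not something DSS provides.)
list_dku_vars() {
    env | grep '^DKU_' | sort
}

# Identifier of the second input (index 1, zero-based).
# Falls back to a placeholder when the recipe has fewer than two inputs.
second_input="${DKU_INPUT_1_DATASET_ID:-<not set>}"

# Diagnostics go to standard error so that they stay out of standard output,
# which may be piped into an output dataset.
echo "Second input dataset: ${second_input}" >&2
list_dku_vars >&2
```

The “Variables” tab remains the authoritative list; a dump like this is mostly useful for checking what a given run actually received.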

Piping a dataset in and out

DSS allows one of the input datasets to be piped into the script via standard input. This dataset is selected in the dropdown above the code pane. The data is sent as tab-separated CSV.

Likewise, DSS allows the standard output of the script to be piped into one output dataset, again selected with the dropdown above the code pane. This can be used, for example, to report information from the script and store it in a DSS dataset. The script is expected to write tab-separated CSV. When “auto-infer schema” is checked, the schema of the piped-out dataset is overwritten with columns inferred from the first line of the script output.
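
The following sketch combines both directions: it reads tab-separated rows from standard input and writes a small tab-separated summary to standard output. It assumes an input dataset and an output dataset have both been selected in the dropdowns, and that a first output line of column names suits schema inference. The function name summarize_stdin and the column names “metric” and “value” are hypothetical. Whether DSS sends a header line on standard input is not stated here, so the count covers every line received.

```shell
#!/bin/sh
# Read tab-separated rows on standard input and write a two-column,
# tab-separated summary on standard output.
summarize_stdin() {
    awk -F '\t' '
        BEGIN { OFS = "\t"; max_cols = 0 }
        # Track the widest row seen so far.
        { if (NF > max_cols) max_cols = NF }
        END {
            # First output line: column names (an assumption about what
            # schema inference should pick up).
            print "metric", "value"
            print "lines", NR
            print "max_columns", max_cols
        }
    '
}

# Only read standard input when something is actually piped in,
# so the script does not wait on an interactive terminal.
if [ ! -t 0 ]; then
    summarize_stdin
fi
```

Outside DSS, the same function can be tried with something like printf 'a\tb\n' | summarize_stdin.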

Executed binary

By default, the script is run with the standard sh binary. A different binary can be set in the “Advanced” tab, or by using a shebang line at the top of the script.
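
As an illustration of the shebang approach, the sketch below requests bash and then checks which interpreter is actually running before enabling a bash-only option. It assumes bash is installed at /bin/bash on the machine that runs the recipe. The helper name shell_name is hypothetical.

```shell
#!/bin/bash
# Assumption: bash lives at /bin/bash on the machine running the recipe.

# Print "bash" when the script is interpreted by bash, "other" otherwise.
shell_name() {
    if [ -n "${BASH_VERSION:-}" ]; then
        echo "bash"
    else
        echo "other"
    fi
}

if [ "$(shell_name)" = "bash" ]; then
    # pipefail is a bash option: a pipeline fails if any command in it fails.
    set -o pipefail
fi

# Report the interpreter on standard error, away from any piped-out dataset.
echo "interpreter: $(shell_name)" >&2
```

Checking the interpreter at run time keeps the script usable even if the shebang is ignored and the default sh binary is used instead.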