Component: Macros¶
Description¶
A macro is a Dataiku component used to automate tasks or to extend the capabilities of Dataiku.
It can be used in several places in Dataiku DSS, depending on the role of the macro.
By default, macros are accessible from the “Macros” menu of each project.
In addition, a macro can be made accessible in other places, depending on the macroRoles field. For example, if macroRoles contains:
DATASETS: the macro will be available when selecting one or more Datasets in the Flow.
PROJECT_MACROS: the macro runs code on the project in order to achieve global processing on it (it can be used, for example, to automatically kill running processes such as notebooks).
PROJECT_CREATOR: allows administrators to create a project template with some default configurations.
A macro is not limited to only one kind of role, allowing it to appear in several places if it makes sense.
For more information about macros, see DSS Macros.
Creation¶
To start creating a macro, we recommend that you use the plugin developer tools (see the tutorial for an introduction). In the Definition tab, click on “Add Python macro” and enter the identifier for your new macro. You’ll see a new python-runnables folder, in which you will have to edit the runnable.py and runnable.json files.
A macro is essentially a Python function, wrapped in a class, written in a runnable.py file in the macro’s folder.
A basic macro’s code looks like:
from dataiku.runnables import Runnable

class MyMacro(Runnable):
    def __init__(self, project_key, config, plugin_config):
        self.project_key = project_key
        self.config = config
        self.plugin_config = plugin_config

    def get_progress_target(self):
        # Return None if the macro does not report progress
        return None

    def run(self, progress_callback):
        # Do some things. You can use the dataiku package here
        result = "It worked"
        return result
The associated runnable.json
file looks like:
{
"meta" : {
"label" : "A great macro",
"description" : "It does stuff",
"icon" : "icon-trash"
},
"impersonate" : false,
"permissions" : ["READ_CONF"],
"resultType" : "HTML",
"resultLabel" : "The output",
"params": [
{
"name": "param_name",
"label" : "The parameter",
"type": "INT",
"description":"Delete logs for jobs older than this",
"mandatory" : true,
"defaultValue" : 15
}
]
}
The “meta” and “params” fields are similar to those of all other kinds of DSS components.
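Once the plugin is installed, a macro can also be triggered programmatically through the DSS public API. The sketch below assumes the dataikuapi package is available; the host, API key, project key, runnable type identifier and parameter names are all placeholders to adapt to your own instance and plugin.

```python
def run_project_macro(host, api_key, project_key, runnable_type, params):
    """Trigger a macro on a project through the DSS public API.

    All arguments are placeholders to fill in for your own instance
    and plugin (this is a sketch, not part of the plugin itself).
    """
    # Imported lazily so this sketch can be read without DSS installed
    import dataikuapi

    client = dataikuapi.DSSClient(host, api_key)
    project = client.get_project(project_key)
    macro = project.get_macro(runnable_type)
    # run() waits for completion by default and returns the macro's result
    return macro.run(params=params)
```

This is convenient for driving macros from scenarios or external scripts instead of the “Macros” menu.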
Macro roles¶
Macro roles define where the macro will appear in the DSS GUI. They are used to pre-fill a macro parameter with context.
For example, if a macro has a role of type DATASET that points to an input_dataset parameter, the dataset’s action menu will show this macro, and clicking on it will prefill the input_dataset parameter with the selected dataset.
Each role consists of:
type: where the macro will be shown
when selecting DSS object(s): DATASET, DATASETS, API_SERVICE, API_SERVICE_VERSION, BUNDLE, VISUAL_ANALYSIS, SAVED_MODEL, MANAGED_FOLDER
in the project list: PROJECT_MACROS
targetParamsKey(s): name of the parameter(s) that will be filled with the selected object
applicableToForeign (boolean, default false): can this role be applied to foreign elements (such as foreign datasets, folders or models)?
For example, a runnable.json file with macro roles could look like this:
{
"meta" : {
"label" : "A great macro",
"description" : "It does stuff",
"icon" : "icon-trash"
},
"impersonate" : false,
"permissions" : ["READ_CONF"],
"resultType" : "HTML",
"resultLabel" : "The output",
"macroRoles": [
{
"type": "DATASET",
"targetParamsKey": "input_dataset",
"applicableToForeign": true
},
{
"type": "API_SERVICE_VERSION",
"targetParamsKeys": ["input_api_service", "input_api_service_version"]
}
],
"params": [
{
"name": "input_dataset",
"type": "DATASET",
"label": "Input dataset"
},
{
"name": "input_api_service",
"type": "API_SERVICE",
"label": "API Service"
},
{
"name": "input_api_service_version",
"type": "API_SERVICE_VERSION",
"apiServiceParamName": "input_api_service",
"label": "API Service version package",
"description": "retrieved from the API Service stated above"
}
]
}
Note
Only the API_SERVICE_VERSION type needs an array specified through targetParamsKeys, as it has to fill two related parameters: the API service and the API service package. All the other types only need to specify one targetParamsKey.
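To make the prefilling concrete, the following stand-alone sketch shows how a role-targeted parameter arrives in the macro's config dict (in a real plugin, the class would subclass dataiku.runnables.Runnable; the class and parameter names here are illustrative):

```python
class DatasetMacro(object):
    # In a real plugin this would subclass dataiku.runnables.Runnable
    def __init__(self, project_key, config, plugin_config):
        self.project_key = project_key
        self.config = config

    def run(self, progress_callback):
        # When launched from a dataset's action menu, DSS prefills
        # "input_dataset" with the name of the selected dataset
        dataset_name = self.config.get("input_dataset")
        return "Selected dataset: %s" % dataset_name
```

When DSS launches this macro from a dataset named "customers", config contains {"input_dataset": "customers"} and run() returns "Selected dataset: customers".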
Result of a macro¶
In addition to performing its action, a macro can return a result, which will be displayed to the user. In many cases, the main job of a macro is to output some kind of report; in that case, the result is actually the main output of the macro.
To return a result from your macro, you must first define the resultType field in the runnable.json file.
Valid result types are described below.
HTML¶
In runnable.json, set "resultType" : "HTML".
Your macro’s run function must return an HTML string, which will be displayed inline in the result page. Users will have the option to download the HTML. You may use CSS declarations in your HTML code, but make sure to properly scope them so that they cannot interfere with DSS.
URL¶
In runnable.json, set "resultType" : "URL".
Your macro’s run function must return a URL as a string. Users will be presented with a link.
RESULT_TABLE¶
In runnable.json, set "resultType" : "RESULT_TABLE".
This allows you to build a table view which will be properly formatted for display. We recommend that you use RESULT_TABLE rather than HTML if the output of your macro is a simple table, as you won’t have to handle styling and formatting.
In your macro’s run function, create and fill your result table as follows:
from dataiku.runnables import Runnable, ResultTable
rt = ResultTable()
# First, declare the columns of the output
# Parameters to add_column are: id, label, type
rt.add_column("dataset", "Dataset", "STRING")
rt.add_column("table", "Table", "STRING")
# Then, add records, as lists, in the same order as the columns
record = []
record.append("dataset_name")
record.append("table_name")
rt.add_record(record)
# Return the result table as the return value of the macro's run function
return rt
Valid types for columns are:
STRING: a regular string
STRING_LIST: add a list of strings to the record array; it will be formatted as a comma-separated list
Interacting with DSS in macros¶
The recommended way to interact with DSS in the code of a macro is through either the internal Python API or the public API.
The internal API includes, for example, dataiku.Dataset(). Interaction with the public API is made easy by dataiku.api_client(), which gives you a public API client handle, automatically configured with the permissions of the user running the macro.
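For instance, a macro could use the public API client to enumerate the datasets of its project. This is a sketch: the function name is illustrative, and the body assumes it runs inside DSS where dataiku.api_client() is available.

```python
def list_project_datasets(project_key):
    # Sketch of using the public API from inside a macro's run().
    # Imported lazily so the sketch can be read outside DSS.
    import dataiku

    client = dataiku.api_client()
    project = client.get_project(project_key)
    # Each entry describes one dataset of the project
    return [d["name"] for d in project.list_datasets()]
```

Inside a macro, you would typically call this with self.project_key, which DSS passes to the macro's constructor.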
Progress reporting¶
You can monitor the progress of your macro during its execution by leveraging the progress_callback() function.
The first step is to make the get_progress_target() function return a (target, unit) tuple where:
target is the “final value” your progress bar will have to reach
unit defines the measure of scale the progress bar is assessing (for example, SIZE if you are uploading/downloading a file, FILES if you are processing a list of files, RECORDS if you are processing a list of records, or NONE if the unit is arbitrary, e.g. a percentage).
def get_progress_target(self):
    return (3, 'NONE')
The next step is to invoke progress_callback() within the run() function with the “current progress” value, to update the status of the progress bar whenever necessary.
def run(self, progress_callback):
    # Write code for part 1/3 of your macro here
    progress_callback(1)
    # Write code for part 2/3 of your macro here
    progress_callback(2)
    # Write code for part 3/3 of your macro here
    progress_callback(3)
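The same mechanism works with concrete units. The following stand-alone sketch (the class name is illustrative, and in a plugin it would subclass dataiku.runnables.Runnable) reports progress in FILES while iterating over a list of paths:

```python
class FileCleanupMacro(object):
    # Stand-alone sketch; in a plugin this would subclass
    # dataiku.runnables.Runnable
    def __init__(self, project_key, config, plugin_config):
        self.files = config.get("files", [])

    def get_progress_target(self):
        # One progress unit per file to process
        return (len(self.files), 'FILES')

    def run(self, progress_callback):
        for i, path in enumerate(self.files, start=1):
            # ... process `path` here ...
            progress_callback(i)
        return "Processed %d files" % len(self.files)
```

With two files configured, get_progress_target() returns (2, 'FILES') and run() advances the progress bar once per file.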
Security of macros¶
Impersonation¶
You need to configure whether the macro will run with the UNIX identity of the user running DSS, or with the identity of the end user. Note that this is only relevant for the users of your plugin if User Isolation Framework is enabled, but since your plugin might be used in both cases, you still need to take care of this.
Generally speaking, we recommend that your macros run with "impersonate" : true
. This means that they may not access the filesystem outside of their working directory and should only use DSS APIs for their operations.
If your macro runs with "impersonate": false
, it can access the filesystem, notably the DSS datadir.
Permissions¶
A macro always runs in the context of a project (which is passed to the macro’s constructor).
Your macro can define the project permissions that users must have to be able to run the macro. This is done in the permissions
field of the runnable.json
file.
Valid permission identifiers are:
ADMIN
READ_CONF
WRITE_CONF
For more information about the meaning of these permissions, see Main project permissions.
In addition, if users running the macro need global DSS administrator rights, set the "requiresGlobalAdmin" field of runnable.json to true.
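For example, a macro that modifies project settings and also requires a DSS administrator could declare permissions like the following (a sketch; the label, description and icon are illustrative):

```json
{
    "meta" : {
        "label" : "Reconfigure project",
        "description" : "Applies default settings to the project",
        "icon" : "icon-wrench"
    },
    "permissions" : ["WRITE_CONF"],
    "requiresGlobalAdmin" : true,
    "resultType" : "HTML",
    "params" : []
}
```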