Custom probes and checks

The predefined probes and checks handle simple cases; more complex computations can be done with custom probes and custom checks. These are Python functions, and they run with access to both the internal DSS Python API and the DSS public API Python client.

Custom probe

A custom Python probe is a function that takes the dataset or folder as a parameter and returns one or more values.

Return values

The function can return a single value, in which case the metric gets the generic name “value”:

def process(dataset):
    return dataset.get_dataframe().shape[1]
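Since the code runs inside DSS, a probe is not limited to the object it receives: it can also use the public API Python client mentioned in the introduction. A minimal sketch, assuming the public API is reachable from the probe's environment; the metric computed here (the number of datasets in the current project) is purely illustrative:

import dataiku

def process(dataset):
    # query the DSS public API from within the probe
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    return len(project.list_datasets())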

A custom probe can also compute several values in one pass, and return them as a dictionary of name to value:

def process(dataset):
    df = dataset.get_dataframe()
    return {'num_rows' : df.shape[0], 'num_cols' : df.shape[1]}

DSS automatically infers the type of the metric's value, among DOUBLE, BOOLEAN, BIGINT, STRING and ARRAY, but in some cases one wants to specify the type explicitly. For example, to get an ISO-formatted UTC timestamp recognized as a date, the metric value has to be passed as a (value, type) pair:

from datetime import datetime as dt
from dataiku.metric import MetricDataTypes

def process(dataset):
    # current UTC time as an ISO-formatted string
    now = dt.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
    # the same value is returned once as a plain string (inferred type)
    # and once explicitly typed as a date via a (value, type) pair
    return {'now_as_string' : now, 'now_as_date' : (now, MetricDataTypes.DATE)}

Partitioned datasets

For partitioned datasets, the custom probe's function receives a second parameter: the partition on which the computation is requested. When the computation is requested on the whole dataset rather than on a single partition, the value passed is “ALL”.
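For illustration, a minimal sketch of a partition-aware probe; the parameter name partition_id and the metric names are illustrative, not part of the DSS API:

def process(dataset, partition_id):
    # the second parameter is the partition being computed,
    # or "ALL" when the probe runs on the whole dataset
    return {'computed_on' : partition_id,
            'num_cols' : dataset.get_dataframe().shape[1]}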

Custom check

A custom check is a function that takes the dataset, folder or saved model as a parameter and returns a check outcome.

Note

It is advised to name all custom checks in order to distinguish the values they produce in the checks display, because custom checks can’t auto-generate a meaningful name.

If appropriate, a message can be returned as a second return value.

def process(last_values, dataset):
    if dataset.name == 'PROJ.a_dataset':
        return 'OK'
    else:
        # the outcome can be accompanied by an explanatory message
        return 'ERROR', 'not the expected dataset'

The last values of each metric for the dataset, folder or saved model are passed as the first parameter. This parameter is a dict mapping metric identifiers to metric data points.

def process(last_values):
    # last_values maps metric identifiers to their latest data points
    if int(last_values['basic:COUNT_FILES'].get_value()) > 10:
        return 'OK'
    else:
        return 'ERROR', 'not enough files'
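A check can also combine both parameters and degrade gracefully before failing. A sketch, assuming 'WARNING' is accepted as an outcome in addition to 'OK' and 'ERROR'; the metric identifier and thresholds are illustrative:

def process(last_values, dataset):
    count = int(last_values['basic:COUNT_FILES'].get_value())
    if count > 10:
        return 'OK'
    elif count > 0:
        return 'WARNING', 'only %d files' % count
    else:
        return 'ERROR', 'no files at all'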