Utilities

These classes are utilities used in several parts of the API.

class dataikuapi.dss.utils.DSSDatasetSelectionBuilder

Builder for a “dataset selection”. In DSS, a dataset selection is used to select a part of a dataset for processing.

Depending on where it is used, a selection can include:

* Sampling
* Filtering by partitions (for partitioned datasets)
* Filtering by an expression
* Selection of columns
* Ordering

Please see the sampling documentation of DSS for a detailed explanation of the sampling methods.

build()

Returns the built selection dict

with_head_sampling(limit)

Sets the sampling to ‘first records’ mode, keeping at most limit records

with_all_data_sampling()

Sets the sampling to ‘no sampling, all data’ mode

with_random_fixed_nb_sampling(nb)

Sets the sampling to ‘Random sampling, fixed number of records’ mode, targeting nb records
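The difference between ‘first records’ and ‘random sampling, fixed number of records’ can be illustrated locally on a Python list. This is a conceptual sketch only: the helper names head_sample and random_fixed_nb_sample are hypothetical, and DSS does not necessarily implement its sampling this way.

```python
import random


def head_sample(records, limit):
    # 'First records': the first `limit` records, in storage order.
    return records[:limit]


def random_fixed_nb_sample(records, nb, seed=None):
    # 'Random, fixed number of records': nb distinct records drawn
    # uniformly without replacement (capped at the dataset size).
    rng = random.Random(seed)
    return rng.sample(records, min(nb, len(records)))


records = list(range(100))
head = head_sample(records, 5)                      # [0, 1, 2, 3, 4]
rand = random_fixed_nb_sample(records, 5, seed=42)  # 5 records, spread across the data
print(head, rand)
```

Head sampling is cheap but biased toward whatever sits at the start of the dataset; random sampling costs a pass over the data but gives a more representative subset.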

with_selected_partitions(ids)

Sets partition filtering on the given partition identifiers. The dataset to select must be partitioned.
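The builder is meant to be configured step by step and then turned into a dict with build(). The sketch below uses a hypothetical stand-in class, SelectionBuilderSketch, so it runs without dataikuapi installed. It assumes that each with_* method returns the builder (allowing chaining); the dict keys and mode strings ("samplingMethod", "HEAD_SEQUENTIAL", and so on) are also assumptions and may not match what the real DSSDatasetSelectionBuilder produces.

```python
class SelectionBuilderSketch:
    """Hypothetical stand-in mirroring the documented method names.

    Each with_* call updates the selection dict and returns self.
    Keys and mode strings are illustrative assumptions.
    """

    def __init__(self):
        # Assumed defaults: no sampling, all partitions.
        self.selection = {
            "samplingMethod": "FULL",
            "partitionSelectionMethod": "ALL",
            "selectedPartitions": [],
        }

    def with_head_sampling(self, limit):
        # 'First records' mode: keep at most `limit` records.
        self.selection["samplingMethod"] = "HEAD_SEQUENTIAL"
        self.selection["maxRecords"] = limit
        return self

    def with_all_data_sampling(self):
        # 'No sampling, all data' mode.
        self.selection["samplingMethod"] = "FULL"
        return self

    def with_random_fixed_nb_sampling(self, nb):
        # 'Random sampling, fixed number of records' mode.
        self.selection["samplingMethod"] = "RANDOM_FIXED_NB"
        self.selection["maxRecords"] = nb
        return self

    def with_selected_partitions(self, ids):
        # Only meaningful when the dataset is partitioned.
        self.selection["partitionSelectionMethod"] = "SELECTED"
        self.selection["selectedPartitions"] = list(ids)
        return self

    def build(self):
        return self.selection


selection = (
    SelectionBuilderSketch()
    .with_random_fixed_nb_sampling(1000)
    .with_selected_partitions(["2023-01-01", "2023-01-02"])
    .build()
)
print(selection)
```

With dataikuapi installed, the same chain would presumably start from DSSDatasetSelectionBuilder() instead; inspect the dict returned by build() to confirm the keys your DSS version expects.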

class dataikuapi.dss.utils.DSSFilterBuilder

Builder for a “filter”. In DSS, a filter is used to define a subset of rows for processing.

build()

Returns the built filter dict

with_distinct()

Sets the filter to deduplicate rows, keeping only distinct rows

with_formula(expression)

Sets the filter to keep only the rows matching the given formula expression
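A filter is built the same way: enable a formula and/or deduplication, then call build(). The sketch below uses a hypothetical stand-in, FilterBuilderSketch, so it runs without dataikuapi. The dict keys ("enabled", "distinct", "expression") and the assumption that each method returns the builder are illustrative; the expression string is only an example and is passed through unchanged, so check the DSS formula syntax before using a real one.

```python
class FilterBuilderSketch:
    """Hypothetical stand-in mirroring the documented DSSFilterBuilder methods.

    Keys are illustrative assumptions, not the library's guaranteed output.
    """

    def __init__(self):
        # Assumed defaults: no formula filtering, no deduplication.
        self.filter = {"enabled": False, "distinct": False}

    def with_distinct(self):
        # Deduplicate rows.
        self.filter["distinct"] = True
        return self

    def with_formula(self, expression):
        # Keep only rows for which the formula expression is true.
        self.filter["enabled"] = True
        self.filter["expression"] = expression
        return self

    def build(self):
        return self.filter


row_filter = FilterBuilderSketch().with_formula("age > 30").with_distinct().build()
print(row_filter)
```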