Metrics, checks and Data Quality
Metrics allow you to automate computation of various measurements on flow items (datasets, managed folders, saved models and model evaluation stores). You can use Checks to assert whether metric values meet certain conditions.
Data Quality rules are an improvement over the check mechanism for datasets. They allow you to define expectations on a dataset’s contents in a single step and also provide different views to monitor and analyze data quality issues across datasets, projects, and the full Dataiku instance.
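The relationship between a metric (a computed measurement) and a check or rule (a condition asserted on that measurement) can be sketched in plain Python. This is only a conceptual illustration and does not use the Dataiku API: the function names `compute_metrics` and `check_in_range`, the sample rows, the hard and soft bound parameters, and the outcome labels are all assumptions chosen for this example.

```python
from statistics import mean


def compute_metrics(rows, column):
    """Compute a few simple measurements on one column of a list of dict rows."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return {
        "record_count": len(rows),
        "empty_count": len(rows) - len(values),
        "min": min(values) if values else None,
        "avg": mean(values) if values else None,
        "max": max(values) if values else None,
    }


def check_in_range(value, minimum=None, maximum=None,
                   soft_minimum=None, soft_maximum=None):
    """Assert that a metric value lies within hard and soft bounds.

    Outcome labels are hypothetical: hard bounds give "ERROR",
    soft bounds give "WARNING", a missing value gives "EMPTY".
    """
    if value is None:
        return "EMPTY"
    if (minimum is not None and value < minimum) or \
            (maximum is not None and value > maximum):
        return "ERROR"
    if (soft_minimum is not None and value < soft_minimum) or \
            (soft_maximum is not None and value > soft_maximum):
        return "WARNING"
    return "OK"


# Hypothetical sample data: four records, one with an empty price.
rows = [{"price": 12.0}, {"price": 7.5}, {"price": None}, {"price": 30.0}]

metrics = compute_metrics(rows, "price")

# Each entry pairs one metric with one condition on its value.
outcomes = {
    "min": check_in_range(metrics["min"], minimum=0),
    "avg": check_in_range(metrics["avg"], soft_maximum=15, maximum=100),
    "record_count": check_in_range(metrics["record_count"], minimum=1),
}

for name, outcome in outcomes.items():
    print(f"{name}: value={metrics[name]} outcome={outcome}")
```

In this sketch the measurement step and the assertion step are separate, which mirrors the metric-then-check workflow; a Data Quality rule such as "Column avg in range" conceptually combines both into one definition.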
- Metrics
- Checks
- Data Quality Rules
- Data Quality rule types
  - Column min in range
  - Column min is within its typical range
  - Column avg in range
  - Column avg is within its typical range
  - Column max in range
  - Column max is within its typical range
  - Column sum in range
  - Column sum is within its typical range
  - Column median in range
  - Column median is within its typical range
  - Column std dev in range
  - Column std dev is within its typical range
  - Column values are not empty
  - Column values are empty
  - Column empty value is within its typical range
  - Column values are unique
  - Column unique value is within its typical range
  - Column values in set
  - Column top N values in set
  - Column most frequent value in set
  - Column values are valid according to meaning
  - Metric value in range
  - Metric value in set
  - Metric value is within its typical range
  - File size in range
  - Record count in range
  - Record count is within its typical range
  - Column count in range
  - Column count is within its typical range
  - Python code
  - Compare values of two metrics
  - Plugin rules
- Rule configuration
- Data Quality monitoring views
- Other data quality views
- Data Quality on partitioned datasets
- Retro-compatibility with Checks
- Data Quality rule types
- Custom probes and checks
- Data Quality Templates