Filtering and flagging rows¶
DSS provides 5 processors for filtering data. These processors can:
- Remove rows based on various conditions
- Clear the content of cells based on the same conditions
- Create “flag” columns indicating whether each row matches a condition
The configuration for most of these processors can be divided into two sections:
- Defining match conditions that will be evaluated on a row
- Defining the action to perform on the rows matching the condition:
- Remove matching rows
- Keep matching rows only
- Clear the content of a cell, only for matching rows
- Clear the content of a cell, only for non-matching rows
- Create an indicator (0 / 1) column indicating whether the row matches the condition
Some of these processors can check their condition on multiple columns:
- A single column
- An explicit list of columns
- All columns matching a given pattern
- All columns
For processors that support column selection, you can select whether the column will be considered as matching if:
- All columns are matching
- Or, at least one column is matching
By setting the match mode, you can specify how you want this processor to search:
- ‘Complete value’ : match where the searched value is the complete cell value
- ‘Substring’ : match when the cell contains the searched value
- ‘Regular expression’: match when the cell matches the searched pattern (note: the regular expression is not anchored)
By setting the normalization mode, you can specifiy how you want this processor to search
- Using a case-sensitive search (‘Exact’ mode)
- Using a case-insensitive search (‘Lowercase’ mode)
- Using an accents-insensitive search (‘Normalize’ mode)
‘Normalize’ mode performs an unicode normalization.
- The boundaries of the numerical range are inclusive.
- Both lower and upper boundaries are optional.
- If the column does not contain a valid numerical value for a row, this row is considered as being out of the range (and thus non-matching).
- The boundaries are inclusive.
- Both lower and upper boundaries are optional
- If the column does not contain a valid date for a row, this row is considered as being out of the range.
- The provided time zone will be used to filter dates.
Note: this processor works on columns with “Date” meaning, i.e. parsed dates. For more information, please see Managing dates
The row matches if the result if the formula is considered as “truish”, which includes:
- A true boolean
- A number (integer or decimal) which is not 0
- The string “true”