Interval extraction

It is sometimes useful to identify periods when time series values are within a given range. For example, a sensor reporting time series measurements may record values that fall outside an acceptable range, making it necessary to extract only the segments of the data in which the values are acceptable.

Interval extraction recipe

The interval extraction recipe allows you to find segments of the time series where values of a column are inside an interval, while allowing small deviations. See Algorithms for more information.

This recipe automatically works on all numerical columns (int or float) in your time series data.

Input Data

Data that consists of equispaced n-dimensional time series in wide or long format.

If the input data is in the long format, the recipe extracts intervals separately for each time series, as identified by the identifier column. See Algorithms for more information.

Parameters

Timestamp column

Name of the column that contains the timestamps. Note that the timestamp column must have the date type as its meaning (detected by DSS), and duplicate timestamps cannot exist for a given time series.

Apply threshold to column

Name of the column to which the recipe applies the threshold parameters.

Minimal valid value

Minimum acceptable value in the time series interval, specified as a numerical value (int or float). The minimal valid value and the maximum valid value form the range of acceptable values.

Maximum valid value

Maximum acceptable value in the time series interval, specified as a numerical value (int or float). The maximum valid value and the minimal valid value form the range of acceptable values.

Unit

Unit of the acceptable deviation and the minimal segment duration, specified as one of these values:

  • Days
  • Hours
  • Minutes
  • Seconds
  • Milliseconds
  • Microseconds
  • Nanoseconds

Acceptable deviation

Maximum time duration during which values within a valid time segment can deviate from the range of acceptable values.

For example, if you specify 400 - 600 as a range of acceptable values, and an acceptable deviation of 30 seconds, then the recipe can return a valid time segment that includes values outside the specified range, provided that those values last for a time duration that is less than 30 seconds.
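The effect of the acceptable deviation can be illustrated with a minimal Python sketch. This is not the recipe's implementation. The function name `merge_with_deviation` is hypothetical, and the sketch assumes that a deviation is measured as the time between the in-range samples on either side of it; the recipe may measure it differently.

```python
from datetime import datetime, timedelta

def merge_with_deviation(samples, low, high, deviation):
    """Group in-range samples into (start, end) segments.

    samples: list of (timestamp, value), sorted by timestamp.
    Assumption: an excursion outside [low, high] is tolerated when the
    gap between the in-range samples surrounding it is shorter than
    `deviation`.
    """
    segments = []
    start = last_ok = None
    excursion = False  # True once an out-of-range value follows last_ok
    for ts, value in samples:
        if low <= value <= high:
            if start is None:
                start = ts
            elif excursion and ts - last_ok >= deviation:
                # The excursion lasted too long: close the segment.
                segments.append((start, last_ok))
                start = ts
            last_ok = ts
            excursion = False
        elif start is not None:
            excursion = True
    if start is not None:
        segments.append((start, last_ok))
    return segments

# One reading every 10 seconds, acceptable range 400 - 600,
# acceptable deviation of 30 seconds.
base = datetime(2024, 1, 1)
values = [500, 510, 700, 520, 530, 900, 900, 900, 900, 540]
samples = [(base + timedelta(seconds=10 * i), v) for i, v in enumerate(values)]
segments = merge_with_deviation(samples, 400, 600, timedelta(seconds=30))
```

In this example the single reading of 700 is absorbed into the first segment, while the longer run of 900 readings splits the data into two segments.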

Minimal segment duration

The minimum time duration for a time segment to be valid, specified as a numerical value.

For example, you can specify 400 - 600 as a range of acceptable values, and a minimal segment duration of 3 minutes. If all the values in a time segment are between 400 and 600 (or satisfy the acceptable deviation), but the segment lasts less than 3 minutes, then the time segment would be invalid.
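The duration check can be sketched in a few lines of Python. The function name `filter_by_duration` is hypothetical, and treating a segment that lasts exactly the minimal duration as valid is an assumption; the text above only states that shorter segments are invalid.

```python
from datetime import datetime, timedelta

def filter_by_duration(segments, min_duration):
    """Keep (start, end) segments that last at least min_duration.

    Assumption: a segment lasting exactly min_duration is kept.
    """
    return [(start, end) for start, end in segments
            if end - start >= min_duration]

base = datetime(2024, 1, 1)
candidates = [
    (base, base + timedelta(minutes=2)),                          # 2 minutes
    (base + timedelta(minutes=10), base + timedelta(minutes=15)), # 5 minutes
]
valid = filter_by_duration(candidates, timedelta(minutes=3))
```

With a minimal segment duration of 3 minutes, only the second candidate remains in `valid`.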

Long format

Indicator that the input data is in the long format. See Long format.

Column with identifier

Name of the column that contains identifiers for the different time series in the input data. This applies when your input data is in the long format.

Output Data

Data consisting of equispaced and discontinuous time series. Each interval in the output data will have an id (“interval_id”).

Algorithms

Given the minimal segment duration and the acceptable deviation, the recipe performs the following steps.

  1. Evaluate if consecutive values of a time series satisfy at least one of these conditions:

    1. the values are in the range of acceptable values (between the minimal valid value and the maximum valid value)
    2. the values deviate from the range of acceptable values but last for a time period that is smaller than the acceptable deviation

    • If yes, then these values form a segment and the recipe proceeds to step 2.
    • If no, then the values are not acceptable, and the recipe repeats step 1 for successive values in the time series.

  2. Evaluate if the segment lasts for a time duration that is greater than the minimal segment duration.

    • If yes, then keep this segment as an acceptable interval.
    • If no, then this segment is not an acceptable time interval.

Return to step 1 to evaluate successive values in the time series.
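The two steps can be sketched in plain Python as follows. This is an illustrative approximation under stated assumptions, not the recipe's actual code. The function name `extract_intervals` is hypothetical. The sketch assumes that a deviation is measured between the in-range samples surrounding it, that a segment lasting exactly the minimal duration is kept, and that interval ids start at 0.

```python
from datetime import datetime, timedelta

def extract_intervals(samples, low, high, deviation, min_duration):
    """Return (timestamp, value, interval_id) rows for valid segments.

    samples: list of (timestamp, value), sorted by timestamp.
    """
    # Step 1: build candidate segments as (first_index, last_index),
    # tolerating out-of-range runs shorter than `deviation` (assumed to
    # be measured between the surrounding in-range samples).
    candidates = []
    first = last = None
    for i, (ts, value) in enumerate(samples):
        if not (low <= value <= high):
            continue
        if first is None:
            first = i
        elif i - last > 1 and ts - samples[last][0] >= deviation:
            candidates.append((first, last))
            first = i
        last = i
    if first is not None:
        candidates.append((first, last))

    # Step 2: keep candidates that last long enough and number them.
    rows = []
    interval_id = 0  # assumption: ids start at 0
    for first, last in candidates:
        if samples[last][0] - samples[first][0] >= min_duration:
            rows.extend((ts, v, interval_id)
                        for ts, v in samples[first:last + 1])
            interval_id += 1
    return rows

base = datetime(2024, 1, 1)
values = [500, 510, 700, 520, 530, 900, 900, 900, 900, 540]
samples = [(base + timedelta(seconds=10 * i), v) for i, v in enumerate(values)]
rows = extract_intervals(samples, 400, 600,
                         deviation=timedelta(seconds=30),
                         min_duration=timedelta(seconds=30))
```

Here the first five readings form one interval that includes the tolerated value of 700, and the isolated final reading is discarded because its segment is too short.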

Note

If the input data is in the long format, the recipe runs the interval extraction algorithm separately on each time series, as identified by the identifier column.
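A minimal Python sketch of this per-series processing is shown below. The names `split_long_format` and `in_range_runs` and the sensor identifiers are hypothetical. To keep the example short, the extraction step is simplified: it ignores the acceptable deviation and the minimal segment duration, and timestamps are plain integers (seconds).

```python
from collections import defaultdict

def split_long_format(rows):
    """Group (identifier, timestamp, value) rows into one series per identifier."""
    series = defaultdict(list)
    for identifier, ts, value in rows:
        series[identifier].append((ts, value))
    for samples in series.values():
        samples.sort()  # order each series by timestamp
    return dict(series)

def in_range_runs(samples, low, high):
    """Simplified extraction: consecutive in-range samples form one run."""
    runs, current = [], []
    for ts, value in samples:
        if low <= value <= high:
            current.append(ts)
        elif current:
            runs.append((current[0], current[-1]))
            current = []
    if current:
        runs.append((current[0], current[-1]))
    return runs

rows = [
    ("sensor_a", 0, 500), ("sensor_b", 0, 700),
    ("sensor_a", 10, 650), ("sensor_b", 10, 450),
    ("sensor_a", 20, 520), ("sensor_b", 20, 480),
]
result = {name: in_range_runs(samples, 400, 600)
          for name, samples in split_long_format(rows).items()}
```

Each identifier gets its own list of intervals, so an out-of-range value in one series never interrupts a segment in another.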

Tips

  • If you have irregular timestamp intervals, first resample your data using the resampling recipe. Then you can apply the interval extraction recipe to the resampled data.
  • The interval extraction recipe automatically works on all numerical columns. If you want to use the recipe on selected columns only, first create a recipe that keeps just the columns that you plan to use.
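To decide whether resampling is needed, you can check the timestamps yourself before running the recipe. The following Python sketch is one way to do this outside DSS; the function name `is_equispaced` is hypothetical, and the check assumes the timestamps are already sorted in ascending order.

```python
from datetime import datetime, timedelta

def is_equispaced(timestamps):
    """True if sorted timestamps are unique and share one constant step."""
    steps = {b - a for a, b in zip(timestamps, timestamps[1:])}
    # A zero step means a duplicate timestamp, which is not allowed.
    return len(steps) <= 1 and timedelta(0) not in steps

base = datetime(2024, 1, 1)
regular = [base + timedelta(seconds=10 * i) for i in range(4)]
irregular = regular + [base + timedelta(seconds=45)]

regular_ok = is_equispaced(regular)
irregular_ok = is_equispaced(irregular)
```

In this example `regular_ok` is `True` and `irregular_ok` is `False`, so only the second series would need resampling first.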