Resampling

Time series data is often recorded at irregular time intervals. Many analytics methods, however, require the timestamps to be equispaced.

Resampling recipe

The resampling recipe transforms time series data occurring in irregular time intervals into equispaced data. The recipe is also useful for transforming equispaced data from one frequency level to another (for example, minutes to hours).

This recipe automatically works on all numerical columns (type int or float) in your data.

Input Data

Data that consists of n-dimensional time series in wide or long format.

Parameters

Timestamp column

Name of the column that contains the timestamps. Note that the timestamp column must have the date type as its meaning (detected by DSS), and duplicate timestamps cannot exist for a given time series.

Time step

Number of units between consecutive timestamps of the resampled (output) data, specified as a numerical value. The unit is set by the Unit parameter.

Unit

Unit of the time step used for resampling, specified as one of these values:

  • Years
  • Months
  • Weeks
  • Days
  • Hours
  • Minutes
  • Seconds
  • Milliseconds
  • Microseconds
  • Nanoseconds

Interpolation method

Method used for interpolating timestamp values, specified as one of these values:

  • Nearest
  • Previous
  • Next
  • Mean
  • Linear: spline interpolation of first order
  • Quadratic: spline interpolation of second order
  • Cubic: spline interpolation of third order
  • Don’t interpolate (no value)

Interpolation is used to infer missing values located between available (not null) time series values. Interpolation methods are based on the scipy implementation.
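As a rough illustration of these options, the sketch below evaluates an irregular series on a regular grid with scipy.interpolate.interp1d. The source only states that the methods are based on scipy, so the mapping from recipe options to interp1d `kind` values is an assumption, as are the sample data and the helper name `resample_values`. The Mean option is not covered here.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical irregular series: observation times (seconds) and values.
t_obs = np.array([0.0, 10.0, 40.0, 60.0])
y_obs = np.array([1.0, 3.0, 9.0, 5.0])

# Equispaced target grid with a 20-second time step: 0, 20, 40, 60.
t_grid = np.arange(0.0, 61.0, 20.0)

def resample_values(t_obs, y_obs, t_grid, kind):
    """Evaluate the observed series on t_grid using one interp1d kind.

    Assumption: each recipe option corresponds to the interp1d kind
    of the same name; the recipe's actual implementation may differ.
    """
    f = interp1d(t_obs, y_obs, kind=kind)
    return f(t_grid)

kinds = ("nearest", "previous", "next", "linear", "quadratic", "cubic")
results = {kind: resample_values(t_obs, y_obs, t_grid, kind) for kind in kinds}

for kind in kinds:
    print(kind, results[kind])
```

Only the grid point at 20 seconds falls between observations, so it is the one value that differs from method to method; the other grid points coincide with observed timestamps and keep their observed values.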

Extrapolation method

Method used for extrapolating timestamp values, specified as one of these values:

  • Constant: set to previous available value or next available value (if previous values are missing)
  • Same as interpolation
  • Don’t extrapolate (no value)

Extrapolation is used to infer missing values located before the first available time series value or after the last available value. Extrapolation prolongs time series that stop earlier than others or start later than others.
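The Constant option can be pictured with a minimal pure-Python sketch. It assumes that missing values are represented as None on an already equispaced grid; the function name `extrapolate_constant` is hypothetical and is not part of the recipe.

```python
def extrapolate_constant(values):
    """Fill leading and trailing None entries with the nearest known value.

    Trailing gaps take the previous available value; leading gaps have no
    previous value, so they take the next available one. Gaps between known
    values are left alone, since those are handled by interpolation.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to extrapolate from
    first, last = known[0], known[-1]
    out = list(values)
    for i in range(first):
        out[i] = values[first]
    for i in range(last + 1, len(values)):
        out[i] = values[last]
    return out

# A series that starts later and stops earlier than the common grid.
series = [None, None, 2.0, None, 3.5, None]
filled = extrapolate_constant(series)
print(filled)
```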

Clip start

Number of time steps to remove from the beginning of the time series, specified as a numerical value.

Clip end

Number of time steps to remove from the end of the time series, specified as a numerical value.

Shift value

Amount by which to shift (or offset) all timestamps, specified as a positive or negative numerical value.
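The effect of the clip and shift parameters on a resampled grid can be sketched as follows. This is an illustration under two assumptions the text does not confirm: that the shift value is expressed in time steps, and that clipping is applied before shifting. The function name `clip_and_shift` is hypothetical.

```python
from datetime import datetime, timedelta

def clip_and_shift(timestamps, step, clip_start=0, clip_end=0, shift=0):
    """Drop steps from both ends of an equispaced grid, then offset it.

    Assumptions: `shift` is a number of time steps (possibly fractional
    or negative), and clipping happens before shifting.
    """
    end = len(timestamps) - clip_end
    kept = timestamps[clip_start:end]
    return [t + shift * step for t in kept]

step = timedelta(hours=1)
grid = [datetime(2024, 1, 1) + i * step for i in range(6)]  # 00:00 to 05:00

# Remove one step at the start, two at the end, then shift by half a step.
result = clip_and_shift(grid, step, clip_start=1, clip_end=2, shift=0.5)
for t in result:
    print(t.isoformat())
```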

Long format

Indicator that the input data is in the long format. See Long format.

Column with identifier

Name of column that contains identifiers for the different time series in the input data. This applies when your input data is in the long format.
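To make the role of the identifier column concrete, the sketch below splits long-format rows into one series per identifier and checks that no series contains duplicate timestamps. The column names (`date`, `sensor`, `temperature`), the sample rows, and the function `split_long_format` are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def split_long_format(rows, timestamp_col, identifier_col):
    """Group long-format rows by identifier and sort each group by time.

    Raises ValueError if a series contains the same timestamp twice,
    since duplicate timestamps are not allowed within a time series.
    """
    series = defaultdict(list)
    for row in rows:
        series[row[identifier_col]].append(row)
    for key, group in series.items():
        stamps = [r[timestamp_col] for r in group]
        if len(stamps) != len(set(stamps)):
            raise ValueError(f"duplicate timestamps in series {key!r}")
        group.sort(key=lambda r: r[timestamp_col])
    return dict(series)

# Two interleaved series stored in a single table (long format).
rows = [
    {"date": "2024-01-01T00:00:40", "sensor": "A", "temperature": 20.5},
    {"date": "2024-01-01T00:00:00", "sensor": "B", "temperature": 18.0},
    {"date": "2024-01-01T00:00:00", "sensor": "A", "temperature": 20.0},
    {"date": "2024-01-01T00:01:10", "sensor": "B", "temperature": 18.4},
]
by_sensor = split_long_format(rows, "date", "sensor")
print(sorted(by_sensor))
print([r["temperature"] for r in by_sensor["A"]])
```

The same timestamp may appear in different series (as with sensors A and B above); only repeats within one series are rejected.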

Output Data

Data consisting of equispaced time series, and having the same number of columns as the input data.

Algorithms

The resampling recipe automatically upsamples or downsamples the time series in your data so that all of them have the same length. When you specify a time step (for example, 30 seconds), the recipe upsamples or downsamples each time series so that its timestamps are spaced at integer multiples of the time step.

The recipe also performs both interpolation (see Interpolation method) and extrapolation (see Extrapolation method) to infer missing values.
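One way to picture the alignment is a shared equispaced grid on which every series is then interpolated or extrapolated. The sketch below builds such a grid under assumptions the text does not spell out: the grid starts at the earliest timestamp found in any series and extends to the latest one in whole time steps. The function name `equispaced_grid` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def equispaced_grid(series_timestamps, step):
    """Build a common grid covering all series with a fixed time step.

    Assumptions: the grid is anchored at the earliest timestamp across
    all series and stops at the last whole step not exceeding the latest
    timestamp. The recipe may anchor or round its grid differently.
    """
    start = min(min(ts) for ts in series_timestamps)
    end = max(max(ts) for ts in series_timestamps)
    n_steps = (end - start) // step  # timedelta // timedelta gives an int
    return [start + i * step for i in range(n_steps + 1)]

day = datetime(2024, 1, 1)
series_a = [day, day + timedelta(seconds=50), day + timedelta(seconds=120)]
series_b = [day + timedelta(seconds=20), day + timedelta(seconds=95)]

grid = equispaced_grid([series_a, series_b], timedelta(seconds=30))
for t in grid:
    print(t.time())
```

Here series_b starts later and stops earlier than series_a, so its values at the first and last grid points would have to come from extrapolation, while the grid points between observations would come from interpolation.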

Tip

  • The resampling recipe automatically works on all numerical columns. If you want to use the recipe on select columns, first create a recipe to extract the columns that you plan to use.