Format of time series data
You can import time series data into Dataiku DSS in one of two formats:
- Wide format
- Long format
Use the Pivot processor to convert data from long to wide format. You can also convert data from wide to long format by using the Fold multiple columns processor or the Fold multiple columns by pattern processor.
The Reshaping Data from Long to Wide Format tutorial shows how to convert data from one format to another.
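To make the two conversions concrete outside of DSS, here is a minimal pandas sketch of the same reshaping. It is not the implementation of the Pivot or Fold processors, only an analogous operation. The column names `Date` and `Price` and all values are made up for illustration.

```python
import pandas as pd

# Made-up prices, for illustration only (not real market data).
long_df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"]
    ),
    "Ticker": ["CVX", "XOM", "CVX", "XOM"],
    "Price": [121.0, 70.0, 120.5, 69.5],
})

# Long -> wide: one column per ticker (analogous to a pivot).
wide_df = long_df.pivot(index="Date", columns="Ticker", values="Price").reset_index()
wide_df.columns.name = None  # drop the leftover "Ticker" label on the column axis

# Wide -> long: fold the per-ticker columns back into an identifier
# column and a value column (analogous to folding multiple columns).
back_to_long = (
    wide_df.melt(id_vars="Date", var_name="Ticker", value_name="Price")
    .sort_values(["Date", "Ticker"])
    .reset_index(drop=True)
)
```

After the round trip, `back_to_long` holds the same rows as `long_df`, which shows that the two formats carry the same information.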
Time series data is in wide format if you have multiple time series and each distinct time series is in a separate column.
For example, given stock market prices for Chevron Corporation (CVX) and Exxon Mobil (XOM), the data is in wide format when each company’s prices are stored in their own column.
Furthermore, if each time series has multiple dimensions (such as closing price and volume of stocks traded), then the data consists of multiple multivariate time series. The following figure shows a snippet of this data in wide format.
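As a rough illustration of this layout, the sketch below builds a small wide-format table in pandas. The column names (`CVX_close`, `CVX_volume`, and so on) and all values are assumptions chosen for the example, not taken from an actual dataset.

```python
import pandas as pd

# Made-up values, for illustration only.
wide_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03"]),
    "CVX_close": [121.0, 120.5],
    "CVX_volume": [5_200_000, 6_100_000],
    "XOM_close": [70.0, 69.5],
    "XOM_volume": [12_400_000, 17_300_000],
})

# Every column except Date holds one dimension of one time series,
# so two tickers with two dimensions each give four series columns.
series_columns = [c for c in wide_df.columns if c != "Date"]
print(wide_df)
```

Each row corresponds to one timestamp, and adding another company or another dimension adds columns rather than rows.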
Long format is a compact way of representing multiple time series. In long format, values of the same variable (e.g., price) from distinct time series are stored in the same column. Data in the long format also has an identifier column that provides context for the value in each row.
Again, consider the previous example of multiple multivariate time series containing the closing price and volume for CVX and XOM. The following figure shows a snippet of the data in long format. Notice that the Ticker column acts as the identifier column.
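The same layout can be sketched in pandas as follows. Only the `Ticker` column name comes from the description above; the `Date`, `Close`, and `Volume` names and all values are made up for illustration.

```python
import pandas as pd

# Made-up values, for illustration only.
long_df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"]
    ),
    "Ticker": ["CVX", "XOM", "CVX", "XOM"],  # identifier column
    "Close": [121.0, 70.0, 120.5, 69.5],
    "Volume": [5_200_000, 12_400_000, 6_100_000, 17_300_000],
})

# The identifier column tells the series apart: filtering on it
# recovers a single multivariate time series.
cvx = long_df[long_df["Ticker"] == "CVX"].set_index("Date")[["Close", "Volume"]]

# Number of distinct time series stored in the table.
n_series = long_df["Ticker"].nunique()
```

Here, adding another company adds rows rather than columns, which is why the long format stays compact as the number of series grows.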