Classical Decomposition of a Time Series

A time series is a data set with a time component. It is just what you would expect: in the ideal case you have one value per fixed time interval. The topic is very current, because sensors in Internet of Things scenarios produce massive datasets. You usually get too many values (more than 100 measurements per second), so building a useful time series requires some preprocessing, such as filtering or averaging. Missing values can be a problem, although it is usually not difficult to find a good estimate.
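
As a rough illustration of that preprocessing, here is a minimal sketch in base R. The sensor readings are simulated and the choice of one average per second is an assumption made for the example, not a recommendation; the single gap is estimated by linear interpolation, which is only one of several possible estimates.

```r
# Hypothetical sensor stream: 100 simulated readings per second for 10 seconds.
set.seed(1)
n_per_sec <- 100
seconds   <- 10
raw <- data.frame(
  second = rep(seq_len(seconds), each = n_per_sec),
  value  = rnorm(n_per_sec * seconds, mean = 20, sd = 0.5)
)

# Average down to one value per second (one value per fixed time interval).
per_second <- as.numeric(tapply(raw$value, raw$second, mean))

# Pretend the fourth second was lost, then estimate it by linear
# interpolation between its neighbours.
per_second[4] <- NA
idx    <- seq_along(per_second)
known  <- !is.na(per_second)
filled <- approx(x = idx[known], y = per_second[known], xout = idx)$y

# Wrap the result as a regular time series with one observation per second.
series <- ts(filled, start = 1, frequency = 1)
```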

You can then plot the values over time, which could look like this:


Here I used R's built-in dataset "AirPassengers", which contains the monthly international airline passenger numbers for the years 1949 to 1960.
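
If you want to reproduce the plot yourself, a minimal sketch with base R could look like this. The axis labels and title are my own choice; the unit (thousands of passengers) is taken from the dataset's help page.

```r
# AirPassengers ships with base R (package 'datasets') and is already
# stored as a monthly ts object.
data(AirPassengers)

start(AirPassengers)      # first observation: January 1949
frequency(AirPassengers)  # 12 observations per year

plot(AirPassengers,
     xlab = "Year",
     ylab = "Passengers (thousands)",
     main = "Monthly international airline passengers")
```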

Natural questions are:
  • Can you see which years were good and which were bad?
  • Can you split the time series into more homogeneous components?
  • Can you predict the values for future months? And how good are these predictions?

Time series analysis does not look too complex, but in practice a combination of models is often required to make good predictions. In this post I want to show you how the standard approaches work. In reality, especially in retail, time series are usually not that easy to handle, because they are strongly influenced by things like customer reviews.

The central idea is that a time series $Y = (Y_t)$ is a combination of three independent sub-series:
- A trend component T is a long-term tendency in the data. It does not have to be linear.
- A seasonal component S is a pattern that recurs regularly after a fixed period (like every summer, every January or every day at 10:30).
- A random component I, also called irregular or noise.

We want to find these three sub-series in the example above. First we have to decide on the type of decomposition: additive or multiplicative.
In an additive model we add the three sub-series up to get the original time series: $$Y_t = T_t + S_t + I_t$$ You should use it when the size of the seasonal swings stays roughly constant over time.
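
To make the additive model concrete, here is a small sketch that builds a series from three made-up components. The linear trend, the sine-shaped season and the noise level are all arbitrary assumptions chosen for illustration.

```r
set.seed(42)
t <- 1:120                                  # 10 years of monthly data

trend    <- 100 + 0.5 * t                   # T: slow upward movement
seasonal <- 10 * sin(2 * pi * t / 12)       # S: repeats every 12 months
noise    <- rnorm(length(t), sd = 2)        # I: irregular component

# Additive model: Y_t = T_t + S_t + I_t
y <- ts(trend + seasonal + noise, frequency = 12, start = c(2000, 1))

plot(y, main = "Synthetic additive series")
```

In this series the seasonal swings keep the same height no matter how far the trend has climbed, which is the situation the additive model is meant for.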

In a multiplicative model we multiply the three sub-series: $$Y_t = T_t \cdot S_t \cdot I_t$$ Use it when you see the peaks growing with time, as in the airline passenger example above. There we should go for a multiplicative model.
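
If you do not trust your eyes, a rough numerical check is to measure the swing within each year. The sketch below uses the difference between the highest and lowest month as the swing, which is just one simple measure I picked for illustration.

```r
# Swing within each calendar year: highest month minus lowest month.
yearly_swing <- aggregate(AirPassengers, nfrequency = 1,
                          FUN = function(v) max(v) - min(v))

# Average level of each year, for comparison.
yearly_mean <- aggregate(AirPassengers, nfrequency = 1, FUN = mean)

# Absolute swing per year, and swing relative to the yearly level.
print(yearly_swing)
print(round(yearly_swing / yearly_mean, 2))
```

If the absolute swing grows together with the level of the series, that points towards a multiplicative model.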



Tip: A multiplicative model can often be turned into an additive model by taking the logarithm (as long as all values are positive).
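
The reason is that the logarithm turns products into sums: $$\log Y_t = \log T_t + \log S_t + \log I_t$$ A minimal sketch of this idea in R, using the additive mode of decompose on the logged data:

```r
# Take logs so that the multiplicative structure becomes additive.
log_ap  <- log(AirPassengers)
dec_add <- decompose(log_ap, type = "additive")

# Transform the components back to the original scale.
trend_back    <- exp(dec_add$trend)
seasonal_back <- exp(dec_add$seasonal)

# Sanity check: the back-transformed sum reproduces the original series
# wherever the trend is defined (it is NA at both ends).
recomposed <- exp(dec_add$trend + dec_add$seasonal + dec_add$random)
ok <- !is.na(recomposed)
all.equal(as.numeric(recomposed[ok]), as.numeric(AirPassengers[ok]))
```

The components obtained this way will generally differ a little from those of a direct multiplicative decomposition, because averaging logs is not the same as averaging the raw values.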

How do we get the values for the trend, seasonal and random sub-series? We will go step by step. As motivation, here is the result calculated by R's decompose function:


Here is the corresponding code in R: `plot(decompose(ts(AirPassengers, frequency = 12, start = 1949), type = "mult"))`
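
Instead of only plotting the result, you can also look at the individual components. The following sketch passes AirPassengers directly to decompose, since it is already stored as a monthly ts object, and checks that the three parts multiply back to the original series.

```r
dec <- decompose(AirPassengers, type = "multiplicative")

names(dec)      # the pieces returned by decompose

dec$trend       # trend component; NA at both ends of the series
dec$figure      # one seasonal factor per month
dec$random      # what is left after removing trend and season

# Multiplicative model: Y_t = T_t * S_t * I_t
recomposed <- dec$trend * dec$seasonal * dec$random
ok <- !is.na(recomposed)
all.equal(as.numeric(recomposed[ok]), as.numeric(AirPassengers[ok]))
```

The trend has no values for the first and last six months, so the random component is missing there as well.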

To continue:

1. Here is how to determine the trend component
2. Here is how to determine the seasonal and random component
3. Here is a summary on the classical decomposition of time series
