If I were you, I would question why we haven’t discussed AR or MA in isolation before turning to S/ARIMA. The reason is that ARIMA is simply a combination of the two: set the MA order to zero and an ARIMA reduces to an AR model, and vice versa. Before we get started, let’s take a brief look back. We have seen how the ARIMA model operates and how to determine its parameters manually through trial and error. That leaves me with two questions for you: first, is there a better way to go about this? Second, given models like AR, MA, and their combination ARIMA, how do you choose the best one?
Simply put, an AR model uses past values of the time series itself to predict future values: it assumes that future values are a linear combination of past values, with coefficients representing the weights of each lag. An MA model, in contrast, uses past forecast errors (residuals): it assumes that future values are a linear combination of the forecast errors from the past.
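To make the relationship concrete, here is a minimal sketch using statsmodels (with a synthetic series standing in for real data) showing that AR and MA models are just ARIMA models with the other component’s order set to zero:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic AR(1)-flavoured series standing in for real data (assumption)
rng = np.random.default_rng(42)
series = np.zeros(200)
for t in range(1, 200):
    series[t] = 0.7 * series[t - 1] + rng.normal()

# AR(1) is ARIMA with order (p, d, q) = (1, 0, 0): past values predict the future
ar_fit = ARIMA(series, order=(1, 0, 0)).fit()

# MA(1) is ARIMA with order (p, d, q) = (0, 0, 1): past errors predict the future
ma_fit = ARIMA(series, order=(0, 0, 1)).fit()

print(ar_fit.summary())
print(ma_fit.summary())
```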
Choosing Between AR and MA Models:
Selecting between AR and MA models requires understanding the nature of the data. An AR model may be appropriate if the data shows distinct trends, whereas MA models are better at capturing transient fluctuations. Choosing the model order entails analysing temporal dependencies with statistical tools such as the Partial AutoCorrelation Function (PACF) and the AutoCorrelation Function (ACF). The process is often iterative: explore both AR and MA models and contrast their performance using information criteria (AIC, BIC) and diagnostic tests.
A thorough investigation of the features of a given dataset, such as temporal dependencies, trends, and fluctuations, is essential for making the best decision. Furthermore, ARIMA models, which integrate both AR and MA components, offer the flexibility to handle a wide variety of time series datasets.
In order to produce the most accurate and pertinent model, the selection process ultimately entails a nuanced understanding of the complexities of the data and an iterative refinement approach.
Let’s get back to our data from “Analyze Boston”. To discern the optimal AutoRegressive (AR) and Moving Average (MA) model orders, we employ Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots.
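As a sketch of how such plots can be produced with statsmodels (the file and column names here are hypothetical, not the actual “Analyze Boston” export):

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Hypothetical file and column names; substitute the real "Analyze Boston" data
df = pd.read_csv("logan_flights.csv", parse_dates=["date"], index_col="date")
series = df["total_flights"].diff().diff().dropna()  # second-order differencing

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(series, lags=20, ax=axes[0])   # significant ACF lags hint at the MA order q
plot_pacf(series, lags=20, ax=axes[1])  # significant PACF lags hint at the AR order p
plt.tight_layout()
plt.show()
```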
Since the first notable spike in our plots occurs at lag 1, we tried modelling the AR component with an order of 1. And guess what? After a brief period of optimism, the forecast flatlined at zero. An MA model of the same order produced comparable results, with the same flat line at 0. This is why a comprehensive parameter search is necessary to identify the optimal pairing of AR and MA orders.
As demonstrated in my earlier post, these plots can provide us with important insights and aid in the development of a passable model. However, we can never be too certain of anything, and we don’t always have the time to work through the labour-intensive process of experimenting with the parameters by hand. Why not just have our code handle it?
To maximise model fit, the grid search methodically evaluated different orders under the guidance of the Akaike Information Criterion (AIC), as sketched below. This rigorous analysis surfaced the most accurate AR and MA models, each fitted and set to forecast. I could thus conclude that (0,0,7) and (8,0,0) were the optimal orders for MA and AR respectively.
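Here is a minimal sketch of such a grid search, assuming `series` is the second-order differenced data from the earlier sketch; the search ranges are illustrative:

```python
import warnings
from statsmodels.tsa.arima.model import ARIMA

def grid_search_aic(series, p_values, q_values):
    """Fit every (p, 0, q) combination and keep the one with the lowest AIC."""
    best_order, best_aic = None, float("inf")
    for p in p_values:
        for q in q_values:
            try:
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore")  # hide convergence warnings
                    fit = ARIMA(series, order=(p, 0, q)).fit()
            except Exception:
                continue  # skip combinations that fail to converge
            if fit.aic < best_aic:
                best_order, best_aic = (p, 0, q), fit.aic
    return best_order, best_aic

# Pure AR candidates (q = 0) and pure MA candidates (p = 0)
best_ar, _ = grid_search_aic(series, p_values=range(1, 11), q_values=[0])
best_ma, _ = grid_search_aic(series, p_values=[0], q_values=range(1, 11))
print(best_ar, best_ma)  # for our data these came out as (8, 0, 0) and (0, 0, 7)
```

Restricting one of the two ranges to zero keeps the search to pure AR and pure MA candidates; widening both ranges would search the full ARMA space at a quadratic cost in model fits.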
Interesting!
The plot shows the forecasting outcomes of the best-fitting AR and MA models for the second-order differenced Logan International Flights data. The blue dashed line shows the forecast from the best AR model, which captures the autoregressive temporal dependencies. The orange dashed line shows the forecast from the best MA model, which captures short-term fluctuations. In attempting to predict the differenced data, both models demonstrate their distinct strengths. Comparing them lets one evaluate how accurately each depicts the underlying patterns in the time series, which helps determine the best forecasting strategy.
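A sketch of how such a comparison plot can be drawn, continuing from the best orders above (the 30-step horizon is arbitrary):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

ar_fit = ARIMA(series, order=(8, 0, 0)).fit()  # best AR order from the grid search
ma_fit = ARIMA(series, order=(0, 0, 7)).fit()  # best MA order from the grid search

horizon = 30  # arbitrary forecast horizon
future = range(len(series), len(series) + horizon)

plt.figure(figsize=(10, 5))
plt.plot(range(len(series)), series.to_numpy(), color="black",
         label="Differenced series")
plt.plot(future, ar_fit.forecast(steps=horizon), "b--", label="Best AR forecast")
plt.plot(future, ma_fit.forecast(steps=horizon), linestyle="--", color="orange",
         label="Best MA forecast")
plt.legend()
plt.show()
```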
Looking back, our experience is a prime example of why time series analysis must progress beyond visual exploration. Visual cues provide a foundational understanding, but the combination of statistical analysis, model fitting, and iterative refinement proved crucial. Even though the AR and MA models are both special cases of the ARIMA framework, their behaviours differed in subtle ways. The ACF, the PACF, and the grid search all played a part in determining the orders selected for the AR and MA components, which were critical to the models’ performance.
Now that we have the combination with the lowest AIC, found without having to do the work manually, it is finally time to build a model we can be optimistic about.
The main performance evaluation metric was Mean Absolute Error (MAE). With the lowest MAE of 147.32, the AR model demonstrated the highest predictive accuracy. But the decision went beyond MAE: model interpretability, complexity, and robustness were also taken into account, acknowledging that choosing a model necessitates a careful assessment.
Call this a win-win!
Combining both AR and MA components, the ARIMA model posted a competitive MAE of 175.82. This all-encompassing strategy makes for a thorough choice that balances practical concerns against statistical measurements, recognising that a lower MAE, although important, is only one factor in the larger picture of successful forecasting.
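A sketch of that evaluation with a simple hold-out split; the split size is arbitrary, and the combined (8, 0, 7) order shown for ARIMA is purely illustrative:

```python
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA

train, test = series[:-30], series[-30:]  # hold out the last 30 points (assumption)

maes = {}
for name, order in {"AR": (8, 0, 0), "MA": (0, 0, 7), "ARIMA": (8, 0, 7)}.items():
    fit = ARIMA(train, order=order).fit()
    preds = fit.forecast(steps=len(test))
    maes[name] = mean_absolute_error(test, preds)

print(maes)  # lower MAE means better point-forecast accuracy
```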
What if I told you there are other approaches as well?
I hope that I was able to adequately introduce you to all of the time series concepts that we covered. We will now go over something called the General Modelling Procedure (GMP). It’s like a forecasting road map, guiding us to the best way to model our data and make predictions.
General Modelling Procedure:
- The steps for identifying a stationary ARMA(p,q) process were covered in the previous section.
- Our time series can be modelled by an ARMA(p,q) process if both the ACF and PACF plots show a sinusoidal or decaying pattern.
- Neither plot, however, was useful in determining the orders p and q. In both plots of our simulated ARMA(1,1) process, we found that coefficients were significant after lag 1 and that the fitted models flatlined.
- As a result, we had to devise a method for determining the orders p and q. The procedure we are going to talk about now has the advantage of being applicable even when our time series is non-stationary and has seasonal effects, and also when p or q is equal to zero.
Wait a minute, does that mean everything we’ve been doing has been a waste of time when we could have gotten right to this? No!
- The first few steps are the same as those we gradually built up in the first half of this post: we still need to gather data, test for stationarity, and apply transformations as needed. We then list the possible values of p and q and fit every unique combination of ARMA(p,q) to our data.
- We can then calculate the Akaike information criterion (AIC), which measures the quality of each model relative to the others. The model with the lowest AIC is chosen.
- We can then examine the residuals of the model, which are the differences between the actual and predicted values. Ideally, the residuals should resemble white noise, implying that any difference between predicted and actual values is due to randomness. Consequently, the residuals must be uncorrelated and independent.
- We can evaluate those properties by examining the quantile-quantile plot (Q-Q plot) and performing the Ljung-Box test, as shown in the sketch after this list.
- If the analysis leads us to the conclusion that the residuals are completely random, we have a forecasting model.
- This is all you need!
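Here is a minimal sketch of those residual checks, assuming `best_fit` is the lowest-AIC model kept by the grid search:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import qqplot
from statsmodels.stats.diagnostic import acorr_ljungbox

residuals = best_fit.resid  # best_fit: the lowest-AIC fit from earlier (assumption)

# Q-Q plot: points should hug the 45-degree line if residuals are roughly normal
qqplot(residuals, line="45", fit=True)
plt.show()

# Ljung-Box test: large p-values mean we cannot reject "residuals are uncorrelated"
print(acorr_ljungbox(residuals, lags=10))  # look for lb_pvalue > 0.05 at every lag
```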
Understanding AIC (Akaike information criterion)
- A model’s quality in comparison to other models is estimated by the AIC.
- The AIC measures the relative amount of information lost by the model, taking into account that some information will always be lost during the fitting process. The better the model, the lower the AIC value and the less information lost.
- AIC = 2k – 2log(L)
- The AIC’s value is determined by the maximum value of the likelihood function, L, and the number of estimated parameters in the model, k; the lower the AIC, the better the model. Selecting models based on the AIC lets us maintain a balance between a model’s complexity and its goodness of fit to the data (a small sketch after this list shows the calculation).
- The number of estimated parameters, k, is directly related to the order (p,q) of an ARMA(p,q) model. If we fit an ARMA(2,2) model, we have 2 + 2 = 4 parameters to estimate.
- It is evident how fitting a more complicated model can penalise the AIC score: as the order (p,q) increases, so does the number of parameters k, and the AIC rises with it.
- The likelihood function measures a model’s goodness of fit, and it can be thought of as the opposite of the distribution function: given a model with fixed parameters, the distribution function determines the probability of observing a data point.
- The likelihood function inverts that logic: given a set of observed data, it estimates how likely it is that various model parameters produced those data.
- We can think of the likelihood function as an answer to the question “How likely is it that my observed data is coming from an ARMA(2,2) model?” If it is very likely, meaning that L is large, then the ARMA(2,2) model fits the data well.
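To see the formula in action, here is a small sketch comparing two candidate models; note that how a library counts parameters (for example, including the constant and the noise variance) can make its reported AIC differ slightly from a hand count, so treat the manual number as illustrative:

```python
from statsmodels.tsa.arima.model import ARIMA

for order in [(1, 0, 1), (2, 0, 2)]:
    fit = ARIMA(series, order=order).fit()
    k = len(fit.params)               # estimated parameters, incl. constant and variance
    manual_aic = 2 * k - 2 * fit.llf  # AIC = 2k - 2log(L), with log-likelihood fit.llf
    print(order, round(fit.aic, 2), round(manual_aic, 2))
```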