Day 5 with “TSFP”

Let’s attempt a thorough analysis of our models today. Residual analysis, as we all know, is a crucial stage in time series modelling to evaluate the goodness of fit and make sure the model assumptions are satisfied. The discrepancies between the values predicted by the model and the observed values are known as residuals.

Here’s how we can perform residual analysis for your AR and MA models:

  1. Compute Residuals:
    • Calculate the residuals by subtracting the predicted values from the actual values.
  2. Plot Residuals:
    • To visually examine the residuals for trends, patterns, or seasonality, plot them over time. The residuals of a well-fitted model should look random and be centred around zero.
  3. Autocorrelation Function (ACF) of Residuals:
    • To check whether any autocorrelation remains, plot the ACF of the residuals. Significant spikes in the ACF plot suggest that the model has not captured all of the temporal dependencies.
  4. Histogram and Q-Q Plot:
    • Examine the histogram of the residuals and compare it with a normal distribution. To evaluate normality, additionally employ a Q-Q plot. Deviations from normality could indicate that the model’s assumptions are violated.

If you’re wondering why you should compare the histogram of residuals to a normal distribution, or why deviations from normality may indicate that the model assumptions are violated, you’re not alone. Many statistical inference techniques, such as confidence interval estimation and hypothesis testing, assume that the residuals, or errors, follow a normal distribution. Deviations from normality can lead to biased estimates and inaccurate conclusions.

The underlying theory of time series models, including ARIMA and SARIMA models, frequently assumes residual normality. If the residuals are not normally distributed, the model may not accurately capture the underlying patterns in the data.

Here’s why deviations from normality might suggest that the model assumptions are violated:

  1. Validity of Confidence Intervals:
    • The normality assumption is critical for constructing valid confidence intervals. If the residuals are not normally distributed, the confidence intervals may be unreliable, resulting in incorrect uncertainty assessments.
  2. Outliers and Skewness:
    • Deviations from normality in the histogram could indicate the presence of outliers or residual skewness. It is critical to identify and address these issues in order to improve the model’s performance.
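
As a quick numeric check of both points, we can compute the skewness of the residuals and run a normality test (Shapiro–Wilk here, one common choice). The residuals below are simulated for illustration; in practice you would pass your model’s residuals:

```python
import numpy as np
from scipy import stats

# Simulated residuals standing in for a fitted model's residuals.
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=1.0, size=200)

# Skewness near zero suggests a symmetric distribution;
# a clearly positive or negative value flags a long tail.
skewness = stats.skew(residuals)

# Shapiro-Wilk normality test: a small p-value (say, below 0.05)
# is evidence against normally distributed residuals.
stat, p_value = stats.shapiro(residuals)
```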

Let’s run a residual analysis on whatever we’ve been doing with “Analyze Boston” data.

  1. Residuals over time: This plot describes the pattern and behaviour of the model residuals, or the discrepancies between the values that the model predicted and the values that were observed, over the course of the prediction period. It is essential to analyse residuals over time in order to evaluate the model’s performance and spot any systematic trends or patterns that the model may have overlooked. There are a couple of things to look for:
    • Ideally, residuals should appear random and show no consistent pattern over time. A lack of systematic patterns indicates that the model has captured the underlying structure of the data well.
    • Residuals should be centered around zero. If there is a noticeable drift or consistent deviation from zero, it may suggest that the model has a bias or is missing important information.
    • Heteroscedasticity: Look for consistent variability over time in the residuals. Variations in variability, or heteroscedasticity, may be a sign that the model is not accounting for the inherent variability in the data.
    • Outliers: Look for any extreme values or outliers in the residuals. Outliers may indicate unusual events or data points that were not adequately captured by the model.
    • The absence of a systematic pattern suggests that the models are adequately accounting for the variation in the logan_intl_flights data.
    • Residuals being mostly centered around the mean is a good indication. It means that, on average, your models are making accurate predictions. The deviations from the mean are likely due to random noise or unexplained variability.
    • Occasional deviations from the mean are normal and can be attributed to random fluctuations or unobserved factors that are challenging to capture in the model. As long as these deviations are not systematic or consistent, they don’t necessarily indicate a problem.
    • The absence of heteroscedasticity indicates that the models are managing the variability consistently. If the variability changed over time, it could mean that the models struggle during particular periods.
  2. The ACF (Autocorrelation Function) of residuals: demonstrates the relationship between the residuals at various lags. It assists in determining whether, following the fitting of a time series model, any residual temporal structure or autocorrelation exists. The ACF of residuals can be interpreted as follows:
    • No Significant Spikes: The residuals are probably independent and the model has successfully captured the temporal dependencies in the data if the ACF of the residuals decays rapidly to zero and does not exhibit any significant spikes.
    • Significant Spikes: The presence of significant spikes at specific lags indicates the possibility of residual patterns or autocorrelation. This might point to the need for additional model improvement or the need to take into account different model structures.
    • There are no significant spikes in our ACF, which suggests that the model has successfully removed the temporal dependencies in the data.
  3. Histogram and Q-Q plot:
    • Look at the shape of the histogram. It should resemble a bell curve for normality. A symmetric, bell-shaped histogram suggests that the residuals are approximately normally distributed. Check for outliers or extreme values. If there are significant outliers, it may indicate that the model is not capturing certain patterns in the data. A symmetric distribution has skewness close to zero. Positive skewness indicates a longer right tail, and negative skewness indicates a longer left tail.


    • In a Q-Q plot, if the points closely follow a straight line, it suggests that the residuals are normally distributed. Deviations from the line indicate departures from normality. Look for points that deviate from the straight line. Outliers suggest non-normality or the presence of extreme values. Check whether the tails of the Q-Q plot deviate from the straight line. Fat tails or curvature may indicate non-normality.
    • A histogram will not reveal much to us because of the small number of data points we have. Each bar’s height in a histogram indicates how frequently or how many data points are in a given range (bin).
    • See what I mean?


Now let’s learn about the Ljung-Box test, a statistical test that determines whether a time series has significant autocorrelations at various lags. It is frequently used in time series analysis to check whether autocorrelation exists in a model’s residuals.

Ljung-Box Test Procedure:

  • Null Hypothesis (H0): The null hypothesis of the Ljung-Box test is that there is no autocorrelation in the time series at lags up to a specified maximum lag.
  • Alternative Hypothesis (H1): The alternative hypothesis is that there is significant autocorrelation in the time series at least at one lag up to the specified maximum lag.
  • Test Statistic: The test statistic is based on the sum of the squares of the autocorrelations at different lags.
  • Critical Values: The test compares the test statistic to critical values from the chi-square distribution. If the test statistic exceeds the critical value, the null hypothesis is rejected, suggesting the presence of significant autocorrelation.
  • When statistically significant autocorrelation is present at roughly 25% of the tested lags, it suggests that the model is unable to adequately explain the temporal dependencies within the residuals. There are lags at which the residuals are not random or independent — a temporal structure that the model has not adequately explained.
  • Significant autocorrelation suggests that there are undiscovered subtleties or patterns in the time series data, which may originate from variables that were missed or from inherent complexity.
  • This highlights the need for model improvement, promoting the investigation of different specifications, changes to the parameters, or the addition of new features.
  • It becomes imperative to analyse individual lags with significant autocorrelations in order to spot patterns and guide iterative model enhancements.

Finally, the covers meet. I feel like I’ve absorbed a lot of information from its pages. It’s time to let the information settle and brew in my mind. Until our next time-series adventure!
