Day 1 with “Time Series Forecasting with Python” by Marco Peixeiro

I feel compelled to share the pearls of knowledge this insightful book on time series analysis has bestowed upon me as I delve deeper into its pages.

  1. Defining Time Series: A time series is a sequence of data points arranged in chronological order: a collection of measurements or observations taken at regular, equally spaced intervals. Time series data are widely used in many disciplines, such as environmental science, biology, finance, and economics. The main goal when working with time series is to understand the underlying patterns, trends, and behaviours the data may exhibit over time. Time series analysis is the study of modelling, interpreting, and projecting future values from past behaviour.
  2. Time Series Decomposition: Time series decomposition is a technique for breaking a time series into its fundamental components: trend, seasonality, and noise. These components help us understand patterns in the data more clearly.
    • Trend: The data’s long-term movement or direction. It helps determine whether the series is steadily rising, falling, or staying the same over time.
    • Seasonality: Recurring patterns in the data that repeat at regular intervals. Retail sales, for instance, may show seasonality, with higher values around the holidays.
    • Noise (or Residuals): The random fluctuations or anomalies in the data that are not explained by trend or seasonality. It is, in essence, the “unexplained” portion of the time series.

    Decomposing a time series into these components gives a clearer picture of the data’s structure, facilitating more accurate forecasting and analysis, as the sketch below illustrates.
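To make this concrete, here is a minimal decomposition sketch using statsmodels’ seasonal_decompose on a synthetic monthly series (the data below is invented purely for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Build a synthetic monthly series: upward trend + yearly cycle + noise
rng = np.random.default_rng(0)
index = pd.date_range("2015-01-01", periods=96, freq="MS")
values = (np.linspace(100, 200, 96)                      # long-term trend
          + 10 * np.sin(2 * np.pi * np.arange(96) / 12)  # 12-month seasonality
          + rng.normal(0, 3, 96))                        # random noise
series = pd.Series(values, index=index)

# period=12 tells the decomposition that the seasonal cycle spans 12 months
result = seasonal_decompose(series, model="additive", period=12)
result.plot()  # four panels: observed, trend, seasonal, residual
plt.show()
```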

  3. Forecasting Project Lifecycle: A forecasting project entails making predictions about future trends or outcomes from historical data. This lifecycle usually has multiple stages:
    • Data Collection
    • Exploratory Data Analysis (EDA): Examine the data to find patterns, outliers, and other features that might have an impact on the forecasting model.
    • Model Selection: Choose an appropriate forecasting model based on the nature of the data. Common models include ARIMA (AutoRegressive Integrated Moving Average), Exponential Smoothing, and machine learning algorithms.
    • Training the Model: Utilise past data to train the chosen model. This entails fitting the model’s parameters to the historical observations.
    • The usuals: Validation and Testing, Deployment, Monitoring and Maintenance.

    The forecasting project lifecycle is iterative: keeping forecasts precise and current requires frequent updates and modifications.
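One detail from the training and validation stages deserves a quick sketch: time series data must be split chronologically, never shuffled, or future information leaks into the training set. A minimal illustration (the helper name and the monthly_sales series are hypothetical, not from the book):

```python
import pandas as pd

def chronological_split(series: pd.Series, test_size: int):
    """Hold out the last `test_size` observations for validation/testing.

    Shuffling, as in ordinary cross-validation, would leak future
    information into the training set.
    """
    return series.iloc[:-test_size], series.iloc[-test_size:]

# Example usage: reserve the final 12 periods for evaluation
# train, test = chronological_split(monthly_sales, test_size=12)
```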

  4. Baseline Models: Baseline models provide simple benchmarks or reference points for more complex models. They set a minimum level of predictive performance that more sophisticated models are expected to surpass. Baselines are essential for judging whether the added complexity of an advanced model is justified by its performance over a simpler approach.
    • Mean or Average Baseline: Project a time series’ future values using the mean of its historical observations. For instance, if you were forecasting the daily temperature, the mean baseline would be the average temperature over a given historical period.
    • Naive Baseline: Forecast the future value using the most recent observation; that is, assume the next value of the series will equal the last observed value.
    • Seasonal Baseline: For time series with a distinct seasonal pattern, forecast future values using the average historical values of the corresponding season.

    Baseline models establish a performance floor and ensure that any advanced model delivers a meaningful improvement over these simple approaches; a minimal sketch of all three appears below.
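Here is that sketch, assuming train is a pandas Series indexed by date (the function names are my own, for illustration only):

```python
import numpy as np
import pandas as pd

def mean_baseline(train: pd.Series, horizon: int) -> np.ndarray:
    # Forecast every future value as the historical mean
    return np.full(horizon, train.mean())

def naive_baseline(train: pd.Series, horizon: int) -> np.ndarray:
    # Forecast every future value as the last observed value
    return np.full(horizon, train.iloc[-1])

def seasonal_baseline(train: pd.Series, future_index: pd.DatetimeIndex) -> pd.Series:
    # Forecast each future month with the historical average for that month
    monthly_means = train.groupby(train.index.month).mean()
    return pd.Series(monthly_means.loc[future_index.month].to_numpy(),
                     index=future_index)
```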

    1. Random Walk Model: For time series forecasting, the random walk model is a straightforward but surprisingly powerful baseline. It assumes that any variation is purely random and that a time series’ next value will equal its most recent observed value. Mathematically, Y(t) = Y(t-1) + E, where Y(t) is the value at time t, Y(t-1) is the most recent observed value, and E is a random error term.

      Key characteristics of the random walk model:

      1. Persistence: According to the model, the present is the best indicator of the future. If the series is non-stationary, the random walk will follow its trend.
      2. Noisy Movements: The random error term E adds randomness, allowing the model to capture noise and short-term fluctuations.
      3. Usefulness as a Baseline: Despite its simplicity, the random walk model can be surprisingly effective for some kinds of time series, particularly those with erratic, unpredictable movements.

      The random walk model is frequently employed as a benchmark when judging whether more intricate models yield appreciable gains in prediction accuracy. If an advanced model cannot beat the random walk, the underlying patterns in the data may be too weak to identify. A short simulation sketch follows.
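In this rough sketch (the series length and noise scale are arbitrary choices, not from the book), we generate a random walk and measure how the last-value benchmark performs on a held-out segment:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a random walk: a starting level plus the cumulative sum
# of random shocks E, i.e. Y(t) = Y(t-1) + E
shocks = rng.normal(loc=0.0, scale=1.0, size=500)
walk = 100 + np.cumsum(shocks)

train, test = walk[:400], walk[400:]

# Random walk forecast: repeat the last observed value for every step
forecast = np.full(len(test), train[-1])

# Mean squared error of the benchmark; an advanced model should beat this
mse = np.mean((test - forecast) ** 2)
print(f"Random walk baseline MSE: {mse:.2f}")
```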

Let’s explore the ‘Economic Indicators’ dataset from Analyze Boston and see what the mean baseline for total international flights at Logan Airport looks like.

The baseline model is built on the historical average of international flights at Logan Airport. Depending on the dataset’s temporal granularity, the mean number of international flights is computed per unit of time, such as per month or per year, and that historical average becomes the forecast for upcoming periods.

The underlying assumption here is that future values will mirror the historical average: the forecast for every period is simply the mean of all past observations. Note that this differs from the random walk formula Y(t) = Y(t-1) + E introduced earlier, which carries forward only the most recent value.
Whether or not the blue and red lines align in the visualization offers a quick read on how well the baseline model performs. Close alignment suggests the model captures the essence of the historical pattern, providing a solid foundation for the more complex models that follow. Deviations, on the other hand, point to shortcomings in the baseline and invite the exploration of more sophisticated forecasting techniques. A rough code sketch of this baseline appears below.
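Here is how that baseline and plot could be produced. The file name and the column names (‘Year’, ‘Month’, ‘logan_intl_flights’) are my assumptions about the dataset’s layout rather than verified facts, so adjust them to match the actual CSV:

```python
import pandas as pd
import matplotlib.pyplot as plt

# ASSUMPTION: file and column names are guesses at the dataset's layout
df = pd.read_csv("economic-indicators.csv")
df["date"] = pd.to_datetime(
    df["Year"].astype(int).astype(str) + "-" + df["Month"].astype(int).astype(str)
)
flights = df.set_index("date")["logan_intl_flights"].sort_index()

# Mean baseline: every period is forecast as the historical average
baseline = flights.mean()

plt.plot(flights.index, flights, color="blue", label="International flights")
plt.axhline(baseline, color="red", label=f"Mean baseline ({baseline:,.0f})")
plt.title("Logan Airport international flights vs. mean baseline")
plt.legend()
plt.show()
```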
