Measuring Seasonality in Additive Models: An In-Depth Guide

Understanding seasonal patterns is crucial when building a marketing mix model (MMM). When we talk about seasonality, we are referring to recurring, periodic fluctuations. We may want to model many kinds of seasonal signals, including recurring holiday cycles and weekly, monthly, or yearly patterns. Additive models give us several ways to measure seasonality. This post discusses three of them: dummy variables, Fourier terms, and Gaussian processes.

Dummy Variables

Dummy or indicator variables are binary variables used to represent categorical data. When applied to seasonality, dummy variables can capture the seasonal effect by assigning a 0 or 1 to each season or period. This may sound complicated, but it’s straightforward. We create a new column in our dataset and give the column a name, such as “Sunday.” Then, for each date in our dataset, we check whether that day is a Sunday. If it is, we assign it a 1. If it isn’t, we assign it a 0.

We can use this technique to model seasonality by creating a binary variable for each period within a season. For example, if monthly seasonality is present in a dataset, we introduce twelve dummy variables (one for each month). Each dummy variable takes the value of 1 if the observation belongs to that specific month and 0 otherwise. If the model also includes an intercept, one of the twelve is typically dropped and serves as the baseline, since the full set would be perfectly collinear with the intercept. Then, in our MMM, we incorporate these dummy variables to account for the seasonal effects.
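As a minimal sketch of the steps above, the snippet below builds a Sunday indicator and a set of monthly dummy variables with pandas. The date range and the column names (`is_sunday`, `month_1`, and so on) are hypothetical choices made for illustration, and dropping January as the baseline assumes the model includes an intercept.

```python
import pandas as pd

# Hypothetical two years of daily dates, purely for illustration.
dates = pd.date_range("2023-01-01", periods=730, freq="D")
df = pd.DataFrame({"date": dates})

# A single indicator: 1 if the date is a Sunday, 0 otherwise.
# pandas encodes Monday as 0 and Sunday as 6.
df["is_sunday"] = (df["date"].dt.dayofweek == 6).astype(int)

# One indicator column per calendar month (month_1 ... month_12).
month_dummies = pd.get_dummies(df["date"].dt.month, prefix="month", dtype=int)

# Assumption: the downstream model has an intercept, so we drop one month
# (January here) to act as the baseline and avoid perfect collinearity.
month_dummies_baseline = month_dummies.drop(columns="month_1")

df = pd.concat([df, month_dummies_baseline], axis=1)
```

Each remaining monthly coefficient is then read relative to the dropped baseline month.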

Dummy variables are helpful to have in your toolkit for several reasons:

They’re incredibly straightforward to implement and interpret. Each dummy variable directly represents a specific period, making the model easy to understand.
They are exceedingly flexible; we can include them for any seasonal pattern we are trying to measure as long as it is separable into discrete periods.
The math behind dummy variables is simple, and they do not add much complexity to our underlying model.

One of the main issues with dummy variables is that, as we add more of them to account for different seasonal periods, we increase the number of columns in our dataset. Having too many features in our dataset can cause issues with overfitting and collinearity, both of which make our models less reliable.

Another issue with dummy variables is that, in MMMs, our media may have differing effects depending on seasonal factors. For example, if we run a swimsuit company, we may have increased demand during the summer months. Suppose we add dummy variables for June, July, and August. Adding dummy variables for these months might incorrectly attribute increased revenue to the months rather than the marketing efforts.

This credit assignment issue can mislead the model into giving undue credit to the time period rather than the marketing activities that occurred during that time. Luckily, we can turn our dummy variable into an interaction term. An interaction term is the product of two variables; here, we multiply our dummy variable by our marketing spend. The interaction term helps answer the question: how does marketing spend relate to revenue when that spend happens during the summer? Again, this is simple to implement and relatively simple to interpret.
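To make the interaction term concrete, here is a small sketch using NumPy. The data is simulated and every number in it (spend range, coefficients, noise level) is made up for illustration; it is not drawn from any real campaign. The regression is an ordinary least-squares fit, used here only as a stand-in for a full MMM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly data: three years of media spend.
n_months = 36
month = np.arange(n_months) % 12 + 1            # 1..12, repeating
spend = rng.uniform(50.0, 150.0, size=n_months)

# Summer dummy: 1 for June, July, August; 0 otherwise.
is_summer = np.isin(month, [6, 7, 8]).astype(float)

# Interaction term: spend counts only when the summer dummy is active.
spend_x_summer = spend * is_summer

# Simulated revenue in which spend is more effective in summer
# (all coefficients below are invented for this example).
revenue = (
    200.0
    + 1.0 * spend
    + 30.0 * is_summer
    + 0.5 * spend_x_summer
    + rng.normal(0.0, 5.0, size=n_months)
)

# Fit intercept, spend, summer dummy, and the interaction by least squares.
X = np.column_stack([np.ones(n_months), spend, is_summer, spend_x_summer])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
intercept, b_spend, b_summer, b_interaction = coef
```

In this setup, `b_spend` estimates the effect of spend outside summer, while `b_spend + b_interaction` estimates its effect during summer.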

Mathematically, the seasonal component $S_t$ can be represented as:

$S_t = \sum_{i=1}^{k} \beta_i D_{i,t}$

where:

$k$ is the number of periods in the season (e.g., 12 for monthly data)
$\beta_i$ is the coefficient for the $i$ -th dummy variable
$D_{i,t}$ is the dummy variable for the $i$ -th period.

Fourier Terms

Fourier terms leverage sine and cosine functions to model seasonality. Sine and cosine waves are shapes that oscillate above and below a central axis. Fourier terms combine these sine and cosine waves of different sizes and frequencies to measure seasonal changes. When we combine the sine and cosine functions, we can mimic the seasonal ups and downs seen in the data.

To model seasonality using Fourier terms, the seasonal component is expressed as a sum of sine and cosine functions with varying frequencies. The number of frequencies (harmonics) used depends on the complexity of the seasonal pattern. The Fourier terms are then included in the additive model to account for the seasonal variation.

An excellent way to think about this transformation is to start with some unknown seasonality. For something to be seasonal it must repeat, and seasonal patterns can be complex. We aim to break this complex signal into pieces to model it effectively. To do this, we use harmonics. Harmonics are the building blocks of a complex signal, consisting of frequency components that, when summed together, recreate the original signal.

When using Fourier terms, we must provide the period we want to consider and the number of harmonics we want to include. For example, if we are modeling yearly seasonality in monthly data, we would set our period equal to 12 to represent 12 months. Once we’ve done that, we can focus on choosing the appropriate number of harmonics. We use the variable $K$ to represent the number of harmonics.

The first harmonic we include is called the fundamental frequency. For yearly seasonality in monthly data, the fundamental frequency corresponds to a cycle that repeats once per year, or $\frac{1}{12}$ cycles per month. The second harmonic captures patterns that repeat twice per year ($\frac{2}{12}$ cycles per month). The third harmonic captures patterns that repeat three times per year ($\frac{3}{12}$ cycles per month), and so on. We can add more harmonics by increasing the value of $K$, but this increases the complexity of the model and the risk of overfitting.

Once we have specified the period and the number of harmonics, we can estimate the amplitude of the waves. As in other forms of regression, this is done by placing coefficients in front of our sine and cosine terms, representing the strength of each seasonal effect.
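The sketch below shows one way to build Fourier terms and estimate their coefficients with ordinary least squares in NumPy. The helper name `fourier_features` and the synthetic signal are assumptions made for illustration; a real MMM would estimate these coefficients jointly with the media terms.

```python
import numpy as np

def fourier_features(t, period, n_harmonics):
    """Return a matrix with a sine and a cosine column for each harmonic.

    Column order is sin(1), cos(1), sin(2), cos(2), ...
    """
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Hypothetical monthly index covering four years.
t = np.arange(48)
X = fourier_features(t, period=12, n_harmonics=2)

# Synthetic, noise-free target built from the first and second harmonics.
y = 10.0 + 3.0 * np.sin(2 * np.pi * t / 12) + 1.5 * np.cos(2 * np.pi * 2 * t / 12)

# Add an intercept column and estimate the amplitudes by least squares.
design = np.column_stack([np.ones(len(t)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# The fitted seasonal component excludes the intercept.
seasonal = X @ coef[1:]
```

Because the target here is noise-free and built from the same harmonics, the fit recovers the amplitudes used to generate it; with real data the estimates would carry noise.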

Unlike dummy variables, Fourier terms provide a compact representation of seasonality. They typically require fewer parameters than dummy variables, especially when a season contains many periods. Because they are built from sine and cosine waves, Fourier terms also give us a continuous representation of seasonality. Finally, they can efficiently handle multiple seasonal cycles (e.g., weekly and yearly seasonality); all we have to do is add a set of terms with a different period.

Even though Fourier terms have a lot of strengths, they can also be more involved than dummy variables. Choosing the appropriate number of harmonics can be challenging. Too few harmonics may cause us to fail to capture the seasonal pattern accurately, and too many harmonics can lead to overfitting. Fourier terms also aren’t as easy to interpret as dummy variables. Understanding what specific period each wave corresponds to is nontrivial. As I said, having smoothed continuous patterns is a nice feature of Fourier terms; however, this also means they struggle to capture irregular seasonality.

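Mathematically, the seasonal component $S_t$ can be represented as:

$S_t = \sum_{k=1}^{K} \left[ a_k \cos\left( \frac{2\pi k t}{T} \right) + b_k \sin\left( \frac{2\pi k t}{T} \right) \right]$

where: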

$K$ is the number of harmonics
$a_k$ and $b_k$ are the Fourier coefficients
$T$ is the period of the seasonality (e.g., 12 for monthly data).

Gaussian Processes

Gaussian Processes (GPs) offer a flexible, non-parametric approach to modeling seasonality. GPs can capture a wide range of seasonal patterns by learning the underlying structure directly from the data.

Gaussian Processes offer many of the same benefits as Fourier terms, except with even more flexibility. GPs allow us to model a function that can take infinite possible shapes. Instead of specifying a single function, a GP provides a distribution over functions. We have an endless number of candidate functions, and each function has its own probability of being the best candidate for our model.

Inside a GP is something called a covariance function, otherwise known as a kernel. This function tells us how two points co-vary with one another (how much they influence each other). The covariance function determines the properties of the functions drawn from the GP. For example, it determines how smooth the functions are or whether they have repeating patterns (like sine or cosine waves). The choice of covariance function therefore dictates the properties of the seasonality.

Luckily for us, some commonly used kernels excel at modeling seasonality. These include periodic kernels, which explicitly model repeating patterns. The GP then learns the seasonal structure by optimizing the kernel parameters based on the data. Similar to our Fourier terms, we have a period that determines the distance between peaks. We also have a length scale, which controls how quickly the function can vary within each period (roughly, the width of the peaks).

Unlike Fourier terms, GPs can model various seasonal patterns, including irregular seasonality, because kernels can be customized. This makes them a flexible alternative, albeit a more involved one. Another benefit of GPs is that they provide uncertainty estimates for the seasonal component, something we don’t typically get from Fourier terms. These uncertainty estimates help us understand the confidence in predictions and identify regions of high variability. Finally, GPs do not have a fixed number of parameters, which allows the model to adapt directly to the data.

While GPs offer tremendous flexibility, they are much more involved than the other methods I mentioned. Selecting appropriate kernels can be challenging, and choosing the wrong kernel can lead to poor model performance because the GP would fail to capture the seasonal patterns accurately. GPs are also computationally intensive, especially with a large number of data points: the cost of exact GP inference grows cubically with the number of observations.

Mathematically, the seasonal component $S_t$ is modeled as:

$S_t \sim \mathcal{GP}(m(t), k(t, t'))$

where:

$m(t)$ is the mean function
$k(t, t')$ is the covariance function (kernel).

A commonly used kernel for periodic seasonality is the periodic kernel:

$k(t, t') = \sigma^2 \exp\left( -\frac{2\sin^2(\pi (t - t') / T)}{\ell^2} \right)$

where:

$\sigma^2$ is the signal variance
$\ell$ is the length-scale parameter
$T$ is the period of the seasonality.
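As a rough illustration, the periodic kernel above can be written in a few lines of NumPy and plugged into the standard zero-mean GP regression equations. Everything specific here is an assumption for the example: the function name `periodic_kernel`, the fixed hyperparameters, the simulated data, and the known noise variance. In practice the hyperparameters would be learned from the data, usually with a dedicated GP or probabilistic programming library.

```python
import numpy as np

def periodic_kernel(t1, t2, sigma=1.0, length_scale=1.0, period=12.0):
    """Periodic kernel: sigma^2 * exp(-2 * sin^2(pi * (t - t') / T) / l^2)."""
    t1 = np.asarray(t1, dtype=float)[:, None]
    t2 = np.asarray(t2, dtype=float)[None, :]
    s = np.sin(np.pi * (t1 - t2) / period)
    return sigma**2 * np.exp(-2.0 * s**2 / length_scale**2)

rng = np.random.default_rng(0)

# Simulated monthly data: three years of a noisy yearly cycle.
t_train = np.arange(36, dtype=float)
y_train = 2.0 * np.sin(2 * np.pi * t_train / 12) + rng.normal(0.0, 0.3, size=36)

# Assumption: observation noise variance is known.
noise_var = 0.3**2

# Predict the following twelve months.
t_test = np.arange(36, 48, dtype=float)

K_train = periodic_kernel(t_train, t_train) + noise_var * np.eye(len(t_train))
K_cross = periodic_kernel(t_test, t_train)
K_test = periodic_kernel(t_test, t_test)

# Zero-mean GP posterior, computed through a Cholesky factorization.
L = np.linalg.cholesky(K_train)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
post_mean = K_cross @ alpha

v = np.linalg.solve(L, K_cross.T)
post_cov = K_test - v.T @ v
post_std = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))
```

Here `post_mean` is the estimated seasonal component for the next year and `post_std` is its pointwise uncertainty, which is the uncertainty estimate discussed above.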

Practical Considerations

When we look to model seasonality, we must first understand the nature of the seasonal pattern. Do we expect a regular pattern or an irregular pattern? How many cycles do we expect to see? Having enough data to capture the effect we’re looking for is also paramount. For example, if we want to model yearly seasonality, we will need more than a year’s worth of data.

Interpretability is crucial in MMMs. While Fourier terms and GPs have many benefits, they can be more challenging to interpret and explain to downstream consumers.

Leveraging Fourier terms provides an excellent middle ground and a great place to start. In additive models, we can directly see the patterns and visually inspect them to see if they fit our intuition. One of the most important things to remember is to perform backtests when selecting a seasonal method and deciding which seasonal terms to include. Performing backtests helps us detect potential overfitting and track how well our models generalize.
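One simple way to run such a backtest is a rolling-origin evaluation: fit on the data before a cutoff, score on the window that follows, then move the cutoff forward. The sketch below compares different numbers of harmonics this way. The function names, the fold layout, and the simulated weekly series are all assumptions for illustration, and the least-squares fit stands in for whatever model is actually being evaluated.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Design matrix with an intercept plus sine/cosine pairs."""
    t = np.asarray(t, dtype=float)
    columns = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns += [np.sin(angle), np.cos(angle)]
    return np.column_stack(columns)

def backtest_rmse(y, period, n_harmonics, n_folds=3, horizon=26):
    """Rolling-origin backtest: train before each cutoff, test on the next window."""
    t = np.arange(len(y))
    errors = []
    for fold in range(n_folds):
        cutoff = len(y) - (n_folds - fold) * horizon
        X_train = fourier_design(t[:cutoff], period, n_harmonics)
        X_test = fourier_design(t[cutoff:cutoff + horizon], period, n_harmonics)
        coef, *_ = np.linalg.lstsq(X_train, y[:cutoff], rcond=None)
        errors.append(y[cutoff:cutoff + horizon] - X_test @ coef)
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))

# Simulated weekly series (four years) with two true harmonics plus noise.
rng = np.random.default_rng(1)
t = np.arange(208)
y = (
    100.0
    + 10.0 * np.sin(2 * np.pi * t / 52)
    + 4.0 * np.cos(2 * np.pi * 2 * t / 52)
    + rng.normal(0.0, 1.0, size=len(t))
)

# Out-of-sample RMSE for each candidate number of harmonics.
scores = {k: backtest_rmse(y, period=52, n_harmonics=k) for k in range(1, 7)}
best_k = min(scores, key=scores.get)
```

Out-of-sample error usually stops improving, or gets worse, once the extra harmonics start fitting noise, which is the overfitting signal we are looking for.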

Final Thoughts

Incorporating seasonality into additive models is a fundamental step in time series analysis. Analysts and data scientists can make informed choices that enhance their forecasts’ accuracy and reliability by understanding the different methods available and their respective pros and cons. Whether using the simplicity of dummy variables, the elegance of Fourier terms, or the flexibility of Gaussian processes, the goal remains to effectively capture and leverage the seasonal patterns inherent in the data.