Unraveling Time Series Causality with Granger Methods

Ziyi Zhu / December 11, 2022

11 min read

This article discusses causal illusions as a form of cognitive bias and explores the use of Granger causality to detect causal structures in time series. It is common practice to analyse (linear) structure, estimate linear models and perform forecasts based on single stationary time series. However, the world does not consist of independent stochastic processes. In accordance with general equilibrium theory, economists usually assume that everything depends on everything else. Therefore, it is important to understand and quantify the (causal) relationships between different time series.

Epiphenomena

Epiphenomena are a class of causal illusions in which the direction of the causal relationship is ambiguous. For example, when you spend time on the bridge of a ship with a large compass in front of you, you can easily develop the impression that the compass is directing the ship rather than merely reflecting its direction. Here is an image that perfectly illustrates the point that correlation is not causation:

[Image: correlation is not causation]

Nassim Nicholas Taleb explored this concept in his book Antifragile to highlight the causal illusion that universities generate wealth in society. He presented a miscellany of evidence suggesting that classroom education does not lead to wealth so much as it comes from wealth (an epiphenomenon). Taleb proposes that antifragile risk-taking, rather than education and formal, organized research, is largely responsible for innovation and growth. This does not mean that theories and research play no role; rather, it shows that we are fooled by randomness into overestimating the role of good-sounding ideas. Because of cognitive biases, historians are prone to epiphenomena and other illusions of cause and effect.

In economic and financial time series, epiphenomena are particularly common. For instance, the relationship between interest rates and inflation often appears causal in both directions—do higher interest rates cause lower inflation, or does higher inflation prompt central banks to raise interest rates? Without proper analysis, we might misattribute causality in the wrong direction, leading to flawed policy decisions.

We can debunk epiphenomena in cultural discourse and consciousness by looking at the sequence of events and checking their order of occurrence. This method was refined by Clive Granger, who developed a rigorous statistical approach for establishing causation by examining time series sequences and measuring what is now called "Granger causality."

Granger Causality

Granger causality is fundamentally based on prediction. The intuition is straightforward: if a variable $X$ helps predict the future values of variable $Y$ beyond what $Y$'s own history can predict, then $X$ is said to "Granger cause" $Y$. This concept, developed by Nobel laureate Clive Granger in 1969, provides a statistical framework for determining causal relationships between time series.

In the following, we present the definition of Granger causality and the different possibilities of causal events resulting from it. Consider two weakly stationary time series $x$ and $y$:

  1. Granger Causality: $x$ is (simply) Granger causal to $y$ if and only if the application of an optimal linear prediction function leads to a smaller forecast error variance for the future value of $y$ when current and past values of $x$ are used.

  2. Instantaneous Granger Causality: $x$ is instantaneously Granger causal to $y$ if and only if the application of an optimal linear prediction function leads to a smaller forecast error variance for the future value of $y$ when the future value of $x$ is used in addition to the current and past values of $x$.

  3. Feedback: There is feedback between $x$ and $y$ if $x$ is causal to $y$ and $y$ is causal to $x$. Feedback is only defined for the case of simple causal relations.

To illustrate with an example: consider GDP growth ($x$) and unemployment rate ($y$). If including past values of GDP growth in our model helps us better predict future unemployment rates compared to just using past unemployment data, then GDP growth "Granger causes" unemployment. This doesn't necessarily mean GDP growth directly causes unemployment in the philosophical sense, but rather that it contains information useful for predicting unemployment.
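
To make this concrete, here is a minimal simulation sketch (the coefficients and seed are made up for illustration) of a pair of series in which $x$ drives $y$ with a one-period lag, so a test should detect Granger causality from $x$ to $y$ but not the reverse:

import numpy as np

# Toy data: x is an AR(1) process; y depends on its own past and on the
# previous value of x, so x should Granger-cause y but not vice versa.
rng = np.random.default_rng(42)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()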

Hypothesis Testing in General Linear Models

Before diving into specific Granger causality tests, it's important to understand the general framework of hypothesis testing in linear models, as this forms the statistical foundation for detecting causal relationships.

Consider the general linear model $\mathbf{Y} = X \boldsymbol{\beta} + \varepsilon$, where $\mathbf{Y}$ is the dependent variable, $X$ is the matrix of explanatory variables, $\boldsymbol{\beta}$ is the coefficient vector, and $\varepsilon$ is the error term. In hypothesis testing, we want to know whether certain variables influence the result. If, say, the variable $x_1$ does not influence $\mathbf{Y}$, then we must have $\beta_1 = 0$. So the goal is to test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. We will tackle a more general case, where $\boldsymbol{\beta}$ can be split into two vectors $\boldsymbol{\beta}_0$ and $\boldsymbol{\beta}_1$, and we test whether $\boldsymbol{\beta}_1$ is zero.

Suppose $\underset{n \times p}{X} = \left(\underset{n \times p_0}{X_0} \;\; \underset{n \times (p - p_0)}{X_1}\right)$ and $\boldsymbol{\beta} = \left(\begin{array}{c}\boldsymbol{\beta}_0 \\ \boldsymbol{\beta}_1\end{array}\right)$, where $\operatorname{rank}(X) = p$ and $\operatorname{rank}(X_0) = p_0$. We want to test $H_0: \boldsymbol{\beta}_1 = 0$ against $H_1: \boldsymbol{\beta}_1 \neq 0$. Under $H_0$, $X_1 \boldsymbol{\beta}_1$ vanishes and

$$\mathbf{Y} = X_0 \boldsymbol{\beta}_0 + \varepsilon$$

Under $H_0$, the maximum likelihood estimates (MLEs) of $\boldsymbol{\beta}_0$ and $\sigma^2$ are

$$\begin{aligned} \hat{\hat{\boldsymbol{\beta}}}_0 &= \left(X_0^T X_0\right)^{-1} X_0^T \mathbf{Y} \\ \hat{\hat{\sigma}}^2 &= \frac{1}{n}\left(\mathbf{Y} - X_0 \hat{\hat{\boldsymbol{\beta}}}_0\right)^T\left(\mathbf{Y} - X_0 \hat{\hat{\boldsymbol{\beta}}}_0\right) = \frac{\mathrm{RSS}_0}{n} \end{aligned}$$

and we have previously shown these are independent. So the fitted values under $H_0$ are

$$\hat{\hat{\mathbf{Y}}} = X_0\left(X_0^T X_0\right)^{-1} X_0^T \mathbf{Y} = P_0 \mathbf{Y}$$

where $P_0 = X_0\left(X_0^T X_0\right)^{-1} X_0^T$.
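
The matrix $P_0$ is the orthogonal projection onto the column space of $X_0$, which we can sanity-check numerically; here is a small sketch with an arbitrary random design matrix:

import numpy as np

# P_0 is a projection matrix, so it should be symmetric and idempotent.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 3))
P0 = X0 @ np.linalg.inv(X0.T @ X0) @ X0.T

assert np.allclose(P0, P0.T)     # symmetric
assert np.allclose(P0 @ P0, P0)  # idempotent: projecting twice changes nothing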

Note that our estimators wear two hats instead of one. We adopt the convention that the estimators of the null hypothesis have two hats, while those of the alternative hypothesis have one. This notation helps distinguish between the restricted model (under $H_0$) and the unrestricted model (under $H_1$).

The generalized likelihood ratio test of $H_0$ against $H_1$ is

$$\begin{aligned} \Lambda_{\mathbf{Y}}\left(H_0, H_1\right) &= \left(\frac{\hat{\hat{\sigma}}^2}{\hat{\sigma}^2}\right)^{n/2} \\ &= \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)^{n/2} \end{aligned}$$

We reject $H_0$ when $2 \log \Lambda$ is large, equivalently when $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}$ is large. Under $H_0$, we have

$$2 \log \Lambda = n \log\left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)$$

which is approximately a $\chi^2_{p - p_0}$ random variable. We can also derive an exact null distribution, which gives an exact test. The $F$ statistic under $H_0$ is given by

$$F = \frac{\left(\mathrm{RSS}_0 - \mathrm{RSS}\right)/\left(p - p_0\right)}{\mathrm{RSS}/(n - p)} \sim F_{p - p_0,\, n - p}$$

Hence we reject $H_0$ if $F > F_{p - p_0,\, n - p}(\alpha)$. $\mathrm{RSS}_0 - \mathrm{RSS}$ is the reduction in the sum of squares due to fitting $\boldsymbol{\beta}_1$ in addition to $\boldsymbol{\beta}_0$. This quantity measures how much the additional variables improve the fit of the model.
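
As a quick numerical illustration (all numbers here are hypothetical), the statistic and its p-value can be computed directly from the two residual sums of squares:

from scipy import stats

# Hypothetical inputs: sample size, parameter counts, and the residual
# sums of squares of the full (RSS) and restricted (RSS_0) models.
n, p, p_0 = 200, 3, 2
RSS, RSS_0 = 100.0, 120.0

F = ((RSS_0 - RSS) / (p - p_0)) / (RSS / (n - p))
p_value = stats.f.sf(F, dfn=p - p_0, dfd=n - p)
print(F, p_value)  # reject H_0 at level alpha if p_value < alpha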

The following ANOVA table summarizes the hypothesis testing framework:

| Source of variation | d.f. | Sum of squares | Mean squares |
| --- | --- | --- | --- |
| Fitted model | $p - p_0$ | $\mathrm{RSS}_0 - \mathrm{RSS}$ | $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{p - p_0}$ |
| Residual | $n - p$ | $\mathrm{RSS}$ | $\frac{\mathrm{RSS}}{n - p}$ |
| Total | $n - p_0$ | $\mathrm{RSS}_0$ | |

The ratio $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}_0}$ is sometimes known as the proportion of variance explained by $\boldsymbol{\beta}_1$, and denoted $R^2$. This is a measure of how much additional explanatory power the variables in $X_1$ provide.

The following Python function implements the fitting procedure for a time series model:

import numpy as np

def fit(data, p=1):
    # Fit an AR(p) model with intercept by ordinary least squares, which
    # coincides with the MLE of the coefficients under Gaussian errors.
    n = data.shape[0] - p
    Y = data[p:]
    # Design matrix: a column of ones plus the p lagged copies of the series.
    X = np.stack([np.ones(n)] + [data[p - i - 1 : -i - 1] for i in range(p)], axis=-1)

    beta_mle = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(Y))
    R = Y - X.dot(beta_mle)   # residuals
    RSS = R.T.dot(R)          # residual sum of squares
    var_mle = RSS / n         # MLE of the error variance

    return beta_mle, var_mle, RSS

This function takes a time series and a lag order p, and returns the MLE of the coefficients, the error variance, and the residual sum of squares (RSS). It constructs the design matrix X by stacking the lagged values of the time series together with a column of ones for the intercept.
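
For example, applied to the series y from the earlier simulation sketch (a hypothetical call, assuming y is a 1-D NumPy array):

# Fit an AR(2) model to the simulated series y.
beta, var, rss = fit(y, p=2)
print(beta)  # [intercept, lag-1 coefficient, lag-2 coefficient]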

Causality Tests

Now that we understand the general hypothesis testing framework, we can apply it specifically to test for Granger causality. The process involves fitting two models—one with and one without the potential causal variable—and comparing their performance.

To test for simple causality from $x$ to $y$, we examine whether the lagged values of $x$, in a regression of $y$ on lagged values of both $x$ and $y$, significantly reduce the forecast error variance. Using the ordinary least squares (OLS) method, the following equation is estimated:

$$y_t = \alpha_0 + \sum_{k=1}^{k_1} \alpha_{11}^{k} y_{t-k} + \sum_{k=k_0}^{k_2} \alpha_{12}^{k} x_{t-k} + u_{1,t}$$

with $k_0 = 1$. Here, $y_t$ is the current value of time series $y$, $\alpha_0$ is the intercept, $y_{t-k}$ are the lagged values of $y$, $x_{t-k}$ are the lagged values of $x$, and $u_{1,t}$ is the error term. The first summation represents the autoregressive (AR) component of $y$, while the second summation captures the potential causal effect of $x$ on $y$.

An $F$ test is applied to test the null hypothesis $H_0: \alpha_{12}^{1} = \alpha_{12}^{2} = \cdots = \alpha_{12}^{k_2} = 0$. This hypothesis states that the lagged values of $x$ have no effect on $y$. If we reject this hypothesis, we conclude that $x$ Granger-causes $y$.

By interchanging $x$ and $y$, we can test whether $y$ Granger-causes $x$. If the null hypothesis is rejected in both directions, we have a feedback relationship in which each variable Granger-causes the other.

To test for instantaneous causality, we set $k_0 = 0$ and perform a $t$ or $F$ test of the null hypothesis $H_0: \alpha_{12}^{0} = 0$. This tests whether the contemporaneous value of $x$ helps predict $y$ beyond what the lagged values of both variables can predict.

Here's a Python function that implements these causality tests:

import numpy as np
from scipy import stats

def causality_tests(y, x, alpha=0.05, k_1=1, maxlag=1):
    for k_2 in range(1, maxlag + 1):
        p = 1 + k_1 + k_2                    # parameters in the unrestricted model
        n = y.shape[0] - np.max([k_1, k_2])  # usable observations after lagging

        _, _, RSS = fit_xy(y, x, k_1, k_2)   # unrestricted: lags of y and of x
        _, _, RSS_0 = fit(y, k_1)            # restricted: lags of y only

        chi2 = n * np.log(RSS_0 / RSS)                # likelihood ratio statistic
        f = ((RSS_0 - RSS) / k_2) / (RSS / (n - p))   # F statistic

        p_chi2 = stats.chi2.sf(chi2, df=k_2)
        p_f = stats.f.sf(f, dfn=k_2, dfd=n - p)
        verdict = "Reject" if p_f < alpha else "Accept"
        print(f"k_2={k_2}: F={f:.4f} (p={p_f:.4f}), chi2={chi2:.4f} (p={p_chi2:.4f})")
        print(verdict, "null hypothesis")

The function computes both the chi-square and $F$ statistics for testing Granger causality. The fit_xy function (not shown) fits the unrestricted model including lags of both $y$ and $x$, while the fit function fits the restricted model with lags of $y$ only.
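
Since fit_xy is not shown in the original, here is a minimal sketch of what it might look like, assuming the same alignment conventions as fit above (the implementation details are my assumptions, not the author's code):

import numpy as np

def fit_xy(y, x, k_1, k_2):
    # Assumed implementation: regress y_t on an intercept, k_1 lags of y,
    # and k_2 lags of x, mirroring the layout of `fit` above.
    m = max(k_1, k_2)
    n = y.shape[0] - m
    Y = y[m:]
    cols = [np.ones(n)]
    cols += [y[m - i - 1 : -i - 1] for i in range(k_1)]  # lags of y
    cols += [x[m - j - 1 : -j - 1] for j in range(k_2)]  # lags of x
    X = np.stack(cols, axis=-1)

    beta_mle = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(Y))
    R = Y - X.dot(beta_mle)
    RSS = R.T.dot(R)
    return beta_mle, RSS / n, RSS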

One critical issue with this test is that the results depend strongly on the number of lags of the explanatory variable, $k_2$. There is a trade-off: the more lagged values we include, the better the influence of this variable can be captured, which argues for a high maximal lag. On the other hand, the more lagged values are included, the lower the power of the test, since we are estimating more parameters from the same amount of data.

In practice, information criteria like AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) are often used to determine the optimal lag length. Alternatively, practitioners may test a range of lag specifications to ensure the robustness of their findings.
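
In practice you would rarely hand-roll these tests; for example, statsmodels ships an implementation that computes both statistics across a range of lags. A short sketch, assuming x and y are the NumPy arrays from the earlier simulation:

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Column order matters: the test asks whether the second column
# Granger-causes the first.
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=4)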

Application to Financial Data

We can fit a linear model on real financial data and use the Granger causality test to find causal relationships. The graph below shows a visual representation of the relationship between two financial time series:

[Image: the relationship between two financial time series]

The results of Granger causality tests with different lag specifications are:

Hypothesis Test (p=1)
F test:       F=24.4610 , p=0.0000, df_denom=199, df_num=1
chi2 test: chi2=23.3023 , p=0.0000, df=1
Reject null hypothesis

Hypothesis Test (p=2)
F test:       F=8.2501  , p=0.0045, df_denom=197, df_num=1
chi2 test: chi2=8.2051  , p=0.0042, df=1
Reject null hypothesis

Hypothesis Test (p=3)
F test:       F=0.4181  , p=0.5187, df_denom=195, df_num=1
chi2 test: chi2=0.4262  , p=0.5139, df=1
Accept null hypothesis

Hypothesis Test (p=4)
F test:       F=4.9506  , p=0.0272, df_denom=193, df_num=1
chi2 test: chi2=5.0148  , p=0.0251, df=1
Reject null hypothesis

These results indicate that the null hypothesis of no Granger causality is rejected for lags p=1p=1, p=2p=2, and p=4p=4, but not for lag p=3p=3. This suggests a complex causal relationship between the two time series that manifests at different time scales. The inconsistency across lag specifications highlights the importance of considering multiple lag structures when drawing conclusions about causal relationships.

In financial markets, such findings can have significant implications for portfolio management, risk assessment, and trading strategies. For instance, if stock returns in one market Granger-cause returns in another market, this information could potentially be exploited for predictive purposes, though always within the constraints of market efficiency.