Hodrick (1992) Standard Errors (IB)¶

This post shows how to correct standard errors for a forward predictive regression with overlapping observations as in Hodrick (1992) in R. For more details, please refer to Alex Chinco.

1. Predictive regression analysis with OLS¶

A typical standard predictive regression model for analyzing aggregate stock market return predictability is given by:

\[ r_{t\rightarrow t+h} = \alpha + \beta\cdot x_t+\varepsilon_{t\rightarrow t+h} \]

where \(R_{t\rightarrow t+h}\) is the \(h\)-period ahead cumulative excess market return from period \(t\) to \(t + h\), and \(x_t\) is a predictor. We can set \(h=1\) and vetorize the above equation:

\[ \underbrace{\left[\begin{array}{c} r_{1 \rightarrow 2} \\ r_{2 \rightarrow 3} \\ r_{3 \rightarrow 4} \\ \vdots \\ r_{(T-1) \rightarrow T} \end{array}\right]}_{R_T(1)}=\underbrace{\left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_{T-1} \end{array}\right]}_{X_{T-1}} \underbrace{\left(\begin{array}{c} \alpha \\ \beta \end{array}\right)}_{\Theta(1)}+\underbrace{\left[\begin{array}{c} \varepsilon_{1 \rightarrow 2} \\ \varepsilon_{2 \rightarrow 3} \\ \varepsilon_{3 \rightarrow 4} \\ \vdots \\ \varepsilon_{(T-1) \rightarrow T} \end{array}\right]}_{\mathcal{E}_{T}(1)}. \]

Because the error term \(\varepsilon_{T}(1)\) is assumed to be \(i.i.d.\), we can simply estimate \(\Theta(1)\) with the OLS and use traditional \(t\)-statistic to make statistical inferences.

2. Econometric issues¶

However, for \(h>1\), even though each of the \(\varepsilon_{t \to (t+1)}\) terms is distributed \(i.i.d.\) and act as white noise, the \(\varepsilon_{t \to (t+2)}\) and \(\varepsilon_{(t+1) \to (t+3)}\) terms each contain the \(\varepsilon_{(t+1) \to (t+2)}\) shock. We can clearly see this issue from vectorizing the equation:

\[ \underbrace{\left[\begin{array}{c} r_{1 \rightarrow 3} \\ r_{2 \rightarrow 4} \\ r_{3 \rightarrow 5} \\ \vdots \\ r_{(T-2) \rightarrow T} \end{array}\right]}_{R_T(2)}=\underbrace{\left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_{T-1} \end{array}\right]}_{X_{T-1}} \underbrace{\left(\begin{array}{c} \alpha \\ \beta \end{array}\right)}_{\Theta(2)}+\underbrace{\left[\begin{array}{c} \varepsilon_{1 \rightarrow 3} \\ \varepsilon_{2 \rightarrow 4} \\ \varepsilon_{3 \rightarrow 5} \\ \vdots \\ \varepsilon_{(T-2) \rightarrow T} \end{array}\right]}_{\mathcal{E}_{T}(2)}. \]

3. Hodrick (1992) solution¶

The asymptotic distribution of the OLS estimator \(\widehat\Theta(h)\) can be derived using GMM:

\[ \widehat\Theta(h)-\Theta(h)\sim N(0,\Sigma) \]

with the variance covariance matrix given by the expression: ¹

\[ \Sigma = \frac{1}{T-h}(XX^\prime)^{-1}\cdot S \cdot (XX^\prime)^{-1}. \]

Hodrick (1992) proposes a new estimator to correct for the autocorelation between the error term when the \(h>1\):

\[ S=\frac{1}{T}\sum_{t=h}^{T}\left[e_{t+1}^2\left(\sum_{i=0}^{h-1} X_{t-i}\right)\left(\sum_{i=0}^{h-1} X_{t-i}\right)^{\prime}\right], \]

where \(e_{t+1}\) is the serially uncorrelated one-step-ahead forecast error estimated from the residuals of a regression of \(r_{t+1}\) on a constant, and \(X_t=[1,\ x_t]\).

By forming

\[ wk_t = e_{t+1}\left(\sum_{i=0}^{h-1} X_{t-i}\right), \]

the covariance matrix \(S\) can be written as:

\[ S = \frac{1}{T}\sum_{t=h}^{T} wk_t wk_t^\prime. \]

4. Hodrick standard errors in R¶

4.1 Prerequisite¶

library(data.table)
library(sandwich)
library(lmtest)
library(knitr)

4.2 Break down Hodrick standard errors¶

In this subsection, we break down Hodrick standard errors into \(wk_t\) and \(S\).

4.2.1 \(wk\)¶

compute_wk_t <- function(t, h){
  wk_t <- matrix(0, nrow = K, ncol = 1)
  XX <- matrix(0, nrow = K, ncol = 1)
  for (i in 0:(h-1)){
    XX <- XX + x.mat[t-i,]
  }
  wk_t <- ee.mat[t] * XX
  return(wk_t)
}

4.2.2 \(S\)¶

compute_S <- function(h){
  S <- matrix(0, nrow = K, ncol = K)
  for (t in h:T){
    S <- S + (compute_wk_t(t,h) %*% t(compute_wk_t(t,h)))
  }
  S <- S / T
  return(S)
}

4.2.3 Hodrick variance-covariance matrix¶

hodrick1992vcov.forward <- function(x.var, r.var.ahead, h){
  x.mat <- as.matrix(x.var)
  r.mat <- as.matrix(r.var.ahead)
  # 1. Construct demeaned returns or one-period residuals
  ee.mat <- r.mat - colMeans(r.mat) 
  # ee.mat <- as.matrix(lm(r.var.ahead~1)$residuals) # equivalent to the demeaned returns
  x.mat <- cbind(1, x.mat)  # add the constant
  T <- nrow(x.mat)
  K <- ncol(x.mat)
  Exx <- t(x.mat) %*% x.mat / T # compute average of square  (1/T) * (X'X)
  b <- solve(t(x.mat) %*% x.mat) %*% t(x.mat) %*% r.mat


  compute_wk_t <- function(t, h){
    wk_t <- matrix(0, nrow = K, ncol = 1)
    XX <- matrix(0, nrow = K, ncol = 1)
    for (i in 0:(h-1)){
      XX <- XX + x.mat[t-i,]
    }
    wk_t <- ee.mat[t] * XX
    return(wk_t)
  }

  compute_S <- function(h){
    S <- matrix(0, nrow = K, ncol = K)
    for (t in h:T){
      S <- S + (compute_wk_t(t,h) %*% t(compute_wk_t(t,h)))
    }
    S <- S / T
    return(S)
  }


  S <- compute_S(h)
  vcov_hodrick <- (1 / T) * solve(Exx) %*% S %*% solve(Exx)
  std_hodrick <- sqrt(diag(vcov_hodrick))
  return(list(b = b, vcov_hodrick = vcov_hodrick, std_hodrick = std_hodrick, Nobs = T))
}

4.3 Simulation¶

simulate_overlapping_data <- function(n_of_years = 100, delta_t = 1/12){
  MU <- 0.08
  THETA <- 0.75
  SIGMA <- 0.16


  n_of_periods <- n_of_years / delta_t

  simulate_dt <- data.table(
    t = seq(1, n_of_periods),
    x_t = rnorm(n_of_periods, mean = 0, sd = 1),
    r_t = NA_real_,
    r_t_plus_1 = NA_real_
  )
  #< Set the return of the beginning period to the MU.
  set(simulate_dt, i = 1L, j = "r_t", MU)
  for (t in 1:(n_of_periods-1)) {
    t <- as.integer(t)
    set(simulate_dt, t, "r_t_plus_1", THETA * (MU - simulate_dt$r_t[t]) * delta_t + SIGMA * sqrt(delta_t) * rnorm(1))
    set(simulate_dt, t + 1L, "r_t", simulate_dt$r_t_plus_1[t])
  }
  simulate_dt[, `:=`(
    r_t_plus_3 = frollsum(r_t, n = 3, align = "left"),
    r_t_plus_6 = frollsum(r_t, n = 6, align = "left"),
    r_t_plus_12 = frollsum(r_t, n = 12, align = "left")
  )]

  return(simulate_dt[!is.na(r_t_plus_12)])
}

SIM_N <- 500
h <- 6
estimates <- data.table(
  n = seq(1, SIM_N),
  beta = NA,
  se_naive = NA,
  se_nw = NA,
  se_hodrick1992 = NA
)

for (n in 1:SIM_N) {

  simulate_dt <- simulate_overlapping_data()

  m <- lm(r_t_plus_6 ~ x_t, data = simulate_dt)
  estimates$beta[n] <- summary(m)$coef[2, 1]
  estimates$se_naive[n] <- summary(m)$coef[2, 2]

  DF <- summary(m)$df[2]
  hodrick_vcov <- hodrick1992vcov.forward(simulate_dt$r_t_plus_1, simulate_dt$x_t, h)
  m_hodrik1992 <- coeftest(m, df = DF, vcov = hodrick_vcov$vcov_hodrick)
  estimates$se_hodrick1992[n] <- m_hodrik1992[2, 2]

  nw_vcov <- NeweyWest(m, lag = 12, prewhite = FALSE)
  m_nw <- coeftest(m, df = DF, vcov = nw_vcov)
  estimates$se_nw[n] <- m_nw[2, 2]

}

kable(copy(estimates)[, n := NULL][, lapply(.SD, mean)][], format = "pipe")

beta	se_naive	se_nw	se_hodrick1992
0.0001972	0.0031126	0.0030742	0.0032698

Note that in h-step-ahead forecast, we drop the latest h observations. Thus, the sample size is T-h. ↩