# Academia vs Industry: In Pursuit of Truth and Story

In industry, reduced-form economic models need to be simple and understandable. Assume your audience is passively familiar with regression from that one stats course in college around the time they realized beer bongs do not go well with their Lexapro. Take your target $Y$ and regress it on predictors $X$ via OLS. $X$ is a few select predictors (sometimes leading or lagged, mostly not) that tell a story. Because the predictors are mostly contemporaneous, forecasting the future means introducing exogenous estimates of future $X$. They can come from consensus, guts, another model (not simultaneously estimated), wherever. Anywhere but inside the model itself, unless you for some reason have an arbitrary term lagged by 8 months. By the way, cointegrated variables must all be regressed in differences without an error-correction term.
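
To make the workflow concrete, here is a minimal sketch of the industry-style conditional forecast with a single predictor. The data, the variable names, and the "consensus" path for future $X$ are all made up for illustration; none of it comes from an actual model.

```python
from statistics import mean


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares (single predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def forecast(intercept, slope, future_x):
    """Conditional forecast: plug exogenous future X into the fitted line."""
    return [intercept + slope * xf for xf in future_x]


# Hypothetical history (illustrative numbers only).
x_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hist = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_ols(x_hist, y_hist)

# The future path of X comes from outside the model (consensus, judgment, ...).
x_consensus = [6.0, 7.0]
y_forecast = forecast(intercept, slope, x_consensus)
print(intercept, slope, y_forecast)
```

The forecast is only as good as the outside guess for future $X$, which is exactly the dependence the paragraph above describes.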

When this works, it really works well. We have a killer treasury bond model that is pretty accurate and quantifies relationships that most investors will understand in a way that confirms their priors about supply and demand or something like that. Use your regular OLS standard errors, even when blatantly inappropriate. A high $R^2$ makes you look like a genius; what is over-fitting?

In academia, you have the opposite problem. You are trying to approximate an unknown and unknowable data-generating process. You are interested in the relationship of $Y$ given $X$, where both are random variables. You use squared-error-loss regression techniques because you want an estimate of $E(Y|X)$, either to make predictions or to make statements about causality within that system. You know you will never quite get the true data-generating process, so you always take predictions and causal inferences with a grain of salt – “all models are wrong, but some are useful.” And despite this Sisyphean effort – where I acknowledge I do not know and cannot know the true data-generating process – I can say with 100% certainty that your model is problematic. Your model suffers from unchecked heteroskedasticity, autoregressive conditional heteroskedasticity, spatial heteroskedasticity, serial correlation, cointegration, sample selection bias, simultaneity bias, omitted variable bias, measurement error, naive priors, and an inefficient estimator. But it’s okay, because reasonable people can disagree about something unknown and unknowable, the same way people reasonably disagree about God.

When this works, it also really works well. There are asymptotic properties that can steer you in the right direction, and you should take advantage of them. I do not know how to truly define the concept of physical health, but I know that Michael Phelps is healthier than Artie Lange. And if enough people can come to this conclusion with their subjective feelings about health, it probably is true.

Like an enlightened centrist, I want to combine these worlds. One goal I have is to loosen the stigma around lagged variables, especially lagged dependent variables.

I love the Wold Representation Theorem because it mathematically formalizes something very intuitive: the past can predict the future. Any covariance-stationary time series (if it is not stationary, it can often be made stationary through differencing) can be written as the sum of a deterministic term $\eta_t$ and a stochastic term, which is a linear function of its current and past errors.

$y_t = \eta_t + \sum_{k=0}^{\infty} b_k \varepsilon_{t-k}, \ \ b_0 = 1$

Often, more recent observations are more useful for prediction than those in the distant past. Hence, the beauty of the AR process.

$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t = \frac{\phi_0}{1 - \phi_1} + \sum_{k=0}^{\infty} \phi_1^k \varepsilon_{t-k}, \ \ -1 < \phi_1 < 1$
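
A quick numerical check of that equivalence, as a sketch: run the AR(1) recursion and rebuild the same path from geometrically weighted shocks. The parameter values are arbitrary, and the path starts at the long-run mean $\phi_0 / (1 - \phi_1)$, which amounts to assuming that every shock before the sample was zero.

```python
import random


def simulate_ar1(phi0, phi1, shocks):
    """Run y_t = phi0 + phi1 * y_{t-1} + eps_t, starting at the long-run mean."""
    mu = phi0 / (1.0 - phi1)
    y_prev = mu  # equivalent to assuming every pre-sample shock was zero
    path = []
    for eps in shocks:
        y_prev = phi0 + phi1 * y_prev + eps
        path.append(y_prev)
    return path


def ma_representation(phi0, phi1, shocks):
    """Rebuild the same path as mu + sum_k phi1**k * eps_{t-k}."""
    mu = phi0 / (1.0 - phi1)
    path = []
    for t in range(len(shocks)):
        weighted = sum(phi1 ** k * shocks[t - k] for k in range(t + 1))
        path.append(mu + weighted)
    return path


random.seed(0)
phi0, phi1 = 1.0, 0.8  # arbitrary illustrative values with |phi1| < 1
shocks = [random.gauss(0.0, 1.0) for _ in range(50)]

recursive = simulate_ar1(phi0, phi1, shocks)
moving_avg = ma_representation(phi0, phi1, shocks)

# The two constructions should agree up to floating-point noise.
max_gap = max(abs(a - b) for a, b in zip(recursive, moving_avg))
print(max_gap)
```

The weights $\phi_1^k$ shrink with $k$, which is the "recent observations matter more" idea in code form.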

Generalize this idea to multivariate systems and you are on your way to a Nobel Prize (I love Sims, don’t hurt me).

Industry: But this makes things hard to interpret?

Somewhat. But I think this is where impulse response functions become useful. A one-unit shock to $y_1$ $j$ periods ago affects $y_2$ today by the corresponding entry of $\Psi_j$. There may be a lot to keep track of, but the interpretation is nice. Be careful about ordering if you use a Cholesky decomposition, because the orthogonalized responses depend on which variable comes first.
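
Here is a small sketch of both points for a two-variable VAR(1), where the response matrices are simply powers of the coefficient matrix. The matrices `A` and `SIGMA` are hypothetical numbers picked for illustration, and the helper functions are hand-rolled for the 2x2 case only; in practice a time-series library would handle this.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def cholesky_2x2(sigma):
    """Lower-triangular P with P P' = sigma, for a 2x2 covariance matrix."""
    p11 = math.sqrt(sigma[0][0])
    p21 = sigma[1][0] / p11
    p22 = math.sqrt(sigma[1][1] - p21 ** 2)
    return [[p11, 0.0], [p21, p22]]


def swap(m):
    """Reorder a 2x2 matrix so the second variable comes first (and back)."""
    return [[m[1][1], m[1][0]], [m[0][1], m[0][0]]]


def impulse_responses(a, horizons, impact=None):
    """Return [Psi_0 @ impact, ..., Psi_H @ impact], where Psi_j = A**j."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    impact = identity if impact is None else impact
    psi = identity
    out = []
    for _ in range(horizons + 1):
        out.append(matmul(psi, impact))
        psi = matmul(a, psi)
    return out


# Hypothetical VAR(1) coefficients and shock covariance (illustrative only).
A = [[0.5, 0.2],
     [0.1, 0.4]]
SIGMA = [[1.0, 0.5],
         [0.5, 2.0]]

# Reduced-form responses: entry [i][j] of Psi_h is the effect on variable i
# of a one-unit shock to variable j from h periods ago.
psi = impulse_responses(A, 3)
effect_on_y2_of_y1 = [m[1][0] for m in psi]

# Orthogonalized responses with y1 ordered first.
theta_y1_first = impulse_responses(A, 3, cholesky_2x2(SIGMA))

# Same system with y2 ordered first: reorder, compute, then reorder back.
theta_y2_first = [swap(m) for m in
                  impulse_responses(swap(A), 3, cholesky_2x2(swap(SIGMA)))]

print(effect_on_y2_of_y1)
print(theta_y1_first[0])
print(theta_y2_first[0])
```

With $y_1$ ordered first, $y_1$ cannot respond on impact to the second shock; with $y_2$ first, the restriction flips. Same data, different story, so the ordering needs an economic justification.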

Industry: But isn’t a lagged dependent variable just a way of saying “I could not find a real relationship with other variables”?

It’s easy to say that if you are just using ARIMA. But in a VAR, if $y_1$ is a function of $y_2$ and $y_2$ is a function of old $y_2$, then $y_1$ is a function of old $y_2$ too. The lags are carrying a real relationship with another variable, not papering over the lack of one.
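
To spell out the substitution, take a two-variable VAR(1) with no intercepts. The coefficients $a_{ij}$ are my own notation for this sketch. The first equation, and the second equation lagged one period, are:

$y_{1,t} = a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + \varepsilon_{1,t}$

$y_{2,t-1} = a_{21} y_{1,t-2} + a_{22} y_{2,t-2} + \varepsilon_{2,t-1}$

Plugging the second into the first gives:

$y_{1,t} = a_{11} y_{1,t-1} + a_{12} a_{21} y_{1,t-2} + a_{12} a_{22} y_{2,t-2} + a_{12} \varepsilon_{2,t-1} + \varepsilon_{1,t}$

So $y_{2,t-2}$ shows up in the equation for $y_{1,t}$ with weight $a_{12} a_{22}$, even though the model only has one lag.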

Academics: Did you account for cointegration? Run a VECM.

This is where I start to get sympathetic to industry. I know a lot of brilliant economists who took a while to wrap their heads around cointegration. A VAR in levels is still statistically consistent (one would rather run the VAR in levels than a VAR in differences without the error-correction term). The short-run vs. long-run interpretation is not in the vanilla VAR, but I am not sure the marginal benefits of adding it outweigh the costs. It should be easy to convince clients that the past matters for predicting the future. It is far less easy to convince them that because your non-stationary variables can be written as a stationary linear combination, you need this special term that may or may not substantially improve predictions. You can probably convince them that relationships tend to revert to a long-run equilibrium, but they probably do not care about “by how much.”
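
For the curious, the link between the two specifications is one line of algebra. This sketch uses a VAR(1) in levels with no intercept; the symbols $A$, $\Pi$, $\alpha$, and $\beta$ are standard notation that I am introducing here. Start from the levels VAR and subtract $y_{t-1}$ from both sides:

$y_t = A y_{t-1} + \varepsilon_t$

$\Delta y_t = (A - I) y_{t-1} + \varepsilon_t = \Pi y_{t-1} + \varepsilon_t$

Under cointegration, $\Pi$ has reduced rank and can be written as $\Pi = \alpha \beta'$, where $\beta' y_{t-1}$ is the stationary linear combination and $\alpha$ is the speed of adjustment:

$\Delta y_t = \alpha \beta' y_{t-1} + \varepsilon_t$

The VAR in levels keeps $\Pi y_{t-1}$ implicitly, just without imposing the reduced-rank structure. The VAR in differences without the error-correction term drops it entirely, which is why the levels version is the safer of the two shortcuts.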

What is the point here? Industry people are smarter than you think. The soft bigotry of low GPAs must end. You don’t have to assume they are experts – that is why they come to you. But if you can provide a better service in a way that people can understand, do it.