Dear Dr. Resmeth,
I am wrestling with a political economy model that includes both regime type and socio-economic level as explanatory variables. The problem is that these variables are strongly (positively) correlated, making it very hard to separate their independent effects on my outcome. And the problem gets worse: I also need to test an interaction of the two variables, but their high correlation makes it impossible to find any statistically significant effect for the interaction. A friend has suggested that if I mean center the variables, it will solve the multicollinearity of the interaction term. Can the solution be so simple? Please help, collinearity is making me desperate.
Yours, Francine Needlebush, WC2R
Unfortunately, such relationships between independent variables are a common problem in social science data analysis. The crux of your difficulty is that it is very hard for techniques such as multiple regression analysis to separate the independent effects of two covariates that are highly related. When you compound this by multiplying them together to form an interaction term, the problem usually just gets worse. Yet because they are correlated, simply leaving one out and estimating the effect of the other on its own causes an even nastier problem known as omitted variable bias: the variable you keep picks up part of the effect of the one you dropped.
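To see omitted variable bias at work, here is a minimal simulation in Python with NumPy. All data are invented for illustration: x and z are hypothetical stand-ins for regime type and socio-economic level, built to have a correlation of 0.8, with true effects of 2 and 3 on the outcome. When z is dropped, the estimated effect of x drifts towards 2 + 3 × 0.8 = 4.4 rather than the true 2.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 10_000
x = rng.normal(size=n)                       # hypothetical stand-in for regime type
z = 0.8 * x + 0.6 * rng.normal(size=n)       # hypothetical stand-in for socio-economic level; corr(x, z) = 0.8
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

full = ols(y, x, z)    # both covariates included
short = ols(y, x)      # z omitted

print("full model, effect of x:", round(full[1], 2))   # close to the true value 2.0
print("z omitted,  effect of x:", round(short[1], 2))  # close to 2.0 + 3.0 * 0.8 = 4.4
```

The full model recovers the true effect of x; the short model does not, however large the sample.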
While your friend is no doubt well-intentioned, mean centering the variables will unfortunately not solve the problem, despite misplaced suggestions in some academic fields that it will. To see conceptually why, imagine fitting a model that uses temperature in Celsius as a predictor. If you also included temperature in Fahrenheit, you could not estimate the model with both variables, because one is a perfect linear function of the other. Mean centering does not change this, since a mean-centered variable is itself a perfect linear function of its uncentered version: centered Fahrenheit is still exactly 1.8 times centered Celsius. In the same way, centering leaves the correlation between regime type and socio-economic level exactly as it was, and multiplying the two centered variables together will not change that.
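The Celsius and Fahrenheit argument can be checked directly. The following sketch uses Python with NumPy; the temperatures and the two correlated variables are simulated, arbitrary assumptions. It shows that a design matrix containing both temperature scales is rank-deficient before and after centering, and that centering leaves the correlation between two predictors unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
celsius = rng.uniform(-10, 35, size=200)
fahrenheit = celsius * 9 / 5 + 32            # exact linear function of celsius

def center(v):
    """Subtract the sample mean."""
    return v - v.mean()

def design_rank(*columns):
    """Rank of a design matrix with an intercept column plus the given predictors."""
    X = np.column_stack([np.ones(len(columns[0])), *columns])
    return np.linalg.matrix_rank(X)

# Three columns but only rank 2: the two temperature effects cannot be separated.
print(design_rank(celsius, fahrenheit))                   # 2
print(design_rank(center(celsius), center(fahrenheit)))   # still 2 after centering

# Centering shifts each variable but leaves their correlation untouched.
x = rng.normal(size=500)
z = 0.9 * x + rng.normal(scale=0.5, size=500)
print(np.corrcoef(x, z)[0, 1], np.corrcoef(center(x), center(z))[0, 1])
```

The two printed correlations agree up to floating-point rounding, which is the sense in which centering does nothing for the collinearity between the two variables themselves.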
It might seem that mean-centering before forming the interaction helps, because the coefficient estimates and standard errors of the constitutive terms X and Z change compared with the uncentered model. But this is only because those coefficients now estimate a different marginal effect. If we call your mean-centered variables Xc and Zc and their interaction XcZc, then the coefficient on Xc estimates the marginal effect of a one-unit increase in X when Z is at its mean, whereas the coefficient on the uncentered X estimates the marginal effect of a one-unit increase in X when Z is zero. The coefficient on the interaction term itself, and its standard error, are identical in the two models, so centering cannot turn an insignificant interaction into a significant one. Algebraically, it is simple to show that if you calculate the marginal effect of a one-unit increase in X at the same level of Z from the estimates of the centered and uncentered models, you obtain exactly the same marginal effect and measure of uncertainty. For details, I suggest you see Brambor, Clark and Golder (2006), “Understanding Interaction Models”, Political Analysis 14:63–82.
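The equivalence can also be verified numerically. In this sketch the data are simulated and every number is an arbitrary assumption; the same interaction model is fitted once with uncentered and once with centered variables by plain least squares. The helpers ols_with_cov and marginal_effect are hypothetical names written for this illustration. Standard errors use the classical OLS covariance s²(X'X)⁻¹, and the standard error of the marginal effect b1 + b3·z comes from the usual variance of a linear combination of coefficients.

```python
import numpy as np

def ols_with_cov(y, X):
    """OLS coefficients and their classical covariance matrix s^2 (X'X)^-1."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    return coef, s2 * np.linalg.inv(X.T @ X)

def marginal_effect(coef, cov, z_value):
    """Effect of a one-unit increase in x at a given z, for columns [1, x, z, x*z]."""
    grad = np.array([0.0, 1.0, 0.0, z_value])   # d/dx of b0 + b1*x + b2*z + b3*x*z
    return grad @ coef, np.sqrt(grad @ cov @ grad)

rng = np.random.default_rng(1)
n = 300
x = rng.normal(5, 2, size=n)
z = 0.8 * x + rng.normal(0, 1, size=n)          # z positively correlated with x
y = 1 + 0.5 * x + 0.3 * z + 0.2 * x * z + rng.normal(size=n)

ones = np.ones(n)
xc, zc = x - x.mean(), z - z.mean()
b, V = ols_with_cov(y, np.column_stack([ones, x, z, x * z]))      # uncentered
a, W = ols_with_cov(y, np.column_stack([ones, xc, zc, xc * zc]))  # centered

# The interaction coefficient and its standard error are the same in both fits.
print(b[3], np.sqrt(V[3, 3]))
print(a[3], np.sqrt(W[3, 3]))

# The x coefficients differ: effect of x at z = 0 versus at the mean of z.
print(b[1], a[1])

# The marginal effect of x at the same z, and its SE, are identical.
z0 = 7.0
print(marginal_effect(b, V, z0))
print(marginal_effect(a, W, z0 - z.mean()))     # centered z is measured from its mean
```

The centered model has to be evaluated at z0 minus the mean of z, because its Zc measures distance from the mean; once that is done, both parameterizations give the same answer.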
So, in short, there is no free lunch when it comes to collinearity. As we say where I am from, you cannot turn Hühnerkacke into Hähnchen-Salat (chicken droppings into chicken salad). Your best bet in this case is to gather sample data with more variation: get some poor democracies and some rich authoritarian states into your sample, for instance.
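A small simulation makes the value of extra variation concrete. It is a sketch under assumed numbers (the coefficients, the sample size of 200 and the two correlation levels are arbitrary choices): holding the sample size fixed, the standard error of the coefficient on x shrinks as the two predictors become less correlated, which is what adding poor democracies and rich autocracies would do for your data. The helper se_of_x is a hypothetical name for this illustration.

```python
import numpy as np

def se_of_x(y, x, z):
    """Classical OLS standard error of the coefficient on x in y ~ 1 + x + z."""
    X = np.column_stack([np.ones_like(x), x, z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

rng = np.random.default_rng(7)
n = 200
results = {}
for rho in (0.95, 0.5):   # correlation between the two predictors
    x = rng.normal(size=n)
    z = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1 + 2 * x + 3 * z + rng.normal(size=n)
    results[rho] = se_of_x(y, x, z)
    print(f"corr(x, z) = {rho}: SE of x coefficient = {results[rho]:.3f}")
```

The same number of observations buys a noticeably more precise estimate once the predictors are less entangled, roughly in proportion to 1/√(1 − ρ²).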