Professor Jean-Paul Faguet explains the error behind a just-published ‘Corrigendum’ to a co-authored paper on land distribution in Colombia, how the error was discovered, the extensive work the authors did to correct it, and what the implications are for their results.
It takes a certain thickness of skin to be an academic. You have an idea, pursue it for months or (more often) years gathering and analysing evidence, connecting it to theory. You write a paper and start presenting it to colleagues… who tear it to shreds. Often they’re nice about it; sometimes not so much. When you’re young it can really hurt. But exposing your ideas to expert criticism is the point. The honest truth is that your ideas improve enormously the more they face the dispassionate gaze of knowledgeable colleagues. So you keep doing it, over and over, and at some point you freeze on the stage a little less often. By the time you’ve published your paper in a good journal (probably after several rounds of rejections), you can be forgiven for relaxing. That paper you revised 9 or 17 times is done. It’s published! People are starting to cite it in their research. Colleagues might disagree about interpretations or implications, but at some basic level the paper is correct. And so you move on to your next idea.
That was us a few months ago, when colleagues at the Universidad EAFIT in Medellín wrote to enquire – very politely – whether some data in our 2020 World Development paper might be incorrect. World Development is widely considered the leading journal in development studies, and is highly regarded in political science, economics, and other fields as well. If true, this was a very visible blunder. Not the sort of thing academics look forward to.
My co-authors and I looked into the matter very carefully. The bottom line is that our EAFIT colleagues, Germán Tabares and Dr. Thomas Goda, were correct. We made a mistake. This post is to thank them and detail what we did to correct the mistake in a slightly longer format than the Corrigendum just published in World Development. Our gratitude extends to the journal as well. World Development’s requirement that we post replication data alongside the paper made this possible. In the end, this is hopefully a constructive example of the academic system working as it should to challenge understandings and perfect our knowledge.
What’s this all about, anyway?
The paper examines Colombia’s vast land distribution scheme, which has been operating in one form or another since independence in 1821. During that time, Colombia transferred into private hands public lands equivalent in area to the entire United Kingdom, twice South Korea, or six times Switzerland. Officially at least, the process was meant to reduce landlessness and poverty. And yet Colombia today retains one of the highest concentrations of land ownership in the world. How can both facts be true?
Our paper shows that the effects of land distribution differed greatly across Colombia’s 1100+ municipalities. In places where small and medium-sized farms dominate, a great deal of public land reached poor peasants. Average farm size increased, land inequality decreased, and local development accelerated, just as you might hope. But where land was already concentrated in the hands of a rural elite, additional distributed lands were diverted to them. The incremental effect of ‘giving land to landless people’ was more large farms, fewer small ones, and greater plot size dispersion. How did such perverse outcomes come about? By itself, land distribution increases turnout, makes politics more competitive, and increases public service provision. But where landed elites dominate, patron-client ties distort local policy and decision-making to their benefit. In such places, giving away land to the poor actively increases inequality and poverty, and worsens local development outcomes. The interesting implication is that the land program’s ‘side-effects’ on the distribution of power outweighed its ‘main effects’ on the distribution of land.
What about the mistake?
To obtain these results, we used data on pre-existing latifundia – large farms of 500 hectares or more. We thought the data for this variable came from 1960, but in fact it came from 1984. Messrs. Tabares and Goda uncovered this when they compared our replication data to their own dataset. Once it became clear that they were right and we were wrong, we spent 3 months exploring the implications of the error for our findings, as well as figuring out how it came about in the first place – not to punish anyone, but rather to identify faulty processes and prevent their recurrence.
Next, some professional context.
A project like this takes years; the earliest version of this research dates from 2013. We built out the dataset continuously over 7 years, gathering, comparing, cleaning and refining key indicators. That data is all now publicly available, not least in the replication dataset. Our initial idea was to push the analysis all the way back to 1960. That would give us a 50-year panel – the bulk of the main reform period – on which to estimate the effects of encomienda and land distribution on broad development outcomes. So we began working with the earliest national data series on latifundia from 1960. Unfortunately, this data is low-quality and incomplete. We spent a fair amount of time over several years trying to fill in those gaps and improve data quality, before finally admitting that the problems were too deep and reverting to better-known data from 1984. Somewhere along the way, the right data acquired the wrong name. We’re confident this was an honest mistake.
The next step was to figure out the implications for the results summarised above. The basic econometric problem with estimating the effect of an X variable from 1984 (and not 1960) on a Y variable from, say, 1975 is endogeneity. You might also think of this as reverse causality: an outcome measured in 1975 cannot have been caused by landholdings measured in 1984, but it could have helped to shape them. There’s no space to explain endogeneity further here. But the bottom line is that all results where the Y variable is from 1985 or later are unaffected by an X variable from 1984. Those findings stand. The other results are potentially problematic, so we re-estimated everything in the paper where the Y variable is from 1984 or earlier.
The error does not affect any of the findings on land inequality, the structure of landholding, or public goods, services and taxes, all of which rely on post-1984 data. It does affect results for UBN (our measure of overall development; table 4) and electoral outcomes (tables 5 & 6), where we used data going back to the 1970s. Hence we re-estimated those models dropping pre-1984 data on the dependent (Y) variable. All of that is attached below and appears in the Corrigendum.
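For readers who think in code, the timing rule behind this sample restriction can be sketched roughly as follows. This is a toy illustration in Python, not the estimation code behind the paper: the field names (`municipality`, `year`, `ubn`), the years and the values are all hypothetical, and the only thing taken from the text is the rule that outcomes measured in 1985 or later are safe when the latifundia variable dates from 1984.

```python
# Toy illustration only: field names, years and values are hypothetical
# and are NOT taken from the paper's replication dataset.

PREDICTOR_YEAR = 1984  # year the latifundia (X) measure actually refers to


def is_affected(outcome_year, predictor_year=PREDICTOR_YEAR):
    """True when the outcome (Y) is measured in or before the predictor's year,
    i.e. when X cannot safely be treated as preceding Y."""
    return outcome_year <= predictor_year


def keep_unaffected(rows, predictor_year=PREDICTOR_YEAR):
    """Keep only observations whose outcome postdates the predictor."""
    return [r for r in rows if not is_affected(r["year"], predictor_year)]


# A made-up municipality-year panel of an outcome variable.
panel = [
    {"municipality": "A", "year": 1973, "ubn": 70.0},
    {"municipality": "A", "year": 1985, "ubn": 55.0},
    {"municipality": "A", "year": 1993, "ubn": 40.0},
    {"municipality": "B", "year": 1973, "ubn": 80.0},
    {"municipality": "B", "year": 2005, "ubn": 35.0},
]

restricted = keep_unaffected(panel)
share_dropped = 1 - len(restricted) / len(panel)

print(f"kept {len(restricted)} of {len(panel)} observations")
print(f"share dropped: {share_dropped:.0%}")
```

The point of the sketch is simply that the restriction is mechanical: once the true date of the X variable is known, every model can be re-run on the subset of observations that pass the timing check, at the cost of a smaller sample.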
The good news is that the new results are very similar.
For UBN (table 4), the headline finding that public land distribution reduces unsatisfied needs (and hence increases development) remains, with high levels of significance and coefficients of somewhat larger magnitude. The interaction of that term with latifundia retains its sign and magnitude, but loses statistical significance. We think this is because the loss of 815 observations (out of 3,260) reduces the precision of our estimates, increasing standard errors from around 0.40 to 0.85. Our interpretation of these results is substantively the same, though with more caution attached to the interaction term. The new figure 6, which relies on these estimates, is basically identical to the original. This is because all of the “action” is in the highest two deciles, where estimated effects continue to be significant and are, if anything, slightly larger in magnitude than before.
New estimates for voter turnout (table 5) are essentially identical, despite 39-42% fewer observations, with coefficients on the main land distribution variable that are slightly larger than before. In new estimates for political concentration in municipal elections (table 6), the land distribution variable loses some significance for electoral margins of victory, but remains the same for winning party support. All of the signs are unchanged. There is no change in the interaction term. For lower house elections, the land distribution variable loses significance across the board, but the land distribution*latifundia interaction term gains significance in models 6, 7 and 8. The latter implies that latifundistas counteract the beneficial effects of land distribution in places where they are abundant, as found elsewhere in the paper. Once again, we think the differences in estimates are due to a loss of statistical power from a 42-45% reduction in sample size. Overall, our interpretation of the new results is substantively the same, although more cautious in the case of table 6.
Stepping back from the detail, what’s the bottom line?
All of our main findings on the effects of land distribution and latifundia on inequality, the structure of landholding, and development are either unaffected by our mistake or go through in the new estimations. Many of our secondary results also go through despite being based on around 40% less data, but others lose statistical significance. Although the effects of the latter seem to be of the same magnitude and direction as before, we interpret them with more caution.
Lastly, in the course of all of this we discovered an additional, typographical error in the last full sentence of p.15 (final dates), which should instead read:
We employ a fixed effects seemingly unrelated regressions (SUR) model to examine the relationship between land distribution and latifundia on per capita public investment, public service expenditure, and tax revenues for the period 1985-2010.
Our thanks again to Germán Tabares and Dr. Thomas Goda for bringing the data error to our attention. The investigation that followed was difficult, but unexpectedly rewarding in the end. Our sincere apologies to readers of World Development for both errors, and for any inconvenience they have caused.
Faguet, J.P., F. Sánchez and J. Villaveces. 2020. “The Perversion of Public Land Distribution by Landed Elites: Power, Inequality and Development in Colombia.” World Development, 136: 1-23.
The views expressed in this post are those of the author and in no way reflect those of the International Development LSE blog or the London School of Economics and Political Science.