Using search engine activity to predict events has increased in popularity in recent years. The basic idea behind these approaches is to monitor the use of certain internet search terms, under the assumption that patterns of term usage will be predictive of phenomena of interest. In this post, Chris Hanretty examines a recent paper which looks at the relationship between Google search trends and vote intention in the Scottish independence referendum.
Ronald MacDonald and Xuxin Mao, of the University of Glasgow, have published a working paper looking at Google search activity and the Scottish independence referendum. The paper makes three main claims:
- that search trends can be used to predict election outcomes,
- that “the more information that people searched for online, the less likely they were to vote for independence”,
- that the “Vow” had no effect on the vote.
It’s rather unfortunate that this paper has received so much media attention, because it’s a very, very bad paper. It
- is poorly written (Ipsos Mori features as “Dipso Mori”: clearly a pollster who has had a bit too much to drink)
- misrepresents the state of the literature on election forecasting using search activity
- bandies around concepts like “clear information”, “rationality”, and “emotion” with scant regard for the meaning of those words
- does not attempt to examine other sources of information like the British Election Study.
Let’s take the first main claim made by the paper: that search activity can be used to predict elections. How?
The first thing to note is that the authors are using information on search activity to predict a variable which they label “Potential Yes votes”. Those who read the paper will realize that “potential Yes votes” is actually a rolling average of polls. So the authors are using search activity to predict polling numbers. Without some polling data, you cannot use search trends to predict elections.
There are situations where using search activity to predict polling numbers is useful. Some countries (Italy, Greece) ban the publication of polls in the run-up to elections. I can imagine “predicting” polling would be useful in these contexts.
But any forecasting exercise of this kind will ultimately rely on polling data. If the polling data suffers from a systematic bias, forecasts based on search activity will inherit that bias.
The second thing to note is that the authors are attempting to use searches for particular terms to predict polling numbers. In the paper, they try two terms: “Alex Salmond” and “SNP”. Their assumption is that searching for these terms will be correlated with voting yes — or equivalently, weeks in which there are more searches for Alex Salmond will be weeks in which the Yes campaign is doing better in the polls.
Unfortunately, the authors themselves show that in the latter part of the period under study, there is in fact no correlation between the volume of searches for Alex Salmond and the Yes vote. The authors write:
“the effect of Google Trends on Potential Yes Votes became insignificant after 15th March 2014. Based on the testing criteria on clear information, voters in the Scottish referendum encountered difficulties in finding enough clear information to justify a decision to vote Yes”.
In other words, because the authors assume that there is a relationship between particular types of searches and Yes voting, the fact that that relationship breaks down becomes evidence not that this was a poor assumption to begin with, but rather that voters faced difficulty in finding information supporting a Yes vote.
I struggle to accept this reasoning. The only justification I can see for assuming that searching for these terms will be correlated with voting Yes is the significant correlation during the first period under study. But it seems more likely that this correlation is entirely epiphenomenal. During the early part of the campaign, the Yes campaign’s polling numbers improved, and search activity increased over the same period. That does not mean the two are linked: search activity was hardly likely to fall during this period, whatever the polls were doing.
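The worry about an epiphenomenal correlation can be made concrete with a small sketch. The numbers below are invented for illustration and are not taken from the paper: two weekly series that both trend upward, with short-run wiggles that are unrelated by construction. The helper names `pearson` and `changes` are my own. If the shared trend is doing all the work, the levels should correlate strongly while the week-to-week changes should not.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def changes(series):
    """Week-to-week changes (first differences) of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical data (an assumption, not the paper's): 21 weeks, both rising.
weeks = range(21)
yes_wiggle = [1, -1]            # repeats every 2 weeks
search_wiggle = [3, 3, -3, -3]  # repeats every 4 weeks, unrelated to the above

yes_share = [30 + 0.5 * t + yes_wiggle[t % 2] for t in weeks]
search_volume = [40 + 2 * t + search_wiggle[t % 4] for t in weeks]

r_levels = pearson(yes_share, search_volume)
r_changes = pearson(changes(yes_share), changes(search_volume))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

In this toy case the levels are highly correlated simply because both series rise over time, while the changes are uncorrelated. That is the pattern one would expect if the early-campaign correlation reflected nothing more than a shared upward drift.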
So, Google search trends can be used to forecast elections if you have polling data, and can identify a search term which correlates with the polling data over the entire period — but these authors couldn’t.
Let’s turn to the second main claim of the paper — that “the more information that people searched for online, the less likely they were to vote for independence”. This claim is supported by a vector autoregressive model of polling averages, with different dummy variables for different days of the campaign, and lagged values of search activity. The coefficient on one of the variables relating to search trends is negative, and statistically significant.
There are three key problems with this claim. First, the authors report only one of the coefficients relating to search activity. They show that search activity is negatively associated with Yes vote intention four days later. But they don’t show the results for search activity with lags of one, two, or three days. I assume that these effects aren’t significant, for some level of significance. But I can’t tell, because the information isn’t there.
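For readers wondering what fuller reporting might look like, here is a minimal sketch. It uses plain lagged correlations on made-up daily numbers rather than the authors’ vector autoregression, and the function name `lagged_correlations` is my own invention. The only point is that the output contains a figure for every lag from one to four, so a reader can judge whether the four-day lag really stands out.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(searches, polls, max_lag):
    """Correlate searches on day t with polls on day t + lag, for each lag."""
    results = {}
    for lag in range(1, max_lag + 1):
        earlier_searches = searches[:-lag]  # drop the last `lag` days
        later_polls = polls[lag:]           # drop the first `lag` days
        results[lag] = pearson(earlier_searches, later_polls)
    return results


# Hypothetical daily series, invented purely for illustration.
searches = [52, 47, 60, 55, 49, 63, 58, 51, 66, 54, 48, 61, 57, 50, 64, 59]
polls = [44, 45, 43, 46, 44, 45, 42, 44, 45, 41, 43, 45, 41, 44, 46, 42]

# Report every lag, not only the one that happens to be significant.
for lag, r in lagged_correlations(searches, polls, max_lag=4).items():
    print(f"lag {lag}: r = {r:+.2f}")
```

A table of this kind, with standard errors attached, is what I would have liked to see for the model in the paper.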
The second problem is that this analysis, covering a portion of the referendum campaign, inverts the assumptions the authors started out with. The authors started by assuming that more searches for “Alex Salmond” would be positively associated with Yes vote intention, such that search activity could be used for forecasting. Now we’re told that the association is negative.
The third, and most important, problem is that the authors constantly slip between levels of analysis. The authors nowhere show that the more people “search for information” (sorry, Google “Alex Salmond”), the less likely they are to vote for independence. The authors only show that days on which more searches were made for Alex Salmond were followed, four days later, by polls indicating lower support for Yes, controlling for the previous levels of support for Yes. That is a finding about days, not about individual voters.
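The gap between the two levels of analysis can be shown with a toy example. The counts below are invented and say nothing about actual Scottish voters. On each hypothetical day, people who searched are more likely to back Yes than people who did not, yet days with more searchers still show lower overall support for Yes.

```python
# Hypothetical counts per day (assumed for illustration only):
# (searchers, searchers backing Yes, non-searchers, non-searchers backing Yes)
days = {
    "day A": (10, 8, 90, 45),
    "day B": (30, 21, 70, 28),
    "day C": (50, 30, 50, 15),
}

summary = []
for name, (n_search, yes_search, n_other, yes_other) in days.items():
    total = n_search + n_other
    summary.append({
        "day": name,
        # Aggregate, day-level quantities (what the polls and trends show).
        "search_share": n_search / total,
        "overall_yes": (yes_search + yes_other) / total,
        # Individual-level quantities (what the aggregate data cannot show).
        "yes_among_searchers": yes_search / n_search,
        "yes_among_others": yes_other / n_other,
    })

for row in summary:
    print(
        f"{row['day']}: searched {row['search_share']:.0%}, "
        f"overall Yes {row['overall_yes']:.0%}, "
        f"Yes among searchers {row['yes_among_searchers']:.0%}, "
        f"Yes among others {row['yes_among_others']:.0%}"
    )
```

The same day-level pattern would also be consistent with searchers being less likely to vote Yes, which is exactly why aggregate data of this kind cannot settle the individual-level claim.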
For what it’s worth, I don’t think we have enough information to judge whether internet activity or the Vow made a difference. I’m not aware of any polling questions which ask specifically about the Vow, though I’m happy to be corrected on this point. But I’m afraid that this paper doesn’t help us forecast elections, or answer substantive questions about the determinants of voting in the independence referendum.
Note: This piece was originally published on Chris’ blog.
Chris Hanretty is a Reader in Politics at the University of East Anglia.