Neal Haddaway

Michael Gusenbauer

February 3rd, 2020

A broken system – why literature searching needs a FAIR revolution

The volume of academic research articles is increasing exponentially. However, the ease with which we are able to find these articles depends on the capabilities of the search systems that we use. These systems (bibliographic databases like Scopus and academic search engines like Google Scholar) act as important gatekeepers between authors and readers. A recent study found that many of these systems are difficult to use, non-transparent and do not adhere to scientific standards. As a result, researchers find fewer relevant records, searching takes longer, or does not have the necessary scientific rigour. In this post, Neal Haddaway and Michael Gusenbauer argue that to address these issues academic searching needs to adopt the principles of FAIR (Findable, Accessible, Interoperable, Reusable), and be radically overhauled.

Finding the right information for your research has always been difficult. Before computers, academic search consisted of scanning abstract card catalogues in libraries. This was time-consuming and impossible to sustain as publication rates increased. The advent of the internet allowed researchers to perform targeted searches in specialised digital bibliographic databases that use ‘exact-match models’ for finding relevant terms, but subscriptions to these services are hugely expensive and users require knowledge of search syntax. In other words, you would need to know in advance exactly what you are searching for. In recent years, researchers have largely switched to ‘intelligent’ search systems, like Google Scholar, that are free, highly intuitive, and use semantics, algorithms, or artificial intelligence to suggest the most relevant research based on what the software ‘thinks’ is most likely to be relevant.

However, the two most popular forms of academic search, web-based search engines and commercial bibliographic databases, both have flaws. For this reason, we believe the whole system of academic search is in need of drastic overhaul.

Firstly, search engines like Google Scholar are increasingly the first choice for research discovery, because they offer easy access to a large stock of literature for day-to-day searches. However, what is often forgotten is that searches on Google Scholar are neither reproducible nor transparent. Repeated searches often retrieve different results, and users cannot specify detailed search queries, leaving it to the system to interpret what the user wants. As the matching and ranking algorithms of semantic or AI-based search engines are often unknown – even to the providers themselves – these systems do not allow comprehensive searching.

These issues are perhaps less significant in day-to-day searches, where we want to locate a particular research paper efficiently. However, systematic reviews in particular need to use rigorous, scientific methods in their quest for research evidence. Searches for articles must be as objective, reproducible and transparent as possible. With systems like Google Scholar, searches are not reproducible, which violates a central tenet of the scientific method. Furthermore, it is virtually impossible to export results for documentation in systematic reviews/maps, and if you try to manually download search results in bulk (as would be needed in a systematic review), your IP address is likely to be locked after a short time: an effort to stop ‘bots’ from reverse engineering the Google Scholar algorithm and the information in its databases.

Secondly, commercial bibliographic databases and platforms, like Web of Science and Scopus, might seem powerful and efficient, but they also have limitations that make accessing articles for research, particularly for rigorous evidence synthesis, highly challenging, not to mention frustrating. For example, most of these databases restrict users in downloading records. Some only allow a maximum of 500 citations at a time. Information retrieval specialists typically have to export tens of thousands of search results within a systematic review or systematic map for the purposes of transparency and rigour: doing so takes days because of these restrictions and introduces bias and error to the retrieval process. In addition, the costs of these paywalled resources are restrictively high, prohibitively so for researchers working in resource-constrained contexts like low- and middle-income countries and small organisations. As a result, even though research articles are increasingly being published Open Access, researchers cannot easily identify them because of the lack of access to these search systems.

In a recent study we investigated the specific capabilities and limitations of 28 popular search systems, showing that many bibliographic resources are not fit for purpose in systematic reviews. In one way or another, all of the systems have limitations in how users can combine keywords into search strings, or interact with the search results. Because of these restrictions they are less suitable for academic searches and place the burden on users to be aware of these limitations in order to search most effectively. This is especially problematic, as academics typically have their go-to search system – in many cases Google Scholar – and use it for all kinds of searches without knowing that their search is highly biased, non-transparent and non-repeatable.

The problem of inadequate search capabilities is becoming more pressing. On the one hand, more and more so-called ‘semantic’ or ‘intelligent’ search systems, like Microsoft Academic or Semantic Scholar, are being developed. On the other hand, researchers increasingly need to search systematically (i.e. in a repeatable and transparent manner): the stock of systematic reviews has doubled within the last four years alone. This is not surprising, as researchers need these reviews to stay up to date in their field and to get insights on a specific topic based on a systematic synthesis of evidence across contexts.

Despite the limitations of current search systems, we see promise in the increasingly dynamic and diverse search system landscape. New solutions are regularly appearing, like Lens.org, that aim to improve how we discover research. Now, we must direct these technical efforts to respect scientific standards that improve accessibility of research findings. Specifically, we believe there is a very real need to drastically overhaul how we discover research, driven by the same ethos as in the Open Science movement. The FAIR data principles offer an excellent set of criteria that search system providers can adapt to make their search systems more adequate for scientific search, not just for systematic searching, but also in day-to-day research discovery:

  • Findable: Databases should be transparent in how search queries are interpreted and in the way they select and rank relevant records. With this transparency, researchers should be able to choose fit-for-purpose databases based clearly on their merits.
  • Accessible: Databases should be free-to-use for research discovery (detailed analysis or visualisation could require payment). This way researchers can access all knowledge available via search.
  • Interoperable: Search results should be readily exportable in bulk for integration into evidence synthesis and citation network analysis (similar to the concept of ‘research weaving’ proposed by Shinichi Nakagawa and colleagues). Standardised export formats help analysis across databases.
  • Reusable: Citation information (including abstracts) should not be restricted by copyright, so that reuse, publication of summaries, text analysis and similar uses are permitted.
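
To make the interoperability point concrete, here is a minimal sketch of what standardised exports allow: combining results from two search systems and removing duplicates. It assumes both systems can export the widely used RIS tagged format, that a DOI (where present) identifies a record, and that a normalised title is an acceptable fallback. The function names and the sample records are hypothetical, invented purely for illustration, and real exports vary more than this sketch handles.

```python
import re

# A RIS line looks like "TI  - Some title": a two-character tag,
# two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of dicts mapping tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = match.groups()
        if tag == "ER":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    if current:  # tolerate a final record with no ER line
        records.append(current)
    return records


def record_key(record):
    """Assumption: use the DOI as identity, else fall back to a normalised title."""
    doi = record.get("DO", [""])[0].lower().strip()
    if doi:
        return "doi:" + doi
    title = (record.get("TI") or record.get("T1") or [""])[0]
    return "title:" + re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_exports(*exports):
    """Merge several RIS exports, keeping the first copy of each record."""
    merged = {}
    for text in exports:
        for record in parse_ris(text):
            merged.setdefault(record_key(record), record)
    return list(merged.values())


# Hypothetical exports from two different search systems.
EXPORT_A = """TY  - JOUR
TI  - Example study one
DO  - 10.1000/example.1
ER  -

TY  - JOUR
TI  - Example study two
ER  -
"""

EXPORT_B = """TY  - JOUR
T1  - Example Study One
DO  - 10.1000/EXAMPLE.1
ER  -

TY  - JOUR
TI  - Example study three
DO  - 10.1000/example.3
ER  -
"""

merged = merge_exports(EXPORT_A, EXPORT_B)
print(len(merged))  # 3: the record with the shared DOI is kept only once
```

A step like this only works when every system offers bulk export in a common, documented format. With per-download caps or proprietary formats, the same merge has to be assembled by hand.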

By developing academic search systems in this way, we can futureproof research discovery against increasingly appreciated limitations, like bias and lack of comprehensiveness, and make it an equitable and FAIR practice. In addition, we need to educate users to be able to decide which systems fit their search needs, so they use the best systems, in the best way. In this regard, we want to use our research to make the search system landscape more transparent. We hope to encourage academics to be more attentive, and search system providers to raise their quality to the standard that science requires, for better search and better results.

 

Note: This article gives the views of the authors, and not the position of the LSE Impact Blog, nor of the London School of Economics.

Image credit: People in hedge maze, via Good Free Photos, (Public Domain)

About the authors

Neal Haddaway

Neal Haddaway is a Senior Research Fellow at the Stockholm Environment Institute, a Humboldt Research Fellow at the Mercator Research Institute on Global Commons and Climate Change, and a Research Associate at the Africa Centre for Evidence. He researches evidence synthesis methodology and conducts systematic reviews and maps in the field of sustainability and environmental science. His main research interests focus on improving the transparency, efficiency and reliability of evidence synthesis as a methodology and supporting evidence synthesis in resource-constrained contexts. He co-founded and coordinates the Evidence Synthesis Hackathon (www.eshackathon.org) and is the leader of the Collaboration for Environmental Evidence centre at SEI. @nealhaddaway

Michael Gusenbauer

Michael Gusenbauer is a postdoctoral researcher at the Department of Strategic Management, Marketing and Tourism at the University of Innsbruck, and a senior research fellow at the Institute of Innovation Management at Johannes Kepler University Linz. His main research interest is at the crossroads of innovation management and strategy. His research makes use of the exponentially growing body of scientific knowledge and contributes to evidence-based management. Michael has a strong interest in evidence-synthesis methods and scientometrics, where he investigates how to better identify evidence across barriers of discipline, language and search system. He received his PhD in 2016 from Johannes Kepler University Linz.
