The way journals are indexed online is essential to how they can be searched for and found. Inclusion in certain indexes is also closely linked to quality assessment, with research funders often requiring their grantees to publish in outlets listed in certain indexes. In this post Danielle Padula explains the importance of good journal indexing and how journals that apply key standards can increase the reach and impact of their publications.
If a research article is published without being added to any academic indexes, does it have an impact? Contrary to the thought experiment — “If a tree falls in a forest and no one is around to hear it, does it make a sound?” — there is a pretty definitive answer to the former question. Intangible impacts aside, it’s almost certain that without being added to academic indexes an article’s impact will be pretty muffled.
Indexing is vital to the reputation, reach, and consequently the impact of journal articles. Reports in recent years have found that academic indexes, such as Google Scholar, PubMed, MathSciNet, and the Directory of Open Access Journals, are the top research starting points for most scholars. Additionally, many scholars prioritize referencing and submitting to journals that are included in leading indexes, because indexing is a marker of journal quality.
Every organization publishing journals should prioritize indexing, to increase the reach of their articles and better serve the needs of researchers. To achieve the widest indexing impact, journal publishers need to meet both basic publishing standards and the highest technical indexing standards.
Basic indexing standards
All academic indexes require journals to follow certain core publishing standards. To meet basic indexing requirements journals should have:
- An International Standard Serial Number (ISSN)
- Digital Object Identifiers (DOIs)
- An established publishing schedule
- A copyright policy
- Basic article-level metadata
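As a rough illustration of the checklist above, a journal could run a simple pre-check on its own records before applying to an index. The sketch below is only that, a sketch: the field names (`issn`, `doi`, `publishing_schedule`, `copyright_policy`, `title`) and the DOI pattern are assumptions made for this example, not requirements published by any index. The ISSN test uses the standard check-digit calculation.

```python
import re

# Heuristic DOI shape: "10.", a 4-9 digit prefix, "/", then a suffix.
# This pattern is an assumption for illustration, not an official rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_issn(issn: str) -> bool:
    """Return True if the ISSN looks like NNNN-NNNC with a correct check digit."""
    if not re.fullmatch(r"\d{4}-\d{3}[\dXx]", issn):
        return False
    digits = issn.replace("-", "")
    # Weight the first seven digits 8, 7, ..., 2 and add them up.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7].upper() == expected


def missing_basics(record: dict) -> list:
    """List the (hypothetical) record fields that are absent or malformed."""
    problems = []
    if not is_valid_issn(record.get("issn", "")):
        problems.append("issn")
    if not DOI_PATTERN.match(record.get("doi", "")):
        problems.append("doi")
    for field in ("publishing_schedule", "copyright_policy", "title"):
        if not record.get(field):
            problems.append(field)
    return problems
```

Calling `missing_basics` on a record returns an empty list when all five basics are present and well formed. A check like this only catches missing or malformed fields; each index still applies its own review.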
From there, indexes will have different inclusion requirements such as:
- Publication scope: Many indexes only accept journals that publish within particular subject areas. For example, MEDLINE and PubMed Central only index journals in the biomedical and life sciences.
- Editorial board and policies: Often indexes require the full names and affiliations of journal editors, as well as information about journal editorial policies such as a publicly available peer review policy and publication ethics statement.
- Level of publishing professionalization: Some indexes assess how professionally a journal is produced, including the readability of its articles and its production quality.
- Archiving policy: Some indexes require journals to show that their articles are being archived by a long-term digital preservation service.
You can find a full breakdown of publishing standards for academic indexes in Scholastica’s eBook How to publish low-cost, high-quality open access journals online. Publishing standards ensure the uniformity and reputability of indexes. Consequently, indexes with higher standards tend to be more trusted by scholars, improving the reputation and reach of the journals in them.
Examples of top general indexes include:
- Academic Search (EBSCO)
- Directory of Open Access Journals (DOAJ)
- Web of Science
Reaching full indexing potential: Why technical standards are key
Once journals meet core publishing standards, like those outlined above, they’ll be eligible for relevant indexes. But to get the most value out of indexing, journals must also meet the highest technical standards.
There are two main models for how indexes collect and process information:
- Web crawlers: Some indexes, such as Google Scholar, index journal articles on their own via web crawlers, which are automated internet programs that “crawl” websites to gather information. In order for crawlers to easily identify new content, publishers must apply metadata to articles and maintain a website structure that complies with the index’s requirements.
- Metadata/content deposits: Many indexes do not have web crawlers and instead require information to be submitted to them in machine-readable formats. In this case, machine-readable metadata files (often XML) must be deposited into the index so the index can process article information and know what to return in search results.
While web crawler indexes do most of the work for journals, there are steps that publishers must take to ensure articles can be crawled. For example, for an academic search engine like Google Scholar, technical steps include:
- Checking HTML and PDF files to make sure the text is searchable
- Configuring journal websites to export bibliographic data in HTML meta tags
- Making sure journal websites can be crawled by robots
It is important to note that most academic indexes don’t have web crawlers and instead require machine-readable metadata to be submitted to them. While some indexes have forms for making manual metadata deposits, directly depositing machine-readable metadata files into indexes is the highest technical standard and yields the best results.
Machine-readable metadata files are richer, more uniform, and less prone to inaccuracies than manually entered metadata. They also have data mining potential (or text-and-data-mining potential if they are full-text files). Articles that allow for text and data mining can be processed by scripts and machine-learning tools that analyze article information for purposes such as language or citation analysis. For example, Scite, a new software provider, is using machine learning to scan article citations to check if papers have been supported or contradicted.
The technical indexing standard for academic journals is XML (Extensible Markup Language) in the JATS format, which stands for Journal Article Tag Suite. Whereas XML is a general-purpose markup language, JATS is an XML vocabulary: a specific set of tags for structuring journal articles, maintained as a standard by the National Information Standards Organization (NISO). JATS is preferred or required by many academic indexes, including all National Library of Medicine indexes and search engines (i.e. PubMed, PubMed Central, and MEDLINE). cOAlition S also strongly recommends that articles be formatted in JATS XML in its updated Plan S implementation guidelines.
Producing XML in the JATS format is on the more technical side, but software can automate much of the process. Software can also be used to generate full-text XML files and avoid steps like having to manually add and check for copyright data or citation metadata, saving time and costs.
Journals should at least produce front-matter XML files for all articles with basic metadata like article title, publisher, and DOI. However, as noted, full-text JATS XML files are better for text and data mining. They’re also required by some indexes like PubMed Central. Full-text JATS XML files include all of the metadata mentioned as well as the full text of the article.
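For a sense of what a front-matter file involves, the sketch below builds a bare-bones JATS-style record with Python's standard library and then reads the DOI back out, which is the kind of machine processing that manual entry does not allow. The element names follow the JATS tag set, but the input dictionary keys and the example values are assumptions, and a real deposit would need more metadata (publication dates, authors, and so on) and should be validated against the target index's requirements.

```python
import xml.etree.ElementTree as ET


def build_front_matter(meta: dict) -> ET.Element:
    """Build a minimal <article><front> tree from a metadata dictionary."""
    article = ET.Element("article", {"article-type": "research-article"})
    front = ET.SubElement(article, "front")

    # Journal-level metadata: title, ISSN, publisher.
    journal_meta = ET.SubElement(front, "journal-meta")
    title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group, "journal-title").text = meta["journal"]
    ET.SubElement(journal_meta, "issn", {"pub-type": "epub"}).text = meta["issn"]
    publisher = ET.SubElement(journal_meta, "publisher")
    ET.SubElement(publisher, "publisher-name").text = meta["publisher"]

    # Article-level metadata: DOI and title.
    article_meta = ET.SubElement(front, "article-meta")
    doi = ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"})
    doi.text = meta["doi"]
    article_title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_title_group, "article-title").text = meta["title"]
    return article


def read_doi(xml_text: str) -> str:
    """Extract the DOI from a front-matter XML string, or '' if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("./front/article-meta/article-id[@pub-id-type='doi']")
    return node.text if node is not None else ""
```

Serializing the tree with `ET.tostring(build_front_matter(meta), encoding="unicode")` gives an XML string that `read_doi` can parse again. In practice most journals would rely on publishing software to generate and validate these files rather than hand-rolling them.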
You get what you put in
Including journal articles in relevant indexes can greatly improve their reputation and reach, providing greater impact potential for journals and the scholars publishing in them. Inclusion in leading indexes is an indicator of journal quality to scholars and their institutions, and indexes are one of the main tools scholars use to find articles. But the potential benefits of indexes depend on the quality of the machine-readable metadata and article files journals put into them. For journal publishers and authors to get the greatest impact from indexing, journals need to meet both the highest publishing standards and the highest technical standards.
This post includes excerpts from Scholastica’s eBook How to publish low-cost, high-quality open access journals online.
Note: This article gives the views of the author, and not the position of the LSE Impact Blog, nor of the London School of Economics.