When a researcher in most areas of Physics, Mathematics or Computer Science (and increasingly also Statistics, Quantitative Finance and Quantitative Biology) is looking for recent publications in their field, one of the first places they will look is the arXiv. (Pronounced “archive”, with the “X” standing in for the Greek letter chi.) The arXiv was started in 1991 as a simple central repository of electronic preprints in physics, based on servers at the Los Alamos National Laboratory. Soon it expanded its scope to other areas. In 1999 it moved to Cornell University Library, which is still its main base. The statistics page of the arXiv gives a good indication of its size and activity: over 1 million submissions since its start; currently between 8,000 and 9,000 new submissions per month and around 10 million downloads per month.

So why has the arXiv become so important for researchers in these particular fields? Why is it now more or less standard for any active researcher in these areas to deposit a close to final version of their publications in the arXiv? Part of it can be explained by the increasing prominence of Open Access and related developments in academic publishing. But that can only explain a small part of the success of the arXiv. The main reason for its success, in my opinion, is a specific feature of these research areas: the very long lead time between submission of a paper and its publication in a journal, and hence the historic prominence of “preprints” and “reports”. I will describe some of that background below, specifically for Mathematics (my field), but similar factors play a role in the other subjects covered by the arXiv as well.

In Mathematics, a period of one year between submission and publication is quite common, while periods of 3-4 years are nothing exceptional. A major reason for those long lead times is the thorough refereeing that is expected. Most papers in Mathematics consist in large part of one or more detailed proofs of the main result(s). These proofs can vary in length from a few paragraphs to several hundred pages (although anything over roughly 30 pages is considered long). And it is one of the main duties of the referees to convince themselves of the correctness of those proof(s); a process that involves carefully going through the arguments, checking if the logic is correct, checking if old results are used correctly, etc. Thoroughly checking one page of a proof can easily take more than a day. This means that the refereeing process usually takes at least several months, or even years if the referees need to find the time to do a proper job. And if errors are found, the author(s) might be asked to try to correct them, and a 2nd or 3rd version of the paper may need a considerable amount of time to be scrutinised again. In the past, the specialised typesetting required for mathematical texts added further to the delay caused by the lengthy refereeing process.

Because of the long time between submission and publication, the existence of “preprints” or “reports” was standard in the mathematical community. As soon as a version of a new paper was submitted to a journal, the author(s) would make a number of hard copies of it, often in the form of a report in a “Reports Series” of the kind run by any respectable Mathematics department. (The one at the LSE was called CDAM Research Report Series; although still accessible online, it stopped accepting new material in 2009.) When you went to a conference or gave a seminar, you would bring a couple of those preprints. And after the presentation, interested members of the audience would come forward and ask “do you have a preprint of this?”. Note that these preprints were different from the “working papers” that exist in some other fields. Where a “working paper” is a publication that is still in development, a preprint or report would be a (hopefully) close to final version, more or less identical to the manuscript that was already submitted to a journal.

Once the World Wide Web became more prominent, those preprints went online, usually via personal homepages of the author(s). At the same time, institutional preprint series were going online. And once the advantages of having a central repository became clear, most of us started uploading our work to one of those, and personal homepages and the surviving preprint series now just link to the article on the arXiv.

So the arXiv is not something that came into existence because of the move towards Open Access. It’s more that it was the solution to a practical problem: “if it will take several years before my paper will be published, how do I tell the world about my brilliant work in the meantime?”. Of course, the arXiv is now seen as a prime example of Open Access: it is completely free to search and download all publications. It allows uploading new versions of a paper, while at the same time keeping previous versions accessible.

On the other hand, in its present form the arXiv is not in a position to replace traditional journals. The main reason for that is the lack of refereeing. There is a group of moderators who can reject submissions that are not scientific or recategorise off-topic submissions. But in general any paper on the arXiv can be a brilliant proof of a long-standing conjecture, a piece of high-school Mathematics, or something that upon serious reading is clearly wrong. As long as academic recruitment panels and promotion committees attach value to papers published in specific journals only, repositories such as the arXiv can have only a limited role in the whole publication process.

An interesting new development is the appearance of “overlay journals”. These are journals that have an independent (online) presence, but that use a central repository to host the papers appearing in them. In other words, the journal will have editors, an editorial board, a review process, etc., but in the end the list of papers in it will just be a list of links to the relevant papers in some repository. Although these overlay journals have existed for a while, they became a lot better known when Timothy Gowers announced on his blog that he and a number of extremely eminent collaborators would start an arXiv overlay journal in their specialism. Gowers became quite well-known because of his activism and his call for a boycott of the traditional commercial scientific publishers, in particular Elsevier. (See here, here and here for more on that.) So anything he does regarding Open Access and the use of open repositories immediately makes people sit up and pay attention.

So could we see a more prominent role for completely open repositories such as the arXiv in the scientific publication process? Maybe. But two main obstacles remain, from my point of view. How do you set up a review process that makes it possible to recognise (top-)quality among the publications in the repositories? And how do you overcome the inbuilt conservatism of academic recruitment panels and promotion committees, which look first and mainly at publications in journals they recognise? As long as those hurdles are not removed, commercial publishers won’t have to worry too much, unfortunately.

Professor van den Heuvel teaches and researches in the Department of Mathematics at LSE. He can also be found on Twitter @JanvadeHe