The recent uptake of crowdsourcing has seen institutions and scholars engage the public in large-scale research ventures. By recruiting volunteers to transcribe the unpublished manuscripts of Jeremy Bentham, the award-winning Transcribe Bentham project engages students, researchers, scholars, and the general public alike with Bentham’s life and work. Tim Causer describes the project, and suggests that we should not underestimate the capabilities of volunteers.

Most people will know Jeremy Bentham as the proponent of the Panopticon prison, or for his being preserved and displayed as an auto-icon. But Bentham was a hugely influential thinker and prolific writer, who turned his attention to a wide range of subjects, including economics, representative democracy, religion, social welfare, law, and sexual morality.

UCL is custodian of both Bentham’s corpse and his corpus: UCL Special Collections holds around 60,000 manuscript folios (c. 30 million words) of material written and composed by Bentham, while the British Library holds a further 12,500 folios (c. 6 million words). Prior to the launch of Transcribe Bentham, some 20,000 folios had been transcribed. So, while the Bentham Papers are a resource of enormous historical and philosophical importance, much of the collection is barely known, let alone adequately studied. As a result, our knowledge of Bentham’s thought and work, together with their historical and continuing importance, is rendered at best provisional, and at worst a caricature.

Which is where Transcribe Bentham comes in. Recent successful experiments with crowdsourcing include the National Library of Australia’s digitised newspapers, Dickens Journals Online, and, of course, Galaxy Zoo. There have also been attempts to crowdsource a more complex task: the accurate transcription of manuscripts, represented by projects such as Old Weather, and open-source transcription tools such as Scripto, FromThePage, and T-Pen.

Transcribe Bentham was initiated to test the feasibility of crowdsourcing the transcription of complex manuscripts. Users must decipher Bentham’s handwriting and navigate both his dense and challenging ideas and his idiosyncratic style. Manuscripts are complicated further by deletions, insertions, and other irregular features. On top of this, volunteers encode their work in Text Encoding Initiative (TEI) compliant XML for preservation and interoperability purposes.
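To give a rough sense of what this encoding involves, here is a minimal Python sketch. The sample sentence is invented for illustration and is not taken from a Bentham manuscript; it uses only a handful of standard TEI elements (`<del>` for deletions, `<add>` for insertions, `<unclear>` for doubtful readings, `<lb/>` for line breaks). The function name `final_text` is hypothetical, and real TEI files also carry a namespace and header that are left out here for brevity.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT a real Bentham transcript. It only illustrates
# how deletions, insertions and unclear readings can be marked up.
SAMPLE = (
    '<p>The <del>greatest</del> <add place="above">general</add> happiness '
    'of the <unclear>community</unclear><lb/>is the proper end of government.</p>'
)


def _collect(elem):
    """Gather the text of elem, skipping the content of <del> elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "lb":
            parts.append(" ")            # treat a line break as a space
        elif child.tag != "del":
            parts.append(_collect(child))
        # Text following a child belongs to the parent, so it is kept
        # even when the child itself is a deletion.
        parts.append(child.tail or "")
    return "".join(parts)


def final_text(xml_fragment):
    """Return the reading text of a fragment with whitespace collapsed."""
    root = ET.fromstring(xml_fragment)
    return " ".join(_collect(root).split())


if __name__ == "__main__":
    print(final_text(SAMPLE))
```

The same markup could equally be rendered the other way round, showing the deleted words struck through, which is part of why encoding the features rather than silently resolving them is useful for preservation and reuse.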

Volunteer-produced transcripts have two main purposes. First, they will contribute to scholarship, by providing Collected Works editors with diplomatic transcripts of the material. Volunteers are often the first to read manuscripts since Bentham wrote them, and could make new discoveries; one volunteer identified a hitherto unknown recollection from Bentham’s childhood related to his views on animal welfare, while another found a manuscript in which Bentham justifies the overthrow of the Governor of New South Wales.

Second, transcripts are uploaded to UCL’s digital repository, making the collection accessible to all. Each submission is checked by Transcribe Bentham staff, who assess whether crowdsourced transcripts are of sufficient quality for editorial and searching purposes.

Transcribe Bentham volunteers have more than risen to the challenge. As of 2 November 2012, 4,612 manuscripts have been transcribed or partially transcribed, of which 94 per cent are of the requisite standard. This is a real testament to the care taken by untrained volunteers, and to the quality of their work. Since the project began, an average of 41 manuscripts (c. 20,000 words) have been transcribed each week; since 28 January 2012, however, the weekly average has increased to 51 manuscripts (c. 25,500 words).
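For readers who want to see what these figures imply, the short Python sketch below works through the arithmetic. It uses only the numbers quoted above; since the word counts are approximate ("c."), the results should be read as rough estimates, and the variable names are simply illustrative.

```python
# Figures quoted in the article (word counts are approximate).
TOTAL_MANUSCRIPTS = 4612
ACCEPTED_SHARE = 0.94

WEEKLY_OVERALL = 41       # manuscripts per week since the project began
WORDS_OVERALL = 20_000    # approximate words per week, same period
WEEKLY_RECENT = 51        # manuscripts per week since 28 January 2012
WORDS_RECENT = 25_500     # approximate words per week, same period

# Roughly how many transcripts met the requisite standard.
accepted = round(TOTAL_MANUSCRIPTS * ACCEPTED_SHARE)

# Implied average length of a manuscript in each period.
words_per_manuscript_overall = WORDS_OVERALL / WEEKLY_OVERALL
words_per_manuscript_recent = WORDS_RECENT / WEEKLY_RECENT

# Relative increase in the weekly transcription rate.
rate_increase = (WEEKLY_RECENT - WEEKLY_OVERALL) / WEEKLY_OVERALL

if __name__ == "__main__":
    print(f"Accepted transcripts: about {accepted}")
    print(f"Words per manuscript (overall): about {words_per_manuscript_overall:.0f}")
    print(f"Words per manuscript (recent): about {words_per_manuscript_recent:.0f}")
    print(f"Increase in weekly rate: about {rate_increase:.0%}")
```

On these figures, roughly 4,300 transcripts met the standard, a manuscript averages around 500 words, and the weekly rate has risen by about a quarter.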

Anyone planning to crowdsource such a complex manuscript collection should be prepared to spend time and effort on checking submissions, and supporting volunteers in their work. Our quality control process is speedier than before, but the need for it can never be completely eliminated. Volunteers are currently transcribing at a faster rate than a full-time member of staff, and even though we currently spend the equivalent of a day per week checking submissions, we should still avoid significant staff costs in the future, a saving which could offset the cost of the project.

Our recently published findings suggest that volunteers are motivated to participate by an interest in history and/or philosophy, in Bentham, and/or by a general interest in crowdsourcing. Yet also very notable are altruistic motivations; as one volunteer put it, Transcribe Bentham is a ‘literary form of archaeology’, whereby ‘instead of using a brush to uncover an object, you get to uncover historical information by reading and transcribing it. It leaves his legacy available for all to access’. In our experience, volunteers will engage with a project more enthusiastically when there is a clearly articulated task, one in which volunteers are equals, and which will benefit others as well as meet your research needs.

Though we have recruited a large crowd of over 2,000 users, as in other volunteer-powered projects most of the work is carried out by a minority of participants: only 316 have transcribed anything, and 194 of these worked on only one manuscript. In fact, most of the work has been carried out by fifteen regular ‘Super Transcribers’.

Why have so few registered users taken part? Alastair Dunning’s point about being sensitive to and understanding the ‘needs and motivations of those taking part’ in crowdsourcing endeavours is a vital one. Volunteers have suggested what may have dissuaded participation: lack of time in which to learn how to transcribe and/or add markup; inability to read Bentham’s handwriting; the XML markup itself; and trouble with the transcription interface. So, while participants have proven technologically savvy, work remains to be done to make participation more straightforward.

Owing to the cessation of our initial twelve-month funding, we weren’t able to meet these requests. Now, under a two-year grant from the Andrew W. Mellon Foundation, we are in a position to implement alterations, one of the most important of which will be the introduction of a What-You-See-Is-What-You-Get transcription interface, which will hide the XML markup. We will also digitise much of the remainder of the UCL Bentham Papers, and all of those manuscripts held by the British Library.

Transcribe Bentham has had a significant impact. It has brought a new audience to Bentham, who, incidentally, loved technology: his house had central heating, and he experimented with refrigeration and counterfeit-proof banknotes. Transcribe Bentham has won a major award, and has shown that complex tasks and material can be crowdsourced. If we can crowdsource Bentham, then surely anything is possible!

Perhaps most importantly, the ultimate fruit of Transcribe Bentham will be a digital collection of enduring national and international historical and philosophical importance, accessible to all, created through a genuine partnership between scholars and the general public.


Note: This article gives the views of the author(s), and not the position of the Impact of Social Sciences blog, nor of the London School of Economics.

About the author:
Tim Causer is a Research Associate at the Bentham Project, in the Faculty of Laws at University College London. His research specialism is convict transportation, with an especial emphasis on colonial Australia.

About the project:
Transcribe Bentham is hosted by UCL’s Bentham Project, and produced in association with UCL’s Centre for Digital Humanities, UCL Library Services, UCL Creative Media Services, and the University of London Computer Centre. It was established under a one-year grant from the Arts and Humanities Research Council. From 1 October 2012 it is funded for two years by the Andrew W. Mellon Foundation, with the British Library joining the project consortium. You can sign up for a user account and take part at the Transcription Desk.



