An article in Scientific American suggests further ethical considerations should be made for research derived from Twitter data. Ernesto Priego questions first the extent to which Twitter will actually release all of its valuable data and also argues archiving and disseminating information from Twitter and other public archives does not have to be cause for an “ethical dilemma” so long as the information was public in the first instance.
“Twitter to Release All Tweets to Scientists”, says the Scientific American headline. The 344-word post fails to quote a single source at Twitter where this claim can be verified. It is not clear whether it is just a very belated reaction to the February 5 2014 “Twitter Data Grants” call [now closed]. Please note that the Data Grants call is a pilot programme, and the February post clearly indicated that:
For this initial pilot, we’ll select a small number of proposals to receive free datasets. We can do this thanks to Gnip, one of our certified data reseller partners. They are working with us to give selected institutions free and easy access to Twitter datasets. In addition to the data, we will also be offering opportunities for the selected institutions to collaborate with Twitter engineers and researchers.
Nothing in there says that “Twitter” will “Release All Tweets to Scientists”, as the über-retweeted Scientific American headline claims. (Note that the above quote and its source post were not mentioned or linked to from the Scientific American article in question.)
It is frustrating that a publication like Scientific American (or the Smithsonian Magazine, which repeated the same copy) would not cite or link to the sources for such a claim. “All Twitter” means “All Twitter”, and “Scientists” means… who exactly? Given Twitter’s business model and data storage and management strategy, it seems highly unlikely that the headline, taken verbatim, would become a reality; it is more likely that some researchers might be able to get access to large-yet-curated datasets for research; it is also likely that most researchers would have to pay for it.
The short Scientific American post reads more like an excuse to mention an opinion article by Caitlin Rivers and Bryan Lewis, computational epidemiologists at Virginia Tech, which was approved with reservations at F1000Research (and which the post, most unhelpfully, also fails to link to). The Scientific American post asks: “Is the use of Twitter as a research tool ethical, given that its users do not intend to contribute to research?”
To be honest the question makes me want to reply: these days what’s unethical is not to use Twitter as a research tool. But seriously. It has to be taken into account that Rivers and Lewis’ suggested ethics framework is for Twitter research in mental health [PDF]. This is an essential qualifier to the context in which their framework should be interpreted. Please read the open referee report of the piece by Tristan Henderson, which provides very valuable observations and further reading (Read the Referee Report). The version I read and refer to is version 1 of the paper. Click “track” to see updates.
As the Twitter Hive Mind often says, there is no such thing as ‘raw’ data: all data is always already subject to curation and editing at all stages of the process. Once one has the data, it is relatively easy to delete or edit specific columns for the different metadata obtained from the Twitter API. Nevertheless, as someone who has been collecting and sharing Twitter datasets for Library and Information Science research (see my figshare data), I would be worried if the ethical specificities of a particular field (mental health research, epidemiology in this case) were imposed on other fields. Research involving network analysis and geovisualisation often relies on publicly-available metadata obtained from tweets consciously and willingly provided by users publicly online through their public Twitter accounts.
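To illustrate how easily columns can be deleted from a collected dataset, here is a minimal sketch in Python. It assumes the tweets have already been exported to CSV; the column names (`id_str`, `created_at`, `user_screen_name`, `geo`, `text`), the sample rows and the helper `drop_columns` are hypothetical, and real exports will differ depending on the collection tool used.

```python
import csv
import io

# Hypothetical sample export; real column names vary by collection tool.
SAMPLE_CSV = """id_str,created_at,user_screen_name,geo,text
1,2014-02-05 10:00:00,example_user,"51.5,-0.1",Reading about #openaccess
2,2014-02-05 10:05:00,another_user,,Slides from today's #openaccess talk
"""


def drop_columns(csv_text, columns_to_drop):
    """Return the CSV text with the named columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep the original column order, minus the columns to drop.
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped columns.
        writer.writerow(row)
    return out.getvalue()


# Remove the account name and location before sharing the dataset.
cleaned = drop_columns(SAMPLE_CSV, {"user_screen_name", "geo"})
print(cleaned)
```

Which columns to remove is a judgment call for each project; this sketch only shows that the editing step itself is trivial once the data has been collected.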
Image credit: Cory Grenier (Flickr, CC BY-SA)
Rivers and Lewis say that “Twitter participants can reasonably expect to rely on some anonymity of the crowd to manage privacy.” Though I can see why this is being said in the context of an opinion piece on the ethics of mental health research, I disagree. Twitter’s own Privacy Policy states:
Tweets, Following, Lists and other Public Information: Our Services are primarily designed to help you share information with the world. Most of the information you provide us is information you are asking us to make public. This includes not only the messages you Tweet and the metadata provided with Tweets, such as when you Tweeted, but also the lists you create, the people you follow, the Tweets you mark as favorites or Retweet, and many other bits of information that result from your use of the Services. Our default is almost always to make the information you provide public for as long as you do not delete it from Twitter, but we generally give you settings to make the information more private if you want. Your public information is broadly and instantly disseminated. For instance, your public user profile information and public Tweets may be searchable by search engines and are immediately delivered via SMS and our APIs to a wide range of users and services, with one example being the United States Library of Congress, which archives Tweets for historical purposes. [My emphasis – EP]. When you share information or content like photos, videos, and links via the Services, you should think carefully about what you are making public.
Location Information: You may choose to publish your location in your Tweets and in your Twitter profile. You may also tell us your location when you set your trend location on Twitter.com or enable your computer or mobile device to send us location information. You can set your Tweet location preferences in your account settings and learn more about this feature here. Learn how to set your mobile location preferences here. We may use and store information about your location to provide features of our Services, such as Tweeting with your location, and to improve and customize the Services, for example, with more relevant content like local trends, stories, ads, and suggestions for people to follow.
There is a wealth of information in a tweet’s metadata that can be beneficial for research in fields other than the Life Sciences. The act of archiving and disseminating public information publicly does not have to be cause for an “ethical dilemma”, as long as the archived and disseminated information was public in the first instance. If the researcher were collecting and sharing data impossible to obtain freely and publicly, we would be facing a different situation. Publicly published data is public evidence and it should be subject to public research. Facebook is not Twitter, and Twitter research is not hacking into private mobile phone messages or emails. There is a difference between surveillance and recording for historical, sociological, scientific or other research. Surveillance implies the collection and analysis of information that the public did not mean to be publicly accessible. Transparency is a positive consequence of publicness, and the public research of publicly-available data is an exercise in transparency and accountability.
A researcher like me is interested in scholarly and artistic networks online made up of individuals who have willingly set up public accounts on Twitter and who willingly post content using hashtags to organise their postings under particular categories, and therefore to be found under such categories. Individuals worried about the data they publish publicly, freely, openly on Twitter being collected by researchers for research purposes other than the ones they intended should perhaps reconsider how Twitter works. Moreover, it seems to me the likelihood of an individual user’s sensitive data being further disseminated from an academic’s research Twitter dataset is much smaller than the likelihood of it going viral as originally published through a Twitter client.
I suppose the most efficient ethical framework for Twitter use is the simplest one. If you don’t want it found, viewed, collected and potentially researched, don’t tweet it publicly.
More on qualitative research and anonymity: you might also be interested in this post by Mark Carrigan, and in the post by Pat Thompson he links to here.
This piece originally appeared on Ernesto Priego’s personal blog as “Twitter as Public Evidence and the Ethics of Twitter Research” and is reposted with permission.
Note: This article gives the views of the authors, and not the position of the Impact of Social Science blog, nor of the London School of Economics.
Ernesto Priego is a Lecturer in Library Science at the Centre for Information Science at City University London. He did his PhD in Information Studies at University College London.
I think this misses the point that was uncovered by danah boyd’s research with youth about putting content online and “unintended audiences” (See http://www.danah.org/papers/TakenOutOfContext.pdf).
Also, check out Wasim Ahmed’s slides on ethics of using social media content: https://www.slideshare.net/was3210
I’m also trying to track down a reference to a longitudinal health study. Children’s data was used with parental permission and made public. Years later, researchers contacted the grown participants, and there were a number that now objected to sharing what the data had revealed about them. I think it was a UK longitudinal health study. If I find it, I will add it.
There is a clear implication for research ethics when data is used to label or group identifiable individuals in ways that they would find problematic–and had they known others would be using the data for this purpose, the individuals would have denied access–especially when it relies on an incomplete understanding of the individual publicly posting the content. There is much to be discussed regarding researchers as “unintended audiences” for content people put online.