Different social media platforms allow different levels of access to the data they hold for academic research. In this cross-post, Daniela Duca explores some of the ways in which LinkedIn has been used by social scientists and provides a list of resources for researchers looking to work with LinkedIn data.
Back in 2012, when LinkedIn was close to the 200 million user mark, a young but very computational (and quite resourceful) assistant professor hustled through his contacts and somehow managed to get access to the trove of LinkedIn data. Prasanna Tambe, then at the NYU Stern School of Business, was not the first to use the information on LinkedIn for research, but he was definitely the first to use LinkedIn data at this scale. Tambe mined the skills and roles of all 175 million users at the time, though he probably ended up working with a smaller sample, to understand how the rapid evolution of skills and know-how in the technology sector is impacting investments in new IT innovations.
Today, researchers are using LinkedIn data in a variety of ways: to find and recruit participants for research and experiments (Using Facebook and LinkedIn to Recruit Nurses for an Online Survey), to analyze how the features of this network affect people’s behavior and identity or how data is used for hiring and recruiting purposes, or, most often, to enrich other data sources with publicly available information from selected LinkedIn profiles (Examining the Career Trajectories of Nonprofit Executive Leaders, The Tech Industry Meets Presidential Politics: Explaining the Democratic Party’s Technological Advantage in Electoral Campaigning).
Most of these uses involve manual lookups, with graduate students spending days sifting through the site and copying and pasting the information into a spreadsheet. A LinkedIn API is available for larger scale datasets, but it comes with limitations: no more than 100k lifetime users, no storing of content, and no use for research purposes. If you had a large enough network, you could also download your network’s data and work with that CSV output. Essentially, you need some computational skills to collect and use the LinkedIn data, and you would still be limited in the type of research you could do. Gian Marco Campagnolo, a Turing Fellow and lecturer at the University of Edinburgh, used some LinkedIn data for his team’s research into the career evolution of IT professionals, but they still needed to get a list of names from another database.
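As a rough illustration of what working with a downloaded network CSV might involve, here is a minimal Python sketch that tallies connections by employer. The column names (`Company`, `Position`) and the sample rows are assumptions made up for illustration, not a description of LinkedIn's actual export format, so check the header of your own file before adapting it.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count the non-empty values of one column in a CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Treat missing or blank cells as "no value" and skip them.
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Made-up placeholder rows standing in for an exported connections file.
# The header names here are hypothetical.
sample = io.StringIO(
    "First Name,Last Name,Company,Position\n"
    "A,One,Example University,Lecturer\n"
    "B,Two,Example University,Research Fellow\n"
    "C,Three,Example Ltd,Data Scientist\n"
    "D,Four,,Consultant\n"
)

company_counts = count_by_column(sample, "Company")
for company, n in company_counts.most_common():
    print(company, n)
```

To run this on a real export, you would pass an opened file instead of `sample`, for example `open("connections.csv", newline="", encoding="utf-8")`, where the filename is again only a placeholder.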
Economic Graph
The LinkedIn Economic Graph team continues to work with the data independently of academics, forming partnerships with organisations such as The World Bank Group. I was recently looking at the data made available to the public through this collaboration to explore the migration patterns of highly trained people from my home country. I was surprised to find that the UK is now #2 after Romania. As the website states, in this first Digital Data for Development collaboration, the two organizations opened up an anonymized and aggregated dataset on “100+ countries with at least 100,000 LinkedIn members each, distributed across 148 industries and 50,000 skills categories”.
Further Resources*
- Kaggle dataset of anonymized LinkedIn profiles
- Data.world LinkedIn dataset of top skills by year
- Data.world LinkedIn dataset on job data by US state
- A 2015 list of links to datasets of LinkedIn profiles on reddit
- Network repository LinkedIn dataset
- The World Bank LinkedIn dataset
- Harvested LinkedIn search data with recipe from getdata.io
- Explore LinkedIn membership by field of study and geography, 2017
- Statista’s LinkedIn user volume dataset
*Scraping web pages and using the LinkedIn API for research purposes violates LinkedIn’s terms and conditions.
This post originally appeared as Social scientists working with LinkedIn data, on the SAGE Ocean blog under a CC BY-NC 4.0 licence. Interested in utilising social media data? Check out the latest course from SAGE Campus – collecting social media data.
About the author
Daniela Duca works on new products within SAGE Ocean, collaborating with startups to help them bring their tools to market. Before joining SAGE, she worked with student and researcher-led teams that developed new software tools and services, providing business planning and market development guidance and support. She designed and ran a 2-year programme offering innovation grants for researchers working with publishers on new software services to support the management of research data. She is also a visual artist, has experience in financial technology, and holds a PhD in innovation management. You can connect with Daniela on Twitter.
Note: This article gives the views of the authors, and not the position of the LSE Impact Blog, nor of the London School of Economics. Please review our comments policy if you have any concerns on posting a comment below.
Image Credit, Dimitar Belchev via Unsplash (Licensed under a CC0 1.0 licence)
I do a lot of LinkedIn scraping as well, to collect profile data for a predictive model I’m building. This platform, Mantheos (https://profiles.mantheos.com/), has an API that allows me to extract LinkedIn profile data en masse. With other solutions, I usually find the volume limitations quite significant, both in the number of profiles that can be extracted and in the number of connection requests.