Social Media Content Analysis


Social media content analysis is the analysis of items that people share through Social Media Networking Sites (SNS). SNS provide a wide variety of content, in a broad array of formats, that can be collected and studied as ethnographic data. Content analysis that does not involve data collected through SNS is covered in a separate overview.

Relevant Characteristics

Data within the public domain is becoming more prevalent with the proliferation of SNS such as Facebook, Twitter, Instagram, and Flickr. This data can be mined and analyzed both manually and automatically, as demonstrated below using Twitter as an example.

Data can be culled from numerous SNS. Collection strategies vary according to the site you pull from and the type of information you are interested in. Software such as NVIVO can assist your collection and gather a lot of data in a very short amount of time, but these programs can be costly. Alternatively, data can be collected manually from Twitter and entered into an Excel spreadsheet.[1] Some researchers have also used an open-source program called Infovigil to mine and analyze Twitter data.[2] Tech-savvy researchers can use the Application Programming Interfaces (APIs) offered by various SNS to assist in the collection of data.[3] Most of these techniques collect SNS content available in the present. For historical data, APIs and pay sites like Topsy can retrieve data that is a few years old.

The collection of data made publicly available is commonly viewed as ethical. However, reproducing data with username information without explicit consent may be seen as inappropriate, and some SNS even have policies that restrict what type of data may be reproduced.

“Method Made Easy”

  1. Generate research questions and decide on the topic to investigate
  2. Determine which SNS to use to collect data (e.g., Twitter, Instagram) and specify the intervals (if any) for collection
  3. Collect the data from the SNS manually or with the assistance of a software program
  4. Transfer data to the software, which can be as basic as a word processor or as advanced as a qualitative analysis package
  5. Filter according to aim of the research (e.g. whether or not to include retweets)
  6. Utilize a sampling technique if the dataset is too large to be analyzed in its entirety
  7. Code the content according to a pre-selected or generated schema
  8. Analyze the data using the themes selected
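
Steps 5 through 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tweet records, the `is_retweet` field, the keyword schema, and the helper names are all hypothetical, and matching keywords is a crude stand-in for the judgment a human coder applies.

```python
import random
from collections import Counter

# Hypothetical records standing in for tweets collected in step 3.
tweets = [
    {"text": "New article explains the enrollment deadline", "is_retweet": False},
    {"text": "RT @someone: New article explains the enrollment deadline", "is_retweet": True},
    {"text": "So frustrated with the website today", "is_retweet": False},
    {"text": "I think the law helps families", "is_retweet": False},
]

# Illustrative keyword schema (an assumption, not a published coding scheme).
SCHEMA = {
    "resources": ["article", "report"],
    "frustration": ["frustrated", "annoyed"],
    "personal opinion": ["i think", "i believe"],
}

def filter_retweets(records):
    """Step 5: keep only original tweets."""
    return [r for r in records if not r["is_retweet"]]

def take_sample(records, k, seed=0):
    """Step 6: draw a reproducible random sample if the dataset exceeds k items."""
    if len(records) <= k:
        return list(records)
    return random.Random(seed).sample(records, k)

def code_tweet(text, schema):
    """Step 7: return every code whose keywords appear in the text."""
    lowered = text.lower()
    return [code for code, keywords in schema.items()
            if any(keyword in lowered for keyword in keywords)]

originals = filter_retweets(tweets)
subset = take_sample(originals, k=3)
coded = [(r["text"], code_tweet(r["text"], SCHEMA)) for r in subset]

# Tally how many tweets received each code, ready for step 8.
counts = Counter(code for _, codes in coded for code in codes)
print(counts)
```

Fixing the random seed keeps the sample reproducible, which helps when a second coder needs to work from the same subset.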


Advantages

  • Allows data to be collected remotely, since researchers can be physically removed from the users generating content
  • A large amount of data is made publicly available on a continuous basis
  • The electronic format facilitates collection and analysis within software programs
  • Data lends itself to analysis because many users organize it by topic (e.g. with hashtags)
  • The method allows for an interpretation of public perception regarding numerous topics


Limitations

  • Content may include slang and other forms of language that are difficult to interpret
  • Tweets and other items may contain links that need to be investigated in order to better understand the content and the broader context in which it is used
  • Many claims cannot be verified, as users may hide behind the anonymity that the internet provides
  • Reproducing certain information, such as users’ names and geographical locations, could be considered unethical
  • The plethora of content generated on a given topic can create large datasets that are difficult to manage


Analysis

Analysis depends on the type of data collected. For instance, if the data consists of photographs (e.g. from Instagram or Flickr), photo content analysis is more appropriate.[4] [5] If the data is text, content analysis is the best option: you code the data according to a framework or scheme that you generate or borrow. Most researchers seem to prefer creating their own coding scheme, with some going so far as to run statistical significance tests and refine their codes accordingly.[6] [7] It is also possible to borrow or adapt existing coding frameworks.[8] Once the data is coded, analysis becomes easier: you can view the volume of tweets associated with each code, which lends itself to statistical analysis. If you have collected longitudinal data, you can also notice trends over time.[9]
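
As a rough illustration of viewing volume per code and trends over time, the following Python sketch tallies coded tweets overall and by month. The dates, codes, and function names are invented for the example and do not come from any of the cited studies.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical coded tweets: (date posted, codes assigned by the researcher).
coded_tweets = [
    (date(2013, 10, 3), ["resources"]),
    (date(2013, 10, 17), ["personal opinion", "resources"]),
    (date(2013, 11, 2), ["frustration"]),
    (date(2013, 11, 20), ["personal opinion"]),
    (date(2013, 11, 28), ["personal opinion", "humor/sarcasm"]),
]

def volume_by_code(records):
    """Count how many tweets carry each code across the whole dataset."""
    return Counter(code for _, codes in records for code in codes)

def monthly_trend(records):
    """Group the same counts by calendar month to expose change over time."""
    trend = defaultdict(Counter)
    for day, codes in records:
        trend[day.strftime("%Y-%m")].update(codes)
    return dict(trend)

totals = volume_by_code(coded_tweets)
trend = monthly_trend(coded_tweets)

print(totals)
for month in sorted(trend):
    print(month, dict(trend[month]))
```

Counts in this shape can be exported to a spreadsheet or passed to a statistics package for significance testing.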

Social Media Analysis with NVIVO

This demonstration uses data collected from Twitter and analyzed through NVIVO. You will need NVIVO, the NVIVO Capture add-on for your web browser, and a Twitter account before you can begin.

  1. First decide what hashtags will be associated with the topic. For this example we are studying the dialogue that is occurring on Twitter about the Affordable Care Act. The hashtags most used on this topic are #ACA, #AffordableCareAct, and #Obamacare. We are going to analyze the content associated with just #Obamacare to reduce the number of tweets to analyze, but data from all the hashtags can be analyzed together. If you are not sure what hashtag or SNS to use, you can use websites like Tagboard to explore different hashtags and see the content associated with each hashtag on several SNS. It is important to note that different hashtags tied to the same topic can generate quite different content, and this must be kept in mind when analyzing one independently of the others. This phenomenon can itself be assessed by comparing the results of analyzing two different hashtags (e.g. #Obamacare vs. #AffordableCareAct).
  2. View the hashtag #Obamacare through your Twitter account.
    Figure: Portion of screen showing the Twitter website with #Obamacare results. The NVIVO Capture symbol is highlighted in the corner.
  3. In Chrome, you will see a symbol for NVIVO Capture in the right-hand corner. Click on the symbol and a small dialogue box will ask for a description of the dataset. We named this file Obamacare Tweets.
  4. Next, go to NVIVO and create a new project. We named our project AffordableCare. Go to the External Data tab and choose From Other Sources, which will open a drop-down menu where you pick From NCapture. Find the location of the NVCX file, which may be in your Downloads folder. Check the box next to the file, which should be listed as Twitter~Search - #obamacare, and click Import.
  5. This action uploads a spreadsheet of tweets into NVIVO. For this example, we have 96 tweets, but many of them are retweets. These can be filtered by right-clicking on the Tweet Type column and hiding the Retweets. Now we are left with 61 tweets to analyze.
  6. For our analysis we borrowed codes from Chew and Eysenbach (2010) to divide tweets into the following categories, referred to in NVIVO as nodes: humor/sarcasm, concern, frustration, misinformation, personal experience, personal opinion, and resources. We coded each tweet into the appropriate nodes; some tweets may fall into more than one node (see video on how to code in NVIVO). For example, a tweet could refer to an online article about the Affordable Care Act, which we coded as a resource, but it could also include a comment reflecting personal opinion, frustration, or humor/sarcasm and be coded under those nodes as well.
  7. Analyze the data using NVIVO. We couldn’t use a cluster analysis on this sample because it was too small. However, a bar graph of the codes (nodes) can be generated, as well as a Word Cloud and Tree Map. The bar graph for our example showed that most of the tweets were personal opinion and many included online articles.
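
For readers without NVIVO, the bar graph in step 7 can be approximated in plain Python. The sketch below assumes the node assignments have been exported by hand into a dictionary. The tweet identifiers and assignments are invented for illustration and are not the 61 tweets from the demonstration.

```python
from collections import Counter

# Hypothetical node assignments; one tweet may sit in several nodes.
node_assignments = {
    "tweet-1": ["resources", "personal opinion"],
    "tweet-2": ["personal opinion"],
    "tweet-3": ["humor/sarcasm", "personal opinion"],
    "tweet-4": ["resources"],
    "tweet-5": ["frustration"],
}

def node_counts(assignments):
    """Count the number of tweets coded to each node."""
    return Counter(node for nodes in assignments.values() for node in nodes)

def text_bar_chart(counts, total_tweets):
    """Render one bar per node, largest first, with its share of all tweets."""
    lines = []
    for node, n in counts.most_common():
        share = 100 * n / total_tweets
        lines.append(f"{node:<18}{'#' * n} {n} ({share:.0f}% of tweets)")
    return "\n".join(lines)

counts = node_counts(node_assignments)
chart = text_bar_chart(counts, len(node_assignments))
print(chart)
```

Because a tweet can belong to more than one node, the shares are calculated against the number of tweets and will usually add up to more than 100%.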

Method in Context

Because SNS are relatively recent, few studies have employed this method so far. However, a handful of scholars are crafting approaches to collecting, managing, and analyzing data from SNS, several of them in Public Health.

Cynthia Chew and Gunther Eysenbach collected over 2 million tweets during an 8-month period in 2009. They used an infoveillance system called Infovigil to assist their efforts.

The authors note that tweets were found using any of the following terms: swine flu, swineflu, and H1N1. Manual coding occurred in the early stages of the research, and the overall number of tweets was reduced by randomly selecting tweets from subsets of the data. The authors then generated a coding scheme through iterative evaluation of the tweets’ content, resulting in the following codes: Resources, Personal Experiences, Personal Opinion/Interest, Humour/Sarcasm, Relief, Downplayed Risk, Concern, Frustration, Misinformation, and Question. Infovigil was configured to code tweets automatically using this schema, based on the appearance of certain terms, and the results were compared to the manual coding. All statistically significant trends identified occurred comparably in manual and automated coding.
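
The idea of checking term-based automated coding against manual coding can be sketched as follows. This is not Infovigil's configuration or the authors' statistical comparison: the keyword lists, the example tweets, and the simple per-code agreement rate are all assumptions chosen to keep the example short.

```python
# Hypothetical term lists for three of the codes (assumed, not the authors').
KEYWORDS = {
    "humour/sarcasm": ["lol", "haha"],
    "concern": ["worried", "scared"],
    "question": ["?"],
}

def auto_code(text, keywords=KEYWORDS):
    """Assign every code whose terms appear anywhere in the tweet."""
    lowered = text.lower()
    return {code for code, terms in keywords.items()
            if any(term in lowered for term in terms)}

# Invented tweets paired with the codes a human coder might have assigned.
validation_set = [
    ("Worried about H1N1 at my kid's school", {"concern"}),
    ("Swine flu? lol, I just have a cold", {"humour/sarcasm"}),
    ("Is the H1N1 vaccine available yet?", {"question"}),
    ("Scared to fly because of swine flu", {"concern"}),
]

def agreement_by_code(pairs, keywords=KEYWORDS):
    """Share of tweets on which automated and manual coding agree, per code."""
    result = {}
    for code in keywords:
        matches = sum(
            (code in auto_code(text, keywords)) == (code in manual)
            for text, manual in pairs
        )
        result[code] = matches / len(pairs)
    return result

agreement = agreement_by_code(validation_set)
for code, rate in agreement.items():
    print(f"{code}: {rate:.2f}")
```

In this toy data the "question" code agrees on only three of four tweets, because the question mark in the sarcastic tweet triggers the term rule. Disagreements of this kind are what a comparison against manual coding is meant to surface.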

Ultimately, the authors identified that tweets concerning the topic were primarily resources offering credible information with insights on opinion and experiences. They suggest that this framework can be built upon to help public officials address health concerns.

Online Resources

APIs through Twitter
Infovigil
NVIVO Capture demonstration
Tagboard lets you view hashtags from most SNS sources on one page
Topsy lets you view historic data for free; the pro (pay) site provides trend information and allows you to download data

Further Reading

Eysenbach, Gunther
2009 Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet. Journal of Medical Internet Research 11(1).

Humphreys, Lee, Phillipa Gill, Balachander Krishnamurthy, and Elizabeth Newbury
2013 Historicizing New Media: A Content Analysis of Twitter. Journal of Communication.

Lewis, Seth C., Rodrigo Zamith, and Alfred Hermida
2013 Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media 57(1):34-52.

Rui, Jian Raymond, Yixin Chen, and Amanda Damiano
2013 Health Organizations Providing and Seeking Social Support: A Twitter-Based Content Analysis. Cyberpsychology, Behavior, and Social Networking 2013(1).

Thornton, Leslie-Jean
2013 “Time of the Month” on Twitter: Taboo, Stereotype and Bonding in a no-Holds-Barred Public Arena. Sex Roles 68(1-2):41-54.


  1. Sullivan, S. John, Anthony G. Schneiders, Choon-Wi Cheang, Emma Kitto, Hopin Lee, Jason Redhead, Sarah Ward, Osman H. Ahmed, and Paul R. McCrory (2012) ‘What's Happening?’ A Content Analysis of Concussion-Related Traffic on Twitter. British Journal of Sports Medicine 46(4):258-263.
  2. Chew, Cynthia, and Gunther Eysenbach (2010) Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak. PloS One 5(11):e14118.
  3. Angus, Emma, David Stuart, and Mike Thelwall (2010) Flickr’s potential as an academic image resource: An exploratory study. Journal of Librarianship and Information Science 42(4):268–278.
  4. Angus, Emma, David Stuart, and Mike Thelwall (2010) Flickr’s potential as an academic image resource: An exploratory study. Journal of Librarianship and Information Science 42(4):268–278.
  5. Ozel, Bulent, and Han Woo Park (2012) Online Image Content Analysis of Political Figures: An Exploratory Study. Quality & Quantity 46(4):1013-1024.
  6. Angus, Emma, David Stuart, and Mike Thelwall (2010) Flickr’s potential as an academic image resource: An exploratory study. Journal of Librarianship and Information Science 42(4):268–278.
  7. Chew, Cynthia, and Gunther Eysenbach (2010) Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak. PloS One 5(11):e14118.
  8. Angus, Emma, David Stuart, and Mike Thelwall (2010) Flickr’s potential as an academic image resource: An exploratory study. Journal of Librarianship and Information Science 42(4):268–278.
  9. Chew, Cynthia, and Gunther Eysenbach (2010) Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak. PloS One 5(11):e14118.