
Epidemiology from Tweets: Estimating Misuse of Prescription Opioids in the USA from Social Media



The misuse of prescription opioids (MUPO) is a leading public health concern. Social media are playing an expanded role in public health research, but there are few methods for estimating established epidemiological metrics from social media. The purpose of this study was to demonstrate that the geographic variation of social media posts mentioning prescription opioid misuse strongly correlates with government estimates of MUPO in the last month.


We wrote software to acquire publicly available tweets from Twitter from 2012 to 2014 that contained at least one keyword related to prescription opioid use (n = 3,611,528). A medical toxicologist and emergency physician curated the list of keywords. We used the semantic distance (SemD) to automatically quantify the similarity of meaning between tweets and identify tweets that mentioned MUPO. We defined the SemD between two words as the shortest distance between the two corresponding word-centroids. Each word-centroid represented all recognized meanings of a word. We validated this automatic identification with manual curation. We used Twitter metadata to estimate the location of each tweet. We compared our estimated geographic distribution with the 2013–2015 National Surveys on Drug Use and Health (NSDUH).
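As a rough illustration of the word-centroid idea, the sketch below treats each recognized meaning of a word as a small numeric vector, averages those vectors into a centroid, and takes the Euclidean distance between two centroids as the SemD. The two-dimensional vectors, the `SENSES` table, and the function names are hypothetical and chosen only for illustration; the abstract does not specify how meanings were represented or which distance was used, so this is one possible reading rather than the study's implementation.

```python
import math

# Hypothetical 2-D "sense vectors", invented for illustration only.
# Each word maps to one vector per recognized meaning.
SENSES = {
    "oxycodone": [(0.9, 0.1), (0.8, 0.2)],
    "percocet": [(0.85, 0.15)],
    "weather": [(0.1, 0.9), (0.2, 0.8)],
}


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / n for i in range(dims))


def semd(word_a, word_b, senses=SENSES):
    """Euclidean distance between the centroids of two words (an assumed metric)."""
    return math.dist(centroid(senses[word_a]), centroid(senses[word_b]))
```

With these toy vectors, `semd("oxycodone", "percocet")` is much smaller than `semd("oxycodone", "weather")`, which mirrors the intuition that related terms sit close together while unrelated terms sit far apart.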


Tweets that mentioned MUPO formed a distinct cluster far away from semantically unrelated tweets. The state-by-state correlation between Twitter and NSDUH was highly significant across all NSDUH survey years. The correlation was strongest between Twitter and NSDUH data from those aged 18–25 (r = 0.94, p < 0.01 for 2012; r = 0.94, p < 0.01 for 2013; r = 0.71, p = 0.02 for 2014). The correlation was driven by discussions of opioid use, even after controlling for geographic variation in Twitter usage.
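The state-by-state comparison can be sketched as a Pearson correlation between two per-state series. The example below is a minimal sketch, not the authors' code: `mention_rate` divides each state's count of MUPO tweets by its total tweet count, which is one simple assumed way to adjust for geographic variation in Twitter usage, and `pearson_r` computes the correlation coefficient from first principles. Both function names and the adjustment itself are assumptions.

```python
import math


def mention_rate(mupo_tweets, all_tweets):
    """Share of each state's tweets that mention MUPO.

    Both arguments are dicts keyed by state; dividing by the total
    tweet count is an assumed way to adjust for Twitter usage.
    """
    return {state: mupo_tweets[state] / all_tweets[state] for state in mupo_tweets}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

To use it, build the two series in the same state order, for example `pearson_r([rates[s] for s in states], [survey[s] for s in states])`, where `rates` comes from `mention_rate` and `survey` is a hypothetical dict holding the survey estimate for each state.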


Mentions of MUPO on Twitter correlate strongly with state-by-state NSDUH estimates of MUPO. We have also demonstrated that natural language processing can be used to analyze social media to provide insights for syndromic toxicosurveillance.




Author information



Corresponding author

Correspondence to Michael Chary.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflicts of interest.

Sources of Funding



About this article


Cite this article

Chary, M., Genes, N., Giraud-Carrier, C. et al. Epidemiology from Tweets: Estimating Misuse of Prescription Opioids in the USA from Social Media. J. Med. Toxicol. 13, 278–286 (2017).



  • Social media
  • Epidemiology
  • Misuse
  • Opioids
  • Natural language processing
  • Computational linguistics