Journal of Medical Toxicology, Volume 13, Issue 4, pp 278–286

Epidemiology from Tweets: Estimating Misuse of Prescription Opioids in the USA from Social Media

  • Michael Chary
  • Nicholas Genes
  • Christophe Giraud-Carrier
  • Carl Hanson
  • Lewis S. Nelson
  • Alex F. Manini
Original Article



The misuse of prescription opioids (MUPO) is a leading public health concern. Social media are playing an expanded role in public health research, but there are few methods for estimating established epidemiological metrics from social media. The purpose of this study was to demonstrate that the geographic variation of social media posts mentioning prescription opioid misuse strongly correlates with government estimates of MUPO in the last month.


We wrote software to acquire publicly available tweets posted from 2012 to 2014 that contained at least one keyword related to prescription opioid use (n = 3,611,528). A medical toxicologist and an emergency physician curated the list of keywords. We used semantic distance (SemD) to automatically quantify the similarity of meaning between tweets and to identify tweets that mentioned MUPO. We defined the SemD between two words as the shortest distance between the two corresponding word-centroids, where each word-centroid represented all recognized meanings of a word. We validated this automatic identification with manual curation. We used Twitter metadata to estimate the location of each tweet. We compared our estimated geographic distribution with the 2013–2015 National Surveys on Drug Use and Health (NSDUH).
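As a rough illustration of the word-centroid idea described above, the following Python sketch averages a word's sense vectors into a centroid and measures the Euclidean distance between centroids. The `SENSES` table, its two-dimensional vectors, the choice of Euclidean distance, and the `tweet_distance` aggregation (mean pairwise distance over known words) are all hypothetical choices made for this example; the abstract does not specify how the study represented word senses or combined word distances into tweet distances.

```python
from itertools import product
from math import dist
from statistics import fmean

# Hypothetical toy sense vectors: one vector per recognized meaning of a word.
# A real system would derive these from a lexical resource, not hard-code them.
SENSES = {
    "oxycodone": [(0.9, 0.1)],
    "percocet": [(0.8, 0.2)],
    "high": [(0.7, 0.3), (0.1, 0.9)],  # "intoxicated" vs. "elevated"
    "weather": [(0.0, 1.0)],
}


def centroid(word):
    """Average all sense vectors of a word into a single point."""
    vectors = SENSES[word]
    return tuple(fmean(axis) for axis in zip(*vectors))


def semantic_distance(word_a, word_b):
    """Euclidean distance between the two word-centroids."""
    return dist(centroid(word_a), centroid(word_b))


def tweet_distance(tweet_a, tweet_b):
    """Mean pairwise word distance between two tweets (assumed aggregation).

    Words missing from SENSES are ignored; returns None if either tweet
    has no known words.
    """
    words_a = [w for w in tweet_a.lower().split() if w in SENSES]
    words_b = [w for w in tweet_b.lower().split() if w in SENSES]
    if not words_a or not words_b:
        return None
    return fmean(semantic_distance(a, b) for a, b in product(words_a, words_b))
```

With these toy vectors, `semantic_distance("oxycodone", "percocet")` is smaller than `semantic_distance("oxycodone", "weather")`, which is the kind of separation that would let a clustering step group opioid-related tweets apart from unrelated ones.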


Tweets that mentioned MUPO formed a distinct cluster far away from semantically unrelated tweets. The state-by-state correlation between Twitter and NSDUH was highly significant across all NSDUH survey years. The correlation was strongest between Twitter and NSDUH data from those aged 18–25 (r = 0.94, p < 0.01 for 2012; r = 0.94, p < 0.01 for 2013; r = 0.71, p = 0.02 for 2014). The correlation was driven by discussions of opioid use, even after controlling for geographic variation in Twitter usage.
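The state-by-state comparison reported above can be sketched as a Pearson correlation between two per-state series. In the minimal Python example below, `misuse_rate` divides MUPO-related tweet counts by total tweet counts per state, which is one simple way to account for differences in Twitter usage and is an assumption rather than the adjustment the study necessarily used. The function names and the sample figures in the usage note are illustrative only and are not data from the study.

```python
from math import sqrt


def misuse_rate(mupo_tweets, all_tweets):
    """Per-state share of tweets mentioning MUPO (assumed normalization)."""
    return {state: mupo_tweets[state] / all_tweets[state] for state in mupo_tweets}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_with_survey(mupo_tweets, all_tweets, survey_estimates):
    """Correlate normalized tweet rates with survey estimates over shared states."""
    rates = misuse_rate(mupo_tweets, all_tweets)
    states = sorted(set(rates) & set(survey_estimates))
    return pearson_r([rates[s] for s in states],
                     [survey_estimates[s] for s in states])
```

A call such as `correlate_with_survey({"NY": 120, "UT": 45, "NJ": 80}, {"NY": 10000, "UT": 3000, "NJ": 8000}, {"NY": 4.1, "UT": 5.0, "NJ": 3.8})` (made-up numbers) returns a coefficient between -1 and 1; a significance test would still be needed to reproduce reported p-values.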


Mentions of MUPO on Twitter correlate strongly with state-by-state NSDUH estimates of MUPO. We have also demonstrated that natural language processing can be used to analyze social media and provide insights for syndromic toxicosurveillance.


Social media; Epidemiology; Misuse; Opioids; Natural language processing; Computational linguistics


Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflicts of interest.

Sources of Funding


Supplementary material

13181_2017_625_MOESM1_ESM.docx (810 kb)
ESM 1 (DOCX 809 kb)



Copyright information

© American College of Medical Toxicology 2017

Authors and Affiliations

  1. Department of Emergency Medicine, NewYork-Presbyterian/Queens, Queens, USA
  2. Department of Emergency Medicine, Mount Sinai Hospital, New York, USA
  3. Department of Computer Science, Brigham Young University, Provo, USA
  4. Department of Health Science, Brigham Young University, Provo, USA
  5. Department of Emergency Medicine, Rutgers New Jersey Medical School, Newark, USA
  6. Division of Medical Toxicology, The Icahn School of Medicine, New York, USA
  7. Department of Emergency Medicine, Elmhurst Hospital Center, Queens, USA
