AIDS and Behavior, Volume 22, Issue 7, pp 2322–2333

An Online Risk Index for the Cross-Sectional Prediction of New HIV, Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years

  • Man-pui Sally Chan
  • Sophie Lohmann
  • Alex Morales
  • Chengxiang Zhai
  • Lyle Ungar
  • David R. Holtgrave
  • Dolores Albarracín
Original Paper


The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.
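The validation step summarized above, relating a county-level ORI to observed diagnosis rates, can be illustrated with a small Pearson correlation sketch. This is only an illustration under stated assumptions: the function name `pearson_r` and the five county values are hypothetical and are not taken from the study, and the open-vocabulary feature extraction and semantic models used by the authors are not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five counties (illustration only, not study data).
ori_scores = [0.12, 0.30, 0.25, 0.48, 0.40]            # predicted risk index
observed_rates = [110.0, 180.0, 210.0, 320.0, 260.0]   # diagnoses per 100,000

r = pearson_r(ori_scores, observed_rates)
print(f"r = {r:.2f}")
```

In practice one such coefficient would be computed per infection, per year, and per geographic scope, which is how a range of correlations like the one reported in the abstract could arise.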


Keywords: HIV, Chlamydia, Gonorrhea, Social media, Big data



This work was funded by a National Institutes of Health grant. We are grateful to Travis Sanchez, Patrick S. Sullivan, and Yisi Liu for their help in data collection.


This study was funded by the National Institutes of Health (Grant Number R56 AI114501 to D. A.).

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

10461_2018_2046_MOESM1_ESM.docx (79 kb)
Supplementary material 1 (DOCX 78 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, USA
  2. Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
  3. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
  4. School of Public Health, Johns Hopkins University, Baltimore, USA
