Language Resources and Evaluation, Volume 47, Issue 1, pp 217–238

Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages

Original Paper

DOI: 10.1007/s10579-012-9185-0

Cite this article as:
Culotta, A. Lang Resources & Evaluation (2013) 47: 217. doi:10.1007/s10579-012-9185-0


We analyze over 570 million Twitter messages from an eight-month period and find that tracking a small number of keywords allows us to estimate influenza rates and alcohol sales volume with high accuracy. We validate our approach against government statistics and find strong correlations with influenza-like illnesses reported by the U.S. Centers for Disease Control and Prevention (r(14) = .964, p < .001) and with alcohol sales volume reported by the U.S. Census Bureau (r(5) = .932, p < .01). We analyze the robustness of this approach to spurious keyword matches, and we propose a document classification component to filter these misleading messages. We find that this document classifier can reduce error rates by over half in simulated false alarm experiments, though more research is needed to develop methods that are robust in cases of extremely high noise.
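
As a rough illustration of the keyword-tracking idea described above, the following sketch computes the fraction of messages per time period that mention any tracked keyword and correlates that series with a reference series. It is not the paper's implementation: the function names, the substring-matching rule, and all sample data are assumptions made for illustration only.

```python
from math import sqrt


def keyword_fraction(messages, keywords):
    """Fraction of messages containing at least one keyword.

    Matching is a case-insensitive substring test (an assumption of this
    sketch, not a rule taken from the paper).
    """
    if not messages:
        return 0.0
    lowered = [k.lower() for k in keywords]
    hits = sum(1 for m in messages if any(k in m.lower() for k in lowered))
    return hits / len(messages)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Invented toy data: one list of messages per week, plus a made-up
# reference series standing in for an official weekly statistic.
weekly_messages = [
    ["home sick with the flu", "great game last night", "coffee time", "hello"],
    ["flu shot today", "I think I have the flu", "traffic is awful", "lunch"],
    ["everyone has the flu", "flu again", "so much flu", "new phone"],
]
reference_series = [1.1, 2.0, 3.2]  # hypothetical values, not real data

fractions = [keyword_fraction(week, ["flu"]) for week in weekly_messages]
print("weekly keyword fractions:", fractions)
print("correlation with reference:", pearson_r(fractions, reference_series))
```

Substring matching of this kind also fires on unrelated words such as "fluent", which is one simple source of the spurious keyword matches that the document classification component is proposed to filter.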


Social media, Regression, Classification

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Department of Computer Science & Industrial Technology, Southeastern Louisiana University, Hammond, USA