A Classification Model to Analyze the Spread and Emerging Trends of the Zika Virus in Twitter

  • B. K. Tripathy
  • Saurabh Thakur
  • Rahul Chowdhury
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 556)


The 2015–16 Zika virus epidemic continues to be a global health issue. The growing practice of sharing critical information on social networks such as Twitter motivated us to propose a classification model that classifies Zika-related tweets and thereby helps us extract useful insights about the community. In this paper, we describe the collection of data from Twitter, the preprocessing of that data, and the construction of a model to fit it. We compare the accuracy of support vector machines and the Naïve Bayes algorithm for text classification, and we explain why the support vector machine outperforms Naïve Bayes. We also present analytical tools such as word clouds, which offer a more sophisticated way to retrieve community support from social networks such as Twitter.
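As a rough illustration of the preprocessing and Naïve Bayes steps named in the abstract, the following is a minimal Python sketch using only the standard library. It is not the authors' implementation: the cleaning rules, the class labels (`symptoms`, `prevention`), the example tweets, and all function and class names are assumptions made for illustration, and the support vector machine side of the comparison is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(tweet):
    """Tokenize a tweet (assumed cleaning rules, not taken from the paper)."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word itself
    tokens = re.findall(r"[a-z]+", text)       # alphabetic tokens only
    return [t for t in tokens if t != "rt"]    # drop the retweet marker


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:                # ignore unseen words
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy tweets with hypothetical labels, for illustration only.
train = [
    ("Zika symptoms include fever and rash", "symptoms"),
    ("Fever, rash and joint pain reported in #Zika patients", "symptoms"),
    ("Use mosquito repellent to prevent Zika", "prevention"),
    ("Remove standing water to stop mosquito breeding", "prevention"),
]
model = NaiveBayes().fit(
    [preprocess(text) for text, _ in train],
    [label for _, label in train],
)
print(model.predict(preprocess("rash and fever after mosquito bite")))
```

A real pipeline would train on a much larger labeled corpus and evaluate on held-out tweets; the toy data here only shows how the pieces fit together.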


Keywords: Zika; Twitter analysis; Twitter classification; Support vector machines; Naïve Bayes algorithm



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • B. K. Tripathy (1)
  • Saurabh Thakur (1)
  • Rahul Chowdhury (1)
  1. School of Computing Science and Engineering, VIT University, Vellore, India
