Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases pp 401-416

Ageing-Based Multinomial Naive Bayes Classifiers Over Opinionated Data Streams

  • Sebastian Wagner
  • Max Zimmermann
  • Eirini Ntoutsi
  • Myra Spiliopoulou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)

Abstract

The long-term analysis of opinionated streams requires algorithms that predict the polarity of opinionated documents while adapting to different forms of concept drift: both the class distribution and the vocabulary used by the document authors may change. One of the key properties of a stream classifier is adaptation to concept drifts and shifts; this is typically achieved through ageing of the data. Surprisingly, for one of the most popular classifiers, Multinomial Naive Bayes (MNB), no ageing has been considered thus far. MNB is particularly appropriate for opinionated streams because it allows the seamless adjustment of word probabilities as new words appear for the first time. However, to adapt properly to drift, MNB must also be extended to take the age of documents and words into account.

In this study, we incorporate ageing into the learning process of MNB by introducing the notion of fading for words, based on the recency of the documents containing them. We propose two fading versions, gradual fading and aggressive fading, of which the latter discards old data at a faster pace. Our experiments with Twitter data show that the ageing-based MNBs outperform the standard accumulative MNB approach and recover very quickly in times of change. We experiment with different data granularities in the stream and different degrees of data ageing, and we show how they “work together” towards adaptation to change.
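The abstract does not state the fading formulas, so the following is only a minimal sketch of the general idea of ageing word counts in MNB, not the authors' gradual or aggressive fading. It assumes an exponential decay of the form 2^(-decay * age) applied to all stored counts; the class name `FadingMNB`, the `decay` parameter, and the toy stream are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


class FadingMNB:
    """Multinomial Naive Bayes whose stored counts decay with age.

    Assumption: exponential decay 2 ** (-decay * elapsed_time); this is an
    illustrative choice, not the fading function defined in the paper.
    """

    def __init__(self, decay=0.1):
        self.decay = decay  # hypothetical rate; 0 gives the accumulative MNB
        self.class_weight = defaultdict(float)  # faded document count per class
        self.word_weight = defaultdict(lambda: defaultdict(float))  # faded word counts
        self.vocabulary = set()
        self.last_time = None

    def _fade(self, now):
        """Down-weight every stored count by the time elapsed since the last update."""
        if self.last_time is not None and self.decay > 0:
            factor = 2.0 ** (-self.decay * (now - self.last_time))
            for label in self.class_weight:
                self.class_weight[label] *= factor
                for word in self.word_weight[label]:
                    self.word_weight[label][word] *= factor
        self.last_time = now

    def learn(self, words, label, now):
        """Fade old evidence, then add the new document with full weight."""
        self._fade(now)
        self.class_weight[label] += 1.0
        for word in words:
            self.word_weight[label][word] += 1.0
            self.vocabulary.add(word)  # new words enter the vocabulary seamlessly

    def predict(self, words):
        """Return the class with the highest log-posterior (Laplace smoothing)."""
        total_docs = sum(self.class_weight.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_weight.items():
            total_words = sum(self.word_weight[label].values())
            score = math.log(n_docs / total_docs)
            for word in words:
                count = self.word_weight[label].get(word, 0.0)
                score += math.log((count + 1.0) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream: the word "good" is first used positively, then negatively.
stream = [(["good"], "pos", t) for t in range(5)]
stream += [(["good"], "neg", t) for t in (10, 11, 12)]

accumulative = FadingMNB(decay=0.0)
fading = FadingMNB(decay=1.0)
for words, label, timestamp in stream:
    accumulative.learn(words, label, timestamp)
    fading.learn(words, label, timestamp)

print(accumulative.predict(["good"]), fading.predict(["good"]))
```

On this toy stream the accumulative model still predicts the older majority class, while the faded model follows the recent documents. A larger `decay` forgets old data faster, which loosely mirrors the contrast the abstract draws between the two fading versions.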

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sebastian Wagner (1)
  • Max Zimmermann (2)
  • Eirini Ntoutsi (1)
  • Myra Spiliopoulou (2)
  1. Ludwig-Maximilians University of Munich (LMU), Munich, Germany
  2. Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
