Sentiment Knowledge Discovery in Twitter Streaming Data

  • Albert Bifet
  • Eibe Frank
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6332)


Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short and generated constantly, which makes them well suited to knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing on classification problems, and then consider these streams for opinion mining and sentiment analysis. To deal with unbalanced classes in streaming data, we propose a sliding window Kappa statistic for evaluation in time-changing data streams. Using this statistic, we perform a study on Twitter data using learning algorithms for data streams.
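The abstract does not spell out how the sliding window Kappa statistic is computed, so the following is only a minimal sketch of one plausible reading: Cohen's kappa, (p0 - pc) / (1 - pc), evaluated over the most recent examples of the stream. Here p0 is the observed accuracy inside the window and pc is the agreement expected by chance from the label and prediction frequencies in the same window. The class name `SlidingWindowKappa`, its methods, and the default window size are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter, deque


class SlidingWindowKappa:
    """Cohen's kappa over the most recent `window_size` predictions.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self, window_size=1000):
        # deque with maxlen silently drops the oldest pair when full
        self.window = deque(maxlen=window_size)

    def add(self, true_label, predicted_label):
        self.window.append((true_label, predicted_label))

    def kappa(self):
        n = len(self.window)
        if n == 0:
            return 0.0
        true_counts = Counter(t for t, _ in self.window)
        pred_counts = Counter(p for _, p in self.window)
        # p0: fraction of examples in the window classified correctly
        p0 = sum(1 for t, p in self.window if t == p) / n
        # pc: probability that a chance classifier with the same
        # prediction frequencies agrees with the true labels
        pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
        if pc == 1.0:
            # only one class seen and predicted; kappa is undefined,
            # this sketch reports 0.0 (an assumption)
            return 0.0
        return (p0 - pc) / (1.0 - pc)


# A majority-class predictor on a 90/10 unbalanced stream:
# accuracy is 0.9, yet kappa shows no skill beyond chance.
evaluator = SlidingWindowKappa(window_size=10)
for label in ["pos"] * 9 + ["neg"]:
    evaluator.add(label, "pos")
majority_kappa = evaluator.kappa()
```

In this sketch the majority-class predictor reaches 90% accuracy but a kappa of zero, which illustrates why a chance-corrected measure is attractive for unbalanced classes. Restricting the computation to a window lets the score follow changes in the stream instead of averaging over its whole history.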


Data Stream; Application Program Interface; Opinion Mining; Sentiment Analysis; Stochastic Gradient Descent
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Albert Bifet (1)
  • Eibe Frank (1)
  1. University of Waikato, Hamilton, New Zealand
