Using time-sensitive interactions to improve topic derivation in twitter

Abstract

Twitter has become one of the most popular social media platforms, widely used for discussion and information dissemination on all kinds of topics. As a result, both businesses and academics have researched methods to identify the topics being discussed on Twitter. Those methods can be employed for a number of applications, including emergency management, advertising, and corporate/government communication. However, deriving topics from this short-text-based and highly dynamic environment remains a huge challenge. Most current methods use the content of tweets as the only source for topic derivation. Recently, tweet interactions have been considered for improving the quality of topic derivation. In this paper, we propose a method that considers both content and interactions with a temporal aspect to further improve the quality of topic derivation. The impact of the temporal aspect in user/tweet interactions is analyzed based on several Twitter datasets. The proposed method incorporates time when it clusters tweets and identifies representative terms for each topic. Experimental results show that the inclusion of the temporal aspect in the interactions results in a significant improvement in the quality of topic derivation compared to existing baseline methods.

Acknowledgments

This work is partially supported by the Indonesian Directorate General of Higher Education (DGHE), Macquarie University, CSIRO Data61, Australian Research Council LP120200231, and Australian Research Council DP140101369.

Author information

Corresponding author

Correspondence to Robertus Nugroho.

About this article

Cite this article

Nugroho, R., Zhao, W., Yang, J. et al. Using time-sensitive interactions to improve topic derivation in twitter. World Wide Web 20, 61–87 (2017). https://doi.org/10.1007/s11280-016-0417-x

Keywords

  • Topic derivation
  • Temporal aspect in twitter
  • Joint matrix factorization