Semi-supervised Nonnegative Matrix Factorization for Microblog Clustering Based on Term Correlation

  • Huifang Ma
  • Meihuizi Jia
  • YaKai Shi
  • Zhanjun Hao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8709)

Abstract

Clustering microblogs is important in many web applications. In this paper, we propose a semi-supervised nonnegative matrix factorization (NMF) clustering method based on term correlation. The key idea is to exploit term correlation data, which captures semantic information, for term weighting. We then formulate the microblog clustering problem as a nonnegative matrix factorization with word-level constraints. An empirical study on a real-world dataset shows the superior performance of our framework in handling noisy and short microblogs.
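The abstract does not give the formulation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a term-by-document count matrix, uses standard multiplicative-update NMF, and stands in a hypothetical cosine-similarity matrix between terms for the paper's term correlation weighting. The word-level constraints are not modelled. The function names `nmf_cluster` and `correlation_weighted` and the toy data are illustrative assumptions.

```python
import numpy as np

def nmf_cluster(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative term-by-document matrix X ~= W @ H using
    multiplicative updates, then label each document by its largest entry in H."""
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H, H.argmax(axis=0)

def correlation_weighted(X):
    """Hypothetical term weighting (an assumption, not the paper's formula):
    smooth each document vector with the cosine similarity between term rows."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)
    C = unit @ unit.T  # term-by-term similarity; nonnegative when X is nonnegative
    return C @ X

# Toy data: 6 terms x 4 short documents drawn from two disjoint vocabularies.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

W, H, labels = nmf_cluster(correlation_weighted(X), k=2)
print(labels)
```

In this sketch the correlation step lets a short document borrow weight from terms that co-occur with its own, which is one plausible way to reduce the sparsity of microblog text before factorization.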

Keywords

Semi-supervised clustering; Microblogs; Term correlation matrix; Nonnegative matrix factorization

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Huifang Ma (1)
  • Meihuizi Jia (1)
  • YaKai Shi (1)
  • Zhanjun Hao (1)
  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
