International Conference on Knowledge Science, Engineering and Management

Knowledge Science, Engineering and Management, pp. 360–369

Semi-supervised Microblog Clustering Method via Dual Constraints

  • Huifang Ma
  • Meihuizi Jia
  • Weizhong Zhao
  • Xianghong Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9403)

Abstract

In this paper, we present a semi-supervised clustering method for microblogs in which both word-level and microblog (document)-level constraints are generated automatically from statistical information alone, without any external knowledge. The key idea is first to explore term correlation data, capturing both inter- and intra-correlation of words, from which the initial similarity between words is deduced. An iterative method is then established to calculate both word similarity and microblog similarity, and a mechanism for incorporating dual constraints is built on these two similarities. We then formulate the short text clustering problem as non-negative matrix factorization with dual constraints. An empirical study on two real-world datasets shows the superior performance of our framework in handling noisy microblogs.
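The factorization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the method of the paper: it is plain, unconstrained non-negative matrix factorization with the standard multiplicative updates for the squared Frobenius error, and the word-level and microblog-level constraints are omitted entirely. The toy term-by-microblog matrix `X`, the function names `nmf` and `cluster_labels`, and all parameter values are assumptions made only for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Unconstrained NMF, X ~ W H, via multiplicative updates.

    X is a terms-by-microblogs matrix with non-negative entries.
    Returns W (terms x k), H (k x microblogs) and the error history.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialisation keeps all entries non-negative.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frob_error(X, W, H)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        errors.append(frob_error(X, W, H))
    return W, H, errors

def cluster_labels(H):
    """Assign each microblog (column of H) to its largest latent factor."""
    return [max(range(len(H)), key=lambda c: H[c][j]) for j in range(len(H[0]))]

# Hypothetical toy data: rows are terms, columns are microblogs.
# Microblogs 0-1 share terms 0-2; microblogs 2-3 share terms 3-5.
X = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]

W, H, errors = nmf(X, k=2)
labels = cluster_labels(H)
print(labels)
```

In a constrained variant, penalty terms derived from word similarity and microblog similarity would be added to the objective; the form of those terms is specific to the paper and is not reproduced here.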

Keywords

Semi-supervised clustering; Microblogs; Term correlation matrix; Dual constraints; Nonnegative matrix factorization

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Huifang Ma (1, 2)
  • Meihuizi Jia (1)
  • Weizhong Zhao (3)
  • Xianghong Lin (1)

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu, China
  2. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  3. College of Information Engineering, Xiangtan University, Xiangtan, Hunan, China
