Abstract
Clustering microblogs is important in many web applications. In this paper, we propose a semi-supervised nonnegative matrix factorization (NMF) clustering method based on term correlation. The key idea is to exploit term correlation data, which captures semantic information useful for term weighting. We then formulate the microblog clustering problem as a nonnegative matrix factorization with word-level constraints. An empirical study on a real-world dataset shows the superior performance of our framework in handling short, noisy microblogs.
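The general idea can be illustrated with a minimal sketch. This is not the formulation proposed in the paper: the abstract does not specify the correlation measure or the form of the word-level constraints, so the sketch assumes cosine similarity between term rows as a stand-in for term correlation, uses it to re-weight a toy term-by-document matrix, and then applies plain unconstrained NMF with the multiplicative updates of Lee and Seung. All function names and the toy data are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def term_correlation(x):
    """Cosine similarity between term rows (an assumed correlation measure)."""
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in x]
    return [[sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
             for rj, nj in zip(x, norms)]
            for ri, ni in zip(x, norms)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of x - w h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rx, rwh in zip(x, wh) for a, b in zip(rx, rwh))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factor x (terms x docs) into w (terms x k) and h (k x docs)."""
    rng = random.Random(seed)
    n_terms, n_docs = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * n / (d + eps) for hv, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * n / (d + eps) for wv, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


def cluster_labels(h):
    """Assign each document (column of h) to its largest factor."""
    return [max(range(len(col)), key=col.__getitem__) for col in zip(*h)]


# Toy term-by-document counts: rows are terms, columns are microblogs.
X = [
    [2, 1, 0, 0],  # "goal"
    [1, 2, 0, 0],  # "match"
    [0, 0, 2, 1],  # "vote"
    [0, 0, 1, 2],  # "poll"
]
C = term_correlation(X)      # term-by-term correlation (assumed: cosine)
X_weighted = matmul(C, X)    # spread each term's weight onto correlated terms
W, H, errors = nmf(X_weighted, k=2)
labels = cluster_labels(H)   # one cluster index per microblog
```

Re-weighting by `C` lets a short post that mentions only "goal" also receive some weight on "match", which is one way correlated terms can compensate for sparsity in short texts. A semi-supervised variant would add penalty terms for the word-level constraints to the objective; those are omitted here because the abstract does not define them.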
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ma, H., Jia, M., Shi, Y., Hao, Z. (2014). Semi-supervised Nonnegative Matrix Factorization for Microblog Clustering Based on Term Correlation. In: Chen, L., Jia, Y., Sellis, T., Liu, G. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8709. Springer, Cham. https://doi.org/10.1007/978-3-319-11116-2_46
DOI: https://doi.org/10.1007/978-3-319-11116-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11115-5
Online ISBN: 978-3-319-11116-2
eBook Packages: Computer Science, Computer Science (R0)