Clustering Short-Text Using Non-negative Matrix Factorization of Hadamard Product of Similarities

Verma, Krutika; Jadon, Mukesh K.; Pujari, Arun K.

doi:10.1007/978-3-642-45068-6_13


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8281)

Included in the following conference series:

Asia Information Retrieval Symposium


Abstract

Short-text mining has become an important area of research in IR and data mining. Ncut-term weighting was recently proposed for clustering short texts using non-negative matrix factorization. Non-negative matrix factorization can be employed for such term weighting when the similarity measure is the inner product of the term-document matrix. We propose a new weighting scheme and devise a new clustering algorithm using the Hadamard product of similarity matrices. We demonstrate that our technique yields much better clustering than the Ncut weighting scheme. We evaluate clustering quality with three measures: purity, normalized mutual information, and the adjusted Rand index. We use standard benchmark datasets and also compare the performance of our algorithm with the well-known document clustering technique of Ng, Jordan, and Weiss. Experimental results suggest that weighting by the Hadamard product gives better clustering of short-text documents.
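The general idea of factorizing a Hadamard (element-wise) product of similarity matrices can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not say which similarity matrices are combined or how the proposed weighting is defined, so the choice of inner-product and cosine similarities, the toy term-document matrix `X`, and all function names here are assumptions. The factorization uses the standard Lee-Seung multiplicative updates, and purity is computed as the fraction of documents that fall in the majority class of their cluster.

```python
import numpy as np

def inner_product_similarity(X):
    """Document-document inner products for a terms x documents matrix X."""
    return X.T @ X

def cosine_similarity(X):
    """Document-document cosine similarity for a terms x documents matrix X."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty documents
    Xn = X / norms
    return Xn.T @ Xn

def nmf(S, k, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for S ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((S.shape[0], k)) + eps
    H = rng.random((k, S.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ S) / (W.T @ W @ H + eps)
        W *= (S @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cluster_by_hadamard_nmf(X, k, seed=0):
    """Assumed pipeline: Hadamard product of two similarities, then NMF."""
    S = inner_product_similarity(X) * cosine_similarity(X)  # element-wise product
    _, H = nmf(S, k, seed=seed)
    return S, H.argmax(axis=0)       # each document goes to its strongest factor

def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()
    return total / labels_true.size

# Toy corpus (hypothetical): 6 terms x 6 short documents, two topics
# with disjoint vocabularies (terms 0-2 and terms 3-5).
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)
true_labels = np.array([0, 0, 0, 1, 1, 1])

S, labels = cluster_by_hadamard_nmf(X, k=2)
print("cluster labels:", labels)
print("purity:", purity(true_labels, labels))
```

Because both similarity matrices are non-negative, their element-wise product is also non-negative and can be passed to NMF directly. Normalized mutual information and the adjusted Rand index, the other two measures named in the abstract, are not implemented in this sketch.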


References

Adamic, L., Glance, N.: The political blogosphere and the 2004 U.S. election: Divided they blog. In: LinkKDD 2005: Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43 (2005)
Banerjee, S., Ramanathan, K., Gupta, A.: Clustering short texts using Wikipedia. In: SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 787–788. ACM, New York (2007)
Buckley, C., Singhal, A., Mitra, M.: New retrieval approaches using SMART. In: Proc. of the 4th Text Retrieval conference (TREC-4), Gaithersburg (1996)
Jin, R., Faloutsos, C., Hauptmann, A.G.: Meta-scoring: automatically evaluating term weighting schemes in IR without precision-recall. In: Proc. of the 24th ACM International Conference on Research and Development in Information Retrieval (SIGIR 2001), pp. 83–89 (2001)
Kim, Y.-D., Choi, S.: Weighted non negative matrix factorization. In: ICASSP (2009)
Kim, H., Park, H.: Sparse non-negative matrix factorization via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23, 1495–1502 (2007)
Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, vol. 13, pp. 556–562 (2001)
Lin, F., Cohen, W.: Power iteration clustering. In: 27th International Conference on Machine Learning (ICML), Haifa, Israel (2010)
Makagonov, P., Alexandrov, M., Gelbukh, A.: Clustering abstracts instead of full texts. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2004. LNCS (LNAI), vol. 3206, pp. 129–135. Springer, Heidelberg (2004)
Manning, C., Raghavan, P., Schütze, H.: Introduction to information retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems, vol. 14 (2001)
Pantel, P., Lin, D.: Document clustering with committees. In: Proc. of the 25th ACM International Conference on Research and Development in Information Retrieval (SIGIR 2002), pp. 199–206 (2002)
Pinto, A.: On Clustering and Evaluation of Narrow Domain Short-Text Corpora. PhD thesis, Universidad Politécnica de Valencia, Spain (2008)
Rawat, S., Gulati, V.P., Pujari, A.K.: Frequency and ordering based similarity measure for host-based intrusion detection. Info. Mngt. Computer Security 12(5), 411–421 (2004)
Sharma, A., Pujari, A.K., Paliwal, K.K.: Intrusion detection using text processing techniques with a kernel based similarity measure. Computers & Security 26(7-8), 488–495 (2007)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. PAMI 22(8), 888–905 (2000)
Yan, X., Guo, J.: Clustering Short Text Using Ncut-weighted Non-negative Matrix Factorization. In: CIKM 2012, Maui, HI, USA, pp. 2259–2262 (2012)
Yan, X., Guo, J.: Learning Topics in short text Using Ncut-weighted non-negative matrix Factorization on term correlation matrix, http://xiaohuiyan.com/papers/TNMF-SDM-13.pdf
Yu, S., Shi, J.: Multiclass spectral clustering. In: Proceedings of Ninth IEEE International Conference on Computer Vision, pp. 313–319. IEEE (2003)
http://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits
http://archive.ics.uci.edu/ml/datasets/Iris


Author information

Authors and Affiliations

Institute of Information Technology, Sambalpur University, Sambalpur, Odisha, India
Krutika Verma
Department of CSE, The LNMIIT, Jaipur, Rajasthan, India
Mukesh K. Jadon
School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Andhra Pradesh, India
Arun K. Pujari


Editor information

Editors and Affiliations

Institute for Infocomm Research, Human Language Technology, 1 Fusionopolis Way #21-01, Connexis South, 138632, Singapore
Rafael E. Banchs, Min Zhang & Sheng Gao
Yahoo Labs, Avinguda Diagonal 177, 08018, Barcelona, Spain
Fabrizio Silvestri
Microsoft Research Asia, No. 5, Danling Street, Haidian District, 100080, Beijing, China
Tie-Yan Liu
Institute for Infocomm Research, Human Language Technology, 1 Fusionopolis Way #21-01, Connexis South, 138632, Singapore
Jun Lang


About this paper

Cite this paper

Verma, K., Jadon, M.K., Pujari, A.K. (2013). Clustering Short-Text Using Non-negative Matrix Factorization of Hadamard Product of Similarities. In: Banchs, R.E., Silvestri, F., Liu, TY., Zhang, M., Gao, S., Lang, J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45068-6_13


DOI: https://doi.org/10.1007/978-3-642-45068-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45067-9
Online ISBN: 978-3-642-45068-6
eBook Packages: Computer Science, Computer Science (R0)
