Matching Similarity for Keyword-Based Clustering

Rezaei, Mohammad; Fränti, Pasi

doi:10.1007/978-3-662-44415-3_20

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8621)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)

Abstract

Semantic clustering of objects such as documents, web sites and movies based on their keywords is a challenging problem. It requires a similarity measure between two sets of keywords. We present a new measure that matches the words of the two groups, assuming that a similarity measure between two individual words is available. The proposed matching similarity measure avoids the problems of traditional measures such as the minimum, maximum and average similarities. We demonstrate that it provides better clustering than the other measures in a location-based service application.
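The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the chapter's definition: the greedy one-to-one pairing, the normalization by the size of the larger keyword set, and all function names below are assumptions made for illustration, and `exact_match` is a hypothetical stand-in for whatever word-level similarity is available.

```python
from itertools import product


def exact_match(a, b):
    """Hypothetical word similarity: 1.0 for identical words, else 0.0."""
    return 1.0 if a == b else 0.0


def pairwise(set_a, set_b, sim):
    """All word-to-word similarities between the two keyword sets."""
    return [sim(a, b) for a, b in product(set_a, set_b)]


def min_similarity(set_a, set_b, sim):
    return min(pairwise(set_a, set_b, sim))


def max_similarity(set_a, set_b, sim):
    return max(pairwise(set_a, set_b, sim))


def average_similarity(set_a, set_b, sim):
    scores = pairwise(set_a, set_b, sim)
    return sum(scores) / len(scores)


def matching_similarity(set_a, set_b, sim):
    """Greedy one-to-one matching (an assumed variant, not the paper's text).

    Repeatedly pair the two most similar unused words, then divide the sum
    of the paired similarities by the size of the larger set, so unmatched
    words lower the score.
    """
    if not set_a or not set_b:
        return 0.0
    scored = sorted(
        ((sim(a, b), i, j)
         for i, a in enumerate(set_a)
         for j, b in enumerate(set_b)),
        key=lambda t: -t[0],  # highest similarity first
    )
    used_a, used_b = set(), set()
    total = 0.0
    for score, i, j in scored:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += score
    return total / max(len(set_a), len(set_b))


if __name__ == "__main__":
    same = ["cafe", "restaurant"]
    other = ["cafe", "museum", "park", "shop"]
    for name, fn in [("min", min_similarity), ("max", max_similarity),
                     ("average", average_similarity),
                     ("matching", matching_similarity)]:
        print(name, fn(same, same, exact_match), fn(same, other, exact_match))
```

Under these assumptions, two identical keyword sets score 1.0 with the matching measure but only 0.5 with the average, while the maximum measure gives 1.0 to any two sets that share a single word. A real application would replace `exact_match` with a semantic word similarity.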

Author information

Authors and Affiliations

University of Eastern Finland, Finland
Mohammad Rezaei & Pasi Fränti

Editor information

Editors and Affiliations

School of Computing, University of Eastern Finland, 80101, Joensuu, Finland
Pasi Fränti
School of Computer Science, The University of Manchester, Manchester, UK
Gavin Brown
Delft University of Technology, Delft, The Netherlands
Marco Loog
Universidad de Alicante, Spain
Francisco Escolano
Università Ca’ Foscari Venezia, Venezia Mestre, Italy
Marcello Pelillo

Cite this paper

Rezaei, M., Fränti, P. (2014). Matching Similarity for Keyword-Based Clustering. In: Fränti, P., Brown, G., Loog, M., Escolano, F., Pelillo, M. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2014. Lecture Notes in Computer Science, vol 8621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44415-3_20

DOI: https://doi.org/10.1007/978-3-662-44415-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44414-6
Online ISBN: 978-3-662-44415-3
eBook Packages: Computer Science, Computer Science (R0)
