MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction

Hope, David; Keller, Bill

doi:10.1007/978-3-642-37247-6_30


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

This paper introduces a linear-time, graph-based soft clustering algorithm. The algorithm applies a simple idea: given a graph, vertex pairs are assigned to the same cluster if either vertex has maximal affinity to the other. Clusters of varying size, shape, and density are found automatically, making the algorithm suited to tasks such as Word Sense Induction (WSI), where the number of classes is unknown and class distributions may be skewed. The algorithm is applied to two WSI tasks, obtaining results comparable with those of systems that use existing state-of-the-art methods.
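The one-line description above can be made concrete with a short Python sketch. It follows one plausible reading of the idea, not the paper body, which is not reproduced here: build a directed graph with an edge from m to u whenever m has maximal edge weight among u's neighbours (keeping ties), mark every vertex as a root, demote any vertex reachable from a surviving root, and return each root together with the vertices reachable from it. A vertex with tied maximal neighbours can be reachable from two roots, so clusters may overlap (a soft clustering), and the number of clusters is not fixed in advance. The function names `maxmax` and `_descendants`, the (u, v, weight) input format, the tie handling and the root-processing order are all assumptions, and the sketch makes no attempt to reproduce the paper's linear-time bound.

```python
from collections import defaultdict


def _descendants(children, start):
    """Return every vertex reachable from `start` along directed edges, excluding `start`."""
    seen = set()
    stack = list(children[start])
    while stack:
        x = stack.pop()
        if x == start or x in seen:
            continue
        seen.add(x)
        stack.extend(children[x])
    return seen


def maxmax(edges):
    """Hypothetical sketch: soft-cluster an undirected weighted graph.

    `edges` is an iterable of (u, v, weight) triples; vertex labels must be sortable.
    Returns a list of clusters (sets of vertices); clusters may overlap.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    # Directed "maximal affinity" graph (assumed direction): edge m -> u whenever
    # m is a maximal-weight neighbour of u. Ties keep every maximal neighbour.
    children = defaultdict(set)
    for u, nbrs in adj.items():
        best = max(nbrs.values())
        for m, w in nbrs.items():
            if w == best:
                children[m].add(u)

    # Every vertex starts as a root; anything reachable from a surviving root is demoted.
    vertices = sorted(adj)
    is_root = dict.fromkeys(vertices, True)
    for v in vertices:
        if is_root[v]:
            for d in _descendants(children, v):
                is_root[d] = False

    # Each remaining root plus its descendants forms one (possibly overlapping) cluster.
    return [{r} | _descendants(children, r) for r in vertices if is_root[r]]
```

For example, with edges (a, b, 5), (b, c, 1), (c, d, 4), (b, d, 2), (e, a, 3) and (e, c, 3), vertex e has equal maximal affinity to a and c, so under these assumptions it is placed in both clusters and `maxmax` returns [{a, b, e}, {c, d, e}].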



Author information

Authors and Affiliations

Cognitive and Language Processing Systems Group, University of Sussex, Brighton, Sussex, UK
David Hope & Bill Keller


Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh


About this paper

Cite this paper

Hope, D., Keller, B. (2013). MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_30


DOI: https://doi.org/10.1007/978-3-642-37247-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)
