
MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

This paper introduces a linear-time, graph-based soft clustering algorithm. The algorithm applies a simple idea: given a graph, a pair of vertices is assigned to the same cluster if either vertex has maximal affinity to the other. Clusters of varying size, shape, and density are found automatically, making the algorithm suited to tasks such as Word Sense Induction (WSI), where the number of classes is unknown and class distributions may be skewed. The algorithm is applied to two WSI tasks, obtaining results comparable with those of systems that adopt existing state-of-the-art methods.
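The idea stated in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the paper's reference implementation: the function name `maxmax_sketch`, the edge-dictionary input format, the treatment of ties, and the root-and-descendants step used to turn pairwise links into clusters are all assumptions made for illustration. The sketch also makes no attempt to match the linear-time bound claimed in the abstract.

```python
from collections import defaultdict


def maxmax_sketch(weights):
    """Cluster an undirected weighted graph (illustrative sketch only).

    weights: dict mapping (u, v) tuples to positive edge weights.
    Returns a list of frozensets; a vertex may appear in several of them.
    """
    # Build a symmetric adjacency map from the edge list.
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w

    # Step 1 (assumed): add a directed link u -> v whenever u is one of
    # v's maximal-affinity neighbours, i.e. w(u, v) is the largest weight
    # on any edge incident on v. Ties produce several incoming links.
    succ = defaultdict(set)
    for v, nbrs in adj.items():
        best = max(nbrs.values())
        for u, w in nbrs.items():
            if w == best:
                succ[u].add(v)

    def descendants(start):
        # All vertices reachable from start along directed links.
        seen = set()
        stack = list(succ.get(start, ()))
        while stack:
            x = stack.pop()
            if x != start and x not in seen:
                seen.add(x)
                stack.extend(succ.get(x, ()))
        return seen

    # Step 2 (assumed): every vertex starts as a root; visiting a root
    # demotes all of its descendants. Each surviving root, together with
    # its descendants, forms one cluster.
    is_root = {v: True for v in adj}
    for v in sorted(adj):
        if is_root[v]:
            for d in descendants(v):
                is_root[d] = False

    return [frozenset({v} | descendants(v)) for v in sorted(adj) if is_root[v]]


# Hypothetical toy graph: "m" is tied between "x" and "y".
toy = {("x", "p"): 5, ("y", "q"): 5, ("x", "m"): 2, ("y", "m"): 2}
for cluster in maxmax_sketch(toy):
    print(sorted(cluster))
```

In this toy graph the vertex `m` has two equally strong neighbours, so it ends up in both clusters, which is what makes the clustering soft. The number of clusters is not supplied anywhere; it falls out of the link structure.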

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hope, D., Keller, B. (2013). MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_30

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
