
A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework

  • Original Paper
  • Published in Language Resources and Evaluation (2018)

Abstract

This article presents a comparison of different Word Sense Induction (wsi) clustering algorithms on two novel pseudoword data sets of semantic-similarity and co-occurrence-based word graphs, with a special focus on the detection of homonymic polysemy. We follow the original definition of a pseudoword as the combination of two monosemous terms and their contexts to simulate a polysemous word. The evaluation is performed by comparing each algorithm’s output on a pseudoword’s ego word graph (i.e., a graph that represents the pseudoword’s context in the corpus) with the known subdivision given by the components corresponding to the monosemous source words forming the pseudoword. The main contribution of this article is a self-sufficient pseudoword-based evaluation framework for graph-based wsi clustering algorithms, for which we define a new evaluation measure (top2) and a secondary clustering process (hyperclustering). To our knowledge, we are the first to conduct and discuss a large-scale systematic pseudoword evaluation targeting the induction of coarse-grained homonymous word senses across a large number of graph clustering algorithms.
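As a rough illustration of this evaluation setup, the sketch below scores a hypothetical clustering of a pseudoword’s ego word graph against the known two-way split by source word. The toy nodes, the induced clusters and the purity-style score are our own illustrative assumptions; they stand in for, and do not reproduce, the top2 measure and the hyperclustering process defined in the paper.

```python
# Illustrative sketch only: score an induced clustering of a pseudoword's ego
# word graph against the gold two-way split given by the two monosemous source
# words. The data and the purity-style score are assumptions for illustration;
# they are not the paper's top2 measure or hyperclustering procedure.
from collections import Counter

# Gold assignment for the pseudoword "barque_pennywhistle" (cf. note 14): each
# node of its ego word graph originates from one source word's contexts.
gold = {
    "sail": "barque", "mast": "barque", "wave": "barque",
    "flute": "pennywhistle", "tin": "pennywhistle", "tune": "pennywhistle",
}

# Hypothetical output of a graph-based WSI clustering algorithm on that graph.
clusters = [{"sail", "mast", "flute"}, {"wave"}, {"tin", "tune"}]

def purity(clusters, gold):
    # Credit each induced cluster with its majority source word, then
    # normalise by the total number of clustered nodes.
    hits = sum(Counter(gold[n] for n in cl).most_common(1)[0][1] for cl in clusters)
    return hits / sum(len(cl) for cl in clusters)

print(purity(clusters, gold))  # 5/6 ≈ 0.83
```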


Notes

  1. An ego word graph of a word w is a graph that represents the context of w in the corpus; alternatively, it can be seen as the neighbourhood of w in a word graph that globally represents the corpus. See Sect. 5.1.1 for the definition of ego word graph in our framework.
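As an informal illustration (not the construction of Sect. 5.1.1), the sketch below extracts such a neighbourhood from a toy global word graph with networkx; the graph, its weights and the target word are invented for the example.

```python
# Minimal sketch, assuming a toy global word graph: nodes are words, weighted
# edges encode co-occurrence or semantic similarity between them.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("bank", "money", 5.0),
    ("bank", "river", 3.0),
    ("money", "loan", 4.0),
    ("river", "water", 6.0),
])

# Ego word graph of "bank": its direct neighbourhood in G. With center=False
# the target word itself is dropped, leaving only its context words and any
# edges between them.
ego = nx.ego_graph(G, "bank", radius=1, center=False)
print(sorted(ego.nodes()))  # ['money', 'river']
```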

  2. http://wordnetweb.princeton.edu/perl/webwn, Miller (1995).

  3. See, for example, the results of task 14 at SemEval 2010 (Manandhar et al. 2010), where adjusted mutual information was introduced to correct the bias: https://www.cs.york.ac.uk/semeval2010_WSI/task_14_ranking.html.

  4. In this example a context is informally understood as the lemmatised versions of content words co-occurring with the target word. A formal definition of the kind of context used in our work will be given in Sect. 5.1.1.
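For illustration only, the toy sketch below extracts such a context from a single sentence; the stopword list and the lemma table are invented stand-ins for an actual preprocessing pipeline (cf. Sect. 5.1.1).

```python
# Toy illustration: the context of a target word as the lemmatised content
# words co-occurring with it. The stopword list and lemma table are invented
# stand-ins, not the resources used in the paper.
STOPWORDS = {"the", "a", "an", "was", "on", "of"}
LEMMAS = {"sailed": "sail", "waves": "wave"}

def context(sentence, target):
    tokens = [w.lower().strip(".,") for w in sentence.split()]
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    return [l for l in lemmas if l not in STOPWORDS and l != target]

print(context("The barque sailed on the waves.", "barque"))  # ['sail', 'wave']
```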

  5. If \(\gamma \ne \emptyset \) and/or \(\delta \ne \emptyset \), we are actually considering a non-exhaustive partition or a subpartition, i.e., a collection of disjoint, non-empty subsets whose union is not necessarily the whole set.

  6. http://corpora.uni-leipzig.de.

  7. https://sourceforge.net/p/jobimtext/wiki/Home/.

  8. Specifically, the implementation found at https://sourceforge.net/p/jobimtext/wiki/Sense_Clustering/ with parameters -n 200 -N 200.

  9. On this topic cf. Lyons (1968).

  10. The quintiles are the four values that divide a quantity into five equal parts; in this case, they are the multiples of ca. 4.52, i.e., 4.52, 9.04, 13.56 and 18.08.

  11. On this graph-theoretical topic, see e.g., Haynes et al. (1998).

  12. Despite some similarities, our definition of hypergraph differs from the common graph-theoretical concept that goes by the same name, namely that of a graph \(G=(V,E)\) whose edges can be arbitrary subsets of \(V\). See Berge and Minieka (1973) for more details on the subject.

  13. We define the clustering of a set \({\mathcal {S}}\) as a finite collection of non-empty subsets of \({\mathcal {S}}\) whose union is the whole \({\mathcal {S}}\). In this paper, we often assume a clustering to also be a partition, i.e., that the subsets are all disjoint, but for some algorithms like MaxMax this is not always the case.
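The two conditions can be checked operationally as in the sketch below (our own illustration): a clustering must cover the set with non-empty subsets, and it is additionally a partition only when those subsets are pairwise disjoint, which soft outputs such as MaxMax’s may violate.

```python
# Minimal sketch of the definitions above: coverage by non-empty subsets makes
# a clustering; pairwise disjointness additionally makes it a partition.
def is_clustering(S, clusters):
    return all(clusters) and set().union(*clusters) == S

def is_partition(S, clusters):
    # Given coverage, disjointness holds iff the cluster sizes sum to |S|.
    return is_clustering(S, clusters) and sum(len(c) for c in clusters) == len(S)

S = {1, 2, 3, 4}
print(is_partition(S, [{1, 2}, {3, 4}]))     # True
print(is_partition(S, [{1, 2, 3}, {3, 4}]))  # False: element 3 is shared
```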

  14. A barque is a kind of sailing ship, while a pennywhistle is a small, inexpensive flute.

  15. A clustering coefficient of a node or a graph can be defined in different ways. The first definition of a local clustering coefficient is found in Watts and Strogatz (1998); a global one based on triangles is given in Feld (1981) and Karlberg (1997).
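For concreteness, the sketch below computes both notions on an arbitrary toy graph with networkx: the local coefficient of a node in the sense of Watts and Strogatz (1998) via nx.clustering, and the standard triangle-based global ratio (three times the number of triangles over the number of connected triples) via nx.transitivity.

```python
# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])

# Local clustering coefficient of node 3: the fraction of its neighbour pairs
# that are themselves connected (only (1, 2) out of three pairs).
print(nx.clustering(G, 3))  # 0.333...

# Global, triangle-based coefficient: 3 * triangles / connected triples.
print(nx.transitivity(G))   # 0.6
```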

  16. The mean absolute deviation of a data set of observations is the average of the absolute values of the differences between the observations and the mean of the data set (Dixon and Massey 1957).
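In symbols, for observations \(x_1, \dots, x_n\) with mean \(\bar{x}\), the definition above reads
\[ \mathrm{mad}(x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right|, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]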

  17. We could normalise the mad score with respect to the total number of clustered elements. However, since the order of our ego graphs is nearly constant, we simply take the raw mean absolute deviations. The same goes for the mean number of clusters.

References

  • Amigó, E., Gonzalo, J., Artiles, J., & Verdejo, F. (2009). A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12(4), 461–486.

  • Bagga, A., & Baldwin, B. (1998). Algorithms for scoring coreference chains. In Proceedings of the first international Conference on Language Resources and Evaluation (LREC’98), workshop on linguistic coreference (pp. 563–566). European Language Resources Association, Granada, Spain.

  • Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  • Başkaya, O., & Jurgens, D. (2016). Semi-supervised learning with induced word senses for state of the art word sense disambiguation. Journal of Artificial Intelligence Research, 55, 1025–1058.

  • Berge, C., & Minieka, E. (1973). Graphs and hypergraphs (Vol. 7). Amsterdam: North-Holland.

  • Biemann, C. (2006). Chinese whispers: An efficient graph clustering algorithm and its application to natural language processing problems. In Proceedings of the first workshop on graph based methods for natural language processing (pp. 73–80), New York, NY, USA.

  • Biemann, C., & Quasthoff, U. (2009). Networks generated from natural language text. In N. Ganguly, A. Deutsch & A. Mukherjee (Eds.), Dynamics on and of complex networks: Applications to biology, computer science, and the social sciences (pp. 167–185). Springer.

  • Biemann, C., & Riedl, M. (2013). Text: Now in 2D! A framework for lexical expansion with contextual similarity. Journal of Language Modelling, 1(1), 55–95.

  • Bordag, S. (2006). Word sense induction: Triplet-based clustering and automatic evaluation. In Proceedings of the 11th conference of the European chapter of the association for computational linguistics (pp. 137–144). EACL, Trento, Italy.

  • Ferrer i Cancho, R., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society of London Series B: Biological Sciences, 268(1482), 2261–2265.

  • Cecchini, F. M. (2017). Graph-based clustering algorithms for word sense induction. Ph.D. thesis, Università degli Studi di Milano-Bicocca.

  • Cecchini, F. M., & Fersini, E. (2015). Word sense discrimination: A gangplank algorithm. In Proceedings of the second Italian conference on computational linguistics CLiC-it 2015 (pp. 77–81). Trento, Italy.

  • Cecchini, F. M., Fersini, E., & Messina, E. (2015). Word sense discrimination on tweets: A graph-based approach. In KDIR 2015—Proceedings of the international conference on knowledge discovery and information retrieval (Vol. 1, pp. 138–146). IC3K, Lisbon.

  • Cover, T., & Thomas, J. (2012 [1991]). Elements of information theory. Hoboken, NJ: Wiley.

  • De Marneffe, M. C., MacCartney, B., & Manning, C. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06) (pp. 449–454). European Language Resources Association, Genoa.

  • De Saussure, F. (1995 [1916]). Cours de linguistique générale (critical edition of the 1st edition). Paris: Payot & Rivages.

  • Di Marco, A., & Navigli, R. (2013). Clustering and diversifying web search results with graph-based word sense induction. Computational Linguistics, 39(3), 709–754.

  • Dixon, W., & Massey, F., Jr. (1957). Introduction to statistical analysis. New York, NY: McGraw-Hill.

  • van Dongen, S. (2000). Graph clustering by flow simulation. Ph.D. thesis, Universiteit Utrecht.

  • Evert, S. (2004). The statistics of word cooccurrences: Word pairs and collocations. Ph.D. thesis, Universität Stuttgart.

  • Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.

  • Gale, W., Church, K., & Yarowsky, D. (1992). Work on statistical methods for word sense disambiguation. In Technical report of the 1992 fall symposium—Probabilistic approaches to natural language (pp. 54–60). AAAI, Cambridge, MA.

  • Grätzer, G. (2011). Lattice theory: Foundation. New York: Springer.

  • Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.

  • Haynes, T. W., Hedetniemi, S., & Slater, P. (1998). Fundamentals of domination in graphs. Boca Raton, FL: CRC Press.

  • Hope, D., & Keller, B. (2013). MaxMax: A graph-based soft clustering algorithm applied to word sense induction. In Proceedings of the 14th international conference on computational linguistics and intelligent text processing (pp. 368–381). Samos, Greece.

  • Karlberg, M. (1997). Testing transitivity in graphs. Social Networks, 19(4), 325–343.

  • Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D. (2004). The sketch engine. In Proceedings of the eleventh Euralex Conference (pp. 105–116). Lorient, France.

  • Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.

  • Lyons, J. (1968). Introduction to theoretical linguistics. Cambridge: Cambridge University Press.

  • Manandhar, S., Klapaftis, I., Dligach, D., & Pradhan, S. (2010). SemEval-2010 task 14: Word sense induction & disambiguation. In Proceedings of the 5th international workshop on semantic evaluation (pp. 63–68). Association for Computational Linguistics, Los Angeles, CA.

  • Martin, J., & Jurafsky, D. (2000). Speech and language processing. Upper Saddle River, NJ: Pearson.

  • Miller, G. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.

  • Nakov, P., & Hearst, M. (2003). Category-based pseudowords. In Companion volume of the proceedings of the human language technology conference of the North American chapter of the association for computational linguistics (HLT-NAACL) 2003—Short Papers (pp. 70–72). Association for Computational Linguistics, Edmonton, Alberta, Canada.

  • Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 10.

  • Navigli, R., Litkowski, K., & Hargraves, O. (2007). SemEval-2007 task 07: Coarse-grained English all-words task. In Proceedings of the 4th international workshop on semantic evaluations (pp. 30–35). Association for Computational Linguistics, Prague.

  • Otrusina, L., & Smrž, P. (2010). A new approach to pseudoword generation. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 1195–1199). European Language Resources Association, Valletta.

  • Parker, R., Graff, D., Kong, J., Chen, K., & Maeda, K. (2011). English Gigaword, 5th edn. Linguistic Data Consortium, Philadelphia, PA. https://catalog.ldc.upenn.edu/LDC2011T07.

  • Pilehvar, M. T., & Navigli, R. (2013). Paving the way to a large-scale pseudosense-annotated dataset. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (HLT-NAACL) (pp. 1100–1109). Association for Computational Linguistics, Atlanta, GA.

  • Pilehvar, M. T., & Navigli, R. (2014). A large-scale pseudoword-based evaluation framework for state-of-the-art word sense disambiguation. Computational Linguistics, 40(4), 837–881.

  • Richter, M., Quasthoff, U., Hallsteinsdóttir, E., & Biemann, C. (2006). Exploiting the Leipzig Corpora Collection. In Proceedings of the fifth Slovenian and first international language technologies conference, IS-LTC ’06 (pp. 68–73). Slovenian Language Technologies Society, Ljubljana.

  • Riedl, M. (2016). Unsupervised methods for learning and using semantics of natural language. Ph.D. thesis, Technische Universität Darmstadt.

  • Ruohonen, K. (2013). Graph theory. Tampereen teknillinen yliopisto (trans: Tamminen, J., Lee, K.-C., & Piché, R.). http://math.tut.fi/~ruohonen/GT_English.pdf. Originally titled Graafiteoria, lecture notes.

  • Schütze, H. (1992). Dimensions of meaning. In Proceedings of Supercomputing’92 (pp. 787–796). ACM/IEEE, Minneapolis, MN.

  • Strehl, A., & Ghosh, J. (2002). Cluster ensembles–A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.

  • Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1), 141–188.

  • Véronis, J. (2004). HyperLex: Lexical cartography for information retrieval. Computer Speech & Language, 18(3), 223–252.

  • Watts, D., & Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442.

  • Widdows, D., & Dorow, B. (2002). A graph model for unsupervised lexical acquisition. In Proceedings of the 19th international conference on computational linguistics (Vol. 1, pp. 1–7). Association for Computational Linguistics, Taipei.

Author information

Correspondence to Flavio Massimiliano Cecchini.

Cite this article

Cecchini, F.M., Riedl, M., Fersini, E. et al. A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework. Lang Resources & Evaluation 52, 733–770 (2018). https://doi.org/10.1007/s10579-018-9415-1
