Combining Neural Language Models for Word Sense Induction

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11832)

Abstract

Word sense induction (WSI) is the problem of grouping occurrences of an ambiguous word according to the sense of the word expressed in each occurrence. Recently, a new approach to this task was proposed: it generates possible substitutes for the ambiguous word in a particular context using neural language models and then clusters sparse bag-of-words vectors built from these substitutes. In this work, we apply this approach to the Russian language and improve it in two ways. First, we propose methods of combining the left and right contexts of the target word, which yields better substitutes. Second, instead of a fixed number of clusters for all ambiguous words, we propose a technique for selecting an individual number of clusters for each word. Our approach establishes a new state of the art, improving the current best WSI results for the Russian language on two RUSSE 2018 datasets by a large margin.
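The following is a minimal sketch of the second stage of this pipeline: given substitutes already generated by a language model for each occurrence of an ambiguous word, build sparse bag-of-words vectors and cluster them. The vectorizer and clustering algorithm used here are illustrative assumptions, not the exact configuration from the paper.

```python
# Sketch of clustering substitute-based bag-of-words vectors (illustrative;
# the vectorizer and clustering algorithm are assumptions, not the paper's
# exact setup).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering

def induce_senses(substitute_lists, n_clusters):
    """substitute_lists: one list of substitute words per occurrence of the
    ambiguous word; returns an induced sense label per occurrence."""
    # Represent each occurrence as a sparse bag of its substitutes.
    docs = [" ".join(subs) for subs in substitute_lists]
    vectors = CountVectorizer().fit_transform(docs)
    # Occurrences with similar substitutes end up in the same cluster (sense).
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(vectors.toarray())
```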


Notes

  1. https://competitions.codalab.org/competitions/public_submissions/17806, https://competitions.codalab.org/competitions/public_submissions/17809, see post-competition tabs.

  2. http://ruscorpora.ru.

  3. http://gramota.ru/slovari/info/bts.

  4. http://docs.deeppavlov.ai/en/master/intro/pretrained_vectors.html.

  5. https://github.com/mamamot/Russian-ULMFit.

References

  1. Alagić, D., Šnajder, J., Padó, S.: Leveraging lexical substitutes for unsupervised word sense induction. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

  2. Amplayo, R.K., Hwang, S.-W., Song, M.: AutoSense model for word sense induction. In: AAAI (2019)

  3. Amrami, A., Goldberg, Y.: Word sense induction with neural biLM and symmetric patterns. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 4860–4867. Association for Computational Linguistics (2018). https://www.aclweb.org/anthology/D18-1523

  4. Apresjan, V.: Active dictionary of the Russian language: theory and practice. In: Meaning-Text Theory 2011, pp. 13–24 (2011)

  5. Arefyev, N., Ermolaev, P., Panchenko, A.: How much does a word weigh? Weighting word embeddings for word sense induction. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue”, Moscow, Russia, pp. 68–84. RSUH (2018)

  6. Bartunov, S., Kondrashkin, D., Osokin, A., Vetrov, D.: Breaking sticks and ambiguities with adaptive skip-gram. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) (2016)

  7. Baskaya, O., Sert, E., Cirik, V., Yuret, D.: AI-KU: using substitute vectors and co-occurrence modeling for word sense induction and disambiguation. In: Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, pp. 300–306 (2013)

  8. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  9. Hope, D., Keller, B.: UoS: a graph-based system for graded word sense induction. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), no. 1, Atlanta, Georgia, USA, pp. 689–694 (2013). http://www.aclweb.org/anthology/S13-2113

  10. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 328–339. Association for Computational Linguistics (2018). https://www.aclweb.org/anthology/P18-1031

  11. Jurgens, D., Klapaftis, I.: Semeval-2013 task 13: word sense induction for graded and non-graded senses. In: Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, pp. 290–299 (2013)

  12. Kutuzov, A.: Russian word sense induction by clustering averaged word embeddings. CoRR abs/1805.02258 (2018). http://arxiv.org/abs/1805.02258

  13. Lau, J.H., Cook, P., Baldwin, T.: unimelb: topic modelling-based word sense induction. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, Georgia, USA, pp. 307–311 (2013). http://www.aclweb.org/anthology/S13-2051

  14. Manandhar, S., Klapaftis, I.P., Dligach, D., Pradhan, S.S.: SemEval-2010 task 14: word sense induction & disambiguation. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 63–68. Association for Computational Linguistics (2010)

  15. Melamud, O., Goldberger, J., Dagan, I.: context2vec: learning generic context embedding with bidirectional LSTM. In: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61 (2016)

  16. Panchenko, A., et al.: RUSSE’2018: a shared task on word sense induction for the Russian language. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue”, Moscow, Russia, pp. 547–564. RSUH (2018). http://www.dialog-21.ru/media/4324/panchenkoa.pdf

  17. Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the NAACL (2018)

  18. Struyanskiy, O., Arefyev, N.: Neural networks with attention for word sense induction. In: Supplementary Proceedings of the Seventh International Conference on Analysis of Images, Social Networks and Texts (AIST 2018), Moscow, Russia, 5–7 July 2018, pp. 208–213 (2018). http://ceur-ws.org/Vol-2268/paper23.pdf

  19. Tang, G., Müller, M., Rios, A., Sennrich, R.: Why self-attention? A targeted evaluation of neural machine translation architectures. arXiv preprint arXiv:1808.08946 (2018)

  20. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  21. Véronis, J.: HyperLex: lexical cartography for information retrieval. Comput. Speech Lang. 18(3), 223–252 (2004)

  22. Wang, A., Cho, K.: BERT has a mouth, and it must speak: BERT as a Markov random field language model. CoRR abs/1902.04094 (2019). http://arxiv.org/abs/1902.04094

  23. Wang, J., Bansal, M., Gimpel, K., Ziebart, B.D., Yu, C.T.: A sense-topic model for word sense induction with unsupervised data enrichment. TACL 3, 59–71 (2015)


Acknowledgements

We are grateful to Dima Lipin, Artem Grachev and Alex Nevidomsky for their valuable help.

Author information

Corresponding author

Correspondence to Nikolay Arefyev.

Appendices

A Examples of Substitutes Generated

Table 5 provides examples of discriminative substitutes with their relative frequencies for each of the two most frequent senses of several words. A substitute is called discriminative if it is frequently generated for one sense of an ambiguous word, but rarely for the other. Formally, we take the substitutes with the largest \(\frac{P(w|sense_1)}{P(w|sense_2)}\), where \(P(w|sense_i)\) is estimated using add-one smoothing:

$$P(w|sense_i) = \frac{cnt(w|sense_i) + 1}{cnt(sense_i)+|vocab|}$$

Additionally, we keep only substitutes that were generated at least 10 times for one of the senses.

Table 5. Discriminative substitutes for several words from bts-rnc train
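A minimal sketch of this selection procedure is shown below, assuming the per-sense substitute counts are available; the function and variable names are illustrative, not part of the paper's code.

```python
# Selecting discriminative substitutes: add-one smoothing of P(w|sense_i)
# and ranking by the probability ratio, keeping only substitutes generated
# at least 10 times for one of the senses (names are illustrative).
from collections import Counter

def discriminative_substitutes(subs_sense1, subs_sense2, vocab, top_n=10, min_count=10):
    """subs_sense1, subs_sense2: all substitutes generated for occurrences of
    sense 1 and sense 2; returns substitutes most indicative of sense 1."""
    c1, c2 = Counter(subs_sense1), Counter(subs_sense2)
    n1, n2, v = len(subs_sense1), len(subs_sense2), len(vocab)

    def p(cnt, total, w):                      # add-one smoothed P(w|sense_i)
        return (cnt[w] + 1) / (total + v)

    # Keep only substitutes generated at least min_count times for one sense.
    candidates = [w for w in vocab if c1[w] >= min_count or c2[w] >= min_count]
    ranked = sorted(candidates, key=lambda w: p(c1, n1, w) / p(c2, n2, w), reverse=True)
    return ranked[:top_n]
```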

Table 6 lists the ten most probable substitutes according to the combined distribution and according to the forward and the backward LM distributions separately for several examples. Substitutes from the unidirectional distributions are very sensitive to the position of the target word: when either the left or the right context does not contain enough information, at least half of the substitutes are unrelated to the target word. The combined distribution provides more relevant substitutes.

Table 6. Substitutes generated for randomly selected examples.
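One simple way to combine the forward and backward LM predictions over the vocabulary is to average their log-probabilities (a geometric mean of the two distributions). This is only an illustrative baseline sketch; the combination methods actually proposed are described in Sect. 3 of the paper.

```python
# Illustrative combination of forward and backward LM predictions over the
# vocabulary: average the log-probabilities (geometric mean of the two
# distributions), then take the most probable substitutes. This is a
# baseline sketch, not the authors' exact combination formula.
import numpy as np

def combine_predictions(log_p_forward, log_p_backward, top_k=10):
    """log_p_forward, log_p_backward: log-probability arrays over the
    vocabulary, predicted from the left and the right context respectively."""
    combined = 0.5 * (log_p_forward + log_p_backward)
    combined -= np.logaddexp.reduce(combined)   # renormalize to a distribution
    return np.argsort(-combined)[:top_k]        # indices of the top substitutes
```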

B The Number of Clusters Selected

Figure 4 plots the distributions of the differences between the true number of senses, the number of clusters in the submissions, and the optimal number of clusters. The silhouette score selects a number of clusters that is usually larger than the true number of senses, but close to the optimum with respect to ARI for our vectors. The previous best submissions estimate the true number of senses better.

Fig. 4. Comparison of the number of clusters in our submissions (silnc) and the previous best submissions (prev_best_nc) with the true number of senses (true_nc) and the optimal number of clusters (max_ari_nc).
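A sketch of selecting a per-word number of clusters by maximizing the silhouette score over a candidate range is given below; the range bounds and the clustering algorithm are illustrative assumptions, not the paper's exact procedure.

```python
# Choosing the number of clusters for one word by maximizing the silhouette
# score (candidate range and clustering algorithm are assumptions).
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def select_n_clusters(vectors, min_k=2, max_k=10):
    """vectors: dense matrix of substitute-based vectors for one word's occurrences."""
    best_k, best_score = min_k, -1.0
    for k in range(min_k, min(max_k, len(vectors) - 1) + 1):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(vectors)
        score = silhouette_score(vectors, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```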

C Hyperparameters

Table 7 shows the selected hyperparameters for the methods described in Sect. 3. For the bts-rnc and active-dict datasets, the hyperparameters were selected by grid search on the corresponding train sets. For wiki-wiki we reused the hyperparameters from bts-rnc because of the very small size of the wiki-wiki train set. We selected the following hyperparameters; a hypothetical grid over them is sketched after the list.

Table 7. Selected hyperparameters

  1. Add bias (True/False). Ignoring the bias in the softmax layer of the LM was proposed in [3] to improve substitutes, because adding the bias results in the prediction of frequent words instead of rare but relevant substitutes.

  2. Normalize output embeddings (True/False). Similarly to ignoring the bias, this may result in the prediction of more relevant substitutes.

  3. K (10–400). The number of substitutes taken from each distribution.

  4. Exclude target (True/False). We want the substitutes for different senses of the target word to be non-overlapping, so it may be beneficial to exclude the target word from the substitutes.

  5. TFIDF (True/False). Applying a TF-IDF transformation to the bag-of-words vectors of substitutes sometimes improves performance.

  6. S (=20). The number of representatives for each example. It did not affect performance, so we use the value from [3].

  7. L (4–30). The number of substitutes sampled from the top K predictions.

  8. z (1.0–3.0). The parameter of the Zipf distribution.

  9. \(\beta\) (0.1–0.5). The relative length of the left or right context after which the discounting of the corresponding LM begins.
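For illustration only, the hyperparameters above can be written as a search grid; the value sets below mirror the ranges listed but are not the exact grids searched in the paper.

```python
# Hypothetical hyperparameter grid mirroring the list above (value sets are
# illustrative, not the exact grids searched in the paper).
param_grid = {
    "add_bias": [True, False],
    "normalize_output_embeddings": [True, False],
    "K": [10, 50, 100, 200, 400],       # substitutes taken from each distribution
    "exclude_target": [True, False],
    "tfidf": [True, False],
    "S": [20],                          # representatives per example (fixed)
    "L": [4, 10, 20, 30],               # substitutes sampled from the top K
    "z": [1.0, 1.5, 2.0, 2.5, 3.0],     # Zipf distribution parameter
    "beta": [0.1, 0.2, 0.3, 0.4, 0.5],  # relative context length before LM discounting
}
```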

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Arefyev, N., Sheludko, B., Aleksashina, T. (2019). Combining Neural Language Models for Word Sense Induction. In: van der Aalst, W., et al. (eds.) Analysis of Images, Social Networks and Texts. AIST 2019. Lecture Notes in Computer Science, vol. 11832. Springer, Cham. https://doi.org/10.1007/978-3-030-37334-4_10

  • DOI: https://doi.org/10.1007/978-3-030-37334-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37333-7

  • Online ISBN: 978-3-030-37334-4

  • eBook Packages: Computer Science; Computer Science (R0)
