
Semantic Clustering of Russian Web Search Results: Possibilities and Problems

  • Chapter
Information Retrieval (RuSSIR 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 505)


Abstract

This paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs from large Russian corpora and then use them to cluster Mail.ru search results according to the senses of the query. We compare different clustering methods and different source corpora, and describe models for applying distributional semantics to big linguistic data.
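As a rough illustration of the general idea, the following minimal sketch builds a co-occurrence graph from a handful of toy contexts for an ambiguous query word, treats each connected component as an induced sense, and assigns search snippets to the sense they overlap with most. Everything in it is an assumption made for illustration: the function names, the English toy data, and the use of plain connected components are not taken from the chapter, whose actual methods and corpora are not specified in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(contexts, target):
    """Build an undirected weighted graph of words that co-occur with `target`.

    Edge weight = number of contexts in which both words appear.
    The target word itself is left out, since it co-occurs with everything.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in contexts:
        words = sorted(set(tokens) - {target})
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def induce_senses(graph, min_weight=1):
    """Treat each connected component (edges with weight >= min_weight) as one sense."""
    seen = set()
    senses = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            for neighbour, weight in graph[node].items():
                if weight >= min_weight and neighbour not in component:
                    stack.append(neighbour)
        seen |= component
        senses.append(component)
    return senses


def assign_sense(snippet_tokens, senses):
    """Return the index of the sense sharing the most words with the snippet, or None."""
    overlaps = [len(set(snippet_tokens) & sense) for sense in senses]
    best = max(range(len(senses)), key=overlaps.__getitem__, default=None)
    if best is None or overlaps[best] == 0:
        return None
    return best


# Hypothetical toy contexts for the ambiguous query word "jaguar".
contexts = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "cat", "predator"],
]
graph = build_cooccurrence_graph(contexts, target="jaguar")
senses = induce_senses(graph)

# Hypothetical search snippets to be grouped by induced sense.
car_cluster = assign_sense(["jaguar", "engine", "price"], senses)
cat_cluster = assign_sense(["jaguar", "jungle", "habitat"], senses)
```

On real corpora the graph would be far denser than in this toy case, so a usable system would need edge weighting and a more selective clustering step than connected components; the sketch only shows how graph structure can separate the senses of a query.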


Notes

  1. http://opencorpora.org

  2. http://ruscorpora.ru

  3. http://go.mail.ru

  4. http://analyzethis.ru/?analyzer=homonymous


Author information


Correspondence to Andrey Kutuzov.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kutuzov, A. (2015). Semantic Clustering of Russian Web Search Results: Possibilities and Problems. In: Braslavski, P., Karpov, N., Worring, M., Volkovich, Y., Ignatov, D.I. (eds) Information Retrieval. RuSSIR 2014. Communications in Computer and Information Science, vol 505. Springer, Cham. https://doi.org/10.1007/978-3-319-25485-2_12


  • DOI: https://doi.org/10.1007/978-3-319-25485-2_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25484-5

  • Online ISBN: 978-3-319-25485-2

  • eBook Packages: Computer Science (R0)
