The WSD Development Environment

  • Rafał Młodzki
  • Adam Przepiórkowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)

Abstract

In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, together with the results of the first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning (ML) paradigm using various knowledge sources. Experiments were conducted on a small, manually sense-tagged corpus of 13 Polish words. The usual groups of features were implemented, including bag-of-words, parts of speech, and words with their positions (with different settings), in combination with popular ML algorithms (including Naive Bayes, Decision Trees and Support Vector Machines). The aim was to test to what extent standard approaches to the English WSD task may be adapted to languages with free word order and rich inflection, such as Polish. In accordance with earlier results in the literature, the initial experiments suggest that these standard approaches are relatively well suited to Polish. On the other hand, contrary to earlier findings, the experiments also show that adding some features beyond bag-of-words increases the average accuracy of the results.
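As an illustration of the supervised setup the abstract describes, the following is a minimal sketch of a bag-of-words Naive Bayes sense classifier. It is not the WSDDE implementation: the class name `NaiveBayesWSD`, the helper `bag_of_words`, the add-one smoothing, and the toy English contexts for the word "bank" are all assumptions made here for readability. The paper's corpus of 13 Polish words is not reproduced.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(context, target):
    """Return the lowercased context tokens with the target word removed."""
    return [w for w in context.lower().split() if w != target]


class NaiveBayesWSD:
    """Hypothetical bag-of-words Naive Bayes classifier for one target word."""

    def __init__(self, target):
        self.target = target
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # word frequencies per sense
        self.vocab = set()                       # all context words seen in training

    def train(self, examples):
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for w in bag_of_words(context, self.target):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)

    def predict(self, context):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior of the sense
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in bag_of_words(context, self.target):
                if w in self.vocab:  # words unseen in training are skipped
                    # add-one (Laplace) smoothed log likelihood
                    score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Invented toy training data; sense labels are illustrative only.
TRAIN = [
    ("she deposited money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("they fished from the river bank", "river"),
    ("the bank of the river was muddy", "river"),
]

model = NaiveBayesWSD("bank")
model.train(TRAIN)
print(model.predict("the bank raised the loan rate"))
print(model.predict("a muddy river bank"))
```

A positional variant of the feature set could be sketched by replacing each token with a (token, offset from target) pair. For an inflected language, the exact-match removal of the target word would presumably need to operate on lemmas rather than surface forms.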

Keywords

word sense disambiguation, machine learning, feature selection, Polish

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Rafał Młodzki (1)
  • Adam Przepiórkowski (1, 2)

  1. Institute of Computer Science PAS, Warszawa, Poland
  2. University of Warsaw, Warszawa, Poland
