Skip to main content

Tagset Conversion with Decision Trees

  • Conference paper
Book cover Advances in Natural Language Processing (JapTAL 2012)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7614))

Included in the following conference series:

  • 1528 Accesses

Abstract

This paper addresses the problem of converting part of speech – or, more generally, morphosyntactic – annotations within a single language. Conversion between tagsets is a difficult task and, typically, it is either expensive (when performed manually) or inaccurate (lossy automatic conversion or re-tagging with classical taggers). A statistical method of annotation conversion is proposed here which achieves high accuracy, provided the source annotation is of high quality. The paper also presents an evaluation of an implementation of the converter when applied to a pair of Polish tagsets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bień, J.S., Woliński, M.: Wzbogacony korpus Słownika frekwencyjnego polszczyzny współczesnej. In: Linde-Usiekniewicz, J. (ed.) Prace Lingwistyczne Dedykowane Prof. Jadwidze Sambor, pp. 6–10. Uniwersytet Warszawski, Wydział Polonistyki (2003)

    Google Scholar 

  2. Holte, R.C.: Very simple classification rules perform well on most commonly used datasets. Machine Learning 11, 63–91 (1993)

    Article  MATH  Google Scholar 

  3. Kurcz, I., Lewicki, A., Sambor, J., Szafran, K., Woronczak, J.: Słownik frekwencyjny polszczyzny współczesnej. Wydawnictwo Instytutu Języka Polskiego PAN, Cracow (1990)

    Google Scholar 

  4. Ogrodniczuk, M.: Nowa edycja wzbogaconego korpusu słownika frekwencyjnego. In: Gajda, S. (ed.) Językoznawstwo w Polsce. Stan i perspektywy, pp. 181–190. Komitet Językoznawstwa, Polska Akademia Nauk and Instytut Filologii Polskiej, Uniwersytet Opolski, Opole (2003), http://www.mimuw.edu.pl/~jsbien/MO/JwP03/

  5. Przepiórkowski, A.: A comparison of two morphosyntactic tagsets of Polish. In: Koseska-Toszewa, V., Dimitrova, L., Roszko, R. (eds.) Representing Semantics in Digital Lexicography: Proceedings of MONDILEX Fourth Open Workshop, Warsaw, pp. 138–144 (2009)

    Google Scholar 

  6. Przepiórkowski, A., Woliński, M.: The unbearable lightness of tagging: A case study in morphosyntactic tagging of Polish. In: Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC 2003), EACL 2003, pp. 109–116 (2003)

    Google Scholar 

  7. Quinlan, J.R.: C4.5 Programs for Machine Learning. Morgan Kaufmann, Los Alios (1993)

    Google Scholar 

  8. Radziszewski, A., Acedański, S.: Taggers Gonna Tag: An Argument against Evaluating Disambiguation Capacities of Morphosyntactic Taggers. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2012. LNCS, vol. 7499, pp. 81–87. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  9. Saloni, Z., Gruszczyński, W., Woliński, M., Wołosz, R.: Słownik gramatyczny języka polskiego. Wiedza Powszechna, Warsaw (2007)

    Google Scholar 

  10. Szałkiewicz, Ł., Przepiórkowski, A.: Anotacja morfoskładniowa NKJP. In: Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.) Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw (2012)

    Google Scholar 

  11. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005), http://www.cs.waikato.ac.nz/ml/weka/

    MATH  Google Scholar 

  12. Woliński, M.: Morfeusz — a practical tool for the morphological analysis of Polish. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining. Advances in Soft Computing, pp. 503–512. Springer, Berlin (2006)

    Google Scholar 

  13. Zeman, D.: Reusable tagset conversion using tagset drivers. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation, LREC 2008. ELRA, Marrakech (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zaborowski, B., Przepiórkowski, A. (2012). Tagset Conversion with Decision Trees. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science(), vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33983-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33982-0

  • Online ISBN: 978-3-642-33983-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics