Application of Different Learning Methods to Hungarian Part-of-Speech Tagging

  • Conference paper
Inductive Logic Programming (ILP 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1634)

Abstract

From the point of view of computational linguistics, Hungarian is a difficult language because of its complex grammar, rich morphology, and fairly free word order. As a result, even a common task such as part-of-speech (POS) tagging poses a new challenge for learning algorithms. In this paper we present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We selected the popular C4.5 and Progol systems as the propositional and ILP representatives, and added experiments with our own methods: AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. The experimental results indicate that Hungarian POS tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are a promising approach to POS tagging for Hungarian as well. The paper also includes experiments with several cascade connections of the taggers.
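
As a rough illustration of what "instance-based" tagging means in general, the following minimal Python sketch stores every training token with a small context window and tags a new token with the label of its most similar stored instance. It is not the RIBL system or any other method from the paper: the feature window, the overlap similarity, the class name InstanceTagger, the coarse tags DET/NOUN/VERB, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter

def features(tokens, i):
    """Hypothetical feature window: previous word, 3-letter suffix, next word."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (prev_w, tokens[i][-3:], next_w)

def overlap(a, b):
    """Similarity = number of feature positions with identical values."""
    return sum(x == y for x, y in zip(a, b))

class InstanceTagger:
    """Toy k-nearest-neighbour tagger over stored (features, tag) instances."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, tagged_sentences):
        # Memorise every training token together with its context features.
        for sent in tagged_sentences:
            tokens = [word for word, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.memory.append((features(tokens, i), tag))
        return self

    def tag(self, tokens):
        tags = []
        for i in range(len(tokens)):
            f = features(tokens, i)
            # sorted() is stable, so ties keep training order.
            nearest = sorted(self.memory, key=lambda m: -overlap(f, m[0]))[: self.k]
            votes = Counter(tag for _, tag in nearest)
            tags.append(votes.most_common(1)[0][0])
        return tags

# Toy training data with made-up coarse tags (not MULTEXT-East annotations).
TRAIN = [
    [("a", "DET"), ("kutya", "NOUN"), ("fut", "VERB")],
    [("a", "DET"), ("macska", "NOUN"), ("alszik", "VERB")],
]

tagger = InstanceTagger(k=1).fit(TRAIN)
predicted = tagger.tag(["a", "madár", "fut"])
```

The unseen word "madár" receives its tag purely from context similarity to stored instances. A realistic tagger for a morphologically rich language would need much richer features and background knowledge than this window provides.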

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Horváth, T., Alexin, Z., Gyimóthy, T., Wrobel, S. (1999). Application of Different Learning Methods to Hungarian Part-of-Speech Tagging. In: Džeroski, S., Flach, P. (eds) Inductive Logic Programming. ILP 1999. Lecture Notes in Computer Science (LNAI), vol 1634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48751-4_13

  • DOI: https://doi.org/10.1007/3-540-48751-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66109-2

  • Online ISBN: 978-3-540-48751-7

  • eBook Packages: Springer Book Archive
