Named-Entity Recognition for Polish with SProUT

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3490)

Abstract

Although considerable work on named-entity recognition exists for a few major languages, research on this topic for Slavonic languages has been largely neglected. This paper presents a rule-based named-entity recognition system for Polish built on top of SProUT, a novel multilingual NLP platform. We pinpoint the difficulties encountered and present some promising evaluation results.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Piskorski, J. (2005). Named-Entity Recognition for Polish with SProUT. In: Bolc, L., Michalewicz, Z., Nishida, T. (eds) Intelligent Media Technology for Communicative Intelligence. IMTCI 2004. Lecture Notes in Computer Science (LNAI), vol 3490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558637_13

  • DOI: https://doi.org/10.1007/11558637_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29035-3

  • Online ISBN: 978-3-540-31738-8

  • eBook Packages: Computer Science, Computer Science (R0)
