Abstract
Although considerable work on named-entity recognition exists for a few major languages, research on this topic for Slavonic languages has been largely neglected. This paper presents a rule-based named-entity recognition system for Polish built on top of SProUT, a novel multi-lingual NLP platform. We pinpoint the difficulties encountered and present some promising evaluation results.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piskorski, J. (2005). Named-Entity Recognition for Polish with SProUT. In: Bolc, L., Michalewicz, Z., Nishida, T. (eds) Intelligent Media Technology for Communicative Intelligence. IMTCI 2004. Lecture Notes in Computer Science, vol 3490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558637_13
DOI: https://doi.org/10.1007/11558637_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29035-3
Online ISBN: 978-3-540-31738-8
eBook Packages: Computer Science, Computer Science (R0)