Abstract
The paper presents both conceptual and technical issues related to the construction of an HPSG test suite for Polish. The test suite consists of sentences of written Polish, both grammatical and ungrammatical. Each sentence is annotated with a list of the linguistic phenomena it illustrates. Additionally, grammatical sentences are encoded as HPSG-style AVM structures. We also describe the technical organization of the database and the operations it supports.
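The annotation scheme described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the class name `TestItem`, the function `select_by_phenomenon`, the phenomenon labels, and the heavily simplified AVM attributes are hypothetical and are not taken from the actual test suite or its database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for an HPSG-style AVM: a nested attribute-value mapping.
AVM = Dict[str, Any]


@dataclass
class TestItem:
    """One test-suite sentence (illustrative layout, not the paper's schema)."""
    text: str
    grammatical: bool
    phenomena: List[str] = field(default_factory=list)
    avm: Optional[AVM] = None  # only grammatical sentences carry an AVM

    def __post_init__(self) -> None:
        # Mirror the abstract: only grammatical sentences are encoded as AVMs.
        if not self.grammatical and self.avm is not None:
            raise ValueError("ungrammatical sentences carry no AVM")


def select_by_phenomenon(
    items: List[TestItem],
    phenomenon: str,
    grammatical: Optional[bool] = None,
) -> List[TestItem]:
    """Return items annotated with `phenomenon`, optionally filtered by grammaticality."""
    return [
        item
        for item in items
        if phenomenon in item.phenomena
        and (grammatical is None or item.grammatical == grammatical)
    ]


# Invented example entries; labels and AVM attributes are assumptions.
ITEMS = [
    TestItem(
        text="Jan śpi.",
        grammatical=True,
        phenomena=["subject-verb agreement"],
        avm={"HEAD": {"POS": "verb"}, "SUBJ": {"NUMBER": "sg", "PERSON": 3}},
    ),
    TestItem(
        text="Jan śpią.",
        grammatical=False,
        phenomena=["subject-verb agreement"],
    ),
]
```

A query such as `select_by_phenomenon(ITEMS, "subject-verb agreement", grammatical=False)` would then retrieve the negative examples for one phenomenon, which is the kind of operation a test-suite database is typically expected to support.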
Copyright information
© 2003 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Marciniak, M., Mykowiecka, A., Przepiórkowski, A., Kupść, A. (2003). An HPSG-Annotated Test Suite for Polish. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_8
DOI: https://doi.org/10.1007/978-94-010-0201-1_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1335-5
Online ISBN: 978-94-010-0201-1
eBook Packages: Springer Book Archive