Abstract
This chapter provides relatively complete, though very brief, up-to-date information on the annotated corpus of Czech called the Prague Dependency Treebank (PDT). It is the first complex, linguistically motivated treebank based on a dependency syntactic theory; it contains annotation on several layers of sentence structure (Sects. 3, 4 and 5), as well as annotation of coreference and basic discourse relations, genre specification and multiword expressions (Sect. 6). Section 7 presents a commented list of the whole PDT-style family of follow-up treebanks developed in Prague, together with information on treebanks of other languages that use the PDT-style annotation scheme in one way or another. The last section gives a brief description of the data format and the available tools.
Notes
1. The term “tectogrammatical” for “deep structure” comes from [9], where it is contrasted with “phenogrammatical”, i.e. surface structure.
2. Modifications and additions reflected in PDT 3.0 are given in [26].
3. See, e.g., the difference, noted already by N. Chomsky, between “Two languages are spoken by everybody in this room” and “Everybody in this room speaks two languages”, or between “John introduced Mary only to Jim” and “John introduced only Mary to Jim”.
4. The vertical arrow going from the node #PersPron to the node firma (‘the firm’).
5. The three dark horizontal arrows going from the nodes firma (‘the firm’), #PersPron and Ostravsko (‘Ostravsko (region)’).
6. The light horizontal arrow going from the node Frýdeckomístecko (‘Frydeckomistecko (region)’).
7. Other values of the attribute range may indicate that a given argument consists of several sentences or of a complex set of nodes.
References
[1] Bamman, D., Crane, G.: The design and use of a Latin Dependency Treebank. In: Proceedings of the Fifth International Treebanks and Linguistic Theories Conference TLT 2006, Prague, Czech Republic, pp. 67–78 (2006)
[2] Bejček, E., Straňák, P.: Annotation of multiword expressions in the Prague Dependency Treebank. Language Resources and Evaluation, vol. 44, no. 1–2, pp. 7–21. Springer, Netherlands (2010)
[3] Bejček, E., Panevová, J., Popelka, J., Smejkalová, L., Straňák, P., Ševčíková, M., Štěpánek, J., Toman, J., Žabokrtský, Z., Hajič, J.: Prague Dependency Treebank 2.5. Data/software, ÚFAL MFF UK Praha, Prague, Czech Republic. http://ufal.mff.cuni.cz/pdt2.5/ (2011)
[4] Bejček, E., Hajičová, E., Hajič, J., Jínová, P., Kettnerová, V., Kolářová, V., Mikulová, M., Mírovský, J., Nedoluzhko, A., Panevová, J., Poláková, L., Ševčíková, M., Štěpánek, J., Zikánová, Š.: Prague Dependency Treebank 3.0. Data/software, Univerzita Karlova v Praze, MFF, ÚFAL, Prague, Czech Republic. http://ufal.mff.cuni.cz/pdt3.0/ (2013)
[5] Berović, D., Agić, Ž., Tadić, M.: Croatian Dependency Treebank: recent development and initial experiments. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 1902–1906 (2012)
[6] Böhmová, A., Hajič, J., Hajičová, E., Hladká, B.: The Prague dependency treebank: a 3-level annotation scenario. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 103–128. Kluwer, Dordrecht (2003)
[7] Bojar, O., Žabokrtský, Z., Dušek, O., Galuščáková, P., Majliš, M., Mareček, D., Maršík, J., Novák, M., Popel, M., Tamchyna, A.: The joy of parallelism with CzEng 1.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3921–3928. European Language Resources Association, İstanbul (2012). ISBN 978-2-9517408-7-7
[8] Čmejrek, M., Cuřín, J., Havelka, J., Hajič, J., Kuboň, V.: Prague Czech–English Dependency Treebank: syntactically annotated resources for machine translation. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisboa, Portugal, pp. 1597–1600 (2004). ISBN 2-9517408-1-6
[9] Curry, H.B.: Some logical aspects of grammatical structure. In: Jakobson, R. (ed.) Structure of Language and its Mathematical Aspects, Proceedings of Symposia in Applied Mathematics 12, pp. 56–68. Providence, RI (1961)
[10] Džeroski, S., Erjavec, T., Ledinek, N., Pajas, P., Žabokrtský, Z., Žele, A.: Towards a Slovene Dependency Treebank. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, pp. 1388–1391 (2006)
[11] Hajič, J.: Building a syntactically annotated corpus: the Prague Dependency Treebank. In: Hajičová, E. (ed.) Issues of Valency and Meaning, Studies in Honour of Jarmila Panevová
[12] Hajič, J., Vidová Hladká, B., Panevová, J., Hajičová, E., Sgall, P., Pajas, P.: Prague Dependency Treebank 1.0 (Final Production Label). CD-ROM, Linguistic Data Consortium, Philadelphia, PA, USA, LDC2001T10 (2001). ISBN 1-58563-212-0
[13] Hajič, J., Panevová, J., Urešová, Z., Bémová, A., Kolářová, V., Pajas, P.: PDT-VALLEX: creating a large-coverage valency lexicon for treebank annotation. In: Proceedings of the Second Workshop on Treebanks and Linguistic Theories, pp. 57–68. Växjö University Press, Växjö (2003)
[14] Hajič, J., Panevová, J., Hajičová, E., Sgall, P., Pajas, P., Štěpánek, J., Havelka, J., Mikulová, M., Žabokrtský, Z., Ševčíková-Razímová, M., Urešová, Z.: Prague Dependency Treebank 2.0. Software prototype, Linguistic Data Consortium, Philadelphia, PA, USA (2006). www.ldc.upenn.edu, ISBN 1-58563-370-4
[15] Hajič, J., Hajičová, E., Panevová, J., Sgall, P., Bojar, O., Cinková, S., Fučíková, E., Mikulová, M., Pajas, P., Popelka, J., Semecký, J., Šindlerová, J., Štěpánek, J., Toman, J., Urešová, Z., Žabokrtský, Z.: Announcing Prague Czech–English Dependency Treebank 2.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3153–3160. European Language Resources Association, İstanbul (2012). ISBN 978-2-9517408-7-7
[16] Hajičová, E.: Presupposition and allegation revisited. J. Pragmat. 8, 155–167 (1984)
[17] Hajičová, E.: Theoretical description of language as a basis of corpus annotation: the case of Prague dependency treebank. Prague Linguistic Circle Papers 4, pp. 111–127. John Benjamins, Amsterdam/Philadelphia (2002)
[18] Hajičová, E.: Topic-focus articulation in the Czech national corpus. In: Hladký, J. (ed.) Language and Function: To the Memory of Jan Firbas, pp. 185–194. John Benjamins, Amsterdam (2003)
[19] Hana, J., Štěpánek, J.: Prague markup language framework. In: Proceedings of the Sixth Linguistic Annotation Workshop, pp. 12–21. Association for Computational Linguistics, Stroudsburg (2012)
[20] Hana, J., Zeman, D., Hajič, J., Hanová, H., Hladká, B., Jeřábek, E.: Manual for Morphological Annotation. Revision for the Prague Dependency Treebank 2.0. Technical report no. ÚFAL TR-2005-27, ÚFAL MFF UK, Prague, Czech Republic (2005)
[21] Marcus, M., Santorini, B., Marcinkiewicz, M.: Building a large annotated corpus of English: the Penn treebank. Comput. Linguist. 19, 313–330 (1993)
[22] Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: Annotating noun argument structure for NomBank. In: Proceedings of LREC 2004 (2004a)
[23] Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: The NomBank project: an interim report. In: Proceedings of the HLT-NAACL 2004 Workshop on Frontiers in Corpus Annotation (2004b)
[24] Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical level in the Prague Dependency Treebank. Annotation manual. Technical report no. 2006/30, ÚFAL MFF UK, Prague, Czech Republic (2006a)
[25] Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical level in the Prague Dependency Treebank. Reference book. Technical report no. 2006/32, ÚFAL MFF UK, Prague, Czech Republic (2006b)
[26] Mikulová, M., Bejček, E., Mírovský, J., Nedoluzhko, A., Panevová, J., Poláková, L., Straňák, P., Ševčíková, M., Žabokrtský, Z.: From PDT 2.0 to PDT 3.0 (Modifications and Complements). Technical report no. 54, ÚFAL TR-2013-54, ÚFAL MFF UK, Prague, Czech Republic (2013)
[27] Mladová, L., Zikánová, Š., Hajičová, E.: From sentence to discourse: building an annotation scheme for discourse based on Prague Dependency Treebank. In: Proceedings of LREC 2008, pp. 1–7. Marrakech, Morocco (2008)
[28] Nedoluzhko, A., Mírovský, J.: Annotating Extended Textual Coreference and Bridging Relations in the Prague Dependency Treebank. Annotation manual. Technical report no. 44, ÚFAL MFF UK, Prague, Czech Republic (2011)
[29] Nedoluzhko, A., Mírovský, J.: How dependency trees and tectogrammatics help annotating coreference and bridging relations in Prague Dependency Treebank. In: Proceedings of the Second International Conference on Dependency Linguistics (Depling 2013), pp. 244–251. Matfyzpress, Prague, Czech Republic (2013)
[30] Pajas, P., Štěpánek, J.: XML-based representation of multi-layered annotation in the PDT 2.0. In: Proceedings of the LREC Workshop on Merging and Layering Linguistic Information (LREC 2006), Genova, Italy, pp. 40–47 (2006)
[31] Pajas, P., Štěpánek, J.: Recent advances in a feature-rich framework for treebank annotation. In: The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, vol. 2, pp. 673–680. The Coling 2008 Organizing Committee (2008)
[32] Pajas, P., Štěpánek, J.: System for querying syntactically annotated corpora. In: Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, pp. 33–36. Association for Computational Linguistics, Suntec (2009)
[33] Palmer, M., Kingsbury, P., Gildea, D.: The proposition bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–106 (2005)
[34] Panevová, J.: On verbal frames in functional generative description. Part I, Prague Bull. Math. Linguist. 22, 3–40; Part II, Prague Bull. Math. Linguist. 23, 17–52 (1974–75)
[35] Panevová, J.: Ještě k teorii valence. Slovo Slovesn. 59(1), 1–13 (1998)
[36] Panevová, J.: Valence a její univerzální a specifické projevy. In: Hladká, Z., Karlík, P. (eds.) Čeština - univerzália a specifika. Proceedings of the Conference in Šlapanice near Brno, pp. 29–37 (1999)
[37] Panevová, J.: K valenci substantiv (s ohledem na jejich derivaci). In: Zbornik Matice srpske za slavistiku 61, pp. 29–36. Novi Sad (2002)
[38] Panevová, J., Ševčíková, M.: Delimitation of information between grammatical rules and lexicon. In: Proceedings of the International Conference on Dependency Linguistics (Depling 2011), pp. 173–182. Universitat Pompeu Fabra, Barcelona (2011). ISBN 978-84-615-1834-0
[39] Panevová, J., Ševčíková, M.: The role of grammatical constraints in lexical component in functional generative description. In: Proceedings of the 6th International Conference on Meaning-Text Theory, Prague, 30–31 August 2013, pp. 134–143. Univerzita Karlova v Praze, Praha (2013). ISBN 978-3-86688-405-2
[40] Pajas, P.: TrEd User’s Manual. http://ufal.mff.cuni.cz/tred/documentation/ar01-toc.html (2007)
[41] Pajas, P., Štěpánek, J.: Recent advances in a feature-rich framework for treebank annotation. In: Proceedings of Coling 2008, Manchester, pp. 673–680 (2008)
[42] Poláková, L., Jínová, P., Zikánová, Š., Bedřichová, Z., Mírovský, J., Rysová, M., Zdeňková, J., Pavlíková, V., Hajičová, E.: Manual for Annotation of Discourse Relations in Prague Dependency Treebank. Technical report no. 2012/47, ÚFAL MFF UK, Prague, Czech Republic (2012a)
[43] Poláková, L., Jínová, P., Zikánová, Š., Hajičová, E., Mírovský, J., Nedoluzhko, A., Rysová, M., Pavlíková, V., Zdeňková, J., Pergler, J., Ocelák, R.: Prague Discourse Treebank 1.0. Data/software, ÚFAL MFF UK, Prague, Czech Republic. http://ufal.mff.cuni.cz/pdit (2012b)
[44] Poláková, L., Mírovský, J., Nedoluzhko, A., Jínová, P., Zikánová, Š., Hajičová, E.: Introducing the Prague Discourse Treebank 1.0. In: Proceedings of the 6th International Joint Conference on Natural Language Processing, pp. 91–99. Asian Federation of Natural Language Processing (2013)
[45] Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., Webber, B.: The Penn Discourse Treebank 2.0. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco (2008)
[46] Prokopidis, P., Desypri, E., Koutsombogera, M., Papageorgiou, H., Piperidis, S.: Theoretical and practical issues in the construction of a Greek Dependency Treebank. In: Civit, M., Kübler, S., Martí, M.A. (eds.) Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), pp. 149–160. Barcelona, Spain (2005)
[47] Razímová, M., Žabokrtský, Z.: Morphological meanings in the Prague Dependency Treebank 2.0. In: Proceedings of the 8th International Conference, TSD 2005, Lecture Notes in Computer Science, vol. 3658, pp. 148–155. Springer, Berlin (2005)
[48] Razímová, M., Žabokrtský, Z.: Annotation of grammatemes in the Prague Dependency Treebank 2.0. In: Proceedings of the LREC Workshop on Annotation Science, ELRA, Genova, Italy, pp. 12–19 (2006)
[49] Sgall, P., Hajičová, E., Panevová, J.: The Meaning of the Sentence in its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague (1986)
[50] Ševčíková, M., Žabokrtský, Z., Krůza, O.: Named entities in Czech: annotating data and developing NE tagger. In: Proceedings of the 10th International Conference on Text, Speech and Dialogue, Lecture Notes in Computer Science, pp. 188–195. Springer, Pilsen (2007)
[51] Šimková, M., Garabík, R.: Синтаксическая разметка в Словацком национальном корпусе [Syntactic annotation in the Slovak National Corpus]. In: Труды международной конференции «Корпусная лингвистика» [Proceedings of the International Conference “Corpus Linguistics”], pp. 389–394. St. Petersburg University Press, Saint Petersburg (2006). ISBN 5-288-04181-4
[52] Urešová, Z.: Building the PDT-VALLEX valency lexicon. In: Proceedings of the Fifth Corpus Linguistics Conference, pp. 1–18. University of Liverpool, Liverpool (2012)
[53] Urešová, Z., Pajas, P.: Diatheses in the Czech valency lexicon PDT-Vallex. In: Slovko 2009, NLP, Corpus Linguistics, Corpus Based Grammar Research, pp. 358–376. Slovenská akadémia vied, Bratislava (2009)
[54] Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: to parse or not to parse? In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 2735–2741. European Language Resources Association, İstanbul (2012)
[55] Zeman, D., Dušek, O., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: harmonized multi-language dependency treebank. Accepted for publication in: Language Resources and Evaluation, vol. 2014, p. 40. Springer, Netherlands (2014). ISSN 1574-020X
Acknowledgements
The authors are deeply indebted to their colleagues Jarmila Panevová and Markéta Lopatková for their careful reading of the original version of the present contribution and for their most helpful comments and suggestions. Also the comments of the anonymous reviewers were most welcome. The responsibility for the final version, of course, rests with the authors.
The present chapter was written with the financial support of the Grant Agency of the Czech Republic (project P406/12/0658) and the Ministry of Education, Youth and Sports (LINDAT-Clarin project LM2010013). This work has used language resources developed, stored and/or distributed by the LINDAT/CLARIN project.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Hajič, J., Hajičová, E., Mikulová, M., Mírovský, J. (2017). Prague Dependency Treebank. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_21
DOI: https://doi.org/10.1007/978-94-024-0881-2_21
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)