Prague Dependency Treebank

Handbook of Linguistic Annotation

Abstract

This chapter gives a relatively complete, though very brief, up-to-date account of the annotated corpus of Czech called the Prague Dependency Treebank (PDT). It is the first complex, linguistically motivated treebank based on a dependency syntactic theory, and it contains annotation on several layers of sentence structure (Sects. 3, 4 and 5) as well as coreference and basic discourse relations, genre specification and multiword expressions (Sect. 6). Section 7 presents an annotated list of the whole PDT-style family of follow-up treebanks developed in Prague, together with information on treebanks of other languages that use the PDT-style annotation scheme in one way or another. The last section briefly describes the data format and the available tools.

Notes

  1. The term “tectogrammatical” for “deep structure” comes from [9] as a contrast to “phenogrammatical”, i.e. surface structure.

  2. Modifications and additions reflected in PDT 3.0 are given in [26].

  3. See e.g. the difference noted already by N. Chomsky between “Two languages are spoken by everybody in this room” versus “Everybody in this room speaks two languages”, or “John introduced Mary only to Jim” versus “John introduced only Mary to Jim”.

  4. The vertical arrow going from node #PersPron to node firma (‘the firm’).

  5. Three dark horizontal arrows going from nodes firma (‘the firm’), #PersPron, and Ostravsko (‘Ostravsko (region)’).

  6. The light horizontal arrow going from the node Frýdeckomístecko (‘Frydeckomistecko (region)’).

  7. Other values of the attribute range may indicate that a given argument consists of several sentences, or a complex set of nodes.

References

  1. Bamman, D., Crane G.: The design and use of a Latin Dependency Treebank. In: Proceedings of the Fifth International Treebanks and Linguistic Theories Conference TLT 2006, Prague, Czech Republic, pp. 67–78 (2006)

  2. Bejček, E., Straňák, P.: Annotation of multiword expressions in the Prague Dependency Treebank. In: Language Resources and Evaluation, vol. 44, no. 1–2, pp. 7–21. Springer, Netherlands (2010)

  3. Bejček, E., Panevová, J., Popelka, J., Smejkalová, L., Straňák, P., Ševčíková, M., Štěpánek, J., Toman, J., Žabokrtský, Z., Hajič, J.: Prague Dependency Treebank 2.5. Data/software, ÚFAL MFF UK Praha, Prague, Czech Republic. http://ufal.mff.cuni.cz/pdt2.5/ (2011)

  4. Bejček, E., Hajičová, E., Hajič, J., Jínová, P., Kettnerová, V., Kolářová, V., Mikulová, M., Mírovský, J., Nedoluzhko, A., Panevová, J., Poláková, L., Ševčíková, M., Štěpánek, J., Zikánová, Š.: Prague Dependency Treebank 3.0. Data/software, Univerzita Karlova v Praze, MFF, ÚFAL, Prague, Czech Republic. http://ufal.mff.cuni.cz/pdt3.0/ (2013)

  5. Berović, D., Agić, Ž., Tadić, M.: Croatian Dependency Treebank: recent development and initial experiments. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 1902–1906 (2012)

  6. Böhmová, A., Hajič, J., Hajičová, E., Hladká, B.: The Prague dependency treebank: a 3-level annotation scenario. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 103–128. Kluwer, Dordrecht (2003)

  7. Bojar, O., Žabokrtský, Z., Dušek, O., Galuščáková, P., Majliš, M., Mareček, D., Maršík, J., Novák, M., Popel, M., Tamchyna, A.: The joy of parallelism with CzEng 1.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3921–3928. European Language Resources Association, İstanbul (2012). ISBN 978-2-9517408-7-7

  8. Čmejrek, M., Cuřín, J., Havelka, J., Hajič, J., Kuboň, V.: Prague Czech–English Dependency Treebank: syntactically annotated resources for machine translation. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisboa, Portugal, pp. 1597–1600 (2004). ISBN 2-9517408-1-6

  9. Curry, H.B.: Some logical aspects of grammatical structure. In: Jakobson, R. (ed.) Structure of Language and its Mathematical Aspects, Proceedings of Symposia in Applied Mathematics 12, Providence, R.I., pp. 56–68 (1961)

  10. Džeroski, S., Erjavec, T., Ledinek, N., Pajas, P., Žabokrtský, Z., Žele, A.: Towards a Slovene Dependency Treebank. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy, pp. 1388–1391 (2006)

  11. Hajič, J.: Building a syntactically annotated Corpus: the Prague Dependency Treebank. In: Hajičová, E. (ed.) Issues of Valency and Meaning, Studies in Honour of Jarmila Panevová

  12. Hajič, J., Vidová Hladká, B., Panevová, J., Hajičová, E., Sgall, P., Pajas, P.: Prague Dependency Treebank 1.0 (Final Production Label). CDROM, Linguistic Data Consortium, Philadelphia, PA, USA, LDC2001T10 (2001). ISBN 1-58563-212-0

  13. Hajič, J., Panevová, J., Urešová, Z., Bémová, A., Kolářová, V., Pajas, P.: PDT-VALLEX: creating a large-coverage valency lexicon for treebank annotation. In: Proceedings of the Second Workshop on Treebanks and Linguistic Theories, pp. 57–68. Växjö University Press, Växjö (2003)

  14. Hajič, J., Panevová, J., Hajičová, E., Sgall, P., Pajas, P., Štěpánek, J., Havelka, J., Mikulová, M., Žabokrtský, Z., Ševčíková-Razímová, M., Urešová, Z.: Prague Dependency Treebank 2.0. Software prototype, Linguistic Data Consortium, Philadelphia, PA, USA (2006). www.ldc.upenn.edu, ISBN 1-58563-370-4

  15. Hajič, J., Hajičová, E., Panevová, J., Sgall, P., Bojar, O., Cinková, S., Fučíková, E., Mikulová, M., Pajas, P., Popelka, J., Semecký, J., Šindlerová, J., Štěpánek, J., Toman, J., Urešová, Z., Žabokrtský, Z.: Announcing Prague Czech–English Dependency Treebank 2.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3153–3160. European Language Resources Association, İstanbul (2012). ISBN 978-2-9517408-7-7

  16. Hajičová, E.: Presupposition and allegation revisited. J. Pragmat. 8, 155–167 (1984)

  17. Hajičová, E.: Theoretical description of language as a basis of corpus annotation: the case of Prague dependency treebank. Prague Linguistic Circle Papers 4, pp. 111–127. John Benjamins, Amsterdam/Philadelphia (2002)

  18. Hajičová, E.: Topic-focus articulation in the Czech national corpus. In: Hladký, J. (ed.) Language and Function: To the Memory of Jan Firbas, pp. 185–194. John Benjamins, Amsterdam (2003)

  19. Hana, J., Štěpánek, J.: Prague markup language framework. Proceedings of the Sixth Linguistic Annotation Workshop, pp. 12–21. Association for Computational Linguistics, Stroudsburg (2012)

  20. Hana, J., Zeman, D., Hajič, J., Hanová, H., Hladká, B., Jeřábek, E.: Manual for Morphological Annotation. Revision for the Prague Dependency Treebank 2.0. Technical report no. ÚFAL TR-2005-27, ÚFAL MFF UK, Prague, Czech Republic (2005)

  21. Marcus, M., Santorini, B., Marcinkiewicz, M.: Building a large annotated corpus of English: the Penn treebank. Comput. Linguist. 19, 313–330 (1993)

  22. Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: Annotating noun argument structure for NomBank. In: Proceedings of the LREC-2004 (2004a)

  23. Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: The NomBank project: an interim report. In: Proceedings of the HLT-NAACL 2004 Workshop on Frontiers in Corpus Annotation (2004b)

  24. Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical level in the Prague Dependency Treebank. Annotation manual. Technical report no. 2006/30, ÚFAL MFF UK, Prague, Czech Republic (2006a)

  25. Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical level in the Prague Dependency Treebank. Reference book. Technical report no. 2006/32, ÚFAL MFF UK, Prague, Czech Republic (2006b)

  26. Mikulová, M., Bejček, E., Mírovský, J., Nedoluzhko, A., Panevová, J., Poláková, L., Straňák, P., Ševčíková, M., Žabokrtský, Z.: From PDT 2.0 to PDT 3.0 (Modifications and Complements). Technical report no. 54, ÚFAL TR-2013-54, ÚFAL MFF UK, Prague, Czech Republic (2013)

  27. Mladová, L., Zikánová, Š., Hajičová, E.: From sentence to discourse: building an annotation scheme for discourse based on Prague Dependency Treebank. Proceedings of LREC 2008, pp. 1–7. Marrakech, Morocco (2008)

  28. Nedoluzhko, A., Mírovský, J.: Annotating Extended Textual Coreference and Bridging Relations in the Prague Dependency Treebank. Annotation manual. Technical report No. 44, ÚFAL MFF UK, Prague, Czech Republic (2011)

  29. Nedoluzhko, A., Mírovský, J.: How dependency trees and tectogrammatics help annotating coreference and bridging relations in Prague Dependency Treebank. Proceedings of the Second International Conference on Dependency Linguistics, Depling 2013, pp. 244–251. Matfyzpress, Prague, Czech Republic (2013)

  30. Pajas, P., Štěpánek, J.: XML-based representation of multi-layered annotation in the PDT 2.0. In: Proceedings of the LREC Workshop on Merging and Layering Linguistic Information (LREC 2006), Genova, Italy, pp. 40–47 (2006)

  31. Pajas, P., Štěpánek, J.: Recent advances in a feature-rich framework for treebank annotation. In: The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, vol. 2, The Coling 2008 Organizing Committee, pp. 673–680 (2008)

  32. Pajas, P., Štěpánek, J.: System for querying syntactically annotated corpora. Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, pp. 33–36. Association for Computational Linguistics, Suntec (2009)

  33. Palmer, M., Kingsbury, P., Gildea, D.: The proposition bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–106 (2005)

  34. Panevová, J.: On verbal frames in functional generative description. Part I, Prague Bull. Math. Linguist. 22, 3–40; Part II, Prague Bull. Math. Linguist. 23, 17–52 (1974–75)

  35. Panevová, J.: Ještě k teorii valence. Slovo Slovesn. 59(1), 1–13 (1998)

  36. Panevová, J.: Valence a její univerzální a specifické projevy. In: Hladká, Z., Karlík, P. (eds.) Čeština - univerzália a specifika. Proceedings of the Conference in Šlapanice near Brno, pp. 29–37 (1999)

  37. Panevová, J.: K valenci substantiv (s ohledem na jejich derivaci). In: Zbornik matice srpske za slavistiku. Novi Sad, 61, 29–36 (2002)

  38. Panevová, J., Ševčíková, M.: Delimitation of information between grammatical rules and lexicon. In: Proceedings of the International Conference on Dependency Linguistics (Depling 2011), pp. 173–182. Universitat Pompeu Fabra, Barcelona (2011). ISBN 978-84-615-1834-0

  39. Panevová, J., Ševčíková, M.: The role of grammatical constraints in lexical component in functional generative description. In: Proceedings of the 6th International Conference on Meaning-Text Theory, Prague, 30–31 August 2013, pp. 134–143. Univerzita Karlova v Praze, Praha (2013). ISBN 978-3-86688-405-2

  40. Pajas, P.: TrEd User’s Manual. http://ufal.mff.cuni.cz/tred/documentation/ar01-toc.html (2007)

  41. Pajas, P., Štěpánek, J.: Recent advances in a feature-rich framework for treebank annotation. In: Proceedings of Coling 2008, Manchester, pp. 673–680 (2008)

  42. Poláková, L., Jínová, P., Zikánová, Š., Bedřichová, Z., Mírovský, J., Rysová, M., Zdeňková, J., Pavlíková, V., Hajičová, E.: Manual for Annotation of Discourse Relations in Prague Dependency Treebank. Technical report no. 2012/47, ÚFAL MFF UK, Prague, Czech Republic (2012a)

  43. Poláková, L., Jínová, P., Zikánová, Š., Hajičová, E., Mírovský, J., Nedoluzhko, A., Rysová, M., Pavlíková, V., Zdeňková, J., Pergler, J., Ocelák, R.: Prague Discourse Treebank 1.0. Data/software, ÚFAL MFF UK, Prague, Czech Republic. http://ufal.mff.cuni.cz/pdit (2012b)

  44. Poláková, L., Mírovský, J., Nedoluzhko, A., Jínová, P., Zikánová, Š., Hajičová, E.: Introducing the Prague Discourse Treebank 1.0. In: Proceedings of the 6th International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, pp. 91–99 (2013)

  45. Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., Webber, B.: The Penn Discourse Treebank 2.0. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco (2008)

  46. Prokopidis, P., Desypri, E., Koutsombogera, M., Papageorgiou, H., Piperidis, S.: Theoretical and practical issues in the construction of a Greek Dependency Treebank. In: Civit, M., Kübler, S., Martí, M.A. (eds.) Proceedings of The Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), pp. 149–160. Barcelona, Spain (2005)

  47. Razímová, M., Žabokrtský, Z.: Morphological meanings in the Prague Dependency Treebank 2.0. In: Proceedings of the 8th International Conference, TSD 2005, Lecture Notes in Computer Science, vol. 3658, pp. 148–155. Springer, Berlin (2005)

  48. Razímová, M., Žabokrtský, Z.: Annotation of grammatemes in the Prague Dependency Treebank 2.0. In: Proceedings of the LREC Workshop on Annotation Science, ELRA, Genova, Italy, pp. 12–19 (2006)

  49. Sgall, P., Hajičová, E., Panevová, J.: The Meaning of the Sentence in its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague (1986)

  50. Ševčíková, M., Žabokrtský, Z., Krůza, O.: Named entities in Czech: annotating data and developing NE tagger. In: Proceedings of the 10th International Conference on Text, Speech and Dialogue, Lecture Notes in Computer Science, pp. 188–195. Springer, Pilsen (2007)

  51. Šimková, M., Garabík, R.: Синтаксическая разметка в Словацком национальном корпусе. In: Труды международной конференции Корпусная лингвистика. Sankt-Petersburg: St. Petersburg University Press, pp. 389–394 (2006). ISBN 5-288-04181-4

  52. Urešová, Z.: Building the PDT-VALLEX valency lexicon. Proceedings of the fifth Corpus Linguistics Conference, pp. 1–18. University of Liverpool, Liverpool (2012)

  53. Urešová, Z., Pajas, P.: Diatheses in the Czech valency lexicon PDT-vallex. In: Slovko 2009, NLP, Corpus Linguistics, Corpus Based Grammar Research, Slovenská akadémia vied, Bratislava, pp. 358–376 (2009)

  54. Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: to parse or not to parse? Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 2735–2741. European Language Resources Association, İstanbul (2012)

  55. Zeman, D., Dušek, O., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: harmonized multi-language dependency treebank. Accepted for publication in: Language Resources and Evaluation, vol. 2014, p. 40. Springer, Netherlands (2014). ISSN 1574-020X

Acknowledgements

The authors are deeply indebted to their colleagues Jarmila Panevová and Markéta Lopatková for their careful reading of the original version of the present contribution and for their most helpful comments and suggestions. The comments of the anonymous reviewers were also most welcome. The responsibility for the final version, of course, rests with the authors.

The present chapter was written with the financial support of the Grant Agency of the Czech Republic (project P406/12/0658) and the Ministry of Education, Youth and Sports (LINDAT-Clarin project LM2010013). This work has used language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project.

Author information

Corresponding author

Correspondence to Jan Hajič.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Hajič, J., Hajičová, E., Mikulová, M., Mírovský, J. (2017). Prague Dependency Treebank. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_21

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_21

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
