Language Resources and Evaluation, Volume 46, Issue 1, pp 25–36

Annotation of sentence structure

Capturing the relationship between clauses in Czech sentences
  • Markéta Lopatková
  • Petr Homola
  • Natalia Klyueva
Original paper


The focus of this article is on the creation of a collection of sentences manually annotated with respect to their sentence structure. We show that the concept of linear segments (linguistically motivated units that can easily be detected automatically) serves as a good basis for the identification of clauses in Czech. The segment annotation captures relationships such as subordination, coordination, apposition and parenthesis; based on segmentation charts, the individual clauses forming a complex sentence are identified. The annotation of sentence structure enriches a dependency-based framework with explicit syntactic information on relations among complex units such as clauses. We have gathered a collection of 3,444 sentences from the Prague Dependency Treebank and annotated them with respect to their sentence structure; these sentences comprise 10,746 segments forming 6,341 clauses. The main purpose of the project is to obtain development data: promising results have already been reported for Czech NLP tools (such as a dependency parser or a machine translation system for related languages) that adopt the idea of clause segmentation. The collection of sentences with annotated sentence structure makes further improvement of such tools possible.
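As a rough illustration of why linear segments are easy to detect automatically, the following minimal sketch splits a tokenized sentence at boundary tokens. It assumes that punctuation marks and coordinating conjunctions act as segment boundaries; the boundary inventory, the function names and the example sentence are hypothetical and are not taken from the article, and the sketch does not attempt the later step of grouping segments into clauses.

```python
from typing import List

# Hypothetical boundary inventory, chosen only for illustration;
# it is not the inventory used by the authors.
PUNCTUATION = {",", ".", ";", ":", "?", "!", "(", ")"}
COORD_CONJUNCTIONS = {"a", "i", "ale", "nebo"}  # a few Czech coordinators


def is_boundary(token: str) -> bool:
    """Return True if the token is assumed to separate two segments."""
    return token in PUNCTUATION or token.lower() in COORD_CONJUNCTIONS


def split_into_segments(tokens: List[str]) -> List[List[str]]:
    """Split tokens into maximal runs of non-boundary tokens.

    Adjacent boundary tokens (e.g. a comma followed by a conjunction)
    are treated as a single boundary, so no empty segments are produced.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if is_boundary(token):
            if current:
                segments.append(current)
                current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # "Petr said that he would come, but he did not come."
    tokens = "Petr řekl , že přijde , ale nepřišel .".split()
    for segment in split_into_segments(tokens):
        print(" ".join(segment))
```

In this toy example the sentence falls into three segments. Deciding which segments belong to the same clause, and how clauses relate to each other, is the part that the manual annotation described above is meant to capture.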


Sentence and clause structure · Dependency and coordination · Annotation



The article presents the results of a project supported by grant No. 405/08/0681 and partially by grant No. P202/10/1333 of the Grant Agency of the Czech Republic. The authors are also grateful to the anonymous reviewers for their valuable suggestions.



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Markéta Lopatková (1), corresponding author
  • Petr Homola (1)
  • Natalia Klyueva (1)
  1. Charles University in Prague, Faculty of Mathematics and Physics, Prague, Czech Republic
