Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool

  • Project Notes
Language Resources and Evaluation

Abstract

In this paper, we introduce the resources that we developed for Turkish dependency parsing: a novel manually annotated treebank (the BOUN Treebank), the annotation guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process was designed and carried out by a team of four linguists and five Natural Language Processing (NLP) specialists. Annotation decisions for the BOUN Treebank follow the Universal Dependencies (UD) framework and build on our recent efforts to unify the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, the BOUN Treebank is the largest Turkish UD treebank. It contains a total of 9761 sentences covering various genres, including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser on the BOUN Treebank as well as on two other Turkish treebanks. Our results demonstrate that unifying the Turkish annotation scheme and introducing a more comprehensive treebank lead to improved dependency parsing performance.

Data availability

The treebank and the annotation tool are available online. The links are provided within the text.

Code availability

Our R and Python scripts are available online. The links are provided within the text.

Notes

  1. UD version 2.7. Available at http://hdl.handle.net/11234/1-3424.

  2. Conventions used in the paper are as follows: 1 = first person, 2 = second person, 3 = third person, abl = ablative, acc = accusative, aor = aorist, caus = causative, cl = classifier, com = comitative, cond = conditional, cop = copula, cvb = converb, dat = dative, emph = emphasis, fut = future, gen = genitive, hnr = honorific, imp = imperative, loc = locative, neg = negative, nmlz = nominalizer, pass = passive, pl = plural, poss = possessive, prog = progressive, pst = past, q = question particle, sg = singular. In linguistic examples, the hyphen (-) marks a morpheme boundary, and the equals sign (=) is used when the morpheme attached to a base is a clitic. The tilde (~) indicates partial replication. The asterisk (*) at the beginning of a sentence marks the sentence as ungrammatical, and the percentage symbol (%) marks it as marginally acceptable. Additionally, we present analytic words within a box when they are segmented for annotation.

  3. The UD version of these treebanks is 2.7. Turkish PUD version 2.7 is our re-annotated version.

  4. Our treebank is available online at https://github.com/UniversalDependencies/UD_Turkish-BOUN/.

  5. This table was retrieved from https://www.tnc.org.tr/about-the-corpus/object/ on September 15, 2020.

  6. https://github.com/boun-tabi/UD_docs.

  7. For more information on the UD framework, see https://universaldependencies.org/u/dep/index.html. For our annotation guidelines, please see https://github.com/boun-tabi/UD_docs.

  8. For the complete table of syntactic relations, please check https://universaldependencies.org/u/dep/index.html.

  9. Throughout the paper, changes that we introduced to the annotation convention are shown with bold arcs, whereas dashed arcs indicate previous annotations. Solid arcs represent unaltered dependencies. Every annotated tree in this paper that contains a bold arc is taken from a previous Turkish treebank, that is, either the IMST-UD Treebank or the Turkish PUD Treebank.

  10. We thank the anonymous reviewer for pointing out this issue and initiating this discussion.

  11. For the whole discussion, see https://github.com/UniversalDependencies/docs/issues/639.

  12. Note that in certain environments where there is an immediate follow-up sentence to Example 21, the com-marked argument can still be omitted, as in (i). We thank the anonymous reviewer for pointing this out.

  13. BoAT is available at https://github.com/boun-tabi/BoAT.

  14. https://github.com/universaldependencies/tools.

  15. https://universaldependencies.org/ext-format.html.

  16. These treebanks are available at https://github.com/boun-tabi/UD_Turkish-BIMST and https://github.com/UniversalDependencies/UD_Turkish-PUD.

  17. In a non-projective sentence, the dependency edges cannot be drawn in the plane above the sentence without any two edges crossing each other, as in (iii). However, in a projective sentence, the dependency edges can be drawn in this manner with no edges crossing, as in (ii) (Nivre, 2009).

  18. The re-annotation process was performed on the UD 2.3 versions of these treebanks.
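
The projectivity definition in note 17 can be illustrated with a minimal Python sketch. It is not taken from the paper's scripts: the function name `is_projective`, the head-list input format, and the `include_root` switch are assumptions made here for illustration. Heads are given as 1-based token indices, with 0 standing for an artificial root placed before the first token; whether the root arc takes part in the crossing check is a convention that varies between tools.

```python
from typing import List


def is_projective(heads: List[int], include_root: bool = True) -> bool:
    """Return True if no two dependency arcs cross.

    heads[i] is the 1-based index of the head of token i + 1;
    0 marks the root (hypothetical input format, not the paper's).
    """
    arcs = []
    for dependent, head in enumerate(heads, start=1):
        if head == 0 and not include_root:
            continue  # optionally ignore the arc from the artificial root
        # Store each arc as a (left, right) span over token positions.
        arcs.append((min(head, dependent), max(head, dependent)))

    for left1, right1 in arcs:
        for left2, right2 in arcs:
            # Two arcs cross when the second starts strictly inside the
            # first and ends strictly outside it.
            if left1 < left2 < right1 < right2:
                return False
    return True
```

For example, `is_projective([2, 0, 2])` holds because the two arcs leaving the second token do not cross, whereas `is_projective([3, 4, 0, 3])` fails because the arc between tokens 1 and 3 crosses the arc between tokens 2 and 4. The quadratic pairwise check is chosen for readability only.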

References

  • Aksan, Y., Aksan, M., Koltuksuz, A., Sezer, T., Mersinli, Ü., Demirhan, U. U., Yılmazer, H., Atasoy, G., Öz, S., Yıldız, İ., & Kurtoğlu, Ö. (2012). Construction of the Turkish National Corpus (TNC). In Proceedings of the eighth international conference on language resources and evaluation (LREC’12), European Language Resources Association (ELRA), Istanbul, Turkey, pp. 3223–3227, http://www.lrec-conf.org/proceedings/lrec2012/pdf/991_Paper.pdf.

  • Atalay, N. B., Oflazer, K., & Say, B. (2003). The annotation process in the Turkish Treebank. In Proceedings of 4th international workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003. https://www.aclweb.org/anthology/W03-2405.

  • Aygen, G. (2003). Extractability and the nominative case feature on tense. In S. Özsoy, D. Akar, M. Nakipoğlu-Demiralp, E. E. Taylan, & A. Aksu-Koç (Eds.), Studies in Turkish linguistics: Proceedings of the 10th international conference in Turkish linguistics, İstanbul.

  • Ballesteros, M., Herrera, J., Francisco, V., & Gervás, P. (2012). Are the existing training corpora unnecessarily large? Procesamiento del Lenguaje Natural, 48, 21–27.

  • Bickel, B., & Nichols, J. (2013). Inflectional synthesis of the verb. In M. S. Dryer & M. Haspelmath (Eds.), The World atlas of language structures online. Max Planck Institute for Evolutionary Anthropology.

  • Borges Völker, E., Wendt, M., Hennig, F., & Köhn, A. (2019). HDT-UD: A very large Universal Dependencies treebank for German. In Proceedings of the third workshop on Universal Dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp. 46–57. https://doi.org/10.18653/v1/W19-8006, https://www.aclweb.org/anthology/W19-8006.

  • Brants, S., Dipper, S., Hansen, S., Lezius, W., & Smith, G. (2002). The Tiger treebank. In Proceedings of the workshop on treebanks and linguistic theories (Vol. 168).

  • Brants, T. (2000). TnT - a statistical part-of-speech tagger. Sixth Applied Natural Language Processing Conference (pp. 224–231). Association for Computational Linguistics. https://doi.org/10.3115/974147.974178.

  • Çetinoğlu, Ö., & Çöltekin, Ç. (2019). Challenges of annotating a code-switching treebank. In Proceedings of the 18th international workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019), Paris, France, pp. 82–90, https://doi.org/10.18653/v1/W19-7809, https://www.aclweb.org/anthology/W19-7809.

  • Çetinoğlu, Ö. (2009). A large scale LFG grammar for Turkish. Ph.D. Thesis, Sabanci University.

  • Çöltekin, Ç. (2010). A freely available morphological analyzer for Turkish. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta, http://www.lrec-conf.org/proceedings/lrec2010/pdf/109_Paper.pdf.

  • Çöltekin, Ç. (2015). A grammar-book treebank of Turkish. In M. Dickinson, E. Hinrichs, A. Patejuk, & A. Przepiórkowski (Eds.), Proceedings of the 14th workshop on Treebanks and linguistic theories (TLT 14) (pp. 35–49).

  • Çöltekin, Ç. (2016). (When) do we need inflectional groups? In Proceedings of The 1st international conference on Turkic computational linguistics.

  • Dozat, T., Qi, P., & Manning, C. D. (2017). Stanford’s graph-based neural dependency parser at the CoNLL 2017 shared task. In Proceedings of the CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies (pp. 20–30).

  • Durgar El-Kahlout, İ., Akın, A. A., & Yılmaz, E. (2014). Initial explorations in two-phase Turkish dependency parsing by incorporating constituents. In Proceedings of the first joint workshop on statistical parsing of morphologically rich languages and syntactic analysis of non-canonical languages, Dublin City University, Dublin, Ireland, pp. 82–89. https://www.aclweb.org/anthology/W14-6108.

  • Eryiğit, G. (2007). ITU treebank annotation tool. In Proceedings of the linguistic annotation workshop (pp. 117–120).

  • Eryiğit, G., & Pamay, T. (2007). ITU validation set for Metu-Sabancı Turkish treebank. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, 7(1), 31–37.

  • Eryiğit, G., Nivre, J., & Oflazer, K. (2008). Dependency parsing of Turkish. Computational Linguistics, 34(3), 357–389.

  • Foth, K. A., Köhn, A., Beuck, N., & Menzel, W. (2014). Because size does matter: The Hamburg Dependency Treebank. In Proceedings of the ninth international conference on language resources and evaluation (LREC’14). European Language Resources Association (ELRA), Reykjavik, Iceland, pp. 2326–2333. http://www.lrec-conf.org/proceedings/lrec2014/pdf/860_Paper.pdf.

  • Ginter, F., Hajič, J., Luotolahti, J., Straka, M., & Zeman, D. (2017). CoNLL 2017 shared task—automatically annotated raw texts and word embeddings. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

  • Göksel, A. (2001). The auxiliary ol at the morphology–syntax interface. In E. E. Taylan (Ed.), The verb in Turkish. John Benjamins.

  • Göksel, A., & Kerslake, C. (2005). Turkish: A comprehensive grammar. Comprehensive grammars. Routledge.

  • Hall, J., Nilsson, J., Nivre, J., Eryiğit, G., Megyesi, B., Nilsson, M., & Saers, M. (2007). Single malt or blended? A study in multilingual parser optimization. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), Association for Computational Linguistics, Prague, Czech Republic, pp. 933–939, https://www.aclweb.org/anthology/D07-1097.

  • Hayashi, T. (1996). The dual status of possessive compounds in modern Turkish. Symbolae Turcologicae, 6, 119–129.

  • Heinecke, J. (2019). ConlluEditor: a fully graphical editor for Universal Dependencies treebank files. In Proceedings of the third workshop on universal dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp. 87–93. https://doi.org/10.18653/v1/W19-8010, https://www.aclweb.org/anthology/W19-8010.

  • Hoffman, B. (1995). The computational analysis of the syntax and interpretation of “free” word order in Turkish. PhD thesis, University of Pennsylvania.

  • İşsever, S. (2003). Information structure in Turkish: The word order-prosody interface. Lingua, 113(11), 1025–1053.

  • İşsever, S. (2007). Towards a unified account of clause-initial scrambling in Turkish: A feature analysis. Turkic Languages, 11(1), 93–123.

  • Kanerva, J., Ginter, F., Miekka, N., Leino, A., & Salakoski, T. (2018). Turku neural parser pipeline: An end-to-end system for the CoNLL 2018 shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Association for Computational Linguistics, Brussels, Belgium, pp. 133–142. http://www.aclweb.org/anthology/K18-2013.

  • Kapan, A. (2019). Derivational networks of nouns and adjectives in Turkish. Master’s thesis, Boğaziçi University, İstanbul, Turkey.

  • Kayadelen, T., Öztürel, A., & Bohnet, B. (2020). A gold standard dependency treebank for Turkish. In Proceedings of the 12th language resources and evaluation conference, European Language Resources Association, Marseille, France, pp. 5156–5163. https://www.aclweb.org/anthology/2020.lrec-1.634.

  • Kornfilt, J. (1984). Case marking, agreement, and empty categories in Turkish. Harvard University.

  • Kornfilt, J. (2005). Asymmetries between pre-verbal and post-verbal scrambling in Turkish. In The free word order phenomenon: Its syntactic sources and diversity (pp. 163–180). Mouton de Gruyter.

  • Kunduracı, A. (2013). Turkish noun-noun compounds: A process-based paradigmatic account. PhD thesis, University of Calgary.

  • Kural, M. (1992). Properties of scrambling in Turkish. Ms, UCLA.

  • Kural, M. (1997). Postverbal constituents in Turkish and the linear correspondence axiom. Linguistic Inquiry, 28, 498–519.

  • Leech, G., & Garside, R. (1991). Running a grammar factory: The production of syntactically analysed corpora or treebanks (pp. 15–32). English Computer Corpora: Selected Papers and Research Guide.

  • Makazhanov, A., Sultangazina, A., Makhambetov, O., & Yessenbayev, Z. (2015). Syntactic Annotation of Kazakh: Following the Universal Dependencies Guidelines. A report. In 3rd International conference on Turkic languages processing (TurkLang 2015) (pp. 338–350).

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • Megyesi, B. (2002). Data-driven syntactic analysis - Methods and applications for Swedish. PhD thesis, KTH.

  • Megyesi, B., Dahlqvist, B., Pettersson, E., & Nivre, J. (2008). Swedish-Turkish parallel treebank. In Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech, Morocco. http://www.lrec-conf.org/proceedings/lrec2008/pdf/121_paper.pdf.

  • Megyesi, B., Dahlqvist, B., Csató, É. Á., & Nivre, J. (2010). The English-Swedish-Turkish parallel treebank. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta. http://www.lrec-conf.org/proceedings/lrec2010/pdf/116_Paper.pdf.

  • Nivre, J. (2009). Non-projective dependency parsing in expected linear time. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP (pp. 351–359).

  • Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C. D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., & Zeman, D. (2016). Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the tenth international conference on language resources and evaluation (LREC’16), European Language Resources Association (ELRA), Portorož, Slovenia, pp. 1659–1666. https://www.aclweb.org/anthology/L16-1262.

  • Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., & Marsi, E. (2007). Maltparser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95–135.

  • Nivre, J., Nilsson, J., & Hall, J. (2006). Talbanken05: A Swedish treebank with phrase structure and dependency annotation. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06), European Language Resources Association (ELRA), Genoa, Italy. http://www.lrec-conf.org/proceedings/lrec2006/pdf/223_pdf.pdf.

  • Nivre, J., Zeman, D., Ginter, F., & Tyers, F. M. (2017). Tutorial on Universal Dependencies. Presented at European chapter of the Association for Computational Linguistics, Valencia. Retrieved April 8, 2019, from http://universaldependencies.org/eacl17tutorial/applications.pdf.

  • Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2), 137–148.

  • Oflazer, K., Say, B., Hakkani-Tür, D. Z., & Tür, G. (2003). Building a Turkish Treebank (pp. 261–277). Springer. https://doi.org/10.1007/978-94-010-0201-1_15

  • Özsoy, A. S. (1988). Null subject parameter and Turkish. In Studies on modern Turkish: Proceedings of the 3rd conference on Turkish linguistics (pp. 82–90). Tilburg University Press.

  • Özsoy, A. S. (2019). Word Order in Turkish (Vol. 97). Springer.

  • Öztürk, B. (2006). Null arguments and case-driven agree in Turkish. In C. Boeckx (Ed.), Minimalist essays (pp. 268–287). John Benjamins Publishing Company.

  • Öztürk, B. (2008). Non-configurationality: Free word order and argument drop in Turkish. The limits of syntactic variation (pp. 411–440). John Benjamins Publishing Company.

  • Öztürk, B. (2013). Postverbal constituents in SOV languages. In Theoretical approaches to disharmonic word orders (pp. 270–305). MIT.

  • Öztürk, B., & Taylan, E. E. (2016). Possessive constructions in Turkish. Lingua, 182, 88–108. https://doi.org/10.1016/j.lingua.2015.08.008

  • Pamay, T., Sulubacak, U., Torunoğlu-Selamet, D., & Eryiğit, G. (2015). The annotation process of the ITU web treebank. In Proceedings of The 9th linguistic annotation workshop, Association for Computational Linguistics, Denver, CO, USA, pp. 95–101. https://doi.org/10.3115/v1/W15-1610, https://www.aclweb.org/anthology/W15-1610.

  • Popel, M., Žabokrtský, Z., & Vojtek, M. (2017). Udapi: Universal API for universal dependencies. In Proceedings of the NoDaLiDa 2017 workshop on universal dependencies (UDW 2017), Association for Computational Linguistics, Gothenburg, Sweden, pp. 96–101.

  • Przepiórkowski, A., & Patejuk, A. (2018). Arguments and adjuncts in universal dependencies. In Proceedings of the 27th international conference on computational linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3837–3852. https://www.aclweb.org/anthology/C18-1324.

  • Sağ, Y. (2019). The semantics of number marking: Reference to kinds, counting, and optional classifiers. PhD thesis, Rutgers University.

  • Sak, H., Güngör, T., & Saraçlar, M. (2008). Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In International conference on natural language processing (pp. 417–427). Springer.

  • Sak, H., Güngör, T., & Saraçlar, M. (2011). Resources for Turkish morphological processing. Language Resources and Evaluation, 45(2), 249–261.

  • Sampson, G. (1995). English for the computer: The SUSANNE corpus and analytic scheme. Clarendon Press.

  • Say, B., Zeyrek, D., Oflazer, K., & Özge, U. (2002). Development of a corpus and a treebank for present-day written Turkish. In Proceedings of the 11th international conference of Turkish linguistics, Eastern Mediterranean University, pp. 183–192.

  • Slobin, D. I., & Bever, T. G. (1982). Children use canonical sentence schemas: A crosslinguistic study of word order and inflections. Cognition, 12(3), 229–265.

  • Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. (2012). Brat: A web-based tool for NLP-assisted text annotation. In Proceedings of the demonstrations at the 13th conference of the European chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Avignon, France, pp. 102–107.

  • Sulger, S., Butt, M., King, T. H., Meurer, P., Laczkó, T., Rákosi, G., Dione, C. B., Dyvik, H., Rosén, V., De Smedt, K., Patejuk, A., Çetinoğlu, Ö., Arka, I. W., & Mistica, M. (2013). ParGramBank: The ParGram parallel treebank. In Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Vol. 1: Long Papers), Association for Computational Linguistics, Sofia, Bulgaria, pp. 550–560. https://www.aclweb.org/anthology/P13-1054.

  • Sulubacak, U., & Eryiğit, G. (2018). Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering & Computer Sciences, 26(3), 1662–1672.

  • Sulubacak, U., Eryiğit, G., & Pamay, T. (2016a). IMST: A revisited Turkish dependency treebank. In Proceedings of TurCLing 2016, the 1st international conference on Turkic computational linguistics, Ege University Press.

  • Sulubacak, U., Gökırmak, M., Tyers, F., Çöltekin, Ç., Nivre, J., & Eryiğit, G. (2016b). Universal Dependencies for Turkish. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers, The COLING 2016 Organizing Committee, Osaka, Japan, pp. 3444–3454.

  • Taylan, E. E. (1984). The function of word order in Turkish grammar. University of California Press. https://doi.org/10.2307/415636

  • Taylan, E. E. (1986). Pronominal versus zero representation of anaphora in Turkish. In Studies in Turkish linguistics (p. 209). John Benjamins.

  • Taylan, E. E. (2015). The phonology and morphology of Turkish. Boğaziçi University.

  • Taylan, E. E., & Öztürk Başaran, B. (2014). The notorious -(s)i(n) in Turkish: Neither an agreement nor a compound marker? Dilbilim Araştırmaları Dergisi, 2, 181–199.

  • Türk, U., Atmaca, F., Özateş, Ş. B., Köksal, A., Öztürk Başaran, B., Güngör, T., & Özgür, A. (2019a). Turkish treebanking: Unifying and constructing efforts. In Proceedings of the 13th linguistic annotation workshop, Association for Computational Linguistics, Florence, Italy, pp. 166–177, https://doi.org/10.18653/v1/W19-4019. https://www.aclweb.org/anthology/W19-4019.

  • Türk, U., Atmaca, F., Özateş, Ş. B., Öztürk Başaran, B., Güngör, T., & Özgür, A. (2019b). Improving the annotations in the Turkish Universal Dependency treebank. In Proceedings of the third workshop on universal dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp. 108–115. https://doi.org/10.18653/v1/W19-8013, https://www.aclweb.org/anthology/W19-8013.

  • Tyers, F. M., Sheyanova, M., & Washington, J. N. (2017a). UD annotatrix: An annotation tool for universal dependencies. In Proceedings of the 16th international workshop on treebanks and linguistic theories, Prague, Czech Republic, pp. 10–17.

  • Tyers, F. M., Washington, J., Çöltekin, Ç., & Makazhanov, A. (2017b). An assessment of Universal Dependency annotation guidelines for Turkic languages. In Proceedings of the 5th international conference on Turkic Languages Processing (TurkLang 2017), Tatarstan Academy of Sciences.

  • van der Beek, L., Bouma, G., Malouf, R., & van Noord, G. (2002). The Alpino dependency treebank. In M. Theune, A. Nijholt, & H. Hondorp (Eds.), 12th Meeting on Computational Linguistics in the Netherlands (CLIN) 2001, Rodopi, Language and Computers: Studies in Practical Linguistics, pp. 8–22, 30 November 2001.

  • Yıldız, O. T., Solak, E., Çandır, Ş., Ehsani, R., & Görgün, O. (2016). Constructing a Turkish constituency parse treebank. In Information Sciences and Systems 2015 (pp. 339–347). Springer.

  • Yuret, D., & Türe, F. (2006). Learning morphological disambiguation rules for Turkish. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Association for Computational Linguistics, pp. 328–334.

  • Zeman, D. (2017). Core arguments in Universal Dependencies. In Proceedings of the fourth international conference on Dependency Linguistics (Depling 2017), Linköping University Electronic Press, Pisa, Italy, pp. 287–296. https://www.aclweb.org/anthology/W17-6532.

  • Zeman, D., Hajič, J., Popel, M., Potthast, M., Straka, M., Ginter, F., Nivre, J., & Petrov, S. (2018). CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies. In Proceedings of the CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies, Association for Computational Linguistics, Brussels, Belgium, pp. 1–21. http://www.aclweb.org/anthology/K18-2001.

  • Zeman, D., Popel, M., Straka, M., Hajič, J., Nivre, J., Ginter, F., Luotolahti, J., Pyysalo, S., & Petrov, S. (2017). CoNLL 2017 shared task: Multilingual parsing from raw text to Universal Dependencies. In Proceedings of the CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies, Association for Computational Linguistics, Vancouver, Canada, pp. 1–19. http://www.aclweb.org/anthology/K/K17/K17-3001.pdf.

Acknowledgements

We are immensely grateful to Prof. Yeşim Aksan and the other members of the Turkish National Corpus Team for their tremendous help in providing us with sentences from the Turkish National Corpus. We are also thankful to the anonymous reviewers from SyntaxFest’19 and LAW XIII, as well as to Çağrı Çöltekin for his constructive comments on the re-annotation process of the IMST and PUD Treebanks. The GEBIP Award of the Turkish Academy of Sciences (to A.O.) is gratefully acknowledged.

Funding

This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant number 117E971 and by a BIDEB 2211 graduate scholarship.

Author information

Corresponding author

Correspondence to Utku Türk.

Ethics declarations

Conflict of interest

The authors declared that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Morphological conversion

Table 12 Mappings of morphological features from the notation of Sak et al. (2011) to the features used in the UD framework

Appendix 2: Word order statistics of the BOUN Treebank

Table 13 Word order counts and relative percentages of main arguments within the BOUN Treebank when there is no null argument
Table 14 Word order counts and percentages of main arguments within the BOUN Treebank
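
As a rough illustration of how word order counts like those in Table 13 might be derived from a UD treebank, the following Python sketch labels each sentence by the linear order of its subject, object, and predicate. This is an assumption-laden sketch and not the authors' script: the `(id, head, deprel)` token format, the function names `word_order` and `count_orders`, and the decision to label the root as V even when it is a nominal predicate are all hypothetical choices. Only `nsubj` and `obj` dependents of the root are considered, and sentences lacking either are skipped, which loosely mirrors the "no null argument" condition.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Token = Tuple[int, int, str]  # (id, head, deprel), hypothetical format


def word_order(tokens: List[Token]) -> Optional[str]:
    """Return an order label such as 'SOV', or None if S or O is missing."""
    root = next((t for t in tokens if t[1] == 0), None)
    if root is None:
        return None
    positions: Dict[str, int] = {"V": root[0]}
    for token_id, head, deprel in tokens:
        if head != root[0]:
            continue  # only direct dependents of the root predicate
        base = deprel.split(":")[0]  # drop subtypes such as nsubj:pass
        if base == "nsubj" and "S" not in positions:
            positions["S"] = token_id
        elif base == "obj" and "O" not in positions:
            positions["O"] = token_id
    if len(positions) < 3:
        return None  # a core argument is not overtly expressed
    # Sort the three labels by their linear position in the sentence.
    return "".join(sorted(positions, key=positions.get))


def count_orders(sentences: Iterable[List[Token]]) -> Counter:
    """Count order labels over sentences, skipping those without S and O."""
    labels = (word_order(sentence) for sentence in sentences)
    return Counter(label for label in labels if label is not None)
```

Relative percentages then follow by dividing each count by the total of the counter. Extending the sketch to sentences with null arguments, as in Table 14, would require a further convention that is not assumed here.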

Appendix 3: TNC registers

Table 15 TNC Details

Cite this article

Türk, U., Atmaca, F., Özateş, Ş.B. et al. Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool. Lang Resources & Evaluation 56, 259–307 (2022). https://doi.org/10.1007/s10579-021-09558-0

  • DOI: https://doi.org/10.1007/s10579-021-09558-0
