Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool

Türk, Utku; Atmaca, Furkan; Özateş, Şaziye Betül; Berk, Gözde; Bedir, Seyyit Talha; Köksal, Abdullatif; Başaran, Balkız Öztürk; Güngör, Tunga; Özgür, Arzucan

doi:10.1007/s10579-021-09558-0


Project Notes
Published: 08 November 2021

Volume 56, pages 259–307 (2022)

Language Resources and Evaluation


Abstract

In this paper, we introduce the resources that we developed for Turkish dependency parsing: a novel manually annotated treebank (the BOUN Treebank), the annotation guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process was shaped and carried out by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts to unify the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, the BOUN Treebank is the largest Turkish UD treebank. It contains a total of 9,761 sentences on various topics, drawn from biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other Turkish treebanks. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved dependency parsing performance.


Data availability

Our treebank and annotation tool are available online. The links are provided within the text.

Code availability

Our R and Python scripts are available online. The links are provided within the text.

Notes

UD version 2.7. Available at http://hdl.handle.net/11234/1-3424.
Conventions used in the paper are as follows: 1 = first person, 2 = second person, 3 = third person, abl = ablative, acc = accusative, aor = aorist, caus = causative, cl = classifier, com = comitative, cond = conditional, cop = copula, cvb = converb, dat = dative, emph = emphasis, fut = future, gen = genitive, hnr = honorific, imp = imperative, loc = locative, neg = negative, nmlz = nominalizer, pass = passive, pl = plural, poss = possessive, prog = progressive, pst = past, q = question particle, sg = singular. In linguistic examples, the dash (-) marks a morpheme boundary, and the equal sign (=) is used when the morpheme attached to a base is a clitic. The tilde (~) indicates partial replication. An asterisk (*) at the beginning of a sentence marks the sentence as ungrammatical, and the percentage symbol (%) marks it as marginally acceptable. Additionally, we present analytic words within a box when they are segmented for annotation.
The UD version of these treebanks is 2.7. Turkish PUD version 2.7 is our re-annotated version.
Our treebank is available online at https://github.com/UniversalDependencies/UD_Turkish-BOUN/.
This table is retrieved from https://www.tnc.org.tr/about-the-corpus/object/ on September 15, 2020.
https://github.com/boun-tabi/UD_docs.
For more information on the UD framework, see https://universaldependencies.org/u/dep/index.html. For our annotation guidelines, please see https://github.com/boun-tabi/UD_docs.
For the complete table of syntactic relations, please check https://universaldependencies.org/u/dep/index.html.
Throughout the paper, changes that we introduced to the annotation convention are shown with bold arcs, whereas dashed arcs indicate previous annotations. Solid arcs represent unaltered dependencies. Every annotated tree in this paper that contains a bold arc is taken from a previous Turkish treebank, that is, either the IMST-UD Treebank or the Turkish PUD Treebank.
We thank the anonymous reviewer for pointing out this issue and initiating this discussion.
For the whole discussion, see https://github.com/UniversalDependencies/docs/issues/639.
Note that in certain environments where there is an immediate follow-up sentence to Example 21, the com-marked argument can still be omitted, as in (i). We thank the anonymous reviewer for pointing this out.
BoAT is available at https://github.com/boun-tabi/BoAT.
https://github.com/universaldependencies/tools.
https://universaldependencies.org/ext-format.html.
These treebanks are available at https://github.com/boun-tabi/UD_Turkish-BIMST and https://github.com/UniversalDependencies/UD_Turkish-PUD.
In a non-projective sentence, the dependency edges cannot be drawn in the plane above the sentence without any two edges crossing each other, as in (iii). However, in a projective sentence, the dependency edges can be drawn in this manner with no edges crossing, as in (ii) (Nivre, 2009).
The re-annotation process was performed on the UD 2.3 versions of these treebanks.
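The distinction between projective and non-projective sentences described in the note above can be checked mechanically. The following is a minimal sketch, not code from the paper: it assumes a sentence is given as a list of 1-based head indices (the function name `is_projective` and this input shape are illustrative), and it follows the common convention of placing an artificial root at position 0, so that an arc spanning the root token also counts as a crossing.

```python
from itertools import combinations


def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the 1-based index of the head of token i + 1;
    0 marks the token attached to the artificial root (an assumption
    of this sketch, placed to the left of the sentence).
    """
    # Represent every arc by its left and right end points.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1), (l2, r2) in combinations(arcs, 2):
        # Two arcs cross when exactly one end point of one arc
        # lies strictly inside the span of the other.
        if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
            return False
    return True
```

For instance, `is_projective([2, 0, 2])` holds because every arc can be drawn above the sentence without crossings, whereas `is_projective([3, 4, 0, 3])` does not, since the arc between tokens 1 and 3 crosses the arc between tokens 2 and 4.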

References

Aksan, Y., Aksan, M., Koltuksuz, A., Sezer, T., Mersinli, Ü., Demirhan, U. U., Yılmazer, H., Atasoy, G., Öz, S., Yıldız, İ., & Kurtoğlu, Ö. (2012). Construction of the Turkish National Corpus (TNC). In Proceedings of the eighth international conference on language resources and evaluation (LREC’12), European Language Resources Association (ELRA), Istanbul, Turkey, pp. 3223–3227, http://www.lrec-conf.org/proceedings/lrec2012/pdf/991_Paper.pdf.
Atalay, N.B., Oflazer, K., & Say, B. (2003). The annotation process in the Turkish Treebank. In Proceedings of 4th international workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003. https://www.aclweb.org/anthology/W03-2405.
Aygen, G. (2003). Extractability and the nominative case feature on tense. In S. Özsoy, D. Akar, M. Nakipoğlu-Demiralp, E. E. Taylan, & A. Aksu-Koç (Eds.), Studies in Turkish linguistics: Proceedings of the 10th international conference in Turkish linguistics, İstanbul.
Ballesteros, M., Herrera, J., Francisco, V., & Gervás, P. (2012). Are the existing training corpora unnecessarily large? Procesamiento del Lenguaje Natural, 48, 21–27.
Bickel, B., & Nichols, J. (2013). Inflectional synthesis of the verb. In M. S. Dryer & M. Haspelmath (Eds.), The World atlas of language structures online. Max Planck Institute for Evolutionary Anthropology.
Borges Völker, E., Wendt, M., Hennig, F., & Köhn, A. (2019). HDT-UD: A very large Universal Dependencies treebank for German. In Proceedings of the third workshop on Universal Dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp. 46–57. https://doi.org/10.18653/v1/W19-8006, https://www.aclweb.org/anthology/W19-8006.
Brants, S., Dipper, S., Hansen, S., Lezius, W., & Smith, G. (2002). The Tiger treebank. In Proceedings of the workshop on treebanks and linguistic theories (Vol. 168).
Brants, T. (2000). TnT - a statistical part-of-speech tagger. Sixth Applied Natural Language Processing Conference (pp. 224–231). Association for Computational Linguistics. https://doi.org/10.3115/974147.974178.
Çetinoğlu, Ö., & Çöltekin, Ç. (2019). Challenges of annotating a code-switching treebank. In Proceedings of the 18th international workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019), Paris, France, pp. 82–90. https://doi.org/10.18653/v1/W19-7809, https://www.aclweb.org/anthology/W19-7809.
Çetinoğlu, Ö. (2009). A large scale LFG grammar for Turkish. Ph.D. Thesis, Sabanci University.
Çöltekin, Ç. (2010). A freely available morphological analyzer for Turkish. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta. http://www.lrec-conf.org/proceedings/lrec2010/pdf/109_Paper.pdf.
Çöltekin, Ç. (2015). A grammar-book treebank of Turkish. In M. Dickinson, E. Hinrichs, A. Patejuk, & A. Przepiórkowski (Eds.), Proceedings of the 14th workshop on Treebanks and linguistic theories (TLT 14) (pp. 35–49).
Çöltekin, Ç. (2016). (When) do we need inflectional groups? In Proceedings of The 1st international conference on Turkic computational linguistics.
Dozat, T., Qi, P., & Manning, C. D. (2017). Stanford’s graph-based neural dependency parser at the CoNLL 2017 shared task. In Proceedings of the CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies (pp. 20–30).
Durgar El-Kahlout, İ., Akın, A. A., & Yılmaz, E. (2014). Initial explorations in two-phase Turkish dependency parsing by incorporating constituents. In Proceedings of the first joint workshop on statistical parsing of morphologically rich languages and syntactic analysis of non-canonical languages, Dublin City University, Dublin, Ireland, pp. 82–89. https://www.aclweb.org/anthology/W14-6108.
Eryiğit, G. (2007). ITU treebank annotation tool. In Proceedings of the linguistic annotation workshop (pp. 117–120).
Eryiğit, G., & Pamay, T. (2007). ITU validation set for Metu-Sabancı Turkish treebank. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, 7(1), 31–37.
Eryiğit, G., Nivre, J., & Oflazer, K. (2008). Dependency parsing of Turkish. Computational Linguistics, 34(3), 357–389.
Foth, K. A., Köhn, A., Beuck, N., & Menzel, W. (2014). Because size does matter: The Hamburg Dependency Treebank. In Proceedings of the ninth international conference on language resources and evaluation (LREC’14). European Language Resources Association (ELRA), Reykjavik, Iceland, pp. 2326–2333. http://www.lrec-conf.org/proceedings/lrec2014/pdf/860_Paper.pdf.
Ginter, F., Hajič, J., Luotolahti, J., Straka, M., & Zeman, D. (2017). CoNLL 2017 shared task—automatically annotated raw texts and word embeddings. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
Göksel, A. (2001). The auxiliary ol at the morphology–syntax interface. In E. E. Taylan (Ed.), The verb in Turkish. John Benjamins.
Göksel, A., & Kerslake, C. (2005). Turkish: A comprehensive grammar. Comprehensive grammars. Routledge.
Hall, J., Nilsson, J., Nivre, J., Eryiğit, G., Megyesi, B., Nilsson, M., & Saers, M. (2007). Single malt or blended? A study in multilingual parser optimization. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), Association for Computational Linguistics, Prague, Czech Republic, pp. 933–939. https://www.aclweb.org/anthology/D07-1097.
Hayashi, T. (1996). The dual status of possessive compounds in modern Turkish. Symbolae Turcologicae, 6, 119–129.
Heinecke, J. (2019). ConlluEditor: a fully graphical editor for Universal Dependencies treebank files. In Proceedings of the third workshop on universal dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp. 87–93. https://doi.org/10.18653/v1/W19-8010, https://www.aclweb.org/anthology/W19-8010.
Hoffman, B. (1995). The computational analysis of the syntax and interpretation of “free” word order in Turkish. PhD thesis, University of Pennsylvania.
İşsever, S. (2003). Information structure in Turkish: The word order-prosody interface. Lingua, 113(11), 1025–1053.
İşsever, S. (2007). Towards a unified account of clause-initial scrambling in Turkish: A feature analysis. Turkic Languages, 11(1), 93–123.
Kanerva, J., Ginter, F., Miekka, N., Leino, A., & Salakoski, T. (2018). Turku neural parser pipeline: An end-to-end system for the CoNLL 2018 shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Association for Computational Linguistics, Brussels, Belgium, pp. 133–142. http://www.aclweb.org/anthology/K18-2013.
Kapan, A. (2019). Derivational networks of nouns and adjectives in Turkish. Master’s thesis, Boğaziçi University, İstanbul, Turkey.
Kayadelen, T., Öztürel, A., & Bohnet, B. (2020). A gold standard dependency treebank for Turkish. In Proceedings of the 12th language resources and evaluation conference, European Language Resources Association, Marseille, France, pp. 5156–5163. https://www.aclweb.org/anthology/2020.lrec-1.634.
Kornfilt, J. (1984). Case marking, agreement, and empty categories in Turkish. Harvard University.
Kornfilt, J. (2005). Asymmetries between pre-verbal and post-verbal scrambling in Turkish. In The free word order phenomenon: Its syntactic sources and diversity (pp. 163–180). Mouton de Gruyter.
Kunduracı, A. (2013). Turkish noun-noun compounds: A process-based paradigmatic account. PhD thesis, University of Calgary.
Kural, M. (1992). Properties of scrambling in Turkish. Ms, UCLA.
Kural, M. (1997). Postverbal constituents in Turkish and the linear correspondence axiom. Linguistic Inquiry, 28, 498–519.
Leech, G., & Garside, R. (1991). Running a grammar factory: The production of syntactically analysed corpora or treebanks (pp. 15–32). English Computer Corpora: Selected Papers and Research Guide.
Makazhanov, A., Sultangazina, A., Makhambetov, O., & Yessenbayev, Z. (2015). Syntactic annotation of Kazakh: Following the Universal Dependencies guidelines. A report. In 3rd International conference on Turkic languages processing (TurkLang 2015) (pp. 338–350).
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Megyesi, B. (2002). Data-driven syntactic analysis —Methods and applications for Swedish. PhD thesis, KTH.
Megyesi, B., Dahlqvist, B., Pettersson, E., & Nivre, J. (2008). Swedish-Turkish parallel treebank. In Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech, Morocco. http://www.lrec-conf.org/proceedings/lrec2008/pdf/121_paper.pdf.
Megyesi, B., Dahlqvist, B., Csató, É.Á., & Nivre, J. (2010). The English-Swedish-Turkish parallel treebank. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta. http://www.lrec-conf.org/proceedings/lrec2010/pdf/116_Paper.pdf.
Nivre, J. (2009). Non-projective dependency parsing in expected linear time. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP (pp. 351–359).
Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C. D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., & Zeman, D. (2016). Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the tenth international conference on language resources and evaluation (LREC’16), European Language Resources Association (ELRA), Portorož, Slovenia, pp. 1659–1666. https://www.aclweb.org/anthology/L16-1262.
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., & Marsi, E. (2007). Maltparser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95–135.
Nivre, J., Nilsson, J., & Hall, J. (2006). Talbanken05: A Swedish treebank with phrase structure and dependency annotation. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06), European Language Resources Association (ELRA), Genoa, Italy. http://www.lrec-conf.org/proceedings/lrec2006/pdf/223_pdf.pdf.
Nivre, J., Zeman, D., Ginter, F., & Tyers, F. M. (2017). Tutorial on Universal Dependencies. Presented at European chapter of the Association for Computational Linguistics, Valencia. Retrieved April 8, 2019, from http://universaldependencies.org/eacl17tutorial/applications.pdf.
Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2), 137–148.
Oflazer, K., Say, B., Hakkani-Tür, D. Z., & Tür, G. (2003). Building a Turkish Treebank (pp. 261–277). Springer. https://doi.org/10.1007/978-94-010-0201-1_15
Özsoy, A. S. (1988). Null subject parameter and Turkish. In Studies on modern Turkish: Proceedings of the 3rd conference on Turkish linguistics (pp. 82–90). Tilburg University Press.
Özsoy, A. S. (2019). Word Order in Turkish (Vol. 97). Springer.
Öztürk, B. (2006). Null arguments and case-driven agree in Turkish. In C. Boeckx (Ed.), Minimalist essays (pp. 268–287). John Benjamins Publishing Company.
Öztürk, B. (2008). Non-configurationality: Free word order and argument drop in Turkish. The limits of syntactic variation (pp. 411–440). John Benjamins Publishing Company.
Öztürk, B. (2013). Postverbal constituents in SOV languages. In Theoretical approaches to disharmonic word orders (pp. 270–305). MIT.
Öztürk, B., & Taylan, E. E. (2016). Possessive constructions in Turkish. Lingua, 182, 88–108. https://doi.org/10.1016/j.lingua.2015.08.008
Pamay, T., Sulubacak, U., Torunoğlu-Selamet, D., & Eryiğit, G. (2015). The annotation process of the ITU web treebank. In Proceedings of The 9th linguistic annotation workshop, Association for Computational Linguistics, Denver, CO, USA, pp. 95–101. https://doi.org/10.3115/v1/W15-1610, https://www.aclweb.org/anthology/W15-1610.
Popel, M., Žabokrtský, Z., & Vojtek, M. (2017). Udapi: Universal API for universal dependencies. In Proceedings of the NoDaLiDa 2017 workshop on universal dependencies (UDW 2017), Association for Computational Linguistics, Gothenburg, Sweden, pp. 96–101.
Przepiórkowski, A., & Patejuk, A. (2018). Arguments and adjuncts in universal dependencies. In Proceedings of the 27th international conference on computational linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3837–3852. https://www.aclweb.org/anthology/C18-1324.
Sağ, Y. (2019). The semantics of number marking: Reference to kinds, counting, and optional classifiers. PhD thesis, Rutgers University.
Sak, H., Güngör, T., & Saraçlar, M. (2008). Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In International conference on natural language processing (pp. 417–427). Springer.
Sak, H., Güngör, T., & Saraçlar, M. (2011). Resources for Turkish morphological processing. Language Resources and Evaluation, 45(2), 249–261.
Sampson, G. (1995). English for the computer: The SUSANNE corpus and analytic scheme. Clarendon Press.
Say, B., Zeyrek, D., Oflazer, K., & Özge, U. (2002). Development of a corpus and a treebank for present-day written Turkish. In Proceedings of the 11th international conference of Turkish linguistics, Eastern Mediterranean University, pp. 183–192.
Slobin, D. I., & Bever, T. G. (1982). Children use canonical sentence schemas: A crosslinguistic study of word order and inflections. Cognition, 12(3), 229–265.
Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. (2012). Brat: A web-based tool for NLP-assisted text annotation. In Proceedings of the demonstrations at the 13th conference of the European chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Avignon, France, pp. 102–107.
Sulger, S., Butt, M., King, T. H., Meurer, P., Laczkó, T., Rákosi, G., Dione, C. B., Dyvik, H., Rosén, V., De Smedt, K., Patejuk, A., Çetinoğlu, Ö., Arka, I. W., & Mistica, M. (2013). ParGramBank: The ParGram parallel treebank. In Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Vol. 1: Long Papers), Association for Computational Linguistics, Sofia, Bulgaria, pp. 550–560. https://www.aclweb.org/anthology/P13-1054.
Sulubacak, U., & Eryiğit, G. (2018). Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering & Computer Sciences, 26(3), 1662–1672.
Sulubacak, U., Eryiğit, G., & Pamay, T. (2016a). IMST: A revisited Turkish dependency treebank. In Proceedings of TurCLing 2016, the 1st international conference on Turkic computational linguistics, Ege University Press.
Sulubacak, U., Gökırmak, M., Tyers, F., Çöltekin, Ç., Nivre, J., & Eryiğit, G. (2016b). Universal Dependencies for Turkish. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers, The COLING 2016 Organizing Committee, Osaka, Japan, pp. 3444–3454.
Taylan, E. E. (1984). The function of word order in Turkish grammar. University of California Press. https://doi.org/10.2307/415636
Taylan, E. E. (1986). Pronominal versus zero representation of anaphora in Turkish. In Studies in Turkish linguistics (p. 209). John Benjamins.
Taylan, E. E. (2015). The phonology and morphology of Turkish. Boğaziçi University.
Taylan, E. E., & Öztürk Başaran, B. (2014). The notorious -(s)i(n) in Turkish: Neither an agreement nor a compound marker? Dilbilim Araştırmaları Dergisi, 2, 181–199.
Türk, U., Atmaca, F., Özateş, Ş. B., Köksal, A., Öztürk Başaran, B., Güngör, T., & Özgür, A. (2019a). Turkish treebanking: Unifying and constructing efforts. In Proceedings of the 13th linguistic annotation workshop, Association for Computational Linguistics, Florence, Italy, pp. 166–177, https://doi.org/10.18653/v1/W19-4019. https://www.aclweb.org/anthology/W19-4019.
Türk, U., Atmaca, F., Özateş, Ş. B., Öztürk Başaran, B., Güngör, T., & Özgür, A. (2019b). Improving the annotations in the Turkish Universal Dependency treebank. In Proceedings of the third workshop on universal dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp. 108–115. https://doi.org/10.18653/v1/W19-8013, https://www.aclweb.org/anthology/W19-8013.
Tyers, F. M., Sheyanova, M., & Washington, J. N. (2017a). UD annotatrix: An annotation tool for universal dependencies. In Proceedings of the 16th international workshop on treebanks and linguistic theories, Prague, Czech Republic, pp. 10–17.
Tyers, F. M., Washington, J., Çöltekin, Ç., & Makazhanov, A. (2017b). An assessment of Universal Dependency annotation guidelines for Turkic languages. In Proceedings of the 5th international conference on Turkic Languages Processing (TurkLang 2017), Tatarstan Academy of Sciences.
van der Beek, L., Bouma, G., Malouf, R., & van Noord, G. (2002). The Alpino dependency treebank. In M. Theune, A. Nijholt, & H. Hondorp (Eds.), 12th Meeting on Computational Linguistics in the Netherlands (CLIN) 2001, Rodopi, Language and Computers: Studies in Practical Linguistics, pp. 8–22, 30 November 2001.
Yıldız, O.T., Solak, E., Çandır, Ş., Ehsani, R., & Görgün, O. (2016). Constructing a Turkish constituency parse treebank. In Information Sciences and Systems 2015 (pp. 339–347). Springer.
Yuret, D., & Türe, F. (2006). Learning morphological disambiguation rules for Turkish. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Association for Computational Linguistics, pp. 328–334.
Zeman, D. (2017). Core arguments in Universal Dependencies. In Proceedings of the fourth international conference on Dependency Linguistics (Depling 2017), Linköping University Electronic Press, Pisa, Italy, pp. 287–296. https://www.aclweb.org/anthology/W17-6532.
Zeman, D., Hajič, J., Popel, M., Potthast, M., Straka, M., Ginter, F., Nivre, J., & Petrov, S. (2018). CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies. In Proceedings of the CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies, Association for Computational Linguistics, Brussels, Belgium, pp. 1–21. http://www.aclweb.org/anthology/K18-2001.
Zeman, D., Popel, M., Straka, M., Hajič, J., Nivre, J., Ginter, F., Luotolahti, J., Pyysalo, S., & Petrov, S. (2017). CoNLL 2017 shared task: Multilingual parsing from raw text to Universal Dependencies. In Proceedings of the CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies, Association for Computational Linguistics, Vancouver, Canada, pp. 1–19. http://www.aclweb.org/anthology/K/K17/K17-3001.pdf.


Acknowledgements

We are immensely grateful to Prof. Yeşim Aksan and the other members of the Turkish National Corpus Team for their tremendous help in providing us with sentences from the Turkish National Corpus. We are also thankful to the anonymous reviewers from SyntaxFest’19 and LAW XIII, as well as to Çağrı Çöltekin for his constructive comments on the re-annotation process of the IMST and PUD Treebanks. The GEBIP Award of the Turkish Academy of Sciences (to A.O.) is gratefully acknowledged.

Funding

This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant number 117E971 and BIDEB 2211 graduate scholarship.

Author information

Authors and Affiliations

Department of Linguistics, Boğaziçi University, İstanbul, Turkey
Utku Türk, Furkan Atmaca, Seyyit Talha Bedir & Balkız Öztürk Başaran
Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
Şaziye Betül Özateş, Gözde Berk, Abdullatif Köksal, Tunga Güngör & Arzucan Özgür

Corresponding author

Correspondence to Utku Türk.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Morphological conversion

Table 12 Mappings of morphological features from the notation of Sak et al. (2011) to the features used in the UD framework


Appendix 2: Word order statistics of the BOUN Treebank

Table 13 Word order counts and relative percentages of main arguments within the BOUN Treebank when there is no null argument


Table 14 Word order counts and percentages of main arguments within the BOUN Treebank
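Word order counts of the kind summarized in these tables could, in principle, be derived from UD annotations. The sketch below is illustrative only and is not the authors' script: it assumes each sentence is a list of hypothetical `(id, head, deprel)` tuples, treats the `root` token as the predicate (V), and takes its first `nsubj` and `obj` dependents as subject (S) and object (O). Sentences lacking either argument are skipped, which loosely mirrors the "no null argument" condition.

```python
from collections import Counter


def argument_order(tokens):
    """Return an order string such as 'SOV', or None if S or O is missing.

    tokens: list of (id, head, deprel) tuples for one sentence
    (a simplified, assumed view of a CoNLL-U sentence).
    """
    root = next((i for i, _, rel in tokens if rel == "root"), None)
    if root is None:
        return None
    positions = {"V": root}
    for i, head, rel in tokens:
        base = rel.split(":")[0]  # ignore relation subtypes
        if head == root and base == "nsubj":
            positions.setdefault("S", i)
        elif head == root and base == "obj":
            positions.setdefault("O", i)
    if len(positions) < 3:
        return None  # subject or object is not overt
    # Sort the three labels by their linear position in the sentence.
    return "".join(sorted(positions, key=positions.get))


def word_order_counts(sentences):
    """Count argument orders over sentences with overt S, O, and V."""
    orders = (argument_order(s) for s in sentences)
    return Counter(o for o in orders if o is not None)
```

Relative percentages then follow by dividing each count by the total number of counted sentences. Note that in this simplification the root is treated as V even when it is a non-verbal predicate.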


Appendix 3: TNC registers

Table 15 TNC Details


About this article

Cite this article

Türk, U., Atmaca, F., Özateş, Ş.B. et al. Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool. Lang Resources & Evaluation 56, 259–307 (2022). https://doi.org/10.1007/s10579-021-09558-0

Accepted: 06 August 2021
Published: 08 November 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10579-021-09558-0
