Building the first comprehensive machine-readable Turkish sign language resource: methods, challenges and solutions

  • Gülşen Eryiğit
  • Cihat Eryiğit
  • Serpil Karabüklü
  • Meltem Kelepir
  • Aslı Özkul
  • Tuğba Pamay
  • Dilara Torunoğlu-Selamet
  • Hatice Köse
Original Paper

Abstract

This article describes the procedures employed in developing the first comprehensive machine-readable Turkish Sign Language (TiD) resource: a bilingual lexical database and a parallel corpus between Turkish and TiD. In addition to sign-language-specific annotations (such as non-manual markers, classifiers and buoys) that follow the recently introduced TiD knowledge representation (Eryiğit et al. 2016), the parallel corpus also contains annotations of dependency relations, which makes it the first parallel treebank between a sign language and an auditory-vocal language.

Keywords

Turkish sign language, TiD, Parallel dependency treebank, Turkish, Machine-readable, Parallel corpus

Notes

Acknowledgements

We are grateful for the support of our signers Jale Erdul, Elvan Tamyürek Özparlak and Neslihan Kurt, our project advisors Prof. Dr. Sumru Özsoy and Hasan Dikyuva, and our project members Pınar Uluer, Neziha Akalın, Kenan Kasarcı, Nevzat Kırgıç and Cüneyd Ancın. Finally, we want to thank our three reviewers for their insightful comments and suggestions, which helped us improve the final version of the article.

References

  1. Ahrenberg, L. (2007). Lines: An English-Swedish parallel treebank. In Proceedings of the 16th Nordic conference of computational linguistics, pp. 270–274, Tartu.
  2. Atalay, N. B., Oflazer, K., & Say, B. (2003). The annotation process in the Turkish treebank. In Proceedings of the 4th international workshop on linguistically interpreted corpora, pp. 33–38, Budapest.
  3. Boz, S., Özçelik, U., & Kaygusuz, Ç. (2013). Matematik 1 (4th ed.). Ankara: T.C. Milli Eğitim Bakanlığı Yayınları.
  4. Bungeroth, J., & Ney, H. (2004). Statistical sign language translation. In Proceedings of the 6th workshop on representation and processing of sign languages at the 4th international conference on language resources and evaluation, pp. 105–108, Lisbon.
  5. Bungeroth, J., Stein, D., Dreuw, P., Ney, H., Morrissey, S., Way, A., & Van Zijl, L. (2008). The ATIS Sign Language corpus. In Proceedings of the 6th international conference on language resources and evaluation, pp. 2943–2946, Marrakech.
  6. Bungeroth, J., Stein, D., Dreuw, P., Zahedi, M., & Ney, H. (2006). A German Sign Language corpus of the domain weather report. In Proceedings of the 5th international conference on language resources and evaluation, pp. 2000–2003, Genoa.
  7. Camgöz, N. C., Kindiroglu, A. A., Karabüklü, S., Kelepir, M., Özsoy, A. S., & Akarun, L. (2016). BosphorusSign: A Turkish Sign Language recognition corpus in health and finance domains. In Proceedings of the 10th international conference on language resources and evaluation, pp. 1383–1388, Portorož.
  8. Čmejrek, M., Cuřín, J., Hajič, J., & Havelka, J. (2005). Prague Czech-English dependency treebank: resource for structure-based MT. In Proceedings of the 11th annual conference of the European Association for Machine Translation, pp. 73–78, Budapest.
  9. Costello, B., Herrmann, A., Mantovan, L., Pfau, R., & Sverrisdottir, R. (2017). Section 3.10.1 Numerals. In Quer, J., Cecchetto, C., Donati, C., Geraci, C., Kelepir, M., Pfau, R., & Steinbach, M. (eds) SignGram Blueprint: A guide to sign language grammar writing, pp. 148–151, de Gruyter, Berlin, Boston.
  10. Crasborn, O., & Sloetjes, H. (2008). Enhanced ELAN functionality for sign language corpora. In Proceedings of the 3rd workshop on the representation and processing of sign languages: Construction and exploitation of sign language corpora at the 6th international conference on language resources and evaluation, pp. 39–43, Marrakech.
  11. Crasborn, O. A., & Zwitserlood, I. (2008). The corpus NGT: An online corpus for professionals and laymen. In Proceedings of the 3rd workshop on the representation and processing of sign languages: Construction and exploitation of sign language corpora at the 6th international conference on language resources and evaluation, pp. 44–49.
  12. Crasborn, O., Bank, R., Zwitserlood, I., van der Kooij, E., de Meijer, A., & Safar, A. (2015). Annotation conventions for the corpus NGT. Ms., Radboud University Nijmegen.
  13. Cuřín, J., Čmejrek, M., Havelka, J., & Kuboň, V. (2004). Building a parallel bilingual syntactically annotated corpus. In Proceedings of the international conference on natural language processing, pp. 168–176, Hyderabad.
  14. Dalkılıç, H., & Gölge, N. (2013). Hayat Bilgisi 1 (4th ed.). Ankara: T.C. Milli Eğitim Bakanlığı Yayınları.
  15. De Vos, C., van Zuilen, M., Crasborn, O. & Levinson, S. (2015). NGT interactive corpus. MPI for psycholinguistics, the language archive, https://hdl.handle.net/1839/00-0000-0000-0021-8357-B@view.
  16. Demiroğlu, R., & Gökahmetoğlu, E. (2013). Türkçe 1 (4th ed.). Ankara: T.C. Milli Eğitim Bakanlığı Yayınları.
  17. DeNeefe, S., Knight, K., Wang, W., & Marcu, D. (2007). What can syntax-based MT learn from phrase-based MT? In Proceedings of the joint conference on empirical methods in natural language processing and computational natural language learning, pp. 755–763, Prague.
  18. Dikyuva, H., Makaroğlu, B., & Arık, E. (2017). Turkish sign language grammar. Ankara: Ministry of Family and Social Policies Press.
  19. Eryiğit, G. (2007a). ITU validation set for Metu-Sabancı Turkish treebank.
  20. Eryiğit, G. (2007b). ITU treebank annotation tool. In Proceedings of the linguistic annotation workshop at the 45th annual meeting of the association for computational linguistics, pp. 117–120, Prague.
  21. Eryiğit, G. (2014). ITU Turkish NLP web service. In Proceedings of the demonstrations at the 14th conference of the European chapter of the association for computational linguistics, pp. 1–4, Gothenburg.
  22. Eryiğit, C. (2017). Text to sign language machine translation system for Turkish. Ph.D. thesis, Istanbul Technical University, Istanbul.
  23. Eryiğit, G., Adalı, K., Torunoğlu-Selamet, D., Sulubacak, U., & Pamay, T. (2015). Annotation and extraction of multiword expressions in Turkish treebanks. In Proceedings of the human language technology conference at the North American Chapter of the Association for Computational Linguistics, pp. 70–76, Denver, CO.
  24. Eryiğit, C., Köse, H., Kelepir, M., & Eryiğit, G. (2016). Building machine-readable knowledge representations for Turkish Sign Language generation. Knowledge-Based Systems, 108, 179–194.
  25. Galley, M., Graehl, J., Knight, K., Marcu, D., DeNeefe, S., Wang, W., & Thayer, I. (2006). Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics, pp. 961–968, Sydney.
  26. Galley, M., Hopkins, M., Knight, K., & Marcu, D. (2004). What’s in a translation rule? In Proceedings of the human language technology conference at the North American Chapter of the Association for Computational Linguistics, pp. 273–280, Boston, MA.
  27. Hanke, T., & Storz, J. (2008). iLex – a database tool for integrating sign language corpus linguistics and sign language lexicography. In Proceedings of the 3rd workshop on the representation and processing of sign languages at the 6th international conference on language resources and evaluation, pp. 64–67, Marrakech.
  28. Johnston, T. (2008). Corpus linguistics and signed languages: No lemmata, no corpus. In The 3rd workshop on the representation and processing of sign languages: Construction and exploitation of sign language corpora at the 6th international conference on language resources and evaluation, Marrakech.
  29. Johnston, T. (2016). Auslan corpus annotation guidelines. Centre for Language Sciences, Department of Linguistics, Macquarie University (Sydney) and La Trobe University (Melbourne), http://media.auslan.org.au/attachments/Auslan_Corpus_Annotation_Guidelines_November2016.pdf.
  30. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics: Companion volume, proceedings of the demo and poster sessions, pp. 177–180, Prague.
  31. Koizumi, A., Sagawa, H., & Takeuchi, M. (2002). An annotated Japanese Sign Language corpus. In Proceedings of the 3rd international conference on language resources and evaluation, pp. 927–930, Las Palmas.
  32. Kubuş, O. (2008). An analysis of Turkish Sign Language (TiD) phonology and morphology. Master’s thesis, Middle East Technical University, Ankara.
  33. Leeson, L., Saeed, J., Leonard, C., Macduff, A., & Byrne-Dunne, D. (2006). Moving heads and moving hands: Developing a digital corpus of Irish Sign Language: The ‘Signs of Ireland’ corpus development project. In Proceedings of the information technology and telecommunications conference, Carlow.
  34. Liddell, S. K. (2003). Grammar, gesture, and meaning in American Sign Language. Cambridge: Cambridge University Press.
  35. McCrae, J., Spohr, D., & Cimiano, P. (2011). Linking lexical resources and ontologies on the semantic web with Lemon. In The semantic web: Research and applications, pp. 245–259, Berlin, Springer.
  36. Megyesi, B., Dahlqvist, B., Pettersson, E., & Nivre, J. (2008). Swedish-Turkish parallel treebank. In Proceedings of the 6th international conference on language resources and evaluation, pp. 470–473, Marrakech.
  37. Miller, C. (2001). Section I: Some reflections on the need for a common sign notation. Sign Language & Linguistics, 4(1), 11–28.
  38. Neidle, C., Sclaroff, S., & Athitsos, V. (2001). Signstream: A tool for linguistic and computer vision research on visual-gestural language data. Behavior Research Methods, Instruments, & Computers, 33(3), 311–320. https://doi.org/10.3758/BF03195384.
  39. Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C. D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., et al. (2016). Universal dependencies v1: A multilingual treebank collection. In Proceedings of the 10th international conference on language resources and evaluation, pp. 1659–1666, Portorož.
  40. Oflazer, K., Say, B., Hakkani-Tür, D. Z., & Tür, G. (2003). Building a Turkish treebank. In A. Abeillé (Ed.), Treebanks: Building and using parsed corpora (pp. 261–277). London: Kluwer.
  41. Östling, R., Börstell, C., Gärdenfors, M., & Wirén, M. (2017). Universal dependencies for Swedish Sign Language. In Proceedings of the 21st Nordic conference on computational linguistics, pp. 303–308, Gothenburg.
  42. Othman, A., Tmar, Z., & Jemni, M. (2012). Toward developing a very big sign language parallel corpus. In Proceedings of the international conference on computers for handicapped persons, pp. 192–199, Paris.
  43. Özsoy, S., Arık, E., Göksel, A., Kelepir, M., & Nuhbalaoğlu, D. (2013). Documenting Turkish sign language: A report on a research project. In Current directions in TiD research, pp. 55–70, Cambridge Scholars.
  44. Pamay, T., Sulubacak, U., Torunoğlu-Selamet, D., & Eryiğit, G. (2015). The annotation process of the ITU web treebank. In Proceedings of the 9th linguistic annotation workshop at the North American Chapter of the Association for computational linguistics, pp. 95–101, Denver, CO.
  45. Perniss, P., Thompson, R. L., & Vigliocco, G. (2010). Iconicity as a general property of language: Evidence from spoken and signed languages. Frontiers in Psychology, 227(1), 1–15.
  46. Pfau, R., & Quer, J. (2010). Nonmanuals: Their prosodic and grammatical roles. In Sign languages, pp. 381–402. Cambridge, Cambridge University Press.
  47. Prillwitz, S., Hanke, T., König, S., Konrad, R., Langer, G., & Schwarz, A. (2008). DGS corpus project – development of a corpus based electronic dictionary German Sign Language/German. In Proceedings of the 3rd workshop on the representation and processing of sign languages at the 6th international conference on language resources and evaluation, pp. 159–164, Marrakech.
  48. Şahin, M., Sulubacak, U., & Eryiğit, G. (2013). Redefinition of Turkish morphology using flag diacritics. In Proceedings of the 10th symposium on natural language processing, Phuket.
  49. Schembri, A., Fenlon, J., Rentelis, R., Reynolds, S., & Cormier, K. (2013). Building the British Sign Language corpus. Language Documentation & Conservation, 7, 136–154.
  50. Selçuk-Şimşek, M., & Çiçekli, I. (2017). Bidirectional machine translation between Turkish and Turkish Sign Language: A data-driven approach. International Journal on Natural Language Computing, 6(3), 33–46.
  51. Steinbach, M. (2012). Plurality. In Sign language: An international handbook, pp. 112–136, De Gruyter Mouton.
  52. Sulubacak, U., & Eryiğit, G. (2013). Representation of morphosyntactic units and coordination structures in the Turkish dependency treebank. In Proceedings of the 4th workshop on statistical parsing of morphologically rich languages at the conference on empirical methods on natural language processing, p. 129, Seattle, WA.
  53. Sulubacak, U., Gokirmak, M., Tyers, F., Çöltekin, Ç., Nivre, J., & Eryiğit, G. (2016a). Universal dependencies for Turkish. In Proceedings of the 26th international conference on computational linguistics, pp. 3444–3454, Osaka.
  54. Sulubacak, U., Pamay, T., & Eryiğit, G. (2016b). IMST: A revisited Turkish dependency treebank. In Proceedings of the 1st international conference on Turkic computational linguistics at the international conference on computational linguistics and intelligent text processing, pp. 1–6, Konya.
  55. Sulubacak, U., & Eryiğit, G. (2018). Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering & Computer Sciences, 26(3), 1662–1672.
  56. Su, H.-Y., & Wu, C.-H. (2009). Improving structural statistical machine translation for sign language with small corpus using thematic role templates as translation memory. IEEE Transactions on Audio, Speech, and Language Processing, 17(7), 1305–1315.
  57. Swedish Sign Language Corpus Project, U. D. (2017). Universal dependencies for Swedish Sign Language. Stockholm University, https://www.ling.su.se/english/research/research-projects/sign-language/swedish-sign-language-corpus-project-1.59270.
  58. Tesnière, L. (1959). Eléments de syntaxe structurale. Paris: Librairie C. Klincksieck.
  59. Tinsley, J., Hearne, M., & Way, A. (2009). Exploiting parallel treebanks to improve phrase-based statistical machine translation. In Proceedings of the international conference on intelligent text processing and computational linguistics, pp. 318–331, Mexico City.
  60. Uchimoto, K., Zhang, Y., Sudo, K., Murata, M., Sekine, S., & Isahara, H. (2004). Multilingual aligned parallel treebank corpus reflecting contextual information and its applications. In Proceedings of the workshop on multilingual linguistic resources at the 42nd annual meeting of the association for computational linguistics, pp. 63–70, Barcelona.
  61. Wallin, L., & Mesch, J. (2015). Swedish sign language corpus. In Proceedings of digging into signs workshop: Developing annotation standards for sign language corpora, London.
  62. Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: A professional framework for multimodality research. In Proceedings of the 5th international conference on language resources and evaluation, pp. 1556–1559, Genoa.
  63. Zwitserlood, I. (2012). Classifiers. In Sign language: An international handbook, pp. 158–186, Mouton de Gruyter.
  64. Zwitserlood, I., Perniss, P., & Özyürek, A. (2012). An empirical investigation of expression of multiple entities in Turkish Sign Language (TİD): Considering the effects of modality. Lingua, 122(14), 1636–1667.

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
  2. Department of Linguistics, Boğaziçi University, Istanbul, Turkey
  3. Department of Linguistics, Purdue University, West Lafayette, USA
  4. English Language Teacher Education, Istanbul Bilgi University, Istanbul, Turkey
