Machine Translation, Volume 26, Issue 4, pp 325–357

Analysis, preparation, and optimization of statistical sign language machine translation

Article

Abstract

Sign languages represent an interesting niche for statistical machine translation, but work in this area is typically hampered by the scarcity of suitable data, and most papers apply only a few well-known techniques that are not adapted to small corpora. In this article, we analyze existing data collections with an emphasis on their quality and usability for statistical machine translation. We also present findings on the proper preprocessing of a sign language corpus: introducing sentence end markers, splitting compound words, and handling parallel communication channels. We then focus on optimization procedures tailored to scarce resources, such as scaling factor optimization, alignment optimization, and system combination. All methods are evaluated on two of the largest sign language corpora available.
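The abstract names compound splitting as a preprocessing step but does not say how it is done. Purely as an illustration, the Python sketch below implements the well-known frequency-based heuristic of Koehn and Knight (2003), which may differ from the procedure used in this article: a word is split into known vocabulary words (optionally joined by German linking letters such as "s" or "es") when the geometric mean of the parts' corpus frequencies exceeds the frequency of the unsplit word. The function names, the toy frequency table `FREQ`, and the filler list are hypothetical.

```python
from math import prod

# Toy corpus frequencies, invented for illustration only (not from the article's corpora).
FREQ = {
    "regen": 120, "wahrscheinlichkeit": 40, "regenwahrscheinlichkeit": 2,
    "nord": 30, "west": 50, "wind": 80, "nordwestwind": 1,
    "tag": 60, "temperatur": 90,
}


def segmentations(rest, freq, min_len, fillers):
    """Yield every way to cut `rest` into known words of at least `min_len`
    characters, allowing a linking filler (e.g. German "s"/"es") after each part."""
    if len(rest) >= min_len and rest in freq:
        yield [rest]
    for i in range(min_len, len(rest)):
        head = rest[:i]
        if head not in freq:
            continue
        for filler in fillers:
            # The filler must appear right after the head; "" always matches.
            if not rest.startswith(filler, i):
                continue
            tail = rest[i + len(filler):]
            if len(tail) >= min_len:
                for parts in segmentations(tail, freq, min_len, fillers):
                    yield [head] + parts


def split_compound(word, freq, min_len=3, fillers=("", "s", "es")):
    """Return the segmentation of `word` whose parts have the highest
    geometric-mean frequency; the unsplit word competes with its own
    frequency (0 if unseen). Fillers are dropped from the output."""
    word = word.lower()
    best, best_score = [word], float(freq.get(word, 0))
    for parts in segmentations(word, freq, min_len, fillers):
        score = prod(freq[p] for p in parts) ** (1.0 / len(parts))
        if score > best_score:
            best, best_score = parts, score
    return best


if __name__ == "__main__":
    for w in ["Regenwahrscheinlichkeit", "Nordwestwind", "Tagestemperatur", "Sonne"]:
        print(w, "->", " ".join(split_compound(w, FREQ)))
```

With the toy table, "Nordwestwind" becomes "nord west wind" and "Tagestemperatur" becomes "tag temperatur" (the linking "es" is discarded), while the unseen "Sonne" stays whole. Comparing against the unsplit word's own frequency keeps frequent, lexicalized compounds intact, which matters when the corpus is as small as the sign language corpora discussed here.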

Keywords

Sign languages · Scarce resources · Syntactic methods · Parallel input

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. RWTH Aachen University, Aachen, Germany