
Arabian Journal for Science and Engineering, Volume 41, Issue 8, pp 3071–3080

A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine

  • Hamdi Ahmed Rajeh
  • Zhiyong Li
  • Abdullah Mohammed Ayedh
Research Article - Computer Engineering and Computer Science

Abstract

This study addresses the integration of rich additional information into the phrase-based approach known as factored translation, an extension of phrase-based statistical machine translation (PBSMT). This approach has proven successful when translating English into morphologically rich languages. PBSMT serves as the baseline of this work. We extend the phrase-based translation approach by integrating additional linguistic knowledge, namely part-of-speech (POS) tags, to create a factored model. The main contribution of this study is a new approach to Arabic–English translation that injects Combinatory Categorial Grammar (CCG) supertags into the factored model to form an integrated model (POS + CCG). The system was trained on the freely available MultiUN corpus for the Arabic–English language pair. The Moses decoder, an open-source factored SMT system, was used to integrate these factors into the target language model and the target side of the translation model. Results showed improvements in the automatic BLEU score across language models (LMs) of several higher n-gram orders. The integration of the two factors (POS + CCG) was tested successfully. Overall, BLEU evaluations with 3-, 5-, 7-, and 9-gram LMs showed that the integrated model performed better than PBSMT. Compared with the three other models (PBSMT, POS, and CCG models), the integrated model improved translation quality by 1.54, 1.29, and 0.21%, respectively, with the 3-gram LM.
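To make the idea of a factored corpus concrete, here is a minimal sketch. It is not the authors' preprocessing code. It annotates each target-side English token with its POS tag and CCG supertag in the `word|POS|CCG` layout that Moses uses for factored corpora. It then extracts the supertag stream and counts its n-grams, which are the raw statistics behind an n-gram LM over supertags. The helper names (`make_factored_sentence`, `factor_stream`, `ngram_counts`), the factor order, and the example sentence with its tags are hypothetical and hand-written for illustration. In practice the tags would come from a POS tagger and a CCG supertagger, and the LM would be estimated and smoothed with an LM toolkit rather than built from raw counts.

```python
from collections import Counter
from typing import List, Sequence

# Moses separates the factors of one token with "|", e.g. "council|NN|N".
FACTOR_SEP = "|"


def make_factored_sentence(words: Sequence[str],
                           pos_tags: Sequence[str],
                           supertags: Sequence[str]) -> str:
    """Join aligned word, POS and CCG-supertag sequences into one factored line."""
    if not (len(words) == len(pos_tags) == len(supertags)):
        raise ValueError("all factor sequences must have the same length")
    for item in (*words, *pos_tags, *supertags):
        # "|" separates factors and whitespace separates tokens, so neither
        # may appear inside a factor value (a real corpus would escape them).
        if not item or FACTOR_SEP in item or any(ch.isspace() for ch in item):
            raise ValueError(f"invalid factor value: {item!r}")
    return " ".join(FACTOR_SEP.join(tok) for tok in zip(words, pos_tags, supertags))


def factor_stream(factored_line: str, index: int) -> List[str]:
    """Return one factor per token (here 0 = word, 1 = POS, 2 = CCG supertag)."""
    return [tok.split(FACTOR_SEP)[index] for tok in factored_line.split()]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count n-grams over a boundary-padded sentence, as used to estimate an n-gram LM."""
    if n < 1:
        raise ValueError("n must be at least 1")
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))


if __name__ == "__main__":
    # Hypothetical, hand-tagged English sentence (not taken from the paper's data).
    words = ["the", "council", "adopted", "the", "resolution"]
    pos = ["DT", "NN", "VBD", "DT", "NN"]
    ccg = ["NP[nb]/N", "N", "(S[dcl]\\NP)/NP", "NP[nb]/N", "N"]

    line = make_factored_sentence(words, pos, ccg)
    print(line)
    supertags = factor_stream(line, 2)
    for n in (3, 5):
        # Number of n-gram events this sentence contributes to an n-gram supertag LM.
        print(n, sum(ngram_counts(supertags, n).values()))
```

Running the script prints the factored line `the|DT|NP[nb]/N council|NN|N adopted|VBD|(S[dcl]\NP)/NP the|DT|NP[nb]/N resolution|NN|N`. Because each factor is kept as a separate stream, the same corpus can feed a word LM, a POS LM, or a supertag LM of any order (3, 5, 7, 9, ...) without re-annotating the data.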

Keywords

Statistical machine translation; Phrase-based translation model; Combinatory Categorial Grammar; Part-of-speech; Factored translation model



Copyright information

© King Fahd University of Petroleum & Minerals 2016

Authors and Affiliations

  • Hamdi Ahmed Rajeh (1)
  • Zhiyong Li (1)
  • Abdullah Mohammed Ayedh (2)

  1. Hunan University, Changsha, China
  2. Central South University, Changsha, China
