A POS-Based Preordering Approach for English-to-Arabic Statistical Machine Translation

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 782)

Abstract

In this work, we present a POS-based preordering approach that tackles both long- and short-distance reordering phenomena. Unlexicalized syntactic reordering rules are automatically extracted from a parallel corpus using only word alignments and POS tagging of the source language. The rules are applied deterministically, which prevents the reordering step from becoming a bottleneck for decoding speed. A new approach to both rule filtering and rule application keeps reordering fast and efficient. Tests on the IWSLT2016 English-to-Arabic evaluation benchmark show a noticeable increase in overall BLEU score for our system over the baseline PSMT system.
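
As a rough illustration of deterministic, unlexicalized rule application, the following Python sketch rewrites a tagged English sentence using rules keyed only on POS sequences. The abstract does not specify the rule format, the tagset, or the matching strategy, so the rule table `RULES`, the function name `preorder`, and the leftmost-longest matching policy are all assumptions made for this example, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical rule table (an assumption for illustration only):
# each key is a POS pattern, each value is a permutation of the
# positions inside that pattern.
RULES: Dict[Tuple[str, ...], Tuple[int, ...]] = {
    ("ADJ", "NOUN"): (1, 0),            # adjective moves after the noun
    ("ADJ", "ADJ", "NOUN"): (2, 1, 0),  # noun first, adjectives reversed
}


def preorder(
    tokens: Sequence[str],
    tags: Sequence[str],
    rules: Dict[Tuple[str, ...], Tuple[int, ...]],
) -> List[str]:
    """Apply reordering rules left to right, preferring the longest match.

    Each position is rewritten by at most one rule, so a given input
    always yields exactly one output (no search over alternatives).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    max_len = max((len(pattern) for pattern in rules), default=0)
    output: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, down to spans of length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            permutation = rules.get(tuple(tags[i:i + n]))
            if permutation is not None:
                output.extend(tokens[i + j] for j in permutation)
                i += n
                matched = True
                break
        if not matched:
            # No rule starts here: copy the token unchanged.
            output.append(tokens[i])
            i += 1
    return output


if __name__ == "__main__":
    words = ["the", "red", "car", "stopped"]
    pos = ["DET", "ADJ", "NOUN", "VERB"]
    print(" ".join(preorder(words, pos, RULES)))
```

Because the rules refer only to tags and never to words, the same table applies to any sentence with a matching POS sequence, and a single left-to-right pass keeps the cost linear in sentence length for a fixed maximum pattern length.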

Keywords

Machine translation, Arabic NLP, Preordering, Reordering rules, Statistical translation

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. NLP, Machine Learning and Applications (TALAA) Group, Laboratory for Research in Artificial Intelligence (LRIA), Department of Computer Science, University of Science and Technology Houari Boumediene (USTHB), Bab-Ezzouar, Algiers, Algeria
