Utilization of Multi-word Expressions to Improve Statistical Machine Translation of Statutory Sentences

  • Satomi Sakamoto
  • Yasuhiro Ogawa
  • Makoto Nakamura
  • Tomohiro Ohno
  • Katsuhiko Toyama
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10091)


Statutory sentences are generally difficult to read because they are long and use complicated expressions. This difficulty is one reason for the low quality of statistical machine translation (SMT) of such sentences. Multi-word expressions (MWEs) also complicate statutory sentences and lengthen them. We therefore propose a method that utilizes MWEs to improve SMT of statutory sentences. In our method, we extract monolingual MWEs from a parallel corpus, automatically acquire their translation equivalents based on the Dice coefficient, and integrate the resulting bilingual MWEs into an SMT system using the single-tokenization strategy. In our experiments, the SMT system using the proposed method significantly improved translation quality. Although automatic acquisition of translation equivalents using the Dice coefficient is not perfect, the best system’s score was close to that of a system that used bilingual MWEs whose equivalents were translated by hand.
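As a rough illustration of the two steps named in the abstract, the sketch below scores candidate translation equivalents with the Dice coefficient, 2·c(s, t) / (c(s) + c(t)), and then applies single-tokenization by merging each MWE into one token. This is a minimal sketch under stated assumptions, not the authors' implementation: counts are taken per sentence pair, candidates on the target side are supplied by the caller, and the function names `dice_scores` and `single_tokenize` and the underscore joiner are hypothetical.

```python
from collections import Counter
from itertools import product


def contains(tokens, phrase):
    """Return True if `phrase` (a tuple of tokens) occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def dice_scores(parallel, src_mwes, tgt_candidates):
    """Dice coefficient for each co-occurring (source MWE, target candidate) pair.

    Assumption: each count is the number of sentence pairs in which the
    expression appears; the abstract does not specify the counting scheme.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in parallel:
        src_hits = [m for m in src_mwes if contains(src_tokens, m)]
        tgt_hits = [c for c in tgt_candidates if contains(tgt_tokens, c)]
        src_count.update(src_hits)
        tgt_count.update(tgt_hits)
        joint.update(product(src_hits, tgt_hits))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def single_tokenize(tokens, mwes, joiner="_"):
    """Merge each MWE occurrence into a single token, preferring longer MWEs."""
    by_len = sorted((m for m in mwes if m), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for m in by_len:
            if tuple(tokens[i:i + len(m)]) == m:
                out.append(joiner.join(m))
                i += len(m)
                break
        else:
            # No MWE starts at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out
```

With an invented toy input, `single_tokenize(["in", "accordance", "with", "the", "law"], [("in", "accordance", "with")])` yields `["in_accordance_with", "the", "law"]`, so a phrase-based SMT system would see the expression as one unit during alignment and decoding.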


Multi-word expressions; Statistical machine translation; Legal information sharing



This research was partly supported by the Japan Society for the Promotion of Science KAKENHI Grant-in-Aid for Scientific Research (S) No. 23220005, (A) No. 26240050 and (C) No. 15K00201.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Satomi Sakamoto (1)
  • Yasuhiro Ogawa (1, 2), email author
  • Makoto Nakamura (3)
  • Tomohiro Ohno (1, 2)
  • Katsuhiko Toyama (1, 2)

  1. Graduate School of Information Science, Nagoya University, Nagoya, Japan
  2. Information Technology Center, Nagoya University, Nagoya, Japan
  3. Graduate School of Law, Japan Legal Information Institute, Nagoya University, Nagoya, Japan
