Exploring the Relevance of Bilingual Morph-Units in Automatic Induction of Translation Templates

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11238)


To tackle the problem of out-of-vocabulary (OOV) words and improve bilingual lexicon coverage, the relevance of bilingual morph-units is explored in inducing translation patterns for unigram-to-n-gram and n-gram-to-unigram translations. The approach induces translation templates using bilingual stems learnt from automatically acquired bilingual translation lexicons. Generalising these templates with bilingual suffix clusters allows new translations to be suggested automatically.
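The idea of inducing a template from a known n-gram-to-unigram entry and generalising it through a suffix cluster can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English-Portuguese entries, the `<STEM>` slot notation, the cluster layout, and the function names `induce_templates` and `suggest` are hypothetical and are not taken from the paper.

```python
# Toy data: hypothetical entries for illustration only, not from the paper.
LEXICON = {("will declare", "declarará")}

# A bilingual suffix cluster: suffix pairs together with the bilingual stems
# that are assumed to combine with them.
CLUSTER = {
    "suffix_pairs": {("e", "ará"), ("ed", "ou")},
    "stem_pairs": {("declar", "declar"), ("us", "us"), ("analys", "analis")},
}

SLOT = "<STEM>"  # hypothetical placeholder for a bilingual stem


def induce_templates(lexicon, cluster):
    """Turn 1-to-n / n-to-1 lexicon entries into templates with a stem slot."""
    templates = set()
    for src, tgt in lexicon:
        src_words, tgt_words = src.split(), tgt.split()
        # Keep only entries where exactly one side is a single word.
        if (len(src_words) == 1) == (len(tgt_words) == 1):
            continue
        for s_stem, t_stem in cluster["stem_pairs"]:
            for s_suf, t_suf in cluster["suffix_pairs"]:
                s_word, t_word = s_stem + s_suf, t_stem + t_suf
                if s_word in src_words and t_word in tgt_words:
                    # Replace the stem on both sides, keeping the suffixes.
                    templates.add((
                        " ".join(SLOT + s_suf if w == s_word else w
                                 for w in src_words),
                        " ".join(SLOT + t_suf if w == t_word else w
                                 for w in tgt_words),
                    ))
    return templates


def suggest(templates, cluster, lexicon):
    """Fill each template with every stem pair of the cluster."""
    suggestions = set()
    for s_tpl, t_tpl in templates:
        for s_stem, t_stem in cluster["stem_pairs"]:
            pair = (s_tpl.replace(SLOT, s_stem), t_tpl.replace(SLOT, t_stem))
            if pair not in lexicon:  # report only unseen translations
                suggestions.add(pair)
    return suggestions


templates = induce_templates(LEXICON, CLUSTER)
new_pairs = suggest(templates, CLUSTER, LEXICON)
```

In this toy setting the single entry yields one template, and filling its slot with the remaining stem pairs of the cluster proposes candidate translations that are absent from the lexicon. A real system would additionally need to validate such candidates, which this sketch does not attempt.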



K. M. Kavitha and Luís Gomes acknowledge the Research Fellowships from FCT/MCTES (Ref. nos. SFRH/BD/64371/2009 and SFRH/BD/65059/2009, respectively) and the funded research project ISTRION (Ref. PTDC/EIA-EIA/114521/2009), which provided other means for the research carried out. The authors thank NOVA LINCS, FCT/UNL for the support and SJEC for the partial financial assistance provided.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Lisbon, Portugal
  2. Department of Computer Science and Engineering, St Joseph Engineering College, Vamanjoor, Mangaluru, India
