The Utility of Hierarchical Phrase-Based Model Machine Translation for Low Resource Languages

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13396)

Abstract

This paper uses a hierarchical phrase-based model to develop Statistical Machine Translation (SMT) systems for four low-resourced South Asian languages. Machine translation for South Asian languages predominantly relies on traditional statistical and neural approaches to translate into another language (mainly English). However, translation accuracy remains low because South Asian languages lack the necessary natural language resources and tools, and are hence classified as low-resourced languages. Any SMT system needs large parallel corpora to perform well, so the non-availability of corpora constrains the success of machine translation for these languages. Another reason for poor translation quality is the grammatical difference between South Asian languages and English: morphological richness and different sentence structure. Traditional SMT systems, however, use the default distortion reordering model, which reorders sentences independently of their context. To overcome this problem, we propose hierarchical phrase-based translation, which uses grammar rules formed by a Synchronous Context-Free Grammar (SCFG). This paper considers English to Tamil, Tamil to English, Malayalam to English, English to Malayalam, Tamil to Sinhala and Sinhala to Tamil translation. Finally, we evaluate the systems using BLEU as the evaluation metric. The hierarchical phrase-based model outperforms the traditional approach for the Tamil-English and Malayalam-English pairs. For Sinhala to Tamil it achieves a BLEU score of 11.18, and for Tamil to Sinhala 10.73.
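
To illustrate how a synchronous grammar rule can encode reordering that depends on its context, the following is a minimal Python sketch and not the system described in the paper. The names `Rule`, `match` and `translate`, the two toy rules, and the lexicon are all assumptions made for illustration; the upper-case target tokens are placeholders, not real Tamil, Malayalam or Sinhala words. Each rule pairs a source pattern with a target pattern over the same numbered gaps, so a verb-medial source clause can be emitted verb-final on the target side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Toy synchronous rule: strings are terminals, ints are co-indexed gaps."""
    source: tuple
    target: tuple


def match(pattern, tokens):
    """Match a source pattern against a token tuple.

    Returns a dict {gap index: sub-span} for the first match found,
    or None when the pattern does not cover the tokens exactly.
    """
    if not pattern:
        return {} if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, str):
        if tokens and tokens[0] == head:
            return match(rest, tokens[1:])
        return None
    # Gap: try every non-empty prefix as its filler (shortest first).
    for end in range(1, len(tokens) + 1):
        sub = match(rest, tokens[end:])
        if sub is not None:
            return {head: tokens[:end], **sub}
    return None


def translate(tokens, rules, lexicon):
    """Apply the first matching rule recursively; otherwise translate word by word."""
    tokens = tuple(tokens)
    for rule in rules:
        binding = match(rule.source, tokens)
        if binding is not None:
            out = []
            for sym in rule.target:
                if isinstance(sym, int):
                    out.extend(translate(binding[sym], rules, lexicon))
                else:
                    out.append(sym)
            return out
    return [lexicon.get(t, t) for t in tokens]


# Hypothetical rules: move the verb to the end, and swap the two sides of "of".
RULES = [
    Rule(source=(1, "ate", 2), target=(1, 2, "ATE")),
    Rule(source=(1, "of", 2), target=(2, "OF", 1)),
]
# Hypothetical placeholder lexicon.
LEXICON = {"boy": "BOY", "rice": "RICE", "friend": "FRIEND"}

simple = translate("boy ate rice".split(), RULES, LEXICON)
nested = translate("friend of boy ate rice".split(), RULES, LEXICON)
print(simple)
print(nested)
```

Because the gaps are filled recursively, the reordering inside the subject phrase and the verb-final reordering of the clause compose in a single derivation. A real hierarchical decoder would additionally score competing derivations with a weighted model and a language model, which this sketch omits.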

Acknowledgement

We are grateful to Treesa Anjaly Cyriac, who supported us by providing Malayalam resources. We are thankful to all our colleagues who supported us and to the linguists who helped align POS tagsets. The authors would also like to thank the LK Domain Registry for partially funding this publication.

Author information

Corresponding authors

Correspondence to S. Yashothara or R. T. Uthayasanker.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Yashothara, S., Uthayasanker, R.T. (2023). The Utility of Hierarchical Phrase-Based Model Machine Translation for Low Resource Languages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_23

  • DOI: https://doi.org/10.1007/978-3-031-23793-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23792-8

  • Online ISBN: 978-3-031-23793-5

  • eBook Packages: Computer Science, Computer Science (R0)
