Abstract
This paper uses a hierarchical phrase-based model to develop Statistical Machine Translation (SMT) systems for four languages: three low-resourced South Asian languages (Tamil, Malayalam, and Sinhala) and English. Machine translation between South Asian languages and other languages (mainly English) has predominantly relied on traditional statistical and neural approaches. However, translation accuracy remains low because these languages lack the necessary natural language resources and tools, which is why they are classified as low-resourced. An SMT system needs large parallel corpora to perform well, so the scarcity of such corpora limits the success of machine translation for these languages. Another reason for poor translation quality is the grammatical gap between South Asian languages and English: the South Asian languages are morphologically rich and use a different sentence structure. Traditional SMT systems, however, use the default distortion reordering model, which reorders sentences independently of their context. To overcome this problem, we propose hierarchical phrase-based translation, which uses grammar rules formalized as a Synchronous Context-Free Grammar (SCFG). The paper considers six translation directions: English to Tamil, Tamil to English, Malayalam to English, English to Malayalam, Tamil to Sinhala, and Sinhala to Tamil. We evaluate the systems using BLEU. The hierarchical phrase-based model outperforms the traditional approach for the Tamil-English and Malayalam-English pairs. For Sinhala to Tamil it achieves a BLEU score of 11.18, and 10.73 for Tamil to Sinhala.
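To make the weakness of the distortion model concrete, the following is a minimal sketch of the distance-based distortion cost as it is commonly formulated for phrase-based decoders such as Moses: whenever the decoder jumps from the end of the previously translated source phrase to the start of the next one, it pays |start_i - end_(i-1) - 1|. The function name `distortion_cost` and the toy phrase segmentation are illustrative assumptions, not taken from the paper. The cost depends only on positions, never on the words involved, which is what "independent of their context" means here.

```python
def distortion_cost(phrase_spans):
    """Distance-based distortion cost (illustrative sketch).

    phrase_spans: source spans as (start, end) pairs, 0-based and inclusive,
    listed in the order in which the decoder translates them.
    Returns the sum of |start_i - end_(i-1) - 1| over all phrases.
    """
    cost = 0
    prev_end = -1  # before the first phrase, translation "ends" just before position 0
    for start, end in phrase_spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


# Hypothetical segmentation of "the boy reads a book":
# (0, 1) = "the boy", (2, 2) = "reads", (3, 4) = "a book"
monotone = [(0, 1), (2, 2), (3, 4)]  # keep English SVO order
sov = [(0, 1), (3, 4), (2, 2)]       # subject, object, verb

print(distortion_cost(monotone))  # 0
print(distortion_cost(sov))       # 4
```

Moving the object phrase ahead of the verb, as an SOV target language such as Tamil requires, costs 4 whatever words fill those positions, so this feature alone cannot learn that verbs in particular tend to move to the end of the sentence.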
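The SCFG idea can be illustrated with a toy example. Each rule pairs a source side with a target side, and co-indexed nonterminals (X1, X2) let a single rule reorder whole sub-phrases of any length. The grammar in this sketch is hypothetical: the target tokens BOY, BOOK and READ are glosses standing in for real target-language words, and the rules carry no probabilities. A real hierarchical decoder in the style of Chiang's model extracts such rules from word-aligned parallel text and scores competing derivations with a log-linear model, so this sketch only shows how one derivation is built.

```python
from functools import lru_cache

# Hypothetical toy grammar: (source side, target side).
# "X1"/"X2" are co-indexed nonterminals; target glosses stand in for real words.
RULES = [
    (("X1", "reads", "X2"), ("X1", "X2", "READ")),  # SVO -> SOV reordering rule
    (("the", "boy"), ("BOY",)),
    (("a", "book"), ("BOOK",)),
]
NONTERMINALS = {"X1", "X2"}


def translate(sentence):
    """Return one SCFG derivation's target string, or None if none exists.

    Assumes no rule consists of a single bare nonterminal (that would recurse forever).
    """
    tokens = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derive(span):
        # Try each rule; the first complete match wins (no scoring in this sketch).
        for src, tgt in RULES:
            for bindings in match(src, span, {}):
                out = []
                for sym in tgt:
                    if sym in NONTERMINALS:
                        out.extend(bindings[sym])  # insert the sub-derivation
                    else:
                        out.append(sym)
                return tuple(out)
        return None

    def match(pattern, span, bindings):
        # Yield nonterminal bindings that make `pattern` cover `span` exactly.
        if not pattern:
            if not span:
                yield bindings
            return
        head, rest = pattern[0], pattern[1:]
        if head in NONTERMINALS:
            # Each remaining symbol needs at least one token, which also keeps
            # every sub-span strictly shorter than the span being derived.
            for end in range(1, len(span) - len(rest) + 1):
                sub = derive(span[:end])
                if sub is not None:
                    yield from match(rest, span[end:], {**bindings, head: sub})
        elif span and span[0] == head:
            yield from match(rest, span[1:], bindings)

    result = derive(tokens)
    return " ".join(result) if result is not None else None


print(translate("the boy reads a book"))  # BOY BOOK READ
```

Because the reordering is attached to the lexical anchor "reads" rather than to distances, the same rule would move an object of any length, which is the context sensitivity the distortion model lacks.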
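BLEU, the evaluation metric, combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty. The following sketch implements corpus-level BLEU as defined by Papineni et al. (2002), assuming whitespace tokenization, one reference per sentence and no smoothing; the function names are illustrative. Published scores are normally produced with standard evaluation scripts, and tokenization choices alone can shift BLEU noticeably, so this sketch is not guaranteed to reproduce the scores reported above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 100]; assumes one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # assumption: whitespace tokenization
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped ("modified") counts: an n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the geometric mean
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyp = ["the cat sat on the"]
ref = ["the cat sat on the mat"]
print(round(corpus_bleu(hyp, ref), 2))  # 81.87
```

In this example every n-gram of the hypothesis appears in the reference, so the score drops below 100 only because of the brevity penalty exp(1 - 6/5).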
Acknowledgement
We are grateful to Treesa Anjaly Cyriac for providing Malayalam resources. We thank all our colleagues who supported us, and the linguists who helped align the POS tagsets. The authors would also like to thank the LK Domain Registry for partially funding this publication.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Yashothara, S., Uthayasanker, R.T. (2023). The Utility of Hierarchical Phrase-Based Model Machine Translation for Low Resource Languages. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_23
DOI: https://doi.org/10.1007/978-3-031-23793-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23792-8
Online ISBN: 978-3-031-23793-5
eBook Packages: Computer Science, Computer Science (R0)