Morphology generation for English-Indian language statistical machine translation

Sreelekha, S.

doi:10.1007/s00500-020-05393-7


Methodologies and Application
Published: 19 November 2020

Volume 25, pages 3657–3664, (2021)

Soft Computing

Sreelekha S. ORCID: orcid.org/0000-0002-5455-5023¹


Abstract

When translating into morphologically rich languages, statistical MT approaches face the problem of data sparsity. The problem is most severe when the available corpus for the morphologically richer language is small. Although factored models can be used to generate the correct morphological forms of words, data sparsity limits their performance. In this paper, we describe a simple and effective solution based on enriching the input corpora with various morphological forms of words. We apply this method in phrase-based and factored experiments on two morphologically rich languages, Hindi and Marathi, translating from English. We evaluate the results using both automatic evaluation and subjective evaluation of adequacy and fluency. We observe that the morphology injection method improves translation quality, and our analysis further shows that it alleviates the data sparsity problem to a large extent.
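The abstract does not spell out how the corpora are enriched, so the following is only a minimal sketch of the general idea of injecting morphological forms, not the paper's procedure. It assumes a simple ASCII transliteration of Hindi, handles only masculine nouns ending in -a (the class of ladka "boy", whose direct singular, direct plural, oblique singular and oblique plural forms are ladka, ladke, ladke, ladkon), and forms English plurals naively. The inflection table and the function names `inject_noun_forms` and `enrich_corpus` are hypothetical.

```python
"""Sketch of corpus enrichment by injecting inflected noun forms.

Assumptions (illustrative only, not the paper's exact method):
- Hindi is written in a simple ASCII transliteration.
- Only masculine nouns ending in -a (the 'ladka' class) are handled.
- English plurals are formed naively by appending 's'.
"""

from typing import List, Tuple

Pair = Tuple[str, str]

# Hypothetical inflection table for the masculine -a noun class:
# (case, number) -> suffix that replaces the final 'a' of the lemma.
MASC_A_SUFFIXES = {
    ("direct", "sg"): "a",
    ("direct", "pl"): "e",
    ("oblique", "sg"): "e",
    ("oblique", "pl"): "on",
}


def english_form(lemma: str, number: str) -> str:
    """Naive English number inflection (regular plurals only)."""
    return lemma if number == "sg" else lemma + "s"


def inject_noun_forms(en_lemma: str, hi_lemma: str,
                      postposition: Pair = ("to", "ko")) -> List[Pair]:
    """Generate synthetic English-Hindi phrase pairs for one noun lemma pair."""
    if not hi_lemma.endswith("a"):
        raise ValueError("this sketch only handles masculine nouns ending in -a")
    stem = hi_lemma[:-1]
    en_prep, hi_post = postposition
    pairs: List[Pair] = []
    for (case, number), suffix in MASC_A_SUFFIXES.items():
        hi_word = stem + suffix
        en_word = english_form(en_lemma, number)
        if case == "direct":
            pairs.append((en_word, hi_word))
        else:
            # Oblique forms occur before a postposition such as 'ko'.
            pairs.append((f"{en_prep} the {en_word}", f"{hi_word} {hi_post}"))
    return pairs


def enrich_corpus(corpus: List[Pair], lexicon: List[Pair]) -> List[Pair]:
    """Return the parallel corpus extended with generated noun-form pairs."""
    enriched = list(corpus)
    for en_lemma, hi_lemma in lexicon:
        enriched.extend(inject_noun_forms(en_lemma, hi_lemma))
    return enriched
```

For example, `enrich_corpus([("the boy eats", "ladka khata hai")], [("boy", "ladka")])` keeps the original sentence pair and appends four synthetic pairs, so that forms such as ladkon ("boys", oblique) get translation entries even if they never occur in the original training data.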


Availability of data and materials

The lexical resources created in this work are freely available under a Creative Commons license.



Acknowledgements

The author thanks Prof. Pushpak Bhattacharyya for his guidance during this work, and the Department of Science & Technology, Govt. of India, for funding under the Woman Scientist Scheme (WOS-A), Project Code SR/WOS-A/ET/1075/2014. The author also acknowledges her own related works published by ACM and ACL.

Funding

This work was funded by the Department of Science & Technology, Govt. of India, under the Woman Scientist Scheme (WOS-A), Project Code SR/WOS-A/ET/1075/2014.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India
Sreelekha S.


Contributions

Sreelekha S. is the sole author of this work.

Corresponding author

Correspondence to Sreelekha S.

Ethics declarations

Conflict of interest

The author declares that there is no conflict of interest associated with this work.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Sreelekha, S. Morphology generation for English-Indian language statistical machine translation. Soft Comput 25, 3657–3664 (2021). https://doi.org/10.1007/s00500-020-05393-7


Published: 19 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s00500-020-05393-7
