A Study on Effect of Semantic Noise Parameters on Corpus for English–Hindi Statistical Machine Translation

Maheshwari, Shikha

doi:10.1007/978-981-10-7386-1_45

Shikha Maheshwari

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 696)

Abstract

This paper proposes an approach for improving the quality of a statistical machine translation (SMT) system by analyzing the effect of semantic noise parameters on the corpus, leading to the selection of a more informative corpus. For some applications, such as translation systems running on mobile devices, computational resources are limited, so a compact, efficient, and informative corpus is desirable; the resulting optimized corpus then enhances the performance of the translation system. The work includes extensive data cleaning to reduce the impact of semantic noise. Experimental results show that the proposed strategies work well. This work is motivated by our attempts to understand the factors that affect corpus quality for statistical machine translation, especially for English–Hindi systems.
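The abstract does not list the noise parameters or the cleaning steps used in the chapter. As a rough illustration only, the sketch below shows the kind of surface filters often applied when cleaning an English–Hindi parallel corpus: dropping empty or duplicate pairs, pairs with an implausible length ratio, and pairs whose Hindi side is mostly not in Devanagari script. The function names and both thresholds are assumptions made for this example, not the author's method, and such filters are only proxies for semantic noise.

```python
import re

# Hypothetical thresholds chosen for illustration; the chapter's actual
# noise parameters are not given in the abstract.
MAX_LENGTH_RATIO = 3.0       # longest side may have at most 3x the tokens of the shortest
MIN_DEVANAGARI_SHARE = 0.5   # at least half of the Hindi characters must be Devanagari

DEVANAGARI_CHAR = re.compile(r"[\u0900-\u097F]")


def devanagari_share(text):
    """Return the fraction of non-space characters in the Devanagari Unicode block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if DEVANAGARI_CHAR.match(c)) / len(chars)


def is_clean(en, hi):
    """Apply simple surface checks to one (English, Hindi) sentence pair."""
    en_tokens, hi_tokens = en.split(), hi.split()
    if not en_tokens or not hi_tokens:
        return False  # one side is empty
    ratio = max(len(en_tokens), len(hi_tokens)) / min(len(en_tokens), len(hi_tokens))
    if ratio > MAX_LENGTH_RATIO:
        return False  # lengths too different to be a likely translation
    if devanagari_share(hi) < MIN_DEVANAGARI_SHARE:
        return False  # "Hindi" side is untranslated or in the wrong script
    return True


def clean_corpus(pairs):
    """Keep the first occurrence of each pair that passes is_clean."""
    seen = set()
    kept = []
    for en, hi in pairs:
        key = (en.strip(), hi.strip())
        if key in seen or not is_clean(*key):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Calling `clean_corpus` on a list of `(english, hindi)` string tuples returns the filtered list in its original order. Detecting pairs that are well-formed but do not mean the same thing would need stronger signals, such as a bilingual lexicon or alignment scores, which this sketch does not attempt.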

Author information

Authors and Affiliations

Jaipur National University, Jaipur, India
Shikha Maheshwari

Corresponding author

Correspondence to Shikha Maheshwari.

Editor information

Editors and Affiliations

University of Murcia, Murcia, Spain
Gregorio Martinez Perez
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Munesh C. Trivedi
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, Uttar Pradesh, India
Krishn K. Mishra

About this paper

Cite this paper

Maheshwari, S. (2018). A Study on Effect of Semantic Noise Parameters on Corpus for English–Hindi Statistical Machine Translation. In: Perez, G., Tiwari, S., Trivedi, M., Mishra, K. (eds) Ambient Communications and Computer Systems. Advances in Intelligent Systems and Computing, vol 696. Springer, Singapore. https://doi.org/10.1007/978-981-10-7386-1_45

DOI: https://doi.org/10.1007/978-981-10-7386-1_45
Published: 21 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7385-4
Online ISBN: 978-981-10-7386-1
eBook Packages: Engineering, Engineering (R0)
