Abstract
This paper proposes an approach for improving the quality of a statistical machine translation system by analyzing the effect of semantic noise parameters on the corpus, leading to the selection of a more informative corpus. In some applications, such as translation systems running on mobile devices, computational resources are limited, so a compact, efficient, and informative corpus is desirable; the resulting optimized corpus then enhances the performance of the translation system. Extensive data cleaning was carried out to reduce the impact of semantic noise. Experimental results indicate that the proposed strategies are effective. This work is motivated by our attempts to understand the factors that affect corpus quality for statistical machine translation, especially for English–Hindi systems.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Maheshwari, S. (2018). A Study on Effect of Semantic Noise Parameters on Corpus for English–Hindi Statistical Machine Translation. In: Perez, G., Tiwari, S., Trivedi, M., Mishra, K. (eds) Ambient Communications and Computer Systems. Advances in Intelligent Systems and Computing, vol 696. Springer, Singapore. https://doi.org/10.1007/978-981-10-7386-1_45
DOI: https://doi.org/10.1007/978-981-10-7386-1_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7385-4
Online ISBN: 978-981-10-7386-1
eBook Packages: Engineering, Engineering (R0)