Abstract
In this paper, we propose an alternative technique for improving the performance of statistical machine translation (SMT) systems. Phrases are re-weighted using linguistic knowledge, both syntactic and semantic. Syntactic knowledge helps increase fluency, whereas semantic similarity helps preserve meaning, which is required for the adequacy of translated sentences. The scores of the phrases in the phrase table are re-balanced by increasing the weights of correct phrases and decreasing the weights of incorrect ones. This additional knowledge in the phrase table improves overall translation quality. The proposed method improves BLEU, NIST and RIBES scores on data from different domains. On the product catalog domain, we achieve 58.54 BLEU points, 0.7759 RIBES points and 9.684 NIST points.
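The re-balancing of phrase-table scores described above can be illustrated with a minimal sketch. It is not the authors' formula, which the abstract does not state. It assumes a Moses-style phrase-table line (`source ||| target ||| scores`), a hypothetical caller-supplied predicate `is_valid` standing in for the syntactic and semantic checks, and hypothetical multiplicative factors `boost` and `penalty`.

```python
from typing import Callable

SEP = " ||| "  # field separator used in Moses-style phrase tables


def reweight_line(
    line: str,
    is_valid: Callable[[str, str], bool],
    boost: float = 1.2,     # hypothetical factor for phrases judged correct
    penalty: float = 0.8,   # hypothetical factor for phrases judged incorrect
) -> str:
    """Scale the scores of one phrase-table entry up or down.

    `is_valid(source, target)` is an assumed stand-in for the linguistic
    (syntactic and semantic) judgement; the abstract does not define it.
    """
    fields = line.rstrip("\n").split(SEP)
    source, target = fields[0], fields[1]
    scores = [float(s) for s in fields[2].split()]

    factor = boost if is_valid(source, target) else penalty
    # Clip at 1.0 so that boosted scores remain usable as probabilities.
    new_scores = [min(1.0, s * factor) for s in scores]

    fields[2] = " ".join(f"{s:.6g}" for s in new_scores)
    return SEP.join(fields)


if __name__ == "__main__":
    # Toy entries with placeholder phrases; the predicate is purely illustrative.
    table = [
        "src phrase ||| tgt good ||| 0.5 0.4 0.5 0.4",
        "src phrase ||| tgt bad ||| 0.5 0.4 0.5 0.4",
    ]
    for entry in table:
        print(reweight_line(entry, lambda s, t: t.endswith("good")))
```

Any fields after the scores (for example alignments or counts) are passed through unchanged. A real system would also need to decide whether to renormalize the scores after re-weighting, which this sketch leaves out.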
Notes
HT: kaam hun.
HT: kaam kar raha hun.
HT: suraj.
HT: achhya hun.
HT: achhya.
Cite this article
Banik, D., Ekbal, A. & Bhattacharyya, P. Statistical machine translation based on weighted syntax–semantics. Sādhanā 45, 191 (2020). https://doi.org/10.1007/s12046-020-01427-w
DOI: https://doi.org/10.1007/s12046-020-01427-w