Statistical machine translation based on weighted syntax–semantics

Sādhanā

Abstract

In this paper, we propose an alternative technique for improving the performance of a statistical machine translation (SMT) system. The phrases are re-weighted using linguistic knowledge in the form of both syntactic and semantic information. Syntactic knowledge helps to increase fluency, whereas semantic similarity helps to preserve meaning, which is required for the adequacy of translated sentences. The scores of the phrases in the phrase table are re-balanced by increasing the weights of correct phrases and decreasing the weights of incorrect ones. This additional knowledge in the phrase table improves overall translation quality. Our proposed methodology improves BLEU, NIST and RIBES scores on data from different domains. We achieve 58.54 BLEU points, 0.7759 RIBES points and 9.684 NIST points on the product catalog domain.
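
The re-balancing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the actual weighting formula, so the multiplicative `boost` and `penalty` factors, the `judge` callback (standing in for whatever syntactic and semantic check decides that a phrase pair is correct), and the example phrase pair are all hypothetical. The only assumption about the data is a Moses-style phrase-table line whose fields are separated by ` ||| `, with the scores in the third field.

```python
from typing import Callable, List


def reweight_scores(scores: List[float], is_correct: bool,
                    boost: float = 1.2, penalty: float = 0.8) -> List[float]:
    """Scale scores up for a correct phrase pair and down for an incorrect one.

    boost and penalty are illustrative values, not taken from the paper.
    Results are capped at 1.0 so they still read as probabilities.
    """
    factor = boost if is_correct else penalty
    return [min(1.0, s * factor) for s in scores]


def reweight_line(line: str, judge: Callable[[str, str], bool]) -> str:
    """Re-weight one phrase-table line of the form 'src ||| tgt ||| scores ||| ...'.

    judge(src, tgt) is a hypothetical stand-in for the linguistic check;
    any fields after the scores are passed through unchanged.
    """
    fields = line.rstrip("\n").split(" ||| ")
    src, tgt = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    new_scores = reweight_scores(scores, judge(src, tgt))
    fields[2] = " ".join(f"{s:.6g}" for s in new_scores)
    return " ||| ".join(fields)


if __name__ == "__main__":
    # Made-up phrase pair and scores, for illustration only.
    example = "source phrase ||| target phrase ||| 0.5 0.25 ||| 0-0 1-1"
    print(reweight_line(example, lambda src, tgt: True))
    print(reweight_line(example, lambda src, tgt: False))
```

A multiplicative factor is only one possible choice. An additive or log-linear adjustment would fit the same interface, and scaled scores are not renormalised here.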

Notes

  1. HT: kaam hun.

  2. HT: kaam kar raha hun.

  3. HT: suraj.

  4. HT: achhya hun.

  5. HT: achhya.

  6. http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php.

  7. http://nlp.stanford.edu:8080/parser/.

  8. http://www.cfilt.iitb.ac.in/~sudha/bilingual_mapping.tar.gz.

  9. http://www.statmt.org/moses/.

  10. http://nlp.stanford.edu/software/lex-parser.html.

  11. http://ltrc.iiit.ac.in/analyzer/hindi/.

  12. http://ltrc.iiit.ac.in/analyzer/bengali/.

Author information

Corresponding author

Correspondence to Debajyoty Banik.

Cite this article

Banik, D., Ekbal, A. & Bhattacharyya, P. Statistical machine translation based on weighted syntax–semantics. Sādhanā 45, 191 (2020). https://doi.org/10.1007/s12046-020-01427-w
