Language Resources and Evaluation, Volume 50, Issue 4, pp 793–819

Referential translation machines for predicting semantic similarity

  • Ergun Biçici
  • Andy Way
Original Paper

Abstract

Referential translation machines (RTMs) are a computational model effective at judging monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants, data close to the task instances. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. We use RTMs for predicting the semantic similarity of text and present state-of-the-art results showing that RTMs can achieve better results on the test set than on the training set. Interpretants are used to derive features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of the acts of translation, which may ubiquitously be observed in communication. RTMs can achieve top performance at SemEval in various semantic similarity prediction tasks as well as similarity prediction tasks in bilingual settings. We obtain rankings of various prediction tasks using the performance of RTM and relative evaluation metrics, which can help identify which tasks and subtasks require more work by design.

Keywords

Referential translation machine, RTM, Semantic similarity, Machine translation, Performance prediction, Machine translation performance prediction

Notes

Acknowledgments

This work is supported in part by SFI (12/CE/I2267) as part of the ADAPT CNGL Centre for Global Intelligent Content (www.adaptcentre.ie) at Dublin City University, in part by SFI (13/TIDA/I2740) for the project “Monolingual and Bilingual Text Quality Judgments with Translation Performance Prediction” (www.computing.dcu.ie/ebicici/Projects/TIDA_RTM.html), and in part by the European Commission through the QTLaunchPad FP7 Project (No: 296347). We also thank the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support.

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. ADAPT Centre for Digital Content Technology, School of Computing, Dublin City University, Dublin, Ireland
