Enhancing the Assessment of (Polish) Translation in PROMIS Using Statistical, Semantic, and Neural Network Metrics

  • Krzysztof Wołk
  • Wojciech Glinkowski
  • Agnieszka Żukowska
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 746)


Differences in culture and language create the need for translators to convert text from one language into another. To preserve meaning, translation requires a detailed analysis of context. This study aims to develop accurate evaluation metrics for translations produced within the PROMIS (Patient-Reported Outcomes Measurement Information System) process, particularly in the reconciliation step, by giving experts additional information after backward translation. The result is a semi-automatic semantic evaluation metric for Polish based on the concept of the human-aided translation evaluation metric HMEANT. We assessed the proposed metrics using a statistics-based support vector machine (SVM) classifier and applied deep neural networks in an attempt to replicate the operation of the human brain. We compared the results of the proposed metrics with human judgment and with well-known machine translation metrics such as BLEU (Bilingual Evaluation Understudy), NIST, TER (Translation Edit Rate), and METEOR (Metric for Evaluation of Translation with Explicit Ordering). We found that a few of the proposed metrics were highly correlated with human judgment and provided additional semantic information independent of the evaluator's experience. These results indicate that the proposed metrics can help assess translations in PROMIS.
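
The comparison with human judgment described in the abstract comes down to correlating per-sentence metric scores with expert ratings. The abstract does not state which correlation statistic was used, so the following is only a minimal Python sketch, assuming Pearson and Spearman coefficients. The function names `pearson`, `ranks`, and `spearman` and all scores are hypothetical illustrations, not the authors' code or data.

```python
# Minimal sketch (assumption): Pearson and Spearman correlation between
# per-sentence metric scores and human ratings. Function names and the
# sample data are hypothetical; they are not taken from the paper.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sd_x * sd_y)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical expert ratings (1-5) and metric scores for five sentences.
    human = [4.0, 2.5, 3.0, 5.0, 1.0]
    metric = [0.62, 0.31, 0.40, 0.80, 0.12]
    print(f"Pearson:  {pearson(human, metric):.3f}")
    print(f"Spearman: {spearman(human, metric):.3f}")
```

Because TER is an error rate (lower is better), a TER score that tracks human judgment well would show a strongly negative correlation, whereas BLEU, NIST, METEOR, or an HMEANT-style score should correlate positively. In practice, SciPy's `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients together with p-values.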


Keywords: PROMIS · Translation evaluation · Machine translation · Automatic translation evaluation · Translation support


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Krzysztof Wołk (1)
  • Wojciech Glinkowski (2)
  • Agnieszka Żukowska (2)

  1. Polish-Japanese Academy of Information Technology, Warsaw, Poland
  2. Medical University of Warsaw, Warsaw, Poland
