
Evolution of Gaussian Process kernels for machine translation post-editing effort estimation


In many Natural Language Processing problems, the combination of machine learning and optimization techniques is essential. One of these problems is estimating the human effort needed to improve a text that has been translated using a machine translation method. Recent advances in this area have shown that Gaussian Processes can be effective in post-editing effort prediction. However, Gaussian Processes require a kernel function to be defined, and this choice strongly influences the quality of the prediction. In addition, extracting features from the text can be very labor-intensive, although recent advances in sentence embedding have shown that this process can be automated. In this paper, we use a Genetic Programming algorithm to evolve kernels for Gaussian Processes that predict post-editing effort based on sentence embeddings. We show that combining evolutionary optimization and Gaussian Processes removes the need for an a priori kernel choice, and that, by using a multi-objective variant of the Genetic Programming approach, kernels suitable for predicting several metrics can be learned. We also investigate the effect that the choice of sentence embedding method has on the kernel learning process.

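To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual Genetic Programming system: kernels are represented as small expression trees built from base kernels combined with `+` and `*`, and each candidate is scored by the Gaussian Process log marginal likelihood on toy 1-D data. For brevity it enumerates a handful of candidate trees rather than evolving a population, and all function names and hyperparameter values here are hypothetical choices for the example.

```python
import numpy as np

# Two base kernels (squared-exponential and a periodic kernel).
def rbf(a, b, ls=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def periodic(a, b, ls=1.0, p=2.0):
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / p) ** 2 / ls ** 2)

BASE = {"rbf": rbf, "per": periodic}

def eval_tree(tree, a, b):
    """A kernel expression tree is either a base-kernel name
    or a tuple ('+'|'*', left_subtree, right_subtree)."""
    if isinstance(tree, str):
        return BASE[tree](a, b)
    op, left, right = tree
    kl, kr = eval_tree(left, a, b), eval_tree(right, a, b)
    return kl + kr if op == "+" else kl * kr

def log_marginal_likelihood(tree, x, y, noise=1e-2):
    # Standard GP evidence computed via a Cholesky factorization.
    K = eval_tree(tree, x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(40)

# A Genetic Programming search would mutate and recombine such trees;
# here we simply pick the best of a few fixed candidates.
candidates = ["rbf", "per", ("+", "rbf", "per"), ("*", "rbf", "per")]
best = max(candidates, key=lambda t: log_marginal_likelihood(t, x, y))
```

The evidence-based fitness used here is one common choice; the paper additionally considers multi-objective selection, where a kernel is scored against several post-editing effort metrics at once.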




This work has been supported by the Spanish Ministry of Science and Innovation (project PID2019-104966GB-I00), and the Basque Government (projects KK-2020/00049 and IT1244-19, and ELKARTEK program).

Author information



Corresponding author

Correspondence to Ibai Roman.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Roman, I., Santana, R., Mendiburu, A. et al. Evolution of Gaussian Process kernels for machine translation post-editing effort estimation. Ann Math Artif Intell (2021). https://doi.org/10.1007/s10472-021-09751-5



Keywords

  • Gaussian processes
  • Genetic programming
  • Kernel selection
  • Quality estimation
  • Sentence embeddings
  • Natural language processing