Genetic Programming and Evolvable Machines

Volume 13, Issue 1, pp 103–133

Knowledge mining sensory evaluation data: genetic programming, statistical techniques, and swarm optimization

  • Kalyan Veeramachaneni
  • Ekaterina Vladislavleva
  • Una-May O’Reilly


Knowledge mining sensory evaluation data is challenging because the data are extremely sparse and responses vary widely across panel members (called assessors). The main goals of knowledge mining in sensory science are to understand how the perceived liking score depends on the concentration levels of a flavor's ingredients, to identify the ingredients that drive liking, to segment the panel into groups with similar liking preferences, and to optimize flavors to maximize liking per group. Our approach employs (1) genetic programming (symbolic regression) and ensemble methods to generate multiple diverse explanations of assessor liking preferences, with confidence information; (2) statistical techniques to extrapolate from the produced ensembles into unobserved regions of the flavor space and to segment the assessors into groups that either share the same propensity to like flavors or are driven by the same ingredients; and (3) two-objective swarm optimization to identify flavors that are well and consistently liked by a selected segment of assessors.
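The ensemble idea in (1) can be sketched as follows: each assessor gets a set of diverse regression models, and the spread of their predictions at a query point serves as a confidence signal. This is a minimal illustration only; the function name and the toy stand-in models are hypothetical, not the paper's evolved symbolic-regression trees.

```python
import statistics

def ensemble_predict(models, x):
    """Return (mean prediction, disagreement) for input x.

    `models` is a list of callables. The spread (standard deviation) of
    their outputs is a simple confidence proxy: low spread means the
    ensemble agrees, so the prediction is more trustworthy; high spread
    flags regions of the flavor space where extrapolation is risky.
    """
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.pstdev(preds)

# Toy ensemble of three "liking" models over one ingredient concentration.
models = [lambda x: 2 * x + 1.0,
          lambda x: 2 * x + 1.2,
          lambda x: 2 * x + 0.8]
mean, spread = ensemble_predict(models, 3.0)
print(mean, spread)  # mean near 7.0 with a small spread
```

In the paper's setting the disagreement measure supports step (2): predictions are trusted only where the ensemble is consistent, which constrains extrapolation into unobserved regions of the flavor space.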


Keywords: Symbolic regression · Sensory science · Ensembles · Non-linear optimization · Variable selection · Pareto genetic programming · Hedonic evaluation · Complexity control



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Kalyan Veeramachaneni (CSAIL, MIT, Cambridge, USA)
  • Ekaterina Vladislavleva (Evolved Analytics Europe BVBA, Wijnegem, Belgium)
  • Una-May O’Reilly (CSAIL, MIT, Cambridge, USA)
