Abstract
Knowledge mining sensory evaluation data is a challenging process due to the extreme sparsity of the data and the large variation in responses from different members of the panel (called assessors). The main goals of knowledge mining in the sensory sciences are understanding how the perceived liking score depends on the concentration levels of a flavor's ingredients, identifying the ingredients that drive liking, segmenting the panel into groups with similar liking preferences, and optimizing flavors to maximize liking per group. Our approach employs (1) genetic programming (symbolic regression) and ensemble methods to generate multiple diverse explanations of assessor liking preferences, together with confidence information; (2) statistical techniques to extrapolate, using the produced ensembles, to unobserved regions of the flavor space, and to segment the assessors into groups that either have the same propensity to like flavors or are driven by the same ingredients; and (3) two-objective swarm optimization to identify flavors that are well and consistently liked by a selected segment of assessors.
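The ensemble idea in step (1) can be illustrated with a minimal sketch: each assessor gets several diverse models mapping ingredient concentrations to a predicted liking score, and the ensemble's median serves as the prediction while its disagreement serves as a confidence signal. The lambda models below are hypothetical stand-ins for evolved symbolic-regression expressions, not the paper's actual models.

```python
import statistics

# Hypothetical stand-ins for the diverse symbolic-regression models a GP run
# might evolve for one assessor: each maps two ingredient concentrations to a
# predicted liking score. Real models would be evolved expression trees.
ensemble = [
    lambda x: 2.0 * x[0] + 0.5 * x[1],
    lambda x: 1.8 * x[0] + 0.7 * x[1] + 0.1 * x[0] * x[1],
    lambda x: 2.1 * x[0] + 0.4 * x[1] - 0.05 * x[1] ** 2,
]

def predict_with_confidence(models, x):
    """Ensemble prediction: median liking score plus disagreement (std dev).

    Large disagreement signals that the flavor lies in a region the data did
    not constrain, so the extrapolated prediction should be trusted less.
    """
    preds = [m(x) for m in models]
    return statistics.median(preds), statistics.stdev(preds)

score, spread = predict_with_confidence(ensemble, (1.0, 2.0))
```

Disagreement-based confidence of this kind is what lets step (2) distinguish trustworthy extrapolations from unsupported ones when segmenting assessors.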
Notes
The greater-than-normal number of samples was enabled by a proprietary method for delivering the flavor to the assessor, which delays sensory fatigue.
The choice of the θ threshold strongly influences the subsequent conclusions.
Cite this article
Veeramachaneni, K., Vladislavleva, E. & O’Reilly, UM. Knowledge mining sensory evaluation data: genetic programming, statistical techniques, and swarm optimization. Genet Program Evolvable Mach 13, 103–133 (2012). https://doi.org/10.1007/s10710-011-9153-2
Keywords
- Symbolic regression
- Sensory science
- Ensembles
- Non-linear optimization
- Variable selection
- Pareto genetic programming
- Hedonic evaluation
- Complexity control