Learning a Lot from Only a Little: Genetic Programming for Panel Segmentation on Sparse Sensory Evaluation Data

  • Katya Vladislavleva
  • Kalyan Veeramachaneni
  • Una-May O’Reilly
  • Matt Burland
  • Jason Parcon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6021)

Abstract

We describe a data mining framework that derives panelist information from sparse flavour survey data. One component of the framework performs ensemble-based symbolic regression via genetic programming. For each panelist, the evolved models provide a second component with all plausible, uncorrelated explanations of how that panelist rates flavours. The second component bootstraps the data using an ensemble selected from the evolved models, forms a probability density function for each panelist, and clusters the panelists into easy-to-please, neutral, and hard-to-please segments.
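
To make the workflow concrete, here is a minimal Python sketch of the second component under loudly hypothetical assumptions: make_toy_ensemble stands in for a panelist's GP-evolved symbolic-regression models, a Gaussian kernel density estimate stands in for the paper's per-panelist probability density function, and k-means over mean predicted ratings stands in for the paper's clustering step. None of these names, models, or parameter choices come from the paper itself.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

def make_toy_ensemble(rng, n_models=5):
    """Stand-in for one panelist's GP-evolved symbolic-regression models:
    each maps a flavour-descriptor vector to a rating on a 9-point scale."""
    coeffs = rng.normal(size=(n_models, 3))
    return [lambda x, c=c: float(np.tanh(c @ x)) * 4.0 + 5.0 for c in coeffs]

def panelist_rating_pdf(ensemble, flavours):
    """Pool the ensemble's predictions over sampled flavours and fit a KDE,
    yielding a probability density over the panelist's ratings."""
    ratings = np.array([model(x) for model in ensemble for x in flavours])
    return gaussian_kde(ratings), ratings

rng = np.random.default_rng(0)
flavours = rng.uniform(-1.0, 1.0, size=(200, 3))   # sampled flavour space

# One toy ensemble per panelist; in the paper these come from GP runs.
ensembles = [make_toy_ensemble(rng) for _ in range(30)]
mean_ratings = np.array(
    [panelist_rating_pdf(e, flavours)[1].mean() for e in ensembles])

# Cluster panelists into three segments by their rating tendency.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    mean_ratings.reshape(-1, 1))
order = np.argsort([mean_ratings[labels == k].mean() for k in range(3)])
segment = {int(order[0]): "hard to please",
           int(order[1]): "neutral",
           int(order[2]): "easy to please"}
for i in range(5):
    print(f"panelist {i}: mean rating {mean_ratings[i]:.2f} "
          f"-> {segment[int(labels[i])]}")
```

The 1-to-9 rating range mirrors the hedonic scale common in sensory evaluation; in the actual framework the toy ensembles would be replaced by models selected from the GP archive, and the fitted densities, not just their means, would inform the segmentation.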

Keywords

symbolic regression · panel segmentation · survey data · ensemble modeling · hedonic sensory evaluation

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Katya Vladislavleva (1)
  • Kalyan Veeramachaneni (2)
  • Una-May O’Reilly (2)
  • Matt Burland (3)
  • Jason Parcon (3)

  1. University of Antwerp, Belgium
  2. Massachusetts Institute of Technology, USA
  3. Givaudan Flavors Corp., USA
