Using Data Mining for Wine Quality Assessment

  • Paulo Cortez
  • Juliana Teixeira
  • António Cerdeira
  • Fernando Almeida
  • Telmo Matos
  • José Reis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5808)

Abstract

Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g., alcohol levels) and sensory (e.g., human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered, with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures how the response changes when a given input variable is varied through its domain. Three regression techniques were applied under a computationally efficient procedure that performs simultaneous variable and model selection, guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect sensory preferences. Moreover, it can support the wine experts' evaluations and ultimately improve production.
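
The sensitivity analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: each input is varied over a fixed number of levels between its observed minimum and maximum while the other inputs are held at their means, and the variance of the model responses is used as the importance measure. The function name `sensitivity`, the `toy_model`, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence

# Any fitted regressor can be wrapped as a function from inputs to a score.
Model = Callable[[Sequence[float]], float]


def sensitivity(model: Model, data: Sequence[Sequence[float]],
                levels: int = 6) -> List[float]:
    """Relative importance of each input (assumed variance-based measure).

    Each input is swept over `levels` evenly spaced values across its
    observed range while all other inputs stay at their mean value.
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    n_inputs = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_inputs)]
    baseline = [mean(col) for col in columns]

    variances = []
    for j, col in enumerate(columns):
        lo, hi = min(col), max(col)
        responses = []
        for k in range(levels):
            x = list(baseline)                      # other inputs at their means
            x[j] = lo + (hi - lo) * k / (levels - 1)  # sweep input j over its range
            responses.append(model(x))
        variances.append(pvariance(responses))

    total = sum(variances)
    # Normalise so the importances sum to 1 (all zeros if nothing responds).
    return [v / total if total > 0 else 0.0 for v in variances]


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a fitted model; it ignores the third input."""
    return 0.5 * x[0] + 2.0 * x[1]


# Made-up rows: (alcohol, sulphates, pH). These are not real measurements.
samples = [(9.0, 0.4, 3.0), (11.0, 0.5, 3.2), (13.0, 0.6, 3.4)]
importance = sensitivity(toy_model, samples)
print(importance)
```

In this toy setting the third input receives zero importance because the model never uses it, which is the kind of signal a variable selection procedure could use to discard inputs.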

Keywords

Ordinal Regression; Sensitivity Analysis; Sensory Preferences; Support Vector Machines; Variable and Model Selection; Wine Science

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Paulo Cortez (1)
  • Juliana Teixeira (1)
  • António Cerdeira (2)
  • Fernando Almeida (2)
  • Telmo Matos (2)
  • José Reis (1, 2)
  1. Dep. of Information Systems/Algoritmi Centre, University of Minho, Guimarães, Portugal
  2. Viticulture Commission of the Vinho Verde region (CVRVV), Porto, Portugal
