Understanding the Wine Judges and Evaluating the Consistency Through White-Box Classification Algorithms

  • Bernard Chen
  • Hai Le
  • Christopher Rhodes
  • Dongsheng Che
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9728)


Wine is a broad field of study and is increasingly popular today. However, little data science and data mining research has been applied to this topic to benefit wine producers, distributors, and consumers. According to the American Association of Wine Economists, "Who is a reliable wine judge?" and "Are wine judges consistent?" are typical questions that call for formal statistical answers.

This paper proposes using white-box classification algorithms to understand wine judges and to evaluate their consistency when they score a wine as 90+ or 90−. Three white-box classification algorithms, Naïve Bayes, Decision Tree, and K-nearest neighbors, are applied to wine sensory data derived from professional wine reviews. Each algorithm can reveal how the judges make their decisions, and the extracted information is also useful to wine producers, distributors, and consumers. The data set includes 1000 wines: 500 scored as 90+ points (positive class) and 500 scored as 90− points (negative class). 5-fold cross-validation is used to validate the performance of the classification algorithms; higher prediction accuracy indicates higher consistency of the wine judge. The best prediction accuracy we obtained, 85.7%, came from a modified version of the Naïve Bayes algorithm.
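The evaluation setup described above (a balanced 1000-wine data set of binary sensory attributes, a white-box classifier, and 5-fold cross-validation with accuracy as the consistency measure) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data here is synthetic (the `make_wine` generator and its attribute probabilities are hypothetical stand-ins for the real Wine Spectator review data), and a plain Bernoulli Naïve Bayes with Laplace smoothing stands in for the paper's modified version.

```python
import math
import random

def train_nb(X, y, alpha=1.0):
    """Bernoulli Naive Bayes on binary attributes, with Laplace smoothing."""
    n_feat = len(X[0])
    stats = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Smoothed P(attribute present | class) for each attribute
        p = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
             for j in range(n_feat)]
        stats[c] = (prior, p)
    return stats

def predict_nb(stats, x):
    """Pick the class with the highest log-posterior for attribute vector x."""
    best, best_lp = None, None
    for c, (prior, p) in stats.items():
        lp = math.log(prior)
        for xj, pj in zip(x, p):
            lp += math.log(pj if xj else 1 - pj)
        if best_lp is None or lp > best_lp:
            best, best_lp = c, lp
    return best

def cross_val_accuracy(X, y, k=5):
    """k-fold cross-validation; mean accuracy is the consistency proxy."""
    idx = list(range(len(X)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        test = set(fold)
        Xtr = [X[i] for i in idx if i not in test]
        ytr = [y[i] for i in idx if i not in test]
        model = train_nb(Xtr, ytr)
        correct = sum(predict_nb(model, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k

# Synthetic stand-in for the 1000-wine data set: 500 wines scored 90+
# (class 1) and 500 scored 90- (class 0), each a binary attribute vector.
rng = random.Random(42)

def make_wine(label):
    # Hypothetical: 90+ wines exhibit each quality attribute more often.
    return [1 if rng.random() < (0.7 if label else 0.3) else 0
            for _ in range(20)]

X = [make_wine(1) for _ in range(500)] + [make_wine(0) for _ in range(500)]
y = [1] * 500 + [0] * 500

print(f"5-fold CV accuracy: {cross_val_accuracy(X, y):.3f}")
```

Under the paper's interpretation, this mean accuracy would be read as a measure of the judge's consistency: the more systematically the judge's 90+/90− decisions follow the sensory attributes, the easier they are to predict.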


Keywords: Wineinformatics · Wine judges evaluation · Decision tree · Naïve Bayes · K-nearest neighbors · SVM



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Bernard Chen¹ (email author)
  • Hai Le¹
  • Christopher Rhodes¹
  • Dongsheng Che²
  1. Department of Computer Science, University of Central Arkansas, Conway, USA
  2. Department of Computer Science, East Stroudsburg University, East Stroudsburg, USA
