Probabilistic Learning Vector Quantization with Cross-Entropy for Probabilistic Class Assignments in Classification Learning

  • Andrea Villmann
  • Marika Kaden
  • Sascha Saralajew
  • Thomas Villmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10841)


Classification learning with prototype-based approaches is an attractive strategy for obtaining interpretable classification models. Frequently, such models optimize the classification error or an approximation thereof, whereas current deep network approaches maximize the cross-entropy instead. We therefore propose a probabilistic prototype-based classifier built on cross-entropy. As we show, the proposed classifier generalizes the robust soft learning vector quantizer (RSLVQ) and can handle label noise in the training data, i.e., it is able to take probabilistic class assignments into account during learning.
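To illustrate the idea of the abstract, the following is a minimal sketch of a prototype-based classifier with RSLVQ-style assignment probabilities, trained by gradient descent on a cross-entropy loss with possibly soft (probabilistic) class targets. All names, the squared-Euclidean distance, and the softmax responsibility model are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class CrossEntropyLVQ:
    """Sketch of a probabilistic prototype classifier trained with
    cross-entropy. Illustrative only; the paper's model may differ."""

    def __init__(self, protos, proto_labels, beta=1.0, lr=0.05):
        self.w = protos.astype(float)   # prototype vectors, shape (m, d)
        self.y = proto_labels           # one-hot class per prototype, (m, c)
        self.beta, self.lr = beta, lr

    def _assignments(self, x):
        # softmax over negative squared distances -> prototype responsibilities
        a = -self.beta * np.sum((self.w - x) ** 2, axis=1)
        g = np.exp(a - a.max())
        return g / g.sum()

    def predict_proba(self, x):
        # class probability = summed responsibility of that class's prototypes
        return self.y.T @ self._assignments(x)

    def fit_step(self, x, t):
        """One gradient step on the cross-entropy -sum_c t_c log p_c;
        the target t may be a soft (probabilistic) class assignment."""
        g = self._assignments(x)
        p = self.y.T @ g
        s = -(self.y @ (t / np.maximum(p, 1e-12)))  # dL/dg_j per prototype
        da = g * (s - s @ g)                        # softmax backprop: dL/da_j
        self.w -= self.lr * da[:, None] * (-2.0 * self.beta) * (self.w - x)
        return -np.sum(t * np.log(np.maximum(p, 1e-12)))  # loss before step
```

Because the loss accepts any target distribution `t`, noisy or uncertain labels enter the update softly instead of forcing a hard class decision, which is the mechanism the abstract refers to.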



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Andrea Villmann (1, 2)
  • Marika Kaden (1)
  • Sascha Saralajew (1, 3)
  • Thomas Villmann (1)

  1. Saxony Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida, Mittweida, Germany
  2. Schulzentrum Döbeln-Mittweida, Mittweida, Germany
  3. Dr. Ing. h.c. F. Porsche AG, Weissach, Germany
