The Multivariate Entropy Triangle and Applications

  • Francisco José Valverde-Albacete
  • Carmen Peláez-Moreno
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9648)

Abstract

We extend a framework for the analysis of classifiers so that it also covers the analysis of data sets. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, from bivariate to multivariate distributions. With these tools we analyze a handful of UCI machine learning tasks to begin addressing the question of how information is transformed in machine learning classification tasks.
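The abstract does not state the balance equation itself. As a rough illustration of the kind of entropy bookkeeping involved, the sketch below splits the maximal (uniform) entropy of a discrete data set into three non-negative parts: the divergence of the marginals from uniformity, the total correlation among the variables (in the sense of Watanabe), and the joint entropy. This particular split, the function names `entropy` and `entropy_split`, and the toy data are assumptions made for illustration; the decomposition used in the paper may differ.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_split(rows):
    """Split the uniform entropy of a discrete data set into three parts.

    rows: list of equal-length tuples, one tuple per observation.
    Illustrative decomposition only (an assumption, not the paper's formula):
        uniform = divergence + total_correlation + joint
    """
    columns = list(zip(*rows))
    # Uniform entropy: each variable taken as uniform over its observed values.
    h_uniform = sum(math.log2(len(set(col))) for col in columns)
    # Sum of the marginal entropies of the individual variables.
    h_marginals = sum(entropy(Counter(col).values()) for col in columns)
    # Joint entropy of the full observation tuples.
    h_joint = entropy(Counter(rows).values())
    return {
        "uniform": h_uniform,
        "divergence": h_uniform - h_marginals,       # distance from uniformity
        "total_correlation": h_marginals - h_joint,  # shared information
        "joint": h_joint,                            # remaining joint entropy
    }


# Toy data (hypothetical): two binary variables that are exact copies.
data = [(0, 0), (0, 0), (1, 1), (1, 1)]
parts = entropy_split(data)

# Normalizing by the uniform entropy gives three coordinates that sum to 1,
# which is the form a ternary (triangle) plot expects.
coords = [parts[k] / parts["uniform"]
          for k in ("divergence", "total_correlation", "joint")]
print(parts)
print(coords)
```

For the toy data the two variables are balanced and fully redundant, so the divergence is zero and the uniform entropy is shared equally between total correlation and joint entropy.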

Keywords

Mutual Information, Multivariate Distribution, Multivariate Setting, Geometric Locus, Bivariate Case
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Francisco José Valverde-Albacete (1)
  • Carmen Peláez-Moreno (1)
  1. Departamento de Teoría de la Señal y de las Comunicaciones, Universidad Carlos III de Madrid, Leganés, Spain
