Discriminant analysis for discrete variables derived from a tree-structured graphical model

  • Gonzalo Perez-de-la-Cruz (corresponding author)
  • Guillermina Eslava-Gomez
Regular Article

Abstract

The purpose of this paper is to illustrate the potential use of discriminant analysis for discrete variables whose dependence structure is assumed to follow, or can be approximated by, a tree-structured graphical model. We compare its empirical performance, measured by estimated error rates on real and simulated data, with that of the well-known Naive Bayes classification rule and linear logistic regression, neither of which considers interactions between variables, and with that of models that do consider interactions, such as a decomposable model and the saturated model. The results show that discriminant analysis based on tree-structured graphical models, a simple nonlinear method that includes only some of the pairwise interactions between variables, is competitive with, and sometimes superior to, methods that assume no interactions. It also has an advantage over more complex decomposable models: the graph structure can be found quickly and exactly.
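
As a rough illustration of why the tree structure can be found quickly and exactly, the sketch below estimates a dependence tree with the classical Chow-Liu procedure: compute the empirical mutual information for every pair of variables, then take a maximum-weight spanning tree (equivalently, a minimum-weight spanning tree on the negated weights). This is an assumption about the kind of structure search involved, not the authors' implementation; the function names `mutual_information` and `chow_liu_edges` and the toy data are hypothetical. In a discriminant setting, one such tree would typically be fitted per class.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information between columns i and j.

    `data` is a list of equal-length tuples of discrete values.
    """
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    # sum over observed cells of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum(
        (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


def chow_liu_edges(data):
    """Maximum-weight spanning tree over the variables (Prim-style).

    Edge weights are pairwise empirical mutual informations.
    Returns a list of (parent, child) pairs, rooted at variable 0.
    """
    p = len(data[0])
    weight = {
        (i, j): mutual_information(data, i, j)
        for i, j in combinations(range(p), 2)
    }
    in_tree = {0}
    edges = []
    while len(in_tree) < p:
        # Pick the heaviest edge joining the tree to a new variable.
        best = max(
            ((i, j) for i in in_tree for j in range(p) if j not in in_tree),
            key=lambda e: weight[tuple(sorted(e))],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Toy data (hypothetical): variables 0 and 1 are identical, variable 2 is unrelated.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_edges(toy)
```

With p variables the search needs only the p(p-1)/2 pairwise weights and a single spanning-tree pass, so no iterative model search is required. A classifier built on this would assign an observation to the class whose prior times tree-factorized likelihood is largest; that step is omitted here.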

Keywords

Discrete variables · Discriminant analysis · Error rates · Minimum weight spanning tree · Multinomial distribution · Sparseness · Structure estimation · Tree-structured graphical models

Mathematics Subject Classification

62H30 · 68T10

Notes

Acknowledgements

This work was written while GEG was at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, on sabbatical leave from the Faculty of Sciences at the National Autonomous University of Mexico (UNAM); GEG gratefully acknowledges a six-month grant from the program PASPA, DGAPA, UNAM. GPC was a postdoctoral researcher at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, and received a postdoctoral grant (252737) from the National Council of Science and Technology (CONACYT) of Mexico. We are very grateful to Drs. H. Avila Rosas and L. D. Sánchez Velázquez for providing the ICU data and for helpful discussions concerning the coding and selection of variables.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Gonzalo Perez-de-la-Cruz (1), corresponding author
  • Guillermina Eslava-Gomez (2)

  1. National Institute of Statistics and Geography (INEGI) of Mexico, Mexico City, Mexico
  2. Department of Mathematics, Faculty of Sciences, UNAM, Mexico City, Mexico
