Discriminant analysis for discrete variables derived from a tree-structured graphical model

  • Regular Article
Advances in Data Analysis and Classification

Abstract

The purpose of this paper is to illustrate the potential use of discriminant analysis for discrete variables whose dependence structure is assumed to follow, or can be approximated by, a tree-structured graphical model. We compare its empirical performance, measured by estimated error rates on real and simulated data, with two methods that consider no interaction between variables, the well-known Naive Bayes classification rule and linear logistic regression, and with two models that do consider interactions, a decomposable model and the saturated model. The results show that discriminant analysis based on tree-structured graphical models, a simple nonlinear method that includes only some of the pairwise interactions between variables, is competitive with, and sometimes superior to, the methods that assume no interactions. Compared with more complex decomposable models, it has the advantage that its graph structure can be found quickly and exactly.
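As a rough illustration of the kind of rule the abstract describes, the sketch below fits one tree-structured model per class and assigns a new observation to the class with the largest log prior plus log tree density. It is not the authors' implementation, since the preview does not give their estimation details. It assumes the classical Chow-Liu construction (a maximum-weight spanning tree on pairwise empirical mutual information, found here with Kruskal's algorithm), an ad hoc additive smoothing constant `alpha`, and hypothetical helper names (`mutual_information`, `chow_liu_edges`, `fit_tree_model`, `classify`). The toy data are invented so that the two classes have identical marginals and differ only through one pairwise interaction.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(rows, i, j):
    """Empirical mutual information between columns i and j."""
    n = len(rows)
    pi = Counter(r[i] for r in rows)
    pj = Counter(r[j] for r in rows)
    pij = Counter((r[i], r[j]) for r in rows)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())


def chow_liu_edges(rows, p):
    """Maximum-weight spanning tree over p variables (Kruskal, union-find)."""
    weighted = sorted(((mutual_information(rows, i, j), i, j)
                       for i, j in combinations(range(p), 2)), reverse=True)
    parent = list(range(p))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


def fit_tree_model(rows, levels, alpha=0.5):
    """Return (edges, log_density) for one class.

    levels[v] is the number of categories of variable v. The additive
    smoothing with alpha is an assumption of this sketch; with it the
    score is only approximately a normalized log density.
    """
    n, p = len(rows), len(levels)
    edges = chow_liu_edges(rows, p)
    uni = [Counter(r[v] for r in rows) for v in range(p)]
    pair = {(i, j): Counter((r[i], r[j]) for r in rows) for i, j in edges}

    def log_density(x):
        def log_marginal(v):
            return math.log((uni[v][x[v]] + alpha) / (n + alpha * levels[v]))

        # Tree factorization: product of marginals times, for each edge,
        # the ratio p(x_i, x_j) / (p(x_i) p(x_j)).
        total = sum(log_marginal(v) for v in range(p))
        for i, j in edges:
            joint = ((pair[(i, j)][(x[i], x[j])] + alpha)
                     / (n + alpha * levels[i] * levels[j]))
            total += math.log(joint) - log_marginal(i) - log_marginal(j)
        return total

    return edges, log_density


def classify(x, models, priors):
    """Pick the class maximizing log prior + log tree density."""
    return max(models, key=lambda k: math.log(priors[k]) + models[k][1](x))


# Invented toy data: three binary variables. In class "a" the first two
# variables always agree, in class "b" they always disagree.
data = {
    "a": [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5,
    "b": [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)] * 5,
}
levels = [2, 2, 2]
models = {k: fit_tree_model(rows, levels) for k, rows in data.items()}
priors = {"a": 0.5, "b": 0.5}

print(models["a"][0])
print(classify((0, 0, 1), models, priors))
print(classify((1, 0, 0), models, priors))
```

In this toy setting every univariate marginal is the same in both classes, so a rule that ignores interactions has nothing to separate them with, while the tree picks up the edge between the first two variables.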


Acknowledgements

This work was written while GEG was at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, on sabbatical leave from the Faculty of Sciences at the National Autonomous University of Mexico (UNAM). GEG gratefully acknowledges a six-month grant from the PASPA program, DGAPA, UNAM. GPC was a postdoctoral researcher at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, and received a postdoctoral grant (252737) from the National Council of Science and Technology (CONACYT) of Mexico. We are very grateful to Drs. H. Avila Rosas and L. D. Sánchez Velázquez for providing the ICU data and for helpful discussions concerning the coding and selection of variables.

Author information

Corresponding author

Correspondence to Gonzalo Perez-de-la-Cruz.

Cite this article

Perez-de-la-Cruz, G., Eslava-Gomez, G. Discriminant analysis for discrete variables derived from a tree-structured graphical model. Adv Data Anal Classif 13, 855–876 (2019). https://doi.org/10.1007/s11634-019-00352-z
