A nonparametric copula-based decision tree for two random variables using MIC as a classification index

  • Foundations
  • Soft Computing

Abstract

Copulas are widely used to learn scale-free measures of dependence among variables and have attracted much interest in recent years. At the heart of the copula concept is Sklar's well-known theorem, which states that any multivariate distribution function can be decomposed into its marginal distributions and a copula that captures the dependence between the variables. The decision tree, in turn, is a well-established nonparametric modeling approach used for both regression and classification problems; it represents a tree-structured partition of the data into distinct classes for the purposes of simplicity and prediction. In this paper, we evaluate a novel nonparametric copula-based decision tree construction that uses a measure of dependence, the maximal information coefficient (MIC), as the classification index for two related variables. The index not only selects the factors that best classify the data but also ranks those factors by their influence. Additionally, we pre-test the splitting criterion value to anticipate the growth of branches at each child node of the decision tree. As an illustration, we applied the proposed method to credit card records from Taiwan and coronary heart disease records from Pakistan and obtained satisfactory results. The proposed method of building two-variable decision trees is thus tested as a constructive tool for classification, prediction, and identifying critical factors in statistics, finance, health sciences, machine learning, and many other related fields.
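
The scale-free property mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure (their results were produced in R), only an illustration under stated assumptions: the helper name pseudo_observations, the rank scaling by n + 1, and the simulated data are all hypothetical choices made for this example. The sketch maps each variable to the unit interval through its ranks, which is the usual first step before estimating a copula nonparametrically, and shows that strictly increasing transformations of the margins leave the result unchanged.

```python
import math
import random


def pseudo_observations(values):
    """Map a sample to (0, 1) via scaled ranks: u_i = rank_i / (n + 1).

    Hypothetical helper for illustration; assumes no ties in `values`.
    """
    n = len(values)
    # Indices of the sample sorted by value; position in this list gives the rank.
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


# Simulated, dependent pair (assumed data, not from the paper).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [xi + 0.5 * random.gauss(0.0, 1.0) for xi in x]

# Pseudo-observations: the sample expressed on the copula scale.
u, v = pseudo_observations(x), pseudo_observations(y)

# Strictly increasing transforms change the margins but not the ranks,
# so the pseudo-observations (and any measure built on them) are unchanged.
u_t = pseudo_observations([math.exp(xi) for xi in x])
v_t = pseudo_observations([yi ** 3 for yi in y])

print(u == u_t and v == v_t)
```

Any dependence measure computed from (u, v) alone inherits this invariance, which is why copula-based indices do not depend on the units or scale of the original variables.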

Code availability

All results reported in this research were produced in RStudio with the help of the "kdecopula" and "minerva" packages.

Acknowledgements

This study was not supported by any funding source or organization.

Author information

Corresponding authors

Correspondence to Q. S. Shan or S. Z. Abbas.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, Y.A., Shan, Q.S., Liu, Q. et al. A nonparametric copula-based decision tree for two random variables using MIC as a classification index. Soft Comput 25, 9677–9692 (2021). https://doi.org/10.1007/s00500-020-05399-1

  • DOI: https://doi.org/10.1007/s00500-020-05399-1
