Modeling Threshold Interaction Effects Through the Logistic Classification Trunk
We introduce a model for identifying interaction effects in binary response data that integrates recursive partitioning and generalized linear models. It derives from an ad hoc specification and implementation of the Simultaneous Threshold Interaction Modeling Algorithm (STIMA). The model, called the Logistic Classification Trunk, yields regression parameters by maximum likelihood, estimating main effects and threshold interaction effects simultaneously. Its main feature is that the user can assess the importance of both kinds of effects within a single model, which is obtained by first growing a classification trunk and then pruning it back to avoid overfitting. We investigate the choice of a suitable pruning parameter through a simulation study and compare the classification accuracy of the Logistic Classification Trunk with that of 13 alternative models/classifiers on 25 binary response datasets.
Keywords: STIMA, Generalized linear modeling, Logistic regression, Recursive partitioning, Interaction effects, Regression trunk
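To make the idea of estimating main effects and a threshold interaction effect in one logistic model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear predictor with two main effects plus an indicator for a region defined by thresholds on both predictors. The thresholds (0.0 and 0.5), the simulated data, and the helper name `fit_logistic` are hypothetical choices made for illustration; the trunk-growing and pruning steps that would select the thresholds are not reproduced here.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Hypothetical helper for illustration; returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # Tiny ridge term keeps the Hessian numerically invertible.
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik


# Simulated binary response data (assumed setup, not from the paper).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Threshold interaction: an indicator of a region defined on BOTH predictors.
# The thresholds are taken as known here purely for illustration.
region = ((x1 > 0.0) & (x2 <= 0.5)).astype(float)

eta_true = -0.5 + 1.0 * x1 + 0.5 * x2 + 1.5 * region
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

# Main-effects-only model versus main effects plus the threshold interaction.
X_main = np.column_stack([np.ones(n), x1, x2])
X_full = np.column_stack([X_main, region])

beta_main, ll_main = fit_logistic(X_main, y)
beta_full, ll_full = fit_logistic(X_full, y)

print("main-effects log-likelihood:", ll_main)
print("with threshold interaction :", ll_full)
print("estimated coefficients     :", beta_full)
```

Because the two models are nested, the log-likelihood of the fuller model cannot be lower than that of the main-effects model; the size of the gain gives a rough sense of how much the threshold interaction term contributes in this simulated setting.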