Modified Rule Ensemble Method for Binary Data and Its Applications

Behaviormetrika

Abstract

Ensemble learning, which combines multiple base learners to improve statistical prediction accuracy, is frequently used in statistical science and data mining. However, because of their “black box” nature, ensemble learning models are difficult to interpret. A recently proposed rule ensemble method known as RuleFit presents each base learner as a production rule and also generates a measure of influence on the response variable. The RuleFit method for a binary response applies a squared-error ramp loss function, and the base learners are weighted by shrinkage regression using the lasso method. Thus, RuleFit is not built on a logistic regression model. Moreover, highly correlated pairs of base learners may be excessively pruned by the lasso method. In this study, we solved the excess pruning problem by constructing RuleFit within a logistic regression framework and weighting the base learners by the elastic net. The effectiveness of our proposed RuleFit model is illustrated through a real data set. In small-scale simulations, this method demonstrated higher predictive performance than the original RuleFit model.
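
The idea of weighting rule-based base learners inside a logistic regression framework with an elastic net penalty can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the four rules are hand-written (RuleFit would extract them from a tree ensemble), the data are synthetic, the solver is a plain proximal gradient loop, and the names `lam`, `alpha`, `step`, and `n_iter` are hypothetical tuning parameters.

```python
import math
import random

# Hypothetical, hand-written production rules over two predictors (x1, x2).
# In RuleFit the rules would be extracted from a tree ensemble instead.
RULES = [
    lambda x: x[0] > 0.5,
    lambda x: x[1] > 0.5,
    lambda x: x[0] > 0.5 and x[1] > 0.5,
    lambda x: x[0] > 0.52 and x[1] > 0.5,  # nearly duplicates the rule above
]

def rule_features(x):
    """Encode one observation as 0/1 indicators, one per rule."""
    return [1.0 if rule(x) else 0.0 for rule in RULES]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def predict_proba(row, b0, b):
    """Predicted probability of y = 1 for one row of rule indicators."""
    return sigmoid(b0 + sum(bj * rj for bj, rj in zip(b, row)))

def fit_elastic_net_logistic(R, y, lam=0.02, alpha=0.5, step=0.5, n_iter=500):
    """Minimise mean logistic loss + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)
    by proximal gradient descent; the intercept b0 is not penalised."""
    n, p = len(R), len(R[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [predict_proba(row, b0, b) - yi for row, yi in zip(R, y)]
        grad0 = sum(resid) / n
        # Gradient of the smooth part: logistic loss plus the ridge term.
        grad = [
            sum(r * row[j] for r, row in zip(resid, R)) / n + lam * (1 - alpha) * b[j]
            for j in range(p)
        ]
        b0 -= step * grad0
        # The L1 part is handled by soft-thresholding after the gradient step.
        b = [soft_threshold(b[j] - step * grad[j], step * lam * alpha) for j in range(p)]
    return b0, b

# Synthetic binary data: y is likely 1 only when both predictors are large.
random.seed(0)
X = [(random.random(), random.random()) for _ in range(300)]
y = [1.0 if random.random() < (0.9 if x[0] > 0.5 and x[1] > 0.5 else 0.1) else 0.0
     for x in X]

R = [rule_features(x) for x in X]
b0, b = fit_elastic_net_logistic(R, y)
probs = [predict_proba(row, b0, b) for row in R]
acc = sum((p > 0.5) == (yi == 1.0) for p, yi in zip(probs, y)) / len(y)
print("intercept:", round(b0, 3))
print("rule weights:", [round(bj, 3) for bj in b])
print("training accuracy:", round(acc, 3))
```

Setting `alpha=1.0` in this sketch reduces the penalty to the lasso, so comparing the weights of the last two, nearly identical rules under both settings is one way to explore the pruning behaviour the abstract describes.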

Author information

Corresponding author

Correspondence to Toshio Shimokawa.

About this article

Cite this article

Shimokawa, T., Li, L., Yan, K. et al. Modified Rule Ensemble Method for Binary Data and Its Applications. Behaviormetrika 41, 225–244 (2014). https://doi.org/10.2333/bhmk.41.225

  • DOI: https://doi.org/10.2333/bhmk.41.225
