Statistics and Computing, Volume 20, Issue 2, pp 119–138

Twin Boosting: improved feature selection and prediction

  • Peter Bühlmann
  • Torsten Hothorn

Abstract

We propose Twin Boosting, which has much better feature selection behavior than boosting, particularly with respect to reducing the number of false positives (falsely selected features). In addition, for cases with a few important effective features and many noise features, Twin Boosting also substantially improves the predictive accuracy of boosting. Twin Boosting is as general and generic as (gradient-based) boosting: it can be used with general weak learners and in a wide variety of situations, including generalized regression, classification and survival modeling. Furthermore, it is computationally feasible for large problems with potentially many more features than observed samples. Finally, for the special case of orthonormal linear models, we prove the equivalence of Twin Boosting to the adaptive Lasso, which provides some theoretical insight into feature selection with Twin Boosting.
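The abstract describes the two-round idea only at a high level. The sketch below illustrates it for the simplest setting, L2 boosting with componentwise linear least squares: a first round of ordinary boosting yields coefficients, and a second round re-weights each predictor's residual-sum-of-squares reduction by its squared first-round coefficient before selecting the next update, so features that received a zero coefficient in the first round cannot be picked again. The function name cw_l2_boost, the step size nu, the iteration counts and the simulated data are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Twin L2 Boosting with componentwise linear least squares.
# Written from the abstract's description; all names and tuning values are
# illustrative, not taken from the paper's reference implementation.
import numpy as np

def cw_l2_boost(X, y, n_iter=100, nu=0.1, weights=None):
    """Componentwise linear L2 boosting.

    If 'weights' is given, each predictor's decrease in the residual sum of
    squares is multiplied by weights[j] before the best predictor is chosen
    (the second, 'twin' round); otherwise plain L2 boosting is run.
    """
    p = X.shape[1]
    intercept = y.mean()
    resid = y - intercept                 # start from the constant fit
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)         # ||x_j||^2 for each predictor
    for _ in range(n_iter):
        gamma = X.T @ resid / col_ss      # componentwise least-squares fits
        score = gamma ** 2 * col_ss       # RSS reduction of each predictor
        if weights is not None:
            score = score * weights       # Twin Boosting: re-weight by round 1
        j = int(np.argmax(score))
        beta[j] += nu * gamma[j]          # shrunken update of one coefficient
        resid = resid - nu * gamma[j] * X[:, j]
    return intercept, beta

# Round 1: ordinary boosting; round 2: selection weighted by the squared
# first-round coefficients, so noise variables missed in round 1 stay out.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(100)
_, beta1 = cw_l2_boost(X, y, n_iter=200)
_, beta2 = cw_l2_boost(X, y, n_iter=200, weights=beta1 ** 2)
print("round-1 nonzeros:", np.flatnonzero(beta1).size,
      "round-2 nonzeros:", np.flatnonzero(beta2).size)
```

The re-weighting by squared first-round coefficients plays a role analogous to the data-dependent penalty weights of the adaptive Lasso (typically proportional to the inverse of an initial coefficient estimate), which is the connection the abstract refers to for orthonormal linear models.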

Keywords

Classification · Gradient descent · High-dimensional data · Regression · Regularization

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Seminar für Statistik, ETH Zürich, Zürich, Switzerland
  2. Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany