Machine Learning

Volume 82, Issue 3, pp 375–397

Feature-subspace aggregating: ensembles for stable and unstable learners

  • Kai Ming Ting
  • Jonathan R. Wells
  • Swee Chuan Tan
  • Shyh Wei Teng
  • Geoffrey I. Webb
Open Access Article

Abstract

This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners. In contrast, most existing ensemble approaches can improve the predictive performance of unstable learners only. Our analysis shows that the new approach reduces the execution time needed to generate each model in an ensemble as the level of localisation in Feating increases. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace and Bagging in terms of predictive accuracy when a stable learner, SVM, is used as the base learner. The speed-up achieved by Feating makes feasible SVM ensembles that would otherwise be infeasible for large data sets. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and an unstable learner, decision tree.
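
The abstract describes Feating at a high level: one ensemble member per subset of features, each member consisting of local models trained on the regions that subset carves out, with the members' predictions combined by voting. Below is a minimal sketch of that idea, assuming a flat partition over discretised feature values; the class name FeatingSketch and the parameters n_subspace_features and n_bins are illustrative assumptions, and the paper's actual construction (level trees built over the selected attributes) is not reproduced here.

    # A rough sketch of the Feating idea from the abstract, not the authors' code:
    # one ensemble member per feature subset, each member holding local models
    # trained on the regions defined by that subset's (discretised) values, with
    # predictions aggregated by majority vote.
    from itertools import combinations

    import numpy as np
    from sklearn.base import clone
    from sklearn.preprocessing import KBinsDiscretizer
    from sklearn.svm import SVC


    class FeatingSketch:
        def __init__(self, base_estimator=None, n_subspace_features=2, n_bins=3):
            # Any stable (SVM, k-NN) or unstable (decision tree) base learner works.
            self.base_estimator = base_estimator if base_estimator is not None else SVC()
            self.n_subspace_features = n_subspace_features
            self.n_bins = n_bins

        def fit(self, X, y):
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            self.classes_ = np.unique(y)
            self.majority_ = self.classes_[
                np.bincount(np.searchsorted(self.classes_, y)).argmax()]
            # Discretise features so that value combinations define local regions.
            self.disc_ = KBinsDiscretizer(
                n_bins=self.n_bins, encode="ordinal", strategy="quantile").fit(X)
            Xd = self.disc_.transform(X)
            self.members_ = []  # one (feature subset, {region -> local model}) pair each
            for subset in combinations(range(X.shape[1]), self.n_subspace_features):
                cols, local_models = list(subset), {}
                for region in {tuple(r) for r in Xd[:, cols]}:
                    mask = np.all(Xd[:, cols] == region, axis=1)
                    if len(np.unique(y[mask])) > 1:
                        # Local model trained only on the instances in this region.
                        local_models[region] = clone(self.base_estimator).fit(X[mask], y[mask])
                    else:
                        local_models[region] = y[mask][0]  # single-class region: a constant
                self.members_.append((cols, local_models))
            return self

        def predict(self, X):
            X = np.asarray(X, dtype=float)
            Xd = self.disc_.transform(X)
            votes = np.zeros((len(X), len(self.classes_)))
            for cols, local_models in self.members_:
                for i, region in enumerate(map(tuple, Xd[:, cols])):
                    model = local_models.get(region, self.majority_)  # unseen region: majority class
                    pred = model.predict(X[i:i + 1])[0] if hasattr(model, "predict") else model
                    votes[i, np.searchsorted(self.classes_, pred)] += 1
            return self.classes_[votes.argmax(axis=1)]

On a data set with d features this sketch builds one member per combination of n_subspace_features features, so it is only intended to illustrate how increased localisation shrinks the training set seen by each local model, which is the source of the per-model speed-up for learners such as SVM discussed in the abstract.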

Keywords

Classifier ensembles · Stable learners · Unstable learners · Model diversity · Local models · Global models

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Kai Ming Ting (1)
  • Jonathan R. Wells (1)
  • Swee Chuan Tan (1)
  • Shyh Wei Teng (1)
  • Geoffrey I. Webb (2)
  1. Gippsland School of Information Technology, Monash University, Vic, Australia
  2. Clayton School of Information Technology, Monash University, Vic, Australia