
A subspace aggregating algorithm for accurate classification

  • Original Paper
  • Published in Computational Statistics

Abstract

We present a technique for learning via aggregation in supervised classification. The method improves classification performance regardless of which classifier is at its core. It exploits information hidden in subspaces by aggregating over combinations of variables and is applicable to high-dimensional data sets. We provide algorithms that randomly divide the variables into smaller subsets and permute them before applying a classification method to each subset, then combine the resulting predictions to determine class membership. Theoretical and simulation analyses consistently demonstrate the high accuracy of the proposed methods, which prove significantly more effective than aggregating observations through sampling. Through extensive simulations we evaluate the accuracy of various classification methods, and we further illustrate the techniques on five real-world data sets.
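To make the procedure concrete before the formal algorithms in the Appendix, the following R sketch shows the aggregation scheme the abstract describes: permute the variables, split them into random subsets, fit a base classifier on each subset, and combine the subset-level predictions by majority vote. The function sagg_predict, the LDA base learner, and the defaults n_subsets = 5 and B = 25 are our illustration only, not the authors' released code (their implementation is linked in the notes below).

```r
## Minimal sketch of subspace aggregation (our illustration, not the
## authors' implementation): fit a base classifier on each random
## subset of variables and combine class labels by majority vote.
sagg_predict <- function(x_train, y_train, x_test, base_fit,
                         n_subsets = 5, B = 25) {
  p <- ncol(x_train)
  votes <- NULL
  for (b in seq_len(B)) {
    # Permute the variable indices and split them into n_subsets groups
    groups <- split(sample(p), rep(seq_len(n_subsets), length.out = p))
    for (s in groups) {
      # Predict test labels using only the variables in subset s
      pred <- base_fit(x_train[, s, drop = FALSE], y_train,
                       x_test[, s, drop = FALSE])
      votes <- cbind(votes, as.character(pred))
    }
  }
  # Majority vote across all subset-level predictions
  apply(votes, 1, function(v) names(which.max(table(v))))
}

## Example base learner: linear discriminant analysis (MASS ships with R)
lda_base <- function(xtr, ytr, xte) {
  predict(MASS::lda(xtr, grouping = ytr), xte)$class
}
```

Any base classifier that maps (training variables, labels, test variables) to predicted labels can be plugged in, which is how the method stays agnostic to the classifier at its core.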


Notes

  1. RF: used with the default parameters of the randomForest package; see https://cran.r-project.org/web/packages/randomForest/index.html.

  2. Boosting: used with the default parameters of the adabag package; see https://cran.r-project.org/web/packages/adabag/index.html.

  3. XGBoost: used with max_depth = 4, eta = 0.5, nthread = 3, nrounds = 30, subsample = 0; see https://cran.r-project.org/web/packages/xgboost/index.html. The corresponding R calls are sketched after these notes.

  4. https://github.com/saeidamiri1/sagg.
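For concreteness, the settings quoted in Notes 1–3 translate into R roughly as sketched below. This is our own reconstruction, not the authors' benchmark scripts: iris is a stand-in data set, and since subsample = 0 as printed in Note 3 lies outside xgboost's valid range (0, 1], it is omitted from the call.

```r
## Hedged sketch of the benchmark settings in Notes 1-3; iris is a
## stand-in data set, not one used in the paper.
library(randomForest)
library(adabag)
library(xgboost)

rf_fit  <- randomForest(Species ~ ., data = iris)  # Note 1: defaults
ada_fit <- boosting(Species ~ ., data = iris)      # Note 2: defaults

## Note 3: max_depth = 4, eta = 0.5, nthread = 3, nrounds = 30.
## subsample = 0 as printed is invalid in xgboost, so it is left out.
y <- as.integer(iris$Species) - 1  # xgboost expects 0-based class labels
dtrain  <- xgb.DMatrix(as.matrix(iris[, 1:4]), label = y)
xgb_fit <- xgb.train(params = list(max_depth = 4, eta = 0.5, nthread = 3,
                                   objective = "multi:softmax", num_class = 3),
                     data = dtrain, nrounds = 30)
```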

References

  • Alfaro E, Gámez M, García N (2013) adabag: an R package for classification with boosting and bagging. J Stat Softw 54(2):1–35

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton

  • Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794

  • Croux C, Joossens K, Lemmens A (2007) Trimmed bagging. Comput Stat Data Anal 52(1):362–368

  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139

  • Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28(2):337–407

  • Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, Altman RB (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci 98(24):13784–13789

  • Gorman RP, Sejnowski TJ (1988) Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw 1(1):75–89

  • Gul A, Perperoglou A, Khan Z, Mahmoud O, Miftahuddin M, Adler W, Lausen B (2016) Ensemble of a subset of kNN classifiers. Adv Data Anal Classif 1–14

  • Hall P, Marron JS, Neeman A (2005) Geometric representation of high dimension, low sample size data. J R Stat Soc Ser B (Stat Methodol) 67(3):427–444

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844

  • Hothorn T, Lausen B (2003) Double-bagging: combining classifiers by bootstrap aggregation. Pattern Recogn 36(6):1303–1309

  • Johnson B (2013) High resolution urban land cover classification using a competitive multi-scale object-based approach. Remote Sens Lett 4(2):131–140

  • Lee S, Cho S (2001) Smoothed bagging with kernel bandwidth selectors. Neural Process Lett 14(2):157–168

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  • Lichman M (2013) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml

  • Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23

  • Soleymani M, Lee SMS (2014) Sequential combination of weighted and nonparametric bagging for classification. Biometrika 101(2):491–498

  • Ting KM, Wells JR, Tan SC, Teng SW, Webb GI (2011) Feature-subspace aggregating: ensembles for stable and unstable learners. Mach Learn 82:375–397

  • Venables WN, Ripley BD (2013) Modern applied statistics with S-PLUS. Springer Science & Business Media, Berlin

  • Zhu J, Zou H, Rosset S, Hastie T (2009) Multi-class AdaBoost. Stat Interface 2(3):349–360


Author information


Corresponding author

Correspondence to Saeid Amiri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Algorithm 1 (figure a): SAGG: Subspace Aggregating Algorithm

Algorithm 2 (figure b): SAGG-kNN: Subspace Aggregating Algorithm via kNN

Algorithm 3 (figure c): SAGG: Subspace Aggregating Algorithm with Fewer Misclassifications
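The three algorithm listings above appear as figures in the published article. As a textual stand-in, this hedged sketch instantiates Algorithm 2 (SAGG-kNN) by plugging kNN from the class package into the sagg_predict illustration given after the abstract; k = 3, n_subsets = 2, and B = 10 are arbitrary demonstration values, not the authors' settings.

```r
## SAGG-kNN sketch: reuses sagg_predict (defined after the abstract)
## with kNN from the class package as the base classifier.
library(class)

knn_base <- function(xtr, ytr, xte) {
  knn(train = xtr, test = xte, cl = ytr, k = 3)
}

set.seed(1)
tr   <- sample(nrow(iris), 100)
pred <- sagg_predict(as.matrix(iris[tr, 1:4]), iris$Species[tr],
                     as.matrix(iris[-tr, 1:4]), base_fit = knn_base,
                     n_subsets = 2, B = 10)
mean(pred == iris$Species[-tr])  # out-of-sample accuracy
```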

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Amiri, S., Modarres, R. A subspace aggregating algorithm for accurate classification. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01476-3


  • DOI: https://doi.org/10.1007/s00180-024-01476-3
