Abstract
We present a technique for learning via aggregation in supervised classification. The method improves classification performance regardless of which classifier is at its core. It exploits information hidden in subspaces by aggregating combinations of variables and is applicable to high-dimensional data sets. We provide algorithms that randomly divide the variables into smaller subsets and permute them before applying a classification method to each subset; the resulting predictions are then combined to determine class membership. Theoretical and simulation analyses consistently demonstrate the high accuracy of our classification methods, which prove significantly more effective than aggregating observations through sampling. Through extensive simulations, we evaluate the accuracy of various classification methods, and we apply our techniques to five real-world data sets to further illustrate their effectiveness.
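The procedure outlined in the abstract — randomly permute the variable indices, partition them into smaller subsets, run a base classifier on each subset, and combine the subset predictions by majority vote — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-centroid rule (`centroid_classify`) stands in for an arbitrary base classifier, and all function names and parameters here are illustrative.

```python
import random
from collections import Counter

def centroid_classify(train_X, train_y, test_x, cols):
    # Stand-in base classifier: nearest class centroid, restricted to
    # the variable subset given by `cols`.
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def sq_dist(label):
        return sum((test_x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))

    return min(centroids, key=sq_dist)

def subspace_aggregate(train_X, train_y, test_x, n_subsets=3, seed=0):
    # 1. Randomly permute the variable indices.
    rng = random.Random(seed)
    idx = list(range(len(train_X[0])))
    rng.shuffle(idx)
    # 2. Partition the permuted indices into `n_subsets` smaller subsets.
    subsets = [idx[i::n_subsets] for i in range(n_subsets)]
    # 3. Classify on each subspace, then 4. combine by majority vote.
    votes = [centroid_classify(train_X, train_y, test_x, cols) for cols in subsets]
    return Counter(votes).most_common(1)[0][0]
```

An odd `n_subsets` avoids ties in the two-class majority vote; in practice the base classifier would be one of the methods compared in the paper (e.g. a tree-based learner) rather than a centroid rule.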
Notes
RF: used with the default parameters; see https://cran.r-project.org/web/packages/randomForest/index.html.
Boosting: used with the default parameters; see https://cran.r-project.org/web/packages/adabag/index.html.
XGBoost: used with max_depth = 4, eta = 0.5, nthread = 3, nrounds = 30, subsample = 0; see https://cran.r-project.org/web/packages/xgboost/index.html.
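For reference, the XGBoost configuration listed above can be written out as a parameter mapping. This is a sketch in Python rather than the R code used in the paper; the parameter names follow the standard xgboost interface, and the values are copied verbatim from the note.

```python
# XGBoost settings as reported in the note above.
xgb_params = {
    "max_depth": 4,   # maximum depth of each tree
    "eta": 0.5,       # learning rate (shrinkage)
    "nthread": 3,     # number of threads used during training
    "subsample": 0,   # row-subsampling setting as reported
}
nrounds = 30          # number of boosting rounds
```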
References
Alfaro E, Gámez M, García N (2013) adabag: an R package for classification with boosting and bagging. J Stat Softw 54(2):1–35
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Chicago
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 785–794
Croux C, Joossens K, Lemmens A (2007) Trimmed bagging. Comput Stat Data Anal 52(1):362–368
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28(2):337–407
Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, Altman RB (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci 98(24):13784–13789
Gorman RP, Sejnowski TJ (1988) Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw 1(1):75–89
Gul A, Perperoglou A, Khan Z, Mahmoud O, Miftahuddin M, Adler W, Lausen B (2016) Ensemble of a subset of kNN classifiers. Adv Data Anal Classif 1–14
Hall P, Marron JS, Neeman A (2005) Geometric representation of high dimension, low sample size data. J R Stat Soc Ser B (Statistical Methodology) 67(3):427–444
Hastie T, Tibshirani R, Friedman J (2021) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, Berlin
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
Hothorn T, Lausen B (2003) Double-bagging: combining classifiers by bootstrap aggregation. Pattern Recogn 36(6):1303–1309
Johnson B (2013) High resolution urban land cover classification using a competitive multi-scale object-based approach. Remote Sens Lett 4(2):131–140
Lee S, Cho S (2001) Smoothed bagging with kernel bandwidth selectors. Neural Process Lett 14(2):157–168
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml
Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23
Soleymani M, Lee SMS (2014) Sequential combination of weighted and nonparametric bagging for classification. Biometrika 101(2):491–498
Ting KM, Wells JR, Tan SC, Teng SW, Webb GI (2011) Feature-subspace aggregating: ensembles for stable and unstable learners. Mach Learn 82:375–397
Venables WN, Ripley BD (2013) Modern applied statistics with S-PLUS. Springer Science & Business Media, Berlin
Zhu J, Zou H, Rosset S, Hastie T (2009) Multi-class adaboost. Stat Interface 2(3):349–360
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Cite this article
Amiri, S., Modarres, R. A subspace aggregating algorithm for accurate classification. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01476-3