A method for selecting the relevant dimensions for high-dimensional classification in singular vector spaces

Tadesse, Dawit G.; Carpenter, Mark

doi:10.1007/s11634-018-0311-8

Regular Article
Published: 25 January 2018

Advances in Data Analysis and Classification, Volume 13, pages 405–426 (2019)

Dawit G. Tadesse¹ &
Mark Carpenter²

Abstract

In this paper, we give a new feature selection algorithm for the binary classification problem in sparse high-dimensional spaces. Singular value decomposition (SVD) is a popular dimension reduction method in high-dimensional classification. The traditional SVD method begins by ranking the singular dimensions (SDs) from the largest singular value to the smallest. However, when signal features are outnumbered by noise features, the first few ranked SDs are not necessarily the best for classification. We demonstrate, theoretically and empirically, that our method efficiently selects the SDs most appropriate for classification and significantly reduces the misclassification error. We also apply our method to real-data text mining applications.
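The abstract's central observation, that the SDs with the largest singular values need not be the ones that separate the classes, can be illustrated with a small simulation. The sketch below is not the authors' algorithm, which the abstract does not specify. It substitutes a simple stand-in criterion, the absolute Welch two-sample t-statistic of each SD's scores, and the function name `rank_sds_by_t`, the simulated data, and all parameter values are assumptions made only for illustration.

```python
import numpy as np

def rank_sds_by_t(X, y, tol=1e-10):
    """Rank singular dimensions by |Welch t-statistic| between two classes.

    Illustrative stand-in criterion only; not the method of the paper.
    """
    Xc = X - X.mean(axis=0)                      # center the features
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = s > tol * s[0]                        # drop numerically null SDs
    scores = U[:, keep] * s[keep]                # sample scores on each SD
    a, b = scores[y == 0], scores[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    order = np.argsort(-np.abs(t))               # most discriminative SD first
    return order, s[keep], t

# Simulated data (assumed for illustration): three high-variance noise
# features, one lower-variance feature carrying the class signal, and
# the remaining features low-variance noise.
rng = np.random.default_rng(0)
n, p = 200, 20
y = np.repeat([0, 1], n // 2)
stds = np.full(p, 0.5)
stds[:3] = [10.0, 8.0, 6.0]
stds[3] = 1.0
X = rng.normal(size=(n, p)) * stds
X[:, 3] += np.where(y == 1, 2.0, -2.0)           # class mean shift on feature 3

order, s, t = rank_sds_by_t(X, y)
print("Top SDs by singular value:", np.arange(5))
print("Top SDs by |t|:           ", order[:5])
```

In this setup the leading SDs by singular value are dominated by the high-variance noise features, so truncating to the first few SDs would discard most of the class signal, whereas a class-aware ranking promotes the SD that carries it.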

Acknowledgements

The authors would like to thank the associate editor and referees for their helpful comments and suggestions.

Author information

Authors and Affiliations

¹ Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, 3333 Burnet Ave, Cincinnati, OH, 45229, USA (Dawit G. Tadesse)
² Mathematics and Statistics, Auburn University, 221 Parker Hall, Auburn, AL, 36849, USA (Mark Carpenter)

Corresponding author

Correspondence to Dawit G. Tadesse.

About this article

Cite this article

Tadesse, D.G., Carpenter, M. A method for selecting the relevant dimensions for high-dimensional classification in singular vector spaces. Adv Data Anal Classif 13, 405–426 (2019). https://doi.org/10.1007/s11634-018-0311-8

Received: 10 September 2014
Revised: 30 November 2017
Accepted: 16 January 2018
Published: 25 January 2018
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s11634-018-0311-8

Mathematics Subject Classification

62H30
