Abstract
Dimensionality reduction is an important aspect of pattern classification, and linear discriminant analysis (LDA) is one of the most widely studied dimensionality reduction techniques. Variants of the LDA technique that address the small sample size (SSS) problem have been applied in many research areas, e.g. face recognition, bioinformatics and text recognition. Improving the performance of these variants therefore has great potential in various fields of research. In this paper, we present an overview of these methods. We cover the types, characteristics and taxonomy of the methods that can overcome the SSS problem. We also highlight some important datasets and software packages.
Notes
These four spaces can also be represented in Fig. 2 without performing a preprocessing step. In that case, \(r_{t}\) in the figure will be replaced by the dimensionality \(d\), and the size of the spaces will change accordingly.
For this experiment, we first project the original feature vectors onto the range space of the \({\mathbf{S}}_{T}\) matrix as a pre-processing step. Each of the spaces is then used individually for dimensionality reduction, and a test feature vector is classified with the nearest neighbor classifier. Performance is reported as average classification accuracy, obtained by \(k\)-fold cross-validation with \(k = 5\). Details of the datasets are given in Sect. 10.1.
For more face datasets, see Gross [19], Zhao et al. [70] and http://www.face-rec.org/databases/. For biomedical data, see the Kent Ridge Bio-medical Repository (http://datam.i2r.a-star.edu.sg/datasets/krbd/).
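The evaluation protocol described in the notes above (projection onto the range space of \({\mathbf{S}}_{T}\), nearest neighbor classification, 5-fold cross-validation) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: all function names and the toy Gaussian data are hypothetical, the range space is obtained by Gram-Schmidt on the mean-centred training vectors (which span the same space as the range of \({\mathbf{S}}_{T}\)) rather than by an eigendecomposition, and the projection is refitted on each training fold. The subspace-specific dimensionality reduction step is omitted.

```python
import math
import random


def make_toy_data(n_per_class=10, d=50, seed=1):
    """Hypothetical toy data: two Gaussian classes with d > n (an SSS setting)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 * label, 1.0) for _ in range(d)])
            y.append(label)
    return X, y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def range_space_basis(X, rel_tol=1e-8):
    """Orthonormal basis of range(S_T), i.e. the span of the centred samples."""
    mu = [sum(col) / len(X) for col in zip(*X)]
    basis = []
    for x in X:
        v = [a - m for a, m in zip(x, mu)]
        scale = math.sqrt(dot(v, v))
        for b in basis:  # modified Gram-Schmidt: remove components one by one
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if scale > 0 and norm > rel_tol * scale:  # skip dependent vectors
            basis.append([vi / norm for vi in v])
    return mu, basis


def project(x, mu, basis):
    """Coordinates of the centred vector x - mu in the given basis."""
    centred = [a - m for a, m in zip(x, mu)]
    return [dot(centred, b) for b in basis]


def nearest_neighbor(train_Z, train_y, z):
    """Label of the training point closest to z in Euclidean distance."""
    best = min(
        range(len(train_Z)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_Z[i], z)),
    )
    return train_y[best]


def cross_validate(X, y, k=5, seed=0):
    """Average k-fold accuracy; the projection is fitted on each training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        mu, basis = range_space_basis([X[i] for i in train])
        train_Z = [project(X[i], mu, basis) for i in train]
        train_y = [y[i] for i in train]
        correct = sum(
            nearest_neighbor(train_Z, train_y, project(X[i], mu, basis)) == y[i]
            for i in fold
        )
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


if __name__ == "__main__":
    X, y = make_toy_data()
    print("mean 5-fold accuracy:", cross_validate(X, y, k=5))
```

Because the centred training vectors sum to zero, the basis has at most \(n - 1\) vectors for \(n\) training samples, so the projection already shrinks a \(d\)-dimensional problem to at most \(n - 1\) dimensions before any discriminant step is applied.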
References
Aas K, Eikvil L (1999) Text categorization: a survey. Norwegian Computing Center Report NR 941
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96:6745–6750
Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30:41–47
Beer DG, Kardia SLR, Huang C-C, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JMG, Iannettoni MD, Orringer MB, Hanash S (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8:816–824
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Machine Intell 19(7):711–720
Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma sub-classes. PNAS 98(24):13790–13795
Blake CL, Merz CJ (1998) UCI repository of machine learning databases, Irvine, CA, University of Calif., Dept. of Information and Comp. Sci. http://www.ics.uci.edu/~mlearn
Cevikalp H, Neamtu M, Wilkes MA, Barkana A (2005) Discriminative common vectors for face recognition. IEEE Trans Pattern Anal Machine Intell 27(1):4–13
Chen L-F, Liao H-YM, Ko M-T, Lin J-C, Yu G-J (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recogn 33:1713–1726
Chu D, Thye GS (2010) A new and fast implementation for null space based linear discriminant analysis. Pattern Recogn 43:1373–1379
Cui X, Zhao H, Wilson J (2010) Optimized ranking and selection methods for feature selection with application in microarray experiments. J Biopharm Stat 20(2):223–239
Dai DQ, Yuen PC (2007) Face recognition by regularized discriminant analysis. IEEE Trans SMC Part B 37(4):1080–1085
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77–87
Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84(405):165–175
Fukunaga K (1990) Introduction to statistical pattern recognition. Academic Press Inc., Harcourt Brace Jovanovich, San Diego, CA
Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16:906–914
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Gordon GJ, Jensen RV, Hsiao L-L, Gullans SR, Blumenstock JE, Ramaswamy S, Richards WG, Sugarbaker DJ, Bueno R (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62:4963–4967
Gross R (2005) Face databases. In: Handbook of face recognition. Springer, New York, pp 301–327
Hastie T, Buja A, Tibshirani R (1995) Penalized discriminant analysis. Ann Stat 23:73–102
Huang R, Liu Q, Lu H, Ma S (2002) Solving the small sample size problem of LDA. Proc ICPR 3(2002):29–32
Jiang X, Mandal B, Kot A (2008) Eigenfeature regularization and extraction in face recognition. IEEE Trans Pattern Anal Machine Intell 30(3):383–394
Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural network. Nat Med 7:673–679
Lewis DD (1999) Reuters-21578 text categorization test collection distribution 1.0. http://www.daviddlewis.com/resources/testcollections/reuters21578/
Li H, Zhang K, Jiang T (2005) Robust and accurate cancer classification with gene expression profiling. In: Proceedings of IEEE Comput. Syst. Bioinform. Conf., pp 310–321
Li H, Jiang T, Zhang K (2003) Efficient and robust feature extraction by maximum margin criterion. In: Advances in neural information processing systems
Liu J, Chen SC, Tan XY (2007) Efficient pseudo-inverse linear discriminant analysis and its nonlinear form for face recognition. Int J Pattern Recogn Artif Intell 21(8):1265–1278
Lu J, Plataniotis K, Venetsanopoulos A (2003) Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans Neural Netw 14(1):117–126
Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Regularized discriminant analysis for the small sample. Pattern Recogn Lett 24:3079–3087
Lu J, Plataniotis KN, Venetsanopoulos AN (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26(2):181–191
Mak MW, Kung SY (2006) A solution to the curse of dimensionality problem in pairwise scoring techniques. In: Int. Conf. on Neural Info. Process. (ICONIP’06), pp 314–323
Martinez AM (2002) Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Trans Pattern Anal Machine Intell 24(6):748–763
Moghaddam B, Weiss Y, Avidan S (2006) Generalized spectral bounds for sparse LDA. In: Int. Conf. Mach. Learn., ICML’06, pp 641–648
Paliwal KK, Sharma A (2012) Improved pseudoinverse linear discriminant analysis method for dimensionality reduction. Int J Pattern Recogn Artif Intell 26(1):1250002-1–1250002-9
Paliwal KK, Sharma A (2011) Approximate LDA technique for dimensionality reduction in the small sample size case. J Pattern Recogn Res 6(2):298–306
Paliwal KK, Sharma A (2010) Improved direct LDA and its application to DNA microarray gene expression data. Pattern Recogn Lett 31:2489–2492
Petricoin EF III, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg MS, Mills GB, Simone C, Fishman DA, Kohn EC, Liotta LA (2002) Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359:572–577
Phillips PJ, Moon H, Rauss PJ, Rizvi S (2000) The FERET evaluation methodology for face recognition algorithms. IEEE Trans Pattern Anal Mach Intell 22(10):1090–1104
Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JYH, Goumnerova LC, Black PM, Lau C, Allen JC, Zagzag D, Olson JM, Curran T, Wetmore C, Biegel JA, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis DN, Mesirov JP, Lander ES, Golub TR (2002) Gene expression-based classification and outcome prediction of central nervous system embryonal tumors. Nature 415:436–442
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang C-H, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 98(26):15149–15154
Samaria F, Harter A (1994) Parameterization of a stochastic model for human face identification. In: Proceedings of the Second IEEE Workshop Appl. of Comp. Vis., pp 138–142
Sanderson C, Paliwal KK (2003) Fast features for face authentication under illumination direction changes. Pattern Recogn Lett 24:2409–2419
Sharma A, Paliwal KK (2006) Class-dependent PCA, LDA and MDC: a combined classifier for pattern classification. Pattern Recogn 39(7):1215–1229
Sharma A, Paliwal KK (2007) Fast principal component analysis using fixed-point algorithm. Pattern Recogn Lett 28(10):1151–1155
Sharma A, Paliwal KK (2008) Cancer classification by gradient LDA technique using microarray gene expression data. Data Knowl Eng 66(2):338–347
Sharma A, Paliwal KK (2008) Rotational linear discriminant analysis technique for dimensionality reduction. IEEE Trans Knowl Data Eng 20(10):1336–1347
Sharma A, Paliwal KK (2010) Regularisation of eigenfeatures by extrapolation of scatter-matrix in face-recognition problem. Electron Lett IEEE 46(10):450–475
Sharma A, Imoto S, Miyano S, Sharma V (2011) Null space based feature selection method for gene expression data. Int J Mach Learn Cybernet. doi:10.1007/s13042-011-0061-9
Sharma A, Paliwal KK (2012) A new perspective to null linear discriminant analysis method and its fast implementation using random matrix multiplication with scatter matrices. Pattern Recogn 45:2205–2213
Sharma A, Paliwal KK (2012) A two-stage linear discriminant analysis for face-recognition. Pattern Recogn Lett 33:1157–1162
Sharma A, Imoto S, Miyano S (2012) A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans Comput Biol Bioinf 9(3):754–764
Sharma A, Imoto S, Miyano S (2012) A between-class overlapping filter-based method for transcriptome data analysis. J Bioinf Comput Biol 10(5):1250010-1–1250010-20
Sharma A, Imoto S, Miyano S (2012) A filter based feature selection algorithm using null space of covariance matrix for DNA microarray gene expression data. Curr Bioinf 7(3):6
Sharma A, Paliwal KK, Imoto S, Miyano S (2013) A feature selection method using improved regularized linear discriminant analysis. Mach Vis Appl. doi:10.1007/s00138-013-0577-y
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1:203–209
Song F, Zhang D, Wang J, Liu H, Tao Q (2007) A parameterized direct LDA and its application to face recognition. Neurocomputing 71:191–196
Swets DL, Weng J (1996) Using discriminative eigenfeatures for image retrieval. IEEE Trans Pattern Anal Mach Intell 18(8):831–836
Thomaz CE, Kitani EC, Gillies DF (2005) A maximum uncertainty LDA-based approach for limited sample size problems with application to face recognition. In: Proceedings of 18th Brazilian Symp. On Computer Graphics and Image Processing, (IEEE CS Press), pp 89–96
Tian Q, Barbero M, Gu ZH, Lee SH (1986) Image classification by the Foley-Sammon transform. Opt Eng 25(7):834–840
van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AMH, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536
Witten IH, Frank E (2000) Data mining: practical machine learning tools with java implementations. Morgan Kaufmann, San Francisco. http://www.cs.waikato.ac.nz/ml/weka/
Ye J (2005) Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. J Mach Learn Res 6:483–502
Ye J, Janardan R, Li Q, Park H (2004) Feature extraction via generalized uncorrelated linear discriminant analysis. In: The Twenty-First International Conference on Machine Learning, pp 895–902
Ye J, Li Q (2005) A two-stage linear discriminant analysis via QR-decomposition. IEEE Trans Pattern Anal Mach Intell 27(6):929–941
Ye J, Xiong T (2006) Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res 7:1183–1204
Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, Behm FG, Raimondi SC, Relling MV, Patel A, Cheng C, Campana D, Wilkins D, Zhou X, Li J, Liu H, Pui CH, Evans WE, Naeve C, Wong L, Downing JR (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1(2):133–143
Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data-with application to face recognition. Pattern Recogn 34:2067–2070
Zhao W, Chellappa R, Krishnaswamy A (1998) Discriminant analysis of principal components for face recognition. In: Proceedings of Third Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, pp 336–341
Zhao W, Chellappa R, Phillips PJ (1999) Subspace linear discriminant analysis for face recognition, Technical Report CAR-TR-914, CS-TR-4009. University of Maryland at College Park, USA
Zhao W, Chellappa R, Phillips PJ (2003) Face recognition: a literature survey. ACM Comput Surv 35(4):399–458
Cite this article
Sharma, A., Paliwal, K.K. Linear discriminant analysis for the small sample size problem: an overview. Int. J. Mach. Learn. & Cyber. 6, 443–454 (2015). https://doi.org/10.1007/s13042-013-0226-9
DOI: https://doi.org/10.1007/s13042-013-0226-9