A feature selection method using improved regularized linear discriminant analysis

Original Paper, Machine Vision and Applications

Abstract

Investigation of genes using data analysis and computer-based methods has gained widespread attention for solving the human cancer classification problem. DNA microarray gene expression datasets are readily utilized for this purpose. In this paper, we propose a feature selection method using an improved regularized linear discriminant analysis technique to select important genes that are crucial for the human cancer classification problem. Experiments are conducted on several DNA microarray gene expression datasets, and promising results are obtained when compared with several other existing feature selection methods.

Notes

  1. SVM-RFE [15] is a wrapper-based method. It is an iterative method that works backward from an initial set of features: at each iteration, the SVM finds the maximum-margin hyperplane between the two classes to minimize classification error using some kernel function, and the features with the smallest weights are eliminated. A minimal code sketch of this procedure is given at the end of these notes.

  2. Since RLDA or improved RLDA is a method for solving the small sample size (SSS) problem, the value of \(q\) has to lie in the interval \((n, d)\).

  3. Most of the datasets are downloaded from the Kent Ridge Bio-medical Dataset (KRBD) repository (http://datam.i2r.a-star.edu.sg/datasets/krbd/). The datasets are transformed or reformatted and made available by the KRBD repository, and we have used them without any further preprocessing. Some datasets that are not available in the KRBD repository are downloaded directly from the respective authors' supplementary links. The URLs for all the datasets are given in the References section.

  4. The cross-validation-based results are shown in Appendix A. A comparison of improved RLDA with different values of the regularization parameter is shown in Appendix B.

    Table 2 The classification accuracy of various feature selection methods using four distinct classifiers on the SRBCT dataset
    Table 3 The classification accuracy of various feature selection methods using four distinct classifiers on the MLL dataset
    Table 4 The classification accuracy of various feature selection methods using four distinct classifiers on the Acute Leukemia dataset
  5. Note that for all the feature selection methods except the Lasso method, the number of selected features is 150 (in Tables 2, 3 and 4). The Lasso method itself determines the optimal number of selected features and therefore cannot be constrained to a predefined number of selected features.

  6. Ingenuity Pathway Analysis (IPA) (http://www.ingenuity.com) is a software tool that helps researchers to model, analyze, and understand the complex biological and chemical systems at the core of life science research. IPA has been broadly adopted by the life science research community. IPA helps to understand complex 'omics data at multiple levels by integrating data from a variety of experimental platforms and providing insight into the molecular and chemical interactions, cellular phenotypes, and disease processes of the system. IPA provides insight into the causes of observed gene expression changes and into the predicted downstream biological effects of those changes. Even if experimental data are not available, IPA can be used to intelligently search the Ingenuity Knowledge Base for information on genes, proteins, chemicals, drugs, and molecular relationships to build biological models or to get up to speed in a relevant area of research. IPA provides the right biological context to facilitate informed decision-making, advance research project design, and generate new testable hypotheses.
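As a rough illustration of the SVM-RFE procedure described in Note 1 (not the implementation used in [15] or in this paper), the following sketch uses scikit-learn's RFE with a linear-kernel SVM to rank genes and retain the top 150. The data shapes, labels, and parameter values are placeholders.

```python
# Illustrative SVM-RFE sketch (assumed setup, not the authors' code).
# X: (n_samples x n_genes) expression matrix, y: class labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(83, 2308))    # placeholder expression matrix (e.g., SRBCT-sized)
y = rng.integers(0, 4, size=83)    # placeholder class labels

# A linear-kernel SVM provides feature weights; RFE repeatedly drops
# the genes with the smallest weights until 150 remain.
svm = SVC(kernel="linear", C=1.0)
selector = RFE(estimator=svm, n_features_to_select=150, step=0.1)
selector.fit(X, y)

selected_genes = np.where(selector.support_)[0]
print(len(selected_genes), "genes selected")
```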

References

  1. Anton, H.: Calculus. Wiley, New York (1995)

  2. Armstrong, S.A., Staunton, J.E., Silverman, L.B., Pieters, R., den Boer, M.L., Minden, M.D., Sallan, S.E., Lander, E.S., Golub, T.R., Korsemeyer, S.J.: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 30, 41–47 (2002). [Data Source1: http://sdmc.lit.org.sg/GEDatasets/Datasets.html] [Data Source2: http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=63]

  3. Banerjee, M., Mitra, S., Banka, H.: Evolutionary-rough feature selection in gene expression data. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 37, 622–632 (2007)

  4. Cong G., Tan K.-L., Tung A.K.H., Xu X.: Mining top-k covering rule groups for gene expression data. In: The ACM SIGMOD International Conference on Management of Data, pp. 670–681 (2005)

  5. Dai, D.Q., Yuen, P.C.: Regularized discriminant analysis and its application to face recognition. Pattern Recognit. 36(3), 845–847 (2003)

  6. Dai, D.Q., Yuen, P.C.: Face recognition by regularized discriminant analysis. IEEE Trans. SMC 37(4), 1080–1085 (2007)

  7. Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. J. Bioinf. Comput. Biol. 523–529 (2003)

  8. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

  9. Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discriminant methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97, 77–87 (2002)

  10. Friedman, J.H.: Regularized discriminant analysis. J. Am. Stat. Assoc. 84(405), 165–175 (1989)

  11. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press, London (1990)

  12. Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906–914 (2000)

  13. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999). [Data Source: http://datam.i2r.a-star.edu.sg/datasets/krbd/]

  14. Guo, Y., Hastie, T., Tibshirani, R.: Regularized discriminant analysis and its application in microarrays. Biostatistics 8(1), 86–100 (2007)

  15. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)

  16. Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning. Springer, NY (2001)

  17. Huang, R., Liu, Q., Lu, H., Ma, S.: Solving the small sample size problem of LDA. Proc. ICPR 3, 29–32 (2002)

  18. Huang, Y., Xu, D., Nie, F.: Semi-supervised dimension reduction using trace ratio criterion. IEEE Trans. Neural Netw. Learn. Syst. 23(3), 519–526 (2012)

  19. Huang, Y., Xu, D., Nie, F.: Patch distribution compatible semi-supervised dimension reduction for face and human gait recognition. IEEE Trans. Circuits Syst. Video Technol. 22(3), 479–488 (2012)

  20. Khan, J., Wei, J.S., Ringner, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., Meltzer, P.S.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural network. Nat. Med. 7, 673–679 (2001). [Data Source: http://research.nhgri.nih.gov/microarray/Supplement/]

  21. Li, J., Wong, L.: Using rules to analyse bio-medical data: a comparison between C4.5 and PCL. In: Advances in Web-Age Information Management, pp. 254–265. Springer, Berlin (2003)

  22. Liu, J., Chen, S.C., Tan, X.Y.: Efficient pseudo-inverse linear discriminant analysis and its nonlinear form for face recognition. Int. J. Patt. Recogn. Artif. Intell. 21(8), 1265–1278 (2007)

  23. Nie, F., Huang, H., Cai X., Ding, C.: Efficient and robust feature selection via joint \(l_{2,1} \)-norms minimization, NIPS (2010)

  24. Pan, W.: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18, 546–554 (2002)

  25. Pavlidis, P., Weston, J., Cai, J. and Grundy, W.N.: Gene functional classification from heterogeneous data. In: International Conference on Computational Biology, pp. 249–255 (2001)

  26. Peng, H., Long, F., Dong, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)

  27. Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)

  28. Sharma, A., Imoto, S., Miyano, S.: A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans. Computat. Biol. Bioinf. 9(3), 754–764 (2012)

  29. Sharma, A., Imoto, S., Miyano, S.: A between-class overlapping filter-based method for transcriptome data analysis. J. Bioinf. Computat. Biol. 10(5), 1250010-1–1250010-20 (2012)

  30. Sharma, A., Imoto, S., Miyano, S., Sharma, V.: Null space based feature selection method for gene expression data. Int. J. Mach. Learn. Cybern. 3(4), 269–276 (2012). doi:10.1007/s13042-011-0061-9

  31. Sharma, A., Koh, C.H., Imoto, S., Miyano, S.: Strategy of finding optimal number of features on gene expression data. IEE. Electron. Lett. 47(8), 480–482 (2011)

  32. Sharma, A., Paliwal, K.K.: Fast principal component analysis using fixed-point algorithm. Pattern Recognit. Lett. 28(10), 1151–1155 (2007)

  33. Sharma, A., Paliwal, K.K.: Rotational linear discriminant analysis for dimensionality reduction. IEEE Trans. Knowl. Data Eng. 20(10), 1336–1347 (2008)

  34. Sharma, A., Paliwal, K.K.: A gradient linear discriminant analysis for small sample sized problem. Neural Process. Lett. 27(1), 17–24 (2008)

  35. Sharma, A., Paliwal, K.K.: A new perspective to null linear discriminant analysis method and its fast implementation using random matrix multiplication with scatter matrices. Pattern Recognit. 45, 2205–2213 (2012)

  36. Sharma, A., Lyons, J., Dehzangi, A., Paliwal, K.K.: A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition. J. Theoret. Biol. 320(7), 41–46 (2013)

  37. Sharma, A., Paliwal, K.K., Imoto, S., Miyano, S., Sharma, V., Ananthanarayanan, R.: A feature selection method using fixed-point algorithm for DNA microarray gene expression data. Int. J. Knowl. Based Intell. Eng. Syst. (2013, accepted)

  38. Su, Y., Murali, T.M., Pavlovic, V., Kasif, S.: RankGene: identification of diagnostic genes based on expression data, Bioinformatics, pp. 1578–1579 (2003)

  39. Tan, A.C., Gilbert, D.: Ensemble machine learning on gene expression data for cancer classification. Appl. Bioinf. 2(3 Suppl), S75–83 (2003)

  40. Tao, L., Zhang, C., Ogihara, M.: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(14), 2429–2437 (2004)

  41. Thomas, J., Olson, J.M., Tapscott, S.J., Zhao, L.P.: An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res. 11, 1227–1236 (2001)

  42. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58(1), 267–288 (1996)

  43. Wang, A., Gehan, E.A.: Gene selection for microarray data analysis using principal component analysis. Stat. Med. 24, 2069–2087 (2005)

  44. Wu, G., Xu, W., Zhang, Y., Wei, Y.: A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining. Data Mining Knowl. Discov. (2011). doi:10.1007/s10618-011-0245-7

  45. Xu, D., Yan, S.: Semi-supervised bilinear subspace learning. IEEE Trans. Image Process. 18(7), 1671–1676 (2009)

  46. Zhou, L., Wang, L., Shen, C., Barnes, N.: Hippocampal shape classification using redundancy constrained feature selection. Medical Image Computing and Computer-Assisted Intervention, MICCAI 2010. In: Lecture Notes in Computer Science, vol. 6362, pp. 266–273. Springer, Berlin (2010)

Author information

Correspondence to Alok Sharma.

Appendices

Appendix A

In this section, we use a cross-validation procedure to compute the average classification accuracy using four distinct classifiers and the proposed feature selection method. Three datasets are used for this purpose: SRBCT, MLL, and Acute Leukemia. The classification accuracies for \(k=5\) and \(k=10\) folds are given in Tables 10, 11 and 12. It can be observed that the classification accuracy obtained by the \(k\)-fold cross-validation procedure is comparable to the classification accuracy reported in Tables 2-4.

Table 10 \(k\)-fold cross-validation using improved RLDA and four distinct classifiers on the SRBCT dataset
Table 11 \(k\)-fold cross-validation using improved RLDA and four distinct classifiers on the MLL dataset
Table 12 \(k\)-fold cross-validation using improved RLDA and four distinct classifiers on the Acute Leukemia dataset
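The cross-validation protocol summarized above can be sketched as follows. This is only a minimal illustration: the feature-selection step (SelectKBest with an F-test score) and the single kNN classifier are stand-ins for the improved RLDA selection and the four classifiers actually used for Tables 10-12.

```python
# Minimal k-fold cross-validation sketch (assumed stand-in components).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier

def cv_accuracy(X, y, k=5, n_features=150):
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        # Feature selection is fit on the training fold only, to avoid selection bias.
        selector = SelectKBest(f_classif, k=min(n_features, X.shape[1]))
        selector.fit(X[train_idx], y[train_idx])
        clf = KNeighborsClassifier(n_neighbors=3)
        clf.fit(selector.transform(X[train_idx]), y[train_idx])
        scores.append(clf.score(selector.transform(X[test_idx]), y[test_idx]))
    return float(np.mean(scores))
```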

Appendix B

In this appendix, we compare RLDA using different fixed values of the regularization parameter with the proposed improved RLDA technique. To show this, we computed the classification accuracy of the RLDA technique for four different values of \(\alpha\), given by \(\delta =[0.001, 0.01, 0.1, 1]\), where \(\alpha =\delta \lambda _{\mathrm{W}}\) and \(\lambda _{\mathrm{W}}\) is the maximum eigenvalue of the within-class scatter matrix. We applied a threefold cross-validation procedure on a number of datasets, and the results are shown in columns 2–5 of Table 13. The last column of the table gives the classification accuracy obtained using the improved RLDA technique.

Table 13 Classification accuracy (in percentage) of RLDA and improved RLDA

It can be observed from the table that different values of the regularization parameter give different classification accuracies; the choice of the regularization parameter therefore affects the classification performance, and it is important to select it correctly to obtain good classification performance. It can also be observed that, for all the datasets, the proposed technique exhibits promising results.
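For reference, the relation \(\alpha =\delta \lambda _{\mathrm{W}}\) used above can be sketched as below. This is a minimal illustration assuming raw class-labelled data; it is not the authors' implementation, and the function and parameter names are placeholders.

```python
# Sketch: regularized within-class scatter S_W + alpha*I with alpha = delta * lambda_W,
# where lambda_W is the largest eigenvalue of the within-class scatter matrix S_W.
import numpy as np

def regularized_within_scatter(X, y, delta=0.01):
    d = X.shape[1]
    S_W = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        Xc_centered = Xc - Xc.mean(axis=0)
        S_W += Xc_centered.T @ Xc_centered
    lambda_W = np.linalg.eigvalsh(S_W)[-1]   # maximum eigenvalue (S_W is symmetric)
    alpha = delta * lambda_W
    return S_W + alpha * np.eye(d), alpha
```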

Appendix C

Corollary 1

The value of the regularization parameter is non-negative; i.e., \(\alpha \ge 0\) for \(r_w \le r_t\), where \(r_t =\mathrm{rank}(\mathbf{S}_{\mathrm{T}})\) and \(r_w =\mathrm{rank}(\mathbf{S}_{\mathrm{W}})\).

Proof

From Eq. 2, we can write

$$\begin{aligned} J=\frac{{\mathbf{w}}^\mathrm{T}{\mathbf{S}}_{\mathrm{B}} {\mathbf{w}}}{{\mathbf{w}}^\mathrm{T}({\mathbf{S}}_{\mathrm{W}} +\alpha {\mathbf{I}}){\mathbf{w}}}, \end{aligned}$$
(11)

where \({\mathbf{S}}_{\mathrm{B}} \in {\mathbb {R}}^{r_t \times r_t }\) and \({\mathbf{S}}_{\mathrm{W}} \in {\mathbb {R}}^{r_t \times r_t }\). We can rearrange the above expression as

$$\begin{aligned} {\mathbf{w}}^\mathrm{T}{\mathbf{S}}_\mathrm{B} {\mathbf{w}}=J{\mathbf{w}}^\mathrm{T}({\mathbf{S}}_{\mathrm{W}} +\alpha {\mathbf{I}}){\mathbf{w}} \end{aligned}$$
(12)

The eigenvalue decomposition (EVD) of the \(\mathbf{S}_{\mathrm{W}}\) matrix (assuming \(r_w < r_t\)) can be written as \(\mathbf{S}_{\mathrm{W}} = \mathbf{U}\mathbf{\Lambda}^2\mathbf{U}^\mathrm{T}\), where \(\mathbf{U} \in {\mathbb{R}}^{r_t \times r_t}\) is an orthogonal matrix, and \(\mathbf{\Lambda}^2 = \begin{bmatrix} \mathbf{\Lambda}_w^2 & 0 \\ 0 & 0 \end{bmatrix} \in {\mathbb{R}}^{r_t \times r_t}\) and \(\mathbf{\Lambda}_w^2 = \mathrm{diag}(q_1^2, q_2^2, \ldots, q_{r_w}^2) \in {\mathbb{R}}^{r_w \times r_w}\) are diagonal matrices (as \(r_w < r_t\)). The eigenvalues satisfy \(q_k^2 > 0\) for \(k = 1, 2, \ldots, r_w\). Therefore,

$$\mathbf{S}_{\mathrm{W}}^{\prime} = (\mathbf{S}_{\mathrm{W}} + \alpha\mathbf{I}) = \mathbf{U}\mathbf{D}\mathbf{U}^\mathrm{T}, \quad \text{where } \mathbf{D} = \mathbf{\Lambda}^2 + \alpha\mathbf{I},$$
or equivalently
$$\mathbf{D}^{-1/2}\mathbf{U}^\mathrm{T}\mathbf{S}_{\mathrm{W}}^{\prime}\mathbf{U}\mathbf{D}^{-1/2} = \mathbf{I}.$$
(13)

The between-class scatter matrix \(\mathbf{S}_{\mathrm{B}}\) can be transformed by multiplying \(\mathbf{U}\mathbf{D}^{-1/2}\) on the right side and \(\mathbf{D}^{-1/2}\mathbf{U}^\mathrm{T}\) on the left side of \(\mathbf{S}_{\mathrm{B}}\) to give \(\mathbf{D}^{-1/2}\mathbf{U}^\mathrm{T}\mathbf{S}_{\mathrm{B}}\mathbf{U}\mathbf{D}^{-1/2}\). The EVD of this matrix gives

$$\begin{aligned} {\mathbf{D}}^{-1/2}{\mathbf{U}}^\mathrm{T}{\mathbf{S}}_{\mathrm{B}} {\mathbf{UD}}^{-1/2}={\mathbf{ED}}_{\mathrm{B}} {\mathbf{E}}^{\mathbf{T}}, \end{aligned}$$
(14)

where \({\mathbf{E}}\in {\mathbb {R}}^{r_t \times r_t }\) is an orthogonal matrix and \({\mathbf{D}}_{\mathrm{B}} \in {\mathbb {R}}^{r_t \times r_t }\) is a diagonal matrix. Equation 14 can be rearranged as

$$\begin{aligned} {\mathbf{E}}^{\mathbf{T}}{\mathbf{D}}^{-1/2}{\mathbf{U}}^\mathrm{T}{\mathbf{S}}_{\mathrm{B}} {\mathbf{UD}}^{-1/2}{\mathbf{E}}={\mathbf{D}}_{\mathrm{B}}, \end{aligned}$$
(15)

Let the leading eigenvalue of \(\mathbf{D}_{\mathrm{B}}\) be \(\gamma\) and let its corresponding eigenvector be \(\mathbf{e} \in \mathbf{E}\). Then Eq. 15 can be rewritten as

$$\mathbf{e}^\mathrm{T}\mathbf{D}^{-1/2}\mathbf{U}^\mathrm{T}\mathbf{S}_{\mathrm{B}}\mathbf{U}\mathbf{D}^{-1/2}\mathbf{e} = \gamma.$$
(16)

Multiplying Eq. 13 by the eigenvector \(\mathbf{e}\) on the right side and by \(\mathbf{e}^\mathrm{T}\) on the left side, we get

$$\begin{aligned} {\mathbf{e}}^\mathrm{T}{\mathbf{D}}^{-1/2}{\mathbf{U}}^\mathrm{T}{\mathbf{S}}_{\mathrm{W}}^{\prime } {\mathbf{UD}}^{-1/2}{\mathbf{e}}=1 \end{aligned}$$
(17)

It can be seen from Eqs. 13 and 15 that the matrix \(\mathbf{W}=\mathbf{U}\mathbf{D}^{-1/2}\mathbf{E}\) diagonalizes both \(\mathbf{S}_{\mathrm{B}}\) and \(\mathbf{S}_{\mathrm{W}}^{\prime}\) simultaneously. Also, the vector \(\mathbf{w}=\mathbf{U}\mathbf{D}^{-1/2}\mathbf{e}\) simultaneously gives the eigenvalue \(\gamma\) in Eq. 16 and unity in Eq. 17. Substituting \(\mathbf{w}=\mathbf{U}\mathbf{D}^{-1/2}\mathbf{e}\) into Eq. 12 gives \(J=\gamma\); i.e., \(\mathbf{w}\) is a solution of Eq. 12.

From Lemma 1, the maximum eigenvalue of the eigenvalue problem \((\mathbf{S}_{\mathrm{W}} +\alpha \mathbf{I})^{-1}\mathbf{S}_{\mathrm{B}} \mathbf{w}=\gamma \mathbf{w}\) is \(\gamma_m =\lambda_{\mathrm{max}} >0\) (i.e., real, positive and finite). Therefore, the eigenvector corresponding to this positive \(\gamma_m\) must also lie in the real hyperplane (i.e., the components of the vector \(\mathbf{w}\) must be real valued). Since \(\mathbf{w}=\mathbf{U}\mathbf{D}^{-1/2}\mathbf{e}\) and \(\mathbf{w}\) must be real, \(\mathbf{D}^{-1/2}\) must be real.

Since \(\mathbf{D}=\mathbf{\Lambda}^2+\alpha \mathbf{I}=\mathrm{diag}(q_1^2 +\alpha, q_2^2 +\alpha, \ldots, q_{r_w}^2 +\alpha, \alpha, \ldots, \alpha)\), we have

$$\mathbf{D}^{-1/2} = \mathrm{diag}\bigl(1/\sqrt{q_1^2 +\alpha},\, 1/\sqrt{q_2^2 +\alpha},\, \ldots,\, 1/\sqrt{q_{r_w}^2 +\alpha},\, 1/\sqrt{\alpha},\, \ldots,\, 1/\sqrt{\alpha}\bigr).$$

Therefore, the elements of \(\mathbf{D}^{-1/2}\) must satisfy \(1/\sqrt{q_k^2 +\alpha} >0\) for \(k=1,2,\ldots ,r_w\) and \(1/\sqrt{\alpha }>0\) (note that \(r_w <r_t\)); i.e., \(\alpha\) cannot be negative, so \(\alpha >0\). Furthermore, if \(r_w =r_t\), then the matrix \(\mathbf{S}_{\mathrm{W}}\) is non-singular and its inverse exists. In this case, regularization is not required and therefore \(\alpha =0\). Thus, \(\alpha \ge 0\) for \(r_w \le r_t\). This concludes the proof. \(\quad \square\)
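The simultaneous diagonalization used in the proof can be checked numerically. The sketch below builds random scatter matrices with the stated rank structure and verifies that \(\mathbf{W}=\mathbf{U}\mathbf{D}^{-1/2}\mathbf{E}\) whitens \(\mathbf{S}_{\mathrm{W}}^{\prime}\) and diagonalizes \(\mathbf{S}_{\mathrm{B}}\); the dimensions and the value of \(\alpha\) are arbitrary illustrative choices.

```python
# Numerical check of the simultaneous diagonalization in Appendix C (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
r_t, r_w, alpha = 8, 5, 0.1

# Rank-deficient within-class scatter (rank r_w < r_t) and a between-class scatter.
A = rng.normal(size=(r_t, r_w))
S_W = A @ A.T
B = rng.normal(size=(r_t, 3))
S_B = B @ B.T

# EVD of S_W' = S_W + alpha*I = U D U^T.
D_vals, U = np.linalg.eigh(S_W + alpha * np.eye(r_t))
D_inv_sqrt = np.diag(1.0 / np.sqrt(D_vals))

# EVD of the transformed between-class scatter gives E and the eigenvalues gamma.
M = D_inv_sqrt @ U.T @ S_B @ U @ D_inv_sqrt
gammas, E = np.linalg.eigh(M)

W = U @ D_inv_sqrt @ E
print(np.allclose(W.T @ (S_W + alpha * np.eye(r_t)) @ W, np.eye(r_t)))  # True
print(np.allclose(W.T @ S_B @ W, np.diag(gammas)))                      # True
```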

Cite this article

Sharma, A., Paliwal, K.K., Imoto, S. et al. A feature selection method using improved regularized linear discriminant analysis. Machine Vision and Applications 25, 775–786 (2014). https://doi.org/10.1007/s00138-013-0577-y