Abstract
Investigation of genes using data analysis and computer-based methods has gained widespread attention for solving the human cancer classification problem. DNA microarray gene expression datasets are readily utilized for this purpose. In this paper, we propose a feature selection method that uses an improved regularized linear discriminant analysis technique to select important genes that are crucial for human cancer classification. Experiments are conducted on several DNA microarray gene expression datasets, and promising results are obtained in comparison with several existing feature selection methods.
Notes
SVM-RFE [15] is a wrapper-based method. It is an iterative method that works backward from an initial set of features. The SVM aims to find the maximum-margin hyperplane between the two classes to minimize the classification error using some kernel function.
Since RLDA and improved RLDA are methods for solving the small sample size (SSS) problem, the value of q has to lie in \((n,d)\).
Most of the datasets are downloaded from the Kent Ridge Bio-medical Dataset (KRBD) (http://datam.i2r.a-star.edu.sg/datasets/krbd/). The datasets are transformed or reformatted and made available by the KRBD repository, and we have used them without any further preprocessing. Some datasets that are not available on the KRBD repository are downloaded from the respective authors’ supplementary links and used directly. The URL addresses for all the datasets are given in the Reference Section.
The cross-validation-based results are shown in Appendix A. The comparison of improved RLDA with different values of regularization parameter has been shown in Appendix B.
Ingenuity Pathway Analysis (IPA) (http://www.ingenuity.com) is software that helps researchers to model, analyze, and understand the complex biological and chemical systems at the core of life science research. IPA has been broadly adopted by the life science research community. It helps to understand complex ’omics data at multiple levels by integrating data from a variety of experimental platforms and providing insight into the molecular and chemical interactions, cellular phenotypes, and disease processes of the system. IPA provides insight into the causes of observed gene expression changes and into the predicted downstream biological effects of those changes. Even if experimental data are not available, IPA can be used to search the Ingenuity Knowledge Base for information on genes, proteins, chemicals, drugs, and molecular relationships to build biological models or to get up to speed in a relevant area of research. IPA provides biological context to facilitate informed decision-making, advance research project design, and generate new testable hypotheses.
References
Anton, H.: Calculus. Wiley, New York (1995)
Armstrong, S.A., Staunton, J.E., Silverman, L.B., Pieters, R., den Boer, M.L., Minden, M.D., Sallan, S.E., Lander, E.S., Golub, T.R., Korsmeyer, S.J.: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 30, 41–47 (2002). [Data Source1: http://sdmc.lit.org.sg/GEDatasets/Datasets.html] [Data Source2: http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=63]
Banerjee, M., Mitra, S., Banka, H.: Evolutionary-rough feature selection in gene expression data. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 37, 622–632 (2007)
Cong, G., Tan, K.-L., Tung, A.K.H., Xu, X.: Mining top-k covering rule groups for gene expression data. In: The ACM SIGMOD International Conference on Management of Data, pp. 670–681 (2005)
Dai, D.Q., Yuen, P.C.: Regularized discriminant analysis and its application to face recognition. Pattern Recognit. 36(3), 845–847 (2003)
Dai, D.Q., Yuen, P.C.: Face recognition by regularized discriminant analysis. IEEE Trans. SMC 37(4), 1080–1085 (2007)
Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. J. Bioinf. Comput. Biol. 523–529 (2003)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discriminant methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97, 77–87 (2002)
Friedman, J.H.: Regularized discriminant analysis. J. Am. Stat. Assoc. 84(405), 165–175 (1989)
Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press, London (1990)
Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906–914 (2000)
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999). [Data Source: http://datam.i2r.a-star.edu.sg/datasets/krbd/]
Guo, Y., Hastie, T., Tibshirani, R.: Regularized discriminant analysis and its application in microarrays. Biostatistics 8(1), 86–100 (2007)
Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
Huang, R., Liu, Q., Lu, H., Ma, S.: Solving the small sample size problem of LDA. Proc. ICPR 3, 29–32 (2002)
Huang, Y., Xu, D., Nie, F.: Semi-supervised dimension reduction using trace ratio criterion. IEEE Trans. Neural Netw. Learn. Syst. 23(3), 519–526 (2012)
Huang, Y., Xu, D., Nie, F.: Patch distribution compatible semi-supervised dimension reduction for face and human gait recognition. IEEE Trans. Circuits Syst. Video Technol. 22(3), 479–488 (2012)
Khan, J., Wei, J.S., Ringner, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., Meltzer, P.S.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural network. Nat. Med. 7, 673–679 (2001). [Data Source: http://research.nhgri.nih.gov/microarray/Supplement/]
Li, J., Wong, L.: Using rules to analyse bio-medical data: a comparison between C4.5 and PCL. In: Advances in Web-Age Information Management, pp. 254–265. Springer, Berlin (2003)
Liu, J., Chen, S.C., Tan, X.Y.: Efficient pseudo-inverse linear discriminant analysis and its nonlinear form for face recognition. Int. J. Patt. Recogn. Artif. Intell. 21(8), 1265–1278 (2007)
Nie, F., Huang, H., Cai, X., Ding, C.: Efficient and robust feature selection via joint \(l_{2,1} \)-norms minimization. In: NIPS (2010)
Pan, W.: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18, 546–554 (2002)
Pavlidis, P., Weston, J., Cai, J., Grundy, W.N.: Gene functional classification from heterogeneous data. In: International Conference on Computational Biology, pp. 249–255 (2001)
Peng, H., Long, F., Dong, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Sharma, A., Imoto, S., Miyano, S.: A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans. Computat. Biol. Bioinf. 9(3), 754–764 (2012)
Sharma, A., Imoto, S., Miyano, S.: A between-class overlapping filter-based method for transcriptome data analysis. J. Bioinf. Computat. Biol. 10(5), 1250010-1–1250010-20 (2012)
Sharma, A., Imoto, S., Miyano, S., Sharma, V.: Null space based feature selection method for gene expression data. Int. J. Mach. Learn. Cybern. 3(4), 269–276 (2012). doi:10.1007/s13042-011-0061-9
Sharma, A., Koh, C.H., Imoto, S., Miyano, S.: Strategy of finding optimal number of features on gene expression data. IEE Electron. Lett. 47(8), 480–482 (2011)
Sharma, A., Paliwal, K.K.: Fast principal component analysis using fixed-point algorithm. Pattern Recognit. Lett. 28(10), 1151–1155 (2007)
Sharma, A., Paliwal, K.K.: Rotational linear discriminant analysis for dimensionality reduction. IEEE Trans. Knowl. Data Eng. 20(10), 1336–1347 (2008)
Sharma, A., Paliwal, K.K.: A gradient linear discriminant analysis for small sample sized problem. Neural Process. Lett. 27(1), 17–24 (2008)
Sharma, A., Paliwal, K.K.: A new perspective to null linear discriminant analysis method and its fast implementation using random matrix multiplication with scatter matrices. Pattern Recognit. 45, 2205–2213 (2012)
Sharma, A., Lyons, J., Dehzangi, A., Paliwal, K.K.: A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition. J. Theoret. Biol. 320(7), 41–46 (2013)
Sharma, A., Paliwal, K.K., Imoto, S., Miyano, S., Sharma, V., Ananthanarayanan, R.: A feature selection method using fixed-point algorithm for DNA microarray gene expression data. Int. J. Knowl. Based Intell. Eng. Syst. (2013, accepted)
Su, Y., Murali, T.M., Pavlovic, V., Kasif, S.: RankGene: identification of diagnostic genes based on expression data, Bioinformatics, pp. 1578–1579 (2003)
Tan, A.C., Gilbert, D.: Ensemble machine learning on gene expression data for cancer classification. Appl. Bioinf. 2(3 Suppl), S75–83 (2003)
Tao, L., Zhang, C., Ogihara, M.: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(14), 2429–2437 (2004)
Thomas, J., Olson, J.M., Tapscott, S.J., Zhao, L.P.: An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res. 11, 1227–1236 (2001)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58(1), 267–288 (1996)
Wang, A., Gehan, E.A.: Gene selection for microarray data analysis using principal component analysis. Stat. Med. 24, 2069–2087 (2005)
Wu, G., Xu, W., Zhang, Y., Wei, Y.: A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining. Data Mining Knowl. Discov. (2011). doi:10.1007/s10618-011-0245-7
Xu, D., Yan, S.: Semi-supervised bilinear subspace learning. IEEE Trans. Image Process. 18(7), 1671–1676 (2009)
Zhou, L., Wang, L., Shen, C., Barnes, N.: Hippocampal shape classification using redundancy constrained feature selection. In: Medical Image Computing and Computer-Assisted Intervention, MICCAI 2010. Lecture Notes in Computer Science, vol. 6362, pp. 266–273. Springer, Berlin (2010)
Appendices
Appendix A
In this section, we use a cross-validation procedure to compute the average classification accuracy using four distinct classifiers and the proposed feature selection method. Three datasets are used for this purpose: SRBCT, MLL and Acute Leukemia. The classification accuracies for \(k=5\) and \(k=10\) folds are given in Tables 10, 11 and 12. It can be observed that the classification accuracy obtained by the \(k\)-fold cross-validation procedure is comparable to the classification accuracy reported in Tables 2–4.
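The \(k\)-fold procedure described above can be sketched as follows. This is a minimal illustration rather than the code used for the reported tables: the helper names `k_fold_indices`, `cross_val_accuracy` and the `fit_predict` callback are hypothetical, and the toy `majority_class` classifier merely stands in for the four classifiers used in the experiments.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, nearly equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(samples, labels, k, fit_predict, seed=0):
    """Average classification accuracy over k folds.

    fit_predict(train_x, train_y, test_x) is a hypothetical callback that
    trains any classifier and returns predicted labels for test_x.
    """
    accuracies = []
    for train, test in k_fold_indices(len(samples), k, seed):
        predicted = fit_predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in test],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def majority_class(train_x, train_y, test_x):
    """Toy stand-in classifier: predicts the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)
```

Calling `cross_val_accuracy(samples, labels, 5, majority_class)` and then the same with `k=10` mirrors the two settings used in this appendix; any real classifier can be plugged in through the callback.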
Appendix B
In this appendix, we compare the proposed improved RLDA technique against RLDA with different values of the regularization parameter. To this end, we computed the classification accuracy for four different values of \(\alpha \) for the RLDA technique. These correspond to \(\delta =[0.001,0.01,0.1,1]\), where \(\alpha =\delta \lambda _{\mathrm{W}} \) and \(\lambda _{\mathrm{W}} \) is the maximum eigenvalue of the within-class scatter matrix. We applied a threefold cross-validation procedure on a number of datasets and show the results in columns 2–5 of Table 13. The last column of the table gives the classification accuracy using the improved RLDA technique.
It can be observed from the table that different values of the regularization parameter give different classification accuracies; the choice of the regularization parameter therefore affects the classification performance. Thus, it is important to select the regularization parameter correctly to obtain good classification performance. For all the datasets, the proposed technique exhibits promising results.
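The scaling \(\alpha =\delta \lambda _{\mathrm{W}} \) used for the RLDA baselines can be illustrated with a small sketch. The within-class scatter matrix below is a made-up toy example, and estimating \(\lambda _{\mathrm{W}} \) by power iteration is an assumption made here to keep the example dependency-free; it is not necessarily how the eigenvalue was computed in the experiments.

```python
import math
import random


def max_eigenvalue(matrix, iterations=500, seed=0):
    """Estimate the largest eigenvalue of a symmetric positive
    semi-definite matrix (nested lists) by power iteration."""
    n = len(matrix)
    rng = random.Random(seed)
    # Random start; it must not be orthogonal to the leading eigenvector.
    vec = [rng.random() + 0.1 for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in product]
        # Rayleigh quotient of the normalized iterate.
        eigenvalue = sum(
            vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
        )
    return eigenvalue


# Toy singular within-class scatter matrix (assumed values, rank 2 of 3).
S_W = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 0.0],
    [0.0, 0.0, 0.0],
]

lambda_w = max_eigenvalue(S_W)
deltas = [0.001, 0.01, 0.1, 1]
alphas = [delta * lambda_w for delta in deltas]
```

Each entry of `alphas` would then be added to the diagonal of the within-class scatter matrix before solving the RLDA eigenvalue problem.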
Appendix C
Corollary 1
The value of regularization parameter is non-negative; i.e., \(\alpha \ge 0\) for \(r_w \le r_t \), where \(r_t =\mathrm{rank}({\mathbf{S}}_{\mathrm{T}} )\) and \(r_w =\mathrm{rank}({\mathbf{S}}_{\mathrm{W}} )\).
Proof
From Eq. 2, we can write
where \({\mathbf{S}}_{\mathrm{B}} \in {\mathbb {R}}^{r_t \times r_t }\) and \({\mathbf{S}}_{\mathrm{W}} \in {\mathbb {R}}^{r_t \times r_t }\). We can rearrange the above expression as
The eigenvalue decomposition (EVD) of the \({\mathbf{S}}_{\mathrm{W}} \) matrix (assuming \(r_w <r_t \)) can be given as \({\mathbf{S}}_{\mathrm{W}} ={\mathbf{U}}{{\varvec{\Lambda }}}^2{\mathbf{U}}^\mathrm{T},\) where \({\mathbf{U}}\in {\mathbb {R}}^{r_t \times r_t }\) is an orthogonal matrix, and \({{\varvec{\Lambda }}}^2=\left[ \begin{array}{cc} {{\varvec{\Lambda }}}_w^2 & 0 \\ 0 & 0 \end{array} \right] \in {\mathbb {R}}^{r_t \times r_t }\) and \({{\varvec{\Lambda }}}_w^2 =\mathrm{diag}(q_1^2 ,q_2^2 ,\ldots ,q_{r_w }^2 )\in {\mathbb {R}}^{r_w \times r_w }\) are diagonal matrices. The eigenvalues satisfy \(q_k^2 >0\) for \(k=1,2,\ldots ,r_w \). Therefore,
The between-class scatter matrix \({\mathbf{S}}_{\mathrm{B}} \) can be transformed by multiplying it by \({\mathbf{UD}}^{-1/2}\) on the right and by \({\mathbf{D}}^{-1/2}{\mathbf{U}}^\mathrm{T}\) on the left, which gives \({\mathbf{D}}^{-1/2}{\mathbf{U}}^\mathrm{T}{\mathbf{S}}_{\mathrm{B}} {\mathbf{UD}}^{-1/2}\). The EVD of this matrix will give
where \({\mathbf{E}}\in {\mathbb {R}}^{r_t \times r_t }\) is an orthogonal matrix and \({\mathbf{D}}_{\mathrm{B}} \in {\mathbb {R}}^{r_t \times r_t }\) is a diagonal matrix. Equation 14 can be rearranged as
Let the leading eigenvalue of \({\mathbf{D}}_{\mathrm{B}} \) be \(\gamma \) and its corresponding eigenvector be \({\mathbf{e}}\in {\mathbf{E}}\). Then Eq. 15 can be rewritten as
Multiplying Eq. 13 by the eigenvector \({\mathbf{e}}\) on the right and by \({\mathbf{e}}^\mathrm{T}\) on the left, we get
It can be seen from Eqs. 13 and 15 that the matrix \({\mathbf{W}}={\mathbf{UD}}^{-1/2}{\mathbf{E}}\) diagonalizes both \({\mathbf{S}}_{\mathrm{B}} \) and \({\mathbf{S}}_{\mathrm{W}}^{\prime } \) simultaneously. Also, the vector \({\mathbf{w}}={\mathbf{UD}}^{-1/2}{\mathbf{e}}\) simultaneously gives \(\gamma \) and the unity eigenvalue in Eqs. 16 and 17. Substituting \({\mathbf{w}}={\mathbf{UD}}^{-1/2}{\mathbf{e}}\) in Eq. 12, we get \(J=\gamma ;\) i.e., \({\mathbf{w}}\) is a solution of Eq. 12.
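For orientation, the relations that Eqs. 12–17 refer to can be written out as below. This is a reconstruction from the surrounding text, not a verbatim copy of the displayed equations: it assumes that \({\mathbf{S}}_{\mathrm{W}}^{\prime } ={\mathbf{S}}_{\mathrm{W}} +\alpha {\mathbf{I}}\), that Eq. 12 is the regularized Fisher ratio, and that Eq. 13 is stated in its whitened form with \({\mathbf{D}}={{\varvec{\Lambda }}}^2+\alpha {\mathbf{I}}\).

```latex
% Assumed reconstruction; the tags follow the equation numbers cited in the text.
\begin{align}
J &= \frac{\mathbf{w}^{\mathrm{T}} \mathbf{S}_{\mathrm{B}} \mathbf{w}}
          {\mathbf{w}^{\mathrm{T}} \mathbf{S}_{\mathrm{W}}' \mathbf{w}},
   \qquad \mathbf{S}_{\mathrm{W}}' = \mathbf{S}_{\mathrm{W}} + \alpha \mathbf{I}
   \tag{12} \\
\mathbf{D}^{-1/2} \mathbf{U}^{\mathrm{T}} \mathbf{S}_{\mathrm{W}}' \mathbf{U} \mathbf{D}^{-1/2}
  &= \mathbf{I},
   \qquad \mathbf{D} = \boldsymbol{\Lambda}^{2} + \alpha \mathbf{I}
   \tag{13} \\
\mathbf{D}^{-1/2} \mathbf{U}^{\mathrm{T}} \mathbf{S}_{\mathrm{B}} \mathbf{U} \mathbf{D}^{-1/2}
  &= \mathbf{E} \mathbf{D}_{\mathrm{B}} \mathbf{E}^{\mathrm{T}}
   \tag{14} \\
\mathbf{E}^{\mathrm{T}} \mathbf{D}^{-1/2} \mathbf{U}^{\mathrm{T}} \mathbf{S}_{\mathrm{B}}
  \mathbf{U} \mathbf{D}^{-1/2} \mathbf{E}
  &= \mathbf{D}_{\mathrm{B}}
   \tag{15} \\
\mathbf{e}^{\mathrm{T}} \mathbf{D}^{-1/2} \mathbf{U}^{\mathrm{T}} \mathbf{S}_{\mathrm{B}}
  \mathbf{U} \mathbf{D}^{-1/2} \mathbf{e}
  &= \gamma
   \tag{16} \\
\mathbf{e}^{\mathrm{T}} \mathbf{D}^{-1/2} \mathbf{U}^{\mathrm{T}} \mathbf{S}_{\mathrm{W}}'
  \mathbf{U} \mathbf{D}^{-1/2} \mathbf{e}
  &= 1
   \tag{17}
\end{align}
```

With \({\mathbf{w}}={\mathbf{UD}}^{-1/2}{\mathbf{e}}\), the left-hand sides of (16) and (17) are \({\mathbf{w}}^\mathrm{T}{\mathbf{S}}_{\mathrm{B}} {\mathbf{w}}\) and \({\mathbf{w}}^\mathrm{T}{\mathbf{S}}_{\mathrm{W}}^{\prime } {\mathbf{w}}\), so their ratio in (12) is \(J=\gamma \).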
From Lemma 1, the maximum eigenvalue of the eigenvalue problem \(({\mathbf{S}}_{\mathrm{W}} +\alpha {\mathbf{I}})^{-1}{\mathbf{S}}_{\mathrm{B}} {\mathbf{w}}={\gamma }{\mathbf{w}}\) is \(\gamma _m =\lambda _{\mathrm{max}} >0\) (i.e., real, positive and finite). Therefore, the eigenvector corresponding to this positive \(\gamma _m \) should also lie in the real hyperplane (i.e., the components of the vector \({\mathbf{w}}\) must have real values). Since \({\mathbf{w}}={\mathbf{UD}}^{-1/2}{\mathbf{e}}\) and \({\mathbf{w}}\) must lie in the real hyperplane, \({\mathbf{D}}^{-1/2}\) must be real.
Since \({\mathbf{D}}={{\varvec{\Lambda }}}^2+\alpha {\mathbf{I}}=\mathrm{diag}(q_1^2 +\alpha ,q_2^2 +\alpha ,\ldots ,q_{r_w }^2 +\alpha ,\alpha ,\ldots ,\alpha )\), we have
Therefore, the elements of \({\mathbf{D}}^{-1/2}\) must satisfy \(1/\sqrt{q_k^2 +\alpha } >0\) and \(1/\sqrt{\alpha }>0\) for \(k=1,2,\ldots ,r_w \) (note that \(r_w <r_t \)); i.e., \(\alpha \) must be positive (\(\alpha >0\)). Furthermore, if \(r_w =r_t \), then \({\mathbf{S}}_{\mathrm{W}} \) is a non-singular matrix and its inverse exists. In this case, regularization is not required and therefore \(\alpha =0\). Thus, \(\alpha \ge 0\) for \(r_w \le r_t \). This concludes the proof. \(\quad \square \)
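The argument of the proof can be checked numerically on a toy case. The sketch below assumes a \(2\times 2\) problem with made-up scatter matrices, \({\mathbf{U}}={\mathbf{I}}\), \(r_w =1<r_t =2\) and \(\alpha =1\); the helper names are hypothetical. It forms \({\mathbf{D}}^{-1/2}\), takes the leading eigenpair of \({\mathbf{D}}^{-1/2}{\mathbf{U}}^\mathrm{T}{\mathbf{S}}_{\mathrm{B}} {\mathbf{UD}}^{-1/2}\) in closed form, and evaluates the criterion at \({\mathbf{w}}={\mathbf{UD}}^{-1/2}{\mathbf{e}}\).

```python
import math


def real_d_inv_sqrt(q_sq, alpha):
    """Diagonal of D^{-1/2} for D = diag(q_k^2 + alpha).

    Raises ValueError (negative argument) or ZeroDivisionError (zero
    argument) when an entry has no finite real value.
    """
    return [1.0 / math.sqrt(q + alpha) for q in q_sq]


def leading_eig_sym2(a, b, c):
    """Leading eigenpair of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    lam = (a + c) / 2.0 + math.hypot((a - c) / 2.0, b)
    if b == 0.0:
        vec = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        vx, vy = lam - c, b
        norm = math.hypot(vx, vy)
        vec = (vx / norm, vy / norm)
    return lam, vec


def quad_form(m, w):
    """w^T m w for a 2x2 matrix m given as nested lists."""
    return sum(w[i] * m[i][j] * w[j] for i in range(2) for j in range(2))


# Toy data (assumed values): S_W is singular, so regularization is needed.
alpha = 1.0
q_sq = [4.0, 0.0]                      # eigenvalues of S_W, with U = I
S_W = [[4.0, 0.0], [0.0, 0.0]]
S_B = [[2.0, 1.0], [1.0, 1.0]]
S_W_reg = [[S_W[i][j] + (alpha if i == j else 0.0) for j in range(2)]
           for i in range(2)]

d_inv_sqrt = real_d_inv_sqrt(q_sq, alpha)

# M = D^{-1/2} U^T S_B U D^{-1/2}, with U = I.
M = [[d_inv_sqrt[i] * S_B[i][j] * d_inv_sqrt[j] for j in range(2)]
     for i in range(2)]
gamma, e = leading_eig_sym2(M[0][0], M[0][1], M[1][1])

# w = U D^{-1/2} e gives w^T S_W' w = 1 and w^T S_B w = gamma.
w = [d_inv_sqrt[i] * e[i] for i in range(2)]
J = quad_form(S_B, w) / quad_form(S_W_reg, w)
```

In this toy case `J` agrees with `gamma` up to rounding. Calling `real_d_inv_sqrt(q_sq, -1.0)` or `real_d_inv_sqrt(q_sq, 0.0)` raises an exception, because the zero eigenvalue of \({\mathbf{S}}_{\mathrm{W}} \) leaves \(1/\sqrt{\alpha }\) without a finite real value unless \(\alpha >0\).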
Cite this article
Sharma, A., Paliwal, K.K., Imoto, S. et al. A feature selection method using improved regularized linear discriminant analysis. Machine Vision and Applications 25, 775–786 (2014). https://doi.org/10.1007/s00138-013-0577-y
DOI: https://doi.org/10.1007/s00138-013-0577-y