Abstract
Microarray data are typically characterized by high dimensionality and small sample size. Gene ranking is one of the most widely explored techniques for reducing dimensionality because of its simplicity and computational efficiency. Many ranking methods have been proposed, and their effectiveness depends on the problem at hand. We investigate the performance of six ranking methods on eleven cancer microarray datasets. Performance is evaluated in terms of classification accuracy and the number of genes selected. Experimental results on all datasets show significant variation in classification accuracy, depending on the choice of ranking method and classifier. Empirically, the Brown-Forsythe test statistic and the mutual information method achieve high accuracy with few genes, whereas the Gini index and the Pearson coefficient perform poorly in most cases.
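To illustrate what filter-style gene ranking looks like in practice, the following minimal sketch scores each gene with the Brown-Forsythe F* statistic for equality of class means and keeps the highest-scoring genes. It is not the authors' implementation: the function names, the toy expression matrix, and the `top_k` parameter are assumptions made purely for illustration.

```python
from statistics import mean, variance


def brown_forsythe_score(values, labels):
    """Brown-Forsythe F* statistic for one gene.

    F* = sum_i n_i * (mean_i - grand_mean)^2 / sum_i (1 - n_i / N) * s_i^2
    Assumes every class has at least two samples (needed for s_i^2).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n_total = len(values)
    grand_mean = mean(values)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((1 - len(g) / n_total) * variance(g) for g in groups.values())
    if within == 0:
        # Degenerate gene: no within-class spread at all.
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_genes(expression, labels, top_k):
    """Return indices of the top_k genes by score, plus all scores.

    expression: list of samples, each a list of per-gene values (hypothetical layout).
    """
    n_genes = len(expression[0])
    scores = [
        brown_forsythe_score([sample[j] for sample in expression], labels)
        for j in range(n_genes)
    ]
    order = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores


# Toy data (made up): 6 samples, 3 genes, 2 classes.
# Gene 1 separates the classes clearly; genes 0 and 2 barely do.
expression = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 1.0],
    [0.9, 4.9, 3.0],
    [1.1, 9.0, 2.1],
    [1.0, 9.2, 0.9],
    [1.3, 8.9, 3.1],
]
labels = [0, 0, 0, 1, 1, 1]

top, scores = rank_genes(expression, labels, top_k=2)
print(top)
```

In a full evaluation of the kind the abstract describes, the selected gene subset would then be passed to a classifier and the accuracy recorded for several values of `top_k`; that step is omitted here.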
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sardana, M., Kaur, B., Agrawal, R.K. (2013). Performance Evaluation of Ranking Methods for Relevant Gene Selection in Cancer Microarray Datasets. In: Batyrshin, I., González Mendoza, M. (eds) Advances in Artificial Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37807-2_35
DOI: https://doi.org/10.1007/978-3-642-37807-2_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37806-5
Online ISBN: 978-3-642-37807-2
eBook Packages: Computer Science, Computer Science (R0)