f-Information Measures for Selection of Discriminative Genes from Microarray Data

Maji, Pradipta; Paul, Sushmita

doi:10.1007/978-3-319-05630-2_5

Abstract

Microarray technology is an important biotechnological tool that allows the expression levels of thousands of genes to be recorded simultaneously across a number of different samples. An important application of microarray gene expression data in functional genomics is the classification of samples according to their gene expression profiles. Of the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a given diagnostic test. In this regard, mutual information has been shown to be successful for selecting a set of relevant and nonredundant genes from microarray data. However, information theory offers many more measures, such as the f-information measures, that may be suitable for selecting genes from microarray gene expression data. This chapter presents different f-information measures as evaluation criteria for the gene selection problem. The performance of these measures is compared with that of mutual information in terms of the predictive accuracy of the naive Bayes classifier, the k-nearest neighbor rule, and the support vector machine. An important finding is that some f-information measures are effective for selecting relevant and nonredundant genes from microarray data. The effectiveness of the different f-information measures, along with a comparison with mutual information, is demonstrated on several cancer data sets.
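
As a rough illustration of what an information-based relevance score looks like, the sketch below estimates mutual information and one member of the f-information family, V-information, between a discretized gene and the class labels. The V-information formula used here (the sum of absolute differences between the joint distribution and the product of its marginals) is the standard definition from the f-divergence literature and is an assumption, since the abstract does not state the chapter's own definitions. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def joint_distribution(xs, ys):
    """Estimate p(x, y) by relative frequency from two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return {pair: count / n for pair, count in counts.items()}


def marginals(joint):
    """Sum the joint probabilities into the two marginal distributions."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)


def mutual_information(xs, ys):
    """I(X; Y) = sum over observed (x, y) of p(x, y) * ln(p(x, y) / (p(x) p(y)))."""
    joint = joint_distribution(xs, ys)
    px, py = marginals(joint)
    return sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items())


def v_information(xs, ys):
    """V(X; Y) = sum over all (x, y) of |p(x, y) - p(x) p(y)| (assumed definition)."""
    joint = joint_distribution(xs, ys)
    px, py = marginals(joint)
    # Unobserved cells have joint probability 0 but still contribute p(x) p(y).
    return sum(abs(joint.get((x, y), 0.0) - px[x] * py[y]) for x in px for y in py)


# Toy data (hypothetical): two discretized genes and a binary class label.
labels = ["tumor", "tumor", "normal", "normal"]
gene_informative = [1, 1, 0, 0]    # tracks the class exactly
gene_uninformative = [1, 0, 1, 0]  # statistically independent of the class

for name, gene in [("informative", gene_informative),
                   ("uninformative", gene_uninformative)]:
    print(name, mutual_information(gene, labels), v_information(gene, labels))
```

Both scores are zero when a gene is independent of the class and grow as the dependence strengthens, so either can serve as a relevance criterion; applied between pairs of genes, the same quantities give a redundancy score.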

Author information

Authors and Affiliations

Indian Statistical Institute, Kolkata, West Bengal, India
Pradipta Maji & Sushmita Paul

Authors

Pradipta Maji
Sushmita Paul

Corresponding author

Correspondence to Pradipta Maji.

About this chapter

Cite this chapter

Maji, P., Paul, S. (2014). f-Information Measures for Selection of Discriminative Genes from Microarray Data. In: Scalable Pattern Recognition Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-05630-2_5

DOI: https://doi.org/10.1007/978-3-319-05630-2_5
Published: 20 March 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05629-6
Online ISBN: 978-3-319-05630-2
eBook Packages: Computer Science (R0)
