Correlation-Based Relevancy and Redundancy Measures for Efficient Gene Selection

Mao, Kezhi Z.; Tang, Wenyin

doi:10.1007/978-3-540-75286-8_23

Kezhi Z. Mao¹ &
Wenyin Tang¹

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4774)

Included in the following conference series:

IAPR International Workshop on Pattern Recognition in Bioinformatics

Abstract

The gene-label correlation provides an effective measure of the relevancy of a gene. However, this measure evaluates genes individually, so the gene sets it produces may be highly redundant. In this study, we propose a new correlation heuristic for set-based gene selection that aims to alleviate this redundancy problem. The heuristic has two components, one accounting for gene relevancy and the other for gene redundancy. The relevancy of a gene is evaluated individually, in terms of its correlation with the class label, while the redundancy of a gene with respect to a given gene subset is measured by its correlation with a new dimension built upon that subset. The heuristic thus retains the simplicity of individual gene evaluation together with the redundancy-handling capacity of set-based gene evaluation. Two ways of combining the relevancy and redundancy measures are presented: maximizing the ratio of the relevancy measure to the redundancy measure, and maximizing the relevancy measure minus the redundancy measure. Experimental studies on six gene expression problems show that both criteria produce excellent results.
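The abstract describes the selection procedure only in outline, so the following Python (NumPy) sketch illustrates one possible reading of it; it is not the authors' implementation. Several details are assumptions: relevancy is taken to be the absolute Pearson correlation between a gene and the 0/1 class label; the "new dimension built upon the gene subset" is assumed to be the least-squares linear combination of the already selected genes, which makes the redundancy measure the multiple correlation coefficient; and genes are assumed to be added by greedy forward search starting from the most relevant gene. The names `abs_corr`, `redundancy` and `select_genes` are hypothetical.

```python
import numpy as np


def abs_corr(x, y):
    """Absolute Pearson correlation between two 1-D arrays (0.0 if either is constant)."""
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return 0.0 if denom < 1e-12 else float(abs(xc @ yc) / denom)


def redundancy(X, j, selected):
    """ASSUMED redundancy of gene j with respect to the selected subset.

    The 'new dimension' is hypothetically taken to be the least-squares linear
    combination of the selected genes, so the value returned is the multiple
    correlation coefficient R = sqrt(1 - SS_res / SS_tot).
    """
    if not selected:
        return 0.0
    x = X[:, j]
    xc = x - x.mean()
    ss_tot = xc @ xc
    if ss_tot < 1e-12:  # constant gene: treat it as sharing no signal
        return 0.0
    A = np.column_stack([X[:, selected], np.ones(len(x))])  # selected genes + intercept
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    resid = x - A @ coef
    r2 = 1.0 - (resid @ resid) / ss_tot
    return float(np.sqrt(max(r2, 0.0)))


def select_genes(X, y, k, criterion="difference", eps=1e-12):
    """Greedy forward selection of k >= 1 genes (X: samples x genes, y: 0/1 labels).

    criterion="ratio"      maximizes relevancy / redundancy
    criterion="difference" maximizes relevancy - redundancy
    """
    if criterion not in ("ratio", "difference"):
        raise ValueError("criterion must be 'ratio' or 'difference'")
    n_genes = X.shape[1]
    # Relevancy is computed once, gene by gene.
    relevancy = np.array([abs_corr(X[:, j], y) for j in range(n_genes)])
    selected = [int(np.argmax(relevancy))]  # start from the most relevant gene
    while len(selected) < min(k, n_genes):
        best, best_score = None, -np.inf
        for j in range(n_genes):
            if j in selected:
                continue
            red = redundancy(X, j, selected)
            if criterion == "ratio":
                score = relevancy[j] / (red + eps)  # eps avoids division by zero
            else:
                score = relevancy[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Under these assumptions, a gene that is almost a rescaled copy of an already selected gene has redundancy close to 1, so its difference score is near or below zero, and a less relevant but uncorrelated gene is preferred. The ratio criterion rewards low redundancy even more strongly, which is why a small `eps` is needed when a candidate is orthogonal to the selected subset.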

Author information

Authors and Affiliations

School of Electrical & Electronic Engineering, Nanyang Technological University, 639798, Singapore
Kezhi Z. Mao & Wenyin Tang

Editor information

Jagath C. Rajapakse, Bertil Schmidt, Gwenn Volkert

About this paper

Cite this paper

Mao, K.Z., Tang, W. (2007). Correlation-Based Relevancy and Redundancy Measures for Efficient Gene Selection. In: Rajapakse, J.C., Schmidt, B., Volkert, G. (eds) Pattern Recognition in Bioinformatics. PRIB 2007. Lecture Notes in Computer Science, vol 4774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75286-8_23

DOI: https://doi.org/10.1007/978-3-540-75286-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75285-1
Online ISBN: 978-3-540-75286-8
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition