Minimizing Redundancy Among Genes Selected Based on the Overlapping Analysis

  • Conference paper
Analysis of Large and Complex Data

Abstract

In many functional genomics experiments, identifying the most discriminative genes is a central challenge. Restricting classification to a set of discriminative genes can improve both the prediction accuracy and the interpretability of a classifier. Analyzing the overlap between the gene expression values of different classes is an effective criterion for identifying relevant genes. However, genes selected solely by maximizing a relevance score can be highly redundant. We propose a scheme for minimizing selection redundancy, in which the Proportional Overlapping Score (POS) technique is extended with a recursive approach that selects a set of complementary discriminative genes. The proposed scheme exploits the gene masks defined by POS to identify genes that complement one another in terms of their classification patterns. The approach is validated by comparing its classification performance with that of other feature selection methods (Wilcoxon Rank Sum, mRMR, MaskedPainter and POS) on several benchmark gene expression datasets using three classifiers: Random Forest, k Nearest Neighbour and Support Vector Machine. The experimental classification error rates show that our proposal achieves better performance.
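The idea of using gene masks to pick complementary rather than redundant genes can be illustrated with a small greedy sketch. This is not the authors' procedure, which the abstract does not spell out. It assumes that each gene comes with a boolean mask marking the samples it classifies unambiguously and with a relevance score where lower means less overlap between classes. The function name `select_complementary` and its parameters are hypothetical.

```python
import numpy as np


def select_complementary(masks, scores, n_select):
    """Greedy mask-based selection of complementary genes (illustrative only).

    masks  : bool array, shape (n_genes, n_samples). True where the gene is
             assumed to classify the sample unambiguously (assumption).
    scores : float array, shape (n_genes,). Lower is assumed to mean less
             overlap between classes (assumption).
    Returns the indices of the selected genes in selection order.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_genes, n_samples = masks.shape
    n_select = min(n_select, n_genes)

    available = np.ones(n_genes, dtype=bool)    # genes not yet selected
    uncovered = np.ones(n_samples, dtype=bool)  # samples no selected gene covers
    selected = []

    while len(selected) < n_select:
        # Number of still-uncovered samples each remaining gene would cover.
        gain = (masks & uncovered).sum(axis=1)
        gain[~available] = -1

        if gain.max() <= 0:
            if uncovered.all():
                # Remaining genes cover nothing: fall back to score ranking.
                rest = np.flatnonzero(available)
                rest = rest[np.argsort(scores[rest], kind="stable")]
                selected.extend(rest[: n_select - len(selected)].tolist())
                break
            # Every coverable sample is covered: start a new round.
            uncovered[:] = True
            continue

        # Among genes with the largest gain, prefer the best (lowest) score.
        candidates = np.flatnonzero(gain == gain.max())
        best = int(candidates[np.argmin(scores[candidates])])
        selected.append(best)
        available[best] = False
        uncovered &= ~masks[best]

    return selected


if __name__ == "__main__":
    # Genes 0 and 1 cover the same samples, so they are redundant.
    masks = np.array([
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 1],
    ], dtype=bool)
    scores = np.array([0.1, 0.2, 0.5, 0.6])
    print(select_complementary(masks, scores, 3))
```

In this toy input, ranking by score alone would pick genes 0, 1 and 2, although gene 1 covers exactly the same samples as gene 0. The greedy sketch skips gene 1 until a new round starts, which is the kind of redundancy reduction the abstract describes.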

Author information

Corresponding author

Correspondence to Osama Mahmoud.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mahmoud, O., Harrison, A., Gul, A., Khan, Z., Metodiev, M.V., Lausen, B. (2016). Minimizing Redundancy Among Genes Selected Based on the Overlapping Analysis. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_24
