Minimizing Redundancy Among Genes Selected Based on the Overlapping Analysis

  • Osama Mahmoud
  • Andrew Harrison
  • Asma Gul
  • Zardad Khan
  • Metodi V. Metodiev
  • Berthold Lausen
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

In many functional genomic experiments, identifying the most discriminative genes is a central challenge. Both the prediction accuracy and the interpretability of a classifier can be improved by basing the classification on only a set of discriminative genes. Analyzing the overlap between the gene expression values of different classes is an effective criterion for identifying relevant genes. However, genes selected solely by maximizing a relevance score can be highly redundant. We propose a scheme for minimizing selection redundancy, in which the Proportional Overlapping Score (POS) technique is extended with a recursive approach to select a set of complementary discriminative genes. The proposed scheme exploits the gene masks defined by POS to identify genes that complement one another in terms of their classification patterns. The approach is validated by comparing its classification performance with that of other feature selection methods (Wilcoxon Rank Sum, mRMR, MaskedPainter and POS) on several benchmark gene expression datasets using three classifiers: Random Forest, k Nearest Neighbour and Support Vector Machine. The classification error rates obtained in these experiments show that the proposed scheme achieves better performance than the compared methods.
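The idea of using gene masks to pick complementary rather than redundant genes can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract. It assumes a hypothetical representation in which each gene's mask is the set of sample indices that the gene classifies unambiguously, and each gene has an overlap score where lower means more relevant. The function name `select_complementary_genes`, the greedy rule and the toy data are all assumptions made for illustration.

```python
def select_complementary_genes(masks, scores, k):
    """Greedy sketch (an assumption, not the published procedure).

    masks  : dict mapping gene -> set of sample indices it classifies unambiguously
    scores : dict mapping gene -> overlap score (assumed: lower is better)
    k      : number of genes to select
    """
    all_samples = set().union(*masks.values())
    remaining = set(masks)
    uncovered = set(all_samples)
    selected = []

    while remaining and len(selected) < k:
        def gain(gene):
            # Number of still-uncovered samples this gene would add.
            return len(masks[gene] & uncovered)

        # Largest gain first; ties broken by lower score, then by name.
        best = min(remaining, key=lambda g: (-gain(g), scores[g], g))

        if gain(best) == 0 and uncovered != all_samples:
            # No remaining gene helps: start a new pass over all samples.
            uncovered = set(all_samples)
            continue

        selected.append(best)
        remaining.remove(best)
        uncovered -= masks[best]
        if not uncovered:
            # Everything is covered: repeat the procedure on the leftover genes.
            uncovered = set(all_samples)

    return selected


# Toy data (hypothetical): g1 and g2 are individually relevant but redundant.
masks = {"g1": {0, 1, 2}, "g2": {0, 1}, "g3": {3, 4}, "g4": {2}}
scores = {"g1": 0.10, "g2": 0.05, "g3": 0.30, "g4": 0.20}

top2 = select_complementary_genes(masks, scores, 2)
top4 = select_complementary_genes(masks, scores, 4)
```

On this toy data, ranking by score alone would pick g2 and g1, whose masks overlap on samples 0 and 1. The sketch instead pairs g1 with g3, which together cover all five samples, and only then restarts on the leftover genes.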

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Osama Mahmoud (1, 2), corresponding author
  • Andrew Harrison (1)
  • Asma Gul (1, 3)
  • Zardad Khan (1)
  • Metodi V. Metodiev (4)
  • Berthold Lausen (1)

  1. Department of Mathematical Sciences, University of Essex, Colchester, UK
  2. Department of Applied Statistics, Helwan University, Cairo, Egypt
  3. Department of Statistics, Shaheed Benazir Bhutto Women University Peshawar, Khyber Pukhtoonkhwa, Pakistan
  4. School of Biological Sciences/Proteomics Unit, University of Essex, Colchester, UK
