Incremental Wrapper Based Random Forest Gene Subset Selection for Tumor Discernment

  • Alia Fatima
  • Usman Qamar
  • Saad Rehman
  • Aiman Khan Nazir
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 903)


High-dimensional cancer datasets allow researchers to diagnose cancer in a timely manner and support its effective treatment. Biomedical applications, however, must process thousands of features, and extracting precise information from such high-dimensional data is challenging. This paper presents Incremental Wrapper based Random Forest Gene Subset Selection for Tumor Discernment. The method combines incremental wrapper based feature subset selection (IWSS) with the random forest classification algorithm, which also serves as the performance validator. IWSS is a technique for selecting the best possible subset of genes from high-dimensional data at low computational cost. Random forest improves overall performance because it performs well on high-dimensional cancer datasets. As a performance validator, the random forest classifier becomes significantly more effective when it works on a selected, discriminative subset of prognostic genes rather than on the raw data. We evaluated the proposed methodology on six publicly available high-dimensional cancer datasets and found that it outperforms standard random forests.
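The incremental wrapper idea described above can be sketched roughly as follows. This is a minimal, dependency-free illustration built on our own assumptions and is not the authors' implementation. First, genes are ranked by a filter score. We use a signal-to-noise ratio here purely for illustration. The ranked genes are then scanned once, best first, and a gene is kept only if adding it strictly raises the wrapper's accuracy. The paper uses a random forest as the wrapper classifier. To keep the example within the Python standard library, a leave-one-out 1-nearest-neighbour classifier stands in for it. The helper names `snr_ranking`, `loo_1nn_accuracy` and `iwss` are hypothetical.

```python
# Minimal IWSS sketch using only the Python standard library.
# Assumptions (not from the paper): genes are ranked with a signal-to-noise
# ratio filter, and a leave-one-out 1-nearest-neighbour classifier stands in
# for the random forest wrapper. All function names are hypothetical.
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def snr_ranking(X: Matrix, y: Sequence[int]) -> List[int]:
    """Rank feature indices by |mean0 - mean1| / (sd0 + sd1), highest first."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = (pstdev(a) + pstdev(b)) or 1e-12  # guard against zero spread
        scores.append(abs(mean(a) - mean(b)) / spread)
    # sorted() is stable, so tied features keep their original order
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def loo_1nn_accuracy(X: Matrix, y: Sequence[int], features: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-NN classifier on the given feature subset."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, other in enumerate(X):
            if k == i:
                continue  # hold out sample i
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


def iwss(ranking: Sequence[int], evaluate: Callable[[List[int]], float],
         min_gain: float = 0.0) -> Tuple[List[int], float]:
    """Scan the ranking once; keep a feature only if it raises the wrapper score."""
    selected: List[int] = []
    best = float("-inf")
    for f in ranking:
        candidate = selected + [f]
        score = evaluate(candidate)  # one wrapper evaluation per ranked feature
        if score > best + min_gain:  # require a strict improvement
            selected, best = candidate, score
    return selected, best


# Toy data: feature 0 separates the two classes; features 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, 3.0, 2.0], [0.2, 4.0, 3.0],
     [1.0, 4.0, 2.0], [1.1, 5.0, 1.0], [1.2, 3.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
ranking = snr_ranking(X, y)
subset, acc = iwss(ranking, lambda fs: loo_1nn_accuracy(X, y, fs))
print(ranking, subset, acc)  # [0, 1, 2] [0] 1.0
```

`iwss` only receives an `evaluate` callback. A cross-validated random forest score, for example one from scikit-learn, could therefore replace `loo_1nn_accuracy` without changing the selection loop. Each ranked gene is evaluated exactly once, so the number of wrapper evaluations grows linearly with the number of genes.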


Cancer classification · Random forest · IWSS · Incremental wrapper based gene subset selection

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Alia Fatima (1)
  • Usman Qamar (1)
  • Saad Rehman (1)
  • Aiman Khan Nazir (1)
  1. National University of Sciences and Technology (NUST), Islamabad, Pakistan