Incremental Wrapper Based Random Forest Gene Subset Selection for Tumor Discernment

Part of the Communications in Computer and Information Science book series (CCIS, volume 903)

Abstract

High-dimensional cancer-related datasets allow researchers to diagnose cancer in a timely manner and support effective treatment. Biomedical applications process thousands of features, and it is challenging to extract precise statistics from such high-dimensional data. This paper presents Incremental Wrapper based Random Forest Gene Subset Selection for tumor discernment, which works on the principle of incremental wrapper based feature subset selection (IWSS) combined with the random forest classification algorithm; the same algorithm also serves as the performance validator. Incremental wrapper based feature subset selection is a technique for picking the best possible subset of genes from high-dimensional data at low computational cost. Random forest is expected to increase overall performance because it works well on cancer-related high-dimensional datasets. The efficacy of the random forest classification algorithm as performance validator improves significantly when it works on a selective, discriminative subset of prognostic genes rather than on the raw data. We evaluate the proposed methodology on six publicly available cancer-related high-dimensional datasets and find that it outperforms standard random forests.
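The incremental wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`iwss`, `centroid_loo_accuracy`), the `min_gain` threshold, the toy data, and the choice to rank genes by their individual score are all assumptions made for illustration. To stay dependency-free, a leave-one-out nearest-centroid scorer stands in for the random forest validator; in practice `score` would presumably return cross-validated random forest accuracy.

```python
from statistics import mean


def centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for the random forest performance validator.
    """
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        # Predict the class whose centroid is closest (squared Euclidean).
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def iwss(X, y, score, min_gain=0.0):
    """Incremental wrapper subset selection (sketch).

    X is a list of samples (lists of gene values), y the class labels,
    and score(X, y, features) any accuracy-like evaluator.
    """
    n_features = len(X[0])
    # Step 1: rank genes by their individual score, best first
    # (assumed ranking criterion; a filter measure could be used instead).
    ranking = sorted(
        range(n_features), key=lambda f: score(X, y, [f]), reverse=True
    )
    selected, best = [], 0.0
    # Step 2: walk the ranking once; keep a gene only if adding it
    # improves the wrapper score by more than min_gain.
    for f in ranking:
        candidate = score(X, y, selected + [f])
        if candidate > best + min_gain:
            selected.append(f)
            best = candidate
    return selected, best


# Toy data (invented): gene 0 separates the classes, genes 1 and 2 do not.
TOY_X = [
    [0.0, 0.5, 0.3], [0.1, 0.4, 0.3], [0.2, 0.6, 0.3], [0.1, 0.5, 0.3],
    [1.0, 0.5, 0.3], [0.9, 0.6, 0.3], [1.1, 0.4, 0.3], [1.0, 0.5, 0.3],
]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, best = iwss(TOY_X, TOY_Y, centroid_loo_accuracy)
print(selected, best)
```

Because the ranking is traversed only once, the wrapper evaluates the classifier roughly once per gene for ranking and once per gene for selection, which is where the low computational cost relative to exhaustive subset search comes from.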

Keywords

  • Cancer classification
  • Random forest
  • IWSS
  • Incremental wrapper based gene subset selection

Author information

Corresponding author

Correspondence to Alia Fatima.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Fatima, A., Qamar, U., Rehman, S., Nazir, A.K. (2018). Incremental Wrapper Based Random Forest Gene Subset Selection for Tumor Discernment. In: Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_13

  • DOI: https://doi.org/10.1007/978-3-319-99133-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99132-0

  • Online ISBN: 978-3-319-99133-7

  • eBook Packages: Computer Science, Computer Science (R0)