
A Comparative Study of Feature Selection and Classification Techniques for High-Throughput DNA Methylation Data

  • Conference paper
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016 (AISI 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Abstract

The high dimensionality of data is a common problem in classification. This work investigates whether a small number of significant features suffices to classify samples from two groups. Various feature selection and classification techniques are applied to a collection of four high-throughput DNA methylation microarray data sets. Using accuracy as the performance metric, repeated 10-fold cross-validation is used to evaluate the proposed techniques. Combining the Signal-to-Noise Ratio (SNR) and Wilcoxon rank-sum test filter methods with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) as an embedded method resulted in perfect performance. In addition, linear classifiers showed excellent results compared to other classifiers when applied to these data sets.
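As an illustration of the filter step named in the abstract, the following is a minimal sketch of SNR-based feature ranking for two sample groups. The abstract does not give the paper's exact formula, so this assumes the common definition |mean_0 - mean_1| / (sd_0 + sd_1); the function names `snr_scores` and `top_k_features` and the toy data are hypothetical, and the later stages (Wilcoxon rank-sum filtering, SVM-RFE, cross-validation) are not shown.

```python
from statistics import mean, stdev


def snr_scores(samples, labels):
    """Return one SNR score per feature for a two-class data set.

    samples: list of rows, each a list of feature values (e.g. methylation levels).
    labels:  list of 0/1 class labels, one per row.
    Each class needs at least two samples so that stdev is defined.
    """
    group0 = [row for row, y in zip(samples, labels) if y == 0]
    group1 = [row for row, y in zip(samples, labels) if y == 1]
    scores = []
    for j in range(len(samples[0])):
        a = [row[j] for row in group0]
        b = [row[j] for row in group1]
        spread = stdev(a) + stdev(b)
        # Assumption of this sketch: with zero spread the ratio is undefined,
        # so the feature is simply scored 0.
        scores.append(abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0)
    return scores


def top_k_features(samples, labels, k):
    """Return the indices of the k features with the highest SNR score."""
    scores = snr_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: 6 samples, 3 features; only feature 0 separates the two groups.
samples = [
    [0.10, 0.50, 0.90],
    [0.12, 0.40, 0.10],
    [0.11, 0.60, 0.50],
    [0.90, 0.50, 0.20],
    [0.88, 0.45, 0.80],
    [0.91, 0.55, 0.40],
]
labels = [0, 0, 0, 1, 1, 1]
print(top_k_features(samples, labels, 1))
```

In a pipeline like the one the abstract describes, a ranking of this kind would only pre-filter the features; the reduced set would then be passed to an embedded method such as SVM-RFE and scored with repeated 10-fold cross-validation.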



Author information

Corresponding author

Correspondence to Alhasan Alkuhlani.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alkuhlani, A., Nassef, M., Farag, I. (2017). A Comparative Study of Feature Selection and Classification Techniques for High-Throughput DNA Methylation Data. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_76

  • DOI: https://doi.org/10.1007/978-3-319-48308-5_76

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48307-8

  • Online ISBN: 978-3-319-48308-5

  • eBook Packages: Engineering, Engineering (R0)
