Effective Disease Prediction on Gene Family Abundance Using Feature Selection and Binning Approach

  • Conference paper
IT Convergence and Security

Abstract

Metagenomics is now a novel source of data for supporting the diagnosis and prognosis of human diseases. Numerous studies have pointed to the crucial role of metagenomics in personalized medicine. In recent years, machine learning has been widely deployed in metagenomic research. Gene family data are typically very high-dimensional, with up to millions of features, while the number of available samples is small by comparison. As a result, models that reach high accuracy during training often perform poorly on validation sets. Moreover, the very large number of features in each gene family dataset makes processing and learning time-consuming. In this study, we propose feature selection methods using Ridge Regression on gene family datasets. The selected features are then binned with an equal-width binning approach and fed into either a Linear Regression model or a One-Dimensional Convolutional Neural Network (CNN1D) for prediction. Experiments were conducted on more than 1000 samples from gene family abundance datasets related to Liver Cirrhosis, Colorectal Cancer, Inflammatory Bowel Disease, Obesity, and Type 2 Diabetes. The proposed method, which combines feature selection and binning, shows significant improvements in both prediction performance and execution time compared to state-of-the-art methods.
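The abstract describes a two-stage preprocessing step: rank gene-family features with Ridge Regression, keep the best ones, then discretize them with equal-width binning before classification. The following is a minimal plain-Python sketch of those two stages, not the authors' implementation. It relies on several assumptions that the abstract does not specify. Features are ranked by the absolute value of their ridge coefficients. No intercept is fitted. Labels are encoded as numbers (for example +1 for disease, -1 for control). The retained feature count `k`, the regularization strength `alpha` and the number of bins `n_bins` are free parameters. Bin edges are computed separately for each feature. All function names are hypothetical. Because the number of features p far exceeds the number of samples n, the sketch uses the dual form of ridge regression, which solves an n x n system instead of a p x p one.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_weights(x: Matrix, y: List[float], alpha: float = 1.0) -> List[float]:
    """Ridge coefficients without intercept (an assumption), via the dual form
    w = X^T (X X^T + alpha I)^{-1} y: only an n x n solve is needed."""
    n, p = len(x), len(x[0])
    gram = [[sum(u * v for u, v in zip(x[i], x[j])) + (alpha if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    c = solve(gram, y)
    return [sum(c[i] * x[i][k] for i in range(n)) for k in range(p)]


def select_top_k(weights: List[float], k: int) -> List[int]:
    """Indices of the k features with the largest |coefficient|, in column order."""
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return sorted(ranked[:k])


def equal_width_bin(column: List[float], n_bins: int) -> List[int]:
    """Map values to n_bins equal-width intervals spanning [min, max]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: everything falls into bin 0
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # the maximum value would map to index n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def select_and_bin(x: Matrix, y: List[float], k: int, n_bins: int,
                   alpha: float = 1.0) -> Tuple[List[int], List[List[int]]]:
    """Keep the top-k ridge features, then bin each kept feature separately."""
    keep = select_top_k(ridge_weights(x, y, alpha), k)
    binned_cols = [equal_width_bin([row[j] for row in x], n_bins) for j in keep]
    return keep, [list(r) for r in zip(*binned_cols)]  # back to samples x features
```

For example, `select_and_bin([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0]], [1.0, -1.0, 2.0], k=1, n_bins=3)` keeps only column 0, because the all-zero column receives a zero coefficient. It returns `([0], [[2], [0], [2]])`. In a real evaluation, the selected indices and bin edges should be derived from the training split only and then applied to the validation split, so that no information leaks between them. A library implementation such as scikit-learn's `Ridge` and `KBinsDiscretizer(strategy="uniform")` would normally replace these hand-written helpers.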

Author information

Correspondence to Thanh-Hai Nguyen.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nguyen, TH. et al. (2021). Effective Disease Prediction on Gene Family Abundance Using Feature Selection and Binning Approach. In: Kim, H., Kim, K.J. (eds) IT Convergence and Security. Lecture Notes in Electrical Engineering, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-15-9354-3_2
