Abstract
Metagenomics is now a novel source of information for supporting the diagnosis and prognosis of human diseases. Numerous studies have pointed to the crucial role of metagenomics in personalized medicine. In recent years, machine learning has been widely deployed in metagenomic research. Gene family data are typically characterized by very high dimensionality, with up to millions of features, whereas the number of available samples is small compared to the number of attributes. As a result, models often achieve high accuracy during the training phase but perform poorly on validation sets. Moreover, the very large number of features in each gene family dataset makes processing and learning time-consuming. In this study, we propose a feature selection method based on Ridge Regression for gene family datasets. The selected features are then binned using an equal-width binning approach and fed into either a Linear Regression model or a One-Dimensional Convolutional Neural Network (CNN1D) for prediction. Experiments are conducted on more than 1000 samples from gene family abundance datasets related to Liver Cirrhosis, Colorectal Cancer, Inflammatory Bowel Disease, Obesity, and Type 2 Diabetes. The proposed combination of feature selection and binning shows significant improvements in both prediction performance and execution time compared to state-of-the-art methods.
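The pipeline described in the abstract (Ridge-based feature selection, equal-width binning, then a linear predictor) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the top-k selection rule, the regularization strength `alpha`, the number of bins, and the synthetic data are all assumptions made for illustration, and the CNN1D variant is omitted.

```python
import numpy as np

def ridge_coefficients(X, y, alpha=1.0):
    """Ridge weights via the dual form, which only needs an n-by-n solve.

    This suits the setting in the abstract, where samples are far fewer
    than features. Inputs are centered so that no intercept is penalized.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    gram = Xc @ Xc.T + alpha * np.eye(X.shape[0])
    return Xc.T @ np.linalg.solve(gram, yc)

def select_top_k(X, y, k, alpha=1.0):
    """Indices of the k features with the largest absolute ridge weight.

    Ranking by |weight| is an assumed selection rule, not taken from the paper.
    """
    weights = ridge_coefficients(X, y, alpha)
    return np.sort(np.argsort(np.abs(weights))[-k:])

def equal_width_bin(values, n_bins=10, lo=None, hi=None):
    """Map values to integer bin ids 0..n_bins-1 over equally wide intervals."""
    lo = values.min() if lo is None else lo
    hi = values.max() if hi is None else hi
    # Keep only interior edges so out-of-range values fall into the end bins.
    inner_edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
    return np.digitize(values, inner_edges)

# Synthetic stand-in for a gene family abundance table (hypothetical data).
rng = np.random.default_rng(0)
X = rng.random((40, 500))                      # 40 samples, 500 features
y = (X[:, 3] + X[:, 7] > 1.0).astype(float)    # binary disease label

kept = select_top_k(X, y, k=20)                # step 1: feature selection
X_binned = equal_width_bin(X[:, kept], 10)     # step 2: equal-width binning

# Step 3: ordinary least-squares linear regression on the binned features,
# thresholded at 0.5 to obtain a class label.
design = np.column_stack([np.ones(len(y)), X_binned])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = (design @ coef > 0.5).astype(float)
```

In a real evaluation, the selected features and the bin range (`lo`, `hi`) would be derived from the training split only and then reused on the validation split, so that the validation samples do not influence either step.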
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, TH. et al. (2021). Effective Disease Prediction on Gene Family Abundance Using Feature Selection and Binning Approach. In: Kim, H., Kim, K.J. (eds) IT Convergence and Security. Lecture Notes in Electrical Engineering, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-15-9354-3_2
DOI: https://doi.org/10.1007/978-981-15-9354-3_2
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9353-6
Online ISBN: 978-981-15-9354-3
eBook Packages: Intelligent Technologies and Robotics (R0)