Deep Learning Approach for Pathogen Detection Through Shotgun Metagenomics Sequence Classification
Studies have shown that shotgun metagenomics sequencing facilitates the evaluation of diverse viruses, bacteria, and eukaryotic microbes and helps in exploring their abundances in complex samples. Because of the large number of sequences to process and the overall computational complexity, analyzing these data with traditional database sequence comparison approaches is time-consuming. Deep learning has been widely used to solve many classification problems, including those in bioinformatics, and has proven accurate and efficient for analyzing large-scale datasets. The purpose of this work is to explore how a long short-term memory (LSTM) network can be used to learn sequential genome patterns for pathogen detection from metagenome data. Our experimental results show that this approach achieves accuracy similar to that of the conventional BLAST method while running about 36 times faster.
Keywords: Shotgun metagenomics sequencing, Sequence classification, Deep learning, LSTM, GPU acceleration, Parallel computing
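To make the idea in the abstract concrete, the following is a minimal sketch of how a sequencing read might be fed to an LSTM and reduced to a single pathogen score. It is an illustration under stated assumptions, not the authors' model: the abstract does not give the encoding, architecture, or hyperparameters, so the one-hot encoding, the hidden size of 8, the treatment of ambiguous bases, and every function name here (`one_hot`, `init_params`, `lstm_step`, `pathogen_score`) are hypothetical. The weights are random and untrained, so the score itself carries no meaning.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def one_hot(read):
    """Encode a DNA read as a list of 4-dimensional one-hot vectors.

    Assumption: ambiguous bases such as N become all-zero vectors.
    """
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES]
            for base in read.upper()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Create small random weights (untrained, for illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    # One weight matrix and bias vector per gate:
    # input (i), forget (f), candidate (g), output (o).
    params = {gate: (matrix(hidden_size, input_size + hidden_size),
                     [0.0] * hidden_size)
              for gate in "ifgo"}
    # Final linear layer mapping the last hidden state to one logit.
    params["out"] = ([rng.uniform(-0.1, 0.1) for _ in range(hidden_size)], 0.0)
    return params


def affine(W, b, v):
    """Compute W @ v + b with plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(params, x, h, c):
    """Run one standard LSTM cell update for input x and state (h, c)."""
    z = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(a) for a in affine(*params["i"], z)]
    f = [sigmoid(a) for a in affine(*params["f"], z)]
    g = [math.tanh(a) for a in affine(*params["g"], z)]
    o = [sigmoid(a) for a in affine(*params["o"], z)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def pathogen_score(params, read, hidden_size):
    """Read the sequence base by base and return a score in (0, 1)."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in one_hot(read):
        h, c = lstm_step(params, x, h, c)
    w, b = params["out"]
    return sigmoid(sum(wj * hj for wj, hj in zip(w, h)) + b)


HIDDEN = 8  # hypothetical hidden size
params = init_params(input_size=4, hidden_size=HIDDEN)
score = pathogen_score(params, "ACGTTGCANNAC", HIDDEN)
print(f"untrained score: {score:.3f}")
```

In practice a model of this kind would be trained on labeled reads with a deep learning framework on a GPU, which is where the speed advantage over alignment-based search would come from; the pure-Python version above only shows the data flow through the recurrent cell.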
The authors are members of Fujitsu next generation Cloud Research Alliance Laboratory (FCRAL). This research and development work was partially supported by the MIC/SCOPE #172107106 and by Fujitsu Ltd.