Abstract
SOAPsnv is a software tool for identifying single nucleotide variations in cancer genes. However, its performance has not kept pace with the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv is its pileup algorithm: the I/O process of the original pileup algorithm is time-consuming and reads input files inefficiently, and the algorithm scales poorly. We therefore designed a new algorithm, named BamPileup, which improves the performance of sequential reads and implements an index-based parallel read mode. With this method, each thread can read data directly, starting from a specific position. Experimental results on the Tianhe-2 supercomputer show that, when data are read with multi-threaded parallel I/O, the processing time of the algorithm is reduced to 3.9 s and the application achieves a speedup of up to 100×. The scalability of the new algorithm is also satisfactory.
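The idea of an index-based parallel read can be illustrated with a minimal sketch. This is not the BamPileup implementation and it does not parse the BAM format: it assumes a plain text file with one record per line, and the function names `build_index`, `read_chunk`, and `parallel_count` are hypothetical. An index of byte offsets is built once, and each worker thread then opens its own file handle and seeks directly to its assigned start position.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(path, records_per_chunk):
    """Return (offsets, size): the byte offset at which every
    `records_per_chunk`-th line starts, plus the total file size."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            if i % records_per_chunk == 0:
                offsets.append(pos)
            pos += len(line)
    return offsets, pos


def read_chunk(path, start, end):
    """Open a private handle, seek straight to `start`, and count the
    newline-terminated records in the byte range [start, end)."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return data.count(b"\n")


def parallel_count(path, records_per_chunk=1000, workers=4):
    """Count records by letting each thread read one indexed chunk."""
    offsets, size = build_index(path, records_per_chunk)
    # Pair each chunk start with the next start (or end of file).
    bounds = list(zip(offsets, offsets[1:] + [size]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda b: read_chunk(path, *b), bounds)
        return sum(counts)
```

In this sketch the index is rebuilt on every call for brevity. In practice an index would be stored alongside the data so that the sequential pass over the file is paid only once, and the per-chunk work would be the actual record processing rather than a simple count.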
Acknowledgments
We would like to thank Professor Derek Barnett from Boston College (Washington College) for providing the source code for BamTools and related test data. We would also like to thank Dr. Tony Cox and Zemin Ning from the Sanger Institute, England, for discussions of the problem that improved our understanding. We thank the researchers and scientists of Shenzhen BGI (Yingrui Li, Ruibang Luo, Bingqiang Wang, and Yujian Shi) for their great help, research platform, professional knowledge, and abundant experimental data, as well as for their suggestions during the writing and revision of the thesis. We thank Chunlin Chen for specific work in program testing and code implementation. This work is supported by NSFC Grants 61272056, U1435222, and 1133005.
Additional information
Xiaoqian Zhu, Shaoliang Peng, Shaojie Liu, Xiang Gu, and Ming Gao have contributed equally to this work.
Cite this article
Zhu, X., Peng, S., Liu, S. et al. A Massively Parallel Computational Method of Reading Index Files for SOAPsnv. Interdiscip Sci Comput Life Sci 7, 397–404 (2015). https://doi.org/10.1007/s12539-015-0123-x
DOI: https://doi.org/10.1007/s12539-015-0123-x