A Massively Parallel Computational Method of Reading Index Files for SOAPsnv

  • Original Research Article
  • Interdisciplinary Sciences: Computational Life Sciences

Abstract

SOAPsnv is a software tool for identifying single nucleotide variations in cancer genes. However, its performance has not kept pace with the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv is the pileup algorithm: its I/O process is time-consuming, it reads input files inefficiently, and it scales poorly. We therefore designed a new algorithm, named BamPileup, that improves sequential read performance and implements an index-based parallel read mode. With this method, each thread can read data directly, starting from a specific position. Experiments on the Tianhe-2 supercomputer show that, with multi-threaded parallel I/O, the processing time of the algorithm is reduced to 3.9 s and the application achieves a speedup of up to 100×. The scalability of the new algorithm is also satisfactory.
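
The index-based parallel read mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the BamPileup implementation and does not parse BAM files or their index format. It assumes a plain newline-delimited record file, and the helper names (`build_index`, `read_chunk`, `parallel_read`, `make_demo_file`) and the chunk size are hypothetical, chosen only for illustration. The point it shows is that, once an index of byte offsets exists, each thread can open its own file handle, seek straight to its starting position, and read its share of the data without scanning from the beginning.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_index(path, records_per_chunk):
    """Return the byte offset at which every `records_per_chunk`-th record starts."""
    offsets = []
    with open(path, "rb") as f:
        count = 0
        while True:
            pos = f.tell()          # offset of the record about to be read
            line = f.readline()
            if not line:
                break
            if count % records_per_chunk == 0:
                offsets.append(pos)
            count += 1
    return offsets


def read_chunk(path, start, end):
    """Open a private handle, seek to `start`, and return the records up to `end`."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return data.splitlines()


def parallel_read(path, offsets, num_threads):
    """Read all indexed chunks with a thread pool, preserving record order."""
    size = os.path.getsize(path)
    # Each chunk ends where the next one begins; the last ends at end of file.
    bounds = list(zip(offsets, offsets[1:] + [size]))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        chunks = pool.map(lambda b: read_chunk(path, b[0], b[1]), bounds)
        return [rec for chunk in chunks for rec in chunk]


def make_demo_file(n):
    """Write `n` made-up tab-separated records to a temporary file (demo data only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n):
            f.write(f"read_{i}\tchr1\t{i * 10}\n".encode())
    return path


if __name__ == "__main__":
    demo = make_demo_file(1000)
    index = build_index(demo, records_per_chunk=100)
    records = parallel_read(demo, index, num_threads=4)
    print(len(index), "chunks,", len(records), "records")
    os.remove(demo)
```

Giving every thread its own handle avoids contention on a shared file position, and `pool.map` returns chunks in index order, so the combined result matches a sequential read. The sketch makes no claim about speedup; the timings reported in the abstract come from the authors' implementation on Tianhe-2.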

Acknowledgments

We would like to thank Professor Derek Barnett from Boston College (Washington College) for providing the source code for BamTools and related test data. We would also like to thank Dr. Tony Cox and Zemin Ning from the Sanger Institute, England, for discussing the problem and thereby improving our understanding. We thank the researchers and scientists of Shenzhen BGI (Yingrui Li, Ruibang Luo, Bingqiang Wang, and Yujian Shi) for their great help, research platform, professional knowledge, and abundant experimental data, as well as for their suggestions during writing and revision. We thank Chunlin Chen for specific work in program testing and code implementation. This work is supported by NSFC Grants 61272056, U1435222, and 1133005.

Author information

Corresponding author

Correspondence to Shaoliang Peng.

Additional information

Xiaoqian Zhu, Shaoliang Peng, Shaojie Liu, Xiang Gu, and Ming Gao have contributed equally to this work.

About this article

Cite this article

Zhu, X., Peng, S., Liu, S. et al. A Massively Parallel Computational Method of Reading Index Files for SOAPsnv. Interdiscip Sci Comput Life Sci 7, 397–404 (2015). https://doi.org/10.1007/s12539-015-0123-x

  • DOI: https://doi.org/10.1007/s12539-015-0123-x
