
Optimization of SAMtools sorting using OpenMP tasks

Cluster Computing

Abstract

SAMtools is a widely used genomics application for post-processing high-throughput sequence alignment data. Such data are commonly sorted to make downstream analysis more efficient. However, the sorting process itself can be both computationally and I/O intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory optimization techniques resulted in a speedup of 5.9X over the upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) with 32 processor cores, while a 1.98X speedup was achieved for an external (out-of-core) sort of a 271.4 GiB BAM file.
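To make the term "OpenMP task parallelism" concrete, the following is a minimal, generic sketch of a task-parallel merge sort in C. It is not the authors' SAMtools or HTSlib code: it sorts plain integers rather than BAM records, and the names `parallel_sort_ints`, `sort_tasks`, `merge_runs`, and `SERIAL_CUTOFF` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cutoff: at or below this size, sort serially to limit task overhead. */
#define SERIAL_CUTOFF 4096

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Merge the sorted runs a[0..mid) and a[mid..n) through the scratch buffer tmp. */
static void merge_runs(int *a, int *tmp, size_t mid, size_t n)
{
    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < n)   tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
}

/* Recursively sort a[0..n); each half is handed to an OpenMP task. */
static void sort_tasks(int *a, int *tmp, size_t n)
{
    if (n <= SERIAL_CUTOFF) {
        qsort(a, n, sizeof *a, cmp_int);
        return;
    }
    size_t mid = n / 2;
    #pragma omp task firstprivate(a, tmp, mid)
    sort_tasks(a, tmp, mid);
    #pragma omp task firstprivate(a, tmp, mid, n)
    sort_tasks(a + mid, tmp + mid, n - mid);
    /* Both halves must be complete before they can be merged. */
    #pragma omp taskwait
    merge_runs(a, tmp, mid, n);
}

/* Entry point: returns 0 on success, -1 if the scratch allocation fails. */
int parallel_sort_ints(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    /* One thread creates the root of the task tree; the team executes the tasks. */
    #pragma omp parallel
    #pragma omp single
    sort_tasks(a, tmp, n);
    free(tmp);
    return 0;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), the two recursive calls may run on different threads; without it the pragmas are ignored and the same code runs serially. The serial cutoff is a common way to keep task-creation overhead small relative to the work per task, and its value here is an arbitrary assumption.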




Notes

  1. Source code for the SAMtools optimizations is available at https://doi.org/10.5281/zenodo.262169, and for the HTSlib optimizations at https://doi.org/10.5281/zenodo.262161.

  2. https://github.com/samtools/htslib/pull/51

  3. http://www.htslib.org/benchmarks/zlib.html

  4. https://github.com/samtools/htslib/pull/397

  5. https://github.com/smowton/htslib/compare/parallel_read

  6. Note that taskyield is a no-op as of gcc 6.2.0

  7. This check could occur after task generation and before returning from the routine; however, the implementation did not consistently perform as well in practice, possibly due to an undetermined effect on task scheduling.


Acknowledgements

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Author information


Correspondence to Nathan T. Weeks.


About this article


Cite this article

Weeks, N.T., Luecke, G.R. Optimization of SAMtools sorting using OpenMP tasks. Cluster Comput 20, 1869–1880 (2017). https://doi.org/10.1007/s10586-017-0874-8


  • DOI: https://doi.org/10.1007/s10586-017-0874-8
