Skip to main content

CUSHAW Suite: Parallel and Efficient Algorithms for NGS Read Alignment

  • Chapter
  • First Online:
Algorithms for Next-Generation Sequencing Data
  • 1787 Accesses

Abstract

Next generation sequencing (NGS) technologies have enabled cheap, large-scale, and high-throughput production of short DNA sequence reads and thereby have promoted the explosive growth of data volume. Unfortunately, the produced reads are short and prone to contain errors that are incurred during sequencing cycles. Both large data volume and sequencing errors have complicated the mapping of NGS reads onto the reference genome and have motivated the development of various aligners for very short reads, typically less than 100 base pairs (bps) in length. As read length continues to increase, propelled by advances in NGS technologies, these longer reads tend to have higher sequencing error rates and more true mutations (including substitutions, insertions, or deletions) to the genome. Such new characteristics make inefficient the aligners, which are optimized for very short reads and support only ungapped alignments or gapped alignments with very limited number of gaps (typically one gap), and thereby call for new aligners with fully gapped alignment supported. In this chapter, we present the CUSHAW software suite for NGS read alignment, which is open-source and consists of three individual aligners: CUSHAW, CUSHAW2, and CUSHAW3. This suite offers parallel and efficient NGS read alignments to large genomes, such as the human genome, by harnessing multi-core CPUs or compute unified device architecture (CUDA)-enabled graphics processing units (GPUs). Moreover, it has the capability to align both base-space and color-space reads and is consistently shown to be one of the best alignment tools through our performance evaluations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 159.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Zerbino, D.R., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008)

    Article  Google Scholar 

  2. Simpson, J.T., Wong, K., Jackman, S.D., et al.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009)

    Article  Google Scholar 

  3. Liu, Y., Schmidt, B., Maskell, D.L.: Parallelized short read assembly of large genomes using de Bruijn graphs. BMC Bioinformatics 12, 354 (2011)

    Article  Google Scholar 

  4. Luo, R., Liu, B., Xie, Y., et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)

    Article  Google Scholar 

  5. Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Brief. Bioinform. 11, 473–83 (2010)

    Article  Google Scholar 

  6. Yang, X., Chockalingam, S.P., Aluru, S.: A survey of error-correction methods for next-generation sequencing. Brief. Bioinform. 14, 56–66 (2013)

    Article  Google Scholar 

  7. Peng, Y., Leung, H.C., Yiu, S.M., et al.: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 27, i94–i101 (2011)

    Article  Google Scholar 

  8. Yang, X., Zola, J., Aluru, S.: Parallel metagenomic sequence clustering via sketching and quasi-clique enumeration on map-reduce clouds. In: 25th International Parallel and Distributed Processing Symposium, pp. 1223–1233 (2011)

    Google Scholar 

  9. Nguyen, T.D., Schmidt, B., Kwoh, C.K.: Fast Dendrogram-based OTU Clustering using Sequence Embedding. In: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 63–72 (2014)

    Google Scholar 

  10. Smith, A.D., Xuan, Z., Zhang, M.Q.: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9, 128 (2008)

    Article  Google Scholar 

  11. Li, H., Ruan, J., Durbin, R.: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008)

    Article  Google Scholar 

  12. Homer, N., Merriman, B., Nelson, S.F.: BFAST: an alignment tool for large scale genome resequencing. PLoS ONE 4, e7767 (2009)

    Article  Google Scholar 

  13. Langmead, B., Trapnell, C., Pop, M., et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)

    Article  Google Scholar 

  14. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1755–1760 (2009)

    Google Scholar 

  15. Liu, Y., Schmidt, B., Maskell, D.L.: CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform. Bioinformatics 28, 1830–1837 (2012)

    Article  Google Scholar 

  16. Li, R., Yu, C., Li, Y., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009)

    Article  Google Scholar 

  17. Li, H., Durbin, R.: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010)

    Article  Google Scholar 

  18. Rizk, G., Lavenier, D.: GASSST: global alignment short sequence search tool. Bioinformatics 26, 2534–2540 (2010)

    Article  Google Scholar 

  19. Langmead, B., Salzberg, S.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)

    Article  Google Scholar 

  20. Liu, Y., Schmidt, B.: Long read alignment based on maximal exact match seeds. Bioinformatics 28, i318–i324 (2012)

    Article  Google Scholar 

  21. Liu, Y., Schmidt, B.: CUSHAW2-GPU: empowering faster gapped short-read alignment using GPU computing. IEEE Des. Test 31, 31–39 (2014)

    Article  Google Scholar 

  22. Liu, Y., Popp, B., Schmidt, B.: CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding. PLoS ONE 9, e86869 (2014)

    Article  Google Scholar 

  23. González-Domínguez, J., Liu, Y., Schmidt, B.: Parallel and scalable short-read alignment on multi-core clusters Using UPC++. PLoS ONE 11, e0145490 (2016)

    Article  Google Scholar 

  24. Marco-Sola, S., Sammeth, M., Guigó, R., et al.: The GEM mapper: fast, accurate and versatile alignment by filtration. Nat. Methods 9, 1885–1888 (2012)

    Article  Google Scholar 

  25. Mu, J.C., Jiang, H., Kiani, A., et al.: Fast and accurate read alignment for resequencing. Bioinformatics 28, 2366–2373 (2012)

    Article  Google Scholar 

  26. Luo, R., Wong, T., Zhu, J., et al.: SOAP3-dp: fast, accurate and sensitive GPU-based short-read aligner. PLoS ONE 8, e65632 (2013)

    Article  Google Scholar 

  27. Li, H.: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (2013). arXiv:1303.3997 [q-bio.GN]

    Google Scholar 

  28. Altschul, S.F., Gish, W., Miller, W., et al.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

    Article  Google Scholar 

  29. Burrows, M., Wheeler, D.J.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, CA (1994)

    Google Scholar 

  30. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52, 4 (2005)

    MathSciNet  MATH  Google Scholar 

  31. Novoalign, http://www.novocraft.com/products/novoalign

  32. David, M., Dzamba, M., Lister, D., et al.: SHRiMP2: sensitive yet practical short read mapping. Bioinformatics 27, 1011–1012 (2011)

    Article  Google Scholar 

  33. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)

    Article  Google Scholar 

  34. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)

    Article  Google Scholar 

  35. Blom, J., Jakobi, T., Doppmeier, D., et al.: Exact and complete short read alignment to microbial genomes using Graphics Processing Unit programming. Bioinformatics 27, 1351–1358 (2011)

    Article  Google Scholar 

  36. Kiełbasa, S.M., Wan, R., Sato, K., et al.: Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011)

    Article  Google Scholar 

  37. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)

    Article  Google Scholar 

  38. Rasmussen, K.R., Stoye, J., Myers, E.W.: Efficient q-gram filters for finding all epsilon-matches over a given length. J. Comput. Biol. 13, 296–308 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  39. Ewing, B., Green, P.: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998)

    Article  Google Scholar 

  40. Rognes, T.: Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinformatics 12, 221 (2011)

    Article  Google Scholar 

  41. Liu, Y., Schmidt, B.: GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences. Concurr. Comput. Pract. Exp. 27, 958–972 (2014). doi:10.1002/cpe.3371

    Article  Google Scholar 

  42. Li, H., Handsaker, B., Wysoker, A., et al.: The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)

    Article  Google Scholar 

  43. Highnam, G., Wang, J.J., Kusler, D., et al.: An analytical framework for optimizing variant discovery from personal genomes. Nat. Commun. 6, 6275 (2015)

    Article  Google Scholar 

  44. Sherry, S.T., Ward, M.H., Kholodov, M., et al.: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001)

    Google Scholar 

  45. Huang, W., Li, L., Myers, J.R., et al.: ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012)

    Article  Google Scholar 

  46. McKenna, A., Hanna, M., Banks, E., et al.: The Genome Analysis Toolkit: a MapReduce framework for analyzing next generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010)

    Article  Google Scholar 

  47. Liu, Y., Schmidt, B., Maskell, D.L.: MSA-CUDA: multiple sequence alignment on graphics processing units with CUDA. In: 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, pp. 121–128 (2009)

    Google Scholar 

  48. Liu, Y., Maskelml, D.L., Schmidt, B.: CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Res. Notes 2, 73 (2009)

    Article  Google Scholar 

  49. Alachiotis, N., Berger, S.A., Stamatakis, A.: Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel. BMC Bioinformatics 13, 196 (2012)

    Article  Google Scholar 

  50. Liu, Y., Schmidt, B.: SWAPHI: Smith-Waterman protein database search on Xeon Phi coprocessors. In: 25th IEEE International Conference on Application-specific Systems, Architectures and Processors, pp. 184–185 (2014)

    Google Scholar 

  51. Liu, Y., Tran, T.T., Lauenroth, F., et al.: SWAPHI-LS: Smith-Waterman algorithm on Xeon Phi coprocessors for long DNA sequences. In: 2014 IEEE International Conference on Cluster Computing, pp. 257–265 (2014)

    Google Scholar 

  52. Wang, L., Chan, Y., Duan, X., et al.: XSW: accelerating biological database search on Xeon Phi. In: 28th IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp. 950–957 (2014)

    Google Scholar 

Download references

Acknowledgements

We thank the Novocraft Technologies Company for granting a trial license of Novoalign.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yongchao Liu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Liu, Y., Schmidt, B. (2017). CUSHAW Suite: Parallel and Efficient Algorithms for NGS Read Alignment. In: Elloumi, M. (eds) Algorithms for Next-Generation Sequencing Data. Springer, Cham. https://doi.org/10.1007/978-3-319-59826-0_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-59826-0_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59824-6

  • Online ISBN: 978-3-319-59826-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics