Skip to main content

Advertisement

Log in

Review of alignment and SNP calling algorithms for next-generation sequencing data

  • Animal Genetics • Review
  • Published:
Journal of Applied Genetics Aims and scope Submit manuscript

Abstract

Application of the massive parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms—suffix tries and hash tables—have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main different approaches: heuristic and probabilistic methods. A variety of software has been subsequently developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Abouelhoda MI, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discrete Algorithms 2:53–86

    Article  Google Scholar 

  • Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, Sahinalp SC, Gibbs RA, Eichler EE (2009) Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41:1061–1067

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Altmann A1, Weber P, Bader D, Preuss M, Binder EB, Müller-Myhsok B (2012) A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet 131(10):1541–54

  • Ansorge WJ (2009) Next-generation DNA sequencing techniques. N Biotechnol 25:195–203

    Article  PubMed  CAS  Google Scholar 

  • Auffray C, Chen Z, Hood L (2009) Systems medicine: the future of medical genomics and healthcare. Genome Med 1:2

    Article  PubMed  PubMed Central  Google Scholar 

  • Blanca JM, Pascual L, Ziarsolo P, Nuez F, Cañizares J (2011) ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using next generation sequence. BMC Genomics 12:285

    Article  PubMed  PubMed Central  Google Scholar 

  • Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics 27(15):2156–2158

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • David M, Dzamba M, Lister D, Ilie L, Brudno M (2011) SHRiMP2: sensitive yet practical SHort Read Mapping. Bioinformatics 27(7):1011–1012

    Article  PubMed  CAS  Google Scholar 

  • Guffanti A, Iacono M, Pelucchi P, Kim N, Soldà G, Croft LJ, Taft RJ, Rizzi E, Askarian-Amiri M, Bonnal RJ, Callari M, Mignone F, Pesole G, Bertalot G, Bernardi LR, Albertini A, Lee C, Mattick JS, Zucchi I, De Bellis G (2009) A transcriptional sketch of a primary human breast cancer by 454 deep sequencing. BMC Genomics 10:163–179

    Article  PubMed  PubMed Central  Google Scholar 

  • Handel AE, Disanto G, Ramagopalan SV (2013) Next-generation sequencing in understanding complex neurological disease. Expert Rev Neurother 13(2):215–227

    Article  PubMed  CAS  Google Scholar 

  • Hoffmann S, Otto C, Kurtz S, Sharma CM, Khaitovich P, Vogel J, Stadler PF, Hackermüller J (2009) Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput Biol 5(9):e1000502

    Article  PubMed  PubMed Central  Google Scholar 

  • Homer N, Merriman B, Nelson SF (2009) BFAST: an alignment tool for large scale genome resequencing. PLoS One 4(11):e7767

    Article  PubMed  PubMed Central  Google Scholar 

  • Horner DS, Pavesi G, Castrignanò T, De Meo PD, Liuni S, Sammeth M, Picardi E, Pesole G (2010) Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing. Brief Bioinform 11:181–197

    Article  PubMed  CAS  Google Scholar 

  • Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK (2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22(3):568–576

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Langmead B, Salzberg S (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25

    Article  PubMed  PubMed Central  Google Scholar 

  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754–1760

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26(5):589–595

    Article  PubMed  PubMed Central  Google Scholar 

  • Li H, Homer N (2010) A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 11:473–483

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Li H, Ruan J, Durbin R (2008a) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18:1851–1858

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Li R, Li Y, Kristiansen K, Wang J (2008b) SOAP: short oligonucleotide alignment program. Bioinformatics 24:713–714

    Article  PubMed  CAS  Google Scholar 

  • Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup (2009a) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079

    Article  PubMed  PubMed Central  Google Scholar 

  • Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J (2009b) SNP detection for massively parallel whole-genome resequencing. Genome Res 19(6):1124–1132

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J (2009c) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25:1966–1967

    Article  PubMed  CAS  Google Scholar 

  • Liu C-M, Wong T, Wu E, Luo R, Yiu S-M, Li Y, Wang B, Yu C, Chu X, Zhao K, Li R, Lam T-W (2012) SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 28(6):878–879

    Article  PubMed  CAS  Google Scholar 

  • Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18

    Article  PubMed  PubMed Central  Google Scholar 

  • Luo R, Wong T, Zhu J, Liu C-M, Zhu X, Wu E, Lee L-K, Lin H, Zhu W, Cheung DW, Ting H-F, Yiu S-M, Peng S, Yu C, Li Y, Li R, Lam T-W (2013) SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner. PLoS One 8(5):e65632

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Ma B, Tromp J, Li M (2002) PatternHunter: faster and more sensitive homology search. Bioinformatics 18:440–445

    Article  PubMed  CAS  Google Scholar 

  • Malhis N, Butterfield YSN, Ester M, Jones SJM (2009) Slider—maximum use of probability information for alignment of short sequence reads and SNP detection. Bioinformatics 25:6–13

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141

    Article  PubMed  CAS  Google Scholar 

  • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Medvedev P, Stanciu M, Brudno M (2009) Computational methods for discovering structural variation with next-generation sequencing. Nat Methods 6:S13–S20

    Article  PubMed  CAS  Google Scholar 

  • Metzker ML (2010) Sequencing technologies—the next generation. Nat Rev Genet 11:31–46

    Article  PubMed  CAS  Google Scholar 

  • Meuwissen T, Goddard M (2010) Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics 185:623–631

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453

    Article  PubMed  CAS  Google Scholar 

  • Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12(6):443–451

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Ning Z, Cox AJ, Mullikin JC (2001) SSAHA: a fast search method for large DNA databases. Genome Res 11(10):1725–1729

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z (2014) A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform 15(2):256–278

    Article  PubMed  PubMed Central  Google Scholar 

  • Pérez-Enciso M, Ferretti L (2010) Massive parallel sequencing in animal genetics: wherefroms and wheretos. Anim Genet 41(6):561–569

    Article  PubMed  Google Scholar 

  • Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Doré J, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J; MetaHIT Consortium, Bork P, Ehrlich SD, Wang J (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Ruffalo M, LaFramboise T, Koyutürk M (2011) Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 27(20):2790–2796

    Article  PubMed  CAS  Google Scholar 

  • Rumble SM, Lacroute P, Dalca AV, Fiume M, Sidow A, Brudno M (2009) SHRiMP: accurate mapping of short color-space reads. PLoS Comput Biol 5:e1000386

    Article  PubMed  PubMed Central  Google Scholar 

  • Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA, Liu Y, Weinstock GM, Wheeler DA, Gibbs RA, Yu F (2010) A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res 20(2):273–280

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197

    Article  PubMed  CAS  Google Scholar 

  • Smith AD, Xuan Z, Zhang MQ (2008) Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9:128

    Article  PubMed  PubMed Central  Google Scholar 

  • Smith AD, Chung WY, Hodges E, Kendall J, Hannon G, Hicks J, Xuan Z, Zhang MQ (2009) Updates to the RMAP short-read mapping software. Bioinformatics 25:2841–2842

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  • Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D, O’Keeffe S, Haas S, Vingron M, Lehrach H, Yaspo ML (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956–960

    Article  PubMed  CAS  Google Scholar 

  • Taylor KH, Kramer RS, Davis JW, Guo J, Duff DJ, Xu D, Caldwell CW, Shi H (2007) Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing. Cancer Res 67:8511–8518

    Article  PubMed  CAS  Google Scholar 

  • Van Tassell CP, Smith TP, Matukumalli LK, Taylor JF, Schnabel RD, Lawley CT, Haudenschild CD, Moore SS, Warren WC, Sonstegard TS (2008) SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat Methods 5(3):247–252

    Article  PubMed  Google Scholar 

  • Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H (2011) SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res 39(19):e132

    Article  PubMed  CAS  PubMed Central  Google Scholar 

Download references

Compliance with Ethical Standards

This study was conducted without external funding sources and did not involve research in animals.

Conflict of interest

All authors claim no potential conflicts of interest.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to M. Mielczarek.

Additional information

Communicated by: Maciej Szydlowski

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mielczarek, M., Szyda, J. Review of alignment and SNP calling algorithms for next-generation sequencing data. J Appl Genetics 57, 71–79 (2016). https://doi.org/10.1007/s13353-015-0292-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13353-015-0292-7

Keywords

Navigation