High-Throughput Sequencing Data Analysis Software: Current State and Future Developments



Previous chapters of this book treat in detail the technicalities of problems such as de novo sequence assembly and sequence alignment. In this chapter, we take a different perspective. Drawing on nearly a decade of the authors’ collective experience in providing bioinformatics support to bench-based biologists, we focus on practical applications and on the biologist end-user’s experience. We offer some observations, speculations, and recommendations that might help the “wet” biologist who wishes to take responsibility for dealing with their own data.


Keywords (added by machine, not by the authors): Sequence Assembly; Alignment Tool; Reference Genome; Sequence Assembly Tool; Pacific Bioscience



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. School of Biosciences, University of Exeter, Exeter, UK
