Analysis of Next-Generation Sequencing Data Using Galaxy

  • Daniel Blankenberg
  • Jennifer Hillman-Jackson
Part of the Methods in Molecular Biology book series (MIMB, volume 1150)


The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter focuses on practical informatics methods, strategies, and software tools for transforming NGS data into usable information using Galaxy, a web-based analysis platform. The Galaxy interface is explored through several example analyses of different types. Instructions are provided for running one’s own Galaxy server on local hardware or on cloud computing resources, and installing new tools into a personal Galaxy instance is also demonstrated.

Key words

NGS, Genomics, Informatics, RNA-seq, ChIP-seq, Workflows, Reproducibility, Open source, Web-based workbench, Big data analysis

Acknowledgments

Efforts of the Galaxy team (E. Afgan, D. Baker, D.B., D. Bouvier, M. Cech, D. Clements, N. Coraor, C. Eberhard, D. Francheteau, J. Goecks, S. Guerler, J.J., G. Von Kuster, R. Lazarus, A. Nekrutenko, and J. Taylor) were instrumental in making this work possible. We extend a special thank you to the Galaxy community for their continuing contributions, both inspirational and technical.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Biochemistry and Molecular Biology, Penn State University, University Park, USA