Skip to main content

Analysis of RNA-Seq Data Using TopHat and Cufflinks

  • Protocol
Plant Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1374))

Abstract

The recent advances in high throughput RNA sequencing (RNA-Seq) have generated huge amounts of data in a very short span of time for a single sample. These data have required the parallel advancement of computing tools to organize and interpret them meaningfully in terms of biological implications, at the same time using minimum computing resources to reduce computation costs. Here we describe the method of analyzing RNA-seq data using the set of open source software programs of the Tuxedo suite: TopHat and Cufflinks. TopHat is designed to align RNA-seq reads to a reference genome, while Cufflinks assembles these mapped reads into possible transcripts and then generates a final transcriptome assembly. Cufflinks also includes Cuffdiff, which accepts the reads assembled from two or more biological conditions and analyzes their differential expression of genes and transcripts, thus aiding in the investigation of their transcriptional and post transcriptional regulation under different conditions. We also describe the use of an accessory tool called CummeRbund, which processes the output files of Cuffdiff and gives an output of publication quality plots and figures of the user’s choice. We demonstrate the effectiveness of the Tuxedo suite by analyzing RNA-Seq datasets of Arabidopsis thaliana root subjected to two different conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Protocol
USD 49.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 99.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 129.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 179.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Ozsolak F, Milos PM (2011) RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 12(2):87–98. doi:10.1038/nrg2934

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  2. Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11(1):31–46. doi:10.1038/nrg2626

    Article  CAS  PubMed  Google Scholar 

  3. Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24(3):133–141. doi:10.1016/j.tig.2007.12.007

    Article  CAS  PubMed  Google Scholar 

  4. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18(9):1509–1517. doi:10.1101/gr.079558.108

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  5. Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, Milos PM (2009) Direct RNA sequencing. Nature 461(7265):814–818. doi:10.1038/nature08390

    Article  CAS  PubMed  Google Scholar 

  6. Roy SW, Irimia M (2008) When good transcripts go bad: artifactual RT-PCR ‘splicing’ and genome analysis. BioEssays 30(6):601–605. doi:10.1002/bies.20749

    Article  CAS  PubMed  Google Scholar 

  7. Garber M, Grabherr MG, Guttman M, Trapnell C (2011) Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods 8(6):469–477. doi:10.1038/nmeth.1613

    Article  CAS  PubMed  Google Scholar 

  8. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28(5):511–515. doi:10.1038/nbt.1621

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  9. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1):57–63. doi:10.1038/nrg2484

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  10. Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467–470

    Article  CAS  PubMed  Google Scholar 

  11. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci U S A 93(20):10614–10619

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  12. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T, Fukuda S, Sasaki D, Podhajska A, Harbers M, Kawai J, Carninci P, Hayashizaki Y (2003) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100(26):15776–15781. doi:10.1073/pnas.2136655100

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  13. Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, Sasaki D, Imamura K, Kai C, Harbers M, Hayashizaki Y, Carninci P (2006) CAGE: cap analysis of gene expression. Nat Methods 3(3):211–222. doi:10.1038/nmeth0306-211

    Article  CAS  PubMed  Google Scholar 

  14. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270(5235):484–487

    Article  CAS  PubMed  Google Scholar 

  15. Reinartz J, Bruyns E, Lin JZ, Burcham T, Brenner S, Bowen B, Kramer M, Woychik R (2002) Massively parallel signature sequencing (MPSS) as a tool for in-depth quantitative gene expression profiling in all organisms. Brief Funct Genomic Proteomic 1(1):95–104

    Article  CAS  PubMed  Google Scholar 

  16. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18(6):630–634. doi:10.1038/76469

    Article  CAS  PubMed  Google Scholar 

  17. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628. doi:10.1038/nmeth.1226

    Article  CAS  PubMed  Google Scholar 

  18. Feldmeyer B, Wheat CW, Krezdorn N, Rotter B, Pfenninger M (2011) Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance. BMC Genomics 12:317. doi:10.1186/1471-2164-12-317

    Article  PubMed Central  PubMed  Google Scholar 

  19. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. doi:10.1186/gb-2009-10-3-r25

    Article  PubMed Central  PubMed  Google Scholar 

  20. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26(5):589–595. doi:10.1093/bioinformatics/btp698

    Article  PubMed Central  PubMed  Google Scholar 

  21. Simpson JT, Durbin R (2010) Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12):i367–i373. doi:10.1093/bioinformatics/btq217

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  22. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111. doi:10.1093/bioinformatics/btp120

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  23. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578. doi:10.1038/nprot.2012.016

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  24. Goff L, Trapnell C, Kelley D, Guide PRSCU, biocViews Clustering D, DataRepresentation D, GeneExpression I, MultipleComparison Q, RNASeq R, since BioC IB (2012) Analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data. R package version 2 (1)

    Google Scholar 

  25. Goecks J, Nekrutenko A, Taylor J, Galaxy T (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11(8):R86. doi:10.1186/gb-2010-11-8-r86

    Article  PubMed Central  PubMed  Google Scholar 

  26. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966–1967. doi:10.1093/bioinformatics/btp336

    Article  CAS  PubMed  Google Scholar 

  27. Lunter G, Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21(6):936–939. doi:10.1101/gr.111120.110

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  28. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, MacLeod JN, Chiang DY, Prins JF, Liu J (2010) MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 38(18), e178. doi:10.1093/nar/gkq622

    Article  PubMed Central  PubMed  Google Scholar 

  29. Wu TD, Nacu S (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26(7):873–881. doi:10.1093/bioinformatics/btq057

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  30. Guttman M, Garber M, Levin JZ, Donaghey J, Robinson J, Adiconis X, Fan L, Koziol MJ, Gnirke A, Nusbaum C, Rinn JL, Lander ES, Regev A (2010) Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28(5):503–510. doi:10.1038/nbt.1633

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  31. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829. doi:10.1101/gr.074492.107

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  32. Lee S, Seo CH, Lim B, Yang JO, Oh J, Kim M, Lee S, Lee B, Kang C, Lee S (2011) Accurate quantification of transcriptome from RNA-Seq data by effective length normalization. Nucleic Acids Res 39(2):e9. doi:10.1093/nar/gkq1015

    Article  PubMed Central  PubMed  Google Scholar 

  33. Wang L, Feng Z, Wang X, Wang X, Zhang X (2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26(1):136–138. doi:10.1093/bioinformatics/btp612

    Article  PubMed  Google Scholar 

  34. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11(10):R106. doi:10.1186/gb-2010-11-10-r106

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  35. Twine NA, Janitz K, Wilkins MR, Janitz M (2011) Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer’s disease. PLoS One 6(1), e16266

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  36. Vidal EA, Moyano TC, Krouk G, Katari MS, Tanurdzic M, McCombie WR, Coruzzi GM, Gutierrez RA (2013) Integrated RNA-seq and sRNA-seq analysis identifies novel nitrate-responsive genes in Arabidopsis thaliana roots. BMC Genomics 14:701. doi:10.1186/1471-2164-14-701

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  37. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079. doi:10.1093/bioinformatics/btp352

    Article  PubMed Central  PubMed  Google Scholar 

  38. Alexandrov NN, Troukhan ME, Brover VV, Tatarinova T, Flavell RB, Feldmann KA (2006) Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol 60(1):69–85. doi:10.1007/s11103-005-2564-9

    Article  CAS  PubMed  Google Scholar 

  39. Filichkin SA, Priest HD, Givan SA, Shen R, Bryant DW, Fox SE, Wong WK, Mockler TC (2010) Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res 20(1):45–58. doi:10.1101/gr.093302.109

    Article  PubMed Central  CAS  PubMed  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sreya Ghosh .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Ghosh, S., Chan, CK.K. (2016). Analysis of RNA-Seq Data Using TopHat and Cufflinks. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 1374. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3167-5_18

Download citation

  • DOI: https://doi.org/10.1007/978-1-4939-3167-5_18

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3166-8

  • Online ISBN: 978-1-4939-3167-5

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics