Enhancer RNAs pp 201-219

Bioinformatics Pipeline for Transcriptome Sequencing Analysis

  • Sarah Djebali
  • Valentin Wucher
  • Sylvain Foissac
  • Christophe Hitte
  • Erwan Corre
  • Thomas Derrien
Part of the Methods in Molecular Biology book series (MIMB, volume 1468)


The development of high-throughput sequencing (HTS) for RNA profiling (RNA-seq) has shed light on the diversity of transcriptomes. While RNA-seq is becoming a de facto standard for monitoring the population of transcripts expressed in a given condition at a specific time, processing the large volume of data it generates requires dedicated bioinformatics programs. Here, we describe a standard bioinformatics protocol built on state-of-the-art tools: the STAR mapper to align reads onto a reference genome, Cufflinks to reconstruct the transcriptome, and RSEM to quantify the expression levels of genes and transcripts. We illustrate the workflow with human transcriptome sequencing data from two biological replicates of the K562 cell line, produced as part of the ENCODE3 project.
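The three-step workflow described above (alignment with STAR, transcript reconstruction with Cufflinks, quantification with RSEM) can be sketched as a small Python helper that only assembles the command lines for one paired-end replicate. This is a minimal sketch under stated assumptions, not the authors' protocol: the helper name `build_pipeline_commands`, all file and directory names, and the choice of options are hypothetical. It assumes gzipped paired-end FASTQ files, a prebuilt STAR genome index, and an RSEM reference prepared beforehand with `rsem-prepare-reference` from the same annotation. Option names (for example RSEM's `--bam`) should be checked against the documentation of the installed tool versions.

```python
import shlex
from typing import List


def build_pipeline_commands(
    read1: str,
    read2: str,
    star_index: str,
    annotation_gtf: str,
    rsem_reference: str,
    sample: str,
    threads: int = 4,
) -> List[List[str]]:
    """Assemble (but do not run) STAR, Cufflinks and RSEM commands for one
    paired-end sample. All paths and names are hypothetical placeholders."""
    prefix = f"{sample}/"  # STAR output prefix; this directory is assumed to exist
    genome_bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    transcriptome_bam = f"{prefix}Aligned.toTranscriptome.out.bam"

    # Step 1: spliced alignment of the reads onto the reference genome.
    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", star_index,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",  # assumes gzipped FASTQ input
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "TranscriptomeSAM",  # also write a transcript-coordinate BAM
        "--outFileNamePrefix", prefix,
    ]
    # Step 2: transcriptome reconstruction, guided by the reference annotation.
    cufflinks = [
        "cufflinks",
        "-p", str(threads),
        "-o", f"{sample}/cufflinks",
        "-g", annotation_gtf,
        genome_bam,
    ]
    # Step 3: gene and transcript quantification from the transcriptome BAM.
    rsem = [
        "rsem-calculate-expression",
        "-p", str(threads),
        "--bam", "--paired-end",
        transcriptome_bam,
        rsem_reference,  # prefix built earlier with rsem-prepare-reference
        f"{sample}/rsem",  # output prefix for the .genes/.isoforms.results files
    ]
    return [star, cufflinks, rsem]


if __name__ == "__main__":
    # Hypothetical inputs for one replicate.
    for cmd in build_pipeline_commands(
        "rep1_R1.fastq.gz", "rep1_R2.fastq.gz",
        "star_index", "annotation.gtf", "rsem_ref", "rep1",
    ):
        print(shlex.join(cmd))
```

Passing each list in order to `subprocess.run(cmd, check=True)` would execute the steps; keeping commands as argument lists rather than shell strings avoids quoting problems with paths. Cufflinks reads the genome-coordinate BAM, whereas RSEM reads the transcript-coordinate BAM that STAR writes when `--quantMode TranscriptomeSAM` is set, which is why the sketch derives both file names from the same STAR prefix.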

Key words

Transcriptome sequencing; Protocols; RNA-seq; Bioinformatics workflow


References

  1. Wang Z, Gerstein M, Snyder M (2009) RNA-seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63
  2. Djebali S, Davis CA, Merkel A et al (2012) Landscape of transcription in human cells. Nature 489:101–108
  3. Dobin A, Davis CA, Schlesinger F et al (2012) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
  4. Trapnell C, Williams BA, Pertea G et al (2010) Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28:511–515
  5. Li B, Ruotti V, Stewart RM et al (2010) RNA-seq gene expression estimation with read mapping uncertainty. Bioinformatics 26:493–500
  6. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
  7. Martens JHA, Stunnenberg HG (2013) BLUEPRINT: mapping human blood cell epigenomes. Haematologica 98:1487–1489
  8. Steijger T, Abril JF, Engström PG et al (2013) Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 10:1177–1184
  9. Engström PG, Steijger T, Sipos B et al (2013) Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods 10:1185–1191
  10. Trapnell C, Roberts A, Goff L et al (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7:562–578
  11. Marco-Sola S, Sammeth M, Guigó R et al (2012) The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods 9:1185–1188
  12. Pertea M, Pertea GM, Antonescu CM et al (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33:290–295
  13. Montgomery SB, Sammeth M, Gutierrez-Arcelus M et al (2010) Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464:773–777
  14. Roberts A, Pachter L (2013) Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods 10:71–73
  15. Patro R, Mount SM, Kingsford C (2014) Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat Biotechnol 32:462–464
  16. Haas BJ, Papanicolaou A, Yassour M et al (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8:1494–1512
  17. Sacomoto GAT, Kielbassa J, Chikhi R et al (2012) KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinformatics 13(Suppl 6):S5
  18. Rosenbloom KR, Sloan CA, Malladi VS et al (2013) ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res 41:D56–D63
  19. Harrow J, Frankish A, Gonzalez JM et al (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 22:1760–1774
  20. Derrien T, Johnson R, Bussotti G et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22:1775–1789
  21. Pei B, Sisu C, Frankish A et al (2012) The GENCODE pseudogene resource. Genome Biol 13:R51
  22. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
  23. Cunningham F, Amode MR, Barrell D et al (2015) Ensembl 2015. Nucleic Acids Res 43:D662–D669
  24. Knowles DG, Röder M, Merkel A et al (2013) Grape RNA-seq analysis pipeline environment. Bioinformatics 29:614–621
  25. Jiang L, Schlesinger F, Davis CA et al (2011) Synthetic spike-in standards for RNA-seq experiments. Genome Res 21:1543–1551
  26. Risso D, Ngai J, Speed TP et al (2014) Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol 32:896–902

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Sarah Djebali (1)
  • Valentin Wucher (2)
  • Sylvain Foissac (1)
  • Christophe Hitte (2)
  • Erwan Corre (3)
  • Thomas Derrien (2)

  1. INRA, GenPhySE, Castanet-Tolosan, France
  2. CNRS UMR6290, Dog Genetic Team, Rennes, France
  3. CNRS-UPMC, ABiMS Platform, Station Biologique de Roscoff, FR2424, CS 90074, Roscoff, France
