
A Guide to RNAseq Data Analysis Using Bioinformatics Approaches

Chapter in: Advances in Bioinformatics

Abstract

The emergence of Next Generation Sequencing (NGS) technologies, including DNA, RNA and small RNA sequencing, has generated raw data on a massive scale. Analysing these data and extracting a biological interpretation from them is challenging, so advances in computational biology and bioinformatics applications have become a pressing need. RNAseq enables exploration of the comprehensive expression profile of genes and quantifies the RNA present in a biological sample. RNAseq also provides information on alternative splice variants, novel genes, differentially expressed genes and more. The RNAseq data analysis workflow comprises quality checking of the data, mapping of reads onto a reference genome or transcriptome, read quantification, differential expression analysis and functional annotation. Various software tools based on different algorithms have been developed to extract biological understanding from the data and to meet the demands of the analyst. This chapter provides an overview of the tools that can be used to analyse the data for different investigations. It also gives a glimpse of other RNAseq techniques, such as single-cell RNAseq and small RNA sequencing, as an introduction to newer forms of RNA sequencing.
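Two of the workflow steps listed above, quality checking and read quantification, can be illustrated with a minimal Python sketch. It is not the chapter's method and not a substitute for dedicated QC, trimming or quantification tools. It assumes Phred+33 quality encoding (Sanger / Illumina 1.8+), uses a hypothetical mean-quality cut-off of Q20, and normalises counts to transcripts per million (TPM) with raw transcript length rather than the effective length that quantifiers typically use. The helper names `parse_fastq`, `mean_quality`, `filter_reads` and `tpm` are hypothetical.

```python
"""Toy sketch of two RNAseq workflow steps: read quality checking and
read-count normalisation to TPM. Illustrative only."""

from typing import Dict, Iterator, List, Tuple

# Assumption: Sanger / Illumina 1.8+ encoding, where quality = ord(char) - 33.
PHRED_OFFSET = 33


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from complete 4-line FASTQ records."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain complete 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        yield header[1:], seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred score of one read, decoded from its ASCII quality string."""
    if not qual:
        raise ValueError("Empty quality string")
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def filter_reads(lines: List[str], min_mean_q: float = 20.0) -> List[Tuple[str, str]]:
    """Keep (read_id, sequence) for reads whose mean quality meets the
    threshold (hypothetical default cut-off of Q20)."""
    return [
        (read_id, seq)
        for read_id, seq, qual in parse_fastq(lines)
        if mean_quality(qual) >= min_mean_q
    ]


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: divide counts by length in kb (reads per kb),
    then rescale so the values sum to one million. Uses raw length."""
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("All counts are zero; TPM is undefined")
    return {t: value / total * 1e6 for t, value in rpk.items()}


if __name__ == "__main__":
    demo = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "####"]
    print(filter_reads(demo))
    print(tpm({"tA": 100, "tB": 100}, {"tA": 1000, "tB": 2000}))
```

With the demo records, only `read1` passes the filter ('I' decodes to Q40 and '#' to Q2), and transcript `tA` receives twice the TPM of `tB` because it has the same read count over half the length. Length normalisation is the design point here: raw counts alone would make the two transcripts look equally expressed.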

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Sharma, P., Sharma, B.S., Verma, R.J. (2021). A Guide to RNAseq Data Analysis Using Bioinformatics Approaches. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-33-6191-1_12

  • DOI: https://doi.org/10.1007/978-981-33-6191-1_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-6190-4

  • Online ISBN: 978-981-33-6191-1

  • eBook Packages: Computer Science, Computer Science (R0)
