Abstract
RNA-Seq has become the de facto standard technique for characterizing and quantifying transcriptomes, and a large number of methods and tools have been proposed to model and detect differential gene expression by comparing transcript abundances across samples. However, state-of-the-art methods for this task are usually designed for pairwise comparisons; that is, they can identify significant variation of expression only between two conditions or samples. We describe the use of RNentropy, a methodology based on information theory and devised to overcome this limitation. RNentropy can detect significant variations of gene expression in RNA-Seq data across any number of samples and conditions, and can be applied downstream of any analysis pipeline that quantifies gene expression from raw sequencing data. It takes as input gene (or transcript) expression values, defined with any measure suitable for comparing transcript levels across samples and conditions. The output consists of the genes (or transcripts) exhibiting significant variation of expression across the conditions studied, together with the samples in which they are found to be over- or underexpressed. RNentropy is implemented as an R package and is freely available from the CRAN repository. We provide a detailed guide to the functions and parameters of the package, with usage examples that demonstrate the software's capabilities and show how it can be applied to the analysis of single-cell RNA sequencing data.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zambelli, F., Pavesi, G. (2021). Using RNentropy to Detect Significant Variation in Gene Expression Across Multiple RNA-Seq or Single-Cell RNA-Seq Samples. In: Picardi, E. (eds) RNA Bioinformatics. Methods in Molecular Biology, vol 2284. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1307-8_6
DOI: https://doi.org/10.1007/978-1-0716-1307-8_6
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1306-1
Online ISBN: 978-1-0716-1307-8
eBook Packages: Springer Protocols