Abstract
RNA-seq has become the most common and powerful platform for studying transcriptomes. A major goal of RNA-seq analysis is to identify genes and molecular pathways that are differentially expressed between two conditions. Such differences in expression profiles may be linked to changes in the underlying biology and can point to targets for further investigation. The traditional statistical methods used for differential expression analysis are generally restricted to individual genes and provide no information about the interactions among genes that contribute to a given biological system. This limitation has led to the development of new computational methods for identifying such interactions. The most common approach to studying gene-set interactions is gene network inference. Co-expression networks are correlation-based networks commonly used to identify sets of genes significantly involved in a particular biological process. This chapter describes a basic procedure for RNA-seq analysis, with a brief description of the techniques involved and an illustration on a real data set. In addition, a basic pipeline is presented to show how to construct a co-expression network and detect modules from RNA-seq data.
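To make the idea of a correlation-based co-expression network concrete, the following is a minimal sketch and not the chapter's pipeline. It assumes expression values that are already normalized (for example, log-transformed counts), uses a hard Pearson correlation cutoff of 0.8 chosen purely for illustration, and treats connected components of the thresholded network as "modules", a deliberate simplification of the hierarchical clustering used by tools such as WGCNA. The gene names and values are hypothetical toy data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.8):
    """Return {(gene1, gene2): r} for gene pairs with |r| >= threshold.

    expr maps a gene name to its expression values across samples.
    The threshold is an illustrative assumption, not a recommended value.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def detect_modules(genes, edges):
    """Group genes into connected components using union-find."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for g1, g2 in edges:
        parent[find(g1)] = find(g2)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: four genes measured in four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
    "geneD": [10.0, 2.5, 8.0, 4.0],
}

edges = coexpression_edges(expr)
modules = detect_modules(expr, edges)
print(modules)
```

In this toy example geneA and geneB rise together across samples, as do geneC and geneD, so only those two pairs exceed the cutoff and each pair forms its own module. Real analyses typically replace the hard cutoff with a soft-thresholded adjacency and cluster on a topological overlap measure.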
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Javed, S. (2021). Differential Expression Analysis of RNA-Seq Data and Co-expression Networks. In: Pham, T.D., Yan, H., Ashraf, M.W., Sjöberg, F. (eds) Advances in Artificial Intelligence, Computation, and Data Science. Computational Biology, vol 31. Springer, Cham. https://doi.org/10.1007/978-3-030-69951-2_2
DOI: https://doi.org/10.1007/978-3-030-69951-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69950-5
Online ISBN: 978-3-030-69951-2
eBook Packages: Computer Science, Computer Science (R0)