Abstract
Airway epithelial cells (AECs) play a central role in the pathogenesis of many lung diseases. Consequently, advancements in our understanding of the underlying causes of lung diseases, and the development of novel treatments, depend on continued detailed study of these cells. Generation and analysis of high-throughput gene expression data provide an indispensable tool for carrying out the type of broad-scale investigations needed to identify the key genes and molecular pathways that regulate, distinguish, and predict distinct pulmonary pathologies. Of the available technologies for generating genome-wide expression data, RNA sequencing (RNA-seq) has emerged as the most powerful. Hence many researchers are turning to this approach in their studies of lung disease. For the relatively uninitiated, computational analysis of RNA-seq data can be daunting, given the large number of methods and software packages currently available. The aim of this chapter is to provide a broad overview of the major steps involved in processing and analyzing RNA-seq data, with a special focus on methods optimized for data generated from AECs. We take the reader from the point of obtaining sequence reads from the lab to the point of making biological inferences with expression data. Along the way, we discuss the statistical and computational considerations one typically confronts during different phases of analysis and point to key methods, software packages, papers, online guides, and other resources that can facilitate successful RNA-seq analysis.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Jackson, N.D., Ringel, L., Seibold, M.A. (2018). Computational Analysis of RNA-Seq Data from Airway Epithelial Cells for Studying Lung Disease. In: Alper, S., Janssen, W. (eds) Lung Innate Immunity and Inflammation. Methods in Molecular Biology, vol 1809. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8570-8_15
DOI: https://doi.org/10.1007/978-1-4939-8570-8_15
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8569-2
Online ISBN: 978-1-4939-8570-8
eBook Packages: Springer Protocols