Computational Analysis of RNA-Seq Data from Airway Epithelial Cells for Studying Lung Disease

Lung Innate Immunity and Inflammation

Part of the book series: Methods in Molecular Biology (MIMB, volume 1809)

Abstract

Airway epithelial cells (AECs) play a central role in the pathogenesis of many lung diseases. Consequently, advancements in our understanding of the underlying causes of lung diseases, and the development of novel treatments, depend on continued detailed study of these cells. Generation and analysis of high-throughput gene expression data provide an indispensable tool for carrying out the type of broad-scale investigations needed to identify the key genes and molecular pathways that regulate, distinguish, and predict distinct pulmonary pathologies. Of the available technologies for generating genome-wide expression data, RNA sequencing (RNA-seq) has emerged as the most powerful. Hence many researchers are turning to this approach in their studies of lung disease. For the relatively uninitiated, computational analysis of RNA-seq data can be daunting, given the large number of methods and software packages currently available. The aim of this chapter is to provide a broad overview of the major steps involved in processing and analyzing RNA-seq data, with a special focus on methods optimized for data generated from AECs. We take the reader from the point of obtaining sequence reads from the lab to the point of making biological inferences with expression data. Along the way, we discuss the statistical and computational considerations one typically confronts during different phases of analysis and point to key methods, software packages, papers, online guides, and other resources that can facilitate successful RNA-seq analysis.

Author information

Corresponding author

Correspondence to Max A. Seibold.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Jackson, N.D., Ringel, L., Seibold, M.A. (2018). Computational Analysis of RNA-Seq Data from Airway Epithelial Cells for Studying Lung Disease. In: Alper, S., Janssen, W. (eds) Lung Innate Immunity and Inflammation. Methods in Molecular Biology, vol 1809. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8570-8_15

  • DOI: https://doi.org/10.1007/978-1-4939-8570-8_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8569-2

  • Online ISBN: 978-1-4939-8570-8

  • eBook Packages: Springer Protocols
