Identification and Expression Analysis of Long Intergenic Noncoding RNAs

  • Ming-an Sun
  • Rihong Zhai
  • Qing Zhang
  • Yejun Wang
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1751)

Abstract

Long intergenic noncoding RNAs (lincRNAs) have attracted increasing attention in recent years. Advances in RNA-Seq have greatly facilitated the discovery of novel lincRNAs. However, the computational analysis of lincRNAs remains challenging. Here, we present a step-by-step protocol for the computational analysis of lincRNAs, covering read processing and alignment, transcript assembly, lincRNA identification and annotation, and differential expression analysis.
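As a rough illustration of the lincRNA identification step named above, the sketch below filters assembled transcripts by three criteria commonly used to define lincRNAs: a minimum spliced length (200 nt is the conventional lower bound for long noncoding RNAs), a low coding-potential score, and no overlap with known genes. It is not the protocol's own code. The `Transcript` class, the function names, and the coding-probability cutoff of 0.5 are hypothetical choices made for this example, and a real analysis would take the score from a tool such as CPAT and pick a cutoff suited to the species.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    name: str
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    length: int         # spliced length in nucleotides
    coding_prob: float  # coding-potential score in [0, 1], e.g. from CPAT


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def is_lincrna_candidate(
    tx: Transcript,
    known_genes: List[Tuple[str, int, int]],
    min_length: int = 200,          # conventional lncRNA length threshold
    max_coding_prob: float = 0.5,   # illustrative cutoff only (assumption)
) -> bool:
    """Keep transcripts that are long, likely noncoding, and intergenic."""
    if tx.length < min_length:
        return False
    if tx.coding_prob >= max_coding_prob:
        return False
    for chrom, g_start, g_end in known_genes:
        if chrom == tx.chrom and overlaps(tx.start, tx.end, g_start, g_end):
            return False
    return True


# Toy data: one known gene and four assembled transcripts.
genes = [("chr1", 1000, 5000)]
candidates = [
    Transcript("tx1", "chr1", 6000, 7000, 800, 0.05),   # passes all filters
    Transcript("tx2", "chr1", 4500, 5500, 900, 0.02),   # overlaps the gene
    Transcript("tx3", "chr1", 8000, 8150, 150, 0.01),   # too short
    Transcript("tx4", "chr2", 1000, 3000, 1200, 0.90),  # likely coding
]
lincrnas = [t.name for t in candidates if is_lincrna_candidate(t, genes)]
print(lincrnas)  # ['tx1']
```

The linear scan over `known_genes` keeps the example short; for genome-scale annotations an interval index or a dedicated tool would be a more practical choice.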

Key words

Noncoding RNA, lncRNA, lincRNA, RNA-Seq, Differential expression, STAR, Cufflinks, CPAT

Notes

Acknowledgments

This work was supported by a Natural Science Funding of Shenzhen (JCYJ201607115221141) and a Shenzhen Peacock Plan fund (827-000116) to Y.W. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright information

© Springer Science+Business Media, LLC 2018

Authors and Affiliations

  • Ming-an Sun (1)
  • Rihong Zhai (2)
  • Qing Zhang (3)
  • Yejun Wang (4)

  1. Epigenomics and Computational Biology Lab, Biocomplexity Institute of Virginia Tech, Blacksburg, USA
  2. School of Public Health, Shenzhen University Health Science Center, Shenzhen, China
  3. Integrative Biology and Physiology, The University of California, Los Angeles (UCLA), Los Angeles, USA
  4. Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen, China