An Easy-to-Follow Pipeline for Long Noncoding RNA Identification: A Case Study in Diploid Strawberry Fragaria vesca

  • Chunying Kang
  • Zhongchi Liu
Part of the Methods in Molecular Biology book series (MIMB, volume 1933)


Long noncoding RNAs (lncRNAs), defined as transcripts longer than 200 nucleotides without coding potential, are a new class of regulatory molecules with roles in diverse biological processes. New lncRNAs can readily be identified by mining RNA-seq data from a wide range of plant species. However, it remains challenging to distinguish functional lncRNAs from mRNAs that code for small peptides and from nonfunctional products of pseudogenes. This chapter provides stepwise instructions, using RNA-seq datasets from developing wild strawberry fruit to illustrate each step. The workflow is divided into three parts: part I covers standard RNA-seq data processing and analysis; part II describes lncRNA identification; part III describes several approaches aimed at shedding light on lncRNA function. The description is intended for beginners, with easy-to-follow steps, and text boxes provide code and explanations. While it is relatively easy to identify lncRNAs, it is difficult to infer their function in the absence of coding information. Multiple RNA-seq libraries across tissues and stages are useful resources for deducing the possible function of lncRNAs from their expression and co-regulation.
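As a rough illustration of two ideas in the abstract, the length criterion for lncRNAs and the use of expression correlation to hint at function, a minimal Python sketch follows. It is not the chapter's pipeline. The function names, the 100-codon ORF cutoff, and the forward-strand ORF scan are assumptions made for illustration only; a dedicated coding-potential tool would normally replace the simple ORF heuristic shown here.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF on the forward strand, in codons (stop excluded).

    Simplification (assumption): only complete ORFs on the given strand are
    counted; ORFs that run off the end of the transcript are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(seq, min_len=200, max_orf_codons=100):
    """Keep transcripts longer than min_len nt whose longest ORF is short.

    The 200-nt length cutoff comes from the lncRNA definition in the text;
    max_orf_codons=100 is an illustrative assumption, not a value from the
    chapter.
    """
    return len(seq) > min_len and longest_orf_codons(seq) <= max_orf_codons


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / (sd_x * sd_y)
```

In use, one might first filter assembled transcripts with `is_candidate_lncrna`, then compute `pearson` between each candidate's expression values across libraries and those of annotated genes, treating strongly correlated pairs as leads rather than as evidence of function.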

Key words

lncRNA, RNA-seq, Strawberry, Identification, Correlation analysis



This work was supported by the National Natural Science Foundation of China (31572098 and 31772274) to C.K., US National Science Foundation Grant (IOS1444987) to Z.L., and the Scientific and Technological Self-innovation Foundation of Huazhong Agricultural University (2014RC005 to Z.L. and 2014RC017 to C.K.).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, China
  2. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, USA
