Pipelines for cross-species and genome-wide prediction of long noncoding RNA binding

  • Protocol

From Nature Protocols

Abstract

Abundant long, noncoding RNAs (lncRNAs) in mammals can bind to DNA sequences and recruit histone- and DNA-modifying enzymes to binding sites to epigenetically regulate target genes. However, most lncRNAs’ binding motifs and target sites are unknown. The large numbers of lncRNAs and target sites in the whole genome make it infeasible to examine lncRNA binding to DNA purely experimentally. Here, we report a protocol for lncRNA/DNA-binding analysis that is built upon a database containing the GENCODE-annotated human and mouse lncRNAs, the orthologs of these lncRNAs in 17 mammals, and the genome sequences of the 17 mammals. Cross-species and genome-wide lncRNA/DNA-binding analysis begins with and is driven by database search. The predicted DNA-binding motifs and binding sites answer the general question of which lncRNAs may epigenetically regulate which genes, and can be used to identify potential sites for genome and epigenome editing. To use the protocol, preliminary knowledge of the base-pairing rules that guide the binding of noncoding RNAs to DNA to form triplexes, as well as the skills required to use the UCSC Genome Browser, are needed. A genome-wide prediction takes from 2 to 10 d, and the results are sent to users automatically by e-mail. The platform is updated continuously, making it possible to study more lncRNAs and larger genomic regions in less computational time.

Fig. 1: The platform that integrates multiple data sources and programs.
Fig. 2: The number of orthologs of the 13,562 human lncRNAs in the 16 other mammals.
Fig. 3: The workflows in this protocol.
Fig. 4: The web page of the database search.
Fig. 5: The web page showing search results.
Fig. 6: The web page showing database functions.
Fig. 7: The web page for batch TFO/TTS prediction after a database search.
Fig. 8: The web page showing the graphic display of an lncRNA and its orthologs.
Fig. 9: The web page showing the text display of the lncRNA RP11-375H19.2 and its orthologs.
Fig. 10: The TTS distributions of human PTENP1-asRNA and PTENP1 in the human PTEN locus.
Fig. 11: The TFO1 of human PTENP1-asRNA is in exon 1.
Fig. 12: The TTSs of CDKN2B-AS1 and other lncRNAs in the human CDKN2A/2B locus.
Fig. 13: The 83 lncRNAs on the human Y chromosome have no TTS in the human CDKN2A/2B locus.

Code availability

The source code is available at our website (http://lncRNA.smu.edu.cn) and the GitHub website (https://github.com/LongTarget/) under a GNU Affero General Public License v.3.0.

Data availability

Example datasets that include TFOclass1 and TFOsorted files of all examples are available at our website (http://lncRNA.smu.edu.cn) and the GitHub website (https://github.com/LongTarget/).

Acknowledgements

This work received financial support (to H. Zhu) from the NSFC (31571348 and 31771456), the Special Program for Applied Research on SuperComputation of the NSFC-Guangdong Joint Fund, and the Guangzhou Science and Technology Innovation Committee (201607010067).

Author information

Contributions

S.H. and X.Y. performed the genome searches; H. Zhang and J.L. built the database; H. Zhang, Y.W., and H. Zhu revised the LongTarget code; H. Zhu designed the study, analyzed the data, and drafted the manuscript.

Corresponding authors

Correspondence to Hai Zhang or Hao Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

He, S., Zhang, H., Liu, H. & Zhu, H. Bioinformatics 31, 178–186 (2015): https://doi.org/10.1093/bioinformatics/btu643

Liu, H., Shang, X. & Zhu, H. Bioinformatics 33, 1431–1436 (2017): https://doi.org/10.1093/bioinformatics/btw818

Wang, S. et al. Cell Death Dis. 9, 805 (2018): https://doi.org/10.1038/s41419-018-0869-2

Integrated supplementary information

Supplementary Figure 1 The coordinates of some TTSs of RP11-375H19.2 collected in the database.

To open this window that shows TTSs of TFO1, click the blue TFO1 button in the webpage shown in Fig. 9.

Supplementary Figure 2 The initial records in the Excel file that reports the TTS distribution of H19 at all transcripts in the human genome hg38.

“bs” means binding site. bs_chr, bs_start, and bs_end indicate the chromosome number, start coordinate, and end coordinate of a TTS. TTS_area is the area of the peak of a TTS (as shown in custom tracks of TTS distributions) and indicates the strength of the TTS.
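
The columns described above can be handled programmatically. The following is a minimal Python sketch, assuming the Excel report has been exported to CSV with the column names bs_chr, bs_start, bs_end, and TTS_area; the CSV export, the helper names, and the example rows are assumptions made for illustration and are not part of the protocol or its results. It ranks TTSs by TTS_area, which the legend describes as indicating the strength of a TTS.

```python
import csv
import io
from typing import List, NamedTuple


class BindingSite(NamedTuple):
    chrom: str    # bs_chr
    start: int    # bs_start
    end: int      # bs_end
    area: float   # TTS_area (peak area, used here as a strength score)


def read_binding_sites(handle) -> List[BindingSite]:
    """Parse rows that carry the bs_chr, bs_start, bs_end and TTS_area columns."""
    reader = csv.DictReader(handle)
    return [
        BindingSite(
            row["bs_chr"],
            int(row["bs_start"]),
            int(row["bs_end"]),
            float(row["TTS_area"]),
        )
        for row in reader
    ]


def strongest_sites(sites: List[BindingSite], top_n: int = 2) -> List[BindingSite]:
    """Return the top_n sites with the largest TTS_area."""
    return sorted(sites, key=lambda s: s.area, reverse=True)[:top_n]


# Made-up rows for illustration only; they are not taken from the protocol.
EXAMPLE_CSV = """bs_chr,bs_start,bs_end,TTS_area
chr11,2000100,2000180,350.5
chr9,21990000,21990060,120.0
chr11,2150000,2150090,910.25
"""

sites = read_binding_sites(io.StringIO(EXAMPLE_CSV))
top = strongest_sites(sites)
for site in top:
    print(f"{site.chrom}:{site.start}-{site.end}\tTTS_area={site.area}")
```

For a real report, the same function could be given an open file handle instead of the in-memory example.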

Supplementary Figure 3 The expression level of CDKN2B-AS1 in 53 tissues.

The picture is obtained from the GTEx Gene track in the UCSC Genome Browser by choosing the GTEx Transcript track in the Expression section to display GTEx genes graphically.

Supplementary Figure 4 The TTS distributions of marmoset lncRNAs in the marmoset CDKN2A/2B locus.

From top to bottom are 14 custom tracks of the TTS distribution of 14 lncRNAs, the track of Ensembl Genes, the custom track of CDKN2B-AS1_Marmoset, and the track of RepeatMasker. Three TTSs at transposable elements of Simple or Low Complexity are marked by three blue vertical lines. The results indicate that, as in humans, many lncRNAs bind to promoters of CDKN2A/2B.

Supplementary Figure 5 The TTS distributions of mouse lncRNAs in the mouse CDKN2A/2B locus.

Some TTSs at promoters and CpG islands (in green) are marked in yellow, and some TTSs at transposable elements (in the RepeatMasker track) and repetitive elements (in the SimpleRepeats track) are marked in blue. Some lncRNAs have TTSs only at transposable and/or repetitive elements.

Supplementary Figure 6 The TTS distributions of H19 and other human lncRNAs in the human IGF2 region.

From top to bottom are custom tracks of the TTS distribution of 16 lncRNAs, UCSC Genes, CpG Islands, ENCODE DNA Methylation (the colored lines indicate DNA methylation signals), and ENCODE Histone Modification (the colored areas indicate histone modification signals). This figure indicates that many lncRNAs may bind to the IGF2 region at the site H19 binds to.

Supplementary Figure 7 Exons of the orthologue of human CDKN2B-AS1 in marmoset.

This webpage shows the coordinates of all exons of the orthologue of human CDKN2B-AS1 in marmoset.

Supplementary Figure 8 The format of the custom gene track file marmoset-CDKN2B-AS1.

The gene name is CDKN2B-AS1_Marmoset as shown in Supplementary Figure 4. All custom gene track files should follow the same format, but can adopt any file name.

Supplementary Figure 9 The results of an M:N case of genome-wide prediction.

The figure shows some of the records in the Excel file that reports TTSs of lncRNA transcripts at the genomic regions of protein-coding transcripts. Here peak_area and TTS_area are defined as in Supplementary Figure 2.

Supplementary Figure 10 Two ENST ID lists.

The two ENST ID lists are the inputs of the M:N case of genome-wide prediction shown in Supplementary Figure 9. These Ensembl ENST IDs identify the lncRNA transcripts and protein-coding transcripts that were differentially expressed in our RNA-seq analysis comparing gene expression in 12 human colorectal cancer tissues with that in 3 normal colorectal tissues (unpublished observations [Sha He, Yujian Wen, Hao Zhu]). The transcripts were assembled from RNA-seq reads using the StringTie program (Nat. Biotechnol. 33, 290–295; 2015), and differential expression was determined using the edgeR program (Genome Biol. 17, 75; 2016).
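
Before submitting two such lists, it may help to check them locally. The sketch below is an illustration under stated assumptions, not the platform's own input validation: it assumes one ID per line, accepts the standard Ensembl transcript form ("ENST" followed by 11 digits, optionally with a ".version" suffix), and assumes that an M:N case pairs each of the M lncRNA transcripts with each of the N protein-coding transcripts. The function name and the placeholder IDs are hypothetical.

```python
import re
from itertools import product

# Ensembl transcript IDs: "ENST" + 11 digits, optionally followed by ".version".
ENST_PATTERN = re.compile(r"^ENST\d{11}(\.\d+)?$")


def parse_enst_list(text):
    """Return (valid, invalid): unique well-formed IDs in input order, and malformed lines."""
    valid, invalid, seen = [], [], set()
    for line in text.splitlines():
        token = line.strip()
        if not token:
            continue  # skip blank lines
        if not ENST_PATTERN.match(token):
            invalid.append(token)
        elif token not in seen:
            seen.add(token)
            valid.append(token)
    return valid, invalid


# Placeholder IDs for illustration only; they are not real transcripts.
lnc_text = "ENST00000000001\nENST00000000002\nENST00000000002\nbad_id\n"
pc_text = "ENST00000000011\nENST00000000012.3\nENST00000000013\n"

lnc_ids, lnc_bad = parse_enst_list(lnc_text)
pc_ids, pc_bad = parse_enst_list(pc_text)

# Assumed M:N semantics: every lncRNA transcript against every protein-coding transcript.
pairs = list(product(lnc_ids, pc_ids))
print(f"M={len(lnc_ids)}, N={len(pc_ids)}, pairs={len(pairs)}, rejected={lnc_bad + pc_bad}")
```

Counting the pairs up front gives a rough sense of the size of a job before it is submitted.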

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Methods

Reporting Summary

About this article

Cite this article

Lin, J., Wen, Y., He, S. et al. Pipelines for cross-species and genome-wide prediction of long noncoding RNA binding. Nat Protoc 14, 795–818 (2019). https://doi.org/10.1038/s41596-018-0115-5

  • DOI: https://doi.org/10.1038/s41596-018-0115-5

  • Springer Nature Limited
