Functional & Integrative Genomics, Volume 17, Issue 2–3, pp 353–363

A systemic identification approach for primary transcription start site of Arabidopsis miRNAs from multidimensional omics data

  • Qi You
  • Hengyu Yan
  • Yue Liu
  • Xin Yi
  • Kang Zhang
  • Wenying Xu
  • Zhen Su
Original Article


MicroRNAs (miRNAs) are 22-nucleotide non-coding RNAs that, like protein-coding genes, are mostly transcribed by RNA polymerase II. In contrast to the well-characterized processing of stem-loop precursors into mature miRNAs, the primary transcriptional regulation of miRNAs, especially in plants, still needs to be clarified, including the primary transcription start sites, functional cis-elements and primary transcript structures. Because several transcription signals in the promoter region are well characterized, we propose a systemic approach that integrates multidimensional “omics” data (genomics, transcriptomics and epigenomics) to improve the genome-wide identification of primary miRNA transcripts. Using the model plant Arabidopsis thaliana, we improved the identification of candidate promoter locations for intergenic miRNAs and determined rules for identifying the primary transcription start sites of miRNAs by integrating high-throughput omics data, including DNase I hypersensitive (DH) sites, chromatin immunoprecipitation-sequencing (ChIP-seq) of RNA polymerase II (Pol II) and H3K4me3, and high-throughput transcriptomic data. As a result, 93% of the refined primary transcripts could be confirmed by primer pairs from a previous study. Cis-element and secondary structure analyses also supported the feasibility of our results. This work will contribute to the analysis of the primary transcriptional regulation of miRNAs, and the conserved regulatory pattern may serve as a suitable miRNA characteristic in other plant species.
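The abstract does not spell out the integration rules themselves, so the following is only a hypothetical Python sketch of one common way to combine peak calls from several tracks: keep the genomic intervals where DH-site, Pol II and H3K4me3 peaks all overlap, then retain those starting within a chosen window upstream of a plus-strand precursor TSS. The interval representation, the function names and the `max_dist` window are assumptions for illustration, not the authors' method.

```python
from typing import List, Tuple

# Half-open genomic interval [start, end) on one chromosome (assumed representation).
Interval = Tuple[int, int]


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlaps between two sorted lists of non-overlapping intervals (two-pointer sweep)."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first; it cannot overlap any later interval.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def supported_regions(tracks: List[List[Interval]]) -> List[Interval]:
    """Regions covered by a peak in every track (e.g. DH sites, Pol II, H3K4me3)."""
    if not tracks:
        return []
    result = sorted(tracks[0])
    for track in tracks[1:]:
        result = intersect_two(result, sorted(track))
    return result


def upstream_candidates(tracks: List[List[Interval]], pre_tss: int,
                        max_dist: int = 2000) -> List[Interval]:
    """Jointly supported regions starting within max_dist nt upstream of a
    plus-strand precursor TSS (max_dist is a hypothetical parameter)."""
    return [r for r in supported_regions(tracks)
            if pre_tss - max_dist <= r[0] <= pre_tss]


if __name__ == "__main__":
    dh = [(800, 1100), (1500, 1700)]
    pol2 = [(900, 1200)]
    h3k4me3 = [(950, 1300)]
    print(upstream_candidates([dh, pol2, h3k4me3], pre_tss=1200))  # [(950, 1100)]
```

Requiring agreement across all tracks favours precision over recall; a looser variant could accept regions supported by any two of the three marks.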


Keywords: Primary transcription start site; Epigenomics; Intergenic miRNA; Arabidopsis; Cis-element



This work was supported by grants from the National Natural Science Foundation of China (31371291).

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

10142_2016_541_MOESM1_ESM.jpg (1.2 mb)
Figure S1 (A) Box plot of the normalized WIG values of 31 RNA-seq samples. The horizontal axis represents the distance relative to the precursor TSS (preTSS) of miR158a. The vertical axis represents the log10 WIG value at the corresponding genomic position. (B) UCSC genome browser view of the miR158a precursor structure and the high-throughput sequencing profiles near the miRNA, including CAGE (red), DH sites (orange), Pol II (pink), H3K4me3 (green) and transcriptome expression profiles (other colors). (JPEG 1213 kb)
10142_2016_541_MOESM2_ESM.jpg (2.9 mb)
Figure S2 Pol II average profiles for six distance ranges near the precursor TSSs (preTSSs) of miRNAs. The X-axis ranges from 500 bp upstream to 500 bp downstream of the miRNA preTSSs. “100” denotes miRNAs with a ppdistance (distance between pri-TSS and precursor TSS) of less than 100 nt; the Pol II average profile of these miRNAs (red line) peaks closest to the preTSS. The average profile peak of the “200” miRNAs (orange line) is shifted upstream. The average profile peaks of the “300” (green line) and “400” (light blue line) miRNAs are shifted farther upstream. The average profile peaks of the “500” (dark blue line) and “>500” (purple line) miRNAs fall outside the plotted range. (JPEG 2966 kb)
10142_2016_541_MOESM3_ESM.jpg (3.2 mb)
Figure S3 Average DH profiles for six distance ranges near the precursor TSSs of miRNAs. The X-axis ranges from 1000 bp upstream to 1000 bp downstream of the precursor TSSs. “100” denotes miRNAs with a ppdistance (distance between pri-TSS and precursor TSS) of less than 100 nt; the average DH-site profile of these miRNAs (red line) peaks closest to the precursor TSS. The average profile peak of the “200” miRNAs (orange line) is shifted upstream. The average profile peaks of the “300” (green line), “400” (light blue line) and “500” (dark blue line) miRNAs are shifted farther upstream. The average profile peak of the “>500” miRNAs (purple line) lies 800 nt upstream, far from the precursor TSS. (JPEG 3237 kb)
10142_2016_541_MOESM4_ESM.jpg (3 mb)
Figure S4 H3K4me3 average profiles for six distance ranges near the precursor TSSs of miRNAs. The X-axis ranges from 500 bp upstream to 500 bp downstream of the precursor TSSs. “100” denotes miRNAs with a ppdistance (distance between pri-TSS and precursor TSS) of less than 100 nt; the H3K4me3 average profile of these miRNAs (red line) peaks downstream of the precursor TSS. The average profile peak of the “200” miRNAs (orange line) is shifted upstream but still lies downstream of the precursor TSS. The average profile peaks of the “300” (green line) and “400” (light blue line) miRNAs are shifted farther upstream. The average profile peaks of the “500” (dark blue line) and “>500” (purple line) miRNAs fall outside the plotted range. (JPEG 3089 kb)
10142_2016_541_MOESM5_ESM.jpg (3.8 mb)
Figure S5 Sequence logos of transcription factor-binding sites. The logos of these cis-elements were drawn from FASTA sequences extracted from the promoters of the 254 best- and middle-level miRNAs. (JPEG 3898 kb)
10142_2016_541_MOESM6_ESM.jpg (8.9 mb)
Figure S6 (A) Multiple alignment of the miR165a primary transcript in mirEX2.0 (miR165a_mirEX2.0) and our predicted miR165a primary transcript (primary miR165a). The 5′ ends of the two primary transcripts differ by only 1 nt. (B) The epigenetic modifications and transcriptomic profiles of miR158a in the UCSC genome browser, including CAGE data (red), DH sites (orange), Pol II (pink), H3K4me3 (green) and RNA-seq expression profiles (other colors). The blue arrow indicates our predicted pri-TSS and the purple arrow indicates the TSS predicted in the mirEX2.0 database. (JPEG 9115 kb)
10142_2016_541_MOESM7_ESM.jpg (8.6 mb)
Figure S7 (A) Comparison between the miR163 primary transcripts in the mirEX2.0 database and our predictions. All primary transcripts, the precursor and the lncRNAs are drawn to the UCSC scale below. The light blue gene structures indicate the primary transcripts of miR163 in the mirEX2.0 database. The two gray boxes represent the primers used, one located close to the 5′ end and the other at the 3′ end. The red gene structure indicates our predicted primary transcript of miR163. The UCSC genome browser shows the location of the miR163 precursor and an lncRNA that overlaps it, together with the epigenetic modifications and transcriptomic profiles of miR163, including CAGE data (red), DH sites (orange), Pol II (pink), H3K4me3 (green) and RNA-seq expression profiles (other colors). (B) Box plot of 31 RNA-seq profiles. Reads accumulate continuously in the region 175 bp upstream of the precursor TSS of miR163. (C) Comparison of the miR163 pri-TSS among the mirEX2.0 database, our predictions and previous experimental results (PMID: 21602291). Our predicted transcript has a short 5′-terminal extension, from which the RNA-seq reads begin to accumulate. (JPEG 8831 kb)
10142_2016_541_MOESM8_ESM.jpg (1.8 mb)
Figure S8 (A) A typical primary miRNA in the RNAseqS group. Nine RNA-seq expression profiles are displayed (top). There is at least one broad peak (WIG value above 0.5 and length over 100 bp) near the pri-TSS (middle). The box plot of the expression profiles of the 31 samples shows high read accumulation in the pri-miR158a region (bottom). (B) An example of the RNAseqM group (top). Compared with the RNAseqS group, fewer reads accumulate (bottom) or the peaks are not broad (50 bp < length < 100 bp). For instance, although the average WIG value of the 31 RNA-seq samples is above 0.5, the peaks are shorter than 100 bp (middle). (C) An example of the RNAseqN group. Compared with the other two groups, there are only a few short peaks or none at all (left). Specifically, the average WIG value of the 31 RNA-seq samples is less than 0.03 (bottom right), and short peaks (less than 50 bp) are sparsely distributed across the primary miRNA region (top right). (JPEG 1868 kb)
10142_2016_541_MOESM10_ESM.doc (92 kb)
Table S1 Public epigenomics dataset. (DOC 91 kb)
10142_2016_541_MOESM10_ESM.doc (92 kb)
Table S2 Public transcriptomics dataset. (DOC 91 kb)
10142_2016_541_MOESM9_ESM.xls (96 kb)
Table S3 Details of primary miRNA transcripts. (XLS 96 kb)

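Figures S2–S4 group miRNAs by ppdistance and average a signal track around the precursor TSS. A minimal Python sketch of that bookkeeping follows. It assumes that the labels “100” to “500” denote consecutive 100-nt bins (for example, “200” = 100 to 199 nt), that each track is already a per-base list of normalized values per chromosome, and that the pri-TSS never lies downstream of the precursor TSS; the `MiRNA` record and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence


class MiRNA(NamedTuple):
    """Hypothetical record for one miRNA locus (0-based coordinates)."""
    chrom: str
    pri_tss: int
    pre_tss: int
    strand: str  # "+" or "-"


def pp_distance(m: MiRNA) -> int:
    """Distance (nt) from pri-TSS to precursor TSS; positive when the pri-TSS is upstream."""
    return m.pre_tss - m.pri_tss if m.strand == "+" else m.pri_tss - m.pre_tss


def distance_bin(d: int) -> str:
    """Assumed bins: "100" = 0-99 nt, "200" = 100-199 nt, ..., "500" = 400-499 nt, ">500" = 500+ nt."""
    if d < 0:
        raise ValueError("pri-TSS lies downstream of the precursor TSS")
    if d >= 500:
        return ">500"
    return str((d // 100 + 1) * 100)


def average_profiles(mirnas: Sequence[MiRNA],
                     signal: Dict[str, Sequence[float]],
                     flank: int = 500) -> Dict[str, List[float]]:
    """Mean signal from flank nt upstream to flank nt downstream of each precursor TSS, per bin."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for m in mirnas:
        track = signal[m.chrom]
        lo, hi = m.pre_tss - flank, m.pre_tss + flank + 1
        if lo < 0 or hi > len(track):
            continue  # window runs off the chromosome end; skip this locus
        window = list(track[lo:hi])
        if m.strand == "-":
            window.reverse()  # orient 5'->3' so index 0 is always the upstream end
        key = distance_bin(pp_distance(m))
        acc = sums.setdefault(key, [0.0] * len(window))
        for i, value in enumerate(window):
            acc[i] += value
        counts[key] += 1
    return {key: [v / counts[key] for v in acc] for key, acc in sums.items()}


if __name__ == "__main__":
    toy_signal = {"chr1": [float(i) for i in range(1000)]}
    loci = [MiRNA("chr1", 450, 500, "+"), MiRNA("chr1", 530, 500, "-"),
            MiRNA("chr1", 250, 600, "+")]
    for label, profile in sorted(average_profiles(loci, toy_signal, flank=2).items()):
        print(label, profile)
    # 100 [500.0, 500.0, 500.0, 500.0, 500.0]
    # 400 [598.0, 599.0, 600.0, 601.0, 602.0]
```

Reversing minus-strand windows lets loci on both strands be averaged position by position, so a peak that drifts toward index 0 in a longer-ppdistance bin corresponds to the upstream shifts described in the captions.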

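The RNAseqS, RNAseqM and RNAseqN grouping described for Figure S8 can be approximated by a small rule set. The Python sketch below treats a “peak” as a maximal run of positions whose mean WIG value across samples exceeds 0.5, and then applies the length thresholds and the 0.03 mean-value threshold quoted in that caption. How peaks were actually called, how lengths of exactly 50 or 100 bp were handled, and the function names are all assumptions.

```python
from typing import List, Sequence


def peak_lengths(profile: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Lengths of maximal runs of consecutive positions with value > threshold."""
    lengths: List[int] = []
    run = 0
    for value in profile:
        if value > threshold:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)  # close a run that reaches the end of the profile
    return lengths


def classify_rnaseq_support(profile: Sequence[float]) -> str:
    """Assign RNAseqS / RNAseqM / RNAseqN from a per-base mean WIG profile around a pri-miRNA."""
    if not profile:
        raise ValueError("empty profile")
    longest = max(peak_lengths(profile), default=0)
    mean_wig = sum(profile) / len(profile)
    if longest > 100:
        return "RNAseqS"  # at least one broad peak (> 100 bp)
    if longest > 50 or mean_wig >= 0.03:
        return "RNAseqM"  # narrower peaks, or weaker diffuse read accumulation
    return "RNAseqN"      # only short peaks (<= 50 bp) and mean WIG below 0.03


if __name__ == "__main__":
    broad = [0.0] * 50 + [1.0] * 120 + [0.0] * 50
    narrow = [0.0] * 100 + [0.8] * 70 + [0.0] * 100
    sparse = [0.0] * 300 + [0.6] * 5
    for name, prof in [("broad", broad), ("narrow", narrow), ("sparse", sparse)]:
        print(name, classify_rnaseq_support(prof))
```

The profile is assumed to be the per-base mean across the 31 RNA-seq samples, matching the caption's use of the average WIG value; RNAseqM acts as the catch-all for loci that are neither clearly broad nor clearly silent.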
  1. Abe H, Urao T, Ito T, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2003) Arabidopsis AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling. Plant Cell 15:63–78
  2. Akpinar BA, Kantar M, Budak H (2015) Root precursors of microRNAs in wild emmer and modern wheats show major differences in response to drought stress. Funct Integr Genomics 15:587–598. doi: 10.1007/s10142-015-0453-0
  3. Alptekin B, Budak H (2016) Wheat miRNA ancestors: evident by transcriptome analysis of A, B, and D genome donors. Funct Integr Genomics. doi: 10.1007/s10142-016-0487-y
  4. Axtell MJ, Bartel DP (2005) Antiquity of microRNAs and their targets in land plants. Plant Cell 17:1658–1673. doi: 10.1105/tpc.105.032185
  5. Baek D et al (2013) Regulation of miR399f transcription by AtMYB2 affects phosphate starvation responses in Arabidopsis. Plant Physiol 161:362–373. doi: 10.1104/pp.112.205922
  6. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
  7. Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363. doi: 10.1038/nature02874
  8. Bi W, Wu L, Coustry F, de Crombrugghe B, Maity SN (1997) DNA binding specificity of the CCAAT-binding factor CBF/NF-Y. J Biol Chem 272:26562–26572
  9. Bielewicz D et al (2013) Introns of plant pri-miRNAs enhance miRNA biogenesis. EMBO Rep 14:622–628. doi: 10.1038/embor.2013.62
  10. Boyle AP, Guinney J, Crawford GE, Furey TS (2008) F-seq: a feature density estimator for high-throughput sequence tags. Bioinformatics 24:2537–2538. doi: 10.1093/bioinformatics/btn480
  11. Boyle AP et al (2011) High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res 21:456–464. doi: 10.1101/gr.112656.110
  12. Brion P, Westhof E (1997) Hierarchy and dynamics of RNA folding. Annu Rev Biophys Biomol Struct 26:113–137. doi: 10.1146/annurev.biophys.26.1.113
  13. Budak H, Akpinar BA (2015) Plant miRNAs: biogenesis, organization and origins. Funct Integr Genomics 15:523–531. doi: 10.1007/s10142-015-0451-2
  14. Carninci P et al (2006) Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38:626–635. doi: 10.1038/ng1789
  15. Chekanova JA (2015) Long non-coding RNAs and their functions in plants. Curr Opin Plant Biol 27:207–216. doi: 10.1016/j.pbi.2015.08.003
  16. Cumbie JS, Ivanchenko MG, Megraw M (2015) NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites. BMC Genomics 16:597. doi: 10.1186/s12864-015-1670-6
  17. Cuperus JT, Fahlgren N, Carrington JC (2011) Evolution and functional diversification of MIRNA genes. Plant Cell 23:431–442. doi: 10.1105/tpc.110.082784
  18. Duncan CD, Weeks KM (2010) Nonhierarchical ribonucleoprotein assembly suggests a strain-propagation model for protein-facilitated RNA folding. Biochemistry 49:5418–5425. doi: 10.1021/bi100267g
  19. Freeling M, Subramaniam S (2009) Conserved noncoding sequences (CNSs) in higher plants. Curr Opin Plant Biol 12:126–132. doi: 10.1016/j.pbi.2009.01.005
  20. Grace ML, Chandrasekharan MB, Hall TC, Crowe AJ (2004) Sequence and spacing of TATA box elements are critical for accurate initiation from the β-phaseolin promoter. J Biol Chem 279:8102–8110. doi: 10.1074/jbc.M309376200
  21. Guo X, Gao L, Wang Y, Chiu DK, Wang T, Deng Y (2016) Advances in long noncoding RNAs: identification, structure prediction and function annotation. Brief Funct Genomics 15:38–46. doi: 10.1093/bfgp/elv022
  22. Ha M, Ng DW, Li WH, Chen ZJ (2011) Coordinated histone modifications are associated with gene expression variation within and between species. Genome Res 21:590–598. doi: 10.1101/gr.116467.110
  23. Hesselberth JR et al (2009) Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289. doi: 10.1038/nmeth.1313
  24. Kang SG, Price J, Lin PC, Hong JC, Jang JC (2010) The Arabidopsis bZIP1 transcription factor is involved in sugar signaling, protein networking, and DNA binding. Mol Plant 3:361–373. doi: 10.1093/mp/ssp115
  25. Kirik V, Kolle K, Misera S, Baumlein H (1998) Two novel MYB homologues with changed expression in late embryogenesis-defective Arabidopsis mutants. Plant Mol Biol 37:819–827
  26. Kornberg RD (1999) Eukaryotic transcriptional control. Trends Cell Biol 9:M46–M49
  27. Kung JT, Colognori D, Lee JT (2013) Long noncoding RNAs: past, present, and future. Genetics 193:651–669. doi: 10.1534/genetics.112.146704
  28. Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN (2004) MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23:4051–4060. doi: 10.1038/sj.emboj.7600385
  29. Lifton RP, Goldberg ML, Karp RW, Hogness DS (1978) The organization of the histone genes in Drosophila melanogaster: functional and evolutionary implications. Cold Spring Harb Symp Quant Biol 42 Pt 2:1047–1051
  30. Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) ViennaRNA Package 2.0. Algorithms Mol Biol 6:26. doi: 10.1186/1748-7188-6-26
  31. Marsico A et al (2013) PROmiRNA: a new miRNA promoter recognition method uncovers the complex regulation of intronic miRNAs. Genome Biol 14:R84. doi: 10.1186/gb-2013-14-8-r84
  32. Mejia-Guerra MK, Li W, Galeano NF, Vidal M, Gray J, Doseff AI, Grotewold E (2015) Core promoter plasticity between maize tissues and genotypes contrasts with predominance of sharp transcription initiation sites. Plant Cell 27:3309–3320. doi: 10.1105/tpc.15.00630
  33. Ng DW, Zhang C, Miller M, Palmer G, Whiteley M, Tholl D, Chen ZJ (2011) Cis- and trans-Regulation of miR163 and target genes confers natural variation of secondary metabolites in two Arabidopsis species and their allopolyploids. Plant Cell 23:1729–1740. doi: 10.1105/tpc.111.083915
  34. Peterson KJ, Dietrich MR, McPeek MA (2009) MicroRNAs and metazoan macroevolution: insights into canalization, complexity, and the Cambrian explosion. BioEssays 31:736–747. doi: 10.1002/bies.200900033
  35. Ramji DP, Foka P (2002) CCAAT/enhancer-binding proteins: structure, function and regulation. Biochem J 365:561–575. doi: 10.1042/BJ20020508
  36. Rubio-Somoza I, Weigel D (2013) Coordination of flower maturation by a regulatory circuit of three microRNAs. PLoS Genet 9:e1003374. doi: 10.1371/journal.pgen.1003374
  37. Shiraki T et al (2003) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci USA 100:15776–15781. doi: 10.1073/pnas.2136655100
  38. Sims RJ 3rd, Mandal SS, Reinberg D (2004) Recent highlights of RNA-polymerase-II-mediated transcription. Curr Opin Cell Biol 16:263–271. doi: 10.1016/
  39. Szarzynska B et al (2009) Gene structures and processing of Arabidopsis thaliana HYL1-dependent pri-miRNAs. Nucleic Acids Res 37:3083–3093. doi: 10.1093/nar/gkp189
  40. Tanzer A, Stadler PF (2004) Molecular evolution of a microRNA cluster. J Mol Biol 339:327–335. doi: 10.1016/j.jmb.2004.03.065
  41. Thurman RE et al (2012) The accessible chromatin landscape of the human genome. Nature 489:75–82. doi: 10.1038/nature11232
  42. Tjaden G, Edwards JW, Coruzzi GM (1995) Cis elements and trans-acting factors affecting regulation of a nonphotosynthetic light-regulated gene for chloroplast glutamine synthetase. Plant Physiol 108:1109–1117
  43. Voinnet O (2009) Origin, biogenesis, and activity of plant microRNAs. Cell 136:669–687. doi: 10.1016/j.cell.2009.01.046
  44. Wang L, Wang S, Li W (2012) RSeQC: quality control of RNA-seq experiments. Bioinformatics 28:2184–2185. doi: 10.1093/bioinformatics/bts356
  45. Wang H, Chung PJ, Liu J, Jang IC, Kean MJ, Xu J, Chua NH (2014) Genome-wide identification of long noncoding natural antisense transcripts and their responses to light in Arabidopsis. Genome Res 24:444–453. doi: 10.1101/gr.165555.113
  46. Welsch R, Maass D, Voegel T, Dellapenna D, Beyer P (2007) Transcription factor RAP2.2 and its interacting partner SINAT2: stable elements in the carotenogenesis of Arabidopsis leaves. Plant Physiol 145:1073–1085. doi: 10.1104/pp.107.104828
  47. Yi X, Zhang Z, Ling Y, Xu W, Su Z (2015) PNRD: a plant non-coding RNA database. Nucleic Acids Res 43:D982–989. doi: 10.1093/nar/gku1162
  48. Zhang Y et al (2008) Model-based analysis of ChIP-seq (MACS). Genome Biol 9:R137. doi: 10.1186/gb-2008-9-9-r137
  49. Zhang X, Bernatavichute YV, Cokus S, Pellegrini M, Jacobsen SE (2009) Genome-wide analysis of mono-, di- and trimethylation of histone H3 lysine 4 in Arabidopsis thaliana. Genome Biol 10:R62. doi: 10.1186/gb-2009-10-6-r62
  50. Zhang W, Zhang T, Wu Y, Jiang J (2012) Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell 24:2719–2731. doi: 10.1105/tpc.112.098061
  51. Zhang S, Liu Y, Yu B (2015) New insights into pri-miRNA processing and accumulation in plants. Wiley Interdiscip Rev RNA 6:533–545. doi: 10.1002/wrna.1292
  52. Zhou X, Ruan J, Wang G, Zhang W (2007) Characterization and identification of microRNA core promoters in four model species. PLoS Comput Biol 3:e37. doi: 10.1371/journal.pcbi.0030037
  53. Zielezinski A et al (2015) mirEX 2.0 - an integrated environment for expression profiling of plant microRNAs. BMC Plant Biol 15:144. doi: 10.1186/s12870-015-0533-2

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Qi You (1)
  • Hengyu Yan (1)
  • Yue Liu (1)
  • Xin Yi (1)
  • Kang Zhang (1)
  • Wenying Xu (1)
  • Zhen Su (1)
  1. State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, China
