Computational and Bioinformatics Methods for MicroRNA Gene Prediction

  • Jens Allmer
Part of the Methods in Molecular Biology book series (MIMB, volume 1107)

Abstract

MicroRNAs (miRNAs) have attracted ever-increasing interest in recent years. Since experimental approaches for determining miRNAs are nontrivial to apply, computational methods for the prediction of miRNAs have gained popularity. Such methods can be grouped into two broad categories: (1) those performing ab initio predictions of miRNAs from primary sequence alone and (2) those additionally employing phylogenetic conservation. Most methods acknowledge the importance of hairpin or stem–loop structures and employ various approaches for the prediction of RNA secondary structure. Machine learning has been employed in both categories, with classification being the predominant method. In most cases, positive and negative examples are necessary for performing classification. Since it is currently not feasible to determine experimentally all possible miRNAs for an organism, true negative examples are hard to come by, and the accuracy assessment of algorithms is therefore hampered. In this chapter, RNA secondary structure prediction is introduced first, since it provides a basis for miRNA prediction. This is followed by an assessment of homology-based and then ab initio miRNA prediction methods.

Keywords

  • miRNA
  • Secondary structure prediction
  • Homology-based prediction
  • Ab initio prediction
  • miRNA prediction accuracy
  • Multiple sequence alignment-based prediction

Notes

Acknowledgements

I would like to thank Müşerref Duygu Saçar for preparing Fig. 1. This study was supported in part by an award from the Turkish Academy of Sciences for outstanding young scientists (TUBA GEBIP, http://www.tuba.gov.tr).

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Jens Allmer (1)
  1. Molecular Biology and Genetics, Izmir Institute of Technology, Izmir, Turkey
