Exploring Essential Attributes for Detecting MicroRNA Precursors from Background Sequences

  • Yun Zheng
  • Wynne Hsu
  • Mong Li Lee
  • Limsoon Wong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4316)


MicroRNAs (miRNAs) have been shown to play important roles in post-transcriptional gene regulation. The hairpin structure is a key characteristic of microRNA precursors (pre-miRNAs). Encoding these hairpin structures is a critical step in correctly distinguishing pre-miRNAs from background sequences, i.e., pseudo miRNA precursors. In this paper, we propose to encode the hairpin structures of pre-miRNAs with a set of features that captures both the global and the local structural characteristics of pre-miRNAs. Furthermore, using an information theory approach, we find that four essential attributes are discriminatory for classifying human pre-miRNAs and background sequences. The experimental results show that the number of conserved essential attributes decreases as the phylogenetic distance between species increases. Specifically, one A-U pair in the pre-miRNAs, which produces the U at the start position of most mature miRNAs, is found to be well conserved across different species for the purpose of biogenesis.
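The abstract says that an information theory approach singles out four essential attributes, but it does not spell out the procedure. The sketch below illustrates only the underlying idea. It assumes the attributes have already been discretised and scores each one by its mutual information with the class label, I(X;Y) = H(X) + H(Y) - H(X,Y). This is a generic illustration and not the authors' algorithm. The attribute names (`attr_a`, `attr_b`, `attr_c`), the toy data, and the helper functions are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(V), in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, for paired discrete observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))  # each joint outcome is an (x, y) pair
    return entropy(xs) + entropy(ys) - entropy(joint)


def rank_attributes(table, labels):
    """Return (attribute name, I(attribute; label)) pairs, most informative first."""
    scores = {name: mutual_information(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, NOT from the paper: each attribute is already discretised,
# e.g. 1 = a given hairpin position is paired, 0 = unpaired.
labels = ["pre", "pre", "pre", "pre", "bg", "bg", "bg", "bg"]
table = {
    "attr_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly separates the two classes
    "attr_b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
    "attr_c": [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative
}

if __name__ == "__main__":
    for name, score in rank_attributes(table, labels):
        print(f"{name}: {score:.3f} bits")
```

Scoring attributes one at a time like this ignores interactions between them. Finding a small set of attributes that is jointly discriminatory would require a search over attribute combinations instead of a per-attribute ranking.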







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yun Zheng (1)
  • Wynne Hsu (1)
  • Mong Li Lee (1)
  • Limsoon Wong (1)
  1. Department of Computer Science, School of Computing, National University of Singapore, Singapore
