Abstract
MicroRNAs (miRNAs) are short (~22 nucleotides), endogenously initiated non-coding RNAs that control gene expression post-transcriptionally, either by degrading target mRNAs or by inhibiting protein translation. Predicting miRNA genes is a challenging step towards understanding post-transcriptional gene regulation. The present paper focuses on developing a computational method for the identification of miRNA precursors.
We propose a machine learning algorithm based on Random Forests (RF) for miRNA prediction. The prediction algorithm relies on a set of features, compiled from known features as well as others introduced for the first time, and performs better than most well-known miRNA classifiers. When tested on real data, the method achieves 91.3% accuracy, 86% f-measure, 97.2% specificity, 93.4% precision and 79.6% sensitivity. Our method obtains better results than MiPred (the best currently known RF algorithm in the literature), Triplet-SVM, Virgo and EumiR.
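The five reported metrics are all derived from a binary confusion matrix, and the f-measure is the harmonic mean of precision and sensitivity. The following minimal sketch, using the standard textbook definitions rather than anything taken from the paper's own code, shows how the metrics relate and checks that the reported f-measure is consistent with the reported precision and sensitivity. The names `Confusion`, `metrics` and `f_measure` and the toy counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of a binary classifier (real pre-miRNA = positive class)."""
    tp: int  # real precursors predicted as real
    fp: int  # pseudo hairpins predicted as real
    tn: int  # pseudo hairpins predicted as pseudo
    fn: int  # real precursors predicted as pseudo


def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def metrics(c: Confusion) -> dict:
    """Standard definitions of the five metrics quoted in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure(precision, sensitivity),
        "specificity": specificity,
        "precision": precision,
        "sensitivity": sensitivity,
    }


# Toy counts, NOT data from the paper: they only exercise the formulas.
toy = metrics(Confusion(tp=8, fp=2, tn=18, fn=2))

# Consistency check on the values reported in the abstract:
# precision 93.4% and sensitivity 79.6% imply an f-measure close to 86%.
reported_f = f_measure(0.934, 0.796)
print(f"f-measure implied by reported precision/sensitivity: {reported_f:.3f}")
```

The check is one-directional: it confirms that the reported f-measure follows from the reported precision and sensitivity, but it cannot recover the underlying counts or the class balance of the test set.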
The results indicate that Random Forests are a better alternative to Support Vector Machines (SVM) for miRNA prediction, particularly in terms of accuracy and f-measure.
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
ElGokhy, S.M., Shibuya, T., Shoukry, A. (2014). Improving miRNA Classification Using an Exhaustive Set of Features. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_4
DOI: https://doi.org/10.1007/978-3-319-07581-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07580-8
Online ISBN: 978-3-319-07581-5
eBook Packages: Engineering, Engineering (R0)