Exploring Essential Attributes for Detecting MicroRNA Precursors from Background Sequences
MicroRNAs (miRNAs) play important roles in post-transcriptional gene regulation. The hairpin structure is a key characteristic of microRNA precursors (pre-miRNAs), so how this structure is encoded is critical for correctly distinguishing pre-miRNAs from background sequences, i.e., pseudo miRNA precursors. In this paper, we propose encoding the hairpin structures of pre-miRNAs with a set of features that captures both their global and local structural characteristics. Furthermore, using an information theory approach, we find four essential attributes that discriminate human pre-miRNAs from background sequences. The experimental results show that the number of conserved essential attributes decreases as the phylogenetic distance between species increases. Specifically, one A-U pair in the pre-miRNAs, which produces the U at the start position of most mature miRNAs, is found to be well conserved across species for the purpose of biogenesis.
Keywords: Mutual Information, Local Feature, Essential Attribute, Mature miRNAs, Hairpin Structure
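The abstract does not spell out how the information theory approach scores attributes, so the following is only a minimal sketch of the general idea: ranking a discretized structural attribute by its mutual information with the class label. The function names, the toy attribute values ("AU", "GC"), and the labels ("pre", "pseudo") are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data: class labels and two candidate attributes.
labels = ["pre", "pre", "pseudo", "pseudo"]
informative = ["AU", "AU", "GC", "GC"]    # tracks the label perfectly
uninformative = ["AU", "GC", "AU", "GC"]  # independent of the label

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
print(mi_informative, mi_uninformative)
```

In this toy setting the attribute that tracks the label shares one full bit of information with it, while the independent attribute shares none; an attribute selection procedure built on mutual information would prefer the former.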