Multispecies Gene Entropy Estimation, a Data Mining Approach
This paper presents a data mining approach to estimating multispecies gene entropy, using a self-organizing map (SOM) to mine a homologous gene set. The distribution function of each gene is approximated by its probability distribution in the feature space. The phylogenetic applications of multispecies gene entropy are investigated through an example: inferring the species phylogeny of eight yeast species. Genes with the smallest Kullback-Leibler (K-L) distances to the minimum entropy gene are found to be more likely to be phylogenetically informative. The K-L distances of genes are strongly correlated with the spectral radii of their identity percentage matrices. After a fast Fourier transform (FFT), the images of the identity percentage matrices of genes with small K-L distances to the minimum entropy gene are more similar, in the frequency domain, to the image of the minimum entropy gene than are those of genes with large K-L distances. Finally, a K-L distance based gene concatenation approach under gene clustering is proposed to infer species phylogenies robustly and systematically.
Keywords: Sequence Space, Gene Entropy, Informative Gene, Data Mining Approach, Good Match Unit
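The quantities named in the abstract (gene entropy, K-L distance to the minimum entropy gene, and the spectral radius of an identity percentage matrix) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each gene's probability distribution is obtained by normalizing hit counts over SOM nodes, that entropy means Shannon entropy in bits, and that the K-L distance is the divergence D(p || q) with a small additive smoothing term. The function names and the toy hit counts are hypothetical.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as counts or probabilities."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]  # terms with p = 0 contribute nothing
    return float(-(nz * np.log2(nz)).sum())


def kl_distance(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in bits; eps smoothing is an assumption."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float((p * np.log2(p / q)).sum())


def spectral_radius(m):
    """Largest absolute eigenvalue of a square matrix."""
    eigenvalues = np.linalg.eigvals(np.asarray(m, dtype=float))
    return float(np.abs(eigenvalues).max())


def rank_by_kl_to_min_entropy(hits):
    """Find the minimum entropy gene and rank all genes by K-L distance to it.

    hits: dict mapping gene name -> hit counts over SOM nodes (hypothetical input).
    """
    entropies = {gene: shannon_entropy(h) for gene, h in hits.items()}
    reference = min(entropies, key=entropies.get)
    distances = {gene: kl_distance(h, hits[reference]) for gene, h in hits.items()}
    return reference, sorted(distances, key=distances.get)


# Toy hit counts over four SOM nodes for three made-up genes.
toy_hits = {
    "geneA": [8, 1, 1, 0],
    "geneB": [7, 2, 1, 0],
    "geneC": [3, 3, 2, 2],
}
reference_gene, ranking = rank_by_kl_to_min_entropy(toy_hits)
```

Under these assumptions, the genes at the front of `ranking` are the ones the abstract would treat as candidates for being phylogenetically informative, and `spectral_radius` could be applied to each gene's identity percentage matrix to check the reported correlation.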