Incorporating Biological Domain Knowledge into Cluster Validity Assessment
This paper presents an approach for assessing cluster validity based on similarity knowledge extracted from the Gene Ontology (GO) and from databases annotated to the GO. A knowledge-driven cluster validity assessment system for microarray data was implemented, and different methods were applied to measure the similarity between yeast gene products based on the GO. Two methods are proposed for calculating cluster validity indices using GO-driven similarity. The first processes overall similarity values, which are calculated from the combined annotations originating from the three GO hierarchies. The second is based on hierarchy-independent similarity values, which are calculated separately for each of these hierarchies. A traditional node-counting method and an information content technique were implemented to measure knowledge-based similarity between gene products (biological distances). The results contribute to the evaluation of clustering outcomes and the identification of optimal cluster partitions, and the approach may represent an effective tool to support biomedical knowledge discovery in gene expression data analysis.
Keywords: Gene Ontology; Validity Index; Cluster Validity Index; Gene Expression Data Analysis; Saccharomyces Genome Database
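The two similarity families named in the abstract can be illustrated with a minimal sketch. It assumes that the node-counting method is of the Wu and Palmer type and that the information content technique is of the Resnik type; the abstract does not give the exact formulas, so this is not the authors' implementation. The toy ontology, the gene names, the annotations, and the averaging rule in `gene_similarity` are all hypothetical and chosen only for illustration.

```python
import math
from itertools import product

# Hypothetical toy ontology (term -> list of parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical annotations: gene product -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"A1"},
    "g2": {"A2"},
    "g3": {"B1"},
    "g4": {"A1", "B1"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(term):
    """Number of nodes on the longest path from the root to the term (root = 1)."""
    parents = PARENTS[term]
    return 1 if not parents else 1 + max(depth(p) for p in parents)


def wu_palmer(t1, t2):
    """Node-counting similarity: 2 * depth(deepest common ancestor) / (depth(t1) + depth(t2))."""
    common = ancestors(t1) & ancestors(t2)
    deepest = max(depth(t) for t in common)
    return 2.0 * deepest / (depth(t1) + depth(t2))


def term_probabilities():
    """Fraction of gene products annotated to each term or to any of its descendants."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {term: count / total for term, count in counts.items()}


PROB = term_probabilities()


def resnik(t1, t2):
    """Information content similarity: highest -log p(c) over the common ancestors c."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(PROB[t]) for t in common if PROB[t] > 0)


def gene_similarity(g1, g2, term_sim):
    """Assumed aggregation rule: mean term-term similarity over all annotation pairs."""
    pairs = list(product(ANNOTATIONS[g1], ANNOTATIONS[g2]))
    return sum(term_sim(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for sim in (wu_palmer, resnik):
        print(sim.__name__, round(gene_similarity("g1", "g2", sim), 4))
```

In this sketch, a hierarchy-independent value would come from running `gene_similarity` on the annotations of a single GO hierarchy, while an overall value would use the annotations from all three hierarchies together. A bounded similarity such as `wu_palmer` can be turned into a distance as `1 - similarity` before it is passed to a validity index.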