A Novel Approach for Effective Learning of Cluster Structures with Biological Data Applications
Recently, DNA microarray gene expression studies have been actively pursued to systematically mine biological knowledge hidden in large volumes of gene expression data. In particular, the problem of finding groups of co-expressed genes or samples has been widely investigated because of its usefulness in characterizing unknown gene functions and in supporting more sophisticated tasks, such as modeling biological pathways. Nevertheless, identifying good clusters remains difficult in practice, since many clustering methods require the user to select the number of target clusters arbitrarily. In this paper, we propose a novel approach to systematically identifying good candidates for the number of clusters, so as to minimize arbitrariness in cluster generation. Our experimental results on both a synthetic dataset and a real gene expression dataset show the applicability and usefulness of this approach in microarray data mining.
Keywords: Synthetic Data, Cluster Structure, Synthetic Dataset, Effective Learn, Adjusted Rand Index