A Novel Approach for Effective Learning of Cluster Structures with Biological Data Applications

  • Miyoung Shin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4316)


DNA microarray gene expression studies have recently been used extensively to mine, in a systematic way, biological knowledge hidden in large volumes of gene expression data. In particular, the problem of finding groups of co-expressed genes or samples has been widely investigated because it is useful for characterizing unknown gene functions and for more sophisticated tasks such as modeling biological pathways. Nevertheless, identifying good clusters remains difficult in practice, since many clustering methods require the user to choose the number of target clusters arbitrarily. In this paper we propose a novel approach that systematically identifies good candidates for the number of clusters, so that arbitrariness in cluster generation is minimized. Our experimental results on both a synthetic dataset and a real gene expression dataset show the applicability and usefulness of this approach in microarray data mining.


Synthetic Data, Cluster Structure, Synthetic Dataset, Effective Learn, Adjusted Rand Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Miyoung Shin (1)
  1. School of Electrical Engineering and Computer Science, Kyungpook National University, Daegu, Korea
