Subspace Clustering of Microarray Data Based on Domain Transformation

  • Jongeun Jun
  • Seokkyung Chung
  • Dennis McLeod
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4316)


We propose a mining framework that supports the identification of useful knowledge through data clustering. Motivated by recent advances in microarray technologies, we focus on mining gene expression datasets. In particular, because genes are often co-expressed under only a subset of experimental conditions, we present a novel subspace clustering algorithm. In contrast to previous approaches, our method is based on the observation that the number of subspace clusters is related to the number of maximal subspace clusters to which any gene pair can belong. After gene expression profiles are discretized, the similarity between two genes is represented as a sequence of symbols that encodes the maximal subspace cluster for that gene pair. This domain transformation (from genes into gene-gene relations) allows us to make the number of possible subspace clusters dependent on the number of genes. Based on the symbolic representations of genes, we present an efficient subspace clustering algorithm that scales with the number of dimensions. In addition, the running time can be substantially reduced by using an inverted index and pruning uninteresting subspaces. Experimental results indicate that the proposed method efficiently identifies co-expressed gene subspace clusters in a yeast cell cycle dataset.
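The domain transformation described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the abstract does not specify the discretization scheme or the data structures, so the equal-width binning, the function names (`discretize`, `pair_pattern`, `group_pairs_by_pattern`), and the `n_bins` and `min_dims` parameters are all assumptions made for illustration. The sketch discretizes each profile, derives for every gene pair the conditions on which both genes carry the same symbol, and groups pairs that share an identical pattern in a hash table.

```python
from collections import defaultdict
from itertools import combinations


def discretize(profile, n_bins=3):
    """Map each expression value to an integer symbol.

    Assumption: equal-width bins over the profile's own range.
    """
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def pair_pattern(sym_a, sym_b):
    """Return (condition index, symbol) for every condition where both genes agree."""
    return tuple((c, a) for c, (a, b) in enumerate(zip(sym_a, sym_b)) if a == b)


def group_pairs_by_pattern(expr, n_bins=3, min_dims=2):
    """Group genes by the agreement pattern of their pairs.

    Patterns covering fewer than min_dims conditions are pruned
    (a hypothetical stand-in for "non-interesting" subspaces).
    """
    symbols = {gene: discretize(p, n_bins) for gene, p in expr.items()}
    table = defaultdict(set)  # hash table: pattern -> genes
    for g1, g2 in combinations(sorted(symbols), 2):
        pattern = pair_pattern(symbols[g1], symbols[g2])
        if len(pattern) >= min_dims:
            table[pattern].update((g1, g2))
    return dict(table)


# Toy data, invented for illustration only.
expr = {
    "g1": [0, 3, 6, 9],
    "g2": [0, 3, 6, 0],
    "g3": [9, 0, 9, 9],
}
clusters = group_pairs_by_pattern(expr)
for pattern, genes in clusters.items():
    print(pattern, sorted(genes))
```

Genes collected under the same key agree on every condition in that pattern, so each entry is a candidate subspace cluster. The sketch enumerates all gene pairs directly and omits the inverted index mentioned in the abstract.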


Gene Expression Data, Gene Pair, Hash Table, Subspace Cluster, Inverted Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jongeun Jun (1)
  • Seokkyung Chung (2)
  • Dennis McLeod (1)
  1. Department of Computer Science, University of Southern California, Los Angeles, USA
  2. Yahoo! Inc., Santa Clara, USA
