Journal of Signal Processing Systems, Volume 50, Issue 3, pp 267–280

Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data



We propose a framework for biclustering gene expression profiles. The framework applies the dominant set approach to create a set of sorting vectors, which are used to sort the rows of the data matrix so that coexpressed rows of gene expression vectors are gathered together. By iteratively sorting and transposing the gene expression data matrix, we gather the blocks of coexpressed subsets. A weighted correlation coefficient measures similarity at both the gene level and the condition level, and its weights are updated in each iteration using the sorting vector from the previous iteration. As a result, a highly correlated bicluster ends up in one corner of the rearranged gene expression data matrix. We applied our approach to synthetic data and three real gene expression data sets with encouraging results. In addition, we propose ACV (average correlation value) to evaluate the homogeneity of a bicluster or a data matrix. This criterion conforms to the intuitive biological notion of a coexpressed set of genes or samples. Compared with the mean squared residue score, ACV is found to be more appropriate for both additive and multiplicative models.
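The sort-and-transpose idea can be sketched in plain Python, but only under assumptions the abstract does not settle. This sketch assumes (1) the weighted correlation coefficient is an ordinary weighted Pearson correlation, (2) the dominant set is extracted with the standard replicator dynamics of Pavan and Pelillo on the matrix of absolute weighted correlations, with the resulting characteristic vector serving as the sorting vector, and (3) that sorting vector is reused directly as the weight vector for the next, transposed pass. The names `weighted_corr`, `dominant_set_vector`, `sort_rows`, `transpose` and `iterative_sort` are hypothetical; this illustrates the idea and is not the authors' implementation.

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y; w are non-negative weights with positive sum."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0  # a (weighted) constant vector is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def dominant_set_vector(sim, iters=1000, tol=1e-12):
    """Replicator dynamics x_i <- x_i * (A x)_i / (x^T A x), started at the barycenter."""
    n = len(sim)
    x = [1.0 / n] * n
    for _ in range(iters):
        ax = [sum(sim[i][j] * x[j] for j in range(n)) for i in range(n)]
        xax = sum(xi * axi for xi, axi in zip(x, ax))
        if xax == 0:
            break  # no affinity at all: keep the uniform vector
        new_x = [xi * axi / xax for xi, axi in zip(x, ax)]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x


def sort_rows(data, weights):
    """One pass: rank rows by their membership in the dominant set of |weighted corr|."""
    m = len(data)
    # Zero diagonal, as in the dominant-set formulation.
    sim = [[0.0 if i == j else abs(weighted_corr(data[i], data[j], weights))
            for j in range(m)] for i in range(m)]
    x = dominant_set_vector(sim)
    order = sorted(range(m), key=lambda i: -x[i])  # stable: ties keep input order
    return order, x


def transpose(data):
    return [list(col) for col in zip(*data)]


def iterative_sort(data, n_iter=2):
    """Alternately sort rows and columns; each pass's sorting vector weights the next pass."""
    row_ids, col_ids = list(range(len(data))), list(range(len(data[0])))
    weights = [1.0] * len(data[0])  # uniform weights for the very first pass
    for _ in range(n_iter):
        order, x = sort_rows(data, weights)
        data = [data[i] for i in order]
        row_ids = [row_ids[i] for i in order]
        weights = [x[i] for i in order]  # assumption: reuse the sorting vector as weights
        data = transpose(data)
        row_ids, col_ids = col_ids, row_ids
    if n_iter % 2 == 1:  # restore the original orientation
        data = transpose(data)
        row_ids, col_ids = col_ids, row_ids
    return data, row_ids, col_ids


# Toy matrix: rows 0, 2, 4 are linear transforms of one pattern; rows 1, 3, 5 are
# mutually orthogonal contrasts that are uncorrelated with everything else.
data = [
    [1, 2, 3, 4, 5],
    [2, -1, -2, -1, 2],
    [3, 5, 7, 9, 11],
    [-1, 2, 0, -2, 1],
    [1, 4, 7, 10, 13],
    [1, -4, 6, -4, 1],
]
order, x = sort_rows(data, [1.0] * 5)
print(order)  # [0, 2, 4, 1, 3, 5]
```

A single pass moves the coherent rows 0, 2 and 4 to the top. The weights matter because a zero weight removes a condition from the similarity: `weighted_corr([1, 2, 3, 100], [1, 2, 3, -100], [1, 1, 1, 0])` is 1.0 even though the unweighted correlation of those vectors is strongly negative.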

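The claim that ACV suits both additive and multiplicative models can be checked on toy biclusters. This pure-Python sketch assumes ACV is the larger of two averages: the mean absolute Pearson correlation over all pairs of rows and over all pairs of columns (our reading of "average correlation value"; the full paper gives the exact definition). The mean squared residue is the standard Cheng and Church score, H(I, J) = (1/(|I||J|)) * sum over i, j of (a_ij - a_iJ - a_Ij + a_IJ)^2. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; a constant vector is treated as uncorrelated (0.0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mean_abs_pairwise_corr(vectors):
    """Average |r_ij| over pairs i != j; equals (sum_ij |r_ij| - k) / (k^2 - k) for non-constant vectors."""
    k = len(vectors)
    if k < 2:
        raise ValueError("need at least two vectors")
    total = sum(abs(pearson(vectors[i], vectors[j]))
                for i in range(k) for j in range(i + 1, k))
    return total / (k * (k - 1) / 2)


def acv(matrix):
    """Assumed ACV: the larger of the row-wise and column-wise average |correlation|."""
    columns = [list(c) for c in zip(*matrix)]
    return max(mean_abs_pairwise_corr(matrix), mean_abs_pairwise_corr(columns))


def msr(matrix):
    """Cheng and Church mean squared residue H(I, J)."""
    m, n = len(matrix), len(matrix[0])
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(m)) / m for j in range(n)]
    all_mean = sum(row_means) / m
    return sum((matrix[i][j] - row_means[i] - col_means[j] + all_mean) ** 2
               for i in range(m) for j in range(n)) / (m * n)


base = [1.0, 3.0, 2.0, 5.0, 4.0]
additive = [[b + s for b in base] for s in (0.0, 2.0, 5.0)]        # a_ij = b_j + s_i
multiplicative = [[b * s for b in base] for s in (1.0, 2.0, 5.0)]  # a_ij = b_j * s_i

for name, bic in (("additive", additive), ("multiplicative", multiplicative)):
    print(f"{name}: ACV = {acv(bic):.3f}, MSR = {msr(bic):.3f}")
```

Both perfectly coherent biclusters get ACV = 1.000. MSR is 0.000 for the additive one but about 5.778 for the multiplicative one: for a_ij = b_j * s_i the residue factors as (b_j - mean(b)) * (s_i - mean(s)), which gives exactly 52/9 here. MSR therefore penalizes a perfect multiplicative pattern, while the correlation-based score does not.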

Keywords: biclustering; gene expression data; microarray data; weighted correlation coefficient




Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, People’s Republic of China
