Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data

Journal of Signal Processing Systems

Abstract

We propose a framework for biclustering gene expression profiles. The framework applies the dominant set approach to create sets of sorting vectors, which are used to sort the rows of the data matrix so that coexpressed rows of gene expression vectors are gathered together. We iteratively sort and transpose the gene expression data matrix to gather the blocks of coexpressed subsets. A weighted correlation coefficient measures similarity at both the gene level and the condition level, and its weights are updated at each iteration using the sorting vector from the previous iteration. In this way, the highly correlated bicluster is moved to one corner of the rearranged gene expression data matrix. We applied our approach to synthetic data and three real gene expression data sets, with encouraging results. We also propose the average correlation value (ACV) to evaluate the homogeneity of a bicluster or a data matrix. This criterion conforms to the intuitive biological notion of a coexpressed set of genes or samples, and we compare it with the mean squared residue score. ACV is found to be more appropriate for both additive and multiplicative models.
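The abstract does not state the formulas behind these measures, so the following Python sketch rests on assumptions: ACV is taken to be the larger of the mean absolute off-diagonal Pearson correlation among rows and among columns, the mean squared residue follows the usual Cheng and Church form, and the weighted correlation is the standard weighted Pearson coefficient. The function names and the toy matrices are hypothetical and only illustrate why a correlation-based score can treat additive and multiplicative patterns alike.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between vectors x and y.

    w holds non-negative weights (a hypothetical stand-in for weights
    derived from a sorting vector); uniform w gives the ordinary Pearson r.
    """
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    w = w / w.sum()              # normalise the weights to sum to 1
    dx = x - np.sum(w * x)       # deviations from the weighted means
    dy = y - np.sum(w * y)
    denom = np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))
    return float(np.sum(w * dx * dy) / denom)


def _mean_abs_offdiag_corr(m):
    """Mean absolute Pearson correlation over all distinct pairs of rows of m."""
    c = np.abs(np.corrcoef(m))
    k = c.shape[0]
    return (c.sum() - np.trace(c)) / (k * k - k)


def acv(a):
    """Average correlation value under the assumed definition:
    the larger of the row-wise and column-wise mean absolute correlation.
    Constant rows or columns have undefined correlation and yield NaN."""
    a = np.asarray(a, dtype=float)
    return float(max(_mean_abs_offdiag_corr(a), _mean_abs_offdiag_corr(a.T)))


def mean_squared_residue(a):
    """Mean squared residue: mean of (a_ij - rowmean_i - colmean_j + mean)^2."""
    a = np.asarray(a, dtype=float)
    residue = (a - a.mean(axis=1, keepdims=True)
                 - a.mean(axis=0, keepdims=True) + a.mean())
    return float(np.mean(residue**2))


# Toy biclusters built from one base pattern (illustrative values only).
base = np.array([1.0, 2.0, 4.0, 7.0])
additive = base + np.array([[0.0], [3.0], [5.0]])        # rows differ by shifts
multiplicative = base * np.array([[1.0], [2.0], [3.0]])  # rows differ by scales

if __name__ == "__main__":
    for name, m in [("additive", additive), ("multiplicative", multiplicative)]:
        print(name, "ACV =", acv(m), "MSR =", mean_squared_residue(m))
```

On these toy matrices the ACV is 1 (up to floating-point rounding) for both patterns, whereas the mean squared residue is numerically zero only for the additive one. Under the stated assumptions, this matches the abstract's point that a correlation-based criterion suits both model types.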

Author information

Corresponding author

Correspondence to Li Teng.

About this article

Cite this article

Teng, L., Chan, L. Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data. J Sign Process Syst Sign Image 50, 267–280 (2008). https://doi.org/10.1007/s11265-007-0121-2

  • DOI: https://doi.org/10.1007/s11265-007-0121-2