Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data

Teng, Li; Chan, Laiwan

doi:10.1007/s11265-007-0121-2

Published: 16 August 2007

Volume 50, pages 267–280, (2008)

Journal of Signal Processing Systems

Abstract

We propose a framework for biclustering gene expression profiles. The framework applies the dominant set approach to create sorting vectors, which are used to sort the rows of the data matrix so that coexpressed gene expression vectors are gathered together. We iteratively sort and transpose the gene expression data matrix to gather blocks of coexpressed subsets. A weighted correlation coefficient measures similarity at both the gene level and the condition level, and its weights are updated at each iteration using the sorting vector from the previous iteration. As a result, a highly correlated bicluster is placed at one corner of the rearranged gene expression data matrix. We applied our approach to synthetic data and to three real gene expression data sets with encouraging results. In addition, we propose ACV (average correlation value) to evaluate the homogeneity of a bicluster or a data matrix. This criterion conforms to the intuitive biological notion of a coexpressed set of genes or samples. We compare it with the mean squared residue score and find ACV more appropriate for both additive and multiplicative models.
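
The abstract describes the sort-and-transpose framework only at a high level, so the Python sketch below is one possible reading of it, not the authors' implementation. It assumes that (1) the weighted correlation coefficient is a weighted Pearson correlation, (2) the dominant set is extracted with replicator dynamics on the correlation matrix after clipping negative values to zero and zeroing the diagonal, (3) the sorting vector is that dominant-set weight vector, with rows ranked by descending weight, and (4) the next pass uses the sorting vector plus a uniform floor as its weights. The helpers `sorting_vector` and `iterative_sort` and the floor of 1/len(x) are hypothetical choices made for illustration.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation between vectors x and y under non-negative weights w."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float) / np.sum(w)
    dx, dy = x - w @ x, y - w @ y            # deviations from the weighted means
    denom = np.sqrt((w @ (dx * dx)) * (w @ (dy * dy)))
    return float(w @ (dx * dy) / denom) if denom > 0 else 0.0

def weighted_corr_matrix(X, w):
    """Weighted correlations between all pairs of rows of X; w weights the columns."""
    w = np.asarray(w, dtype=float) / np.sum(w)
    Xc = X - (X @ w)[:, None]                # subtract each row's weighted mean
    cov = (Xc * w) @ Xc.T                    # cov[i, j] = sum_k w_k Xc[i, k] Xc[j, k]
    sd = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
    return cov / np.outer(sd, sd)

def dominant_set_vector(S, iters=1000, tol=1e-10):
    """Replicator dynamics x <- x * (S x) / (x' S x) on a non-negative, zero-diagonal S.
    The result stays on the simplex; large entries mark a tightly coherent group."""
    x = np.full(S.shape[0], 1.0 / S.shape[0])
    for _ in range(iters):
        Sx = S @ x
        payoff = x @ Sx
        if payoff <= 0:                      # no positive similarity to exploit
            break
        x_new = x * Sx / payoff
        done = np.abs(x_new - x).sum() < tol
        x = x_new
        if done:
            break
    return x

def sorting_vector(X, w):
    """Hypothetical helper: dominant-set weights for the rows of X, given column weights w."""
    S = np.clip(weighted_corr_matrix(X, w), 0.0, None)  # keep positive co-expression only
    np.fill_diagonal(S, 0.0)
    return dominant_set_vector(S)

def iterative_sort(X, n_iter=5):
    """Alternately sort rows, then columns (the transpose step), by their sorting vectors."""
    rows, cols = np.arange(X.shape[0]), np.arange(X.shape[1])
    w_cols = np.ones(X.shape[1])             # first pass: all conditions weighted equally
    for _ in range(n_iter):
        x = sorting_vector(X[np.ix_(rows, cols)], w_cols)
        order = np.argsort(-x, kind="stable")
        rows, w_rows = rows[order], x[order] + 1.0 / len(x)   # assumed smoothing floor
        y = sorting_vector(X[np.ix_(rows, cols)].T, w_rows)
        order = np.argsort(-y, kind="stable")
        cols, w_cols = cols[order], y[order] + 1.0 / len(y)
    return rows, cols                        # a coherent block, if found, sits top-left

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))
X[:8, :6] += 3.0 * rng.normal(size=6)        # plant rows 0-7 x columns 0-5 with a shared pattern
rows, cols = iterative_sort(X)
print(rows[:8], cols[:6])                    # inspect which rows/columns reached the corner
```

Ranking by the dominant-set weights, rather than by a plain row average, lets a small tightly correlated group float to the top even when most rows are noise; the floor keeps every condition contributing so the weighted correlations do not collapse onto two or three columns.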

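The comparison between ACV and the mean squared residue can be illustrated on two ideal biclusters. This Python sketch assumes ACV is the larger of two averages: the mean absolute Pearson correlation over distinct pairs of rows and the same over distinct pairs of columns. It uses the Cheng and Church definition of the mean squared residue. The functions `acv` and `msr` and the example matrices are illustrative, not taken from the paper.

```python
import numpy as np

def acv(B):
    """Average correlation value of a bicluster B (assumed form, see the lead-in)."""
    m, n = B.shape
    R = np.abs(np.corrcoef(B))               # m x m correlations between rows (genes)
    C = np.abs(np.corrcoef(B.T))             # n x n correlations between columns (conditions)
    row_avg = (R.sum() - m) / (m * m - m)    # drop the m self-correlations on the diagonal
    col_avg = (C.sum() - n) / (n * n - n)
    return float(max(row_avg, col_avg))

def msr(B):
    """Cheng and Church mean squared residue H(I, J)."""
    residue = B - B.mean(axis=1, keepdims=True) - B.mean(axis=0, keepdims=True) + B.mean()
    return float(np.mean(residue ** 2))

a = np.array([1.0, 2.0, 3.0])                # row effects
b = np.array([1.0, 2.0, 4.0, 8.0])           # column effects
B_add = a[:, None] + b[None, :]              # additive model: a_i + b_j
B_mul = a[:, None] * b[None, :]              # multiplicative model: a_i * b_j
print("additive:       ACV", acv(B_add), "MSR", msr(B_add))
print("multiplicative: ACV", acv(B_mul), "MSR", msr(B_mul))
```

Both blocks score an ACV of 1 up to rounding. The mean squared residue is 0 only for the additive block; for the multiplicative block the residue is (a_i - mean(a))(b_j - mean(b)), giving 115/24 ≈ 4.79, even though every row is an exact multiple of every other.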

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, People’s Republic of China
Li Teng & Laiwan Chan

Corresponding author

Correspondence to Li Teng.

About this article

Cite this article

Teng, L., Chan, L. Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data. J Sign Process Syst Sign Image 50, 267–280 (2008). https://doi.org/10.1007/s11265-007-0121-2

Received: 31 May 2007
Accepted: 13 June 2007
Issue Date: March 2008
DOI: https://doi.org/10.1007/s11265-007-0121-2