Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data
We propose a framework for biclustering gene expression profiles. The framework applies the dominant set approach to create sets of sorting vectors, which are used to sort the rows of the data matrix so that coexpressed rows of gene expression vectors are gathered together. We iteratively sort and transpose the gene expression data matrix to gather blocks of coexpressed subsets. A weighted correlation coefficient measures similarity at both the gene level and the condition level; its weights are updated at each iteration using the sorting vector from the previous iteration. In this way, a highly correlated bicluster is moved to one corner of the rearranged gene expression data matrix. We applied our approach to synthetic data and three real gene expression data sets, with encouraging results. We also propose the average correlation value (ACV) to evaluate the homogeneity of a bicluster or a data matrix. This criterion conforms to the intuitive biological notion of a coexpressed set of genes or samples, and we compare it with the mean squared residue score. ACV is found to be more appropriate for both additive and multiplicative models.
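The contrast between ACV and the mean squared residue can be illustrated with a small sketch. The abstract does not give formulas, so the definitions below are assumptions: ACV is taken to be the larger of the average absolute off-diagonal Pearson correlation among rows and among columns, the mean squared residue follows the usual Cheng and Church form, and weighted_corr is a generic weighted Pearson correlation. The rule for updating weights from the sorting vector is not reproduced here, and all function names are hypothetical.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y under non-negative weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0  # convention chosen here for constant vectors
    return cov / sqrt(vx * vy)


def mean_abs_corr(vectors):
    """Average |r| over all ordered pairs of distinct vectors (uniform weights)."""
    k = len(vectors)
    ones = [1.0] * len(vectors[0])
    total = sum(
        abs(weighted_corr(vectors[i], vectors[j], ones))
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return total / (k * k - k)


def acv(matrix):
    """Assumed ACV: the larger of the row-wise and column-wise averages."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    return max(mean_abs_corr(rows), mean_abs_corr(cols))


def msr(matrix):
    """Mean squared residue: mean of (a_ij - rowmean_i - colmean_j + mean)^2."""
    m, n = len(matrix), len(matrix[0])
    row_mean = [sum(r) / n for r in matrix]
    col_mean = [sum(c) / m for c in zip(*matrix)]
    all_mean = sum(row_mean) / m
    return sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in range(m)
        for j in range(n)
    ) / (m * n)


# Toy biclusters built from one base profile (illustrative data only).
base = [1.0, 2.0, 4.0, 3.0]
additive = [[v + shift for v in base] for shift in (0.0, 5.0, 10.0)]
multiplicative = [[v * scale for v in base] for scale in (1.0, 2.0, 3.0)]

print("additive:       ACV =", acv(additive), " MSR =", msr(additive))
print("multiplicative: ACV =", acv(multiplicative), " MSR =", msr(multiplicative))
```

Under these assumed definitions, both toy matrices are perfectly coherent in the correlation sense, while the mean squared residue vanishes only for the additive one. This is consistent with the abstract's claim that ACV suits both model types.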
Journal of Signal Processing Systems
Volume 50, Issue 3, pp. 267-280
- Springer US
- gene expression data
- microarray data
- weighted correlation coefficient