Abstract
Many clustering algorithms have been used to identify co-expressed genes in gene expression data. Proteins typically interact with different groups of proteins to serve different biological roles when responding to different external stimuli. The genes that produce these proteins are therefore expected to co-express with more than one group of genes, and so to belong to more than one cluster. This poses a challenge to existing clustering algorithms, since overlapping clusters must be discovered in a noisy environment. In this paper, we therefore propose an effective clustering approach that consists of an initial clustering phase followed by a re-clustering phase. The approach has several desirable features. First, it uses both local and global information inherent in gene expression data to discover overlapping clusters. It does this by computing a local pairwise similarity measure between gene expression profiles and a global probabilistic measure of the interestingness of hidden patterns. Second, when re-clustering, it can distinguish between relevant and irrelevant expression data. Third, it makes the patterns discovered in each cluster explicit for easy interpretation. For performance evaluation, the approach has been tested on both simulated and real expression data sets. Experimental results show that it can effectively uncover interesting patterns in noisy gene expression data. Based on these patterns, overlapping clusters can be discovered, and the expression levels at which each cluster of genes co-expresses under different conditions can be better understood.
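The abstract names two ingredients, a local pairwise similarity and a global probabilistic interestingness measure, but does not define either one. The following is a minimal sketch that assumes Pearson correlation for the local measure and Haberman's adjusted residual for the global measure. It is an illustration of those two standard statistics, not the authors' method. The assumed global measure is applied to a contingency table of cluster membership against discretized expression level under one condition. The function names `pearson`, `adjusted_residual` and `interesting_cells`, the example `counts` table, and the 1.96 cut-off are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed local measure: Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    # A constant profile has no variance; treat it as uncorrelated.
    return sxy / denom if denom > 0 else 0.0


def adjusted_residual(table: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Assumed global measure: Haberman's adjusted residual for cell (i, j)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    expected = row_tot[i] * col_tot[j] / n
    variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
    # Degenerate margins (empty or full row/column) carry no evidence either way.
    if variance <= 0:
        return 0.0
    return (table[i][j] - expected) / math.sqrt(variance)


def interesting_cells(table: Sequence[Sequence[int]], z: float = 1.96) -> List[Tuple[int, int]]:
    """Cells observed significantly more often than expected under independence."""
    return [(i, j)
            for i in range(len(table))
            for j in range(len(table[0]))
            if adjusted_residual(table, i, j) > z]


# Hypothetical table: rows = genes in / not in a cluster,
# columns = discretized expression level (low, medium, high) under one condition.
counts = [[2, 3, 15],
          [30, 30, 20]]

print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(interesting_cells(counts))      # [(0, 2), (1, 0)]
```

Under independence, adjusted residuals are approximately standard normal. A single threshold can therefore be applied to cells whose expected counts differ. In this toy table, the flagged cell (0, 2) would read as "genes in this cluster are highly expressed under this condition more often than chance predicts". That is the kind of explicit, interpretable pattern the abstract describes.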
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, P.C.H., Chan, K.C.C. (2008). Mining Gene Expression Patterns for the Discovery of Overlapping Clusters. In: Marchiori, E., Moore, J.H. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2008. Lecture Notes in Computer Science, vol 4973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78757-0_11
DOI: https://doi.org/10.1007/978-3-540-78757-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78756-3
Online ISBN: 978-3-540-78757-0
eBook Packages: Computer Science, Computer Science (R0)