Mining Gene Expression Patterns for the Discovery of Overlapping Clusters

Ma, Patrick C. H.; Chan, Keith C. C.

doi:10.1007/978-3-540-78757-0_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4973)

Included in the following conference series:

European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics

Abstract

Many clustering algorithms have been used to identify co-expressed genes in gene expression data. Because proteins typically interact with different groups of proteins to serve different biological roles when responding to different external stimuli, the genes that produce these proteins are expected to co-express with more than one group of genes and therefore to belong to more than one cluster. This poses a challenge to existing clustering algorithms, since overlapping clusters must be discovered in a noisy environment. In this paper, we therefore propose an effective clustering approach that consists of an initial clustering phase followed by a re-clustering phase. The proposed approach has several desirable features. First, it makes use of both local and global information inherent in gene expression data to discover overlapping clusters, by computing both a local pairwise similarity measure between gene expression profiles and a global probabilistic measure of the interestingness of hidden patterns. Second, when performing re-clustering, it is able to distinguish between relevant and irrelevant expression data. Third, it makes the patterns discovered in each cluster explicit for easy interpretation. For performance evaluation, the proposed approach has been tested on both simulated and real expression data sets. Experimental results show that it is able to effectively uncover interesting patterns in noisy gene expression data, so that overlapping clusters can be discovered from these patterns and the expression levels at which each cluster of genes co-expresses under different conditions can be better understood.
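The abstract names two ingredients, a local pairwise similarity between expression profiles and a global probabilistic measure of how interesting a pattern is, without specifying either. The Python sketch below illustrates one plausible reading under stated assumptions; it is not the authors' method. Pearson correlation stands in for the local measure, Haberman's adjusted residual on a contingency table of discretized expression levels stands in for the global measure, and a hypothetical `overlapping_memberships` step lets a gene join every cluster it correlates with strongly enough. All function names, the "low"/"high" level labels, the thresholds (1.96, 0.7) and the toy data are illustrative assumptions.

```python
import math
from collections.abc import Iterable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Local measure (assumed): Pearson correlation of two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has undefined correlation; treat as unrelated
    return sxy / math.sqrt(sxx * syy)


def adjusted_residual(table: list[list[int]], i: int, j: int) -> float:
    """Global measure (assumed): Haberman's adjusted residual of cell (i, j)
    in a contingency table of counts; roughly N(0, 1) under independence."""
    n = sum(sum(row) for row in table)
    row_i = sum(table[i])
    col_j = sum(row[j] for row in table)
    expected = row_i * col_j / n
    v = (1 - row_i / n) * (1 - col_j / n)  # variance of the standardized residual
    if expected == 0 or v <= 0:
        return 0.0  # degenerate margins: no evidence either way
    z = (table[i][j] - expected) / math.sqrt(expected)  # standardized residual
    return z / math.sqrt(v)


def interesting_levels(levels: list[list[str]], members: set[int],
                       condition: int, level_names: list[str],
                       z_crit: float = 1.96) -> list[str]:
    """Hypothetical helper: discretized levels over-represented among the
    cluster's members at one condition (adjusted residual > z_crit)."""
    # Row 0 counts cluster members, row 1 counts all other genes.
    table = [[0] * len(level_names) for _ in range(2)]
    for g, profile in enumerate(levels):
        row = 0 if g in members else 1
        table[row][level_names.index(profile[condition])] += 1
    return [name for j, name in enumerate(level_names)
            if adjusted_residual(table, 0, j) > z_crit]


def overlapping_memberships(profiles: list[list[float]],
                            clusters: list[Iterable[int]],
                            threshold: float = 0.7) -> list[list[int]]:
    """Hypothetical re-assignment: each gene joins every cluster whose members
    it correlates with, on average, at or above `threshold`, so one gene can
    end up in several clusters."""
    result = []
    for x in profiles:
        joined = []
        for k, members in enumerate(clusters):
            sims = [pearson(x, profiles[m]) for m in members]
            if sims and sum(sims) / len(sims) >= threshold:
                joined.append(k)
        result.append(joined)
    return result


# Toy usage with made-up data.
levels = [["high"], ["high"], ["high"], ["low"], ["low"], ["low"]]
print(interesting_levels(levels, {0, 1, 2}, 0, ["low", "high"]))  # ['high']

profiles = [[1, 2, 3, 4], [1, 2, 1, 2], [1, 2, 2, 3]]
print(overlapping_memberships(profiles, [[0], [1]]))  # [[0], [1], [0, 1]]
```

In the toy run, the third gene lands in both clusters, which is the kind of overlap the abstract motivates. The cut-off 1.96 is the usual two-sided 5% critical value of the standard normal, used here because the adjusted residual is approximately standard normal when cluster membership and expression level are independent. Under this reading, a condition at which no level is flagged could be treated as irrelevant to the cluster, which loosely mirrors the abstract's distinction between relevant and irrelevant expression data.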

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
Patrick C. H. Ma & Keith C. C. Chan

Editor information

Elena Marchiori, Jason H. Moore

About this paper

Cite this paper

Ma, P.C.H., Chan, K.C.C. (2008). Mining Gene Expression Patterns for the Discovery of Overlapping Clusters. In: Marchiori, E., Moore, J.H. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2008. Lecture Notes in Computer Science, vol 4973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78757-0_11

DOI: https://doi.org/10.1007/978-3-540-78757-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78756-3
Online ISBN: 978-3-540-78757-0
eBook Packages: Computer Science, Computer Science (R0)
