Abstract
In contrast to conventional clustering algorithms, where a single dataset is used to produce a clustering solution, we introduce herein a MapReduce approach for clustering datasets generated in multiple-experiment settings. It is inspired by the map and reduce functions commonly used in functional programming and consists of two distinct phases. First, the selected clustering algorithm is applied (mapped) to each experiment separately. This produces a list of different clustering solutions, one per experiment. These are then transformed (reduced) into a single clustering solution by partitioning the cluster centers. The obtained partition is not disjoint in terms of the participating genes, that is, a gene may appear in more than one cluster, and it is further analyzed and refined by applying formal concept analysis.
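The two phases described above can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes plain k-means with Euclidean distance as the selected clustering algorithm, expression profiles of equal length in every experiment, and toy data invented for the example. The names `kmeans`, `map_phase`, and `reduce_phase` are hypothetical, and the formal concept analysis refinement is not shown.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return centers, labels


def map_phase(experiments, k):
    """Map: cluster each experiment separately, giving one solution per experiment."""
    solutions = []
    for exp in experiments:
        genes = sorted(exp)
        centers, labels = kmeans([exp[g] for g in genes], k)
        solutions.append((centers, dict(zip(genes, labels))))
    return solutions


def reduce_phase(solutions, k):
    """Reduce: partition the pooled cluster centers, then carry the genes along."""
    pooled = [c for centers, _ in solutions for c in centers]
    _, center_labels = kmeans(pooled, k)
    final = defaultdict(set)
    offset = 0
    for centers, gene_labels in solutions:
        for gene, local in gene_labels.items():
            final[center_labels[offset + local]].add(gene)
        offset += len(centers)
    return dict(final)


# Toy data (invented): g4 rises in the first experiment and falls in the second.
experiments = [
    {"g1": [0, 1, 2], "g2": [0, 1.1, 2.1], "g3": [2, 1, 0], "g4": [0.1, 1, 2]},
    {"g1": [0, 1, 2.2], "g2": [0.1, 1, 2], "g3": [2, 1, 0.1], "g4": [2.1, 1, 0]},
]
clusters = reduce_phase(map_phase(experiments, k=2), k=2)
print(sorted(sorted(genes) for genes in clusters.values()))
```

In this toy run, gene `g4` behaves differently in the two experiments, so it ends up in both final clusters. This is the kind of overlap the abstract refers to when it says the obtained partition is not disjoint.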
Abbreviations
- BiNGO: biological networks gene ontology
- DNA: deoxyribonucleic acid
- DTW: dynamic time warping
- EM: expectation-maximization
- FCA: formal concept analysis
- GO-id: GO category identification
- GO: gene ontology
- SI: silhouette index
Copyright information
© 2014 Springer-Verlag
About this chapter
Cite this chapter
Boeva, V., Tsiporkova, E., Kostadinova, E. (2014). Analysis of Multiple DNA Microarray Datasets. In: Kasabov, N. (eds) Springer Handbook of Bio-/Neuroinformatics. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_14
DOI: https://doi.org/10.1007/978-3-642-30574-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30573-3
Online ISBN: 978-3-642-30574-0
eBook Packages: Engineering, Engineering (R0)