Analysis of Multiple DNA Microarray Datasets

Boeva, Veselka; Tsiporkova, Elena; Kostadinova, Elena

doi:10.1007/978-3-642-30574-0_14

Part of the book series: Springer Handbooks (SHB)

Abstract

In contrast to conventional clustering algorithms, where a single dataset is used to produce a clustering solution, we introduce herein a MapReduce approach for clustering datasets generated in multiple-experiment settings. It is inspired by the map and reduce functions commonly used in functional programming and consists of two distinct phases. Initially, the selected clustering algorithm is applied (mapped) to each experiment separately. This produces a list of different clustering solutions, one per experiment. These are further transformed (reduced) by partitioning the cluster centers into a single clustering solution. The obtained partition is not disjoint in terms of the participating genes, that is, a gene may belong to more than one cluster, and it is further analyzed and refined by applying formal concept analysis.
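The two phases described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: it assumes k-means as the selected clustering algorithm, expression profiles of equal length in every experiment, and Euclidean distance, and the names `kmeans`, `map_phase`, `reduce_phase`, and the toy data are hypothetical. The formal concept analysis refinement is not shown.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster lost all of its members.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels


def map_phase(experiments, k):
    """Cluster each experiment separately: one solution per experiment."""
    solutions = []
    for exp in experiments:
        genes = list(exp)
        centers, labels = kmeans([exp[g] for g in genes], k)
        solutions.append((centers, dict(zip(genes, labels))))
    return solutions


def reduce_phase(solutions, k):
    """Partition the pooled cluster centers; genes follow their centers."""
    pooled = [c for centers, _ in solutions for c in centers]
    owner = [(e, j) for e, (centers, _) in enumerate(solutions)
             for j in range(len(centers))]
    _, center_labels = kmeans(pooled, k)
    final_of = dict(zip(owner, center_labels))
    clusters = [set() for _ in range(k)]
    for e, (_, gene_labels) in enumerate(solutions):
        for gene, j in gene_labels.items():
            clusters[final_of[(e, j)]].add(gene)
    return clusters


# Toy data (made up): gene g3 behaves differently in the two experiments.
experiments = [
    {"g1": [0, 0], "g2": [0, 1], "g3": [10, 10], "g4": [10, 11]},
    {"g1": [0, 0.5], "g2": [1, 0], "g3": [0.5, 0.5], "g4": [10, 10]},
]
clusters = reduce_phase(map_phase(experiments, k=2), k=2)
print([sorted(c) for c in clusters])
```

In this toy run, g3 ends up in both final clusters because its per-experiment centers fall into different groups, which is the kind of overlap the refinement step is meant to resolve.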

Abbreviations

BiNGO: biological networks gene ontology
DNA: deoxyribonucleic acid
DTW: dynamic time warping
EM: expectation-maximization
FCA: formal concept analysis
GO-id: GO category identification
GO: gene ontology
SI: silhouette index

Author information

Authors and Affiliations

Department of Computer Systems and Technologies, Technical University of Sofia, Branch Plovdiv, Str. Tsanko Dyustabanov 25, 4000, Plovdiv, Bulgaria
Veselka Boeva
Department of ICT & Software Engineering, Sirris, The Collective Center for the Belgian Technological Industry, A. Reyerslaan 80, 1030, Brussels, Belgium
Elena Tsiporkova
Department of Computer Systems and Technologies, Technical University of Sofia, Plovdiv Branch, Str. Tsanko Dyustabanov 25, 4000, Plovdiv, Bulgaria
Elena Kostadinova

Corresponding authors

Correspondence to Veselka Boeva, Elena Tsiporkova or Elena Kostadinova.

Editor information

Editors and Affiliations

KEDRI – Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, 120 Mayoral Drive, 1010, Auckland, New Zealand
Nikola Kasabov

About this chapter

Cite this chapter

Boeva, V., Tsiporkova, E., Kostadinova, E. (2014). Analysis of Multiple DNA Microarray Datasets. In: Kasabov, N. (eds) Springer Handbook of Bio-/Neuroinformatics. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_14

DOI: https://doi.org/10.1007/978-3-642-30574-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30573-3
Online ISBN: 978-3-642-30574-0
eBook Packages: Engineering, Engineering (R0)
