Analysis of Multiple DNA Microarray Datasets

Chapter
Part of the Springer Handbooks book series (SHB)

Abstract

In contrast to conventional clustering algorithms, which produce a clustering solution from a single dataset, we introduce a MapReduce approach for clustering datasets generated in multiple-experiment settings. It is inspired by the map and reduce functions commonly used in functional programming and consists of two distinct phases. First, the selected clustering algorithm is applied (mapped) to each experiment separately, which produces a list of clustering solutions, one per experiment. These are then transformed (reduced) into a single clustering solution by partitioning the cluster centers. The resulting partition is not disjoint in terms of the participating genes, and it is further analyzed and refined by applying formal concept analysis.
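The two phases can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes k-means with Euclidean distance as the selected clustering algorithm, expression profiles of equal length in every experiment, and a simplistic seeding with the first k points. The function names (`assign`, `kmeans`, `map_phase`, `reduce_phase`) and the toy data are hypothetical, and the formal concept analysis step is not covered.

```python
import numpy as np


def assign(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means, seeded with the first k points (an assumption)."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()
    for _ in range(iters):
        labels = assign(points, centers)
        # Recompute each center as the mean of its members; keep empty ones.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, assign(points, centers)


def map_phase(experiments, k):
    """Map: cluster each experiment (genes x time points) on its own."""
    return [kmeans(data, k) for data in experiments]


def reduce_phase(solutions, genes, k):
    """Reduce: partition all cluster centers; each final cluster collects the
    genes of the per-experiment clusters whose centers fall into it."""
    all_centers = np.vstack([centers for centers, _ in solutions])
    # Gene members of every per-experiment cluster, in the row order of all_centers.
    members = [
        {g for g, lab in zip(genes, labels) if lab == j}
        for centers, labels in solutions
        for j in range(len(centers))
    ]
    _, center_labels = kmeans(all_centers, k)
    final = [set() for _ in range(k)]
    for member_set, lab in zip(members, center_labels):
        final[lab] |= member_set
    return final


# Toy data (invented for illustration): 4 genes, 3 time points, 2 experiments.
genes = ["g0", "g1", "g2", "g3"]
experiment_a = np.array([[0, 1, 2], [2, 1, 0], [0, 1, 2.2], [2, 1, 0.1]])
experiment_b = np.array([[0, 1, 2], [2, 1, 0], [0.1, 1, 2], [0, 1.1, 2]])

clusters = reduce_phase(map_phase([experiment_a, experiment_b], k=2), genes, k=2)
print(clusters)
```

In this toy example, gene `g3` follows the decreasing profile in the first experiment and the increasing profile in the second, so it ends up in both final clusters. This is the kind of overlap that makes the reduced partition non-disjoint with respect to genes.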

Abbreviations

BiNGO: biological networks gene ontology
DNA: deoxyribonucleic acid
DTW: dynamic time warping
EM: expectation-maximization
FCA: formal concept analysis
GO-id: GO category identification
GO: gene ontology
SI: silhouette index

Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Department of Computer Systems and Technologies, Technical University of Sofia, Branch Plovdiv, Plovdiv, Bulgaria
  2. Department of ICT & Software Engineering, Sirris, The Collective Center for the Belgian Technological Industry, Brussels, Belgium
  3. Department of Computer Systems and Technologies, Technical University of Sofia, Plovdiv Branch, Plovdiv, Bulgaria
