Analysis of Multiple DNA Microarray Datasets

  • Chapter
Springer Handbook of Bio-/Neuroinformatics

Part of the book series: Springer Handbooks (SHB)

Abstract

In contrast to conventional clustering algorithms, which produce a clustering solution from a single dataset, we introduce a MapReduce approach for clustering datasets generated in multiple-experiment settings. The approach is inspired by the map and reduce functions commonly used in functional programming and consists of two distinct phases. First, the selected clustering algorithm is applied (mapped) to each experiment separately, producing a list of clustering solutions, one per experiment. These solutions are then transformed (reduced) into a single clustering solution by partitioning the cluster centers. The resulting partition is not disjoint with respect to the participating genes, so it is further analyzed and refined by applying formal concept analysis.
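The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the choice of k-means as the base algorithm, the farthest-point seeding, the Euclidean distance, the assumption that all experiments measure the same genes at the same number of time points, and every function name (`kmeans`, `map_phase`, `reduce_phase`) are assumptions made only for illustration. The formal concept analysis refinement is not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding.

    Illustrative stand-in for "the selected clustering algorithm".
    """
    X = np.asarray(X, dtype=float)
    seeds = [0]
    while len(seeds) < k:
        # Squared distance from each row to its nearest chosen seed.
        d = ((X[:, None, :] - X[seeds][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        seeds.append(int(d.argmax()))
    centers = X[seeds].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Final assignment against the final centers.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def map_phase(experiments, k):
    """Map: cluster each experiment (genes x time points) separately."""
    return [kmeans(X, k) for X in experiments]


def reduce_phase(solutions, k):
    """Reduce: partition all cluster centers into k groups.

    Each final cluster is the union of the genes of the per-experiment
    clusters whose centers fall into the same group, so a gene may
    appear in more than one final cluster.
    """
    all_centers = np.vstack([centers for centers, _ in solutions])
    _, center_group = kmeans(all_centers, k)
    clusters = [set() for _ in range(k)]
    idx = 0
    for centers, labels in solutions:
        for j in range(len(centers)):
            genes = np.flatnonzero(labels == j)
            clusters[center_group[idx]].update(int(g) for g in genes)
            idx += 1
    return clusters


# Toy data (hypothetical): 6 genes, 3 time points, 2 experiments.
# Gene 2 is low in the first experiment and high in the second.
exp1 = [[0, 0, 0], [0.1, 0, 0.1], [0, 0.1, 0],
        [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]]
exp2 = [[0, 0, 0], [0.1, 0.1, 0], [5, 5, 5.1],
        [5, 5, 5], [5.1, 5, 5.1], [5, 5.1, 5]]

solutions = map_phase([exp1, exp2], k=2)
clusters = reduce_phase(solutions, k=2)
print([sorted(c) for c in clusters])
```

In this toy example gene 2 behaves differently across the two experiments and therefore lands in both final clusters, which is the kind of overlap the abstract says is subsequently refined.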

Abbreviations

BiNGO: biological networks gene ontology

DNA: deoxyribonucleic acid

DTW: dynamic time warping

EM: expectation-maximization

FCA: formal concept analysis

GO-id: GO category identification

GO: gene ontology

SI: silhouette index

Author information

Correspondence to Veselka Boeva, Elena Tsiporkova, or Elena Kostadinova.

Copyright information

© 2014 Springer-Verlag

About this chapter

Cite this chapter

Boeva, V., Tsiporkova, E., Kostadinova, E. (2014). Analysis of Multiple DNA Microarray Datasets. In: Kasabov, N. (eds) Springer Handbook of Bio-/Neuroinformatics. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_14

  • DOI: https://doi.org/10.1007/978-3-642-30574-0_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30573-3

  • Online ISBN: 978-3-642-30574-0

  • eBook Packages: Engineering, Engineering (R0)
