Abstract
Following the invention of microarrays in 1994, the development and applications of this technology have grown exponentially. The numerous applications of microarray technology include clinical diagnosis and treatment, drug design and discovery, tumour detection, and environmental health research. One of the key issues in the experimental approaches utilising microarrays is to extract quantitative information from the spots, which represent genes in a given experiment. For this process, the initial stages are important and they influence future steps in the analysis. Identifying the spots and separating the background from the foreground is a fundamental problem in DNA microarray data analysis. In this review, we present an overview of state-of-the-art methods for microarray image segmentation. We discuss the foundations of the circle-shaped approach, adaptive shape segmentation, histogram-based methods and the recently introduced clustering-based techniques. We analytically show that clustering-based techniques are equivalent to the one-dimensional, standard k-means clustering algorithm that utilises the Euclidean distance.
Acknowledgements
The authors would like to thank the referees, whose efforts substantially improved the quality of the paper. This research work has been partially supported by NSERC (Natural Sciences and Engineering Research Council of Canada), CFI (Canada Foundation for Innovation) and OIT (Ontario Innovation Trust).
The authors have provided no information on conflicts of interest directly relevant to the content of this article.
Appendix 1
Refer to figure A1.
Theorem 1: Let $D = \{x_1, \ldots, x_n\}$ be a data set that has to be clustered into two classes. If $n \to \infty$, KSCMIS produces the same results as the k-means algorithm, where the Euclidean distance is used.
Proof: Let $n = N_1 + N_2$. Since empty clusters are not allowed, $n \to \infty$ implies that $N_1, N_2 \to \infty$. We can then write the asymptotic behaviour of equation 4 as follows (equation 6):
Additionally, it is straightforward to see that, in the 1-D Euclidean space, equation 6 is equivalent to $(x_i - \mu_2)^2 > (x_i - \mu_1)^2$. Clearly, this is the criterion used by the standard k-means algorithm, and thus the result follows.
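To make the criterion in the proof concrete, the following is a minimal sketch of standard two-cluster k-means on one-dimensional intensities. It is not KSCMIS itself, whose update rule (equation 4) is not reproduced here. The function names `assign_to_first` and `kmeans_1d_two_clusters`, the min/max initialisation and the sample pixel values are assumptions made for illustration only.

```python
import numpy as np

def assign_to_first(x, mu1, mu2):
    # x_i goes to cluster 1 when it is closer to mu1 than to mu2
    # in squared Euclidean distance: (x_i - mu2)^2 > (x_i - mu1)^2.
    return (x - mu2) ** 2 > (x - mu1) ** 2

def kmeans_1d_two_clusters(intensities, max_iter=100):
    """Hypothetical helper: standard k-means with k = 2 on 1-D data."""
    x = np.asarray(intensities, dtype=float)
    # Assumed initialisation: start the means at the extremes so that
    # neither cluster is empty on non-constant input.
    mu1, mu2 = x.min(), x.max()
    for _ in range(max_iter):
        in_first = assign_to_first(x, mu1, mu2)
        if in_first.all() or not in_first.any():
            break  # degenerate input, e.g. all values equal
        new_mu1, new_mu2 = x[in_first].mean(), x[~in_first].mean()
        if new_mu1 == mu1 and new_mu2 == mu2:
            break  # means unchanged, so the assignment is stable
        mu1, mu2 = new_mu1, new_mu2
    return assign_to_first(x, mu1, mu2), mu1, mu2

# Illustrative pixel intensities: four dim (background-like) values
# and three bright (spot-like) values.
pixels = [10, 12, 11, 9, 200, 210, 205]
labels, mu1, mu2 = kmeans_1d_two_clusters(pixels)
```

In this sketch, the pixels for which `labels` is true form the low-intensity cluster and the remaining pixels form the high-intensity cluster, which loosely mirrors the background/foreground separation the review is concerned with.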
Qin, L., Rueda, L., Ali, A. et al. Spot Detection and Image Segmentation in DNA Microarray Data. Appl-Bioinformatics 4, 1–11 (2005). https://doi.org/10.2165/00822942-200504010-00001
DOI: https://doi.org/10.2165/00822942-200504010-00001