Pattern Analysis and Applications, Volume 16, Issue 3, pp 307–319

Spot defects detection in cDNA microarray images

  • Mónica G. Larese
  • Pablo M. Granitto
  • Juan C. Gómez
Theoretical Advances


Bad quality spots should be filtered out at early steps in microarray analysis to avoid noisy data. In this paper we implement quality control of individual spots from real microarray images. First of all, we consider the binary classification problem of detecting bad quality spots. We propose the use of ensemble algorithms to perform detection and obtain improved accuracies over previous studies in the literature. Next, we analyze the untackled problem of identifying specific spot defects. One spot may have several faults simultaneously (or none of them) yielding a multi-label classification problem. We propose several extra features in addition to those used for binary classification, and we use three different methods to perform the classification task: five independent binary classifiers, the recent Convex Multi-task Feature Learning (CMFL) algorithm and Convex Multi-task Independent Learning (CMIL). We analyze the Hamming loss and areas under the receiver operating characteristic (ROC) curves to quantify the accuracies of the methods. We find that the three strategies achieve similar results leading to a successful identification of particular defects. Also, using a Random forest-based analysis we show that the newly introduced features are highly relevant for this task.
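As an illustration of the two evaluation measures named in the abstract, the following minimal sketch computes the Hamming loss of a multi-label prediction and the area under the ROC curve for a single label. It is not the authors' implementation: the function names, the toy data, and the choice of the pairwise (Mann–Whitney) formulation of the AUC are assumptions made here for illustration only. The five columns stand for five defect labels whose names are not given in this section.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (spot, label) pairs that are predicted incorrectly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_labels = len(y_true[0])
    errors = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != n_labels or len(pred) != n_labels:
            raise ValueError("every row must have the same number of labels")
        # Count the labels on which truth and prediction disagree.
        errors += sum(t != p for t, p in zip(truth, pred))
    return errors / (len(y_true) * n_labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one label: probability that a random positive spot
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 spots, 5 defect labels per spot.
# A row of zeros is a spot with none of the defects.
y_true = [[1, 0, 0, 1, 0],
          [0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 1, 1, 0, 1]]

loss = hamming_loss(y_true, y_pred)               # 2 wrong out of 15 pairs
auc = roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1])
print(f"Hamming loss: {loss:.4f}")
print(f"AUC for one label: {auc:.2f}")
```

The Hamming loss averages errors over all spot-label pairs, so a spot with several simultaneous defects contributes one term per label. The AUC is computed per label from real-valued classifier scores, which matches the setting of one binary classifier per defect.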


Microarray images; Quality control; Defects classification; Ensemble classifiers; Convex multi-task learning; Pattern recognition



This work was partially supported by ANPCyT through grants PICT 0237 and 2226.



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Mónica G. Larese (1)
  • Pablo M. Granitto (1)
  • Juan C. Gómez (2)
  1. CIFASIS, French Argentine International Center for Information and Systems Sciences, UPCAM (France)/UNR-CONICET (Argentina), Rosario, Argentina
  2. Laboratory for System Dynamics and Signal Processing, FCEIA, Univ. Nacional de Rosario, Rosario, Argentina
