Computational Statistics, Volume 26, Issue 2, pp 259–277

Empirical study for the agreement between statistical methods in quality assessment and control of microarray data

  • Markus Schmidberger
  • Esmeralda Vicedo
  • Ulrich Mansmann
Original Paper

Abstract

Because microarray data quality can affect every step of the microarray analysis process, quality assessment and control are an integral part of it. They detect divergent measurements beyond the acceptable level of random fluctuations. This empirical study identifies association and correlation between the six quality assessment methods for microarray outlier detection used in the arrayQualityMetrics package, version 2.2.2. The evaluation uses two agreement tests (Cohen’s Kappa, applied after a marginal homogeneity criterion, and the AC1 statistic), the Pearson correlation coefficient, and realistic microarray data from the public ArrayExpress database. The results indicate that four of the six currently proposed statistical methods are sufficient to comprehensively quantify the quality information in large series of microarrays. This saves computation time and reduces decision complexity for the analyst. The newly proposed rule is validated with data sets from biomedical studies.
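
To illustrate the two agreement measures named in the abstract, the following is a minimal sketch that computes Cohen's Kappa and Gwet's AC1 for two outlier detection methods. It assumes that each method produces one binary flag per array (1 = outlier, 0 = not an outlier). The function name `agreement_stats` and the example flags are hypothetical; they are not taken from the paper, its supplementary R scripts, or the arrayQualityMetrics package.

```python
import math


def agreement_stats(a, b):
    """Return (kappa, ac1) for two equal-length sequences of 0/1 outlier flags.

    Hypothetical helper for illustration only; it is not part of
    arrayQualityMetrics or of the paper's supplementary material.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both methods must rate the same non-empty set of arrays")
    n = len(a)

    # Observed agreement: fraction of arrays on which both methods agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n

    # Marginal proportion of arrays flagged as outliers by each method.
    p_a = sum(a) / n
    p_b = sum(b) / n

    # Cohen's Kappa: chance agreement from the product of the marginals.
    pe_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Kappa is undefined when both methods put every array in the same class.
    kappa = math.nan if pe_kappa == 1 else (p_o - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1 (two categories): chance agreement from the mean prevalence.
    pi = (p_a + p_b) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)

    return kappa, ac1


# Made-up flags for 20 arrays: each method flags two arrays, one in common.
method_a = [1, 1] + [0] * 18
method_b = [1, 0, 1] + [0] * 17
kappa, ac1 = agreement_stats(method_a, method_b)
print(round(kappa, 3), round(ac1, 3))
```

With these made-up flags the two methods agree on 18 of 20 arrays, yet Kappa is about 0.44 while AC1 is about 0.88. Outliers are rare, so Kappa's chance-agreement term is large. This sensitivity to trait prevalence is one reason a study might report both statistics.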

Keywords

Empirical study; Microarray; Quality assessment and control; Bioconductor

Supplementary material

180_2010_216_MOESM1_ESM.pdf (617 kb)
ESM 1 (PDF 616 kb)
180_2010_216_MOESM2_ESM.pdf (612 kb)
ESM 2 (PDF 611 kb)
180_2010_216_MOESM3_ESM.pdf (533 kb)
ESM 3 (PDF 533 kb)
180_2010_216_MOESM4_ESM.pdf (599 kb)
ESM 4 (PDF 598 kb)
180_2010_216_MOESM5_ESM.pdf (624 kb)
ESM 5 (PDF 623 kb)
180_2010_216_MOESM6_ESM.pdf (575 kb)
ESM 6 (PDF 574 kb)
180_2010_216_MOESM7_ESM.html (191 kb)
ESM 7 (HTML 191 kb)
180_2010_216_MOESM8_ESM.html (96 kb)
ESM 8 (HTML 96.2 kb)
180_2010_216_MOESM9_ESM.html (166 kb)
ESM 9 (HTML 165 kb)
180_2010_216_MOESM10_ESM.html (140 kb)
ESM 10 (HTML 139 kb)
180_2010_216_MOESM11_ESM.html (249 kb)
ESM 11 (HTML 248 kb)
180_2010_216_MOESM12_ESM.html (175 kb)
ESM 12 (HTML 174 kb)
180_2010_216_MOESM13_ESM.r (1 kb)
ESM 13 (R 1.28 kb)
180_2010_216_MOESM14_ESM.r (6 kb)
ESM 14 (R 5.63 kb)
180_2010_216_MOESM15_ESM.r (23 kb)
ESM 15 (R 23.1 kb)
180_2010_216_MOESM16_ESM.txt (1 kb)
ESM 16 (TXT 753 kb)

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Markus Schmidberger (1)
  • Esmeralda Vicedo (1)
  • Ulrich Mansmann (1)
  1. Division of Biometrics and Bioinformatics, IBE, University of Munich, Munich, Germany
