At What Scale Should Microarray Data Be Analyzed?
Introduction: The hybridization intensities derived from microarray experiments, for example Affymetrix’s MAS5 signals, are very often transformed before statistical models are fitted. The transformation is usually motivated by the need to satisfy model assumptions such as normality and homogeneity of variance. Broadly, two strategies are applied to microarray data, depending on the analysis need: correlation analysis, where all the gene intensities on the array are considered simultaneously, and gene-by-gene ANOVA, where each gene is analyzed individually.
Aim: We investigate the distributional properties of the Affymetrix GeneChip® signal data under the two scenarios, focusing on the impact of analyzing the data at an inappropriate scale.
Methods: For the strategy of pooling genes, the Box-Cox family of transformations is investigated first, with the commonly used log transformation applied for comparison. For the scenario where analysis is on a gene-by-gene basis, model assumptions such as normality are explored. The impact of using a wrong scale is illustrated with the log transformation and the quartic-root transformation.
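As a reminder of what the Box-Cox family looks like, the following is a minimal sketch of the standard one-parameter form, in which the log transformation is the special case λ = 0 and the quartic root corresponds to λ = 0.25 up to a shift and rescaling. The function name `box_cox` and the signal values are illustrative assumptions, not taken from the study.

```python
import math


def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform of a positive value x.

    lam == 0 gives the natural log; otherwise (x**lam - 1) / lam.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires a positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Made-up intensities, for illustration only (not data from the paper).
signals = [50.0, 200.0, 1000.0, 5000.0]

log_scale = [box_cox(s, 0.0) for s in signals]       # log transformation
quartic_scale = [box_cox(s, 0.25) for s in signals]  # quartic-root scale
```

Subtracting 1 and dividing by λ does not change the shape of the transformed data; it only makes the family continuous in λ, so that the log appears as the limit when λ approaches 0.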
Results: When all the genes on the array are considered together, the dependence of the variation on the expression level can be satisfactorily removed by a Box-Cox transformation. When genes are analyzed individually, the distributional properties of the intensities are shown to be gene dependent. Derivation and simulation show that some loss of power is incurred when a wrong scale is used but, because of the robustness of the t-test, the loss is acceptable when the fold-change is not very large.
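The kind of simulation described here can be sketched as follows. This is not the authors' simulation design: it assumes, purely for illustration, that one gene's intensities are normal on the log scale, with 5 replicates per group and a hypothetical log-scale standard deviation of 0.3. It then compares the rejection rate of a two-sample t-test applied on the log scale (the right scale under this assumption) and on the quartic-root scale (the wrong one).

```python
import math
import random
import statistics

N_PER_GROUP = 5     # assumed replicates per group (hypothetical)
T_CRIT_DF8 = 2.306  # two-sided 5% critical value of Student's t, 8 df


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for equal group sizes."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(2.0 * pooled_var / n)


def estimate_power(fold_change, transform, n_sim=2000, sigma=0.3, seed=1):
    """Monte Carlo rejection rate of the t-test after applying `transform`.

    Data are simulated as log-normal, so the log scale is the correct one.
    """
    rng = random.Random(seed)
    shift = math.log(fold_change)
    rejections = 0
    for _ in range(n_sim):
        control = [math.exp(rng.gauss(6.0, sigma)) for _ in range(N_PER_GROUP)]
        treated = [math.exp(rng.gauss(6.0 + shift, sigma)) for _ in range(N_PER_GROUP)]
        t = t_statistic([transform(v) for v in control],
                        [transform(v) for v in treated])
        if abs(t) > T_CRIT_DF8:
            rejections += 1
    return rejections / n_sim


power_log = estimate_power(2.0, math.log)                 # correct scale
power_quartic = estimate_power(2.0, lambda v: v ** 0.25)  # wrong scale
print(f"log scale: {power_log:.3f}, quartic-root scale: {power_quartic:.3f}")
```

Because both calls use the same seed, the two scales are compared on identical simulated data sets. Varying `fold_change` and `sigma` gives a rough feel for how the gap between the two rejection rates depends on the size of the effect.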
Keywords: Microarray Data; Normality Assumption; Hybridization Intensity; Wrong Scale; Nest Random Effect
Acknowledgments: We would like to express our thanks to Brian Eastwood and Phillip Iversen for various helpful consultations and discussions. We would like to thank Ray Carroll for giving very creative suggestions on several parts. We would also like to thank Faming Zhang and Jude Onyia for valuable comments and suggestions. The authors have provided no information on sources of funding or on conflicts of interest directly relevant to the content of this study.