Evaluation of Current Methods of Testing Differential Gene Expression and Beyond
One frequent question in the study of microarrays concerns the number of replicates required to obtain valid data. We used the T-matrix data from the NCI-60 cancer cell lines dataset to investigate this question. Five testing methods were evaluated. We selected two cancer group comparisons, ovarian (OV) vs. breast (BR) and leukemias (LE) vs. renal carcinoma (RE), and performed hypothesis tests to detect the genes expressed differentially between the groups. Our goal was to examine the pattern and performance of each testing method and the required sample size. The first four methods are t-test based and differ in how the variance is estimated, using the sample variance, the pooled variance, or a common variance. The fifth is a permutation test based on the t-test with pooled variance. Our results show that more genes differ significantly in expression in the LE vs. RE comparison than in the OV vs. BR comparison. The permutation test performed similarly to the t-test itself. Overall, the pooled variance approach proved the better strategy. As expected, for a given testing method the number of significant genes increased with the number of cell lines. However, the results derived from 3 cell lines were very different from the other results. This may imply that more than three cell lines or replicates are needed in a microarray study to attain enough power to detect differential gene expression.
Key words: microarray, replicates, t-test, permutation test, sample size, power
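
The abstract names a t-test with pooled variance and a permutation test built on it, without giving formulas. The following is a minimal sketch of how such a pair of tests could be computed for a single gene, assuming the standard pooled-variance t statistic and a Monte Carlo permutation of group labels. The function names `pooled_t` and `permutation_pvalue`, the number of permutations, and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the pooled t statistic.

    n_perm and seed are illustrative defaults (assumptions).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(pooled_t(x, y))
    combined = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the combined values.
        perm = rng.permutation(combined)
        if abs(pooled_t(perm[:x.size], perm[x.size:])) >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)


# Toy expression values for one gene in two groups of 3 cell lines (made up).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t_stat = pooled_t(group_a, group_b)
p_perm = permutation_pvalue(group_a, group_b)
```

With 3 cell lines per group there are only 20 distinct ways to split the six values, so an exact two-sided permutation p-value cannot fall below 2/20 = 0.1 even for perfectly separated groups. This is one way a very small sample can limit what a permutation test is able to detect.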