Predicting features of breast cancer with gene expression patterns
Data from gene expression arrays hold an enormous amount of biological information. We sought to determine whether global gene expression in primary breast cancers contains information about biologic, histologic, and anatomic features of the disease in individual patients.

Microarray data from the tumors of 129 patients were analyzed for the ability to predict biomarkers [estrogen receptor (ER) and HER2], histologic features [grade and lymphatic-vascular invasion (LVI)], and stage parameters (tumor size and lymph node metastasis). Multiple statistical predictors were used, and prediction accuracy was measured by the cross-validation error rate; multidimensional scaling (MDS) allowed visualization of the predicted states under study.

Models built from gene expression data accurately predict ER and HER2 status and divide tumor grade into high-grade and low-grade clusters; intermediate-grade tumors are not a unique group. In contrast, gene expression data are poor predictors of tumor size, lymph node status, and LVI. The best model for prediction of nodal status included tumor size, LVI status, and pathologically defined tumor subtype (based on combinations of ER, HER2, and grade); adding microarray-based prediction to this model failed to improve its accuracy.

Global gene expression supports a binary division of ER, HER2, and grade, clearly separating tumors into two categories; intermediate values for these bio-indicators do not define intermediate tumor subsets. The results are consistent with a model of regional metastasis that depends on inherent biologic differences in metastatic propensity between breast cancer subtypes, upon which time and chance then operate.
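As a rough illustration of what a cross-validation error rate measures, the sketch below estimates it for a simple classifier on synthetic data. Everything in it is an assumption made for illustration: the abstract does not name the predictors used, so a nearest-centroid classifier stands in for them, the "expression" values are random numbers rather than microarray data, and the function names (`cv_error_rate`, `make_synthetic`, and so on) are hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Return the mean expression vector (centroid) of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cv_error_rate(X, y, k=5, seed=0):
    """k-fold cross-validation: fraction of held-out samples misclassified."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        errors += sum(nearest_centroid_predict(model, X[i]) != y[i]
                      for i in fold)
    return errors / len(X)


def make_synthetic(n=60, genes=50, informative=5, shift=2.0, seed=1):
    """Synthetic two-class data: only the first few 'genes' carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes, e.g. marker-negative vs. marker-positive
        row = [rng.gauss(0.0, 1.0) for _ in range(genes)]
        for g in range(informative):
            row[g] += shift * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic()
err_signal = cv_error_rate(X, y)

# Shuffling the labels removes the link between expression and class,
# so the error rate should move toward chance.
y_shuffled = y[:]
random.Random(2).shuffle(y_shuffled)
err_noise = cv_error_rate(X, y_shuffled)

print(f"CV error, labels tied to expression: {err_signal:.2f}")
print(f"CV error, shuffled labels:           {err_noise:.2f}")
```

A low held-out error suggests the feature is encoded in the expression profile; an error near that of the shuffled labels suggests it is not, which is the kind of contrast the abstract reports between ER or HER2 status and nodal status.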
Keywords: Breast cancer; Computational molecular biology; Gene expression profiling; Metastasis
Supported by the Breast Cancer Research Foundation (BCRF) and by the Dana-Farber/Harvard SPORE in Breast Cancer from the National Cancer Institute (J.D.I., A.R.); grants ACS-IRG 70-002 and CA23100-22 (X.L.); and NSFC grant 30625012 and the National Basic Research Program (2004CB518605) of China (X.Z.).