Predicting features of breast cancer with gene expression patterns

  • Xuesong Lu
  • Xin Lu
  • Zhigang C. Wang
  • J. Dirk Iglehart
  • Xuegong Zhang
  • Andrea L. Richardson
Preclinical Study


Data from gene expression arrays hold an enormous amount of biological information. We sought to determine whether global gene expression in primary breast cancers contains information about biologic, histologic, and anatomic features of the disease in individual patients. Microarray data from the tumors of 129 patients were analyzed for the ability to predict biomarkers [estrogen receptor (ER) and HER2], histologic features [grade and lymphatic-vascular invasion (LVI)], and stage parameters (tumor size and lymph node metastasis). Multiple statistical predictors were used, and prediction accuracy was assessed by the cross-validation error rate; multidimensional scaling (MDS) allowed visualization of the predicted states under study. Models built from gene expression data accurately predict ER and HER2 status and divide tumor grade into high-grade and low-grade clusters; intermediate-grade tumors are not a unique group. In contrast, gene expression data are inaccurate at predicting tumor size, lymph node status, or LVI. The best model for prediction of nodal status included tumor size, LVI status, and pathologically defined tumor subtype (based on combinations of ER, HER2, and grade); adding microarray-based prediction to this model failed to improve its accuracy. Global gene expression supports a binary division of ER, HER2, and grade, clearly separating tumors into two categories; intermediate values for these bio-indicators do not define intermediate tumor subsets. The results are consistent with a model of regional metastasis that depends on inherent biologic differences in metastatic propensity between breast cancer subtypes, upon which time and chance then operate.
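As a rough illustration of how a cross-validation error rate of the kind mentioned in the abstract can be computed, the sketch below runs k-fold cross-validation on synthetic "expression" data. It is not the study's method: the nearest-centroid classifier is a simple stand-in for the unspecified statistical predictors, the data are simulated, and every function name and parameter (`cv_error_rate`, `make_toy_data`, `shift`, `k`) is hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Return the per-class mean expression vector (centroid)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def cv_error_rate(X, y, k=5, seed=0):
    """Fraction of samples misclassified when each is predicted by a
    model trained only on the other folds (k-fold cross-validation)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        errors += sum(nearest_centroid_predict(model, X[i]) != y[i]
                      for i in fold)
    return errors / len(X)


def make_toy_data(n=60, n_genes=20, shift=2.0, seed=1):
    """Simulated data (an assumption, not patient data): class 1 samples
    have every gene shifted by `shift`; shift=0 means no signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_genes)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(shift=2.0)
    print("informative features:", cv_error_rate(X, y))
    X0, y0 = make_toy_data(shift=0.0)
    print("uninformative features:", cv_error_rate(X0, y0))
```

In this toy setting, a feature set that carries class signal should give a low error rate, while one with no signal should hover near chance. That is the sense in which the abstract contrasts ER and HER2 status with tumor size, lymph node status, and LVI.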


Breast cancer, Computational molecular biology, Gene expression profiling, Metastasis



Supported by the Breast Cancer Research Foundation (BCRF) and by the Dana-Farber/Harvard SPORE in Breast Cancer from the National Cancer Institute (J.D.I., A.R.), grants ACS-IRG 70-002 and CA23100-22 (X.L.), NSFC grant 30625012, and the National Basic Research Program (2004CB518605) of China (X.Z.).



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Xuesong Lu
    • 1
  • Xin Lu
    • 2
    • 3
    • 4
  • Zhigang C. Wang
    • 5
    • 6
  • J. Dirk Iglehart
    • 5
    • 6
  • Xuegong Zhang
    • 1
    Email author
  • Andrea L. Richardson
    • 7
    Email author
  1. Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, China
  2. Department of Biostatistics, Harvard School of Public Health, Boston, USA
  3. Department of Biostatistics, Dana-Farber Cancer Institute, Boston, USA
  4. Department of Family and Preventive Medicine, University of California San Diego, San Diego, USA
  5. Department of Surgery, Brigham and Women’s Hospital, Boston, USA
  6. Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, USA
  7. Department of Pathology, Brigham and Women’s Hospital, Boston, USA
