Introduction

Breast cancer is the most frequently occurring cancer among the gynecological cancers. According to the 2003 vital statistics for the Japanese population, the mortality rate from breast cancer (15.2%) is increasing. There are many prognostic factors for breast cancer, such as lymph node metastasis, hormone receptor status, tumor size, pathological grade, age, vascular invasion, and HER2/neu (Goldhirsch et al. 2005); however, these clinicopathological factors do not enable accurate prediction of outcome.

When prognostic factors are sought using clinical samples, patients under consistent clinical conditions are ideally selected first, and factors related to their prognosis are then searched for. However, clinical conditions, such as the occurrence and sites of recurrence, the choice of therapeutic method, and its effect on recurrence, differ greatly from one patient to another. Consequently, most earlier studies searched for prognostic factors by comparing the prognosis of patients who had actually received different treatments.

In the present study, we focused on breast cancer patients with supraclavicular lymph node metastasis [Sc(+)] in order to avoid – or reduce as much as possible – any influence of differences in clinical condition within the study cohort. Most breast cancer patients diagnosed as Sc(+)-positive also have distant metastasis and, consequently, a poor outcome. On the other hand, the outcome of Sc(+)-positive breast cancer patients without distant metastasis varies from relatively good, such as postoperative relapse in local lymph nodes and bone, to poor, with relapse in the lung, liver, and pleura. However, all Sc(+)-positive patients without distant metastasis are considered to be similar in terms of the distribution of the cancer, and treatments differ little within this patient population. Of 35 patients diagnosed as Sc(+)-positive without distant metastasis at the Cancer Institute Hospital (Tokyo, Japan) in the period 1996–2000, 13 (37.14%; poor outcome group) had a distant metastasis-free survival (DMFS) shorter than 2 years and an overall survival (OS) shorter than 3 years, and 12 (34.29%; good outcome group) had a DMFS longer than 2 years and an OS longer than 5 years. Because cancer development, progression, and metastasis involve complex interactions among many genes, the prognosis differs from one Sc(+)-positive patient to another; consequently, recurrent breast cancer should be treated on an individual basis. Therefore, identifying factors that characterize breast cancer with Sc(+) is very important for choosing an individualized therapy.

DNA microarray analysis is a revolutionary experimental tool for comprehensive gene expression analysis, and gene expression profiling has created new possibilities for the molecular characterization of cancer. Several studies have reported predicting outcomes, such as the likelihood of metastasis, from the gene expression pattern of the primary breast cancer lesion. Perou et al. (2000) carried out microarray analysis of samples from primary breast cancer lesions and lymph node metastases, collected before and after preoperative chemotherapy, and found that the gene expression patterns of samples from the same patient were more similar to each other than to those of other patients. This suggests that individual breast cancers have distinctive 'molecular portraits' shared by the primary lesion and the metastatic lesion, and that the molecular portrait of the primary lesion reflects characteristics of the cancer, such as the likelihood of metastasis.

The aim of the present study was to carry out a highly accurate search for factors predicting the outcome of breast cancer with Sc(+). We therefore used only Sc(+)-positive tissue samples without distant metastasis, although this criterion severely limited the number of samples that could be obtained. We then identified genes and functional pathways related to prognosis in breast cancer accompanied by Sc(+) by gene expression profiling and pathway construction. Clarifying the gene expression patterns related to outcome may indicate the direction to be taken in choosing individual therapeutic strategies.

Materials and methods

Patients and tissue samples

Primary breast cancer samples from 31 patients who underwent surgery at the Department of Breast Surgery, Cancer Institute Hospital, Tokyo, between 1996 and 2002 were obtained and immediately frozen. For patients who received neoadjuvant therapy, core needle biopsy samples were obtained before treatment; for patients without neoadjuvant therapy, tissue samples were excised at surgery. The samples were embedded in O.C.T. compound (Sakura Finetek, Torrance, Calif.) and stored at −80°C. All patients were cytologically or pathologically diagnosed with supraclavicular lymph node metastasis. Patients with a DMFS of longer than 2 years and an OS of longer than 5 years were assigned to the 'good outcome' group (11/31); patients with a DMFS of shorter than 2 years and an OS of shorter than 3 years were assigned to the 'poor outcome' group (12/31); and patients who met neither set of conditions were assigned to the 'intermediate outcome' group (8/31) (Fig. 1). The clinical backgrounds of the patients are shown in Table 1 and Supplementary Table S1. Informed consent was obtained from all patients who provided samples. All procedures were reviewed and approved by the Institutional Review Board of the Cancer Institute Hospital.
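The outcome grouping rule can be written as a small function. This is a minimal sketch assuming DMFS and OS are given in months and that the thresholds in Fig. 1 are strict inequalities, so a patient exactly at a cut-off falls into the intermediate group; the names `Outcome` and `classify_outcome` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    GOOD = "good"
    POOR = "poor"
    INTERMEDIATE = "intermediate"


def classify_outcome(dmfs_months: float, os_months: float) -> Outcome:
    """Assign a patient to an outcome group from observed DMFS and OS (months).

    Thresholds follow Fig. 1: good = DMFS > 24 and OS > 60,
    poor = DMFS < 24 and OS < 36, everything else = intermediate.
    Assumption: inequalities are strict, so DMFS = 24 months is intermediate.
    """
    if dmfs_months > 24 and os_months > 60:
        return Outcome.GOOD
    if dmfs_months < 24 and os_months < 36:
        return Outcome.POOR
    return Outcome.INTERMEDIATE


# Illustrative values only (not patient data):
print(classify_outcome(70, 78).value)  # good
print(classify_outcome(8, 17).value)   # poor
print(classify_outcome(30, 40).value)  # intermediate
```

Note that both conditions must hold for the good and poor groups; a patient with early distant metastasis but OS of 3 years or more (e.g. DMFS 10, OS 40) is intermediate.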

Fig. 1

The selection of patients included in the study. DMFS Distant metastasis-free survival, OS overall survival. Patients with a DMFS of longer than 24 months and an OS of longer than 60 months were included in the ‘good outcome’ group; patients with a DMFS of shorter than 24 months and an OS shorter than 36 months were included in the ‘poor outcome’ group; patients who did not meet these conditions were included in the ‘intermediate outcome’ group

Table 1 Clinical and biological characteristics of 31 breast cancer patients with supraclavicular lymph node metastasis without distant metastasis

RNA preparation and cDNA microarray analysis

Frozen sections (9 μm) were prepared from the samples using a cryostat and stained with hematoxylin-eosin. Tumor cells alone were collected using a PixCell IIe laser capture microdissection system (Arcturus Bioscience, Mountain View, Calif.), and total RNA was extracted from the tumor cells using an RNeasy Micro kit (Qiagen, Valencia, Calif.). As a reference, total RNA was extracted from normal mammary glands in surgical specimens from ten breast cancer patients and pooled. The amount of total RNA was measured using a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, Del.), and RNA quality was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies). Total RNA was amplified twice using a Low RNA Input Linear Amplification/Labeling Kit (Agilent Technologies, Palo Alto, Calif.) and labeled with cyanine 3 (Cy3) and cyanine 5 (Cy5). After the labeling efficiency was confirmed, 20 pmol of Cy5-labeled tumor-derived cRNA and 20 pmol of Cy3-labeled normal tissue-derived cRNA were mixed and hybridized at 40°C for 17 h. Dye-swap hybridization (with Cy3 and Cy5 exchanged) was performed in parallel. The microarray contained 60-mer oligonucleotide probes for 22,575 genes (Agilent Technologies). After the slides were washed, signals were detected using an Agilent microarray scanner (Agilent Scan Control; Agilent Technologies). For data analysis, the fluorescence intensities in the scan images were converted to numerical values using Agilent Feature Extraction software (ver. A.7.5.1). Background was corrected, and the data were normalized using the linear and LOWESS methods.
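The normalization idea can be illustrated with a minimal two-color sketch. It assumes Cy5 = tumor and Cy3 = reference in the forward hybridization, converts intensities to M (log2 ratio) and A (mean log2 intensity), removes intensity-dependent dye bias by subtracting a tricube-weighted local linear (LOWESS) fit of M on A, and averages each dye-swap pair after flipping the sign of the swapped ratio. This is not Agilent Feature Extraction's own implementation; the window fraction `frac=0.3`, the averaging of dye-swap pairs, and all function names are assumptions for illustration.

```python
import math


def ma_values(red, green):
    """Convert Cy5 (red) and Cy3 (green) intensities to M = log2(R/G) and A = 0.5*log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def lowess_normalize(m, a, frac=0.3):
    """Return M minus a LOWESS fit of M on A.

    Each point gets a tricube-weighted local linear regression over its
    nearest neighbours in A; no robustness iterations are performed.
    """
    n = len(a)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local window
    normalized = []
    for i in range(n):
        d = [abs(aj - a[i]) for aj in a]
        h = max(sorted(d)[k - 1], 1e-12)  # window radius: distance to k-th nearest point
        w = [(1 - (dj / h) ** 3) ** 3 if dj < h else 0.0 for dj in d]
        sw = sum(w)  # > 0 because the point itself has distance 0
        xbar = sum(wj * aj for wj, aj in zip(w, a)) / sw
        ybar = sum(wj * mj for wj, mj in zip(w, m)) / sw
        sxx = sum(wj * (aj - xbar) ** 2 for wj, aj in zip(w, a))
        sxy = sum(wj * (aj - xbar) * (mj - ybar) for wj, aj, mj in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        normalized.append(m[i] - (ybar + slope * (a[i] - xbar)))
    return normalized


def combine_dye_swap(m_forward, m_swap):
    """Average a dye-swap pair: the swapped array measures reference/tumor, so its M is negated."""
    return [(f - s) / 2 for f, s in zip(m_forward, m_swap)]
```

A typical call would be `m, a = ma_values(cy5, cy3)` followed by `lowess_normalize(m, a)` for each array, and then `combine_dye_swap` on the forward and swapped results. Fitting M against A, rather than simply rescaling one channel, is what lets LOWESS correct dye bias that changes with spot intensity.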

Real-time RT-PCR analysis

The expression of the six genes selected in this study was quantified by real-time RT-PCR. The templates and primer sets were mixed with 2× QuantiTect SYBR Green PCR Master Mix (Qiagen, Germany), and β-actin was used as a control. Reactions were performed in triplicate in 96-well microtiter plates on an ABI Prism 7900HT (Applied Biosystems, Foster City, Calif.).
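Relative expression from triplicate Ct values normalized to β-actin can be sketched with the comparative Ct method. The text does not state which quantification method was used, so this is an assumption: the sketch presumes roughly 100% amplification efficiency for both the target and β-actin assays (one cycle equals a two-fold change), and the function names and the notion of a calibrator sample are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, actin_ct):
    """Target expression relative to beta-actin: 2 ** -(mean Ct_target - mean Ct_actin).

    Assumes ~100% efficiency for both assays; triplicate Ct values are averaged.
    """
    delta_ct = mean(target_ct) - mean(actin_ct)
    return 2.0 ** (-delta_ct)


def fold_change(target_ct, actin_ct, calib_target_ct, calib_actin_ct):
    """Comparative Ct (2^-ddCt) fold change of a sample relative to a calibrator sample."""
    return (relative_expression(target_ct, actin_ct)
            / relative_expression(calib_target_ct, calib_actin_ct))


# Illustrative Ct values only: target 5 cycles after beta-actin.
print(relative_expression([25, 25, 25], [20, 20, 20]))  # 0.03125
```

Values obtained this way for each patient could then be correlated with the microarray log ratios of the same gene.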

Immunohistochemistry

Paraffin-embedded sections (4 μm) were deparaffinized, hydrated, and stained with a rabbit polyclonal anti-human Stathmin1 antibody (IgG fraction of antiserum; 1:500 dilution; Sigma-Aldrich, St. Louis, Mo.). After incubation with the primary antibody, the slides were treated with DakoCytomation EnVision reagent (DakoCytomation, Ely, UK). Finally, the slides were lightly counterstained with hematoxylin, dehydrated in ethanol, cleared in xylene, and mounted. Stathmin1-positive tumor cells were counted at ×400 magnification, and the staining intensity was graded as 1+ (weak), 2+ (moderate), or 3+ (intense). Each sample was evaluated using the Stathmin1 score, defined as the percentage of Stathmin1-positive tumor cells multiplied by the staining intensity (Shimada et al. 2005; Kouzu et al. 2006). Statistical significance was evaluated by the Jonckheere-Terpstra test.
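The Stathmin1 score is a simple product, and writing it out makes its range explicit: from 0 (no positive cells) to 300 (all tumor cells positive with 3+ staining). The function names and input checks below are illustrative assumptions, not part of the published protocol.

```python
def percent_positive(n_positive: int, n_counted: int) -> float:
    """Percentage of Stathmin1-positive tumor cells among those counted at x400."""
    return 100.0 * n_positive / n_counted


def stathmin1_score(percent: float, intensity: int) -> float:
    """Stathmin1 score = percentage of positive tumor cells x staining intensity (1+ to 3+)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if intensity not in (1, 2, 3):
        raise ValueError("intensity must be 1 (weak), 2 (moderate) or 3 (intense)")
    return percent * intensity


# Illustrative counts only: 120 of 200 tumor cells positive, moderate (2+) staining.
print(stathmin1_score(percent_positive(120, 200), 2))  # 120.0
```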

Statistical and database analysis

Genes with a 2^0.5-fold (approximately 1.41-fold) or greater difference in expression level between the good and poor outcome groups were selected using the Mann-Whitney U test (significance level: 0.05). For genes with a smaller than 2^0.5-fold difference in expression, Pearson's correlation coefficients were calculated to assess their potential relevance. To search for functional pathways, we used the following pathway databases and software: National Center for Biotechnology Information, NCBI (http://www.ncbi.nlm.nih.gov); Biomolecular Interaction Network Database, BIND (http://www.bind.ca/); and MetaCore software provided by GeneGo (San Diego, Calif.).
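The two-filter gene selection can be sketched as follows. Working on log2 ratios (tumor/reference), a 2^0.5-fold difference corresponds to a difference of 0.5 in log2 units. The sketch assumes the difference is measured between group medians and uses a two-sided Mann-Whitney U test with the tie-corrected normal approximation; with 11 versus 12 samples an exact p value (available, for example, in SciPy's `scipy.stats.mannwhitneyu`) may be preferable. All names here are hypothetical.

```python
import math
from statistics import median


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test p value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # midrank shared by tied values
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def select_genes(expr, good_idx, poor_idx, min_log2_diff=0.5, alpha=0.05):
    """Return (gene, median log2 difference poor - good) for genes passing both filters.

    expr maps gene -> log2 ratios (tumor / reference), one value per patient.
    """
    selected = []
    for gene, values in expr.items():
        good = [values[i] for i in good_idx]
        poor = [values[i] for i in poor_idx]
        diff = median(poor) - median(good)
        if abs(diff) >= min_log2_diff and mann_whitney_p(good, poor) < alpha:
            selected.append((gene, diff))
    return selected
```

The sign of the returned difference separates genes up-regulated in the poor outcome group (positive) from down-regulated ones (negative).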

A monotonic trend in Stathmin1 scores across the three outcome groups was examined using the Jonckheere-Terpstra test (Hollander and Wolfe 1999). The following hypotheses were tested, and P < 0.05 was considered significant.

$$ H_{0}: \mu_{\text{good}} = \mu_{\text{intermediate}} = \mu_{\text{poor}} \quad \text{vs.} \quad H_{1}: \mu_{\text{good}} \le \mu_{\text{intermediate}} \le \mu_{\text{poor}} $$

with at least one strict inequality, where μ denotes the median Stathmin1 score in each group.
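The Jonckheere-Terpstra statistic counts, over every pair of groups taken in the hypothesized order, how many observations in the later group exceed those in the earlier one (ties count one half). The sketch below computes it and, as a swapped-in simplification, obtains a one-sided p value by permuting group labels instead of using the large-sample normal approximation. Groups must be passed in the order in which the alternative hypothesis expects the medians to increase; the function names are hypothetical.

```python
import random
from itertools import combinations


def jt_statistic(groups):
    """Jonckheere-Terpstra statistic for groups listed in hypothesized increasing order.

    For every pair of groups (i before j), count pairs (x from i, y from j)
    with x < y; ties contribute 0.5.
    """
    total = 0.0
    for i, j in combinations(range(len(groups)), 2):
        for x in groups[i]:
            for y in groups[j]:
                if x < y:
                    total += 1.0
                elif x == y:
                    total += 0.5
    return total


def jt_permutation_test(groups, n_perm=10000, seed=0):
    """One-sided permutation p value for an increasing trend across the groups."""
    observed = jt_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if jt_statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

For a trend of increasing scores from good to poor outcome, the call would be `jt_permutation_test([good_scores, intermediate_scores, poor_scores])`. Permutation handles the many ties expected in intensity-weighted scores without a separate variance correction.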

Results

Gene selection and hierarchical clustering

The median DMFS was 70 months (range: 61–96 months) in the good outcome group, 8.5 months (range: 1–10 months) in the poor outcome group, and 34 months (range: 12–58 months) in the intermediate group. The median OS was 78 months (range: 61–96 months) in the good outcome group, 17.5 months (range: 10–17 months) in the poor outcome group, and 42 months (range: 35–58 months) in the intermediate group. A total of 321 genes showing a 2^0.5-fold or greater difference in expression between the good and poor outcome groups were selected using the Mann-Whitney U test (significance level: 0.05) (Supplementary Table S2, gene list). Of these, 313 genes were up-regulated and eight were down-regulated in the poor outcome group compared with the good outcome group. Using the expression profiles of these 321 genes, we performed clustering analysis of all 31 patients, including those in the intermediate group, using GeneSpring 7.2 (Fig. 2). Two clusters were formed: the patients in the intermediate outcome group did not form a third cluster but belonged to either the poor or the good outcome cluster. One case from the poor outcome group was misclassified.
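Fig. 2 was produced with GeneSpring 7.2, whose distance metric and linkage settings are not stated. As an assumption, the sketch below clusters patients (not genes) using 1 − Pearson correlation between their expression profiles as the distance and average linkage, merging until two clusters remain. It illustrates the general technique, not GeneSpring's implementation; names are hypothetical, and a profile with zero variance would make the correlation undefined.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage_clusters(profiles, n_clusters=2):
    """Agglomerative clustering of patients with distance = 1 - Pearson r, average linkage.

    profiles[i] is patient i's expression vector over the selected genes.
    Returns a list of clusters, each a list of patient indices.
    """
    n = len(profiles)
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[(min(i, j), max(i, j))]
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters
```

Comparing the resulting clusters with each patient's outcome group is then a direct count of misclassified cases.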

Fig. 2

The expression patterns of the 321 genes differentially expressed between the good and poor outcome groups in all 31 patients, including those in the intermediate outcome group. Each row represents the relative expression level of one gene, and each column represents the prognostic profile of the 321 genes for one tumor. Red and blue represent higher and lower expression, respectively, than the mean expression level of the corresponding gene. Clustering analysis of all 31 cases using the expression profiles of the 321 genes misclassified one case from the poor outcome group. Yellow square Poor outcome cases, red square good outcome cases, green square intermediate cases

Genes associated with prognosis

Molecules that bind to the proteins encoded by the 321 genes differentially expressed between the two groups were searched for in the BIND database. The transcription factors MYC-MAX complex and E2F4 were found to bind to the largest number of these proteins. The MYC-MAX complex promotes the transcription of seven genes (LAMP1, MCM3, RPL15, PSMB5, THEM4, VDAC2, and PYCR1), all of which were differentially expressed between the two groups. The oncogene MYC (c-Myc), which is overexpressed during cell proliferation, forms a complex with MAX and binds to the E-box in the promoters of its target genes (Guo et al. 2000; Patel et al. 2004). The expression of these seven MYC target genes was higher in the poor outcome group than in the good outcome group. In addition, seven genes whose transcription is regulated by E2F4 (BIRC5, CENPF, MCM3, HIST3H, CHCHD3, UNG, and HSPC150) were differentially expressed between the two groups. E2F4 inhibits the transcription of these genes upon transition to the S phase (Sardet et al. 1995), and its expression was up-regulated in the poor outcome group. However, it was difficult to link the genes selected using BIND to a known pathway or to a function determined by such a pathway. Moreover, clustering analysis using only these genes increased the number of misclassified cases (data not shown), suggesting that although the differential expression of some of these genes may play an important role in regulating outcome, that of others may have been detected incidentally. This result underscores the importance of identifying the functional pathways associated with prognosis.

Functional pathways in determining the outcome

MetaCore software contains a database of known functional pathways against which a gene group of interest can be searched. On the basis of a MetaCore analysis of the 321 genes with a 2^0.5-fold or greater difference in expression level, we constructed a pathway that can be used as a tool for determining prognosis (Fig. 3). For the six genes overexpressed in the poor outcome group compared with the good outcome group (DVL1, VDAC2, BIRC5, STMN1, PARP1, and RAD21), we considered possible involvement in the Wnt signal pathway and the mitochondrial apoptosis pathway (Song et al. 2000; Kroemer et al. 2005). DVL1 inhibits the phosphorylation of β-catenin and activates Wnt signaling. β-Catenin promotes the expression of c-Myc and BIRC5, which act downstream in the regulation of mitochondrial apoptosis (He et al. 1998; Zhang et al. 2001). VDAC2 inhibits Bak1, which is activated by Bax in mitochondrial apoptosis (Cheng et al. 2003; Chandra et al. 2005). BIRC5 is an apoptosis inhibitor that inhibits caspase-3 and caspase-7 (Deveraux et al. 1997). STMN1 expression is inhibited by the BH3-only protein Puma, and overexpression of STMN1 inhibits mitochondrial apoptosis (Nakano et al. 2001; Liu et al. 2005). Furthermore, BIRC5 and STMN1 are inhibited by p53 through p21 (Löhr et al. 2003). PARP1 and RAD21 are substrates of caspase-3, and their cleavage induces apoptosis (Wieler et al. 2003; Chen et al. 2002).

Fig. 3

Schematic diagram depicting specific genes and functional pathways determining the prognosis of breast cancer with Sc(+). The Wnt signal pathway and the mitochondrial apoptosis control pathway are shown. Red circular nodes represent genes with a 2^0.5-fold or greater difference in expression level between the good and poor outcome groups. Yellow square nodes represent genes with a less than 2^0.5-fold difference in expression level between the good and poor outcome groups that are candidate prognosis-determining genes based on the results shown in Supplementary Fig. S3. Asterisks indicate genes for which array data were collectable. DVL1 Dishevelled, dsh homolog 1 (Drosophila), VDAC2 voltage-dependent anion channel 2, Bax BCL2-associated X protein, Bak1 BCL2-antagonist/killer 1, Apaf1 apoptotic peptidase activating factor 1, Puma official symbol BBC3, BCL2 binding component 3, DIABLO Diablo homolog (Drosophila), Cyt C cytochrome C, BIRC5 baculoviral IAP repeat-containing 5 (survivin), PARP1 poly(ADP-ribose) polymerase family, member 1, RAD21 RAD21 homolog (Schizosaccharomyces pombe)

To confirm the significance of the constructed functional pathway, we selected additional pathway-related genes from those with a smaller difference in expression, in addition to the six genes with a 2^0.5-fold or greater difference. We examined the correlation of expression between these six genes and the pathway genes that had a smaller difference in expression but detectable array data; genes whose expression correlated with that of the six genes were considered candidate prognosis-determining genes. Supplementary Fig. S3 shows the pairwise correlations between the genes listed along the diagonal. For BIRC5 and PARP1, for example, the panel in the second column from the right and the sixth row from the top is a scatter plot of the expression levels of the two genes in individual patients, and the correlation coefficient is given in the panel at the mirrored position across the diagonal (in the sixth column from the right and the second row from the bottom). Array data were collectable for 13 of the genes constituting the constructed functional pathway (asterisks in Fig. 3); the difference in expression was 2^0.5-fold or greater for six of these and less than 2^0.5-fold for seven. The absolute value of the correlation coefficient was 0.6 or higher for 11 genes: the six genes with a 2^0.5-fold or greater difference (DVL1, VDAC2, BIRC5, STMN1, PARP1, and RAD21; red circular nodes in Fig. 3) and five of the seven genes with a smaller difference (MYC, Bax, Apaf1, Caspase-3, and Caspase-7; yellow squares in Fig. 3). We therefore conclude that these five genes with a smaller difference in expression may also play a role in determining prognosis through these pathways.
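The candidate-gene criterion can be sketched as below. It assumes a gene qualifies when the absolute Pearson correlation between its expression and that of at least one of the six core genes, across the same patients, is 0.6 or higher; the text does not say whether one or all such correlations are required, and the function names and gene labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_candidates(core, others, threshold=0.6):
    """Genes in `others` whose |r| with at least one core gene is >= threshold.

    core and others map gene name -> expression values across the same patients.
    Returns {gene: strongest |r| with any core gene}.
    """
    hits = {}
    for gene, values in others.items():
        best = max(abs(pearson(values, c)) for c in core.values())
        if best >= threshold:
            hits[gene] = best
    return hits
```

Using the absolute value keeps genes that are strongly anti-correlated with a core gene, which matches the "absolute value of the correlation coefficient" criterion in the text.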

Confirmation of the expression levels by real-time RT-PCR

The expression levels of the six genes with a 2^0.5-fold or greater difference in expression (DVL1, VDAC2, BIRC5, STMN1, PARP1, and RAD21) were evaluated by real-time RT-PCR. For all six genes, the expression levels measured by microarray correlated with those measured by real-time RT-PCR (data not shown).

Stathmin1 immunohistochemistry

Only the Stathmin1 (encoded by STMN1) antibody produced usable staining in the paraffin-embedded sections of our samples. Stathmin1 immunohistochemistry also revealed that Stathmin1 protein levels were significantly higher in the poor outcome group than in the good outcome group (Fig. 4a–c), and the Stathmin1 score significantly correlated with the outcome of breast cancer with Sc(+) (Jonckheere-Terpstra test, P < 0.0180; Fig. 4d). This suggests that Stathmin1 immunostaining is a potential method for predicting the outcome of breast cancer patients with Sc(+).

Fig. 4

Immunohistochemistry of Stathmin1 in breast cancers with Sc(+). Panels on the right show the immunohistochemistry of Stathmin1 (STMN1) in Sc(+) breast cancers. Panels on the left show hematoxylin-eosin staining (HE) of the same breast cancers. a Good outcome cases showed weak expression (1+). Magnification ×200. b Intermediate outcome cases showed moderate expression (2+). Magnification ×200. c Poor outcome cases showed intense expression (3+). Magnification ×200. d Stathmin1 scores in good outcome cases (n = 11), poor outcome cases (n = 12), and intermediate cases (n = 8), calculated as described in Materials and methods. Stathmin1 tended to be overexpressed in the poor outcome cases (Jonckheere-Terpstra test, P < 0.0180)

Discussion

Breast cancer is the most common cancer in women, and the outcome is generally poor in Sc(+)-positive cases. Investigation of individual patients in a specific clinical condition, Sc(+)-positive, revealed a group of patients who died of recurrence within 3 years and a group who survived for more than 5 years. Selecting a therapeutic strategy based on the predicted outcome may improve a patient's quality of life (QOL); for this reason alone, a reliable prediction of outcome is important. For studies of prognosis-determining mechanisms using clinical samples, it is desirable that patients be subject to identical clinical conditions, including therapy, but this is very difficult to achieve in practice. In the present study, we focused on supraclavicular lymph node metastasis-positive patients in order to avoid the influence of differing clinical conditions as much as possible. Gene expression analysis with a cDNA microarray identified 321 genes differentially expressed between the good and poor outcome groups; however, clustering analysis using these genes misclassified one case, suggesting that some functionally unrelated genes may have been selected incidentally along with functionally causative genes. To select genes that actually determine prognosis, we identified prognosis-determining functions and their related functional pathways on the basis of database searches and published data. Our investigation of the 321 genes using BIND, a public database of molecular interactions, revealed several gene groups whose expression is controlled by MYC and E2F4. These genes may be involved in determining prognosis; however, it was not possible to assign a specific cellular function to them or to construct a pathway that determines such a function.

Thus, known pathways involving the 321 genes were searched for using MetaCore. Expression differed between the good and poor outcome groups in 11 genes constituting the Wnt signal pathway and the pathway controlling mitochondrial apoptosis. Many researchers have analyzed the relationship between small numbers of genes and cancer malignancy and have established the importance of the Wnt signal pathway and apoptosis in cancer development (Doucas et al. 2004; Parton et al. 2006). However, no clinical test capable of accurately predicting outcome has yet been established. Stathmin1 is a microtubule-destabilizing protein. In human breast cancer, its overexpression decreases sensitivity to paclitaxel (Brattsand et al. 2000; Alli et al. 2002), a finding that is consistent with our results.

The outcome varied among the patients of our study cohort with supraclavicular lymph node metastasis-positive breast cancer, and the differences in outcome were reflected as differences in the gene expression pattern. Although lymph node dissection, radiotherapy, and chemotherapy are common treatments for patients with supraclavicular lymph node metastasis, no consistent viewpoint in terms of therapeutic strategy has yet been established. If chemotherapy is performed in patients with a poor prognosis – despite there being no effective drug available – the end result is only pain, leading to a poor QOL. In contrast, therapy should be actively given to patients with a good prognosis, even though the disease has progressed. Thus, the prediction of the outcome with the aim of selecting the therapeutic strategy is important, and the construction of an accurate prognosis prediction system is needed. Furthermore, the elucidation of the functional pathways that determine the prognosis of not only Sc(+)-positive breast cancer but also of breast cancer at various clinical stages may enable the selection of a treatment that is individually designed for a patient.