Background

Metastatic breast cancer (MBC) is a highly heterogeneous disease, leading to an urgent need for a more personalized treatment approach. For those patients with estrogen receptor (ER)-expressing tumors, endocrine therapy is the mainstay of treatment. Although many patients greatly benefit from such endocrine therapies, approximately 30 % of the MBC patients never respond while virtually all initial responders eventually relapse and develop progressive disease. Numerous factors accounting for resistance to endocrine treatment have been revealed, including loss of ER expression [1–3], overexpression of the HER2 receptor [4], hyperactivation of the phosphatidylinositol 3-kinase (PI3K) pathway [5], and overexpression of Enhancer of Zeste Homolog 2 (EZH2) [6]. Determination of these factors in tumor tissue may therefore contribute to a more personalized treatment approach for individual patients.

Predictive factors contributing to treatment decision making are nowadays most commonly identified in the primary tumors. However, heterogeneity in molecular characteristics between primary tumor and metastases, including clinically relevant factors, is increasingly recognized. For example, differences in ER expression between primary tumor and metastases occur in approximately 20 % of the patients, leading to treatment changes in a substantial number of patients [1, 7, 8]. Since this heterogeneity increases over time and under treatment pressure [7], repeated analyses of the characteristics of metastatic tumor cells are likely to offer better guidance of treatment choices than characterization of the primary tumor. Unfortunately, metastatic tissue is often hard to obtain, and sampling is only possible through invasive procedures.

Circulating tumor cells (CTCs) are tumor cells found in the peripheral blood and are thought to better represent the actual or clinically relevant metastatic tissue burden than the primary tumor does, in particular in those patients whose primary tumors have been removed several years prior to diagnosis of MBC. The CTC count has been shown to be a powerful prognostic factor in MBC, and a rise or decline in CTC count after the first cycle of systemic therapy is an early predictor of outcome [9–12]. Additionally, CTC characterization holds great promise and, for that purpose, several techniques to molecularly characterize CTCs for drug target expression [13–15], mutations [16] and gene expression [17–19] have been developed. CTCs, however, occur in relatively low numbers in patients with MBC and, even after the epithelial cell adhesion molecule (EpCAM)-based enrichment of the CellSearch® system, they need to be identified and characterized amongst approximately a thousand remaining leukocytes [20]. This greatly hinders the interpretation of results from techniques that are not selective for tumor cells, such as quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) on whole lysates. Nevertheless, by focusing on genes that are not, or only at a much lower level, expressed by leukocytes, we have previously shown that the expression levels of 96 genes in CTCs can be quantified in MBC patients through qRT-PCR [18].

In this study, we aim to quantify this panel of 96 genes in CTCs of MBC patients with ER-expressing primary tumors prior to start of first-line therapy with an aromatase inhibitor (AI) in order to identify a CTC predictor discriminating between good and poor responders.

Methods

Ethics statement

This study has been approved by the medical ethics committee of the Erasmus MC Rotterdam, The Netherlands and local Institutional Review Boards (ethics boards of Oncology Center GZA Hospitals Sint-Augustinus, Antwerp, Belgium; Ikazia Hospital, Rotterdam, The Netherlands; Sint Franciscus Gasthuis, Rotterdam) (METC 2006–248 and METC 2009–405). All patients gave their written informed consent.

We adhered to the Reporting Recommendations for Tumor Marker Prognostic Studies wherever possible [21].

Collection of blood samples and characteristics of recruited patient cohort

MBC patients were included between October 2008 and August 2012 in 5 hospitals. From 78 MBC patients who had not previously been treated for MBC, 2 × 7.5 mL blood samples were prospectively drawn prior to start of first-line AI therapy (irrespective of type) for CTC enumeration and isolation. Due to insufficient RNA quality and/or quantity and/or lack of expression of previously described CTC-specific genes [18] (for details see below), 33 (42 %) samples were excluded, leaving 45 patients for further analysis (Additional file 1: Figure S1). Detailed clinicopathological information for these 45 patients is provided in Table 1.

Table 1 Patients and their clinico-pathological characteristics

In order to determine whether the results obtained from this AI-treated patient cohort are of prognostic or predictive nature, we used an independent patient cohort composed of 71 MBC patients who received other types of first-line therapy. Of these, 21 patients were treated with chemotherapy, 40 patients with chemotherapy combined with targeted therapy, and 10 patients with tamoxifen therapy. This patient cohort had been extracted from MBC patients described in our recently published study in which the same techniques for CTC enrichment and gene expression determination were applied [22].

Enumeration of CTCs

In order to isolate CTCs for CTC enumeration, 7.5 mL blood was drawn in CellSave tubes (Veridex™ LLC, Raritan, NJ, USA) and processed on the CellTracks AutoPrep System by using the CellSearch Epithelial Cell Kit (both Veridex LLC). CTC enumeration was performed on the CellTracks Analyzer (Veridex LLC) according to the manufacturer’s instructions and as described previously [23–25].

mRNA isolation from CTCs, qRT-PCR and quantification of gene transcripts

Together with the blood samples for CTC enumeration, another 7.5 mL blood of the same patients was drawn in EDTA tubes. These samples were enriched for CTCs on the CellTracks AutoPrep System using the CellSearch Profile Kit (Veridex LLC). Isolated cells were lysed by adding 250 μL of Qiagen AllPrep DNA/RNA Micro Kit Lysis Buffer (RLT+ lysis buffer) (Qiagen BV, Venlo, The Netherlands) and immediately stored at −80 °C until RNA isolation was performed with the AllPrep DNA/RNA Micro Kit (Qiagen) according to the manufacturer’s instructions and as previously described [18].

The generation of cDNA from isolated RNA from CTCs and subsequent pre-amplification and TaqMan-based PCR analysis were performed as described before [20]. The 96 measured mRNA transcripts have previously been selected and validated based on their clinical relevance and potential CTC-specificity [18, 20].

Reference genes, data normalization, and quality control

The procedure of data normalization and quality control was performed as previously described [18, 20]. In short, relative expression levels were quantified by using the delta Ct method, in which the delta Ct is the difference between the average Ct of the reference genes HMBS, HPRT1, and GUSB and the Ct of the target gene. Samples in which the average Ct of the reference genes fell within the chosen cut-off of 26 Ct were considered of sufficient quality and quantity to be included in the study and were quantified for the levels of the remaining 93 target genes. By the use of this threshold, 5 of our initial 78 CTC samples (6 %) were excluded from further analysis.
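As an illustration of the normalization step described above, the following minimal Python sketch computes delta Ct values for one sample. It rests on two assumptions that are not spelled out in the text: that the delta Ct is taken as the mean reference Ct minus the target Ct (so that higher values mean higher relative expression), and that "within the cut-off" means a mean reference Ct of at most 26. The Ct values are hypothetical and not study data.

```python
from statistics import mean

REFERENCE_GENES = ("HMBS", "HPRT1", "GUSB")
REFERENCE_CT_CUTOFF = 26.0  # cut-off on the mean reference-gene Ct, as given in the text


def mean_reference_ct(ct_values):
    """Average Ct of the three reference genes for one sample."""
    return mean(ct_values[gene] for gene in REFERENCE_GENES)


def passes_quality_control(ct_values, cutoff=REFERENCE_CT_CUTOFF):
    """Assumption: a sample passes when its mean reference Ct is <= cutoff."""
    return mean_reference_ct(ct_values) <= cutoff


def delta_ct(ct_values):
    """Return {target gene: mean reference Ct - target Ct} for one sample.

    Assumed sign convention: a higher value means higher relative expression.
    """
    reference = mean_reference_ct(ct_values)
    return {
        gene: reference - ct
        for gene, ct in ct_values.items()
        if gene not in REFERENCE_GENES
    }


# Hypothetical Ct values for a single sample (illustrative only).
sample = {"HMBS": 24.0, "HPRT1": 25.0, "GUSB": 23.0, "KRT81": 28.0, "TWIST1": 22.5}

if passes_quality_control(sample):
    print(delta_ct(sample))  # {'KRT81': -4.0, 'TWIST1': 1.5}
```

In this toy sample the mean reference Ct is 24.0, so the sample passes the quality check and each target gene is expressed relative to that value.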

Finally, samples were checked for sufficient expression levels of a 12-gene mRNA cluster that has previously been determined as epithelial-specific and associated with the presence of CTCs [18]. Due to lack of sufficient expression of these genes and our aim to generate a CTC-specific predictor, another 28 CTC samples (36 %) were excluded from further analysis.

Statistical analysis

Statistical analyses were done with the STATA statistical package, release 12.0 (STATA Corp., College Station, TX). Primary endpoint was progression-free survival (PFS), defined as the time elapsed between start of first-line treatment with AI and clinical and/or radiological progression or death, whichever came first. Patients who were alive and had not progressed were censored at the last follow-up date, which was at least 9 months after start of first-line therapy. Those patients with progression or death <9 months were considered as poor responders. This 9-month cut-off was chosen based on the median PFS for first-line therapy in MBC patients as reported in the literature [26, 27]. In all 45 eligible patients, a leave-one-out cross-validation (LOOCV) was conducted using the Support Vector Machines (SVM) method within Biometric Research Branch (BRB) ArrayTools (http://linus.nci.nih.gov/BRB-ArrayTools.html) after selecting the top 75 % most variable genes from the 93 genes described above. With this LOOCV method, a gene signature was generated that consisted of the most differentially expressed genes that were identified in the individual predictions and best predicted the left-out sample. A panel of 8 genes was identified that performed best in predicting the poor responding patients. The SVM method proved superior compared to the other prediction algorithms; based on 100 permutations, SVM was the only method with a significant P-value of 0.01. Cluster 3.0 and TreeView (http://bonsai.hgc.jp/~mdehoon/software/cluster/clustersetup.exe and http://jtreeview.sourceforge.net/ [28]) were used to cluster the samples according to the gene expression values of these 8 genes and to visualize the results. Survival curves were generated using the Kaplan-Meier method and a logrank test was used to test for differences. All statistical tests were 2-sided with P < 0.05 considered statistically significant.
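The general shape of a variance filter followed by leave-one-out cross-validation can be sketched in a few lines of Python. This is not the BRB-ArrayTools procedure used in the study: a simple nearest-centroid classifier stands in for the SVM, the supervised gene selection performed within each fold is omitted, and the gene names, expression values, and function names are all hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(samples, fraction=0.75):
    """Keep the given fraction of genes with the highest variance across samples."""
    genes = list(samples[0])
    ranked = sorted(genes, key=lambda g: pvariance([s[g] for s in samples]), reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]


def centroid(samples, genes):
    """Mean expression per gene over a group of samples."""
    return {g: mean(s[g] for s in samples) for g in genes}


def predict(sample, centroids, genes):
    """Assign the label whose class centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((sample[g] - centroids[label][g]) ** 2 for g in genes)
    return min(centroids, key=distance)


def loocv(samples, labels, genes):
    """Leave each sample out once, refit on the remainder, predict the left-out sample."""
    predictions = []
    for i, left_out in enumerate(samples):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {
            label: centroid([s for s, l in train if l == label], genes)
            for label in sorted({l for _, l in train})
        }
        predictions.append(predict(left_out, centroids, genes))
    return predictions


# Hypothetical delta Ct values for four made-up genes in six samples.
samples = [
    {"A": 1.0, "B": 2.0, "C": 0.5, "D": 3.0},
    {"A": 1.2, "B": 2.5, "C": 0.4, "D": 3.0},
    {"A": 0.8, "B": 1.8, "C": 0.6, "D": 3.0},
    {"A": 4.0, "B": 5.0, "C": 0.5, "D": 3.0},
    {"A": 4.2, "B": 5.5, "C": 0.3, "D": 3.0},
    {"A": 3.8, "B": 4.8, "C": 0.7, "D": 3.0},
]
labels = ["good", "good", "good", "poor", "poor", "poor"]

genes = top_variable_genes(samples)  # drops the constant gene "D"
print(loocv(samples, labels, genes))
```

Because each prediction is made by a model that never saw the left-out sample, the agreement between predictions and true labels gives a less optimistic estimate of performance than refitting and scoring on the full data set.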

Results

Patient characteristics

Characteristics of the 45 patients who were eligible for our CTC-specific analyses to explore differentially expressed genes between good and poor responders are listed in Table 1. One patient was described to have an ER-negative primary tumor but received hormonal treatment in both adjuvant and first-line setting due to PR-positivity. Median baseline CTC count in the 45-patient cohort was 8 (range 0–32,492 CTCs/7.5 mL blood). The extremely high CTC count of 32,492 was found in a patient who did not respond to treatment and died within one month after treatment initiation due to progression of disease. The 9-month cut-off, based on literature data on the median PFS in first-line MBC patients [26, 27], was well chosen considering the median PFS of 11.8 months (range 0–41.3 months) in our 45-patient cohort.

8-gene CTC profile predicts for outcome to treatment

Of the 45 patients, 19 were classified as poor responders due to progression of disease or death <9 months, whereas the remaining 26 patients were considered good responders. A LOOCV was performed in this cohort, yielding an 8-gene predictor in which each gene had a univariate P-value of <0.1 (Table 2). Application of this 8-gene CTC profile resulted in 16 patients with an unfavorable profile, who were thus predicted to be poor responders. Twelve of them truly showed resistance to therapy <9 months (disease progression or death) and four did not, resulting in a sensitivity of 63 % and a positive predictive value (PPV) of 75 % (Table 3). Applying the profile, 29 patients had a favorable profile and were thus predicted not to show progressive disease <9 months. Of these, 22 indeed did not fail treatment <9 months, rendering a specificity of 85 % and a negative predictive value (NPV) of 76 %.
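The reported test characteristics follow directly from the counts given above. As a quick check, the short Python sketch below recomputes them; the function name is hypothetical, and the number of false negatives (7) is derived here as the 19 poor responders minus the 12 correctly predicted ones.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 test characteristics, with 'poor responder' as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# 16 predicted poor responders: 12 correct, 4 incorrect.
# 29 predicted good responders: 22 correct, 7 incorrect (19 - 12, derived).
performance = diagnostic_metrics(tp=12, fp=4, fn=7, tn=22)

for name, value in performance.items():
    print(f"{name}: {value:.0%}")
# sensitivity: 63%
# specificity: 85%
# ppv: 75%
# npv: 76%
```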

Table 2 Significantly differentially expressed genes between 45 good and poor responders
Table 3 Test performance

The Kaplan-Meier curves for PFS of the predicted good and poor responding patients according to the 8-gene CTC predictor are shown in Fig. 1 and were statistically different (Logrank P < 0.001).

Fig. 1 Kaplan-Meier curve for patients as defined by the 8-gene CTC predictor. Blue (0): favorable profile; red (1): unfavorable profile; green (2): total cohort (N = 45)
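For readers less familiar with the method, the product-limit estimate that underlies Kaplan-Meier curves such as those in Fig. 1 can be sketched in Python as follows. The analyses in this study were run in STATA; the function below is only an illustrative stand-in, the follow-up times and event indicators are hypothetical, and the logrank test is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up per patient (e.g. months)
    events: 1 = progression or death observed, 0 = censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        leaving = 0   # patients leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up for five patients (illustrative only).
times = [3, 5, 5, 8, 12]
events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.2f}")
# t = 3: S(t) = 0.80
# t = 5: S(t) = 0.60
# t = 8: S(t) = 0.30
```

Censored patients do not lower the curve themselves, but they leave the risk set, so later events reduce the estimate by a larger step.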

In univariate analysis, the 8-gene CTC predictor was significantly associated with PFS (HR 4.40 [95 % CI: 2.17–8.92], P < 0.001). When the traditional predictive factors were included in a multivariate analysis, namely the disease-free interval (DFI), defined as the time between primary surgery and CTC sampling, the dominant site of relapse, and the CTC count at baseline, only the 8-gene CTC profile was an independent predictor of PFS (HR 4.59 [95 % CI: 2.16–9.75], P < 0.001) (Table 4). The CTC count at baseline was not associated with PFS in this 45-patient cohort, but was significant in the total cohort of 78 patients (HR 2.47 [95 % CI: 1.43–4.27], P = 0.001) (Additional file 2: Figure S2).

Table 4 Predictive value of the 8-gene CTC profile in uni- and multivariate analysis

Hierarchical clustering to identify clusters of patients according to the 8-gene CTC predictor

Two-dimensional average linkage hierarchical cluster analysis [28] was performed to compare the difference in gene expression of the 8 identified genes in our 45 patients. This analysis resulted in a clustering of 2 major and 5 minor groups of patients in which cluster 1 mainly contained the good responders (10 out of 12), whereas cluster 2 consisted of both good and poor responders (Fig. 2). In this cluster, however, a subcluster existed that, with 10 out of 12, predominantly contained poor responders with higher expression of most of the identified 8 genes.

Fig. 2 Unsupervised hierarchical cluster analysis comparing the 8-gene CTC predictor in 45 MBC patients treated with first-line AI therapy. Each horizontal row represents a gene, and each vertical column corresponds to a sample. Red color indicates an mRNA expression level above the median level, black color indicates a median expression level, and green color indicates an expression level below the median level of the assay as measured in all 45 samples. The number of CTCs as established by the CellSearch Epithelial Kit is depicted below the figure. Blue: good responder; red: poor responder. CTC count: blue: <5 CTCs; red: ≥5 CTCs

Testing the 8-gene CTC profile in an independent differently treated patient cohort

Having identified the 8-gene CTC profile in AI-treated patients, we assessed whether this signature was prognostic or predictive by investigating the association between this profile and outcome in an independent patient cohort composed of 71 MBC patients who received first-line therapies other than AI. Of these, 21 patients were treated with chemotherapy, 40 with chemotherapy combined with a type of targeted therapy such as trastuzumab, and 10 with tamoxifen therapy. Of this group, 35 patients had a PFS of less than 9 months and were therefore classified as having a poor outcome. Application of the 8-gene CTC profile resulted in 33 patients with a favorable CTC profile. The CTC profile, however, could not properly discriminate the patients with a good outcome from those with a poor outcome (P = 0.899; Table 5).

Table 5 Test performance of the 8-gene CTC predictor in 71 patients not treated with AI therapy

Discussion

Characterization of CTCs holds great promise to predict response to treatment and to gain more insight into mechanisms underlying resistance to systemic anti-tumor agents. Although whole transcriptome analysis would be most preferable, isolation of CTCs by the CellSearch technique does not result in pure fractions of CTCs but only in fractions enriched for CTCs in which an overload of leukocytes is still present. This makes interpretation of whole transcriptome analysis impossible, since only techniques yielding pure CTC fractions would allow such analyses. We have previously shown that mRNA expression levels of multiple epithelial genes can be measured in CTCs enriched by CellSearch [18]. By using these selected genes and applying the same technique, the current study demonstrates that CTC characterization can be used as a predictor for response to endocrine therapy. To the best of our knowledge, this is the first study that has generated a unique CTC-based gene expression panel that is able to distinguish good and poor responders to first-line AI therapy. From a clinical point of view, it is probably more relevant to identify the poor rather than the good responding patients, since these patients might benefit more from another treatment. Our identified 8-gene CTC profile, however, performed better in predicting the good responders, since the specificity of the predictor outperforms its sensitivity (85 % vs. 63 %; Table 3). Nevertheless, this could still impact clinical decision making, since good responding patients could undergo less intensive follow-up strategies and fewer laboratory procedures, which is not only less demanding for patients but can also reduce health care costs.

In order to explore whether this signature associated with outcome in AI-treated patients is prognostic or predictive, we tested the profile in CTCs of a group of 71 patients who were treated with types of systemic treatments other than AI, including chemotherapy (N = 21), chemotherapy combined with a type of targeted therapy (N = 40), or tamoxifen therapy (N = 10). In contrast to the AI-treated patients, the 8-gene CTC profile could not discriminate patients with a good outcome from those with a poor outcome in this group of patients (P = 0.899; Table 5). Although this is not a true validation of the test, it strongly supports that the identified profile is predictive for outcome to AI therapy and not for outcome to other agents. It needs to be underscored that the identified CTC profile has been obtained in a small number of patients, a setting in which an LOOCV procedure to reveal such a profile is commonly applied. It is important to realize that such an approach bears the risk of overfitting the data, as a consequence of which validation in an independent patient cohort is needed before implementation in clinical practice.

The development of a CTC-specific predictor required exclusion of patients who lacked sufficient expression of epithelial-specific genes. These are mainly patients with no or few counted CTCs, who are therefore more likely to have a longer PFS, which might have biased our patient set [9]. Although most characteristics do not show differences between included and excluded patients (Additional file 3: Table S1), the median PFS in the 33 excluded patients was 548 (40–1694) days, which significantly differs from the median PFS of 358 (14–1255) days in the 45 included patients (Logrank P < 0.001). This exclusion criterion highly affected the number of patients available for further analysis. The low number of remaining patients might be the reason for the non-significant association between the CTC count at baseline (divided in <5 vs. ≥5 CTCs) and PFS. In the total cohort of 78 patients, CTC count was significantly related to PFS (Additional file 2: Figure S2). Since cohorts with few patients cannot be divided into independent discovery and validation sets, resampling the original data through cross-validation is statistically the best method [29].

Amongst the 8 genes that we found to be associated with outcome to AI therapy through LOOCV was the epithelial marker KRT81. Many cytokeratins are highly expressed in both normal and tumor epithelium, in which the pattern of expression can be used to identify the tissue of origin [30]. Not much is known about this specific cytokeratin and why high expression would lead to a worse outcome. Mutations in KRT81 have been described in monilethrix, a condition in which patients develop diffuse hypotrichosis [31].

CXCL14 and ERBB3 were the only genes that were more abundantly expressed in the good responding patients. This is discordant with what is currently known in primary tumor tissue with respect to both genes. The published literature, however, only considers gene expression in primary tumors, which cannot easily be extrapolated to CTCs. CXCL14 is a chemokine that has been shown to be upregulated in tumor myoepithelial cells and enhances the proliferation, migration, and invasion of epithelial cells after binding to their receptors [32]. Expression of ERBB3 has, similar to EGFR in our CTC predictor, previously been associated with endocrine therapy resistance when highly expressed in primary tumor tissue [33, 34]. In the predictor, high expression of PTRF and EEF1A2 was also associated with poor outcome. This is in contrast with previously published literature in which PTRF has been shown to interact with pS2/TTF1 [35], which in turn needs ER as key transcriptional factor in order to be expressed [36] and is associated with a better clinical outcome in breast cancer [37–39]. EEF1A2 is a eukaryotic elongation factor whose expression is downregulated through interaction with the protein p16 (INK4a), leading to inhibition of cancer cell growth [40]. It is mainly known as a potential oncogene in ovarian cancer, in which its expression enhances cell growth in vitro [41]. Overexpression of EEF1A2 has also been seen in breast tumors [42] and it is one of the genes in the 76-gene signature as identified in the ER-positive subset of 115 primary breast tumors that represents a strong prognostic factor for patients at high risk for developing metastases [43, 44]. With respect to the other genes of the predictor, PTPRK belongs to the group of protein-tyrosine phosphatases (PTPs) that control tyrosine phosphorylation. PTPs regulate the signaling of growth-factor receptors and can, when deregulated, be associated with tumorigenesis [45].
Deregulation of PTPs can result in both their up- and downregulation, which can explain the discordance between the association we established between high expression of PTPRK and poor outcome to AI therapy and the earlier finding that decreased expression of PTPRK is related to poor prognosis in MBC, suggesting a more tumor suppressive role [46]. Finally, TWIST1 is a transcription factor that is one of the most widely known factors to be involved in the process of epithelial-to-mesenchymal transition (EMT). Its overexpression has been associated with endocrine therapy resistance due to downregulation of ER promoter activity [47]. Moreover, through direct repression of E-cadherin and activation of mesenchymal markers, TWIST1 plays an essential role in tumor metastasis [48]. The appearance of TWIST1 in our 8-gene CTC predictor is remarkable, since our applied CTC isolation method relies on an EpCAM-based enrichment step and tumor cells undergoing EMT might become EpCAM-negative [49]. Its dependency on EpCAM expression by CTCs therefore renders the CellSearch method not the best method to capture all CTCs, but it is still the only FDA-cleared method, which facilitates the use of the method and of the results obtained with it in clinical studies. In addition, whether EpCAM loss always accompanies EMT is still under debate [50].

Although ER is amongst the 93 target genes that were measured, its mRNA expression in this study was not associated with outcome to AI therapy. Several techniques have been explored to determine ER expression in CTCs, but so far, none of these studies could show an association with outcome (reviewed in [19]). Recently, Babayan et al. have demonstrated the possibility of measuring ER protein expression in single CTCs through immunofluorescence. This study revealed that CTCs of individual MBC patients with ER-positive primary tumors are frequently a heterogeneous population consisting of both ER-positive and ER-negative CTCs [51]. Similar to primary tumor tissue, the percentage of ER-positive CTCs, rather than the ER mRNA expression of the total CTC fraction as measured in our study, may be the parameter best associated with outcome.

Conclusion

In conclusion, we have defined an 8-gene expression predictor established in CTCs that is associated with outcome to first-line AI therapy in MBC patients. Importantly, before the results of the current study can be implemented, validation in an independent patient cohort is warranted. Nevertheless, this study underscores the enormous potential of molecular characterization of CTCs.