Background

Breast cancer (BC) is the most frequent female malignancy with approximately 1.65 million diagnosed women worldwide [1, 2]. Growing incidence and decreasing mortality rates are reported for developed countries. In Germany, general trends are confirmed and today, survival after BC is higher than in the 1990s [3].

Guidelines before 2000

There are many reasons for these trends. The increasing effectiveness of therapy itself is certainly one crucial factor [4]. However, it is equally critical that published research from clinical trials is disseminated and implemented comprehensively in daily routine. In the past, a small number of experts (St. Gallen consensus panel) interpreted current trial results and published the state of the art in BC treatment [5–7]. Additionally, national [8] or European guidelines [9] provided treatment recommendations for physicians willing to improve their skills. Low acceptance and arbitrary application of these S1-guidelines were the norm, not the exception [10]. BC treatment depended mainly on the experience and knowledge of the individual physician. Peer consultation groups or quality circles met irregularly, and access to expertise from other (medical) disciplines involved in BC treatment was not institutionalized. Overall, health care professionals of different settings cooperated in a “free interplay” (i.e., a liberally organized market) within a fragmented but competitive German health care system [11].

Guidelines after 2000

A common effort of all stakeholders aimed to overcome these deficits and produced evidence-, consensus-, and outcome-based national guidelines for the early detection [10] and therapy [12] of BC. The application of these S3-guidelines was mandatory for centralized BC networks inspired by so-called “hub & spoke” models [13–15]. Hubs were academic institutions, and spokes were all of the related health care professionals. Network-wide monitoring of guideline application was assured by quality management systems, which became officially certified [13]. Multidisciplinary counseling was assured by expert panels (tumor conferences) hosted at comprehensive cancer centers. Integrated care models [16] were developed to overcome the aforementioned infrastructural deficits.

Effectiveness of guidelines

Only a few studies have focused on all (inpatient) therapy sequences, guideline adherence and its impact on outcome measures. In Germany, the studies of Woeckel et al. [17, 18] examined this topic and confirmed the general effectiveness of guideline adherence using data from the period 1992–2005. Based on these data, the authors called for the maximization of guideline adherence.

This approach is straightforward but rests on two critical assumptions. First, the study design may be appropriate as long as the general effectiveness of S3-guidelines is of concern. However, if the appropriateness of medical decisions according to the guidelines in force at the time of treatment is of interest, the above-mentioned approach is not adequate. S3-guidelines were not released before 2003, and therefore time effects induced by different guidelines cannot be captured.

Second, the resulting adherence maximization hypothesis was based on the assumption that medical decisions adherent to the guidelines are appropriate. This assumption is true if all of the physical and mental conditions of the patient agree with clinical algorithms, ancillary conditions, and patients’ preferences. However, if one of these premises is not fulfilled, physicians are encouraged to decide against guideline recommendations [10, 12].

Aim of the study

The objective of the study is to examine the impact of process quality on 5-year overall survival. In contrast to the above-mentioned studies, process quality is assessed according to the guidelines in force during each time interval (1996–97, 2003–04). Guideline adherence and divergence are measured by a set of quality indicators defined along inpatient therapy sequences and related medical decisions. An overall adherence index is developed and two questions are examined: Is there a difference between guideline-adherent and guideline-divergent treated patients in terms of survival, first, within each time interval, and second, across time intervals? It is hypothesized that process quality increased over time. In contrast to the cohort 1996–97, we expect an impact of process quality on survival for the cohort 2003–04. With respect to cross-period analysis, we expect higher survival of patients treated adherent to guidelines in 2003–04 and no survival differences of patients treated divergent from guidelines.

Methods

Incidence-based full population survey

All women with primary BC treatment in two general hospitals and one specialized academic hospital located in the district of Marburg-Biedenkopf (Hesse, Germany) were included (entry cohort). Patients were identified by surgical schedule lists and accompanying histological confirmation of BC (ICD-10: C50). Physicians recruited patients by explaining the aims of the study and obtained written informed consent. The relevant data were extracted from patient record files and stored in a clinical register [19–21]. The study was conducted in accordance with the Declaration of Helsinki and was approved by the local ethics committee of the Philipps University of Marburg (Germany).

Sample selection for analysis

The entry cohort encompassed all treated patients (total “workload”), but not all patients of the entry cohort could be analysed by standardized quality indicators. Therefore, heterogeneous patient groups with non-invasive tumours (pTis) and with distant metastasis or unknown metastasis status were dropped from further analysis to account for individual medical needs and the complexity of each therapy. This step defined the institutional-invasive samples. These were then restricted to resident patients by identifying and removing non-resident patients, which defined the regional-invasive samples [19–21].

Exposure of cohorts

Cohort 1996–97 was exposed to the “free interplay” of institutions. Primary BC treatment followed the S1-guidelines [6–9]. Cohort 2003–04 was exposed to an “integrated care” model defined by a certified BC center [13]. Primary BC treatment followed recommendations of the national S3-guidelines [10].

Primary endpoint and follow-up

Five-year overall survival regardless of cause of death was defined as the primary endpoint. The observation time started at the date of surgical intervention. Vital status was verified through the official registry office responsible for each patient. Follow-up began in 10/2008 and ended in 2/2009.
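The endpoint definition can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `five_year_outcome`, the day-based horizon of 5 × 365.25 days, and the rule that patients observed beyond the horizon are treated as event-free at five years are choices made for this example and are not taken from the study's SAS code.

```python
from datetime import date

# Assumption: the 5-year horizon is approximated as 5 * 365.25 days.
HORIZON_DAYS = round(5 * 365.25)


def five_year_outcome(surgery: date, last_contact: date, died: bool):
    """Return (observation time in days, event indicator) for one patient.

    surgery      -- date of surgical intervention (start of observation)
    last_contact -- date of death, or date on which vital status was verified
    died         -- True if last_contact is a date of death (any cause)
    """
    days = (last_contact - surgery).days
    if days < 0:
        raise ValueError("last_contact precedes surgery")
    if days > HORIZON_DAYS:
        # Alive at five years; anything later is outside the endpoint window.
        return HORIZON_DAYS, False
    return days, died


if __name__ == "__main__":
    # Hypothetical patients, not study data.
    print(five_year_outcome(date(2003, 5, 1), date(2005, 5, 1), True))
    print(five_year_outcome(date(2003, 5, 1), date(2008, 11, 1), False))
```

With a follow-up window of 10/2008 to 2/2009, every patient of the cohort 2003–04 who was alive at verification had passed the five-year mark only if surgery took place early enough; the sketch handles both cases through the same truncation rule.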

Covariates for risk adjustments

Available risk factors, prognostic and predictive factors for BC [22] were integrated into the Cox model. Regressors of the final model were: age at surgical intervention, binary nodal status, binary tumour size, binary hormone receptor status, and binary adherence index. The information on treatment location and application of chemotherapy served as strata variables.

Quality indicators of medical decision-making

Quality indicators were defined alongside relevant inpatient treatment sequences: surgical intervention (tumor, lymph nodes) together with radio-oncological irradiation, and chemo- and hormone-therapy according to different risk categories [7, 12]. Pre-operative diagnostic sequences and other systemic interventions (e.g., HER2/neu) were not available in 1996–97 and were excluded.

Quality indicators (QI) operationalized guideline recommendations in two categories. The first category comprised recommendations that physicians should follow if all other ancillary conditions are fulfilled. This QI category translated to Guideline Adherent Decisions (GAD). The second category comprised medical decisions against recommendations of the guidelines, defined as Guideline Divergent Decisions (GDD). It is important to note that GADs and GDDs are not always the opposite of each other (i.e., they are not complementary). For the definition of QIs according to the S1-guidelines (1996–97) and S3-guidelines (2003–04), short and long descriptions are provided (see Additional files 1 and 2).

Adherence index

The developed QIs were aggregated into four indices describing the adherence status of each therapy sequence. In addition, all QIs contributed to one overall binary adherence index. The aggregation of QIs was performed as follows. First, each QI was assessed according to its category (GAD, GDD). Second, if all GADs were assessed as positive (i.e., adherent), the BC treatment of a patient was preliminarily considered to be guideline-adherent by the overall adherence index. If even one GAD did not meet the guideline recommendations, the adherence index was downgraded and the treatment was considered guideline-divergent. Third, if even one GDD was assessed as positive (i.e., divergent), inpatient primary BC therapy was classified as guideline-divergent by the overall adherence index. In this sense, a single disregarded quality indicator overrode all guideline-adherent indicators.
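The aggregation rule described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the `QIResult` record, its field names, the example indicator labels, and the treatment of a patient with no applicable GAD (counted here as preliminarily adherent) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QIResult:
    name: str       # hypothetical indicator label
    category: str   # "GAD" or "GDD"
    positive: bool  # GAD: recommendation followed; GDD: divergent decision taken


def overall_adherence(results: Iterable[QIResult]) -> bool:
    """Return True if the treatment counts as guideline-adherent overall."""
    results = list(results)
    for r in results:
        if r.category not in ("GAD", "GDD"):
            raise ValueError(f"unknown QI category: {r.category!r}")
    # Step 2: every GAD must be positive (assumption: vacuously true if none apply).
    all_gads_met = all(r.positive for r in results if r.category == "GAD")
    # Step 3: a single positive GDD classifies the whole treatment as divergent.
    any_gdd = any(r.positive for r in results if r.category == "GDD")
    return all_gads_met and not any_gdd


if __name__ == "__main__":
    # Hypothetical patient: all GADs met, but one divergent decision recorded.
    patient = [
        QIResult("bcs_followed_by_irradiation", "GAD", True),
        QIResult("chemo_for_risk_category", "GAD", True),
        QIResult("irradiation_omitted_after_bcs", "GDD", True),
    ]
    print(overall_adherence(patient))
```

The strictness of this rule is worth keeping in mind when reading the adherence shares: the index behaves like a logical AND over all indicators, so it can only fall as more indicators are assessed.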

Statistics

Univariate statistics describe clinical characteristics of the selected samples. The distributions of covariates between cohorts were compared by Chi-square, Kruskal-Wallis, Mantel-Haenszel Chi-square, Mann–Whitney U or t-tests. Derived p-values were adjusted for multiple testing by the Bonferroni-Holm method. Quality indicators were described by frequency counts and compared by Chi-square tests, again adjusted for multiple testing with the Bonferroni-Holm method. Multivariate survival analysis was performed by a Cox-regression model [23]. Multivariate survival curves were derived by the corrected group analysis method [24]. The significance level was set at α = 5 %. SAS 9.3 software was used.
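For readers unfamiliar with the Bonferroni-Holm adjustment, the following sketch shows the standard step-down procedure in plain Python. It is illustrative only: the study used SAS 9.3, and the function name `holm_adjust` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Bonferroni-Holm adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f} significant={q < 0.05}")
```

A comparison is then declared significant when its adjusted p-value falls below the chosen level (α = 5 % in this study).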

Analysis strategy

Univariate results of sampling and distributions of important covariates are presented first. Next, the number of developed quality indicators and the guideline adherence (divergence) for each therapy sequence and for the overall binary adherence index are presented. Finally, multivariate survival methods were applied to each period (i.e., cohort) separately, followed by cross-period comparisons without the adherence index and cross-period comparisons conditional on adherence status.

Results

Sampling results

An entry cohort of 877 patients was reduced by 134 patients (15 %) due to loss to follow-up (1.9 %), non-assessable stage information (0.3 %), non-invasive tumours (6.3 %), non-assessable or distant metastases (5.2 %), or non-assessable margins of removed tumours (1.5 %). Excluded patients were randomly distributed over both cohorts (see percentages of Table 1), and no significant differences between included and excluded patients were detected (p-values not shown here). The exclusion of patient records left 743 (84.7 %) in the institutional-invasive samples and 504 (57.5 %) patients in the regional-invasive samples for analysis.
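The sample sizes and shares reported above can be re-derived with a few lines of arithmetic. The sketch uses only the counts given in the text; the variable and function names are illustrative.

```python
ENTRY_COHORT = 877       # patients in the entry cohort
EXCLUDED = 134           # patients dropped before analysis
REGIONAL_INVASIVE = 504  # resident patients remaining for analysis


def share_of_entry(n: int) -> float:
    """Percentage of the entry cohort, rounded to one decimal place."""
    return round(100 * n / ENTRY_COHORT, 1)


institutional_invasive = ENTRY_COHORT - EXCLUDED

if __name__ == "__main__":
    print("institutional-invasive:", institutional_invasive,
          share_of_entry(institutional_invasive), "%")
    print("regional-invasive:", REGIONAL_INVASIVE,
          share_of_entry(REGIONAL_INVASIVE), "%")
    print("excluded:", EXCLUDED, share_of_entry(EXCLUDED), "%")
```

The check reproduces 743 patients (84.7 %) and 504 patients (57.5 %); the excluded share evaluates to 15.3 %, which the text reports rounded to 15 %.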

Table 1 Selection from entry cohort to samples of analysis

Descriptive statistics

Distributions of available risk, prognostic and predictive factors showed roughly balanced samples between cohorts (i.e., time intervals). The exceptions in the institutional-invasive sample concern cohort 1996–97, which showed more invasive-ductal carcinomas (84 % vs 73 % in 2003–04), fewer G2- (48 % vs 71 %) and more G3-types (45 % vs 15 %), fewer R0-resection margins (85 % vs 95 %), and fewer patients from clinic C (63 % vs 80 %). A similar pattern was evident in the regional-invasive sample (see Table 2 for details).

Table 2 Distribution of available risk, prognostic and predictive factors in selected samples of analysis

Process quality indicators

In total, 104 quality indicators defined Guideline Adherent Decisions (51) and Guideline Divergent Decisions (53). The QIs common to both cohorts, owing to identical guideline recommendations, related to the surgical strategy. A total of 23 QIs referred to the sequences of breast conserving surgery and irradiation (BCS + RAD: 8 QIs) and modified radical mastectomy (15 QIs).

The remaining QIs differed between time intervals, and cohort-specific conceptualization of QIs was required. Axilla treatment (1996–97: 2, 2003–04: 9) differed due to implementation of sentinel techniques in period 2003–04. Chemo- and hormone-therapy QIs (1996–97: 38, 2003–04: 32) differed due to distinct risk categories.

Adherence indices

The application of the defined QIs showed significant differences in guideline adherence between 1996–97 and 2003–04 (see Table 3). The relative share of guideline-adherent surgical treatments increased from 28.7 % (1996–97) to 52.8 % (2003–04) in the institutional-invasive sample (from 30.3 to 51.9 % in the regional-invasive sample). Chemotherapy adherence increased from 74.5 to 93.2 % (76.9 to 92.1 %) of treatments and hormone therapy adherence from 70.1 to 84.4 % (68.1 to 83.8 %). Only the therapy sequence of lymph node dissection failed to exhibit a significant difference between cohorts because quality was already high before the infrastructural changes.

Table 3 Guideline-adherent treated breast cancer inpatients per therapy sequence and distribution of guideline divergences

The overall binary adherence index summarizing all measured inpatient therapy sequences increased significantly from 13.3 % (1996–97) to 35.2 % (2003–04) in the institutional-invasive samples and from 15.1 to 33.5 % in the regional-invasive samples. In other words, process quality increased more than two-fold, and the relative share of treatments divergent from guidelines declined from 86.7 to 64.8 % (84.9 to 66.5 %).
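The size of the increase and the complementary divergence shares follow directly from the adherence percentages. The short sketch below re-computes them; the dictionary layout and function names are illustrative, and only the four adherence percentages are taken from the text.

```python
# Overall adherence index in percent: (cohort 1996-97, cohort 2003-04)
ADHERENCE = {
    "institutional-invasive": (13.3, 35.2),
    "regional-invasive": (15.1, 33.5),
}


def fold_increase(before: float, after: float) -> float:
    """Ratio of the later adherence share to the earlier one."""
    return after / before


def divergent_share(adherent: float) -> float:
    """Share of guideline-divergent treatments, as the complement to 100 %."""
    return round(100.0 - adherent, 1)


if __name__ == "__main__":
    for sample, (before, after) in ADHERENCE.items():
        print(sample,
              f"fold increase: {fold_increase(before, after):.2f}",
              f"divergent: {divergent_share(before)} -> {divergent_share(after)}")
```

Both samples show a ratio above two, and the complements match the divergence shares quoted in the text.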

Multivariate 5-year survival estimates

Period-specific results

Next, the impact of the overall binary adherence index on survival was measured. Several steps of model selection, model checking and model-fit procedures identified a relevant covariate set that included the developed adherence index. Estimates of the final Cox-regression model are shown in Table 4.

Table 4 Multivariate Cox-regression models applied to the adherence index and crucial risk, prognosis and predictive factors

Table 4 reports the estimates by cohort and sample. For cohort 1996–97, both samples (institutional- and regional-invasive) show a negative association between the adherence index and the hazard of death: if a patient was treated according to the guidelines, the instantaneous risk of death (hazard) declined and the 5-year overall survival increased. However, this result is not significant, and a systematic effect of adherence on survival is not evident. The same holds for cohort 2003–04 in both samples. The related multivariate survival curves were derived by the corrected group analysis (CGA) method. The results are shown in Table 5.

Table 5 Multivariate 5-year survival and event rates estimated by the corrected group analysis method

Taking all additional covariates of the Cox model into account, the CGA method allows survival rates and related curves to be estimated [24]. The cohort-specific perspective and the institutional-invasive samples are presented first. Cohort 1996–97 exhibits a notable survival difference between comparison groups (institutional-invasive: 84.5 − 76.8 = 7.7 percentage points). However, confidence intervals and related p-values indicated that the result was not significant. The same held for cohort 2003–04, for which a small 5-year survival difference (87.7 − 86.3 = 1.4 percentage points) was estimated. However, the survival curves behave differently, as Fig. 1a-b indicates.

Fig. 1 Institutional-invasive samples comparing guideline-adherent and -divergent treated patients. Cohort 1996–97 (left) and cohort 2003–04 (right)

Figure 1a on the left shows the development of cohort 1996–97. The survival curves start separating after 12 months and diverge clearly after 30 months. The survival curve of guideline-divergent treated patients declines more than that of patients treated according to guidelines. In comparison, for cohort 2003–04 the survival differences between groups are very small; the decline began after 20 months, and a less steep course was observed for the guideline-divergent treated patients (Fig. 1b). If the analysis is restricted to regional-invasive samples (i.e., residential patients), cohort 1996–97 displayed small survival differences (83.4 − 79.9 = 3.5, see Table 5) and cohort 2003–04 displayed considerable survival differences (91.0 − 84.0 = 7.0, see Table 5) between the comparison groups. Figure 2a-b shows the corresponding survival curves.

Fig. 2 Regional-invasive samples comparing guideline-adherent and -divergent treated patients. Cohort 1996–97 (left) and cohort 2003–04 (right)

Figure 2a shows the survival curves of cohort 1996–97. The curves appear to separate after 12 months, and after 30 months the decline becomes steeper. The survival curves of cohort 2003–04 (Fig. 2b) exhibit a different pattern. They diverge from the beginning of the observation time, and the curve of guideline-divergent treated patients is steeper after 10 months. Thus, the survival curves differed substantially between cohorts and samples in terms of survival level and curve shape.

Cross-period results

To obtain more insights into cross-period survival rates and patterns, the cohorts were compared regardless of adherence status (not shown in tables). The institutional-invasive sample estimated a survival rate of 79 % for cohort 1996–97 and 86 % for cohort 2003–04. The survival difference between cohorts was significant (p = 0.007).

However, if the information of guideline adherence is added to the model and cross-period survival curves of guideline-adherent only, or guideline-divergent treated patients only were estimated, the subject becomes more intriguing.

First, if only guideline-adherent patients of the institutional-invasive samples were compared, the survival estimates (see Table 5) were almost identical for cohorts 1996–97 and 2003–04 (89.6 % vs 89.9 %). The survival differences were not significant. If this comparison is restricted to residential patients (i.e., the regional-invasive sample), the survival rate of cohort 1996–97 was considerably lower than that of cohort 2003–04 (87.1 % vs 92.2 %), but the difference was still not significant. Figure 3a-b shows the survival curves.

Fig. 3 Guideline-adherent treated patients of cohort 1996–97 and cohort 2003–04. Comparison of institutional-invasive (left) and regional-invasive samples (right)

Second, only guideline-divergent treated patients were observed across the samples. The institutional-invasive samples showed a survival rate of 76.4 % in cohort 1996–97 and 84.6 % for cohort 2003–04 (see Table 5). This difference was significant (p = 0.013). However, this result was not replicated for the regional-invasive samples (79.6 vs 82.5; not significant). The survival curves are shown in Fig. 4a-b.

Fig. 4 Guideline-divergent treated patients of cohort 1996–97 and cohort 2003–04. Comparison of institutional-invasive (left) and regional-invasive samples (right)

Discussion

Based on the defined set of quality indicators according to time dependent guidelines and available medical knowledge, a two-fold increase of process quality and its medical decision making from the expert’s point of view has been observed. This result is a benefit for women with BC because the complexity of modern therapies continues to grow.

Period-specific comparisons

The process quality of cohort 1996–97 was expected to be low, and no survival differences between comparison groups in cohort 1996–97 were expected. In fact, no impact of process quality on survival was observed. For cohort 2003–04, a higher clinical process quality was hypothesized and an impact on survival was expected. Higher survival rates of the guideline adherence group were expected but were not observed. Multivariate survival analysis revealed no significant associations of the adherence index on 5-year overall survival across all of the defined samples.

Cross-period comparisons

The cross-period/cohort comparisons should yield deeper insights into mechanisms of temporal changes. Cross-period comparisons without considering the overall binary adherence index showed a significant difference of survival rates of approximately 7 % (see subsection ’Cross-period results’). However, cross-cohort comparisons of the adherence group only showed that estimates revealed no significant survival differences. When the guideline divergence group of cohort 1996–97 and 2003–04 were compared, systematic survival gains of 10 % were observed for the institutional-invasive sample. The latter survival increase exceeds the survival increase of periods regardless of the adherence status by approximately 3 %. This excess survival can be characterized as a period effect and was not expected for this subgroup.

In the context of guideline developments and its assessment, this unraveled period effect was deemed inconsistent with the ubiquitous demand of the maximization of guideline adherence [17, 18]. Isn’t it a paradox that particular women with BC benefited most in the last decade from treatment which violated guideline recommendations?

Essence of guidelines

It is not inconsistent with the essence of guidelines because the identified paradox reflects the very nature of guidelines as they should apply for the vast majority of patients. Schulz et al. [10] emphasized that “if the individual situation requires deviations of guidelines, it is not solely possible, it is mandatory to do so. Guidelines do not discard physicians from their obligation to concern the clinical characteristics, somatic, psychological and social conditions of each patient”.

At this point, cohorts 1996–97 and 2003–04 differ substantially from the infrastructural perspective. In 2003–04, systematic, rational and conscious decisions against guidelines were made and monitored by expert panels.

Why adherence paradox?

These multidisciplinary expert panels were introduced in the decade of cohort 2003–04 to do justice to the essence of guidelines. Expert panels staffed by leading physicians from all related disciplines (e.g., gynaecologists, oncologists, surgeons, pathologists, radio-oncologists, psycho-oncologists, etc.) gave consensual advice for further, multi-modal treatment [11]. Expert panels became an important forum to consider guideline recommendations, the individual medical experience of various experts, patient preferences and patients’ social circumstances. Expert panels use guidelines as a starting point for common recommendations and, if necessary, deviate from them systematically, rationally and consciously to tailor an individualized therapy. Thus, the identified adherence paradox reflects this essence of guidelines and signals their appropriate application in certified BC networks [15].

Alternative approaches to define an adherence index

Most related studies use a rate-based/criterion-based approach to define 5 to 20 quality indicators, mostly extracted from routine data [25–30]. These studies estimate that guideline adherence is between 80 and 100 %. If 33 indicators are used, the adherence of medical decisions decreases to 52 % [17, 18]. If medical decisions documented in patient record files are reviewed, 19 % (1993) and 54 % (1995) of 375 medical decisions appear to be adherent with current guidelines [31]. Scientifically legitimated deviations increased from 42 % (1993) to 68 % (1995). When an experimental design with the same methodology was conducted, a non-significant increase from 36 % (1996) to 40 % (1999) of 825 reviewed medical decisions was found [32]. Overall, the degree of adherence strongly depends on the length of observation time [33], age of the patient [34], number of quality indicators and included therapy sequences.

Adherence index and survival of related studies

Most studies only refer to selected therapy sequences (e.g., surgery, chemotherapy, etc.) [35, 36] and dismiss effects of relevant or related interventions. Other studies assessed inpatient therapy by a small number of indicators and estimated 50 % lower hazard ratios induced by guideline adherence treatment [37]. Woeckel et al. reproduced this result with a greater number of indicators but advised that a non-linear relationship between adherence and survival seems to be persistent [17, 18]. Indeed, the influences of the socio-economic status (SES) seem to modify treatment effects because social disparities of survival have been reported [38, 39]. Hence, systematic positive and linear relationship of adherence and survival is not replicable with incomplete multivariate models. In this sense, the present study is consistent with other reports [40, 41].

Strengths of study

Data quality assessment prior to this study [19–21] assured high data quality, epidemiological relevance, and reliable and valid survival estimates. The distinction between samples of all selected patients and of residential patients means that confounding effects and related biases were accounted for in the survival analyses. The definition of quality indicators is based on “pathways of coherent decisions” and is superior to the rate-based/criterion-based methodology. For example, breast conserving surgery/mastectomy (BCS/MRM) together with irradiation (RAD) defines a compound therapy according to the guidelines [12]. As this approach was applied to time-interval specific guidelines, this study was able to identify the (unexpected) period effects.

Limitations of the study

A number of the 104 quality indicators did not include important variables necessary for guideline assessment. Particularly, patients’ preferences for treatment strategies are missing. Studies have shown that up to 50 % of patients disagree with physicians’ treatment recommendations [42]. This comparatively high share of disagreement between patients (mastectomy preference) and their physicians (favoring breast conserving therapy) referring to a sample recruited between 2001 and 2003 emphasizes that guideline deviations do not descend from medical experts alone. Additionally, some indicators refer to decisions and planned actions but not to actual “clinical performance”. This limitation refers to chemo- and hormone-therapies whose time schedules strongly depend on the patients’ physical conditions. To consider this general flaw of conceptualization, new categories such as “scientifically legitimate decisions” [31, 32] or “justifiable guideline divergence” decisions [43] seem to be more appropriate to relax the rigid distinction between guideline adherence and divergence.

Conclusions

The proof of a positive relationship between guideline adherence and survival seems to be more complex than understood so far. Unexpectedly, guideline-divergent treated patients of cohort 2003–04 benefited most. We hypothesize that infrastructural efforts made by multidisciplinary expert panels contributed to this adherence paradox. The adherence paradox reflects the essence of guidelines and signals appropriate application of guidelines in certified BC networks. The maximization of guideline-based decisions should replace the postulate of adherence maximization. Finally, for women who notice that their treatment deviates from published patient guidelines for BC, such deviations are no longer associated with shorter survival.