Introduction

For quality assurance and to implement evidence-based guideline recommendations effectively in everyday oncological care, a ‘Quality Cycle Oncology’ has been established in Germany. Its central elements are defined quality indicators (QIs) derived from strong recommendations of S3 oncological medical guidelines developed by the German Guideline Program in Oncology (GGPO) (Langer and Follmann 2015). The German S3 guidelines are based on a systematic literature review, the presence of a representative interdisciplinary and interprofessional expert panel, including patient advocacy groups, and the use of a formal consensus-building process (Langer and Follmann 2015; Nothacker et al. 2014). An obligatory part of every S3 guideline development process is the definition of QIs from strong recommendations. These are considered suitable as a quality standard since it can be assumed that most patients will gain a clear benefit from the addressed actions of these recommendations. In a multi-step process, interdisciplinary experts of the guideline group identify those strong recommendations of the S3 guideline whose comprehensive implementation improves the provision of care in a defined population and whose ‘translation’ to an indicator is possible (Langer et al. 2017).

The implementation rate of these QIs, and thus the adherence to guideline recommendations, is monitored and evaluated through the certification system implemented by the German Cancer Society (DKG), which serves as one of the core elements of the quality assurance and improvement process for certified cancer centres (Langer et al. 2017).

The results of the QIs are regularly fed back to the GGPO guideline groups to ensure the best possible exchange between the development of evidence- and consensus-based recommendations and clinical routine practice (Beckmann et al. 2016). In the context of guideline updates, the existing quality indicators are also subject to the updating process. Here, the results of the quality indicators are reviewed, and a decision is made as to whether the quality indicator must be retained or changed or, in the case of complete implementation, can be discontinued (Langer et al. 2017).

As of January 2022, 31 tumour-specific and cross-sectional S3 guidelines had been published and 192 quality indicators derived. Of these, 108 quality indicators are implemented in 18 tumour-specific certification procedures in a total of 1,715 certified centres, including 142 outside of Germany.

In the present study, which was conducted within the scope of a qualifying thesis for a doctorate in medical science at the Charité University Medicine, we present an example from the gynaecological cancer centre (GCC) certification system of the German Cancer Society (DKG).

The certification system for GCCs was developed in 2008 by the DKG, the Working Group for Gynaecological Oncology (Arbeitsgemeinschaft Gynäkologische Onkologie [AGO]) and the German Society for Gynaecology and Obstetrics (DGGG) (Leitlinienprogramm Onkologie (Deutsche Krebsgesellschaft, Deutsche Krebshilfe, AWMF). S3-Leitlinie Diagnostik, Therapie und Nachsorge maligner Ovarialtumoren 2021). As of 2019, a total of 164 GCCs had been certified (Krebsgesellschaft e.V. Jahresbericht der zertifizierten Gynäkologischen Krebszentren 2020), and about 55% of all patients in Germany with a first diagnosis (primary case) of a gynaecological tumour in 2019 were treated in these certified GCCs (Krebsgesellschaft e.V. Jahresbericht der zertifizierten Gynäkologischen Krebszentren 2020). Many certified GCCs have also joined together in the AGO's working group AG Ovar and are part of the AGO's quality assurance program (QS-OVAR).

Gynaecological tumours consist of several entities that differ in incidence, therapy and prognosis. In 2017, approximately 38,000 women in Germany were diagnosed with a gynaecological neoplasm (Robert Koch Institut 2016).

The GCCs, like all other cancer centres of the DKG, are multidisciplinary and interprofessional networks of qualified partners that represent the entire chain of health care. They commit themselves to adhering to the defined quality standards (i.e., minimum case numbers, tumour boards, high expertise of all network partners, etc.) and transparently disclose the results of their key performance indicators and guideline-derived quality indicators to demonstrate their quality of care and guideline adherence and discuss, if necessary, improvement measures (Mensah et al. 2017).

Especially for gynaecological tumours, various studies have shown that the interdisciplinary cooperation and highly specialised surgical expertise of the clinic and surgeons as well as the surgical case volume have been of great benefit to patients and have had a relevant influence on the clinical outcome (Wright et al. 2011; Bristow et al. 2009; Bois et al. 2009; Munstedt et al. 2003).

The focus of this study is on two selected gynaecological tumours, namely ovarian cancer (OC) and cervical cancer (CC). For both tumour entities, S3 guidelines are available and regularly updated (Leitlinien Programm Onkologie (Deutsche Krebsgesellschaft, Deutsche Krebshilfe, AWMF). S3-Leitlinie Diagnostik, Therapie und Nachsorge maligner Ovarialtumoren; Leitlinien Programm Onkologie (Deutsche Krebsgesellschaft, Deutsche Krebshilfe, AWMF). S3-Leitlinie Diagnostik, Therapie und Nachsorge der Patientin mit Zervixkarzinom 2021), and in GCCs it has been obligatory to document QIs for these two entities since 2014 for OC and since 2015 for CC. For endometrial and vulvar tumours, QIs have been implemented only recently, in 2018 and 2016, respectively, and no S3 guideline is yet available for vulvar carcinoma.

Comprising 3.1% of all malignant neoplasms and 5.2% of all cancer deaths in women, ovarian cancer is the gynaecological cancer with the highest mortality rates (Wesselmann et al. 2014; Robert Koch Institut 2016), representing 19.2% of incident cases of gynaecological neoplasms (Robert Koch Institut 2016). Despite advances in screening and prevention measures, invasive cervical carcinoma, at 11.4% of cases, remains the third most common gynaecological neoplasm in women in Germany and worldwide (Robert Koch Institut 2016; Leitlinien Programm Onkologie (Deutsche Krebsgesellschaft Deutsche Krebshilfe, AWMF). Prävention des Zervixkarzinoms 2020).

Using the example of QIs for ovarian and cervical cancer, this study set out to investigate the development of the implementation rate over time, report results for the time period between 2015 and 2019, evaluate the status of guideline-compliant care and identify areas and corresponding measures to foster improvement. A further goal of this paper is to raise awareness of the potential of guideline-based QIs and their results to contribute to quality assurance and improvement in the clinical routine. The aim is to initiate a discussion and thus jointly define actions and measures to improve health service delivery to ovarian and cervical cancer patients.

Patients and methods

Data collection

Each GCC that intends to be (re-)certified must document fulfilment of the requirements. Annually, the results of key performance and quality indicators must be reported to OnkoZert, the independent certification institute that organizes the auditing procedure on behalf of the DKG. After collection from the centres, the datasets are analysed and tested for plausibility. Most indicators have target values or defined plausibility limits. If a centre's result falls outside these limits, i.e., in the case of deviation from the guideline recommendation, the centre must give a mandatory statement of reasons. When target values or plausibility thresholds are reached, centres do not have to give explanations for patients not treated accordingly. For successful certification, cancer centres have to meet the target value or give a plausible explanation if they do not (Adam et al. 2018).
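The reporting rule described above can be illustrated with a minimal Python sketch. It assumes that an indicator is a simple proportion (numerator over denominator) and that the target value acts as a lower bound; the function names and the 90% target are hypothetical and are not taken from the DKG data sheets, where some indicators may instead use upper limits or plausibility corridors.

```python
def qi_rate(numerator: int, denominator: int) -> float:
    """Return the QI result of one centre as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def needs_explanation(numerator: int, denominator: int, target_percent: float) -> bool:
    """Return True if the centre falls below the (assumed lower-bound) target value.

    In that case the centre would have to give a statement of reasons.
    """
    return qi_rate(numerator, denominator) < target_percent


# Hypothetical example: 17 of 20 eligible patients treated as recommended,
# checked against an assumed target value of 90%.
below_target = needs_explanation(17, 20, target_percent=90.0)
```

In this sketch a centre that reaches the target does not have to comment on the individual patients who were not treated according to the recommendation, which mirrors the rule stated in the text.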

Centres are audited regularly by trained gynaecological oncologic medical experts who check the reported data from the previous calendar year before the audit and have insight into patient files during the audit to verify the data. Only verified data are published in the benchmarking reports. For example, 2019 data are audited during 2020 and published in 2021. The data presented here are based on the 2015–2019 patient cohort. Only data from centres that were certified throughout the complete year and had no change in the tumour documentation system are included.

The QIs included in this study are derived according to a defined methodology (German Guideline Program in Oncology (German Cancer Society, German Cancer Aid, Association of the Scientific Medical Societies). Development of guideline-based quality indicators: methodology for the German Guideline Program in Oncology 2021) from the two evidence-based guidelines on the diagnosis, therapy and follow-up of malignant ovarian tumours and patients with cervical cancer published by the GGPO (Leitlinien Programm Onkologie (Deutsche Krebsgesellschaft, Deutsche Krebshilfe, AWMF). S3-Leitlinie Diagnostik, Therapie und Nachsorge maligner Ovarialtumoren 2021; Leitlinien Programm Onkologie (Deutsche Krebsgesellschaft, Deutsche Krebshilfe, AWMF). S3-Leitlinie Diagnostik, Therapie und Nachsorge der Patientin mit Zervixkarzinom 2021). The treatment guidelines, the corresponding QIs and the QI set collected via the certification programme are regularly updated. In this analysis, only QIs that were included in the DKG dataset from 2014 onward and were still included as of 2021 were taken into consideration. QIs that had been discontinued over time were not included in this analysis. An overview of discontinued QIs is given in Table 1.

Table 1 Discontinued QIs for Ovarian and Cervical Cancer

Data analyses

Descriptive analyses of the case distribution, patient numbers and indicator definitions were performed. QI results for patients with cervical cancer (CC) and ovarian cancer (OC) treated in GCCs between 2015 and 2019 were analysed. Only patients from GCCs that had certified status over the entire time period were considered. The median proportion of the centres and the overall proportion were calculated for every QI. Two-sided Cochran-Armitage tests were applied to detect trends over time. The standard deviations on the centre level over time were calculated to analyse fluctuations.
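The two summary measures named above differ in how centres are weighted, which the following Python sketch makes explicit. It assumes that each centre reports one numerator and one denominator per QI and year; the function names and the example counts are hypothetical and do not come from the study data.

```python
from statistics import median


def centre_rates(counts):
    """Per-centre QI rates in percent from (numerator, denominator) pairs.

    Centres with an empty denominator are skipped (an assumption of this sketch).
    """
    return [100.0 * num / den for num, den in counts if den > 0]


def median_centre_rate(counts):
    """Median of the centre-level rates: every centre counts equally."""
    return median(centre_rates(counts))


def overall_rate(counts):
    """Pooled rate: all patients of all centres counted together."""
    total_num = sum(num for num, _ in counts)
    total_den = sum(den for _, den in counts)
    return 100.0 * total_num / total_den


# Hypothetical counts for three centres in one year.
example_counts = [(9, 10), (30, 40), (12, 24)]
median_rate = median_centre_rate(example_counts)  # centre rates: 90%, 75%, 50%
pooled_rate = overall_rate(example_counts)        # 51 of 74 patients
```

The median is insensitive to centre size, whereas the pooled proportion is dominated by large centres, so the two values can differ for the same QI.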

Statistical analyses were performed using R version 3.5.1 and the Data-WhiteBox, a data analysis tool developed by OnkoZert. Cochran–Armitage tests were calculated using XLSTAT Version 2019.2.1, excluding centres that had missing values at any reporting point. A p-value ≤ 0.05 was considered statistically significant.

The data analysis and study concept were reviewed and approved by the ethics committee of Charité University Medicine in November 2021.

Results

The number of certified GCCs increased steadily from 112 in 2015 to 149 in 2019, and the number of patients with a primary diagnosis of a gynaecological malignancy treated in GCCs increased from 11,587 to 14,986. Thus, even though the incidence of OC and CC in Germany has been decreasing over time, from 7318 to 7292 and from 4606 to 4341, respectively (Robert Koch Institut 2016), the number of patients treated for these two tumour entities in GCCs has increased (OC: 3301 to 3798; CC: 2059 to 2479) (Krebsgesellschaft e.V. Jahresbericht der zertifizierten Gynäkologischen Krebszentren 2020).

The indicators are defined and categorized in Table 2, including the numerator, denominator and plausibility corridor for the reported QI results. QIs were divided into two categories, (1) process organization (PO-QIs) and (2) treatment procedures (TP-QIs), to allow a differentiated analysis that identifies areas and corresponding measures to foster improvement in the implementation rate.

Table 2 Definition of indicators ovarian carcinoma (numerator, denominator, evaluation of results and category)

Process organization QIs are defined as indicators that document the implementation of processes and structures explicitly recommended by the medical guideline within the certified network.

Treatment procedure QIs are defined as indicators that report on treatments performed by the members of the certified network, e.g., surgical interventions or recommendations for systemic therapies.

Five QIs were included in the category treatment procedures (four for OC, one for CC) and four QIs in process organization (one for OC, three for CC).

Table 3 presents the results of 9 QIs (5 OC, 4 CC) from 75 GCCs treating 17,495 OC primary cases (incident cases) and 10,969 CC primary cases between 2015 and 2019.

Table 3 Quality indicators for ovarian and cervical cancers; treatment years 2014–2019

The implementation rate for PO-QIs that reflect the application of processes and structures either remained stable at a very high level or increased steadily over time to a very high level (e.g., CC: details in pathology report for lymphonodectomy, median 2015: 88.0% to 2019: 97.8%; OC: operation of advanced ovarian carcinoma by a gynaecological oncologist, median 2014: 100.0% to 2019: 100.0%).

The implementation rate for TP-QIs that report on treatment methods is high overall, yet the median fluctuates slightly over time (e.g., OC: macroscopic complete resection advanced OC, median 2014: 58.8%; 2015: 62.5%; 2016: 70.0%; 2017: 69.6%; 2018: 68.3%; 2019: 75.0%).

Breaking down the TP-QI category further, TP-QIs that address recommendations for systemic therapy show a good to very good implementation rate; however, the analysis indicates that the median is not only fluctuating but decreasing over time (OC: post-operative chemotherapy advanced ovarian carcinoma—median 2014: 94.6% to 2019: 88.9%; OC: first-line chemotherapy of advanced ovarian carcinoma—median 2014: 69.2% to 2019: 60.1%).

By contrast, the overall median for TP-QI results referring to surgical interventions shows a good to very good implementation rate, which increased over the past 4 years, although the median fluctuates over time (QI 1 surgical staging in early OC, median 2014: 75.0% to 2019: 81.8%; QI 2 macroscopic complete resection advanced OC, median 2014: 58.8% to 2019: 75.0%).

Using the annual QI quota of each centre, the SD was calculated per centre, and the overall mean SD of all QIs is displayed in a boxplot diagram in Fig. 1a, b. Analysis of the implementation rate on the individual centre level shows that the results within one centre can vary over time. The mean SD for PO-QIs is the lowest, between 4.4 and 18.2 (e.g., QI 14 presentation at the tumour board CC, mean SD 4.4), the mean SD for TP-QIs that address systemic therapies lies between 11.8 and 16.2 (e.g., QI 12 post-operative chemotherapy for advanced OC, mean SD 11.8), and the mean SD for TP-QIs reporting surgical intervention is the highest, between 15.0 and 19.1 (e.g., QI 1 surgical staging early OC cumulative mean SD 19.1).
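The fluctuation measure used here can be sketched in Python as follows. The sketch assumes that the SD is first taken over the annual rates of each centre and then averaged across centres, and it uses the sample SD (n - 1 in the denominator); the text does not state which SD variant was applied, and the function name and example rates are hypothetical.

```python
from statistics import mean, stdev


def mean_centre_sd(annual_rates_by_centre):
    """Mean of the per-centre standard deviations of annual QI rates (in percent).

    annual_rates_by_centre maps a centre identifier to its list of yearly rates.
    Centres with fewer than two reported years are skipped, because a sample SD
    cannot be computed for them.
    """
    sds = [
        stdev(rates)
        for rates in annual_rates_by_centre.values()
        if len(rates) >= 2
    ]
    return mean(sds)


# Hypothetical annual rates for two centres over three years.
example_rates = {
    "centre_a": [80.0, 90.0, 100.0],  # fluctuating centre, SD = 10
    "centre_b": [70.0, 70.0, 70.0],   # stable centre, SD = 0
}
fluctuation = mean_centre_sd(example_rates)
```

A low value means that most centres report similar rates from year to year, whereas a high value indicates that results within individual centres vary strongly over time.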

Fig. 1 Means of overall standard deviations of centres' annual quotas for QIs evaluated between 2014 and 2019

The Cochran-Armitage test shows positive trends for five out of nine QIs: four QIs in the treatment procedures category and one QI in the process organization category. Trend analyses were conducted over the course of 4 years for QI 2 ‘macroscopic complete resection advanced OC’, QI 4 ‘postoperative chemotherapy advanced OC’ and QI 5 ‘first-line chemotherapy of advanced OC’. For QI 9 ‘cytological/histological lymph node staging’, the analysis was conducted over the course of 3 years.

Discussion

This article presents, for the first time, a differentiated overview of the implementation level and development of guideline-derived QI results for OC and CC in certified GCCs.

The results of the evaluated QIs show that the recommendations of the guidelines are implemented to a high or very high extent in the certified GCCs. The quality of care is made visible, and results can be compared between centres. Grouping the analysed QIs into two categories—process organization and treatment procedures—offers the opportunity to assess the improvement potential of QIs in a differentiated way and allows identification of suitable measures for improvement, which can be implemented in the certified centres.

QIs that reflect the implementation of processes and structures within the certified networks are very well applied. The results illustrate that QIs related to procedural aspects have a very high implementation rate (2019: QI 3: 100%; QI 6: 100%, QI 7: 92.3%; QI 8: 97.8%). The excellent implementation rate of this category of QIs has often been realized right from its introduction (e.g., QI 1 and QI 6 each 2015: 100% and 2019: 100%) and is maintained over time. For instance, mandating that surgical therapy for advanced ovarian cancer can only be performed by specialized gynaecologists not only improves outcomes and lengthens survival (Bois et al. 2009; Munstedt et al. 2003; Begg et al. 1998; Junor et al. 1999) but is also easily achievable via a top-down process arrangement. The same process can be applied within the network and to cooperation partners regarding implementation of QI 6 (tumour board presentation rate) and the definition of mandatory information to be included in pathology reports, such as initial diagnosis, tumour resection and, if applicable, indication that lymphadenectomy is complete (QI 7 and QI 8).

These procedural QIs have a tremendous influence on the quality of patient care, while being relatively easy to implement in GCCs, e.g., through standard operating procedures and handling instructions. This is also shown by a consistently high implementation rate and low mean SD of the PO-QIs on the individual centre level. Hence, in principle, these indicators and the corresponding target values are easily reachable for every certified centre, while taking into account justifiable individual cases such as emergency surgery that prevents presentation at the pre-therapeutic tumour board. In the case of repeated unjustified non-fulfilment of this indicator group, a ‘deviation’ will be issued in the audit. An ultimate failure to fulfil the indicators can lead to withdrawal of the certificate.

Results from QIs that report on treatment procedures such as surgical interventions and recommendations for systemic therapy present a slightly different picture. When evaluating adherence to recommendations for treatment procedures, it must be considered that situations in routine care are very complex and that conclusions about quality of care cannot readily be drawn from raw QI data (Junor et al. 1999). For example, QI results that do not reach a pre-defined threshold (target value) do not necessarily indicate insufficient performance on the part of the providers. Under such circumstances, additional information is needed to decide whether quality of care is adequate or not (Junor et al. 1999). Therefore, the explanations given by the certified centres are discussed with the auditor during the on-site audit and checked through random samples of patient files. If the centres' explanations do not seem adequate, the auditors issue ‘deviations’ that need to be remedied by the centres (Kowalski et al. 2017). If the explanations are plausible and justifiable, no further action is required.

QIs that call for the implementation of systemic therapies in line with the guideline recommendations show a good yet decreasing implementation rate over time in this analysis (QI 4: 2014 94.6% to 2019 88.9%; QI 5: 2014 69.2% to 2019 60.3%). For both QIs, the explanations from centres that fell below the target value were mainly patient-related (i.e., patient death after surgery, patient wish, existing comorbidities and/or poor general health, therapy termination due to side effects). For QI 5 (first-line chemotherapy of advanced OC), comorbidities and poor general health often also led to changes in the therapy regimen. Treatment ex domo, i.e., outside the network, and the time of data reporting (patients can only be counted in the numerator once the therapy is completed) were named as reasons why patients were missing from the numerator even though the recommendation for chemotherapy had been made during the tumour board. It must be kept in mind that written explanations only have to be provided if the result is below the threshold (QI 4 < 30%; QI 5 < 20%), i.e., if the number of eligible patients in the numerator or the median decreases but remains above the threshold, the certified GCCs do not have to provide a reason.

Thus, based on this preliminary evaluation, it can be argued that, in contrast to the results of the PO-QIs, the implementation rate for QIs documenting the application of systemic therapies reaches a plateau where the guideline recommendation is known to the practitioners, but patient-related factors prevent a further meaningful increase in the rate. Hence, fluctuations of the implementation rate and a higher mean SD of these TP-QIs on the individual centre level are to be expected. The decreasing implementation rate could be related to older patient age, the existence of multiple comorbidities and/or the use of other therapy regimens. Unfortunately, this cannot be explored further with the present data set, as socio-demographic information and detailed information about comorbidities are either not yet available or too superficial.

By contrast, TP-QIs that report on surgical interventions offer more room for improvement measures. This set of QIs reflects not only patient-related factors (i.e., comorbidities, poor overall health status, patient rejection of surgery) but also the professional expertise of the surgical team. Surgical therapy is one of the fundamental pillars of the treatment strategy for OC and CC. Not only is it the most important diagnostic instrument; it also has a direct and strong influence on prognosis and is part of a mostly multimodal and interdisciplinary therapy concept (Sehouli et al. 2019). The data show an increase over time but, like the QIs reporting on systemic therapy, also reach a plateau in the implementation rate (i.e., QI 1 2014: 75% to 2019: 81.8%; QI 2 2014: 58.8% to 2019: 75.0%; QI 9 2015: 63.2% to 2019: 72.9%). While keeping in mind that the denominator of the surgical QIs was often small, explanations for not meeting the QI 9 (cytological/histological lymph node staging) target value mostly included the application of radiochemotherapy prior to cytological/histological lymph node staging. For QI 2 (macroscopic complete resection of advanced OC), the existence of multiple (distant) metastases was given as the most frequent reason for an incomplete macroscopic resection. As reported above, some patients also decided to undergo the procedures outside of the certified network. However, besides patient-related topics, the most frequent reasons for not reaching the QI target value included an inoperable situs due to advanced spread of the carcinoma or an intraoperative assessment that deemed the surgery not possible. In the case of QI 2, it was stated several times that the tumour could only be reduced in size but not removed. The data unfortunately do not allow us to assess whether other surgical teams would have come to different conclusions and assessments. During the audit, auditors and physicians of the GCC discuss whether the results are justifiable, but explanations regarding the deviations are typically brief and often superficial (Inwald et al. 2019).

The following further limitations need to be pointed out in the light of the data interpretation. Firstly, only aggregate data are submitted by the individual centres, hence assessment of individual patients’ information regarding case severity or socio-demographics is not possible. Secondly, the centres included in this analysis could be prone to a selection bias as often only centres that are already performing well join quality assurance programmes. Also, the data investigated here cannot be linked to survival data from registries.

For the surgical TP-QIs, the most relevant factors are the personal skills of the practitioners combined with the technical prerequisites, which opens up opportunities to identify measures for improvement. Thus, measures to improve the implementation rate of this QI set could, besides the discussion of results amongst peers during the audit, additionally include offers of surgical courses or coaching.

Interestingly, the data also show that on the individual centre level, the results for macroscopic complete resection, surgical staging of early OC and cytological/histological LN staging can vary widely from one year to another, with an overall standard deviation of up to 19. Reasons for these fluctuations cannot be provided with the currently available data. When interpreting the results, we must bear in mind the primary purpose of data collection, i.e., creating a basis for the decision of whether or not the certificate should be issued (Inwald et al. 2019). Further investigation is thus necessary. Notwithstanding, one hypothesis could be that staff changes in the surgical team explain why several centres with high indicator results in one year have lower results in the following year. It could also be argued that the certified GCCs that maintain a constantly high implementation rate provide a good environment for surgeons in training and could be the ones selected to offer coaching courses for other GCCs.

Conclusion

To achieve the best possible treatment outcomes for women with gynaecological malignancies, synergistic collaboration across all disciplines and professional groups involved in oncological care as well as the pursuit of specialization by physicians are important elements (Wesselmann et al. 2014).

QIs support the establishment of guideline-based treatment in everyday clinical practice and motivate practitioners to critically reflect on their treatment results. In the audit procedures, these results are discussed, and measures are identified that enable better application of the guideline contents. The effectiveness of these measures is reviewed in the next audit 1 year later. The results of the QIs are reported to the medical guideline development groups and provide information on how and to what extent a recommendation is implemented in everyday clinical practice, thus offering additional suggestions for further development of the guidelines. Furthermore, the results of this analysis, with a focus on ovarian and cervical cancer, suggest that dividing the analysed QIs into two categories, process organization and treatment procedures, provides an opportunity to evaluate the improvement potential of QIs in a differentiated way and to determine appropriate improvement measures. They also show that a combination of different measures is necessary to anchor quality sustainably in health care and thus improve it.