Introduction

Breast cancer (BC) screening has been widely implemented since the 1990s, and a reduction in BC mortality has been observed since its implementation [1,2,3]. Besides reducing BC mortality, screening may lead to other health benefits. Screening allows detection of BC at an earlier stage than clinical detection through symptoms. Thus, it is reasonable to hypothesize that women with screen-detected cancer experience better quality of life (QoL) than women with clinically detected cancer, because BC in this group is more often detected at an early stage and treatment is therefore more often less intensive [4].

Studies on differences in QoL between women with screen-detected and clinically detected BC have shown mixed results. For example, a retrospective study from Germany including 735 women with BC found that mode of detection was not associated with QoL as measured using the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ) C30 and BR23 questionnaires [5]. In Norway, by contrast, a QoL analysis of 4,487 women using the EuroQol 5-Dimension 5-Level (EQ-5D-5L) showed that women with screen-detected cancer reported a better QoL than women with symptomatically detected cancer [6]. Furthermore, to the best of our knowledge, no study has investigated the QoL of BC patients by mode of detection at different time points. Moreover, the magnitude and clinical relevance of QoL differences between women with screen-detected and clinically detected BC have not been adequately researched. Investigating these QoL differences, along with their magnitude and clinical relevance, may better inform women about the potential impact of screening on QoL.

In the Netherlands, BC screening has been in place since 1990. Currently, Dutch female residents are offered mammography screening every two years from the age of 50 to 75. In 2022, the participation rate was 70.7%, which is considered an acceptable level of participation according to the European guidelines for quality assurance in breast cancer screening and diagnosis [7, 8]. In this study, our objective is to compare QoL between women with screen-detected and clinically detected BC in the target population for screening in the Netherlands. To a large extent, these two groups represent different stages of cancer at diagnosis as a consequence of screening.

Materials and methods

Study design and data source

This study was conducted with data from the observational, prospective, multicentre ‘Utrecht cohort for Multiple BREast cancer intervention studies and Long-term evaluation’ (UMBRELLA). UMBRELLA includes women with BC or ductal carcinoma in situ prior to surgery or radiotherapy in several hospitals in the Netherlands [9]. The hospitals participating in UMBRELLA include the Utrecht University Medical Centre, Alexander Monro Hospital Bilthoven, St. Antonius Hospital Utrecht, Alrijne Hospital Leiderdorp, and ZGT hospital Twente in the Netherlands. Women who are younger than 18 and/or have a limited understanding of the Dutch language are not eligible for inclusion in the UMBRELLA cohort [9]. Participants provided consent for the collection and use of clinical data and patient-reported outcomes (PROs) at regular intervals up to 10 years after cohort enrollment [9]. Ethical approval for the UMBRELLA study, registered on clinicaltrials.gov (ID: NCT02839863), was provided by the University Medical Centre Utrecht (NL52651.041.15, Medical Ethics Committee 15/165).

Within UMBRELLA, multiple PROs were measured periodically using validated Dutch-language questionnaires. In this study, we analysed the QoL questionnaires developed by the EORTC: the EORTC QLQ-C30, supplemented with a questionnaire module specifically for BC patients, the EORTC QLQ-BR23 [10, 11]. The EORTC QLQ-C30 is a measure for the assessment of health-related QoL of cancer patients. It consists of multi-item scales and single-item measures related to general health, functional QoL, and symptoms [10]. The EORTC QLQ-BR23 is a validated BC-specific QoL questionnaire module that has 23 questions covering functional QoL and symptom-related items and scales [11]. For functional and summary scores, a higher score means a better QoL, whereas for symptom scales and items, a higher score indicates greater severity of reported symptoms and should be interpreted as less favourable. The summary score was calculated as the average value of 13 scales and items of the EORTC QLQ-C30 (excluding global QoL and financial difficulties) [12]. This summary score provides a single measure to assess overall QoL [13]. In addition, as depression and anxiety are prevalent among women with BC but not explicitly addressed in the EORTC QLQ-C30 or QLQ-BR23 [14], we used the Hospital Anxiety and Depression Scale (HADS) to assess anxiety and depression symptoms [15]. For this study, we analysed all questionnaires that were completed shortly after diagnosis, mostly prior to treatment (T1), and one year after treatment (T2).
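The summary score described above can be illustrated with a minimal sketch. It assumes the commonly used convention that symptom scales are reversed (100 minus the score) before averaging, so that a higher value always means better QoL. The scale names and example values are hypothetical and are not study data.

```python
from statistics import mean

# Hypothetical keys for the 13 QLQ-C30 scales and items used in the summary
# score (global QoL and financial difficulties are excluded).
FUNCTIONAL = ["physical", "role", "emotional", "cognitive", "social"]
SYMPTOM = ["fatigue", "nausea_vomiting", "pain", "dyspnoea", "insomnia",
           "appetite_loss", "constipation", "diarrhoea"]


def summary_score(scores):
    """Average the 13 scale scores (each 0-100) after reversing symptom scales.

    Assumption: symptom scores are reversed as 100 - score so that higher
    values are more favourable on every scale before averaging.
    """
    values = [scores[name] for name in FUNCTIONAL]
    values += [100.0 - scores[name] for name in SYMPTOM]
    return mean(values)


# Illustrative respondent: all functional scales at 80, all symptom scales at 20.
example = {name: 80.0 for name in FUNCTIONAL}
example.update({name: 20.0 for name in SYMPTOM})
print(summary_score(example))
```

In this sketch a respondent with a missing scale raises a KeyError; how missing scales were handled in the study is not stated here.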

To evaluate how the questionnaire results might differ between BC patients and the general population, normative scores for the general Dutch female population aged 50 to 75 on EORTC-QLQ-C30, HADS, and sexual-related questions of EORTC-QLQ-BR23 were obtained via the PROFILES (Patient Reported Outcome Following Initial Treatment and Long-term Evaluation of Survivorship) registry (2011 version) [16].

Participants

In this study, we used data from patients who were recruited within the UMBRELLA cohort between October 2013 and March 2022. We excluded participants with an unknown detection mode, unknown clinical information, or metastatic disease at diagnosis; men; participants aged under 50 or above 75; participants with a history of tumour in the same breast; and participants who did not complete the survey at T1 and/or T2.

Statistical analyses

We stratified eligible BC patients by detection mode, i.e., screen-detected or clinically detected. We calculated the average scores and standard deviations (SD) of all items in the EORTC QLQ-C30, EORTC QLQ-BR23, and HADS separately for T1 and T2. The calculation of each questionnaire scale was performed based on published questionnaire scoring manuals [17, 18]. We also calculated the average score changes between T2 and T1.
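As a rough illustration of the scoring step, the sketch below applies the linear transformation to a 0-100 scale as it is generally described for EORTC questionnaires: the raw score is the mean of the item responses, and functional scales are reversed so that higher is better. The function names and example responses are hypothetical, and the scoring manuals [17, 18] remain the authoritative source, including for missing-item rules that are not covered here.

```python
def scale_score(item_responses, functional, item_range=3):
    """Transform item responses on one scale to a 0-100 score.

    item_responses: answers coded from 1 upward (1-4 for most items).
    functional: True for functional scales (higher = better functioning),
                False for symptom scales and items (higher = more symptoms).
    item_range: maximum minus minimum response value (3 for 1-4 items).
    """
    raw = sum(item_responses) / len(item_responses)
    fraction = (raw - 1) / item_range
    if functional:
        return (1 - fraction) * 100
    return fraction * 100


def score_change(score_t1, score_t2):
    """Change between the initial (T1) and follow-up (T2) survey."""
    return score_t2 - score_t1


# Hypothetical answers to a five-item functional scale and a three-item symptom scale.
print(round(scale_score([1, 2, 1, 2, 1], functional=True), 1))
print(round(scale_score([2, 2, 2], functional=False), 1))
```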

The distribution of each variable was visually examined using Q-Q plots. Although some variables were not normally distributed, independent t-tests were used to compare all questionnaire items between women with screen-detected and clinically detected BC. When analysing public health data where the sample size is not small (guidelines suggest a minimum of 30 to assume normality under the central limit theorem), a parametric test such as the independent t-test is robust even to non-normally distributed and severely skewed data [19, 20].
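The comparison can be sketched as follows, assuming the equal-variance (pooled) form of the independent t-test; which variant was used is not stated. To stay within the Python standard library, the two-sided p-value is approximated with the standard normal distribution, which is only reasonable for large groups; a real analysis would use a t-distribution routine from a statistics package. The data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance independent t-test.

    Returns (t, degrees_of_freedom, approximate two-sided p-value).
    The p-value uses a normal approximation, adequate only for large samples.
    """
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Tiny hypothetical samples, used only to show the arithmetic.
t, df, p = independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, round(p, 3))
```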

For HADS, in addition to the average score, we calculated the proportions of women who experienced anxiety symptoms (HADS anxiety scale cut-off > 8), depressive symptoms (HADS depression scale cut-off > 8), and who had an indication of a psychiatric disorder (total score cut-off > 12). To test the differences in these proportions between women with screen-detected BC and women with clinically detected BC, we performed chi-square tests.
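A minimal sketch of this step is given below. It assumes a Pearson chi-square test on a 2x2 table without continuity correction; whether a correction was applied is not stated. For one degree of freedom the p-value can be obtained exactly from the complementary error function. All scores and counts are hypothetical.

```python
from math import erfc, sqrt


def proportion_above(scores, cutoff):
    """Share of respondents whose score exceeds the cut-off (score > cutoff)."""
    return sum(score > cutoff for score in scores) / len(scores)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows are the two detection modes; columns are 'above cut-off' and
    'not above cut-off'. No continuity correction is applied (assumption).
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical HADS anxiety scores and hypothetical group counts.
print(proportion_above([5, 9, 12, 8], cutoff=8))
chi2, p = chi_square_2x2(20, 10, 10, 20)
print(round(chi2, 2), round(p, 4))
```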

To assess the magnitude of the difference in QoL between women with screen-detected and clinically detected BC, we calculated Cohen’s d for each questionnaire item, an index of the strength of the difference between groups [21]. The difference (irrespective of its sign) is classified as trivial (0.0 ≤ d < 0.2), small (0.2 ≤ d < 0.5), medium (0.5 ≤ d < 0.8), or large (d ≥ 0.8) [21]. Furthermore, for each QLQ-C30 questionnaire item and scale, we categorized its clinical relevance based on a previously published guideline on interpreting differences in self-reported QoL between different treatment groups [22]. Clinical relevance indicates the practical significance or importance of QoL differences within clinical settings. As previous studies report conflicting results on the significance of QoL differences among women with BC stratified by detection mode, we expected that some scales and items would have limited clinical relevance [5, 6]. Thus, the value between a trivial and a small difference, based on a previous study, was chosen as the clinical relevance cut-off in this analysis [22]. We did not assess the clinical relevance of the BR23 and HADS differences because, to our knowledge, there is not yet any published guideline on minimal clinically important differences in outcomes between different treatment groups among cancer patients for these two questionnaires.
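The effect size classification can be sketched as below, assuming Cohen's d is computed with the pooled standard deviation of the two groups (the exact variant is not specified). The means, standard deviations, and group sizes in the example are hypothetical.

```python
from math import sqrt


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from summary statistics, using the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd


def classify_effect(d):
    """Label the magnitude of d, ignoring its sign."""
    size = abs(d)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical scale means of 86 and 83 with a standard deviation of 10 in both groups.
d = cohens_d(86.0, 10.0, 100, 83.0, 10.0, 100)
print(round(d, 2), classify_effect(d))
```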

Results

Between October 2013 and March 2022, 4,162 women were enrolled in UMBRELLA. After applying the exclusion criteria, a total of 1,171 women remained, comprising 691 (59%) women with screen-detected BC and 480 (41%) women with clinically detected BC (Fig. 1). Characteristics such as age, stage, surgery, and treatment are shown in Table 1. Furthermore, we performed 87 independent t-tests and chi-square tests to analyse all questionnaire scales and items. This potentially increases the probability of committing Type I errors. Therefore, the p-value threshold was adjusted using the Bonferroni correction method, resulting in a p-value threshold of 0.00057 (0.05 divided by 87).
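The adjusted threshold follows directly from dividing the conventional significance level by the number of tests, as this short sketch shows. The function names and the example p-values are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, threshold):
    """A test counts as significant only if p falls below the adjusted threshold."""
    return p_value < threshold


threshold = bonferroni_threshold(0.05, 87)
print(f"{threshold:.5f}")  # 0.00057

# Hypothetical p-values: the second would pass at 0.05 but not after correction.
print(is_significant(0.0001, threshold), is_significant(0.01, threshold))
```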

Fig. 1 Flowchart illustrating the initial population of women with BC included in the UMBRELLA project, followed by exclusion and classification into screen-detected and clinically detected groups

Characteristics of participants

There was a substantial difference (p < 0.0001) in stage distribution between women with screen-detected and clinically detected BC, showing that women detected at an earlier stage are more represented in the screen-detected group than in the clinically detected group (Table 1). The combined proportion of Ductal Carcinoma in Situ (DCIS) and Stage 1 cases among women with screen-detected cancer was 75.6%, while it was 45.8% for women with clinically detected cancer.

Furthermore, we found a substantially lower proportion of women who underwent mastectomy among those who were screen-detected (9.3%) as compared to those who were clinically detected (18.8%) (p < 0.0001) (Table 1). A smaller proportion of women with screen-detected cancer received neo-adjuvant treatment (7.1%) as compared to women with clinically detected BCs (24.6%) (p < 0.0001). Similar results were found for chemotherapy, targeted therapy, and endocrine therapy. The median time interval between study inclusion and the baseline survey was 13 days for women with screen-detected cancers and 14 days for women with clinically detected cancers.

Table 1 Characteristics of women with BC (n = 1,171) stratified by mode of detection

Quality of life differences between mode of detection

In general, women with screen-detected BC had more favourable scores on questionnaires measuring QoL than women with clinically detected cancers (Tables 2 and 3). Score differences in QoL between women with screen-detected and women with clinically detected BC were found in both the initial survey (T1) and the follow-up survey (T2).

At T1, women with screen-detected BC showed statistically significantly better scores on three scales and items of the QLQ-C30 questionnaire, namely fatigue, appetite loss, and the summary score, compared to women with clinically detected cancer (Table 2). Additionally, the score differences of four scales and items in the QLQ-BR23 (body image, future perspective, side effects of systemic therapy, and arm symptoms) between women with screen-detected and clinically detected BC were also statistically significant, favouring women with screen-detected BC (Table 3).

At T2, women with screen-detected BC showed statistically significantly better scores across seven scales and items of the QLQ-C30 questionnaire, namely general health, physical functioning, cognitive functioning, social functioning, fatigue, constipation, and the summary score, compared to women with clinically detected BC (Table 2). In the QLQ-BR23 questionnaire at T2, the score differences for body image, side effects of systemic therapy, breast symptoms, and arm symptoms were also statistically significant in favour of women with screen-detected BC (Table 3). Thus, statistically significant score differences between women with screen-detected and clinically detected BC were found on more questionnaire scales and items at T2 than at T1. No scale of the HADS questionnaire showed a statistically significant difference in scores at T1 or T2.

The effect sizes (Cohen’s d) for all questionnaire scales and items ranged from 0.00 to 0.39, irrespective of sign. For example, Cohen’s d for sexual functioning at T1 was 0.04 in favour of women with screen-detected BC, suggesting a trivial difference (Table 3). Also at T1, Cohen’s d for side effects of systemic therapy was 0.39 in favour of women with screen-detected BC, indicating a small difference. These findings indicate that, in general, the QoL differences between women with screen-detected and clinically detected BC can be considered either trivial or small.

Regarding the clinical relevance of score differences in EORTC QLQ-C30 scales and items, there were almost no clinically relevant score differences between detection modes at T1 (Table 2). The only exception was the score difference in emotional functioning, which was considered to have limited clinical relevance in favour of women with screen-detected BC. At T2, average score differences for general health, emotional functioning, fatigue, and constipation between the screen-detected and clinically detected groups were considered to have small clinical relevance in favour of women with screen-detected BC [22].

Table 2 EORTC QLQ-C30 questionnaire scales and items compared between women with screen-detected and clinically detected BC measured in the initial survey (T1) and 12 months post-treatment follow-up survey (T2). Population normative values were obtained from the PROFILES study [15]
Table 3 EORTC QLQ-BR23 and HADS questionnaire scales and items compared between women with screen-detected and clinically detected BC measured in the initial survey (T1) and 12 months post-treatment follow-up survey (T2). Population normative values were obtained from the PROFILES study

Using the p-value threshold of 0.00057, we found no statistically significant difference in questionnaire score change between the initial (T1) and follow-up (T2) surveys between women with screen-detected and clinically detected BC (Table 4).

Table 4 Questionnaire score changes between initial and follow-up survey compared between women with screen-detected and clinically detected BC

Discussion

Quality of life differences between mode of detection

The results of this study showed that women with screen-detected BC reported statistically significantly better QoL than women with clinically detected BC up to one year post-treatment. However, the difference is small (Cohen’s d between 0.00 and 0.39) and its clinical relevance is limited. We also found that self-reported QoL differences according to detection mode are more prominent one year after treatment than shortly after diagnosis.

The better QoL reported by women with screen-detected cancer may be explained by the higher proportion of screen-detected cancers found at earlier stages. The difference in stage at detection affects treatment choices [23, 24], and less invasive treatment may result in a better QoL. In a previous study, women who received less invasive breast-conserving therapy reported better QoL than women who underwent mastectomy [25]. Another study of women with BC reported that chemotherapy and endocrine therapy have a detrimental impact on QoL, with the impact of endocrine therapy persisting for at least 2 years after diagnosis [26]. Post-menopausal patients who received endocrine therapy also reported lower QoL [27]. Another important consideration when evaluating the QoL of women with BC according to mode of detection is that screening detects BC at an earlier stage and age. Without screening, women diagnosed with BC through screening in this study might have been diagnosed clinically at a later time, very likely at an older age. Therefore, we did not adjust for age and stage in this study, as they are intermediary variables rather than confounders.

There were more differences, and more statistically significant differences, between the two modes of detection at T2 than at T1. The difference in the QLQ-C30 summary score between detection modes was also larger at T2 than at T1 (Table 2). Looking at the scales and items whose scores differed statistically significantly, it appears that at T2 functional QoL had improved and symptoms had decreased in both groups. These findings imply that the impact of detecting BC earlier on QoL is larger one year after treatment than shortly after diagnosis. While previous studies have shown QoL differences between modes of detection at a single time point of measurement [5, 6], our study is, to our knowledge, the first to show QoL differences between women with screen-detected and clinically detected BC at two different time points, shortly after diagnosis and one year after treatment.

At T1, there was no statistically significant difference in any QLQ-C30 functional QoL scale or item between the two groups (Table 2). However, the score differences between modes of detection for one functional scale, namely body image, and one functional item, future perspective, of the QLQ-BR23 were statistically significant at T1 (Table 3). This suggests that the QLQ-BR23 may be more sensitive than the QLQ-C30 at detecting differences in functional QoL between screen-detected and clinically detected BC patients. This can be expected, as the QLQ-BR23 questionnaire is tailored specifically to BC patients, whereas the QLQ-C30 aims to assess the QoL of cancer patients in general [10, 11]. Therefore, the way questions are formulated in the QLQ-BR23 might better capture the BC-specific circumstances experienced by the patients.

Although statistically significant differences in QoL between modes of detection were found using the QLQ-C30 as early as shortly after diagnosis, the clinical relevance of these QoL differences is limited. Score differences on the emotional functioning scale between the screen-detected and clinically detected groups were found to be clinically relevant at both time points, shortly after diagnosis and one year post-treatment. In contrast, clinically relevant score differences on other scales and items, such as general health, constipation, and fatigue, were only observed one year post-treatment. For clinicians, these findings may contextualize statistical results and help identify the specific aspects of QoL that differ between screen-detected and clinically detected BC patients in clinical settings.

The inclusion of the HADS questionnaire in this study provided valuable insights into the psychological well-being of the participants. While there was no statistically significant difference in HADS scores between the two groups, it is noteworthy that women with screen-detected cancer consistently reported fewer symptoms of anxiety and depression (Table 3). This finding is consistent with a previous study from Ireland [28]. Another noteworthy finding is that at T2, the proportion of women experiencing anxiety and depressive symptoms seemed to be generally lower than at T1.

We did not perform statistical tests comparing the scores of women with BC (for either detection mode) with the scores of the general population, as this was not the aim of this study. Thus, whether the scores differ statistically significantly remains unknown. However, looking at the average scores alone, we found no specific pattern in how the self-reported QoL of BC patients (regardless of mode of detection) compared with that of the general population. For example, women with screen-detected and clinically detected BC appeared to have a lower EORTC QLQ-C30 summary score than the general population at T1 (86.1, 83.0, and 88.0, respectively) (Table 2). At T2, the average summary score for women with screen-detected BC seemed to have improved (88.3), making it appear slightly higher than that of the general population (88.0). Another example is the score for physical functioning (Table 2). At T1, both groups appeared to have higher scores than the general population (85.4), with scores of 87.9 for women with screen-detected cancer and 85.5 for women with clinically detected cancer. Similarly, at T2, the scores appeared to remain higher, at 88.8 and 85.6, respectively.

For some questionnaire items and scales, women with BC even showed more favourable scores than the corresponding normative values. This might be caused by positive feelings experienced by women with BC because they have survived so far. Women may also value their QoL differently after being diagnosed with and surviving cancer. Another possible cause is that BC is more prevalent in women with higher socioeconomic status, and women with higher socioeconomic status tend to report better QoL [29, 30]. Additionally, healthier patients or patients who had fewer symptoms may have been more likely to participate in this survey, although this kind of selection might occur in both groups, that is, among women with screen-detected and women with clinically detected BC.

Strengths and limitations

The first strength of this study is that QoL was measured at two different time points. This makes it possible to capture the QoL dynamics experienced by women with screen-detected and clinically detected BC up to one year after treatment. Secondly, we complemented the EORTC QLQ-C30 with a questionnaire module specifically for BC patients, the QLQ-BR23. In addition, we used the HADS questionnaire to assess depression and anxiety symptoms among BC patients. This approach allows a comprehensive assessment of QoL among women with BC, considering both physical and psychological aspects.

Our inclusion period was between October 2013 and March 2022, which means it included the period of the COVID-19 pandemic. This might raise concern about how the pandemic affected the results of this study. In the Netherlands, the national BC screening programme was suspended from March 2020 until June 2020 due to the COVID-19 pandemic. A relevant study examining the pandemic’s impact on BC diagnosis and treatment in the Netherlands found that while there was a reduction in the incidence of lower-stage disease, treatment delays were limited to the first eight weeks of the pandemic. Patients diagnosed thereafter experienced no significant delays in their initial treatments, although there was a notable shift in treatment strategies, with an increased use of primary hormonal treatments instead of surgical options [31]. Moreover, any delay or change in diagnostics and treatment might affect all women recruited during the pandemic, regardless of whether their BC was screen-detected or clinically detected. Furthermore, the majority of women included in this study were recruited before the pandemic. Therefore, we assume that the impact of the pandemic on the results of this study is minimal.

Certain biological aspects may not be fully captured by this study. Evidence suggests that women with Human Epidermal Growth Factor Receptor 2 (HER2)-positive BC and Triple Negative BC (TNBC) are less frequently screen-detected. TNBC is challenging to detect on mammography as it often lacks detectable features [32]. Both types are aggressive and may require intensive treatment, potentially impacting QoL [33, 34]. Therefore, being screen-detected does not always correlate with better QoL, as BC subtypes like TNBC and HER2-positive BC are less likely to be found through screening. Additionally, the Bonferroni correction method in this study resulted in a more conservative p-value threshold (0.00057) than the traditionally chosen p-value threshold (0.05). While this approach minimized the probability of committing a Type I error, it increased the probability of a Type II error [35]. Thus, it is possible that more questionnaire items actually differ statistically significantly between women with screen-detected and clinically detected BC. The result for the QLQ-C30 emotional functioning scale at T2 (Table 2), which showed clinical relevance but no statistical significance, may indicate a Type II error. Nevertheless, the main result of this research largely remains unaffected: the magnitude and clinical relevance of the QoL difference between the two detection modes are marginal, regardless of the number of scales or items displaying statistically significant score differences.

Conclusions

In the target population for screening in the Netherlands, we found that women with screen-detected BC reported statistically significantly better QoL than women with clinically detected BC. The QoL differences were larger one year after treatment than shortly after diagnosis. However, the magnitude and clinical relevance of these QoL differences are limited. Our findings add to the current knowledge on the impact of the mode of detection on QoL after BC treatment. This may allow women to make better-informed decisions regarding BC screening participation.