Main

In recent years, screen-film mammography (SFM) has been replaced by full-field digital mammography (FFDM) in most Western breast cancer screening programs. The main reasons are a better image quality and dose optimisation, and easier and more efficient archiving and image transfer capabilities (Obenauer et al, 2002; Pisano and Yaffe, 2005).

Studies have shown that FFDM results in a similar cancer detection rate (Vinnicombe et al, 2009; Juel et al, 2010; Domingo et al, 2011) or a higher overall breast cancer detection rate compared with SFM, in the clinical as well as in the population-based setting (Lewin et al, 2001; Pisano et al, 2005; Del Turco et al, 2007; Glynn et al, 2011). However, others have shown a higher detection rate only for ductal carcinoma in situ (DCIS) (Vigeland et al, 2008; Karssemeijer et al, 2009). Furthermore, FFDM has been reported to result in a higher recall rate, in combination with higher detection rates and decreased positive predictive values (PPV) of recall (Skaane et al, 2007; Bluekens et al, 2012). Nevertheless, the opposite effect on recall rate and PPV has also been described (Hofvind et al, 2014). The recall and detection rates for SFM vary between screening programs. These differences in ‘baseline’ measurements might partly explain the different outcomes reported for the introduction of FFDM in comparison with SFM.

FFDM seems to be associated with an increase in smaller cancers and DCIS (Del Turco et al, 2007; Vigeland et al, 2008; Bluekens et al, 2012; van Luijt et al, 2013). It remains uncertain to what extent this might be associated with overdiagnosis, and a wide range of overdiagnosis estimates exists (Jørgensen and Gøtzsche, 2009). Furthermore, some studies comparing FFDM with SFM have analysed tumour characteristics other than DCIS and report no differences in tumour characteristics such as T- or N-stage (Vigeland et al, 2008; Juel et al, 2010; Domingo et al, 2011; Bluekens et al, 2012). However, one study found that FFDM detected more lymph node-negative and hormone receptor-positive cancers (Nederend et al, 2012). Studies on the incidence of interval cancers have so far reported no differences in interval cancer rates between SFM and FFDM (Juel et al, 2010; Hofvind et al, 2014). As the recall rate in The Netherlands is lower than in other countries with population-based screening programs, the transition from SFM to FFDM might lead to different conclusions regarding interval cancer rates. However, data regarding interval cancers in the Dutch situation, with a national screening program, are still scarce (Nederend et al, 2013).

The objective of this study was to evaluate whether the transition from SFM to FFDM resulted in changes in the recall rate, PPV, the detection rate of DCIS and invasive cancers and tumour characteristics of screen-detected cancers in the Dutch screening program. Moreover, the rate and tumour characteristics of interval cancers were compared, as were program sensitivity and specificity.

Materials and methods

Setting

In The Netherlands, a national population-based screening program has been operational since 1990, inviting women 50–69 years of age for a biennial screening examination (Fracheboud et al, 2001). Women 70–75 years of age have been included from 1999 onwards. All mammographic examinations are performed by specialised radiographers. Recall for further diagnostic work-up was indicated in case of an incomplete examination (BI-RADS 0) or suspicious or malignant findings (BI-RADS 4 or 5) (D’Orsi et al, 2003; Timmers et al, 2012). The screening program started in 1990 using screen-film mammography. In 2003 the first digital mammography unit was introduced, and by 2010 all screening examinations were performed using digital mammography. By participating in the screening program, women consent to the use of their data to evaluate the program. Information about the use of their data is provided in a flyer accompanying the letter of invitation. If a woman does not want her data to be used for this purpose, she can return a signed form to the screening organisation. Only a minor fraction of screened women (0.01%) use this option (Fracheboud et al, 2014).

Data on cancer are registered in The Netherlands Cancer Registry (NCR). The NCR is a population-based database, which contains data on patient, tumour and treatment characteristics of all in situ and invasive malignancies diagnosed in The Netherlands. This study was approved by the Committee of Privacy of The Netherlands Comprehensive Cancer Organisation (IKNL), which hosts the NCR.

Women and data

All women (50–75 years) who underwent a screening examination between 2004 and 2010 in the Dutch Breast Cancer Screening Program, region North-Netherlands, were included. The target population of this area comprised ∼300 000 women, of whom about 150 000 were invited each year. The region North-Netherlands followed the national program. It started in 1991 with SFM and implemented the first digital mammography unit in 2003. During 2009, the remaining seven screen-film mammography units were replaced by digital mammography units. The screening data included all dates and results of the screening examinations. These screening data were linked to data of the NCR. The main source of notification for the NCR is PALGA, the nationwide Dutch network and registry of histo- and cytopathology. After notification to the NCR, specially trained registration clerks visit the hospitals to collect information on patient and tumour characteristics, stage and treatment from patient files. Coding of the items is based on international coding rules. For the staging of cancers, the TNM classification is used (Wittekind et al, 2004).

Image acquisition and interpretation

SFM images were acquired with two types of systems (GE 600/800T, GE Healthcare, Buc, France). Digital mammograms were acquired by using Embrace DR (Agfa-Gevaert, Mortsel, Belgium) or Lorad Selenia FFDM systems (Hologic, Danbury, CT, USA). Initial screening examinations performed with SFM or FFDM always included two standard views, craniocaudal and mediolateral oblique. At subsequent screening examinations, mediolateral oblique views were routinely acquired; craniocaudal views were acquired if indicated by criteria based on, among others, breast density and visible abnormality. Radiologists interpreted mammograms made by SFM as well as FFDM. Mammograms were interpreted in batch mode and were independently read by two radiologists. Differences in opinion were resolved by consensus. In subsequent screens with SFM as well as FFDM, prior mammograms were available for comparison. To facilitate soft-copy reading of subsequent screening examinations with FFDM, the most recent screen-film mediolateral oblique and craniocaudal views were digitised by using film scanners and archivers designed for mammography (IMPAX TS5 transmit scanner, Agfa-Gevaert; DigitalNow, Hologic).

Definitions

Women with a ‘screen-detected cancer’ were defined as women having a positive screening examination at the breast cancer screening program, which was subsequently followed by a diagnosis of breast cancer within 12 months. The second group comprised women with an ‘interval cancer’, that is, women who, despite a negative examination at the breast cancer screening, were diagnosed with breast cancer within 24 months. In the analysis, women with screen-detected breast cancer and women with interval cancer were considered separately. Recall rate was defined as the proportion of women with a positive screening examination out of all screened women. Detection rate was defined as the proportion of women diagnosed with breast cancer out of all screened women, and PPV was calculated as the percentage of women with screen-detected breast cancer out of all recalled women. All screening examinations from 2004 to 2010 were considered for the calculation of recall rate, detection rate and PPV (n=902 868). Program sensitivity was defined as the proportion of screen-detected breast cancers out of all breast cancers (interval cancers plus screen-detected cancers). Program specificity was defined as the proportion of women without breast cancer with a negative examination (true negatives) out of all women without breast cancer (true negatives plus false positives). Program sensitivity, program specificity and the incidence of interval cancers were calculated for women screened from 2004 to 2008 (n=638 663). As follow-up was complete until 2010, all breast cancer cases in these women, including interval cancers diagnosed up to 24 months after the last screening examination, could be identified. For women with bilateral disease, the tumour with the highest stage was selected. Tumour stage was defined according to the greatest dimension of the largest tumour (T1a: ⩽0.5 cm, T1b: >0.5 cm and ⩽1 cm, T1c: >1 cm and ⩽2 cm, T2+: >2 cm).
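
The definitions above can be written out as a short calculation. The following Python sketch is illustrative only: the class and function names and the example counts are hypothetical and do not come from the study data, and it assumes that every screen-detected cancer belongs to a recalled woman and that women without breast cancer are those with neither a screen-detected nor an interval cancer.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical container for aggregate counts (names are not from the paper)."""
    screened: int         # all screened women
    recalled: int         # women with a positive screening examination
    screen_detected: int  # recalled women diagnosed with breast cancer within 12 months
    interval: int         # negative examination, breast cancer diagnosed within 24 months


def performance(c: ScreeningCounts) -> dict:
    """Compute the performance indicators as defined in the text."""
    false_pos = c.recalled - c.screen_detected          # recalled, no cancer
    true_neg = c.screened - c.recalled - c.interval     # not recalled, no cancer
    return {
        "recall_rate_pct": 100 * c.recalled / c.screened,
        "detection_per_1000": 1000 * c.screen_detected / c.screened,
        "ppv_pct": 100 * c.screen_detected / c.recalled,
        "sensitivity_pct": 100 * c.screen_detected / (c.screen_detected + c.interval),
        "specificity_pct": 100 * true_neg / (true_neg + false_pos),
    }


def size_category(size_cm: float) -> str:
    """Map the greatest tumour dimension (cm) to the size classes used in the text."""
    if size_cm <= 0:
        raise ValueError("tumour size must be positive")
    if size_cm <= 0.5:
        return "T1a"
    if size_cm <= 1.0:
        return "T1b"
    if size_cm <= 2.0:
        return "T1c"
    return "T2+"


# Made-up counts, for illustration only.
example = ScreeningCounts(screened=100_000, recalled=2_000, screen_detected=500, interval=250)
for name, value in performance(example).items():
    print(f"{name}: {value:.2f}")
print(size_category(0.8))
```

With these made-up counts the recall rate is 2.0%, the detection rate 5.0 per 1000 and the PPV 25.0%; the numbers only illustrate how the indicators relate to each other.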

Statistical analysis

SFM and FFDM were compared regarding the performance indicators (recall rate, detection rate and PPV), tumour characteristics of screen-detected cancers, incidence rate and tumour characteristics of interval cancers, program sensitivity and program specificity. The comparison was stratified by initial and subsequent screening examinations. Pearson χ2-tests or Fisher exact tests were applied. Missing values in tumour characteristics were excluded from the statistical comparison. Variation in recall and detection rate per year was graphically visualised for SFM and FFDM, stratified by initial and subsequent screening examinations. The statistical significance level was set at a P-value <0.05. Analyses were performed using the STATA software package, version 13.1 for Windows (Stata Corporation LP, College Station, TX, USA).
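
For readers unfamiliar with the test, a Pearson χ2 comparison of two proportions (for example, recalled vs not recalled for SFM and FFDM) can be sketched as follows. This is not the authors' Stata code: the function name and the counts are hypothetical, and the sketch assumes no continuity correction, which the text does not specify. For one degree of freedom the P-value equals erfc(sqrt(χ2/2)), so only the standard library is needed.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    Layout (hypothetical): row 1 = modality A (event, no event) = (a, b),
                           row 2 = modality B (event, no event) = (c, d).
    Returns (chi2 statistic, two-sided P-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected count of each cell under independence: row total * column total / n.
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts, for illustration only.
chi2, p = pearson_chi2_2x2(30, 70, 50, 50)
print(f"chi2 = {chi2:.3f}, P = {p:.4f}")
```

When expected cell counts are small, the Fisher exact test mentioned above is the usual alternative; it is not shown here.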

Results

In the period 2004–2010, a total of 902 868 screening examinations were performed by the Dutch Breast Cancer Screening Program, region North-Netherlands, of which 255 633 (28%) were performed with FFDM. Twelve percent of the 647 235 examinations by SFM were initial examinations; for FFDM, this was 11% (Table 1).

Table 1 Performance indicators of women (50–75 years) screened, recalled and positive predictive value for SFM and FFDM (2004–2010)

Recall rate, detection rate and positive predictive value

The recall rate after initial examination of 2.1% for SFM remained stable during the study period (Table 1). For FFDM the recall rate of 3.0% proved to be significantly higher than for SFM (P<0.001). After the complete introduction of FFDM, the recall rate decreased to 2.8% in 2010 and was still significantly higher than the recall rate of 2.1% for SFM (P<0.001) (Figure 1A).

Figure 1

Recall and detection rates. (A) Percentage of women recalled after initial screens. (B) Detection rate per 1000 screened women after initial screens. (C) The percentage of women recalled after subsequent screens. (D) Detection rate per 1000 screened women after subsequent screens. Of the women with SFM as initial examination, 92% were aged 50–54 years, compared with 95% of women with FFDM as initial examination. For subsequent examinations, 18 and 17% were aged 50–54 years for SFM and FFDM, respectively; 16% were aged 70–75 years for SFM as well as FFDM.

After initial examination, the detection rate for DCIS was 0.86 per 1000 women for SFM compared with 1.18 per 1000 women for FFDM (P=0.137). For invasive cancers, detection rates were 4.42 per 1000 women for SFM and 4.83 per 1000 women for FFDM (P=0.338). During the total period, the overall detection rate for SFM remained fairly constant. For FFDM the detection rate fluctuated markedly, due to small numbers (Figure 1B). The PPV after initial examination was 25.6% for SFM and 19.9% for FFDM (P=0.002).
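
As a rough plausibility check, the PPV should approximately equal the detection rate divided by the recall rate. The sketch below applies this to the rounded figures reported above for initial examinations; it assumes that the overall detection rate is the sum of the DCIS and invasive detection rates, and the function name is hypothetical. Because the inputs are rounded, only approximate agreement with the reported PPVs (25.6% and 19.9%) can be expected.

```python
def implied_ppv_pct(recall_rate_pct: float, detection_per_1000: float) -> float:
    """PPV (%) implied by a recall rate (%) and a detection rate (per 1000 screened)."""
    detected_fraction = detection_per_1000 / 1000
    recalled_fraction = recall_rate_pct / 100
    return 100 * detected_fraction / recalled_fraction


# Rounded values from the text for initial examinations (DCIS + invasive).
sfm = implied_ppv_pct(recall_rate_pct=2.1, detection_per_1000=0.86 + 4.42)
ffdm = implied_ppv_pct(recall_rate_pct=3.0, detection_per_1000=1.18 + 4.83)

print(f"SFM implied PPV:  {sfm:.1f}%")   # 25.1%
print(f"FFDM implied PPV: {ffdm:.1f}%")  # 20.0%
```

The implied values (about 25.1% and 20.0%) are close to the reported PPVs, which suggests that the lower PPV for FFDM is driven mainly by the higher recall rate rather than by a lower detection rate.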

After subsequent examinations, the overall recall rate for SFM was 1.2% and for FFDM, after an initially high rate, 1.1% (P=0.532; Table 1; Figure 1C). The overall detection rates were 5.28 per 1000 women for SFM and 5.14 per 1000 women for FFDM (P=0.638; Table 1). No differences were found in detection rates for DCIS (0.74 per 1000 women for SFM vs 0.81 per 1000 women for FFDM; P=0.298), nor for invasive cancers (4.54 per 1000 women for SFM vs 4.33 per 1000 women for FFDM; P=0.210). The detection rate remained stable over the years for SFM as well as FFDM (Figure 1D). Positive predictive values were 45.7 and 45.2% for SFM and FFDM, respectively (P=0.638).

Tumour characteristics of screen-detected cancers

After initial examinations, no differences were found in tumour characteristics (Table 2), but a non-significant trend towards more T1a cancers was seen (4.7% after SFM and 10.6% after FFDM, P=0.138). DCIS was diagnosed in 16.3% of all cancers after SFM, compared with 19.6% after FFDM (P=0.335). After subsequent examinations, DCIS was diagnosed in 14.0% of all cancers after SFM and 15.8% after FFDM (P=0.139). After FFDM a significantly larger part of invasive cancers were ductal cancers (SFM: 81.6%, FFDM: 85.1%; P=0.030) and the cancers were more often high-grade (SFM: 9.0%, FFDM: 23.4%; P=0.024).

Table 2 Tumour characteristics of screen-detected cancers in women 50–75 years of age for SFM and FFDM (2004–2010)

Incidence rate of interval cancers, program sensitivity and specificity

Overall incidence rates of interval cancers were similar, with 2.35 per 1000 women for SFM and 2.42 per 1000 women for FFDM (P=0.748). After initial screens, no difference was found in the incidence of interval cancers, with 2.69 per 1000 women for SFM and 2.51 per 1000 women for FFDM (P=0.787, Table 3). The sensitivity after initial examinations was 66.1% for SFM and 69.1% for FFDM (P=0.657); specificity was 98.5% and 96.9% for SFM and FFDM, respectively (P<0.001). After subsequent examinations, the overall incidence rate of all interval cancers was 2.30 per 1000 women for SFM and 2.41 per 1000 women for FFDM (P=0.652), and the sensitivity was 69.7% for SFM and 66.7% for FFDM (P=0.232). Specificity was 99.4% for SFM and 99.2% for FFDM (P<0.001).
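
Program sensitivity can be approximated from two rates per 1000 screened women: the detection rate and the interval cancer rate. The sketch below does this for subsequent examinations using the rounded figures quoted in the text. It rests on a loud assumption: the detection rates quoted earlier refer to 2004–2010, whereas the interval cancer rates refer to 2004–2008, so the result is only a rough cross-check and not a recalculation of the reported sensitivities (69.7% for SFM, 66.7% for FFDM). The function name is hypothetical.

```python
def approx_sensitivity_pct(detection_per_1000: float, interval_per_1000: float) -> float:
    """Sensitivity (%) = screen-detected / (screen-detected + interval cancers)."""
    return 100 * detection_per_1000 / (detection_per_1000 + interval_per_1000)


# Subsequent examinations, rounded values quoted in the text.
# Note: detection rates (2004-2010) and interval rates (2004-2008) cover different periods.
sens_sfm = approx_sensitivity_pct(detection_per_1000=5.28, interval_per_1000=2.30)
sens_ffdm = approx_sensitivity_pct(detection_per_1000=5.14, interval_per_1000=2.41)

print(f"SFM approximate sensitivity:  {sens_sfm:.1f}%")   # 69.7%
print(f"FFDM approximate sensitivity: {sens_ffdm:.1f}%")  # 68.1%
```

The SFM approximation matches the reported value, while the FFDM approximation is somewhat higher than reported, as would be expected if the FFDM detection rate in 2004–2008 differed from that over the full period.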

Table 3 Women (50–75 years) screened, recalled and the number of women with screen-detected or interval cancers (2004–2008)

Tumour characteristics of interval cancers

No differences were found in the tumour characteristics of interval cancers diagnosed after FFDM compared with SFM after initial or after subsequent examinations (Table 4).

Table 4 Tumour characteristics of interval cancers in women 50–75 years of age for SFM and FFDM (2004–2008)

Discussion

After initial screens with FFDM, we found a higher recall rate and a lower PPV. No difference was found in the proportion and malignancy grade of DCIS. Tumour characteristics of invasive cancers and the incidence rate and tumour characteristics of interval cancers were similar. After subsequent screens the proportion of DCIS was similar for SFM and FFDM. After FFDM, a significantly larger part of invasive cancers were ductal cancers and high-grade. No differences were found in the incidence of interval cancers or their tumour characteristics between SFM and FFDM.

In our study a higher recall rate and lower PPV were found after initial screens with FFDM, which account for 12% of all screens, but no differences were found after subsequent screens. A systematic review by Vinnicombe et al (2009) showed that recall rates varied greatly, with some studies showing lower and others showing higher recall rates for FFDM. We did not find a higher detection rate, which is in line with some other studies (Del Turco et al, 2007; Vinnicombe et al, 2009; Juel et al, 2010; Domingo et al, 2011). In contrast, other studies, including a Dutch study, reported higher detection rates for FFDM (Glynn et al, 2011; van Luijt et al, 2013). The latter study also describes some variation in detection rates between regions within The Netherlands. As our study only included data from one region and the study could not interfere with the ongoing screening program, existing variation between regions and countries before the introduction of FFDM might explain the different results and might have an effect on the generalisability of our results. In the first years using FFDM, recall and detection rates fluctuated. This might be due to a learning curve, which has been described before (Bluekens et al, 2010), in combination with the relatively low number of FFDM screens performed in comparison with SFM. As SFM had been in use since the start of the program, recall and detection rates remained stable during this study period, as expected. The overall increase in recall rate after FFDM is probably due to a higher resolution and the possibilities to adjust contrast. After FFDM, women are more often recalled due to microcalcifications, which might be the result of this contrast adjustment (Bluekens et al, 2010; Nederend et al, 2012). Unfortunately, we did not have any information on whether the higher recall rate after initial screens with FFDM led to an increased number of biopsies performed in patients with a false-positive result.

After initial screens, no differences in tumour characteristics between SFM and FFDM were found, which might indicate that FFDM performs similarly for the entire spectrum of breast cancer. However, a non-significant trend towards more T1a cancers was seen after initial screens with FFDM. To further analyse this, the percentages of T1a cancers of all invasive cancers diagnosed after initial FFDM (10.6%) and SFM (4.7%) in this regional study were compared with the percentage of T1a cancers in initially screened women in the total Dutch population. For the period 1990–2007 (when most women were screened with SFM) this percentage was around 4% (Fracheboud et al, 2009). As we found a similar rate for SFM, this indicates that our data can be representative of the Dutch data and that the trend towards more T1a cancers might be of interest. In the literature a non-significant increase in T1a-c cancers was described, but the T1a cancers were not studied separately (Nederend et al, 2012).

After subsequent screens with FFDM, more ductal and high-grade invasive cancers were found, whereas other characteristics were similar. Another Dutch, multicentre study found no differences in grade in invasive cancers after subsequent screens (Bluekens et al, 2012). Our findings also differ from two Norwegian studies reporting no differences in tumour characteristics in the total screened population (Hofvind et al, 2014). These differences are interesting as they may reflect possible underlying differences in screening education or performance.

Overdiagnosis will always remain an issue in screening programs. It is accepted as an inevitable side effect of the screening program, but the debate on the extent to which overdiagnosis is acceptable still continues (Independent UK Panel on Breast Cancer Screening, 2012a, 2012b). Overdiagnosis can be defined as the detection of a malignancy that, if left undetected and untreated, never would have surfaced in a person’s lifetime. An increase in low-grade DCIS might connote possible overdiagnosis. However, in our study we did not find a higher incidence of DCIS, nor did we find more low-grade DCIS after FFDM compared with SFM. Another study did report a higher incidence of DCIS, but no differences in the distribution of grade (Bluekens et al, 2012). In contrast to our study, others did report that FFDM resulted in differences in grade. However, these studies reported opposite effects, either towards more high-grade DCIS (Vigeland et al, 2008) or towards more low-grade DCIS (Nederend et al, 2012). Furthermore, any effect on breast cancer mortality is still unknown.

We found no decrease in the incidence rate of interval cancers after FFDM. The incidence of interval cancers for SFM and FFDM was similar, which is in concordance with a randomised trial comparing SFM and FFDM (Skaane et al, 2007). The rates of interval cancers in the Norwegian population-based screening program were also similar to our study (Hofvind et al, 2014). In addition, tumour characteristics did not differ between interval cancers diagnosed after SFM and FFDM, which is in concordance with other studies (Nederend et al, 2013; Hofvind et al, 2014).

The program sensitivities of SFM and FFDM found in the present study were both lower than in the study by Nederend et al (2013), performed in another Dutch region. A Norwegian study found a slightly higher sensitivity for FFDM and a lower sensitivity for SFM than in our study (Skaane et al, 2007). Neither study discriminated between initial and subsequent screens, which might explain the differences. Compared with the program sensitivity of the complete Dutch program of 70% after initial screens and 71% after subsequent screens (Fracheboud et al, 2014), our study found a slightly lower sensitivity. The program sensitivity has been stable at this level since at least 2004 and the Health Council of The Netherlands advised to continue the organised screening program (Health Council of The Netherlands, 2014).

Our study has several strengths and limitations. In the current study, initial and subsequent screening examinations were analysed separately, increasing the insight into the effects of FFDM. Furthermore, this is a large population-based study using standardised collected data from the NCR, including tumour characteristics on tumour size, morphology, grade and lymph node status. Data on oestrogen or progesterone receptor or HER2neu status were not available for this cohort. One limitation in comparing results with other studies might lie in the fact that the Dutch screening program invites women 50–75 years of age, which differs from other screening programs mostly offering screening to women 50–69 years old. This might limit the generalisability of our results. Second, in The Netherlands, screening examinations are independently read by two radiologists and – particularly in the Northern region – recall rates are relatively low. Comparing FFDM with SFM might therefore lead to different conclusions in comparison with other studies. Furthermore, a Dutch study found variation in recall rate, with less variation in detection rate, between regions in The Netherlands (van Luijt et al, 2013). Finally, during this study period a policy change towards making standard craniocaudal views at subsequent screening examinations started in The Netherlands. Also, during this period some radiologists read mammograms made using FFDM and SFM concurrently. Both effects could have influenced recall or detection rate for both SFM and FFDM. However, in our data we did not find an increased recall or detection rate for SFM after subsequent screening examination, indicating that these effects were negligible.

These data show that FFDM can be safely used for screening purposes. The easier image transfer capabilities of FFDM may prevent repeating the mammography during diagnostic work-up in the hospital. Further research on possible effects of the higher recall rate on the number of biopsies performed in patients with a false-positive result might add to the comparison of FFDM and SFM, as might future research on the proportion of T1a tumours and on hormone receptor and HER2 status.

In conclusion, FFDM resulted in similar rates of screen-detected and interval cancers compared with SFM. This indicates that FFDM performs as well as SFM in a breast cancer screening program, with more ductal and high-grade invasive cancers found after subsequent screens. No signs of an increase in low-grade DCIS (which might connote possible overdiagnosis) were seen. Nonetheless, after initial screening, which accounts for 12% of all screens, FFDM resulted in a higher recall rate and a lower PPV, which require attention.