Introduction

The Basis of Effective Screening Tests

Colorectal cancer (CRC) screening reduces CRC-related mortality and potentially CRC incidence depending on the method used [1–6]. The former is achieved by earlier detection at more readily cured stages, while the latter is achieved by removal of adenomas (i.e., pre-invasive dysplastic lesions).

Screening methods using endoscopic visualization as the primary screening modality have proved effective, with flexible sigmoidoscopy (FS) being supported by population randomized controlled trials (RCTs) [5–7] and colonoscopy by cohort and case–control studies [8, 9]. The biological basis for the prevention and early detection of CRC for these tests is the endoscopic visualization and removal of a neoplastic lesion. Detection of the presence of hemoglobin in feces using a fecal occult blood test (FOBT) has also proved effective; guaiac-based FOBTs (gFOBTs) are proven by multiple RCTs [1–4] and the newer technology, fecal immunochemical tests (FITs) for hemoglobin, by studies in CRC cases and controls [10–12]. The biological basis for both depends on the neoplastic lesion having a bleeding phenotype.

Screening by visualization of a neoplastic lesion or by FOBT is advocated by many screening guidelines that are based on the strength of published evidence. The differences between endoscopic and the less-invasive FOBT screening have important implications for population participation.

The goals of this opinion piece are to highlight the issues to consider when choosing a FOBT for screening and to address several key challenges and controversies.

The Goals and Nature of a Screening Program

Screening aims to reduce CRC mortality and incidence on a population basis. The International Agency for Research on Cancer (IARC) states that screening programs, whether organized or opportunistic, should provide protection against the harms of screening, over-screening, the complications of screening, poor follow-up of those who test positive and poor quality of treatment [13]. We add to this list the need for screening tests to be of proven efficacy and of high analytical quality.

A screening test is just one event in a multi-step process that includes engagement, testing, diagnostic confirmation, communication, treatment and rescreening or surveillance as necessary [14]. While the test must possess the requisite sensitivity and specificity, individuals must also be willing to do the test (acceptance is a characteristic of the test itself) and healthcare professional involvement must be of high quality.

Full colonoscopy can be used for primary, one-step screening. Simpler tests such as FOBT provide the option of two-step screening, where the test selects participants at a higher risk of cancer who can then proceed to diagnostic investigation by colonoscopy. The change in the likelihood of cancer detection between these two methods can be simply calculated as sensitivity divided by 1 − specificity (the positive likelihood ratio) [15, 16]. In the gFOBT RCTs, the likelihood of finding a cancer given a positive test was eightfold to 25-fold greater relative to colonoscopy without any intervening test [17]. Thus, it is crucial that test-positive individuals are subjected to diagnostic clarification.
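As a rough illustration of the calculation described above, the following sketch computes sensitivity divided by 1 − specificity. The function name and the input values (50 % sensitivity, 95 % specificity) are assumptions chosen for illustration and are not taken from any particular trial.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Return sensitivity / (1 - specificity); inputs are fractions, not percentages."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie between 0 and 1")
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must lie between 0 and 1 (exclusive of 1)")
    return sensitivity / (1.0 - specificity)


# Illustrative (assumed) operating characteristics for a once-only test.
ratio = positive_likelihood_ratio(sensitivity=0.50, specificity=0.95)
print(f"Positive likelihood ratio: {ratio:.1f}")
```

With these assumed values, a positive result raises the odds that cancer is present roughly tenfold, which is of the same order as the range reported for the gFOBT RCTs.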

Willingness to undertake the screening test is the first crucial step. In many screening settings, only a minority of the eligible population actually participates. For simple tests with low application sensitivity (one screen only), repeated participation is necessary. An example of how participation can determine the effectiveness of screening was shown by first-round screening in a Spanish trial comparing colonoscopy with FIT [18]. With participation of 34 % with FIT and 25 % with colonoscopy, and the willingness by some in the latter group to be first screened with FIT, the number of CRCs detected was higher in those who were first screened by FIT. The importance of repeated screening is demonstrated in the gFOBT RCTs where it was documented that some CRCs were only detected in subsequent rounds [3, 4], possibly due to intermittent bleeding from important lesions or the poor analytical sensitivity of gFOBT. Although first-round FIT screening detects more cancers, repeated screening is also necessary [19]. Of course, rapid development of new lesions might also occur.

The key steps in screening are shown in Fig. 1.

Fig. 1 Key steps in screening, each of which needs to be completed with high quality for there to be an impact on mortality from and/or incidence of CRC

The Evidence for Guaiac-Based FOBT (gFOBT)

The earliest technology for FOBT, namely gFOBT, can be seen as simple and of proven benefit, but it has poor accuracy and subject (screenee) acceptance. gFOBTs also lack precise objective end points. We provide an overview of issues that have stimulated a quest for a better-performing FOBT.

Performance of gFOBT

The effect of gFOBT on mortality is modest. The traditional (i.e., unrehydrated) gFOBT (Hemoccult II was the FOBT used) returned an intention-to-screen reduction in CRC mortality of 15 % [20]. This effect was limited by screenee acceptance (generally just over one-half of the population) and sensitivity for neoplasia. Once-only test sensitivity for cancer may approximate 50 % [21] although other studies indicate it is lower [22]. For some countries, this limited sensitivity raised concern among practitioners for legal liability for missed lesions. In consequence, some jurisdictions have not been enthusiastic about adopting this as the primary population screening.

Hydration of gFOBT samples improves the detection of heme. This can lead to larger CRC mortality reduction [1], but also activates plant peroxidases and so compromises specificity [23]. This led to the development of more sensitive gFOBT, sometimes referred to as “high-sensitivity gFOBT.” An early example is Hemoccult Sensa—its cancer sensitivity has been shown to be twice that of Hemoccult II [22]. However, poor specificity is a problem with sensitive gFOBT [22, 24], and their use is associated with high colonoscopy demands. The optimal dietary restrictions to minimize false positives with gFOBT are well known [17]. However, these restrictions are barriers to participation [25]. In certain populations, e.g., Asian settings, the false-positive rate with more sensitive gFOBT is high, possibly due to dietary interference [22, 24] which renders them relatively useless in such settings.

Technical Issues with gFOBT

gFOBT detection of blood is dependent on heme in feces [26]. When the hydrogen peroxide developer is added during analysis, heme catalyzes the oxidation of guaiac, resulting in a color change to blue. While gFOBTs are cheap and designed as point-of-care tests, they require a moderate quantity of heme to effect a visible change in color and thus are not analytically very sensitive to the presence of blood [27]. The method relies on simple oxidation, and therefore any dietary source of peroxidase activity, such as heme from myoglobin in red meat or peroxidases in plants, or any antioxidant, such as vitamin C, has the potential to confound the result. The gFOBT is therefore an inherently nonspecific test.

gFOBTs are technically crude, and in the age of quality assurance of diagnostic tests, they fall far short of what would be ideal for a test that might be analyzed in the high volumes usual in programmatic screening [28]. A major issue is that gFOBTs have a subjective and evanescent end point not readable, let alone quantifiable, using automated instrumentation and therefore not suited to high-throughput screening programs. Professional quality assurance programs are minimal [29], and problems in variation in reading of gFOBT among laboratory staff have been well known for decades [30, 31].

Technological Advances in Detection of Hemoglobin in Feces

It is not surprising, therefore, that advances in methods to detect and measure hemoglobin in feces have been welcomed [32]. An understanding of the biochemical fate of hemoglobin in the intestinal lumen provides a basis for understanding the advantages of these new technologies [26].

Biochemical Fate of Hemoglobin in the Gut

Hemoglobin is digested/degraded in different ways in different regions in the gut. In the stomach and small intestine, the globin moiety is digested by proteolytic enzymes of endogenous origin. Proteolysis also proceeds in the colon but at a slower and highly variable rate and partly due to microbial enzymes [26]. Such changes have different implications for the technologies employed in detection depending on whether they target the heme or globin moieties.

In the colon, heme is subject to bacterial enzymatic degradation which releases iron and protoporphyrins [33]—the resultant products have no peroxidase activity and so gFOBT detectability is lost.

Consequently, feces will contain a mix of intact hemoglobin, intact heme and globin, as well as globin and heme in varying stages of degradation, the extent of which will depend on the location of bleeding in the gut.

Technology that Detects These Products

While gFOBT positivity is dependent on the presence of the heme, more recent diagnostic tests target other moieties or derivatives of hemoglobin.

Fecal Immunochemical Tests (FITs) for Hemoglobin

“FIT” was recommended as the preferred name for this screening technology by a World Endoscopy Organization (WEO) Expert Working Group in 2012 [34] to avoid confusion with gFOBTs and to emphasize the substantial analytical, clinical and organizational opportunities these tests provide for CRC screening.

At a technical level, FITs use antibodies, monoclonal or polyclonal, specific for the globin moiety of human hemoglobin. A variety of immunoassay methods, including immunochromatography, immunoturbidimetry and ELISA, measure the development of antibody–globin complexes [35]. Generally, the technique is analytically sensitive to low concentrations of globin and is not known to be subject to direct interference from other constituents of feces including medication and dietary products.

The immunoassay methods are not all the same and differ substantially between qualitative and quantitative FIT [36]. FIT can be placed into two general analytical techniques: lateral flow immunochromatographic analysis typically exploited in point-of-care (POC) devices and laboratory instrument-based immunoturbidimetry or alternative end point analyses [35]. Many qualitative FIT (positive/negative result) devices are available and are designed for use at the point-of-care, outside of a laboratory [35]. For qualitative tests, only the manufacturer can adjust the conditions of the analysis and hence the sensitivity for detecting globin; they are generally not adjustable by the end user. Even though they are point-of-care tests, in application they require skill and practice to obtain consistency in sample application and visual interpretation. Experience with a range of POC devices in a primary care-led screening program in the Czech Republic has recently shown widely different positivity rates and an inability to monitor analytical performance across the program. Few, if any, of the POC FIT have peer-reviewed published results of their performance characteristics or laboratory quality control in large average-risk populations.

Quantitative FITs generally use immunoturbidimetric analysis and so provide a semiquantitative measure of globin in feces captured in a buffer solution in the sampling device. While the measured concentration is, in part, dependent on the amount of feces sampled, studies have consistently shown that the concentration is related to the nature of the neoplastic pathology present [37–39]. In other words, the degree of bleeding discriminates between normal physiological gastrointestinal bleeding and the presence and extent of neoplasia-related bleeding. The criterion value (the value used to set positivity, here a hemoglobin concentration in feces and hereafter referred to as the cutoff concentration) for discrimination can be readily adjusted with a quantitative test [37–39]. By adjusting the cutoff, the performance can be adjusted to match the desired sensitivity and specificity of a screening activity [40]. This is illustrated in Fig. 2.

Fig. 2 Theoretical representation of distribution of fecal hemoglobin concentrations in normal subjects and cancer cases. The arrows labeled a, b and c point to different fecal hemoglobin concentrations (criterion values) which might be chosen to discriminate between those without pathology (normal) and those with cancer. At c, most normals are declared negative (hence a high specificity) and a majority but not all cancers are declared positive, while at a, most cancers are included (high sensitivity) but more normals will test positive
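The trade-off shown in Fig. 2 can be sketched numerically. The example below is a minimal illustration only: the two lists of fecal hemoglobin concentrations are invented for the purpose, the function name is hypothetical, and a result is treated as positive when the concentration is at or above the cutoff.

```python
from typing import Sequence, Tuple


def accuracy_at_cutoff(
    normal: Sequence[float], cancer: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when a result >= cutoff counts as positive."""
    true_positives = sum(1 for conc in cancer if conc >= cutoff)
    true_negatives = sum(1 for conc in normal if conc < cutoff)
    return true_positives / len(cancer), true_negatives / len(normal)


# Hypothetical concentrations (µg Hb/g feces), invented for illustration only.
normal = [0, 1, 2, 3, 5, 8, 12, 18, 25, 40]
cancer = [6, 15, 30, 45, 80, 120, 200, 350, 500, 900]

# Low, intermediate and high cutoffs, analogous to arrows a, b and c in Fig. 2.
for cutoff in (10, 20, 50):
    sens, spec = accuracy_at_cutoff(normal, cancer, cutoff)
    print(f"cutoff {cutoff:>3}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

In this toy data set, raising the cutoff from 10 to 50 lowers sensitivity from 90 % to 60 % while raising specificity from 60 % to 100 %, which is the pattern the figure describes.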

With quantitative FIT, one can set the cutoff concentration to a desired sensitivity or specificity or to manage colonoscopy follow-up rates according to the operational imperatives of a screening program. This brings both practical and clinical advantages to population screening. It enables clinical interpretation of the result and its significance in a manner similar to the interpretation of a cholesterol or glucose measurement result against the risk of associated disease. It also raises the possibility of generating a multivariate risk score, incorporating other accessible risk factors such as age, sex, family history, screening history, and perhaps BMI and smoking [41]. Quantitative FIT opens opportunities for major enhancement to the current binary risk (positive or negative) outcome offered by gFOBT and is an approach that has been described by Stegeman et al. [42].

FITs, especially the quantitative versions, also provide major laboratory advantages over gFOBT. The end point is more objective, easier to read (often by an instrument) and more amenable to quality assurance (QA) procedures. FIT can be partly or fully automated and is therefore well suited to large-scale population-based screening programs.

FITs are analytically more specific than gFOBT and not subject to the factors known to interfere with gFOBT [17]. They are also selective for colorectal bleeding since globin from the upper gastrointestinal tract is degraded readily by digestive proteolytic enzymes, with a study showing that up to 100 mL of ingested blood was not detected by some immunochemical methods but was by gFOBT [26]. FIT is, however, not clinically specific because nonneoplastic and benign pathologies may also bleed and there is a baseline level of globin in feces that reflects physiological blood loss [39].

Heme-Derived Porphyrin Assay

A heme–porphyrin assay (HemoQuant) has been developed to measure fecal heme and heme-derived porphyrins [43], but it has not proved useful for screening. Like gFOBT, it is subject to the effects of red meat ingestion [33] and it would detect heme in shed cells (e.g., in cytochromes) and in ingested foods.

Choosing a FOBT for the Screening Context

Choosing the right FOBT for a given screening setting requires consideration of the attributes of the two main technological options—FIT and gFOBT. There is a range of apparently similar products for each technology [35], but each will have different characteristics due to assay design, choice of antibody (for FIT), source and concentration of guaiac (for gFOBT) and volume/mass of sample collected in, or applied to, the device. Two products might use the same technology—gFOBT or FIT—but might have very different performance characteristics. When choosing a test technology, consideration must also be given to how the test will be applied in the screening program (e.g., target population, climate, number of test samples, cutoff concentration and testing frequency).

Consideration of test operating characteristics and required accuracy is crucial since these relate to the likelihood that neoplasia is present and also to the health system demands and to the derived cost benefits.

Informative test operating characteristics [44] fall into two main categories of program consequence reflecting the test capacity to detect neoplasia (related to sensitivity) and the burden on the health system associated with detection (related to specificity). Table 1 shows both direct and practical measures of accuracy (test operating characteristics) which are used for ROC (receiver operating characteristic) analysis. From the discussion in Sect. 2.2.1 and Fig. 2, it is apparent that sensitivity cannot be adjusted independently of specificity.

Table 1 Relationship between direct, practical measures (operating characteristics) of a screening test result, how each informs assessment of test accuracy and what the consequence of the result is for a screening program

Consideration of the many 2-step centrally coordinated screening programs around the world [45] shows that different health systems vary in their focus on which strategic outcome is most important in designing the screening program. The choice of FOBT should suit the requirements of the program. In simple terms, one needs to decide on the desired balance between detection and the effort involved in detection as well as the desired degree of population engagement. Consideration of four main strategic scenarios facilitates the selection of an FOBT:

  1. Highly constrained colonoscopy resources: The colonoscopy workload created by screening is determined by the test positivity rate in the screening population. Some health care systems consider it necessary to constrain positivity to around 2–3 %. This achieves efficient detection with a small number needed to colonoscope to detect one cancer but means that a significant number of cancers and advanced adenomas are missed, leading to a high interval cancer rate. This may in part be addressed by accepting a short screening interval, but this obviously translates in time to higher colonoscopy demands.

  2. Maximum detection: Maximization of detection of cancer and adenomas means applying the most sensitive FOBT with less concern for specificity. This approach is more common in screening settings where screening is promoted but not centrally organized. It has led to the introduction of the term “high-sensitivity” FOBT and specifically refers to those FOBT that return a once-only test sensitivity for cancer of above 50 % [46].

  3. Balancing detection and colonoscopy burden: Compromising between maximizing detection and colonoscopy burden by screening with a higher-sensitivity FOBT that does not create a large colonoscopy demand. This is equivalent to choosing an optimal PPV for a given screening setting.

  4. Optimal screening participation, whether offering FOBT as the only screening modality, or in a multi-modality program: Population detection is the product of participation rate and test sensitivity, so participation is crucial to detection of neoplasia. Screening environments vary in the emphasis placed on this crucial parameter, but it applies to some degree in each of the above scenarios.

Of course, rather than be offered as part of a centrally coordinated screening program, screening might be offered to an individual in the setting of a face-to-face consultation. Here, tailoring the screening test to the individual’s situation with attention to test quality, effectiveness and cost is important [35]. Scenario 3 provides this flexibility in that it allows for choosing from a range of test performance characteristics.

The question therefore arises as to which type of FOBT is suited to each of these four scenarios.

Comparative Performance of FOBT

Before describing how different FOBT might be selected to suit these scenarios, it is useful to summarize what is known about the operating characteristics and accuracy of the different FOBTs. This would be most thoroughly informed by large, comparative screening studies, but such studies are impracticable. Consequently, we plot sensitivity and specificity for cancer reported for gFOBT (Fig. 3) [47–58] and FIT (Fig. 4) [38, 39, 48, 54, 55, 59–66] from a range of studies, to provide some idea of the range of sensitivity/specificity relationships for these technologies. It will be obvious that these measures of accuracy vary greatly between tests within a technology as well as between technologies and according to how the test is applied (e.g., sample number) and what is chosen as the cutoff.

Fig. 3 Reported sensitivity and specificity for CRC of a range of gFOBT [47–58]

Fig. 4 Reported sensitivity and specificity for CRC of a range of FIT [38, 39, 48, 54, 55, 59–66]

Generalizations about performance accuracy are complicated by test use in different populations and in different ways. This in particular pertains to the cutoff for positivity and the number of fecal samples collected. It is readily apparent from Figs. 3 and 4 that FITs are able to achieve a higher sensitivity for cancer and show a tighter curvilinear relationship between sensitivity and specificity than gFOBT. From Fig. 4, it is also apparent that FITs provide a broad range of options for matching each of the Scenarios 1–3 (see Sect. 3).

To more critically match tests to the screening scenarios described above, three main topics will be addressed: comparison of gFOBT kits, comparison of FIT devices and systems and comparison of gFOBT with FIT.

Comparison of gFOBT Kits

Results of a large screening study in about 8,000 subjects are summarized in Table 2 [22]. In this study, the gFOBT sensitivity for cancer was 37 %, consistent with estimates from the Nottingham RCT [4]. The same study (Table 2) showed that sensitivity for cancer doubled to 79 % with a high-sensitivity gFOBT, but the number of positive tests increased over fivefold. In other words, detection of twice as many cancers required more than five times as many colonoscopies. A subsequent study by the same group [67] returned a positivity rate of 10.1 % and a sensitivity for cancer of 64.3 % (CI 35.6–80 %) with the high-sensitivity gFOBT.

Table 2 Comparison of two gFOBT and one FIT in a screening population (n > 8,000) in California [22]

While rehydration of gFOBT increased program sensitivity for cancer in the Minnesota RCT (92.2 % compared with 80.8 % for nonhydrated), it also decreased specificity (90.4 % compared with 97.7 %) [1] and resulted in a much higher test positivity rate. In a separate comparison, the rehydrated gFOBT had a positivity rate more than two times higher than a sensitive gFOBT (15 vs 7 %), but a lower PPV (2.6 vs 4.9 %) [68].

These findings, taken together with the specificity problems described above in Sect. 1.3.1 for gFOBT, make it clear that “high-sensitivity” gFOBTs detect approximately twice as many cancers compared with nonhydrated gFOBT but at a marked deterioration in specificity and increase in test positivity rate. Furthermore, the increase seems greater with rehydration and is unpredictably high in some populations such as in certain Asian populations [24].

We would therefore conclude that traditional gFOBT can be suitable for Scenario 1 screening settings (limited colonoscopy resources) but not for Scenario 2 (maximum sensitivity required). To further meet the demands of Scenario 1, and increase PPV and reduce the number needed to colonoscope, some programs (e.g., Scotland, England) further restrain the test positivity rate and colonoscopy referral by requiring that at least 5 of 6 windows (using a three-sample gFOBT that has two windows per sample card) be positive on initial screening. Those with 1–4 positive windows are then subject to some form of retesting [69]. High-sensitivity gFOBTs are better suited for Scenario 2, although rehydration is not a desirable way to achieve this and should be abandoned.

Comparison of FIT Devices and Systems

A limited number of studies have compared FIT devices and systems. There are many qualitative products available [40] plus a smaller, but growing, number of quantitative devices. These studies have varied in size, devices tested, methods and outcomes reported, all of which make comparison between studies challenging. As a consequence, a call has been made for standardized reporting of FIT [29, 70, 71].

FIT sensitivity and specificity are shown from 13 studies in Fig. 4. They show the expected curvilinear relationship between sensitivity and specificity and demonstrate that one can readily choose a FIT with high sensitivity for CRC (at the cost of reduced specificity) or one with high specificity and hence constrained test positivity rate yet still achieving a cancer sensitivity above 50 %.

It is rare for a screening program using FIT to require more than 1 or 2 fecal samples [40]. Several studies have indicated that two samples give the best sensitivity and specificity for cancer [55, 63], with one study showing no difference in sensitivity between 2 and 3 samples and no difference in specificity between one and two samples [63]. Another study showed that offering 1 or 2 FITs did not affect uptake [72]. Positivity is higher with multiple samples collected, and this has been achieved without markedly increasing the number of colonoscopies needed to detect a neoplasm [64] although this would be dependent on the cutoff used.

As indicated above in Sect. 2.2.1, if one chooses a quantitative FIT, then the cutoff can be chosen to suit the screening scenario. The trade-offs are well described by Rozen et al. [64]. Sensitivity and positivity are highest when a low hemoglobin concentration is chosen for the cutoff, while specificity and positive predictive value are highest at a high concentration. Rozen et al. [64] found that a 95 % specificity for CRC (considered appropriate for average-risk screening) was achieved with a one-sample quantitative FIT with a 100 ng/mL cutoff of hemoglobin in sample buffer (equivalent to 20 µg Hb/g feces with an OC Sensor FIT), whereas two or three samples at a 50 ng/mL cutoff (10 µg Hb/g feces with OC Sensor) increased sensitivity but decreased specificity to 90.2–87.8 % and increased colonoscopy workload. Similar studies in other populations do not return the same values, and when choosing a test for screening, a test’s operating characteristics need determination in the intended target population. Nonetheless, FITs are highly flexible and can be used in Scenarios 1, 2 or 3.
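The equivalence between the buffer concentration (ng Hb/mL) and the fecal concentration (µg Hb/g feces) quoted above depends on the mass of feces collected and the volume of buffer in the sampling device. The sketch below shows the arithmetic. The function name is hypothetical, and the device parameters (10 mg of feces in 2.0 mL of buffer) are assumptions chosen because they reproduce the conversion given in the text; they should be checked against the manufacturer's specification for any real device.

```python
def ng_per_ml_to_ug_per_g(
    buffer_conc_ng_per_ml: float, buffer_volume_ml: float, feces_mass_mg: float
) -> float:
    """Convert a hemoglobin concentration in buffer to µg Hb per g of feces.

    ng/mL * mL gives ng of hemoglobin; dividing by mg of feces gives ng/mg,
    which is numerically identical to µg/g.
    """
    if feces_mass_mg <= 0:
        raise ValueError("feces mass must be positive")
    return buffer_conc_ng_per_ml * buffer_volume_ml / feces_mass_mg


# Assumed device parameters (not stated in the text): 10 mg feces, 2.0 mL buffer.
for cutoff_ng_per_ml in (100, 50):
    converted = ng_per_ml_to_ug_per_g(cutoff_ng_per_ml, 2.0, 10.0)
    print(f"{cutoff_ng_per_ml} ng/mL buffer = {converted:.0f} µg Hb/g feces")
```

Because devices differ in sampled mass and buffer volume, the same ng/mL cutoff can correspond to different fecal concentrations on different systems, which is one reason comparisons between products need care.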

Comparison of gFOBT and FIT

A number of studies have compared gFOBT with FIT, and the limitations that apply when comparing different FITs (see Sect. 3.1.2) apply here too. Because of the broad choice of FITs and the broad range of performance characteristics (Fig. 4), one must be guarded when making generalizations, especially when using a quantitative FIT, since the operating characteristics are not fixed and can be easily adjusted to suit the screening scenario.

Higher sensitivities are achievable for CRC with FIT than gFOBT. For example, one test with a positivity threshold of 20 µg Hb/g feces as cutoff has been reported to have a sensitivity for cancer of 87.1–92.3 % compared with 30.8–74.2 % for a traditional gFOBT [54, 55]. These recent results mirror an early large-scale comparison as shown in Table 2 when an early FIT was compared to a traditional gFOBT.

Specificity is generally reported to be slightly lower with FIT compared with a traditional gFOBT. For example, a commonly used FIT at a fecal hemoglobin cutoff of 20 µg Hb/g feces has a specificity of 90.1–94.2 % compared with 92.4–95.7 % for a traditional gFOBT [54, 55]. However, if a quantitative FIT is used and the cutoff is set at a level that returns the same test positivity rate as the gFOBT under comparison, the PPV for cancer is higher with the FIT than with the gFOBT [48, 73].

FIT also has a specificity advantage over high-sensitivity gFOBT when a high sensitivity for cancer is sought by using a low cutoff. As shown in Table 2, a FIT returned a sensitivity for cancer comparable to that of a high-sensitivity gFOBT but with less than half the test positivity rate. This means the number needed to colonoscope to detect each cancer was much lower.

Test positivity rate in a general screening population tends to be higher with FIT compared with gFOBT. In comparisons using a FIT at a positivity threshold of 20 µg Hb/g feces (and collection of 1 sample), the positivity rate for the FIT was 3.4–5.5 % compared with 2.4–3.5 % for the gFOBT [37, 74, 75]. Despite this, the PPV for cancer was similar: 8.6–10.2 % for FIT compared with 9.7–10.7 % for a gFOBT [37, 75]. This means that, in practice, more cancers are detected by FIT but not at a significantly higher rate of colonoscopies done per cancer detected.
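The link between PPV and colonoscopy workload can be made explicit: if every test-positive participant proceeds to colonoscopy (an assumption made here for simplicity), the number of colonoscopies per cancer detected is the reciprocal of the PPV for cancer. The sketch below applies this to the PPV ranges quoted above; the function name is hypothetical.

```python
def colonoscopies_per_cancer(ppv_for_cancer: float) -> float:
    """Return 1 / PPV, assuming all test-positive participants undergo colonoscopy."""
    if not 0.0 < ppv_for_cancer <= 1.0:
        raise ValueError("PPV must be a fraction greater than 0 and at most 1")
    return 1.0 / ppv_for_cancer


# PPV ranges for cancer quoted in the text, expressed as fractions.
ppv_ranges = {"FIT": (0.086, 0.102), "gFOBT": (0.097, 0.107)}

for test_name, (ppv_low, ppv_high) in ppv_ranges.items():
    # The lowest PPV gives the highest number of colonoscopies per cancer.
    most = colonoscopies_per_cancer(ppv_low)
    fewest = colonoscopies_per_cancer(ppv_high)
    print(f"{test_name}: about {fewest:.1f} to {most:.1f} colonoscopies per cancer")
```

On these figures both technologies require roughly 9 to 12 colonoscopies per cancer detected, consistent with the observation that the extra cancers found by FIT do not come at a markedly higher colonoscopy cost per cancer.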

FITs are more sensitive for advanced adenomas than gFOBT and so improve capacity to prevent cancer. Several studies show that FIT has a sensitivity for advanced adenoma 2–3 times that of gFOBT although this is dependent on the chosen cutoff concentration [48, 54, 55].

Based on these findings, FITs are clearly the optimal technology for Scenarios 2 and 3. For Scenario 1, the advantage over gFOBT is not quite so marked and given that FITs generally cost a few dollars more than gFOBT, a case for retaining gFOBT in Scenario 1 can sometimes be sustained, although the issue of population participation needs consideration (see Sect. 3.3 below).

Finally, the flexibility of quantitative FIT enables “smarter” use of FIT, including first-round screening undertaken with a more sensitive configuration (low cutoff, use of two samples) followed by subsequent rounds with less sensitive configuration based on the knowledge that a proportion of prevalent lesions will have been removed.

Causes of Test Positivity

A number of factors other than the test configuration itself can contribute to the variability in FIT positivity reported by different screening programs. The basis for these differences is that positivity rate is directly related to the tested population. It is known that positive tests occur more frequently in men than in women, in older populations and in the more economically disadvantaged [72, 75–80]. The distributions of fecal hemoglobin concentration are different from country to country [81]. Previous participation in FIT screening [82] also influences positivity rate. The role of other factors in affecting the FIT positivity rate is not so thoroughly explored. Time between sampling and test development had no significant effect in one study [72]. Other studies do suggest the possibility of degradation of hemoglobin with delayed sample return [83, 84].

Ambient temperature may affect FOBT positivity as in vitro studies show that hemoglobin levels in samples fall at temperatures above 20 °C [85, 86], most likely due to degradation. This is confirmed in population screening programs, with most studies finding that the summer months are significantly associated with a decrease in the positivity rate for both FIT [72, 87, 88] and gFOBT [89–91]. Taking the former studies into consideration, the Australian National Bowel Cancer Screening Program now avoids sending FITs to participants during summer [80].

Additional nonneoplastic factors reported to affect FIT positivity include

  • Medication: use of anti-platelet drugs increases positivity rate [59]

  • First versus repeated participation; positivity rates are higher in first-time participants

  • Personal history: positive result was more likely in those who had a personal history of colorectal neoplasia [92]

  • Benign bleeding disorders increase positivity rate

These variables are likely to differ between populations and are seldom fully documented in reports on population screening studies. The call for standardized reporting of studies using FOBT should facilitate an understanding of the differences between studies.

Behavioral Considerations

It is over a decade now since it was shown that mass, impersonal population screening with FOBT achieves better participation rates when using a FIT relative to gFOBT [25, 93]. RCTs addressing participation as the outcome show that participation is improved by providing an easier device, restricting the need for fecal sampling to only one or two bowel movements, and removing the need to restrict diet and certain drugs [25, 93]. An early 2-sample brush FIT achieved 67 % better participation than a 3-sample stick-sampling gFOBT with dietary restrictions [93]. Subsequent studies confirm better participation with FIT [37, 75, 94, 95] even when populations differ in the effect of other determinants of participation such as socioeconomic status, gender, age, nature of the diet and hence impact of dietary restrictions. Moreover, using a FIT compared to a traditional gFOBT increases participation especially in the young, males and the deprived, the very groups that have low participation with traditional gFOBT [96].

At the population level, participation is crucial to detection of neoplasia since the rate of cancer detection in the population is the product of sensitivity for cancer and the participation rate [17]. In other words, behavioral parameters are just as important as technical performance when considering what test achieves the desired cancer detection rate.
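The product relationship above can be illustrated with a minimal sketch. The function name and all numerical values below are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def population_detection_rate(sensitivity: float, participation: float) -> float:
    """Fraction of cancers in the invited population that screening detects.

    Implements the relationship stated in the text:
    detection rate = sensitivity for cancer x participation rate.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= participation <= 1.0):
        raise ValueError("sensitivity and participation must lie in [0, 1]")
    return sensitivity * participation


# Hypothetical values for illustration only (not from any study).
test_a = population_detection_rate(sensitivity=0.80, participation=0.40)
test_b = population_detection_rate(sensitivity=0.70, participation=0.60)

print(f"Test A (more sensitive, lower participation): {test_a:.2f}")
print(f"Test B (less sensitive, higher participation): {test_b:.2f}")
```

In this hypothetical comparison, the less sensitive test detects a larger share of cancers in the invited population because more people complete it.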

In conclusion, population participation is consistently higher with FIT than with gFOBT, which addresses the requirements of Scenario 4.

Laboratory and Regulatory Considerations

The advantages of FIT over gFOBT in the laboratory are outlined in Sect. 2.2.1. Traditionally, FOBTs have been seen in many countries as point-of-care (POC) tests with a history that goes back 3–4 decades. In the POC test format, they have generally escaped attention in the increasingly stringent quality assurance processes that now apply to diagnostic laboratories around the world [71]. This is despite the well-documented issues with gFOBT readability [31]. Efforts are underway in some countries to address this, but many still underrate the importance of paying careful attention to test QA. In practice, QA of both FIT and gFOBT requires attention to both consistency of sample collection and analytical performance. Good QA procedures for sample collection are difficult for both gFOBT and FIT. Analytical QA procedures are also difficult for gFOBT but not for quantitative FIT where standard internal QC and external QA procedures can be easily adopted. There has been a recent call to standardize reporting of studies on FOBT, especially those including FIT, by using the FITTER criteria [29]. Inherent in these is the inclusion of total quality management strategies.

There is considerable variation between countries concerning FOBT approval for marketing and reimbursement. Where approval processes consider FOBT as just POC tests, the evidentiary standards to register a new test are often not high. Quantitative FITs, especially where there is a degree of automation, are not POC tests and should be regulated as appropriate for general laboratory-based tests.

Flexibility with Quantitative Tests

When implementing screening with FOBT, flexibility can be achieved in a number of ways. Programmatic performance characteristics of both gFOBT and FIT can be manipulated to some degree by altering the screening interval (e.g., annual, biennial or triennial), the number of fecal samples collected and the number of gFOBT “windows” required to be positive to trigger colonoscopy.

However, the greatest degree of flexibility is provided by quantitative FIT as outlined in Sect. 2.2.1 and shown in Fig. 2. Thus, a quantitative FIT can be used in any of the screening program scenarios described in Sect. 3. The desired balance between detection and the effort involved in detection (i.e., the workload) needs to be decided, and in a face-to-face consultative setting, this decision needs to be tailored to the requirements of the individual. Once the prime scenario for screening is decided, the test operating characteristics (see Table 1) need to be selected to match the chosen scenario.

Two examples explain how this can work. For instance, if one wishes to control colonoscopy workload to a specific proportion of participants (Scenarios 1 or 3), then one would choose a cutoff concentration that returns the corresponding test positivity rate in the target population [97]. While guidance can be obtained from studies undertaken by others, pilot studies in the intended context are needed to verify this choice. With a quantitative test, it is easy to adjust the cutoff if the outcome is not as required; a qualitative test would necessitate selection of a different test product, involving a further pilot study.
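As a rough sketch of the first example, the cutoff can be read off pilot data as the lowest concentration at which the positivity rate no longer exceeds the target. The function names, the convention that a result at or above the cutoff counts as positive, and the pilot values (in µg Hb/g feces) are all assumptions made for illustration; a real program would use far larger pilot samples.

```python
def positivity_rate(concentrations, cutoff):
    """Share of samples at or above the cutoff (assumed positivity convention)."""
    return sum(value >= cutoff for value in concentrations) / len(concentrations)


def cutoff_for_positivity(concentrations, target_positivity):
    """Lowest observed concentration whose positivity rate is <= the target.

    Returns None if no observed value satisfies the target.
    """
    if not concentrations:
        raise ValueError("pilot data must not be empty")
    if not 0.0 < target_positivity <= 1.0:
        raise ValueError("target_positivity must lie in (0, 1]")
    for cutoff in sorted(set(concentrations)):
        if positivity_rate(concentrations, cutoff) <= target_positivity:
            return cutoff
    return None


# Hypothetical pilot results in µg Hb/g feces (illustration only).
pilot = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 4, 5, 7, 9, 12, 18, 35, 120]

cutoff_10pct = cutoff_for_positivity(pilot, 0.10)
cutoff_5pct = cutoff_for_positivity(pilot, 0.05)
print(cutoff_10pct, cutoff_5pct)
```

Tightening the target positivity rate simply moves the cutoff upward on the same test product, which is the flexibility that a qualitative test cannot offer.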

If the choice is to maximize detection (Scenario 2), then a cutoff that gives the desired sensitivity can be chosen. It can be seen from Figs. 3 and 4 that in addressing Scenario 2 (which aims at high detection) this is generally achieved with a sensitivity that corresponds to a specificity of 90 % or worse. It would seem wise to use a FIT even in that setting since the colonoscopy workload will be less for the same benefit in detection.

Intention-to-Screen Outcomes with FIT and gFOBT

A few programs have compared gFOBT and FIT on an intention-to-screen basis, where behavioral and accuracy characteristics interact to determine detection of neoplasia and the burden of detection within the population.

The first such paper, from The Netherlands [75], showed that improved sensitivity and participation rates with a FIT compared with a gFOBT led to doubling of the detection of cases with advanced adenoma or cancer in a large study. While this additional detection required approximately double the number of colonoscopies, the number of colonoscopies per significant neoplasia case detection was approximately the same. In other words, the extra effort associated with the FIT seemed justified.

Hol et al. [37] reported a comparable study of gFOBT, FIT and sigmoidoscopy, with similar results. Participation rates were 49.5 and 61.5 %, respectively, for gFOBT and FIT with positivity rates of 2.8 and 4.8 %, respectively. Cancers were detected in 0.3 and 0.5 % and advanced adenomas in 0.9 and 2.0 %, respectively, while PPV of each test did not differ.
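The interaction between participation and accuracy can be made concrete by projecting the figures quoted above onto 1,000 invitees. This is only a sketch: it assumes that the positivity and detection percentages are expressed per participant (the text does not state the denominator), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmResult:
    name: str
    participation: float          # share of invitees who return a test
    positivity: float             # assumed: share of participants testing positive
    cancer_rate: float            # assumed: share of participants with cancer detected
    advanced_adenoma_rate: float  # assumed: share with advanced adenoma detected


def per_invitees(arm: ArmResult, invited: int = 1000) -> dict:
    """Project per-participant rates onto a number of invitees."""
    participants = invited * arm.participation
    positives = participants * arm.positivity
    advanced_neoplasia = participants * (arm.cancer_rate + arm.advanced_adenoma_rate)
    return {
        "participants": participants,
        "positives": positives,
        "advanced_neoplasia": advanced_neoplasia,
        "positives_per_case": positives / advanced_neoplasia,
    }


# Percentages as quoted in the text for Hol et al. [37].
GFOBT = ArmResult("gFOBT", 0.495, 0.028, 0.003, 0.009)
FIT = ArmResult("FIT", 0.615, 0.048, 0.005, 0.020)

results = {arm.name: per_invitees(arm) for arm in (GFOBT, FIT)}
for name, row in results.items():
    print(name, {key: round(value, 2) for key, value in row.items()})
```

Under these assumptions, the sketch yields about 6 cases of advanced neoplasia per 1,000 invitees with gFOBT and about 15 with FIT, while the number of positive tests per case detected is of a similar order for both tests.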

Considering the combined advantages of improved participation and improved detection, and given that FIT exploits the same biological basis for early detection as gFOBT, FIT must be considered superior to gFOBT for CRC screening.

Challenges and Controversies

There are several aspects of FOBT usage that warrant particular consideration.

The first concerns the number of fecal samples. While three samples are the norm for gFOBT, FITs return equivalent or better performance with just one or two samples. Increasing the number, and referring to colonoscopy as soon as at least one sample tests positive, does improve sensitivity [55, 63, 73, 98] but also usually leads to a reduction in specificity [63, 73, 98]. In contrast, 2-FIT testing with referral only if both tests are positive decreases sensitivity, but increases specificity. Participation is likely similar with 1- and 2-sample FIT. The few studies that have addressed this show no or only a marginal difference [98]. It should be noted that costs will be reduced if one sample is used [99].
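The direction of these effects can be sketched with simple probability. The example below assumes that the two samples behave independently given disease status, which is a simplifying assumption (samples from the same person are likely to be correlated, so real effects would be smaller), and the per-sample sensitivity and specificity are hypothetical.

```python
def combine_two_samples(sensitivity: float, specificity: float, rule: str):
    """Sensitivity and specificity of a 2-sample strategy.

    Assumes the two samples are independent given disease status
    (a simplification made for illustration only).
    rule = "either": refer if at least one sample is positive.
    rule = "both":   refer only if both samples are positive.
    """
    if rule == "either":
        return 1 - (1 - sensitivity) ** 2, specificity ** 2
    if rule == "both":
        return sensitivity ** 2, 1 - (1 - specificity) ** 2
    raise ValueError("rule must be 'either' or 'both'")


# Hypothetical per-sample characteristics (not from any study).
SENS, SPEC = 0.70, 0.95

either = combine_two_samples(SENS, SPEC, "either")
both = combine_two_samples(SENS, SPEC, "both")
print(f"either positive: sens={either[0]:.4f}, spec={either[1]:.4f}")
print(f"both positive:   sens={both[0]:.4f}, spec={both[1]:.4f}")
```

Referral on either sample raises sensitivity at the cost of specificity, and referral only when both are positive does the reverse, matching the pattern described in the text.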

If using quantitative FIT, it is possible to independently vary sample number and cutoff for positivity [55, 73, 100]. As a consequence, if using a quantitative FIT, programs tend to choose a cutoff of 10–30 µg Hb/g feces when testing one sample compared with 20–40 µg Hb/g feces when testing two samples. For cancer detection, there is little difference between these options, but it has been pointed out that adenoma detection is better when two samples are tested [100]. It should be stressed that these results concern single, first-round screening only. With repeated rounds, the yields of 1- and 2-FIT testing are likely to approximate each other.

The second issue is the cutoff to be used. It should be obvious from the discussions above that this depends on the choice of screening scenario. The FOBT result indicates the likelihood that cancer is present. In other words, the chosen screening scenario, supported by pilot studies in the target population, will identify the most suitable cutoff.

The third issue is whether any role remains at all for gFOBT. We would argue that when a high-sensitivity FOBT is desired (as for Scenarios 2 or 3), FITs provide similar sensitivity without the colonoscopy workload required with gFOBT. There might be a case for use of traditional gFOBT for Scenario 1 but only if one disregards the major disadvantages of gFOBT and the behavioral and laboratory advantages of FIT. There is a small cost differential, but this proves insignificant in cost-effectiveness studies [101].

The fourth issue is the screening interval. One might speculate, based on theory and modeling, that higher-sensitivity tests might be repeated at longer intervals. However, without precise knowledge of the ideal length of time in which a cancer is detectable by FIT and remains highly curable, it is impossible to predict whether the interval can be extended beyond two years. One study that considered this and compared 1-, 2- and 3-year intervals found similar yield in the second round of low cutoff FIT [102]. Modeling of further data from different studies will help to determine the optimal approach for individual populations based on their characteristics and resources.

A fifth issue is whether we should always use the same cutoff value for fecal hemoglobin concentration in a program, no matter who is being screened and whether this is the first round of screening or a subsequent round. At this point, there is no direct evidence and modeling the many possibilities is not fully developed. There is a case for an initial screen using a very sensitive FOBT, with subsequent screens using adjusted cutoffs so as to achieve a desired sensitivity, colonoscopy workload or efficiency of detection. More data to assist such modeling are required and such can only be achieved with quantitative FIT.

A complementary issue is whether different cutoffs should be used for different subpopulations. Since fecal hemoglobin concentrations are related to age, gender and deprivation, and such data are usually obtainable for invitees, cutoffs could be adjusted so as to achieve a desired sensitivity, colonoscopy workload or efficiency of detection within a subgroup. One might even take this further to the individual level and use more complex risk algorithms as briefly discussed earlier [103].

Another issue is quality control of FOBT and whether screening programs using FIT should use automated FIT (where the test is measured and interpreted by a computerized analytical instrument) or POC tests (where the test is generally undertaken by personnel inexperienced in analytical procedures including QA). It is concerning that 93 % of the FITs in the US are POC and there is no oversight of quality control [35].

Finally, in the selection of a FIT product, should only those FITs with demonstrable quality control and screening outcome performance be adopted over those with little or no supporting evidence? Guidelines usually refer to FOBT in a generic manner, whereas very few brands have adequate supporting data to substantiate their use. It would be valuable to specify what criteria are needed before a test is approved for use. This is often sought when providers set up a process of procurement of FOBT. Guidance on what should be requested has been published [Quantitative FIT Procurement. FIT for Screening Expert Working Group, Colorectal Cancer Screening Committee, World Endoscopy Organization. Available at http://www.worldendo.org/assets/downloads/pdf/activities/weo_expert_working_group_fit_discussion_doc_no6_pr.pdf].

Summary/Conclusions

The choice of FOBT should suit the requirements of the screening setting so that it can achieve the desired balance between detection and the effort involved in detection as well as the desired degree of screening participation. The following summarizes how FOBTs best suit each of the four main strategic scenarios:

  1. Highly constrained colonoscopy resources: use an FOBT with a low test positivity rate in the target population. While this can be achieved with either certain gFOBT or FIT, FITs are overwhelmingly preferable.

  2. Maximum detection: For gFOBT, high-sensitivity gFOBT might be considered suitable, but FIT can achieve the same high sensitivity with better specificity and fewer colonoscopies. Thus, if the goal is maximum sensitivity, FIT should be the test of choice potentially using two or more samples. Simply referring to an FOBT as “high sensitivity” fails to adequately characterize the test, especially as high-sensitivity gFOBTs are subject to great and unpredictable variations in performance.

  3. Balancing detection and burden of detection: quantitative FITs are ideal for this situation as they provide flexibility to tailor to the circumstances of either a population or an individual.

  4. Optimizing participation: FITs are superior to gFOBT.

Overall, FIT technology is more selective for colorectal bleeding, less affected by nonpathological factors such as diet and drugs, more suitable for the modern laboratory and large-scale processing of tests, more acceptable to individuals and more flexible in terms of choice of screening test characteristics than is the gFOBT technology. It has been suggested that gFOBT is now obsolete [104].

Population screening for CRC should be undertaken predominately with a well-characterized automated FIT in an accredited laboratory with trained staff applying rigorous quality assurance procedures. Screening programs need to be open to regular audit and performance monitoring, and reported results subject to review and external scrutiny. POC FITs also need to meet high standards of quality both before being approved for use and when implemented. POC FIT should only be used where laboratory-based analysis is not feasible and then the analytical performance of the product should be well characterized, a rigorous training program must be implemented and quality monitoring procedures adopted.

Screening guidelines need to recognize these different scenarios and the end user flexibility that can be gained with quantitative FIT. Choice of a FOBT should also consider the available evidence for that test, including test operating characteristics, subject acceptance and quality issues that ensure a reliable and robust test. Guidelines should make these requirements clear and not imply that all FOBTs are the same.