Background

Health screening involves the use of tests to identify apparently healthy people with early stage disease who do not have, or have not recognized that they have, symptoms or signs of the condition being screened. Screening is premised on the idea that early identification of asymptomatic pre-clinical disease can increase the likelihood of effective intervention and, thus, improve future health [1, 2]. Since the 1960s, when screening for breast cancer with mammography was first tested, enthusiasm for the idea that some diseases can be prevented through early detection has resulted in an explosion in the number of screening tests that have been promoted, some with evidence of benefit and others without such evidence [3].

This enthusiasm has also resulted in an expansion of the scope of screening itself. In addition to the goal of reducing risk of future ill health by detecting pre-clinical indicators of disease, the idea of screening has increasingly been applied to the use of self-report questionnaires to “screen” for existing health problems (e.g., alcohol misuse) or symptom-based syndromes (e.g., depression) that are not hidden; rather, they are experienced by patients, but not reported as health problems or observed by healthcare providers. The first example of a major national preventive care recommendation for this type of screening was the 2002 United States Preventive Services Task Force (USPSTF) recommendation for depression screening among adults in primary care [4]. Questionnaire-based screening has since been evaluated for other presently experienced health problems and symptom-based syndromes, including alcohol misuse, illicit substance use, intimate partner violence, and developmental delays in young children [5,6,7].

However, screening with questionnaires for existing conditions is controversial [8, 9], and major guideline organizations have reached different conclusions about the potential benefits versus harms of some of these programs [5,6,7]. Indeed, there are a number of reasons why applying a conventional test-based screening paradigm to presently experienced problems and symptoms may not improve health outcomes compared to providing patients with accurate healthcare information and appropriate assessment and intervention when problems are recognized. One such reason is that some of the conditions being screened may not necessarily be progressive. For some patients, symptoms and problems identified via self-report questionnaires reflect transitory reactions to circumstances that will resolve without intervention [8, 9]. Another is that using tests to identify and label medical conditions that patients do not otherwise recognize or report as health problems risks identifying large numbers of patients with mild conditions whose symptoms or problems may not be amenable to healthcare interventions. Finally, interventions to reduce symptoms or solve health problems are most effective when there is agreement between patients and providers on the impact of the problem and the need to address it. Such an agreement may not be present when tests are used to inform patients that they are experiencing a healthcare problem which they did not recognize as such [10].

Recommendations for screening should ideally be based on direct evidence from high-quality randomized controlled trials (RCTs) that show a sufficiently large benefit to justify the costs and harms involved in screening [1, 2, 10,11,12]. RCTs designed to directly test the effectiveness of a screening program should, at a minimum, (1) randomize patients prior to the screening intervention and (2) provide similar treatment resources to patients detected with the condition or health problem in the screening and non-screening arms of the trial so as not to confound the effects of a screening program with the effects of providing different treatments. Ideally, RCTs of screening programs would also exclude patients who are already known to have the targeted condition at the time of screening, as these patients would not be screened in actual practice [11].

The objective of the present study was to examine recommendations from three major national guideline organizations, the Canadian Task Force on Preventive Health Care (CTFPHC), the United Kingdom National Screening Committee (UKNSC), and the USPSTF, to (1) document the consistency of recommendations on using questionnaires to screen for presently experienced health problems or symptom-based syndromes, (2) identify sources of divergent recommendations, and (3) determine if guideline organizations have identified any examples of direct evidence from RCTs that questionnaire-based screening programs improve health outcomes for screened patients compared to non-screened patients.

Methods

Identification of eligible screening recommendations and data extraction

To identify eligible screening recommendations, we reviewed the most recent version of all guideline and recommendation statements listed on the websites of the CTFPHC [5], the UKNSC [6], and the USPSTF [7]. We considered only completed guideline and recommendation statements, but not “upcoming guidelines” or “recommendations in progress.” Eligible guidelines and recommendations were those that primarily focused on the use of a self-report questionnaire to identify patients with previously unreported and undetected yet presently experienced health problems or symptom-based syndromes. Guidelines and recommendations that focused on the use of performance-based measures, such as measures designed to test for cognitive impairment, but not self-report symptom questionnaires, were excluded.

The names of all guideline and recommendation statements listed on the websites of the CTFPHC, UKNSC, and USPSTF were uploaded into the systematic review data management program DistillerSR (Evidence Partners, Ottawa, Canada). DistillerSR was used to store and track results of the inclusion and exclusion process and for data extraction. When guideline and recommendation statements included more than one recommendation (e.g., one for children and one for adolescents), each recommendation was listed separately. For each included recommendation, we extracted the recommendation that was made (e.g., recommendation for screening, recommendation against screening, insufficient evidence). Two investigators independently reviewed all recommendations to assess eligibility and extract the recommendations made. Any disagreements were resolved by consensus, with a third investigator consulted if necessary.

Sources of divergent recommendations

In cases where recommendations differed between guideline organizations, we extracted information on the main rationales provided for recommendations. One investigator initially extracted the rationales from the recommendation statements, and a second investigator validated the information extracted against the statements. Any disagreements were resolved by consensus, with a third investigator consulted if necessary. We compared rationales and identified where they diverged.

Identification and evaluation of direct evidence from RCTs described in recommendations

We reviewed each recommendation statement and its accompanying evidence review and extracted the citations of all RCTs described as screening interventions; non-randomized interventions were excluded. If there were separate sections in the recommendation statement or evidence review for trials of screening interventions and for trials of treatment interventions, we extracted citations for all trials listed in the screening intervention section. If there were no separate sections, we extracted only citations for trials described as screening intervention trials. If the recommendation statement or evidence review described a systematic review of screening intervention trials, we extracted the citations for all eligible RCTs included in the systematic review.

To identify direct tests of screening interventions, for each RCT that was described in a recommendation or accompanying evidence review as a screening trial, we determined (1) whether patient eligibility was determined and randomization occurred prior to administering the screening test and (2) whether similar management resources were available to patients identified as having the target condition in both the screening and non-screening trial arms. Additionally, we determined whether patients with a recent diagnosis of the target condition and patients being treated for the condition at the time of trial enrollment were excluded from the trial.
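The two-criterion rule described above can be expressed as a simple classification. The sketch below is illustrative only: the `ScreeningTrial` record, its field names, and the three example trials are hypothetical and are not data from the review. It assumes that each trial has already been coded on the three features described, with the exclusion of known cases recorded but not required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTrial:
    """Hypothetical record of the features assessed for each RCT."""
    label: str
    randomized_before_screening: bool      # criterion (1)
    similar_management_in_both_arms: bool  # criterion (2)
    excluded_known_cases: bool             # recorded, but not required


def is_direct_test(trial: ScreeningTrial) -> bool:
    """Return True only when both required criteria are met."""
    return (trial.randomized_before_screening
            and trial.similar_management_in_both_arms)


# Made-up trials for illustration (not trials from the review).
examples = [
    ScreeningTrial("A", True, True, False),
    ScreeningTrial("B", False, True, True),  # randomized after screening
    ScreeningTrial("C", True, False, True),  # superior care in screening arm
]
direct = [t.label for t in examples if is_direct_test(t)]
print(direct)  # ['A']
```

Under this rule, a trial that fails either criterion is not counted as a direct test of screening, regardless of whether it excluded already-diagnosed patients.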

For included RCTs that directly evaluated screening interventions based on having (1) randomized patients prior to administering the screening test and (2) providing similar management resources to patients with the condition in the screening and non-screening trial arms, we extracted the primary and secondary health outcomes assessed in the RCT and determined if the outcomes were statistically significant or not. Process-based outcomes, such as the number of patients diagnosed or the number of patients who received treatment, were not extracted since these outcomes do not reflect improvements in health. If intent-to-treat and completer-only outcomes were provided, we extracted only intent-to-treat results. We did not extract subgroup outcomes, but only outcomes for main analyses that included all patients randomized to the screening and non-screening trial arms.

We determined if each screening trial had been registered, and, if so, we compared published outcomes to registered outcomes to identify any relevant discrepancies. If there was a pre-enrollment trial registration, and if published and registered outcomes differed, we recorded whether the trial outcome related to demonstrating benefit would have been different if pre-trial registered outcomes had been used. To identify whether trials had been registered, we first attempted to retrieve trial registration data, including the registration number, from each published article. If no registration information was included in the article, we searched for a trial registration in multiple clinical trial registries, including the ClinicalTrials.gov registry (www.ClinicalTrials.gov), the International Standard Randomized Controlled Trial Number registry (www.isrctn.com), the World Health Organization registry search portal (http://www.who.int/ictrp/search/en/), and the registry from the country of the first author (e.g., Netherlands Trial Register; www.trialregister.nl). To identify registry records, we performed a search using key terms from the published article, then attempted to match the principal investigator, funding source, intervention, control group, and design from the article to the registrations obtained in the search. If this method did not uncover a registration number, we contacted the corresponding author by email to attempt to determine if there was a trial registration that we had not been able to identify. Data were extracted by two investigators independently with any disagreements resolved through consultation with a third investigator.

Results

Recommendations on screening with self-report questionnaires

As of 5 April 2016, there were 217 guideline or recommendation statements with 299 separate recommendations posted on the websites of the CTFPHC (12 statements with 39 recommendations), UKNSC (109 statements with 109 recommendations), and USPSTF (96 statements with 151 recommendations). Of these, there were 18 guideline or recommendation statements with 22 separate recommendations that focused on questionnaire-based screening, including two statements with three recommendations from the CTFPHC, eight statements with eight recommendations from the UKNSC, and eight statements with 11 recommendations from the USPSTF. No additional recommendations related to questionnaire-based screening were identified when the websites were reviewed again on 5 September 2016 (Fig. 1).
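As a consistency check on the counts reported above, the per-organization figures can be summed and compared with the stated totals. This is a minimal sketch; the dictionary and function names are our own, and the numbers are taken directly from the text.

```python
# (statements, recommendations) per organization, as reported in the text.
all_posted = {"CTFPHC": (12, 39), "UKNSC": (109, 109), "USPSTF": (96, 151)}
questionnaire_based = {"CTFPHC": (2, 3), "UKNSC": (8, 8), "USPSTF": (8, 11)}


def totals(counts):
    """Sum statements and recommendations across organizations."""
    statements = sum(s for s, _ in counts.values())
    recommendations = sum(r for _, r in counts.values())
    return statements, recommendations


print(totals(all_posted))           # (217, 299)
print(totals(questionnaire_based))  # (18, 22)
```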

Fig. 1 Flow of guideline and recommendation statements reviewed and included, randomized controlled trials described in the statements, and results of randomized controlled trials that were tests of questionnaire-based screening interventions

As shown in Table 1, the CTFPHC made two weak recommendations and one strong recommendation against screening. The UKNSC recommended against screening in all eight of its recommendations. The USPSTF, on the other hand, made four recommendations to offer screening and determined in seven cases that there was insufficient evidence to recommend for or against screening. In conditions where more than one organization made a recommendation for or against screening in the same patient population, the USPSTF recommended using questionnaires to screen for alcohol misuse, but the UKNSC recommended against it; the USPSTF recommended screening adults, including women in pregnancy and postpartum, for depression, whereas the CTFPHC and UKNSC recommended against it; both the CTFPHC and the UKNSC recommended against screening for developmental delays or behavioral problems; and the USPSTF recommended screening for intimate partner violence, whereas the UKNSC recommended against it.

Table 1 Characteristics of CTFPHC, UKNSC, and USPSTF guidelines that provide recommendations for questionnaire-based screening

Sources of divergent recommendations

We compared divergent recommendations for versus against screening, but did not consider “I” recommendations by the USPSTF in our assessment of divergent recommendations. As shown in Table 2, USPSTF recommendation statements in favor of screening for alcohol misuse in adults, depression screening of adolescents, and intimate partner violence in adult women all recognized that there was no direct RCT evidence of benefit from screening. Instead, the USPSTF expressed confidence that screening would result in benefit based on indirect evidence from studies of screening test accuracy and intervention effectiveness. The CTFPHC and UKNSC, on the other hand, emphasized the lack of direct trial evidence of effectiveness in their recommendations against screening.

Table 2 Comparison of main rationales provided for recommendations for and against screening

In the case of adult depression screening, the USPSTF argued that there was direct trial evidence of benefit of combined screening and management support. The UKNSC indicated that there were no trials that had shown direct evidence of effectiveness of screening. The CTFPHC similarly indicated that there was no direct trial evidence of the benefit of screening programs. In the CTFPHC recommendation, it was specifically noted that the trials identified in the systematic review performed in conjunction with the USPSTF recommendation conflated screening and enhanced collaborative depression care and that it was not necessarily the case that screening was a necessary component.

Another key difference between organizations was related to the treatment of resource utilization and possible harms from screening. The USPSTF does not consider costs in its recommendations, and in each of its recommendations in favor of screening, it indicated that any harms would be small to negligible. The CTFPHC and UKNSC, on the other hand, did raise concerns about resource consumption in the absence of evidence of benefit and about harms to patients who would be screened, including overdiagnosis and overtreatment.

Evaluation of direct RCT evidence on screening interventions described in recommendations

As shown in Fig. 1, there were 22 unique RCTs that were described in the recommendation statements or accompanying evidence reviews (see Table 3 for trial characteristics). Of these, only six met the two criteria for being a direct test of a screening intervention; that is, they randomized patients prior to administering the screening questionnaire and provided similar resources for management of patients identified as needing care in the screening and non-screening trial arms [13,14,15,16,17,18,19]. Of the other 16 trials, 10 included questionnaire scores as part of trial eligibility criteria, but they were trials that evaluated a specific treatment compared to usual care for people identified with the condition of interest, not whether screening would benefit patients compared to not screening [20,21,22,23,24,25,26,27,28,29,30]. The other RCTs randomized patients post-screening [31] or screened post-randomization, but provided superior care options to patients identified in the screening arm compared to patients identified as needing care in the non-screening arm [32,33,34,35,36].

Table 3 Characteristics of randomized controlled trials described in CTFPHC, UKNSC, and USPSTF guidelines

As shown in Table 4, of the six RCTs that directly tested screening interventions, two tested depression screening interventions [13, 14], two tested interventions for screening for developmental or speech and language delays [15,16,17], one tested an intimate partner violence screening intervention [18], and one tested a suicide risk screening intervention [19]. In five of the RCTs [13, 15,16,17,18,19], no primary or secondary health outcomes were statistically significant in favor of the screening intervention. In the other RCT [14], a trial of depression screening in postpartum women from Hong Kong, of the two primary outcomes that were registered, one generated statistically significant results, whereas the other did not. The published trial report, however, only identified the statistically significant outcome as primary and relegated the non-significant outcome to secondary.

Table 4 Primary and secondary health outcomes reported in randomized controlled trials that (1) determined eligibility and randomized patients prior to screening and (2) provided similar management options for screened and unscreened trial arms

Discussion

Screening for presently experienced health problems and symptom-based syndromes with self-report questionnaires has been evaluated by the CTFPHC, UKNSC, or USPSTF in the areas of alcohol misuse, depression, developmental or speech and language delays, domestic violence, and suicide risk. The CTFPHC and UKNSC have made a total of 11 recommendations against screening with self-report questionnaires and no recommendations in favor of the practice. The USPSTF, on the other hand, has made four recommendations in favor of questionnaire-based screening programs (alcohol misuse, adult depression, adolescent depression, intimate partner violence) and no recommendations against screening. In seven other cases, the USPSTF determined that there was insufficient evidence to recommend for or against the service (“I” recommendation).

The CTFPHC, UKNSC, and USPSTF all attempt to evaluate the balance between possible benefits and possible harms that would be accrued from screening programs. The methods the groups use are generally similar, although there are some differences. Both the CTFPHC and USPSTF include methods for evaluating screening pathways based on indirect evidence, such as evidence on screening test accuracy and treatment effectiveness [37, 38]. They differ, however, in that the CTFPHC uses the GRADE system [39] and makes weak or strong recommendations for or against all preventive care services it evaluates; the USPSTF, on the other hand, uses its own rating system and may make an “I” recommendation, which reflects that its members do not believe that there is sufficient evidence to make any recommendation. The UKNSC differs from both the CTFPHC and USPSTF in that it uses a list of criteria, including the availability of evidence from high-quality RCTs, to evaluate screening programs [10]. In addition, the CTFPHC and UKNSC, but not the USPSTF, consider resource use in their recommendations [10, 37, 38].

Divergences in recommendations between the USPSTF and the CTFPHC and UKNSC appear to stem from several sources. First, when recommendations diverge, the USPSTF has indicated in each case that there is at least moderate certainty that there would be at least moderate net benefit based on indirect evidence from studies of test accuracy and treatment of screen-detected symptomatic patients and, if available, potential harms of screening and treatment. The CTFPHC and UKNSC, on the other hand, have determined that those links are insufficient to establish that benefit would occur. Additionally, in the case of depression screening, the CTFPHC noted that the USPSTF relied upon RCTs of depression care management programs, which used screening tools to establish trial eligibility prior to randomization, as evidence on screening. Consistent with this, of the 13 RCTs described by the USPSTF as screening trials, only two randomized patients prior to screening and provided similar care options to patients with depression in the screening and non-screening trial arms (Table 3). Second, in divergent recommendations, the CTFPHC and UKNSC raised concerns about possible harms from screening, including overdiagnosis and overtreatment, whereas the USPSTF rated the harms it described as small to negligible in all recommendations in favor of screening and did not mention the possibility of overdiagnosis or overtreatment in any. Finally, cost and resource considerations were included in CTFPHC and UKNSC recommendations, but not in USPSTF recommendations.

No examples of direct RCT evidence that questionnaire-based screening improves health outcomes were described in the recommendations of the CTFPHC, UKNSC, or USPSTF. There were only six RCTs that directly tested screening interventions by randomizing patients prior to administering the screening questionnaire and providing similar management resources for patients identified as needing care in the screening and non-screening arms of the trials. In five of the trials, which evaluated whether screening for depression, developmental or speech and language delays, intimate partner violence, and suicide risk improved health compared to usual care, there were no statistically significant primary or secondary health outcomes in favor of the screening intervention.

In the sixth RCT, which tested depression screening among postpartum women in Hong Kong [14], based on outcome definitions registered prior to conducting the trial, there was one primary outcome that was statistically significant in favor of screening and one that was not. However, in the published outcome report, only the statistically significant outcome was described as a primary outcome; the non-statistically significant outcome was described as secondary [14]. As described previously [40, 41], there is concern that results from this trial may not represent what would likely occur in practice. In addition to reclassifying trial outcomes post hoc in a way that portrayed trial results as positive, rather than equivocal, the reported effect size was implausibly large. The authors randomized 231 women to be screened, of whom 55 received the low-intensity counseling treatment that was provided; 11 of 231 women in the control arm also received the treatment. The authors reported a standardized mean difference (SMD) effect size per woman screened on the Edinburgh Postnatal Depression Scale of 0.34, roughly equivalent to SMD = 1.81 for the 44 additional patients treated in the screened group compared to the control group. This reported effect per woman treated, however, is six to seven times the size of effects that are typically achieved with similar interventions in primary care settings [40, 41]. A meta-analysis of collaborative depression care treatment, for instance, reported an effect size of 0.25 SMD (N = 30 trials) [42]. Another meta-analysis of psychological treatment for adult depression in primary care reported an overall SMD effect size of 0.31 (N = 15 trials) [43]. None of the individual RCTs included in either meta-analysis approached the effect size reported per patient treated in the Hong Kong screening trial. Consistent with concerns that results from the Hong Kong trial may not be reproduced in actual practice, the only other trial of depression screening included in the present review did not find that depression screening significantly reduced the number of depression diagnoses among patients screened compared to patients not screened [13].
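The conversion from an effect per woman screened to an effect per additional woman treated can be approximated with simple proportional scaling. The sketch below is not necessarily the method used in the cited commentaries [40, 41]. It assumes that the entire between-arm difference is attributable to the additional women treated in the screening arm, and the function name `per_treated_smd` is our own. Using the figures given in the text, this rough scaling gives a value of about 1.8, close to the reported figure, and between roughly six and seven times the meta-analytic estimates.

```python
def per_treated_smd(smd_per_screened, n_screened_arm,
                    treated_screened, treated_control):
    """Rescale a per-screened SMD to a per-additional-patient-treated SMD.

    Assumes (for illustration) that the whole between-arm difference is
    produced by the additional patients treated in the screening arm.
    """
    additional_treated = treated_screened - treated_control
    return smd_per_screened * n_screened_arm / additional_treated


# Figures taken from the text: SMD 0.34 per woman screened, 231 women
# screened, 55 treated in the screening arm, 11 treated in the control arm.
implied = per_treated_smd(0.34, 231, 55, 11)
print(round(implied, 1))          # 1.8

# Ratios to the meta-analytic effect sizes cited in the text.
print(round(implied / 0.25, 1))   # 7.1 (collaborative care, SMD 0.25)
print(round(implied / 0.31, 1))   # 5.8 (psychological treatment, SMD 0.31)
```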

The USPSTF was recently criticized for relying upon indirect evidence and for not adequately considering potential harms in recommending depression screening [44]. Experts pointed out that there are numerous examples where the use of insufficient and indirect evidence has led to ineffective and harmful screening programs and argued that guideline makers should refrain from recommending new screening services based on only indirect evidence [44]. In the context of questionnaire-based screening programs, this concern is heightened because, when RCTs have directly tested these programs, they have not found evidence of health benefits. When high-quality trials can feasibly be conducted, as is the case with questionnaire-based screening programs, a more conservative approach than recommending a new service without direct evidence would be to call for well-conducted RCTs.

Appropriate care that addresses patient needs, but avoids intervention without demonstrated benefit, is increasingly emphasized in healthcare planning and service delivery [45, 46]. Recognition that screening is not benign is reflected in recent recommendations for more restricted use of screening for breast [47, 48] and prostate cancer [49, 50]. Using self-report questionnaires as screening tests to identify unreported and unrecognized, but presently experienced, health problems and symptoms extends the boundaries of the standard screening paradigm, in which tests are used to detect hidden signs or unrecognized symptoms in order to stave off future health problems. It is possible that questionnaire-based screening might improve upon good, conscientious medical care that provides patients with information and encourages them to inquire about problems they are experiencing. Direct evidence from existing studies included in CTFPHC, UKNSC, and USPSTF recommendations, however, does not lead to this conclusion.

Without evidence that using questionnaires to search for presently experienced, unreported problems would lead to better health outcomes, the negative implications of this practice need to be carefully considered in screening recommendations, including the possibility that it would lead to overdiagnosis and overtreatment [51,52,53,54]. Traditionally, overdiagnosis has been understood to occur when a person without symptoms is diagnosed with a condition or disease that will not lead to symptoms or early mortality and would not ever be identified without screening [51, 52]. More broadly, in the case of presently experienced problems or symptoms, overdiagnosis can occur when patients are identified with a disorder or problem that they do not experience as significantly impairing and that would not be expected to be substantively affected by medical intervention [53, 54]. This could occur in mental disorders, even when diagnostic criteria are met, such as in the presence of mild depressive symptoms that fall close to the normal range on a diagnostic spectrum [54].

Potential harms have not been well documented in questionnaire-based screening, but if screening is done, some patients who would not otherwise be exposed will experience harms. For example, individuals may be exposed to unnecessary and ineffective treatments, undesirable medication effects, the labeling of problems that may resolve on their own as medical problems, and nocebo effects from telling patients who are not otherwise specifically concerned that they have a medical problem, such as depression [10, 55].

In addition to direct harms to patients, the practice would consume scarce healthcare resources that might be better devoted to providing services to patients who clearly have health problems, including mental health problems, but who in many cases receive less than adequate care [10, 56]. Some have argued that screening with questionnaires can be done at very little cost [57], and having patients respond to questionnaires is not typically expensive. However, screening involves much more than this, including follow-up assessments to separate true from false positives, consultations to determine the best management options, and treatment and follow-up services. One study found that, when depression screening is conducted, more than 70% of visits last more than 15 minutes and 17% last more than 30 minutes compared to 42% and 6%, respectively, when screening is not done, and this only factors in the time involved in the initial screening visit, but not follow-ups and referral management, for instance [58]. The number of patients who would follow this pathway depends on the clinical setting and condition targeted. In depression, 30% or more of patients in many settings would have positive screens and would need to be evaluated, even though most of these patients would not have depression [59, 60].

By 1996, based on a conservative estimate, a typical primary care physician needed to spend 7.4 hours per day just to minimally comply with Grade A and B recommendations (moderate to high certainty of moderate or high benefit, should be offered) for preventive care from the USPSTF [61]. Since then, the number of A and B recommendations has grown, including the recommendations for questionnaire-based screening described in the present study. Physicians cannot realistically comply with all USPSTF A and B recommendations, but guidance on how to prioritize is not provided. As a result, they may determine which recommendations to offer based on their own estimation of likely benefit and harm, as well as resources required. In depression screening, a national survey found that only 4% of American primary care patients were screened for depression in 2012–2013, even though it was recommended by the USPSTF and covered by the Affordable Care Act as of 2010 [62].

There are limitations to consider in evaluating the results of the present study. First, we included only recommendations from three guideline organizations, the CTFPHC, UKNSC, and USPSTF. Although these organizations are recognized for their leadership in the area of preventive healthcare policy, these results do not necessarily apply to other organizations that make recommendations on screening. Second, we only reviewed trials included in recommendation statements and did not seek to identify other trials that may have been conducted. It is possible that there are trials of questionnaire-based screening that we did not review from other areas of screening where no recommendations have been made or from trials conducted since these recommendations were made. However, identification of any existing trials was not the objective of the present study. Rather, we sought to determine if the CTFPHC, UKNSC, or USPSTF had identified direct evidence from any questionnaire-based screening program that would support the use of indirect evidence in recommendations.

Conclusions

In summary, neither the CTFPHC nor the UKNSC has made any recommendations endorsing questionnaire-based screening. The USPSTF, on the other hand, has recommended questionnaire-based screening for alcohol misuse, depression in adolescents and adults, and intimate partner violence. Compared to the CTFPHC and UKNSC, the USPSTF appears to be more confident in relying upon indirect evidence, minimizes potential harms, and does not consider cost and resource utilization.