
The first five chapters of this primer present important concepts in cancer screening and evaluation of its data. Examples were provided to reinforce concepts and interpretation, but most were limited, fictional, and not intended to demonstrate how cancer screening efficacy and effectiveness are formally evaluated. Chapters 6 and 7 present the research study designs that are used to generate the data necessary for cancer screening assessment. Design features, analysis features, and strengths and weaknesses will be presented for each. A synopsis of at least one published report, along with its reference, will be provided for each design. Statistical theory will not be discussed.

There are two classes of study designs: experimental and observational. Randomized controlled trials (RCTs) are experimental study designs and are discussed in this chapter; all other study designs presented in this primer are observational and are discussed in Chap. 7. In general, efficacy is assessed using RCTs and effectiveness is assessed using observational designs, though exceptions exist. Recall from Chap. 1 that efficacy refers to the ability of cancer screening to reduce cause-specific mortality in a highly controlled, near-ideal setting, and effectiveness refers to its ability to reduce cause-specific mortality in a traditional community health care setting, one that provides numerous and varied services and faces typical US health care challenges. Pragmatic RCTs are conducted in community settings. They usually are classified as effectiveness research but are presented in this chapter because of their experimental nature.

Readers who would like to learn more about experimental research can consult Fundamentals of Clinical Trials, by Friedman, Furberg, and DeMets [1].

6.1 An Overview of Experimental Study Designs

RCTs are experimental because the intervention is assigned at random rather than chosen by the study participant or the researcher. Randomization can occur individually for each participant (individual-level randomization) or for groups of participants, such as screening centers or geopolitical entities (cluster-level randomization). Most RCTs are composed of two groups, referred to as trial arms. When the number of participants is large enough, randomization will, with high probability, create trial arms that are equivalent before the intervention is administered. Equivalent means that the distribution of all risk and protective factors, both measured and unmeasured, is the same in each trial arm. Large enough means that the trial has adequate statistical power, which can be determined by published formulas [2]. The arm that does not receive the intervention serves as the counterfactual experience of the intervention arm: the hypothetical experience the intervention arm would have had if the intervention had not been administered. Because randomization greatly minimizes the possibility of confounding, the counterfactual principle allows a difference in outcomes between the arms to be attributed fully and solely to the intervention. In the context of cancer screening, confounding occurs when a third factor is related to both screening activity and cause-specific mortality; it will be discussed in detail in Chap. 7.
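The claim that large trials produce equivalent arms "with high probability" can be checked with a small simulation. The sketch below is illustrative only: the binary risk factor, its 30% prevalence, and the function name simulate_imbalance are hypothetical choices, not taken from any trial. Each simulated trial randomizes participants 1-to-1 without ever looking at the factor, and the average between-arm difference in its prevalence is reported for several arm sizes.

```python
import random
import statistics


def simulate_imbalance(n_per_arm, prevalence=0.3, n_trials=100, seed=1):
    """Mean absolute between-arm difference in the prevalence of a binary
    risk factor across many simulated 1:1 randomizations.

    The factor (hypothetical) stands in for any risk or protective factor,
    measured or unmeasured; randomization never looks at it.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Factor status of every enrollee is fixed before randomization
        factor = [rng.random() < prevalence for _ in range(2 * n_per_arm)]
        # Individual-level randomization: shuffle, then split into two equal arms
        rng.shuffle(factor)
        arm_a, arm_b = factor[:n_per_arm], factor[n_per_arm:]
        gaps.append(abs(sum(arm_a) - sum(arm_b)) / n_per_arm)
    return statistics.mean(gaps)


for n in (20, 200, 2000):
    print(f"{n:>5} per arm: mean imbalance = {simulate_imbalance(n):.4f}")
```

The average imbalance shrinks roughly in proportion to one over the square root of the arm size, which is why trials powered for cause-specific mortality, with tens of thousands of participants, end up with arms that are equivalent for practical purposes.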

All RCTs are prospective. Individual-level and cluster-level RCTs share many features; those features will be discussed in the context of individual-level trials, and the ways in which cluster-level RCTs differ will be presented afterwards. Pragmatic RCTs, a type of experimental design used in patient-centered research, will be discussed at the end of the chapter. Pragmatic RCTs incorporate randomization but allow for crossover (that is, switching to the other trial arm) if the randomization assignment is counter to patient preference.

6.2 Individual-Level Randomized Controlled Trials of Screening

6.2.1 Design Features

Individual-level cancer screening RCTs involve randomization of each participant to a trial arm. RCTs have at least one intervention arm and one control arm. For simplicity’s sake, a trial with one intervention arm and one control arm will be used to present this chapter’s material.

Intervention arm participants are offered the screening test or screening regimen that is hypothesized to be of benefit. Control arm participants are offered either no cancer screening test or cancer screening with the standard-of-care screening test or regimen. Control arm participants who are offered no cancer screening may be offered an unrelated exam, such as a glaucoma exam, to engender goodwill and to facilitate follow-up for trial outcomes.

Ascertainment of all information, most importantly intermediate and definitive outcomes, must be conducted with the same rigor in each arm. A formal death review process, in which the underlying cause of each death is determined, should be considered, and death reviewers should be blinded to trial arm.

An RCT is designed to have a pre-specified number of screening rounds and years of follow-up. Screening rounds are typically labeled T0, T1, and so on. T0 refers to the first screen, which also may be called the prevalence screen; later screens are called incidence screens. A stop-screen RCT is one in which follow-up continues after screening stops. All RCTs should have interim analysis and data monitoring plans so that a trial can be stopped early if evidence is overwhelming that the intervention is efficacious or that it is not.

6.2.2 Analysis Features

The primary outcome in an individual-level cancer screening RCT is cause-specific mortality, summarized as a cause-specific mortality rate ratio (with its 95% confidence interval): the cause-specific mortality rate in the intervention arm divided by the cause-specific mortality rate in the control arm. A rate ratio that is statistically significant and lower than 1 indicates that the intervention reduced cause-specific mortality relative to whatever the control arm received (if anything). A rate ratio that is not significantly different from 1 indicates that there is no evidence that the intervention reduces cause-specific mortality relative to whatever the control arm received (if anything). An all-cause mortality rate ratio usually is reported as well, although, as discussed in Chap. 5, cancer screening RCTs rarely have the statistical power to detect a significant reduction in all-cause mortality, because deaths from the cancer of interest usually represent a small percentage of all deaths. Intermediate outcomes often are reported as well.
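The arithmetic behind the primary outcome can be sketched in a few lines. The example below assumes death counts behave like Poisson counts and uses the common log-normal (Wald) approximation for the 95% confidence interval; the trial numbers are hypothetical. A second function shows, under the simplifying assumption that screening changes only deaths from the cancer of interest, why an all-cause rate ratio barely moves.

```python
import math


def rate_ratio_ci(deaths_int, py_int, deaths_ctl, py_ctl, z=1.96):
    """Mortality rate ratio (intervention / control) with an approximate
    95% CI from the log-normal (Wald) approximation."""
    rr = (deaths_int / py_int) / (deaths_ctl / py_ctl)
    # Standard error of log(RR) when death counts are treated as Poisson
    se_log_rr = math.sqrt(1 / deaths_int + 1 / deaths_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def all_cause_rr(cause_share, cause_rr):
    """All-cause RR implied when only deaths from the screened cancer change.

    cause_share: fraction of control-arm deaths due to that cancer.
    Assumes (simplification) other-cause mortality is identical in both arms.
    """
    return (1 - cause_share) + cause_share * cause_rr


# Hypothetical trial: 400 vs 500 cancer deaths, 1,000,000 person-years per arm
rr, lo, hi = rate_ratio_ci(400, 1_000_000, 500, 1_000_000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # RR = 0.80 (95% CI 0.70 to 0.91)

# If that cancer caused 3% of all deaths, a 20% cause-specific reduction
# translates into an all-cause rate ratio of only 0.994.
print(f"{all_cause_rr(0.03, 0.80):.3f}")
```

Because the confidence interval excludes 1, this hypothetical result would be read as a statistically significant reduction in cause-specific mortality, while the implied all-cause effect is far too small for a trial of this size to detect.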

If confounding is suspected and an adjusted ratio is desired, proportional hazards models can be used. Confounding is unlikely in well-designed and well-executed RCTs, but it is often worthwhile to explore the possibility. If confounding by measured factors is not present, the unadjusted and adjusted ratios will be similar. Proportional hazards models do not produce rate ratios; they produce hazard ratios, which compare the instantaneous risk (hazard) of death in the two arms. Hazard ratios are interpreted much like mortality rate ratios, because both are relative measures of the chance of death in the intervention arm versus the control arm, and the two are often numerically similar, particularly when the hazard is roughly constant over follow-up.
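Why the two ratios tend to agree can be seen in a simulation built on a simplifying assumption: a constant hazard of death in each arm (exponential survival), with everyone censored at the end of a fixed follow-up period. Under that assumption, deaths divided by person-years estimates the hazard itself, so the mortality rate ratio estimates the hazard ratio. The hazard, follow-up length, and arm size below are hypothetical.

```python
import random


def simulated_rate_ratio(n_per_arm=50_000, hazard_ctl=0.01, true_hr=0.8,
                         follow_up=10.0, seed=7):
    """Simulate constant-hazard (exponential) survival in two arms with
    censoring at `follow_up` years, then estimate the mortality rate ratio
    as (deaths / person-years) in the intervention arm over the control arm."""
    rng = random.Random(seed)

    def arm_rate(hazard):
        deaths, person_years = 0, 0.0
        for _ in range(n_per_arm):
            t = rng.expovariate(hazard)  # time to death, in years
            if t <= follow_up:
                deaths += 1
                person_years += t
            else:
                person_years += follow_up  # alive and censored at end of follow-up
        return deaths / person_years

    return arm_rate(hazard_ctl * true_hr) / arm_rate(hazard_ctl)


# Should land close to the true hazard ratio of 0.8
print(round(simulated_rate_ratio(), 3))
```

When hazards change over time (for example, if the screening benefit appears only after several years), the two measures can diverge somewhat, which is why trial reports state which one they present.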

From the counterfactual principle comes the expectation that, in the absence of the intervention, the same number of cancers and cancer deaths would emerge in the two trial arms over time. Thanks to randomization, intervention arm participants have counterparts in the control arm who would have the same experience, including cancer diagnosis and death, if screening did not occur. Once screening begins, the intervention arm quickly accrues more cancer cases than the control arm, primarily because of lead time. In the absence of overdiagnosis, the number of cancers is expected to equalize at some point after screening stops, a phenomenon called catch-up. In the presence of overdiagnosis, catch-up is incomplete, because screening found cancers whose control arm counterparts would never present clinically in the absence of screening.

A stop-screen design allows overdiagnosis to be addressed by comparing the numbers of cancers in the two arms at a point in time after screening ceases. The appropriate point in time is based on beliefs about the natural history of the disease. A stabilization of the difference in the number of cancers as time progresses is a good indication that catch-up is complete, and that stable difference is the magnitude of overdiagnosis. This method for calculating overdiagnosis is called the excess incidence method; another method for estimating overdiagnosis is discussed in Chap. 9. Overdiagnosis cannot be assessed in an RCT that does not use a stop-screen design unless the trial lasts longer than the longest lead time. With a long enough observation period, the difference in cancer incidence between the trial arms will stabilize; the difference at that point is the magnitude of overdiagnosis.
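The excess incidence method can be sketched as a calculation on cumulative case counts from a 1-to-1 stop-screen trial. The counts, the four years of screening, and the tolerance used to declare that the difference has "stabilized" are all hypothetical choices; in practice the stabilization point is judged in light of the disease's natural history rather than by a fixed threshold.

```python
from itertools import accumulate


def excess_incidence(cum_int, cum_ctl, tolerance=5):
    """Excess incidence estimate of overdiagnosis for a 1:1 stop-screen RCT.

    cum_int, cum_ctl: cumulative cancer counts at the end of each year.
    Returns (year, excess): the first year from which the between-arm
    difference stays within `tolerance` cases for the rest of follow-up,
    and the difference at the end of follow-up. Returns None if the
    difference never stabilizes. `tolerance` is a hypothetical analyst choice.
    """
    diffs = [i - c for i, c in zip(cum_int, cum_ctl)]
    for start in range(len(diffs) - 1):
        later = diffs[start:]
        if max(later) - min(later) <= tolerance:
            return start + 1, diffs[-1]
    return None


# Hypothetical annual cancer counts; screening in years 1-4 only
annual_int = [120, 110, 110, 110, 50, 60, 70, 80, 80, 80, 80, 80]
annual_ctl = [80] * 12
cum_int = list(accumulate(annual_int))
cum_ctl = list(accumulate(annual_ctl))

print(excess_incidence(cum_int, cum_ctl))  # (7, 70)
```

In this made-up example the intervention arm leads by 130 cancers when screening stops, the gap narrows as catch-up proceeds, and it settles at 70 from year 7 onward; those 70 cancers are the excess incidence estimate of overdiagnosis.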

Cessation of screening can lead to dilution of the mortality rate ratio. Dilution occurs when a mortality rate ratio that suggested a benefit of screening moves closer to the null value (no benefit; a rate ratio of 1) as time passes without screening. The counterfactual principle explains why dilution occurs: after screening ends, the trial arms eventually return to their pre-intervention state, in which their mortality rates were equivalent, so any beneficial effect of cancer screening eventually ceases while deaths continue to accumulate at the same rate in both arms. An RCT that does not use a stop-screen design will not experience dilution from the cessation of screening.
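Dilution follows directly from how a cumulative rate ratio is computed. In the hypothetical stop-screen trial below, screening reduces cancer deaths by 20% in years 6 through 12 and has no effect before or after. For simplicity the sketch assumes equal person-years in both arms each year, so the cumulative rate ratio reduces to a ratio of cumulative death counts.

```python
def cumulative_rate_ratios(deaths_int, deaths_ctl):
    """Cumulative mortality rate ratio at the end of each follow-up year.

    Simplifying assumption: both arms contribute equal person-years each
    year, so the rate ratio equals the ratio of cumulative death counts.
    """
    ratios = []
    total_int = total_ctl = 0
    for d_int, d_ctl in zip(deaths_int, deaths_ctl):
        total_int += d_int
        total_ctl += d_ctl
        ratios.append(total_int / total_ctl)
    return ratios


# Hypothetical annual cancer deaths over 20 years: no effect in years 1-5,
# a 20% reduction in years 6-12, and no effect after the benefit wanes.
control = [100] * 20
intervention = [100] * 5 + [80] * 7 + [100] * 8

ratios = cumulative_rate_ratios(intervention, control)
print(round(ratios[11], 3), round(ratios[19], 3))  # year 12 vs. year 20: 0.883 0.93
```

The 140 deaths averted never disappear, but the denominator keeps growing once the arms die at the same rate again, so the ratio drifts toward 1 the longer follow-up continues.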

Most RCTs randomize in a 1-to-1 fashion, leading to equal sample sizes in the two arms, and the preceding discussion of overdiagnosis and catch-up assumed that equal numbers were randomized to each arm. If other randomization schemes are used, expectations regarding catch-up must be adjusted. For example, if overdiagnosis does not exist, a stop-screen trial that randomizes in a 2 (intervention) to 1 (control) fashion is expected to have twice as many cancer cases in the intervention arm as in the control arm once catch-up is complete.
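The adjustment amounts to scaling the control-arm count by the allocation ratio before taking the difference. The sketch below uses hypothetical counts and a hypothetical helper name; it assumes catch-up is complete at the time the counts are compared.

```python
def excess_cases(cases_int, cases_ctl, allocation_ratio=1.0):
    """Excess cancers in the intervention arm once catch-up is complete.

    The control-arm count is scaled by the allocation ratio
    (intervention : control); a result near zero suggests no overdiagnosis.
    """
    return cases_int - allocation_ratio * cases_ctl


print(excess_cases(2000, 1000, allocation_ratio=2))  # 2:1 trial, no excess
print(excess_cases(2100, 1000, allocation_ratio=2))  # 2:1 trial, 100 excess cancers
```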

6.2.3 Strengths and Weaknesses

The greatest strength of a cancer screening RCT is that results can be attributed to the intervention and not to a confounding factor, but only if randomization achieved its goal of creating two equivalent groups. The chance of that happening increases with the size of the trial arms. Screening trials that have the necessary statistical power to properly assess a cause-specific mortality rate ratio are almost guaranteed to have equivalent groups as long as nothing in the randomization process is systematically awry.

Other potential differences in the experience of the arms must be considered when interpreting the findings of a cancer screening RCT. Outcome ascertainment methods need to be equivalent for the two arms, as does treatment for a given stage of cancer. Most RCTs collect extensive amounts of data, so these two conditions often can be assessed. However, it is important to remember that participants in the intervention arm will have more contact with trial staff during the screening period of the trial, which could lead to the two arms having different experiences at many points in the screening process.

Standardized application of the screening regimen is a strength. An RCT is thought to provide the most favorable setting in which to evaluate a screening regimen; all steps in the screening process, from invitation to treatment, tend to occur with an extra level of forethought and rigor.

Cancer screening RCTs are expensive and take a long time to complete. They require large numbers of participants for reasons of statistical power. If intervention arm participants do not receive the intervention of interest (referred to as non-compliance) or control arm participants do (referred to as contamination), statistical power may be compromised if the degree of observed non-compliance and contamination is greater than what was assumed when the trial was designed. In the instance of extreme non-compliance and contamination, the trial arms become indistinguishable and any comparison in mortality rates is meaningless. If the intervention is available outside the trial and either is inexpensive or covered by health insurance, high rates of contamination are likely and may make an RCT impractical.
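How non-compliance and contamination erode a trial can be sketched under strong simplifying assumptions: screening has the same effect in everyone who receives it, and people who comply or cross over have the same baseline risk as everyone else. Under those assumptions the intention-to-treat rate ratio is pulled toward 1. The second function is the widely used approximation (found, for example, in Friedman, Furberg, and DeMets [1]) for how much the sample size must grow to preserve the planned power. The function names and rates are hypothetical.

```python
def observed_rate_ratio(true_rr, noncompliance, contamination):
    """Intention-to-treat rate ratio when a fraction of the intervention arm
    is not screened (noncompliance) and a fraction of the control arm is
    screened (contamination).

    Assumes a uniform screening effect (true_rr) and no difference in
    baseline risk between compliers and non-compliers.
    """
    rate_int = (1 - noncompliance) * true_rr + noncompliance
    rate_ctl = (1 - contamination) + contamination * true_rr
    return rate_int / rate_ctl


def sample_size_inflation(noncompliance, contamination):
    """Approximate factor by which the sample size must be multiplied to keep
    the planned power: 1 / (1 - noncompliance - contamination) ** 2."""
    return 1 / (1 - noncompliance - contamination) ** 2


print(round(observed_rate_ratio(0.8, 0.0, 0.0), 3))  # no dilution
print(round(observed_rate_ratio(0.8, 0.2, 0.3), 3))  # pulled toward 1
print(round(observed_rate_ratio(0.8, 0.5, 0.5), 3))  # arms indistinguishable
print(round(sample_size_inflation(0.2, 0.3), 2))
```

With 20% non-compliance and 30% contamination, a true 20% reduction appears as roughly an 11% reduction, and about four times as many participants would be needed to detect it with the originally planned power. When non-compliance plus contamination reaches 100%, the arms receive the same mix of screening and the observed rate ratio is exactly 1.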

6.2.4 Example of an Individual-Level Cancer Screening RCT

A number of cancer screening RCTs have been conducted, and they vary with regard to rigor and availability of information on their conduct. A well-conducted and well-documented cancer screening RCT is the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), which has been mentioned previously. Informative publications include the primary outcome papers [3, 4, 5, 6] and the methods and operations papers [2, 7]. The methods and operations papers will be useful to those who are planning to launch a trial or who wish to learn more about the nuts and bolts of how cancer screening RCTs are carried out.

6.3 Cluster-Level Randomized Controlled Trials of Cancer Screening

6.3.1 Design Features

A cancer screening cluster-level RCT is quite similar to an individual-level RCT. The only design difference is that cluster-level trials randomize groups rather than individuals. The number of groups must be at least two but can be more. If a group is randomized to receive the intervention, all eligible individuals in that group are invited to receive it. Groups often are geopolitical entities, such as counties or provinces. The groups to be randomized must be similar enough for the counterfactual principle to hold.

6.3.2 Analysis Features

The same principles that hold for the analysis of individual-level cancer screening RCTs hold for cluster-level cancer screening RCTs, with one exception. A cluster-level RCT usually is analyzed at the cluster level, meaning that the cluster, rather than the individual, is the unit of analysis [8]. When analyzed at the cluster level, statistical analyses are straightforward, but results apply only to clusters. For example, a cause-specific mortality rate ratio of 0.80 indicates that clusters that were offered the intervention had a 20% reduction in cause-specific mortality rates relative to clusters that were not; it does not indicate that individuals who were screened had a 20% reduction in cause-specific mortality. The conclusions are not guaranteed to apply directly to the individuals who reside in those clusters, although they often are interpreted as if they do.
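One simple way to carry out a cluster-level analysis is sketched below: each cluster contributes a single cause-specific mortality rate, the arms are compared by the ratio of their mean cluster rates, and significance is judged with a randomization (permutation) test that re-assigns arm labels to whole clusters. This is one option among several, not necessarily the method used in any particular trial; the eight clusters and their rates are hypothetical.

```python
import itertools
import math


def cluster_rate_ratio(rates_int, rates_ctl):
    """Ratio of mean cluster-level mortality rates (intervention / control)."""
    mean_int = sum(rates_int) / len(rates_int)
    mean_ctl = sum(rates_ctl) / len(rates_ctl)
    return mean_int / mean_ctl


def permutation_p_value(rates_int, rates_ctl):
    """Two-sided randomization test with the cluster as the unit of analysis:
    re-assign arm labels to clusters in every possible way and count how often
    the log rate ratio is at least as extreme as the one observed."""
    observed = abs(math.log(cluster_rate_ratio(rates_int, rates_ctl)))
    pooled = list(rates_int) + list(rates_ctl)
    k = len(rates_int)
    extreme = total = 0
    for chosen in itertools.combinations(range(len(pooled)), k):
        arm_a = [pooled[i] for i in chosen]
        arm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties
        if abs(math.log(cluster_rate_ratio(arm_a, arm_b))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical cause-specific mortality rates per 100,000 person-years, 4 + 4 clusters
intervention = [32, 28, 35, 30]
control = [40, 38, 44, 36]
print(round(cluster_rate_ratio(intervention, control), 3))
print(round(permutation_p_value(intervention, control), 3))
```

With only eight clusters there are just 70 possible allocations, so even this perfectly separated hypothetical result cannot reach a two-sided p-value below 2/70 (about 0.029). The resulting ratio describes clusters that were offered screening, not screened individuals.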

It is inappropriate to analyze a cluster-level RCT as one would analyze an individual-level RCT, that is, using individuals rather than clusters as the unit of analysis. Individuals within a cluster are rarely independent of one another. Lack of independence invalidates the statistical assumptions on which standard methods rest and can lead to incorrect conclusions. There are, however, advanced statistical methods that can account for the lack of independence among individuals within clusters and allow for inferences about individuals [9].
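The cost of ignoring clustering is often summarized by the design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient, a measure of how much more alike individuals in the same cluster are than individuals in different clusters. Analyzing individuals as if they were independent understates the variance by roughly this factor. The cluster sizes and ICC below are hypothetical.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_individuals, mean_cluster_size, icc):
    """Number of independent individuals the clustered sample is roughly worth."""
    return n_individuals / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 20 clusters of 10,000 residents each, ICC = 0.001
n = 20 * 10_000
print(round(design_effect(10_000, 0.001), 3))
print(round(effective_sample_size(n, 10_000, 0.001)))
```

Even a seemingly negligible ICC of 0.001 gives a design effect of about 11 when clusters are this large, so 200,000 residents carry roughly the statistical information of 18,000 independent individuals.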

6.3.3 Strengths and Weaknesses

A cluster-level RCT of cancer screening can have very low rates of contamination if the new screening regimen is available only in certain clusters and it is difficult for individuals to cross into, or receive medical services in, other clusters. In addition, cancer mortality rates often are available for clusters that are geopolitical entities, eliminating the need to collect mortality information as part of the RCT. However, compliance within a cluster can be low because individuals usually are not consulted before randomization. The number of clusters is often small, which can limit the ability of randomization to produce a true counterfactual group.

Cluster-level RCTs of cancer screening are difficult to carry out in places with opportunistic screening. In the US, randomization by state could be attempted, but ease of mobility and out-of-network health insurance policy benefits, not to mention entrepreneurial ventures, could foster contamination. Cluster-level RCTs of cancer screening may be more easily done in countries with government-administered health care, although a Swedish cluster-level RCT of mammography screening still experienced non-negligible rates of contamination in the control arm [10].

6.3.4 Example of a Cluster-Level Cancer Screening RCT

In the United Kingdom (UK), the AgeX cluster-level RCT is looking at the impact of offering an additional breast cancer screen to women ages 47–49 and offering breast cancer screening every 3 years to women over 70 [11]. Most of the 80 breast cancer screening centers in the UK’s National Health Service are participating. Each center is a cluster and is randomized to the intervention arm or the control arm. All women in intervention arm clusters are invited to receive the age-appropriate additional screens. All women in control arm clusters are invited to receive the standard breast cancer screening regimen.

6.4 Pragmatic Randomized Controlled Trials of Cancer Screening

RCTs of cancer screening usually have been carried out in highly controlled and near ideal settings. They have measured efficacy rather than effectiveness. Effectiveness can be addressed by pragmatic RCTs.

A pragmatic RCT is done in the reality of everyday health care, which introduces many challenges that can hinder the ability of a cancer screening test to reduce mortality. Pragmatic trials usually have fewer eligibility criteria than traditional RCTs. Pragmatic RCTs typically do not hire staff dedicated to trial operations; in other words, there usually are no extra resources for recruitment or compliance. Data collection beyond what is collected in usual care is uncommon.

Though randomization still occurs in pragmatic trials, patients may have the opportunity to receive what they want rather than what randomization assigns to them. While that may seem heretical to a strict clinical trialist, the goal of a pragmatic trial is to evaluate the impact of introducing a cancer screening test in a community health care setting. The impact reflects the fact that some patients will accept the test and some will not.

To learn more about pragmatic trials and patient-centered research in general, consult the National Institutes of Health (NIH) Collaboratory’s Living Textbook of Pragmatic Clinical Trials [12], a website that presents expert consensus regarding special considerations, standard approaches, and best practices in the design, conduct, and reporting of pragmatic clinical trials.

6.4.1 Examples of Pragmatic Cancer Screening RCTs

There are no completed pragmatic RCTs of cancer screening effectiveness, although at least two are underway. The HOME trial, conducted in the Kaiser Permanente Washington managed care system, is examining the ability of self-sampling to increase cervical cancer screening uptake and effectiveness [13]. Self-sampling could overcome certain real-world barriers to being screened, including lack of transportation to a clinic, lack of child care, and the need to take time off from work. It also could increase cervical cancer screening uptake among women who prefer not to receive a pelvic exam. The WISDOM trial, conducted in clinics in California and South Dakota, is comparing age-based breast cancer screening regimens with risk-based regimens [14]. WISDOM uses what is known as a preference-tolerant design, which encourages randomization but allows women to self-assign if they wish. This design was chosen to maximize participation, a factor that may lead to better generalizability of results.