Design and Endpoints of Clinical Trials, Current and Future
With the advent of several new systemic agents for the treatment of hepatocellular carcinoma and the prospect of more to come it is expected that many more clinical trials will be undertaken to establish the best treatment paradigm(s). In order to help develop the most efficient and most relevant clinical trials this review concentrates on endpoints that have been used in the past. Survival is the gold standard. None of the surrogate endpoints correspond completely with survival. In addition, alternative clinical trial designs are presented that may be more efficient than the usual phase I, II, and III clinical trial strategy that has been used in the past.
Keywords: Hepatocellular carcinoma; Clinical trial design; Clinical trial endpoints
Survival remains the optimal endpoint for trials of HCC therapy.
No surrogate endpoint accurately captures treatment effects. Therefore, any study that reports outcome as time to progression, time to recurrence, or response rate has to be regarded with caution. However, surrogate endpoints may be useful in phase II trials.
Pragmatic clinical trials can answer questions that would be prohibitively expensive or complex for a randomized controlled trial.
Master protocols are study designs that allow multiple agents to be tested or allow a single agent to be tested in different diseases.
Adaptive clinical trials can improve the efficiency of testing multiple agents.
The gold standard for trials of new medicines has always been the randomized controlled trial with survival as the endpoint. Randomized controlled trials have always been favored because they provide the most unbiased assessment of efficacy and safety of the drug under test. Randomized controlled trials in hepatocellular carcinoma (HCC) have shown the efficacy of sorafenib, regorafenib, lenvatinib, ramucirumab for HCC with alpha-fetoprotein (AFP) greater than 400 ng/mL, and cabozantinib. This study design has also shown that brivanib, erlotinib, and other drugs and combinations of drugs were not effective in prolonging survival. However, randomized controlled trials are not always possible, for a number of reasons that will be discussed in this article. In particular, the expense of a phase III randomized controlled trial is enormous, as is the expense of the earlier studies leading up to it. There may be other trial designs that are able to test multiple drugs in a single study and that would be much less expensive on a per-drug basis, particularly in phase I and II. For cancer studies the regulatory agencies demand survival as an endpoint, but other endpoints are used, and are sometimes accepted by regulatory agencies on an interim basis until the phase III study can be completed. These include response rate, progression-free survival, time to progression, and similar related endpoints. The advantages and disadvantages of these will be discussed. This part of the discussion is divided into three questions: what do we need to measure, how do we measure it, and who do we measure it in.
“What Do We Need to Measure” or What Are Ideal Endpoints?
To be useful an endpoint has to fulfill certain criteria. It should be relevant to the disease process, and it should be accurate, in that it measures what it claims to measure and the measurement can be taken with minimal error or variability. The endpoint should be reliable, i.e., repeated trials using the same methodology and treatment should come up with the same result. The endpoint should also be assessable within a reasonable time frame. As mentioned above, the most widely accepted endpoint, and the endpoint demanded by regulatory agencies, is survival, defined as the time from initial recruitment until death or the end of the trial. In cancer treatment research survival as an endpoint usually meets these criteria. However, in trials of early HCC comparing two forms of treatment, e.g., local ablation versus resection, survival may be so prolonged that it is not practical as an endpoint.
In hepatocellular carcinoma a separate issue is that the severity of cirrhosis may influence overall survival, either by causing death unrelated to cancer progression or by decreasing tolerance for the drug under test, resulting in increased toxicity that may mask efficacy. In addition, measures taken to ameliorate the toxicity, e.g., dose reduction, may result in a sub-therapeutic dose of drug being administered. These are instances when overall survival may not accurately capture treatment efficacy. A competing risk analysis can theoretically capture some of these factors and isolate the survival related to the treatment under test. In Child A cirrhosis this is probably not necessary, since death would most likely be from tumor progression, but in Child B and C cirrhotics it is difficult to attribute the cause of death to liver disease progression or to tumor progression. A competing risk analysis also increases the sample size required, because those dying of the competing cause do not contribute to the final data analysis.
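To make the idea of a competing risk analysis concrete, the following is a minimal sketch of a nonparametric cumulative incidence estimate, in which death from tumor progression and death from liver failure are treated as separate event types. The cause codes (0 = censored, 1 = tumor death, 2 = liver death), the function name, and the toy data are all assumptions made for illustration; they do not come from any trial discussed here.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause of death.

    times  : follow-up time for each patient
    causes : 0 = censored, any other integer = cause of death (hypothetical coding)
    Returns a list of (time, cumulative incidence) at each observed death time.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    overall_survival = 1.0  # probability of being alive just before the current time
    incidence = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths_interest = sum(
            1 for x, c in zip(times, causes) if x == t and c == cause_of_interest
        )
        deaths_any = sum(1 for x, c in zip(times, causes) if x == t and c != 0)
        # A death from the cause of interest can only occur in patients still alive.
        incidence += overall_survival * deaths_interest / at_risk
        overall_survival *= 1 - deaths_any / at_risk
        curve.append((t, incidence))
    return curve


if __name__ == "__main__":
    # Toy data: months of follow-up and cause codes (made up for illustration).
    months = [2, 4, 6, 8]
    cause = [1, 2, 1, 0]
    print("tumor death:", cumulative_incidence(months, cause, 1))
    print("liver death:", cumulative_incidence(months, cause, 2))
```

In this sketch a patient who dies of liver failure is removed from the risk set without being counted as a tumor-related death, which is the distinction that a plain overall survival analysis cannot make.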
In the modern era when there are so many new drugs to test for cancer it is a rare patient who is in a trial and is still alive and reasonably well at the end of trial who does not enter another trial or undergo another course of treatment. This also complicates the assessment of survival as an endpoint.
These considerations have led to the development of a number of surrogate endpoints mostly related to tumor response to treatment, i.e., whether the tumor grew or shrank, working on the hypothesis that tumor growth or lack thereof correlates with survival. The criteria for surrogate endpoints include that they are biologically plausible, i.e., the response should correlate with cure, stability, or progression. The endpoints must reflect clinical severity. When survival cannot be used as an endpoint two types of endpoints are used as surrogates. These are response rate and time-to-event endpoints. Commonly used surrogate endpoints include progression-free survival and time to progression. These rely on ability to accurately measure progression, usually meaning radiological progression. Neither of these is ideal for studies in liver cancer for the reasons given below. More recently time to symptomatic progression has been used as an endpoint. This avoids the need to measure radiological progression, but suffers from the lack of specificity of symptoms, particularly in HCC treatment when decreasing liver function confounds the picture. Change in concentration of a biological marker of tumor volume, such as AFP, has also been used. As discussed below these are also not ideal.
A decrease in the size of a tumor is thought to correlate with survival. It certainly seems to make sense that a reduction in tumor burden should be associated with improved survival, and this has been confirmed in tumors other than HCC. However, this has not been demonstrated for HCC. There are several possible reasons for this. One has to do with the mechanism of action of the new drugs for HCC. The problem arises when the treatment being tested has a cytostatic rather than cytotoxic effect. Many of the newer targeted agents currently in use fall into this category. Sorafenib was the first of these. In the initial registration trials sorafenib-treated participants had an improved median survival of about 3 months compared to control; however, the tumor response rate was only 3%. Accepting an endpoint based on radiological response would have missed the benefit of sorafenib.
Another important issue is how to measure response. Ideally the measurement would assess all viable tumors. However, classical RECIST or WHO criteria assess only lesion diameter and do not consider areas of the tumor that may be necrotic. The modified RECIST, Choi, and EASL criteria [10, 11, 12] attempt to measure only viable tumor, but even these do not correlate well with outcome. Furthermore, as anyone who has watched radiologists measure these lesions knows, this is not an exact science. Minor differences in where the measurement cursor is placed can produce significantly different measurements. Taking the measurement at different slices of the same examination gives different results. Positioning of the patient in the CT scan may not be identical each time, so that slice registration is different for each examination. Finally, if only the diameter of the visible lesion is measured a successful radiofrequency ablation will actually demonstrate an enlarging lesion because the burn caused by this treatment is larger than the HCC.
The classical definition of a tumor response is a reduction in tumor diameter of 30%. Progression is defined as an increase in size of more than 20%. Anything in between is considered stable disease. New lesions are considered to be progression. However, in the cirrhotic liver changes in perfusion characteristics are often seen that present as new arterially enhancing areas, which can be mistaken for HCC. Therefore, it has been suggested that before calling a new lesion progression it should be more than 1 cm in size and exhibit the classical HCC features of arterial enhancement and venous washout, or if the lesion grows in size by 1 cm or more, this can also be considered progression. Ideally, measuring total tumor volume should be a better measurement of progression or response. Measuring total tumor volume eliminates apparent changes in size that are due to slice mis-registration because this measurement would be independent of position. There is software that accurately assesses tumor size in other cancers, but for HCC the problem of definition of what is and is not cancer would remain. A simpler measure of outcomes therefore would be the proportion of cases in which the cancer shrinks by 30% or more at a fixed time point. The correlation with survival would still be questionable. This endpoint is used, but most often as a secondary endpoint.
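The thresholds described above can be written down as a small classifier. This is only a sketch under simplifying assumptions: it compares a single follow-up diameter with the baseline diameter, whereas formal criteria sum several target lesions and have their own reference-point rules. The function names and the example measurements are hypothetical.

```python
def classify_response(baseline_mm, followup_mm, new_lesion=False):
    """Classify one patient using the thresholds stated in the text.

    Response: diameter reduced by 30% or more.
    Progression: diameter increased by more than 20%, or a new lesion.
    Anything in between: stable disease.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline diameter must be positive")
    if new_lesion:
        return "progression"
    change = (followup_mm - baseline_mm) / baseline_mm
    if change <= -0.30:
        return "response"
    if change > 0.20:
        return "progression"
    return "stable disease"


def response_rate(measurements):
    """Proportion of patients classified as responders at a fixed time point.

    measurements: list of (baseline_mm, followup_mm) pairs.
    """
    if not measurements:
        raise ValueError("no measurements supplied")
    responders = sum(
        1 for base, follow in measurements
        if classify_response(base, follow) == "response"
    )
    return responders / len(measurements)


if __name__ == "__main__":
    # Made-up diameters in millimetres, for illustration only.
    cohort = [(50, 30), (50, 55), (50, 65), (40, 20)]
    for base, follow in cohort:
        print(base, follow, classify_response(base, follow))
    print("response rate:", response_rate(cohort))
```

The sketch also makes the measurement problem visible: a cursor placement error of a few millimetres on a small lesion is enough to move a patient across a threshold.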
Time to Progression
This is the time from randomization to the first evidence of disease progression (usually measured radiologically). The assumption is that if the new treatment is effective, the disease burden will be stable, if not actually decrease. If there is an initial response, but subsequent progression, even though the lesion may not be at the size it was prior to treatment this is counted as progression. Time to progression may work as a secondary endpoint for cytostatic agents, but may still not correlate with survival because these agents are thought to slow the rate of progression of tumor, and therefore, the cancer would still be expected to progress.
Time to Symptomatic Progression
For this endpoint the symptoms that define progression have to be defined a priori. There are no well-validated instruments that are able to assess this for HCC. Since the development of symptoms in patients with HCC is a harbinger of terminal disease, in HCC clinical trials time to symptomatic progression has been taken as an indication that there is no more possibility of treatment benefit, and suggests that palliative care is the next step.
Progression-free survival is a composite endpoint assessing both whether the disease has progressed and whether the patient has survived. Because it involves two variables it may not be a good surrogate for survival, particularly in patients with liver disease. Death may be due to liver disease despite good control of the tumor. Under these circumstances using time to progression may miss identifying a useful drug.
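The difference between time to progression and the composite endpoint of progression-free survival is easiest to see for a patient who dies without documented progression. The sketch below assumes the usual convention that such a death is censored for time to progression but counts as an event for progression-free survival; the `Patient` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    progression_month: Optional[float]  # None if no progression was observed
    death_month: Optional[float]        # None if alive at last follow-up
    last_followup_month: float


def time_to_progression(p: Patient) -> Tuple[float, bool]:
    """Return (time, event observed). Death without progression is censored."""
    if p.progression_month is not None:
        return p.progression_month, True
    censor_time = p.death_month if p.death_month is not None else p.last_followup_month
    return censor_time, False


def progression_free_survival(p: Patient) -> Tuple[float, bool]:
    """Return (time, event observed). Progression or death, whichever is first."""
    event_times = [t for t in (p.progression_month, p.death_month) if t is not None]
    if event_times:
        return min(event_times), True
    return p.last_followup_month, False


if __name__ == "__main__":
    # A patient who dies of liver failure at 8 months with the tumor controlled.
    liver_death = Patient(progression_month=None, death_month=8.0, last_followup_month=8.0)
    print("TTP:", time_to_progression(liver_death))
    print("PFS:", progression_free_survival(liver_death))
```

For this hypothetical patient the two endpoints disagree: one records a censored observation and the other an event, although the tumor itself behaved identically.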
This is similar to progression-free survival, but assumes an initial complete response. Although subject to the same problems, this may be used for HCC, e.g., after resection or local ablation, when at least immediately after treatment there may be no detectable tumor present.
The only biomarker that has been studied to any extent is alpha-fetoprotein (AFP). However, the correlation between a fall (or rise) in AFP and outcome has only been roughly demonstrated. Measurement of AFP in the past has been confounded by active hepatitis, although today, with effective treatment for hepatitis B and C, this is less of an issue. More importantly, about 40% of HCCs do not secrete AFP, which limits its usefulness.
Who Do We Measure Endpoint In?
This is a discussion about who is a suitable candidate for a clinical trial of a new drug. There are two major considerations here. First, there has to be a standard way of classifying patients so that like can be compared with like. Second, the patient’s liver has to be in good enough health to tolerate any side effects of the drug, and the liver disease itself should not progress to cause death before the end of the study. Unfortunately, we only have blunt tools for these assessments.
Classification of Stage of Disease
There are multiple liver cancer staging systems available. None is universally accepted. Until now the BCLC staging system has been the most widely used. Studies of systemic therapy have mainly included only BCLC stage C disease, and as a result, this is the stage of disease for which systemic therapy is recommended. Some studies have included BCLC B disease to assess adjuvant therapy following chemoembolization. Unfortunately, the BCLC stages are not as homogeneous as one might like. Within BCLC stages B and C there is considerable heterogeneity. For example, BCLC B includes tumors that are within the Milan criteria, but a single 3-cm nodule has a different prognosis than, e.g., 3 × 2 cm nodules. A small nodule with minor portal vein invasion or a small lung metastasis is BCLC C. Similarly, a 7-cm invasive lesion with major portal vein invasion is also BCLC C. Yet it is quite likely that these two scenarios are associated with quite different outcomes, given similar treatment. Nonetheless, the use of the BCLC system has standardized the inclusion criteria for studies. Its particular value has been to separate out “non-resectable” disease into three stages: BCLC B, C, and D.
There are a number of criteria that can be considered as exclusionary in the initial selection of patients for inclusion in a study. This is important, because the more homogeneous the cohorts, the easier it is to interpret the findings, although the less generalizable the results. Some criteria that can be considered include AFP (exclude, e.g., AFP > 500 ng/mL), tumor size, and the nature of metastases. It has become apparent that patients with metastases outside of the liver do worse than those whose disease progresses exclusively within the liver. Therefore, depending on the study, this can be used as a criterion for inclusion or as a stratification factor.
Assessing Liver Function
Liver function is usually assessed using the Child–Pugh (C–P) score. The recommendation from a panel of experts was that initially drugs should be tested to the stage of licensing in patients with C–P A disease, and most phase III studies that have concluded recently used this restriction. Assessment of efficacy in more advanced liver disease should only be undertaken once the initial phase III studies have been successfully concluded. However, many authors include patients with a C–P score of 7 in their studies. Others have suggested using the MELD score, but this has not gained wide acceptance. Since the MELD score was developed to predict death within 3–6 months rather than survival for the duration of a study, which might be 12–18 months, it is not an ideal method to determine who should be eligible for treatment studies. One other score, the ALBI score, has been shown to be able to sub-classify patients in C–P A and B categories into strata with clearly separate outcomes. This too has not achieved widespread uptake into research studies, but it is probably the best method of assessing liver function for clinical trials that we currently have. Phase IV studies can be used to assess the efficacy of treatment in patients with more advanced HCC.
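For readers unfamiliar with the ALBI score, the sketch below shows how it is calculated from two routine laboratory values. The coefficients and grade cut-offs are those of the commonly cited published formula rather than anything stated in this article, so they should be checked against the original publication before use. The units (bilirubin in µmol/L, albumin in g/L) are an assumption of the sketch, and the example values are made up.

```python
import math


def albi_score(bilirubin_umol_per_l, albumin_g_per_l):
    """ALBI score from serum bilirubin (umol/L) and serum albumin (g/L).

    Coefficients follow the commonly cited published formula (assumption:
    verify against the original publication before any real use).
    """
    if bilirubin_umol_per_l <= 0 or albumin_g_per_l <= 0:
        raise ValueError("laboratory values must be positive")
    return 0.66 * math.log10(bilirubin_umol_per_l) - 0.085 * albumin_g_per_l


def albi_grade(score):
    """Map an ALBI score to grade 1 (best liver function), 2, or 3 (worst)."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


if __name__ == "__main__":
    # Hypothetical laboratory values, for illustration only.
    for bilirubin, albumin in [(10, 40), (30, 30), (100, 25)]:
        score = albi_score(bilirubin, albumin)
        print(bilirubin, albumin, round(score, 2), "grade", albi_grade(score))
```

Because the score uses only objective laboratory measurements, it avoids the subjective components of the C–P score such as the grading of ascites and encephalopathy.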
Given the expense and challenges of developing and testing new cancer treatments, alternatives to the randomized controlled trial as usually practiced have been developed. These are pragmatic trials, adaptive trials, and master protocols.
Pragmatic Trials 
These are essentially studies in which patients undergo usual care. They are useful when there are different standards of usual care that can be compared. For HCC the comparison that comes to mind is to compare resection with RFA in patients with lesions of, e.g., 3 cm. For a pragmatic trial to be meaningful several criteria have to be fulfilled. The participants should be similar to patients who would receive the intervention if it became usual care. This is easily met for a study of RFA versus resection, but this may not always be the case. In order to properly represent usual care there should be an appropriate heterogeneity of participants, investigators, and trial sites. Note that the requirement for heterogeneity is the opposite of the requirement in randomized trials, in which the more homogeneous the cohorts the better. Randomization is still possible in pragmatic trials. Most often a form of cluster randomization is used, whereby, e.g., one site or a series of sites uses one intervention and the other site(s) exclusively use the other intervention. Another possibility is pre-randomization. In this scenario sites are randomized, but consent is only obtained from the “experimental” group. For the other group (usually a non-intervention control group), since there is no consent, the only information that can be used for the study is information that is publicly available, such as death records. Blinding is also possible, although not easily accomplished. It may not be possible to blind the participants, but at least the assessors can be blinded.
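Cluster randomization as described above assigns whole sites, rather than individual patients, to an intervention. A minimal sketch follows; the site names, the arm labels, and the fixed seed are hypothetical, and a real trial would use a documented, concealed allocation procedure rather than a script like this.

```python
import random


def cluster_randomize(sites, arms=("resection", "RFA"), seed=0):
    """Randomly assign each site (cluster) to one arm, keeping arms balanced.

    Every patient treated at a site receives that site's intervention.
    """
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    shuffled = list(sites)
    rng.shuffle(shuffled)
    # Deal the shuffled sites out to the arms in turn.
    return {site: arms[i % len(arms)] for i, site in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical participating centres.
    centres = ["Site A", "Site B", "Site C", "Site D", "Site E", "Site F"]
    allocation = cluster_randomize(centres)
    for site in centres:
        print(site, "->", allocation[site])
```

Because the unit of randomization is the site, the analysis has to account for the similarity of patients within a site, which is one reason these trials tend to need large populations.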
The intervention should be delivered as in normal practice, by staff with typical experience and with the use of routinely available equipment. Other considerations include that the endpoints should be important to patients, i.e., major life events such as death or hospital admission.
These kinds of studies, if properly designed, may be able to identify minor treatment effects that would not be evaluated in a randomized trial because of the expense. Pragmatic trials can also be used to assess the safety of under-investigated interventions in select populations. An example might be to assess the safety and efficacy of TACE versus TARE. These kinds of trials often involve very large populations, but data collection is simple, and trial procedures are minimized.
The only example of a pragmatic trial in liver disease that I am aware of (and it was not exclusively a pragmatic trial) was the randomized trial of HCC screening with AFP and ultrasound carried out in China. In this study pre-randomization and cluster randomization were used.
Basket trials are in a sense the opposite of umbrella trials. In this type of study a single intervention is tested in multiple diseases. This is a common trial design in testing early-stage anticancer drugs. The initial studies usually include several different tumor types. However, these are not entirely typical basket trials, because in a basket trial each disease (or cancer type) is assessed separately, rather than looking at overall response in all cancers in the study.
In platform trials multiple targeted therapies are tested in the context of a single disease in a perpetual manner, with therapies allowed to enter or leave the platform on the basis of a decision algorithm.
The master protocol can use a common screening platform to identify all trials for which a patient is eligible. This coordinated screening represents one of its chief advantages—more efficient use of patients and resources. An example might be a trial of two therapies that target the same biomarker signature regardless of the cancer type. These can share control patients, even if the drugs enter and exit the master protocol at different times. The other advantage is that the schedule of visits, clinical examination components, measurement procedures, outcome definitions, and ascertainment procedures are shared across trials, allowing for reuse of study materials.
Adaptive Trial Design 
Adaptive designs allow prospectively planned changes to the course of an ongoing trial based on analysis of accumulating data from the trial itself. This design is often used in dose-finding or dose-response studies or in first-in-human studies. The major advantage is that, if properly designed, the study will allocate a larger proportion of patients to groups doing well rather than to groups doing poorly. These kinds of studies can be part of an umbrella protocol.
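The way an adaptive design shifts patients toward better-performing groups can be illustrated with a small simulation. The sketch below uses one common response-adaptive scheme (Thompson sampling with a Beta posterior on each arm's response rate); this is a choice made for illustration and is not a design taken from any trial discussed in this article. The response rates, the number of patients, and the function names are all hypothetical.

```python
import random


def choose_arm(successes, failures, rng):
    """Pick the next patient's arm by sampling each arm's Beta posterior.

    Arms with more observed responses are more likely to produce the
    highest sample, so they receive more patients over time.
    """
    samples = [
        rng.betavariate(1 + s, 1 + f)  # uniform Beta(1, 1) prior on each arm
        for s, f in zip(successes, failures)
    ]
    return samples.index(max(samples))


def simulate_trial(true_response_rates, n_patients, seed=0):
    """Simulate allocation and return the number of patients per arm."""
    rng = random.Random(seed)
    n_arms = len(true_response_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_patients):
        arm = choose_arm(successes, failures, rng)
        if rng.random() < true_response_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Two hypothetical dose groups with assumed true response rates.
    counts = simulate_trial([0.1, 0.6], n_patients=400)
    print("patients per arm:", counts)
```

In a real trial the adaptation rules, including any minimum allocation to each arm and the timing of interim looks, would have to be specified in the protocol before the first patient is enrolled.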
The strategy can also be used for phase III studies. The design can allow the sample size to be adjusted. The study can be switched from a non-inferiority to superiority endpoint. The number and spacing of the interim analyses can be altered based on previous scheduled unblinded interim analyses.
This requires that multiple sponsors pool their resources, and it also requires a network of investigators that can commit to completing the studies.
Biomarker-Stratified Patient Selection
Some of the study designs discussed above lend themselves to cancers for which there are biomarkers that predict response. Unfortunately, so far, few such markers exist for HCC. The only example of a successful biomarker-driven study is the recently reported study of ramucirumab in patients with HCC and AFP > 400 ng/mL. Ideally, once biomarkers that predict response or lack thereof have been identified, trials should be enriched by inclusion of patients exhibiting these biomarkers. However, biomarkers need to be carefully selected. The only other biomarker-driven study was a phase III trial of the use of tivantinib in second-line HCC. The trial did not meet its endpoint. Tivantinib was thought to be an inhibitor of C-met, and the study targeted patients whose biopsies showed high levels of C-met. Unfortunately, it subsequently turned out that tivantinib was not really a C-met inhibitor, but had its anti-proliferative effect through other mechanisms.
Numerous other proteins, enzymes, and even nucleic acids have been found to be elevated in patients with HCC, but the link between secretion of these and their role in tumor growth is mostly unexplored.
Initial genetic and expression studies in HCC suggested the possibility of identifying a tissue marker associated with tumor growth, which might be “druggable.” This would be an ideal biomarker. Unfortunately, so far no “druggable” target molecule has been found in HCC.
In summary, clinical trial design and endpoint selection continue to be problematic in HCC. However, the success of several orthodox randomized controlled trials suggests that at least some drugs can be tested in that manner. As more and more drugs become available they will be tested in combinations and in the adjuvant or neo-adjuvant setting. The expense and complexity of running trials with survival as an endpoint will dictate that many of these will use surrogate endpoints. However, it then becomes incumbent on the researcher to demonstrate that the surrogate endpoint correlates with survival.
The multiplicity of scenarios in which drugs will be tested also means that a more efficient method of completing these trials needs to be developed. The adaptive trial design and some of the other trial designs may be useful here.
Despite the difficulties, the future of HCC treatment looks brighter than ever, and our future patients can expect improved survival.
Compliance with ethical standards
Conflict of interest
The author declares that they have no conflict of interest.
- 4. Zhu AX, Kang YK, Yen CJ, Finn RS, Galle PR, Llovet JM. REACH-2: a randomized, double-blind, placebo-controlled phase 3 study of ramucirumab versus placebo as second-line treatment in patients with advanced hepatocellular carcinoma (HCC) and elevated baseline alpha-fetoprotein (AFP) following first-line sorafenib. J Clin Oncol. 2018;36:4003.
- 5. Abou-Alfa GK, Meyer T, Cheng AL, El-Khoueiry A, Rimassa L, Ryoo BY. Cabozantinib versus placebo in patients with advanced hepatocellular carcinoma who have received prior sorafenib: results from the randomized phase 3 CELESTIAL trial. J Clin Oncol. 2018;36:4019.
- 19. Zhang BH, Yang BH, Tang ZY. Randomized controlled trial of screening for hepatocellular carcinoma. J Cancer Res Clin Oncol. 2004;130:417–422.
- 21. Mandrekar SJ, Dahlberg SE, Simon R. Improving clinical trial efficiency: thinking outside the box. Am Soc Clin Oncol Educ Book. 2015. https://doi.org/10.14694/EdBook_AM.2015.35.e141.