Melanoma is a rare but serious form of skin cancer that can rapidly infiltrate the deep, vascular skin layers and often metastasizes very early. Data from real-world clinical practice consistently show that survival among patients with metastatic melanoma differs greatly by stage of disease [1]. In a study of 1682 patients with metastatic melanoma from the Surveillance, Epidemiology, and End Results (SEER) database [2], patients with unresectable, non-visceral disease (stage IIIB or IIIC or IV M1a) had a median overall survival (OS) of 22–24 months, whereas those with visceral disease (stage IV M1b or IV M1c) had a median OS of 5–11 months.

Until 2011, the only systemic therapies for metastatic melanoma were conventional agents, such as dacarbazine, fotemustine, and interleukin-2 [3, 4], that did not show clinically meaningful improvements in OS. Recently licensed agents include ipilimumab, vemurafenib, dabrafenib, trametinib, pembrolizumab, and nivolumab, which have all been approved by the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA). Recent treatment guidelines issued by the European Society for Medical Oncology (ESMO) discuss these new therapeutic strategies, stating that recommendations for first-line treatment of metastatic disease are under debate [5]. For BRAF-mutated melanomas, combination treatment with BRAF/MEK inhibitors is a recommended approach. For patients with BRAF-wild-type disease, the guidelines highlight ipilimumab as a standard first-line choice based on long-term survival benefit, but state anti-PD1 therapy is currently preferred, based on very recent trial results comparing pembrolizumab with ipilimumab. Anti-PD1 therapies are also recommended as a second-line treatment, after ipilimumab failure as well as for patients with other BRAF mutations.

Ipilimumab, a fully human, IgG1 monoclonal antibody, blocks cytotoxic T lymphocyte-associated antigen 4 (CTLA-4), a negative regulator of T cells, and thereby augments T cell activation and proliferation [24]; whereas vemurafenib is a potent inhibitor of mutated BRAF and has marked antitumor effects against melanoma cell lines with the BRAF V600E mutation but not against cells with wild-type BRAF [26].

The most recently approved therapy for melanoma is talimogene laherparepvec, a novel first-in-class oncolytic immunotherapy designed to selectively replicate within tumors and produce granulocyte macrophage colony-stimulating factor (GM-CSF) to enhance systemic antitumor immune responses. It is thought to act in two ways: first, it directly attacks cancer cells in the injected tumors, and second, it helps the immune system find and kill cancer cells throughout the body while leaving healthy cells undamaged [6]. Talimogene laherparepvec has been assessed in a Phase 3 randomized trial (OPTiM; ClinicalTrials.gov identifier: NCT00769704) versus GM-CSF in patients with unresectable stage IIIB/C or IV melanoma.

In the treatment of metastatic melanoma, there is a lack of randomized, controlled, active comparator trials to date that would help to compare new treatments; as shown by the recent ESMO guidelines, the treatment pathway for patients at different disease stages remains unclear even as it evolves. Currently, ipilimumab and vemurafenib, being the first newer therapies to market, are the most widely used newer agents. Given that indirect treatment comparisons for newer therapies are increasingly a requirement for health technology assessment (HTA) agencies, the aim of this study was to examine the relative treatment effect of talimogene laherparepvec compared with ipilimumab and vemurafenib [7].


Systematic Review

Relevant trials were identified through a systematic review conducted in September 2015 of English-language studies, published since January 1990, on the efficacy and safety of treatments for metastatic melanoma. All trials were subject to a quality assessment, to identify the appropriate highest quality trials for inclusion. The review followed Cochrane and Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines and was conducted in accordance with the key HTA agencies’ requirements for identifying evidence. Inclusion and exclusion criteria for the studies are presented in Table 1. The following databases were searched: MEDLINE, including MEDLINE In-Process Citations and Daily Update (PubMed) (OvidSP); Embase (OvidSP); Cochrane Library, including Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE); Cochrane Central Register of Controlled Trials (CENTRAL), HTA Database and NHS Economic Evaluation Database (NHSEED). Abstracts from the following conferences were also searched to identify relevant studies: American Society of Clinical Oncology (ASCO); ESMO; International Society for Pharmacoeconomics and Outcomes Research (ISPOR; European and international conferences); European Association of Dermato Oncology (EADO); European Cancer Congress (ECC).

Table 1 Inclusion and exclusion criteria for the systematic review

Data from each study’s comparator arm were extracted, including study design, patient characteristics, treatment (including dose, duration), and results on primary and secondary endpoints, and safety endpoints or outcomes reported. Studies with a low risk of bias were identified using the Grades of Recommendation, Assessment, Development and Evaluation (GRADE) criteria [8].

Establishing the Feasibility of a Valid Network Meta-Analysis

Based on the findings from the systematic literature review, the feasibility of establishing a valid network meta-analysis of talimogene laherparepvec compared with ipilimumab and vemurafenib was explored using a process established and published by Cope et al. [9]. A valid network of evidence could not be established according to the Cope algorithm because of a lack of sufficient comparative, head-to-head data and of studies with sufficient common comparators such as dacarbazine, the glycoprotein 100 peptide vaccine (gp100), and GM-CSF. In addition, there were issues around the exchangeability of patient populations across the trials. Therefore, an alternative method was used to inform the indirect comparisons [10]. Specifically, a meta-analysis of absolute efficacy was undertaken, controlling for known prognostic differences between studies, which allowed for OS over time for each treatment to be compared. The methodology for this analysis is described below.

Alternative Approaches to Indirect Treatment Comparison

Alternative approaches to indirect treatment comparison include simulated treatment comparison and matching-adjusted indirect comparison [11, 12]. Simulated treatment comparison is an approach in which detailed predictive equations are constructed to characterize a single index trial for which individual patient-level data are available. Equations include enrollment, randomization, and follow-up. External baseline data from other studies can then be used to simulate those patients’ experience and outcomes according to the index trial. In matching-adjusted indirect comparison, the index trial for which individual patient-level data are available is reweighted using propensity score-type approaches, so that it matches the characteristics of another study.

Neither of these approaches was considered feasible for this analysis because of the complexity of the prognostic information combined with the heterogeneity in patient and trial characteristics, including the need to consider disease stage, age, gender, visceral disease, brain metastases, and lactate dehydrogenase (LDH) levels, in addition to any other patient or study characteristics. For simulated treatment comparison, there were insufficient data for the required equations; for matching-adjusted indirect comparison, matching across many prognostic factors was limited, and several studies would have had to be matched.

For this analysis, a treatment-specific meta-analysis of absolute treatment effect was undertaken. This involved analyzing independent data on OS for talimogene laherparepvec, ipilimumab, and vemurafenib from each published study, with each drug analyzed separately. No attempt was made at network meta-analysis, following the assessment using the Cope framework. However, the outcomes of each relevant treatment arm in the studies used were adjusted for heterogeneity in prognostic factors (i.e., external data were adjusted according to their baseline characteristics) to be comparable to the OPTiM trial. Adjustments were made using a published algorithm [13, 14].

Compared with the OPTiM trial, trials including ipilimumab and vemurafenib had higher percentages of patients with stage IV M1b/c melanoma, who have a greater mortality risk than patients with stage III melanoma (Table 2). Patients also varied in terms of other baseline characteristics, including gender, Eastern Cooperative Oncology Group (ECOG) performance status, presence of visceral metastases, presence of brain metastases, and LDH levels. Therefore, adjustment was needed to permit comparability of these factors with those of the OPTiM trial.

Table 2 Summary of randomized controlled Phase 3 trials included in the indirect treatment comparison, and patient characteristics used for adjustment of survival

The adjustment of survival for differences in baseline characteristics was based on a predictive model for survival that was developed by Korn et al. using pooled data from 2100 patients with metastatic melanoma treated with a variety of regimens in 42 trials conducted between 1975 and 2005 [13]. This is valuable here because the Korn model is founded on a larger data set than the OPTiM trial alone and covers a broader range of baseline characteristics, so it should be less prone to bias. The Korn model demonstrated that four factors are associated with OS: gender, ECOG performance status, presence of visceral metastases, and presence of brain metastases. In 2014, a five-factor model was used in the National Institute for Health and Care Excellence (NICE) technology appraisal of ipilimumab for previously untreated advanced melanoma (NICE TA 319; [14]), in which the original Korn model was modified to include LDH level as the fifth factor. The modified Korn model was accepted by NICE and was used in this study.

Survival was adjusted using a hazard ratio (HR) as the modifier; that is, an HR was used that reflected the impact of the difference in patient characteristics between a given trial and the OPTiM trial. For example, a trial including more patients with better ECOG performance status, and more patients without visceral disease, would exhibit higher rates of survival even without treatment; therefore, survival in this trial would have to be adjusted downwards so that each trial’s baseline survival better matched baseline survival in the OPTiM trial, and it is this effect that the Korn algorithm achieves.

In the adjustment used, the trial-specific HR was estimated by applying the modified Korn model from NICE TA 319 [14], where \( \bar{X} \) is the proportion of each sample satisfying the condition (e.g., \( \bar{X}_{\text{Gender = Female}} \) is the proportion of females).

$$ { \log }\left( {\text{HR}} \right) = - 0.154\bar{X}_{\text{Gender = Female}} - 0.400\bar{X}_{\text{ECOG = 0}} - 0.285\bar{X}_{\text{Visceral = NO}} - 0.306\bar{X}_{\text{Brain = NO}} - 0.782\bar{X}_{\text{LDH = Normal}} $$

In the equation, all variables represent a better prognosis: if more patients in a trial are female, more patients have ECOG status 0, more patients have non-visceral melanoma, more patients do not have brain metastases, and/or more patients have normal LDH levels, prognosis (i.e., survival) improves, and the HR is lower. The ratio of the HR for a given trial and the HR for the OPTiM trial becomes the adjustment factor: \( {\text{HR}}\left( {\frac{{T_{\text{TVEC}} }}{{T_{\text{TRIAL}} }}} \right) = \frac{{{\text{HR}}_{{T_{\text{TVEC}} }} }}{{{\text{HR}}_{{T_{\text{TRIAL}} }} }} \).
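The calculation of the trial-specific HR and the adjustment factor can be sketched in a few lines of Python. This is an illustrative sketch only: the coefficients are those of the equation above, but the function names, dictionary keys, and the two sets of baseline proportions are hypothetical and are not taken from Table 2.

```python
import math

# Coefficients of the modified Korn model, as given in the equation above.
KORN_COEFFICIENTS = {
    "female": -0.154,
    "ecog_0": -0.400,
    "no_visceral": -0.285,
    "no_brain": -0.306,
    "ldh_normal": -0.782,
}


def korn_hazard_ratio(proportions):
    """Return the HR relative to the worst prognosis (all proportions zero)."""
    log_hr = sum(coef * proportions[name] for name, coef in KORN_COEFFICIENTS.items())
    return math.exp(log_hr)


def adjustment_factor(reference_trial, other_trial):
    """Ratio of the reference-trial HR to the HR of the trial being adjusted."""
    return korn_hazard_ratio(reference_trial) / korn_hazard_ratio(other_trial)


# Hypothetical baseline proportions (assumptions for illustration only).
reference_trial = {"female": 0.45, "ecog_0": 0.70, "no_visceral": 0.55,
                   "no_brain": 0.99, "ldh_normal": 0.90}
other_trial = {"female": 0.40, "ecog_0": 0.60, "no_visceral": 0.25,
               "no_brain": 0.90, "ldh_normal": 0.60}

factor = adjustment_factor(reference_trial, other_trial)
print(round(factor, 3))
```

In this hypothetical case the reference population has the better prognosis, so the factor is below 1.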

Implicit in this is that the HRs for \( T_{\text{TRIAL}} \) and \( T_{\text{TVEC}} \) are each expressed relative to the worst prognosis, in which all of the factors in the equation equal zero and the HR equals 1.

Kaplan–Meier (KM) data were simulated at each time point for \( T_{\text{TRIAL}} \), assuming it had the patient population of \( T_{\text{TVEC}} \), which was calculated as \( S\left( t \right)_{{T_{\text{TRIAL}} |T_{\text{TVEC}} }} = S\left( t \right)_{{T_{\text{TRIAL}} }}^{{{\text{HR}}\left( {\frac{{T_{\text{TVEC}} }}{{T_{\text{TRIAL}} }}} \right)}} \).
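As a minimal sketch of this step, the adjusted curve is obtained by raising each survival probability to the power of the adjustment factor. The function name, the survival values, and the factor of 0.5 below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def adjust_survival(survival, factor):
    """Apply S(t) ** factor at every time point of a survival curve."""
    return [probability ** factor for probability in survival]


# Hypothetical monthly survival probabilities for a comparator trial.
trial_survival = [1.0, 0.8, 0.5, 0.25]

# A factor below 1 means the reference population has the better prognosis,
# so the simulated curve moves upward.
adjusted = adjust_survival(trial_survival, 0.5)
print([round(probability, 3) for probability in adjusted])
```

Because survival probabilities lie between 0 and 1, any factor below 1 raises the curve at every time point, and a factor above 1 lowers it.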

If a drug was studied in more than one trial included in the analysis, the data from each trial were combined so that all survival data on that drug were included in the comparison. To do this, OS data were adjusted using the modified Korn model and were then pooled across studies using the Mantel–Haenszel method [15, 16], a fixed-effect model primarily for dichotomous outcomes that can be implemented in modeling survival counts by transformation of the survival data into hazards, or risks, period by period.

The procedure involved two stages: first, data containing events and non-events were produced so that odds could be calculated; second, these data were combined across studies to produce a pooled survival estimate. The data were not combined directly on the basis of each study's single survival curve; rather, the Mantel–Haenszel method combines the rates of death and censoring across all studies at each time point, and the Mantel–Haenszel survival curve is calculated from the resultant data.

Detailed procedures/steps involved are as follows:

  1. Each study’s KM data (unadjusted and adjusted) were broken out using the Parmar algorithm [17, 18], to produce estimates, for each time period (in our analysis this was 1 month), of the number of patients at risk, the number of events (i.e., death or progression, depending upon whether OS data were being analyzed) and the number of censored data points.

  2. In each time interval, the data were pooled using the Mantel–Haenszel method, which is as follows:

    (a) Pooled proportion of deaths in time interval (sum of proportions across included studies for each time point).

    (b) Pooled proportion of patients alive through time interval (sum of proportions across included studies for each time point).

    (c) Mantel–Haenszel odds of dying in time interval (a/b).

    (d) Estimated probability of death in the time interval [c/(1 + c)].

    (e) Estimated cumulative probability of surviving to the end of that time interval [probability of surviving to end of previous time interval × (1 − d)].

  3. Finally, the pooled survival curve S(t) was created from (e). In this method, confidence intervals can also be constructed around S(t).
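Steps (a) to (e) can be sketched as follows. This is an illustrative reading of the steps rather than the exact implementation used in the analysis: it assumes that the per-study "proportion" in each interval is deaths divided by patients at risk, that censoring is reflected only through the at-risk counts, and that every study supplies the same number of intervals. The function name and the counts are hypothetical.

```python
def pool_survival(studies):
    """Pool per-interval (at_risk, deaths) counts from several studies.

    `studies` is a list with one entry per study; each entry is a list of
    (at_risk, deaths) tuples, one per time interval.
    """
    n_intervals = min(len(study) for study in studies)
    cumulative = 1.0
    curve = []
    for t in range(n_intervals):
        died = 0.0   # step (a): sum of per-study proportions dying
        alive = 0.0  # step (b): sum of per-study proportions surviving
        for study in studies:
            at_risk, deaths = study[t]
            if at_risk <= 0:
                continue  # study contributes nothing once no one is at risk
            died += deaths / at_risk
            alive += (at_risk - deaths) / at_risk
        if died + alive == 0:
            break  # no study has patients at risk in this interval
        if alive == 0:
            p_death = 1.0
        else:
            odds = died / alive          # step (c)
            p_death = odds / (1 + odds)  # step (d)
        cumulative *= 1 - p_death        # step (e)
        curve.append(cumulative)
    return curve


# Hypothetical counts for two studies over two monthly intervals.
study_a = [(100, 10), (90, 9)]
study_b = [(50, 10), (40, 4)]
pooled = pool_survival([study_a, study_b])
print([round(s, 3) for s in pooled])
```

Under these assumptions, each study contributes its own interval-specific death proportion, so a large study does not dominate the pooled estimate simply because of its size; a weighted variant would be a different design choice.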

Indirect Treatment Comparison of Subgroups

A subgroup indirect treatment comparison was also analyzed, comprising patients with no bone, brain, lung, or other visceral metastases (stage IIIB–IV M1a disease). For this subgroup analysis, the same methods outlined in the previous section were used.

Extracting Survival Data for Analysis

KM curves were extracted and digitized with DigitizeIt version 2.0.3 for studies selected in the systematic review [19]. The digitized dataset of each arm of each trial included the survival probability at consecutive half-month intervals. To establish the quality of the digitization outputs, median survival was determined for each of the digitized curves and compared with the median survival published in each study.
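The quality check on the digitized curves can be illustrated with a short sketch. It assumes that median survival is read off as the first time point at which the digitized survival probability falls to 0.5 or below, and that a tolerance of one digitization interval (0.5 months) is acceptable; both the tolerance and the data are hypothetical, as are the function names.

```python
def median_survival(times, survival):
    """Return the first time at which survival is at or below 0.5, else None."""
    for time, probability in zip(times, survival):
        if probability <= 0.5:
            return time
    return None


def matches_published(digitized_median, published_median, tolerance=0.5):
    """Check whether the digitized median is within `tolerance` months."""
    if digitized_median is None:
        return False
    return abs(digitized_median - published_median) <= tolerance


# Hypothetical digitized curve at half-month intervals (time in months).
times = [0.0, 0.5, 1.0, 1.5, 2.0]
survival = [1.0, 0.9, 0.7, 0.5, 0.4]

digitized = median_survival(times, survival)
print(digitized, matches_published(digitized, published_median=1.6))
```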

Compliance with Ethics Guidelines

This article is based on previously conducted studies and does not involve any new studies of human or animal subjects performed by any of the authors.


Systematic Review and Trials Included in the Indirect Treatment Comparison

The systematic review PRISMA chart is provided in the online supplementary material. Randomized controlled trials (RCTs) were included in the meta-analysis if they were Phase 3 trials, were published since 2010, reported an OS curve and key baseline patient characteristics, and studied a licensed monotherapy agent and dose to treat patients with metastatic melanoma. These selection criteria were chosen to reflect the introduction of recent melanoma treatments (ipilimumab and vemurafenib), for which clinical trial publications are available only from 2010. Among the RCTs identified, four met the inclusion criteria and were included in the final indirect treatment comparisons: two for ipilimumab, one for vemurafenib, and one for talimogene laherparepvec (Table 2).

Table 2 highlights the differences between trials in terms of line of therapy, ECOG performance status, LDH status, presence of visceral disease, and presence of brain metastases. Based on these factors, in general, patients enrolled in the OPTiM trial appeared to have a better prognosis than other study populations. The ipilimumab RCT in previously untreated patients studied the combination of ipilimumab at 10 mg/kg and dacarbazine, whereas ipilimumab is licensed only as monotherapy at 3 mg/kg. However, an OS curve was derived for ipilimumab monotherapy at 3 mg/kg for this study population in the NICE appraisal for ipilimumab in previously untreated disease. The derived OS curve was used in this study.

Overall Survival: All Patients

The prognostic patient characteristics used in the adjustments for each trial are presented in Table 2: gender, ECOG performance status, presence of visceral metastases, presence of brain metastases, and LDH levels. Because there were two RCTs for ipilimumab, OS was adjusted for each individual trial, and the adjusted OS data were then pooled across the two trials using the Mantel–Haenszel method [15, 16]. Data for talimogene laherparepvec and vemurafenib did not need to be pooled because each drug was represented by a single clinical trial. Estimated HRs for death based on the modified Korn model and the resulting adjustment factors are presented in Table 3.

Table 3 Overall survival curve adjustment: HR and adjustment factor for all patients and early-stage subgroup analysis

Table 3 shows that adjustment factors ranged between 0.53 and 0.72 and were more closely clustered within each of the two patient populations (overall and subgroup); however, the results do suggest that adjustment using the modified Korn model had a material impact.

Unadjusted and adjusted median OS for each comparator are presented in Table 4. Adjusted median OS was significantly higher than unadjusted median OS. This reflects the starting point of this analysis: variation in disease and patient characteristics was biasing survival estimates, and adjusting for this variation makes the survival data more comparable.

Table 4 Median overall survival in months: all patients and early-stage subgroup analysis

Unadjusted OS curves for ipilimumab and vemurafenib and the observed OS curve for talimogene laherparepvec are presented in Fig. 1, and unadjusted and adjusted OS curves for ipilimumab and vemurafenib are presented in Figs. 2 and 3, respectively.

Fig. 1 Unadjusted Kaplan–Meier OS curves for ipilimumab and vemurafenib vs. observed OS curve for talimogene laherparepvec, all patients. OS overall survival, T-VEC talimogene laherparepvec

Fig. 2 Unadjusted and adjusted Kaplan–Meier OS curves for ipilimumab vs. observed OS curve for talimogene laherparepvec, all patients. OS overall survival, T-VEC talimogene laherparepvec

Fig. 3 Unadjusted and adjusted Kaplan–Meier OS curves for vemurafenib vs. observed OS curve for talimogene laherparepvec, all patients. OS overall survival, T-VEC talimogene laherparepvec

For both ipilimumab and vemurafenib, the adjustments improved survival along the entire survival curve (i.e., the entire OS curves shifted upward). Even after adjustment for the combination of gender, ECOG status, visceral and brain metastases, and LDH level, the observed talimogene laherparepvec OS curve remained above the adjusted OS curves for ipilimumab and vemurafenib. The figures also show a difference in long-term survival, even after adjustment. As with the results in Table 4, the adjustment increased survival for ipilimumab and vemurafenib in all cases.

Overall Survival: Patients with no Visceral Metastases (Stage IIIB–IV M1a Disease)

For the subgroup analysis of patients with no bone, brain, lung or other visceral metastases (stage IIIB–IV M1a disease), unadjusted and adjusted median OS values for each comparator are presented in Table 4. A consistently higher adjustment for patients with no bone, brain, lung, or other visceral metastases (stage IIIB–IV M1a disease), relative to all patients, can be observed—not just at the median of OS but along the entire survival curve (Figs. 2, 3, 4). This is predictable in that OS is expected to be longer for patients with no visceral disease than for those with visceral disease.

Fig. 4 Adjusted Kaplan–Meier OS curves for ipilimumab and vemurafenib vs. observed OS curve for talimogene laherparepvec, patients with no bone, brain, lung, or other visceral metastases (stage IIIB–IV M1a disease). OS overall survival, T-VEC talimogene laherparepvec

Considering the 95% confidence intervals around the adjusted data, the talimogene laherparepvec OS curve lies above the upper bound of the survival data for adjusted vemurafenib in both patient populations, while in the case of ipilimumab the upper bound of the survival curve can be seen to cross with the talimogene laherparepvec survival curve. This allows for some possibility that ipilimumab and talimogene laherparepvec are equivalently effective, adjusting for the factors in the Korn algorithm.


Treatment options for metastatic melanoma have evolved rapidly in the past 5 years and two key pathways, based around BRAF mutation status, have emerged: anti-PD1 antibodies (pembrolizumab and nivolumab) and ipilimumab for all patients, and BRAF/MEK inhibitor combinations for patients with BRAF-mutant melanoma. Talimogene laherparepvec has been approved for the treatment of metastatic melanoma regardless of BRAF status.

This study aimed to compare OS for talimogene laherparepvec with ipilimumab and vemurafenib, two of the most commonly used treatments of patients with metastatic melanoma. However, a conventional network meta-analysis was not technically feasible. Successive health technology appraisals of treatments for patients with metastatic melanoma have previously determined that adjusted indirect treatment comparison with the use of network meta-analysis is not feasible for metastatic melanoma [14, 20, 21].

We undertook an indirect treatment comparison using the modified Korn model, in which patient and disease characteristics are adjusted so that all trials reflect one reference trial in terms of key patient characteristics—in this case the pivotal talimogene laherparepvec clinical trial. This approach helps to overcome issues around generalizability and transferability of results between and across trials.

To our knowledge, this is the first treatment-specific meta-analysis of independent survival curves for metastatic melanoma that includes recently available therapies and that attempts to account for significant confounders such as stage of disease. The results from this analysis showed that the OS with talimogene laherparepvec appears to be at least as good as OS with ipilimumab and vemurafenib. OS was higher for patients treated with talimogene laherparepvec than with ipilimumab or vemurafenib after adjusting for differences in patient demographic and clinical characteristics across clinical trials; this improvement was more pronounced in patients with no bone, brain, lung or other visceral metastases (stage IIIB–IV M1a disease). The adjusted OS curve for vemurafenib was initially above, but later went below the adjusted OS curve for ipilimumab. This is consistent with the observation that ipilimumab is associated with a relatively low but durable response rate and that vemurafenib has a high response rate but the responses appear to be of limited duration because of the development of treatment resistance [22].

The findings from this analysis must be interpreted with caution because of some limitations. First, there is no network of RCTs for metastatic melanoma for which both direct and indirect comparisons exist. Such a network would enable preservation of treatment randomization and consistency of indirect and direct comparisons and would potentially allow for meta-regression. The dearth of such an RCT network is attributed primarily to a lack of trials; indeed, the Cope framework [9] for assessing the feasibility of a network meta-analysis recommends identification of RCTs required to resolve the issue of a lack of feasibility, and this is in line with HTA agency assessments for ipilimumab and vemurafenib. Second, the algorithm used to adjust for differences in survival, specifically the original and modified Korn algorithms, has been used previously to adjust for heterogeneity, but has not been widely used in melanoma and might reflect specific clinical trials rather than patients with advanced melanoma generally. However, it is the only adjustment algorithm published and available. It was developed using a large meta-analysis of 42 Phase 2 trials, making up 70 trial arms, and thus should be robust in its use in melanoma. Third, the impact of subsequent therapies on OS with talimogene laherparepvec, ipilimumab, and vemurafenib was not specifically adjusted for. However, the use of subsequent therapies in the pivotal clinical trials seemed balanced. For example, in the OPTiM trial of talimogene laherparepvec versus GM-CSF, the proportion of patients receiving subsequent antimelanoma therapy was similar between arms (43% in the GM-CSF arm and 39% in the talimogene laherparepvec arm); and in the CA184-024 study (ClinicalTrials.gov identifier: NCT00324155) of ipilimumab plus dacarbazine versus dacarbazine, therapy after disease progression was balanced between the two groups: 54.7% of the patients in the ipilimumab group and 59.0% in the dacarbazine group received subsequent therapy. Finally, this report focuses on comparing talimogene laherparepvec with ipilimumab and vemurafenib because these are the most widely used newer agents on the market. Comparisons with other antimelanoma systemic therapies, especially anti-PD-1 antibodies, would be of interest and warrant further research when mature OS data become available for anti-PD-1 agents.

This study also has several strengths. Its main strength is that, even with the limited data at hand, the method for adjustment and meta-analysis allowed talimogene laherparepvec, ipilimumab, and vemurafenib to be compared with adjustments for heterogeneity in prognostic factors. This provided a more reliable understanding of the relative effect of treatment on survival in a more comparable patient population. The method permitted more limited inference than comparative RCTs or network meta-analyses, yet still supported the interpretation that relative clinical benefit (i.e., survival) is greater for patients with earlier stages of metastatic melanoma. The subgroup analysis extended this finding further to patients with stage IIIB–IV M1a disease, who are generally in a better health state and have a better underlying prognosis than patients in later stages of disease with visceral metastases.

Although this study makes an important contribution to understanding the relative efficacy of different treatments for metastatic melanoma, there is a clear need for active head-to-head randomized clinical trials that compare new treatments to each other or to common comparators in comparable patient populations that are generalizable to clinical practice. This will enable stronger evidence networks and more robust indirect treatment comparisons, to further enhance our knowledge of effective therapies for metastatic melanoma.


Even with limited data, talimogene laherparepvec, ipilimumab, and vemurafenib could be compared following adjustments, thereby providing a more reliable understanding of the relative effect of treatment on survival in a more comparable patient population. The results of this analysis suggest that overall survival with talimogene laherparepvec is at least as good as with ipilimumab and vemurafenib, and that the improvement was more pronounced in patients with no bone, brain, lung, or other visceral metastases.