Key Summary Points

Cutaneous melanoma is a growing public health concern, with incidence increasing by approximately 3% per year and 7180 individuals expected to die from cutaneous melanoma in 2021 alone.

The current AJCC8 and NCCN guidelines utilize historical, pathological, and phenotypic risk factors to determine melanoma prognosis, but these do not account for genomic expression and may not optimize melanoma prognostic assessment.

Gene expression profile (GEP) tests are validated, reproducible, and consistent across studies.

Studies have demonstrated that integrating GEPs into AJCC8 and NCCN models can improve prognosis and clinical decision-making regarding sentinel lymph node biopsies.

Incorporating GEP testing into real-world clinical management has positively impacted patient outcomes.

Introduction

Cutaneous melanoma (CM) is the sixth most common malignancy in the USA [1,2,3,4]. It is estimated that one in 27 men and one in 40 women will receive a melanoma diagnosis in their lifetime [1,2,3,4]. Despite earlier diagnoses and management leading to fewer annual CM-related deaths, 7180 Americans are expected to die from CM in 2021, accounting for over $1.5 billion in annual healthcare spending [2, 5, 21].

Patients with CM are primarily staged according to the American Joint Committee on Cancer (8th edition [AJCC8]) criteria, including Breslow thickness, ulceration, sentinel lymph node (SLN) status, and presence of distant metastasis [6]. Patients without lymphatic spread or distant metastasis (stages I–II) typically have a better long-term prognosis or melanoma-specific survival (MSS) compared to those with SLN involvement (stage III) or distant metastasis (stage IV) [6, 7]. Based on these factors, National Comprehensive Cancer Network (NCCN) guidelines recommend increasing degrees of clinical management [8]. However, “low-risk” stage I–IIA CM still incurs morbidity and mortality, with a 5-year MSS of 99% for stage IA CM, 97% for stage IB CM, and 94% for stage IIA CM per AJCC8 model [6, 7].

Studies have identified additional factors outside of AJCC staging with varying degrees of prognostic utility, including the number of mitoses/mm², tumor regression, tumor-infiltrating lymphocytes, lymphovascular invasion, tumor location, uncertain microstaging, and patient age [6,7,8,9]. Gene expression profile (GEP) tests were developed to gain insight into the tumor molecular biology to assist in prognostic assessment [9, 10]. Despite advancements in GEP technology and the increasingly common use of GEP testing across other notable malignancies, including breast, prostate, lung, and colorectal cancer, controversy regarding their clinical implementation and validity in CM prognosis persists [11,12,13,14,15,16].

The purpose of this study was to review the available literature regarding the validity, accuracy, efficacy, and utility of commercially available prognostic GEP tests for CM and provide insight into the nuances of current controversies and real-world applications of GEP testing.

Methods

Literature Search

A MEDLINE search was performed using the keywords “cutaneous melanoma,” “primary melanoma,” “gene expression profile,” “prognosis,” “risk,” and “sentinel lymph node biopsy” and the Boolean terms “AND” and “OR” for full-length, original research, English-language articles and meta-analyses published between 2010 and 2021. Articles were screened, appraised, and selected based on Oxford Centre for Evidence-Based Medicine criteria and on their relevance to GEP use in augmenting CM prognosis, sentinel lymph node biopsy (SLNBx), and real-world clinical decision-making [16]. References from selected articles were also reviewed for relevant articles not found in the initial search. Final articles were distributed to members of the consensus panel for individual review.

Consensus Development Process

An eight-person consensus panel of dermatologists representing the Skin Cancer Prevention Working Group (SCPWG), composed of physicians with additional specialized training in managing and diagnosing melanoma and non-melanoma skin cancers, convened on 28 October 2021 and 28 December 2021 to discuss issues surrounding the clinical implementation and appropriate use of GEP testing. Consensus statements were constructed based on the review and discussion of the selected articles.

A modified Delphi technique was used to achieve consensus among panel members [18]. The modified Delphi technique has been previously employed in the development of dermatologic expert panel consensus recommendations [18,19,20]. This technique utilizes serial rounds of real-time voting with a required supermajority (> 80%) to adopt a proposed statement. Statements undergo additional discussion, modification, and additional rounds of voting if supermajority approval is not achieved.
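
The voting rule described above can be sketched in a few lines of Python. This is only an illustration of the supermajority logic, assuming each round is recorded as a list of approve/reject votes; the function names and the example vote tallies are hypothetical and do not reproduce the panel's actual votes.

```python
def statement_adopted(votes, threshold=0.80):
    """Return True when the share of approving votes strictly exceeds the threshold."""
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) / len(votes) > threshold


def first_adopting_round(rounds, threshold=0.80):
    """Return the 1-based round in which the statement is adopted, or None."""
    for round_number, votes in enumerate(rounds, start=1):
        if statement_adopted(votes, threshold):
            return round_number
    # No round reached a supermajority; the statement would need further revision.
    return None


# Hypothetical eight-person panel: 6 of 8 approve in round 1, 7 of 8 in round 2.
example_rounds = [
    [True] * 6 + [False] * 2,
    [True] * 7 + [False] * 1,
]
adopted_in = first_adopting_round(example_rounds)
```

With eight voters, a statement needs at least seven approvals to clear the > 80% threshold, so in this hypothetical tally it would be adopted in the second round.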

Ethics

The article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

The initial search identified 1064 studies/meta-analyses that met the search criteria. In reviewing the available literature of prognostic GEP tests, the panel found 21 original articles and meta-analyses that studied the 31-GEP test (DecisionDx-Melanoma; Castle Biosciences, Inc., Friendswood, TX, USA) [14, 31,32,33, 39,40,41,42,43,44,45,46,47,48,49, 59,60,61,62,63,64,65,66,67], five original articles that studied the 11-GEP test (Melagenix; NeraCare GmbH, Bönen, Germany) [14, 34,35,36, 50], and four original articles that studied the 8-GEP + CP test (Merlin; SkylineDx B.V., Rotterdam, the Netherlands) [37, 38, 51, 57].

The expert consensus panel developed six statements that all received supermajority approval using the modified Delphi technique (Table 1).

Table 1 Consensus statements arrived at by the expert consensus panel using the modified Delphi technique

Cutaneous Melanoma is a Growing Public Health Concern

Panel members note that while not a new public health concern, the overall impact CM has on population health continues to grow at a rate of approximately 3% annually [22,23,24,25,26]. While there is contention regarding potential overdiagnosis, especially of thinner melanomas, recent epidemiological studies suggest that improvements in screening and diagnostic techniques have decreased the incidence of thicker melanomas [22,23,24,25,26]. Given the increasing CM incidence, the panel further emphasizes the importance of appropriate and optimized resource allocation to best identify high-risk patients and populations that would benefit from increased clinical scrutiny and management.

The Current AJCC8 and NCCN Guidelines Utilize Historical, Pathological, and Phenotypic Risk Factors to Determine Melanoma Prognosis

The AJCC8 and NCCN guidelines provide a prognostic assessment of MSS based on clinicopathological CM stages [6,7,8,9]. AJCC8 guidelines focus on permutations of three factors: thickness (T), regional nodal and non-nodal metastasis (N), and distant metastases (M) (Table 2) and also include histological factors (e.g., ulceration) and, in advanced disease, serum lactate dehydrogenase levels [6,7,8,9]. Higher AJCC8 stages carry a worse 5-year MSS and an implied higher risk of local recurrence and distant metastasis (after complete surgical excision) [6, 7]. NCCN guidelines also include number of mitoses/mm², presence of BRAF mutations, and additional histopathological factors (e.g., desmoplastic subtypes) to guide clinical decision-making based on inherent and potential risk [8].

Table 2 The American Joint Committee on Cancer (8th edition) melanoma clinicopathological staging and associated 5-year melanoma specific survival

The AJCC8 and NCCN Prognostic Model, Which Does not Account for Genomic Expression, may not Optimize Melanoma Prognostic Assessment

The AJCC8 and NCCN guidelines account for a large range of outcomes even among “lower-risk” CM stages, with 5-year MSS ranging from 99% for stage IA CM and 97% for stage IB CM to 94% for stage IIA CM and 82% for stage IIC CM [6,7,8, 27]. There are also instances wherein lower AJCC8 CM stages carry a higher mortality than higher AJCC8 CM stages; for example, patients with stage IIIA disease have a 5-year MSS of 93%, compared with 82% for those with “lower” stage IIC disease [6, 7].
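
The non-monotonic relationship between stage and survival can be checked mechanically. The short Python sketch below uses only the 5-year MSS figures quoted in this section and flags any pair in which a nominally higher stage has a better MSS than a lower one; the data structure and function name are illustrative assumptions, not part of AJCC8.

```python
# Five-year MSS (%) by AJCC8 stage, listed in ascending stage order,
# using only the values quoted in the text.
FIVE_YEAR_MSS = [("IA", 99), ("IB", 97), ("IIA", 94), ("IIC", 82), ("IIIA", 93)]


def find_inversions(ordered_stages):
    """Return (lower_stage, higher_stage) pairs where the higher stage has better MSS."""
    inversions = []
    for i, (low_stage, low_mss) in enumerate(ordered_stages):
        for high_stage, high_mss in ordered_stages[i + 1:]:
            if high_mss > low_mss:
                inversions.append((low_stage, high_stage))
    return inversions


inversions = find_inversions(FIVE_YEAR_MSS)  # [("IIC", "IIIA")]
```

Among the stages listed here, the only inversion is the IIC/IIIA pair discussed above.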

An important clinical consideration is that while thin CM (≤ 1.0 mm Breslow depth) often have a good MSS, by sheer volume, stage I CM account for approximately 80% of all melanomas diagnosed. As a result, these thinner, “lower-risk” CM account for approximately 26% of all melanoma-specific mortality (MSM) [9, 27, 28]. Therefore, although thin CM may carry a better MSS, there may be a currently unidentified subset of patients with thin CM that have inherently increased morbidity (e.g., worse 5-year MSS, increased risk of recurrence, and potential metastasis). Additional factors (e.g., genetic mutations and tumor biology) beyond those included in AJCC8 staging likely contribute to the observed 5-year MSS overlap during the initial staging process [29, 30]. Identifying these higher-risk thin CM may provide an opportunity to identify patients who may benefit from additional clinical scrutiny, individualized follow-up intervals, multispecialty referrals, radiographic monitoring, and adjuvant therapy.

GEP tests are Validated, Reproducible, and Consistent Across Studies

GEP tests use tumor tissue obtained from the initial biopsy to assess how different genes are regulated, thereby providing additional objective data for clinical decision-making [10, 28]. Several GEP tests have been developed and validated to supplement prognosis in stage I–III CM: the 31-GEP test [31,32,33], the 11-GEP test [34,35,36], and the 8-GEP + CP test [37, 38]. The authors note that different genes are assessed in each of these three GEP tests (Table 3). This may be due to the creation of these tests using retrospective methods among different study populations (e.g., 2 German trial sites for the 11-GEP test [35], 3 US-based sites for the 8-GEP + CP test [37], and 6 US-based sites for the 31-GEP test [31]) and/or focusing on different gene functions (e.g., the 11-GEP test focuses on genes associated with lower-risk CM [35], and the 8-GEP + CP test focuses on genes associated with angiogenesis/hypoxia, coagulation, and epithelial-to-mesenchymal transition [37]). Although different genes are assessed between GEP tests, multiple independent prospective studies and meta-analyses have demonstrated these tests’ internal validity and prognostic consistency. Of note, the authors of this review found that among the original studies for each of these three commercially available GEP tests investigating morbidity (Table 4) and SLNBx status, available literature on the 31-GEP test was fourfold that available on the other two GEP tests, including several meta-analyses and prospective clinical trials with reproducible results, as well as outcomes from real-world clinical management studies [14, 31,32,33, 39,40,41,42,43,44,45,46,47,48,49, 59,60,61,62,63,64,65,66,67].

Table 3 Genes assessed in commercially available gene expression profile tests for cutaneous melanoma
Table 4 Prognostic end-points for original studies of commercially available gene-expression profile tests

Integrating GEPs into AJCC8 and NCCN Models can Improve Prognostic Accuracy

AJCC8 and NCCN models have material MSS overlap between distinct clinical stages; furthermore, they do not account for all clinically useful prognostic assessments, such as relapse-free or recurrence-free survival (RFS), which is an often-used outcome for adjuvant therapy trials, or distant metastasis-free survival (DMFS) [6,7,8,9]. Multiple independent studies and meta-analyses have shown that GEP tests can risk-stratify patients to provide more granular interval assessments of RFS, DMFS, and MSM or survival (MSS) [39,40,41,42,43,44,45,46,47,48,49,50,51].

31-GEP Test

The prognostic 31-GEP test (DecisionDx-Melanoma; Castle Biosciences, Inc.) is a Medicare-reimbursable test that uses 28 gene targets and three control genes to assess tumor biology. Retrospective and prospective cohort studies determined that stage I-III CM designated as high risk (i.e., 31-GEP class 2) carried a significantly worse 5-year RFS [33, 39,40,41,42], 3-year RFS [44,45,46], 5-year MSS [33, 39,40,41,42], 5-year DMFS [39], 3-year DMFS [44,45,46], and metastasis-free survival (MFS) [39,40,41,42] than similarly staged GEP class 1 patients (Table 4). Prospective studies have also demonstrated that the 31-GEP test is a significant, independent predictor of RFS and DMFS for the “low-risk” AJCC8 stage I-IIA CM [47].

Data from prospective cohorts and the EXPAND and INTEGRATE clinical trials found the 31-GEP test could stratify AJCC8 prognostic models further [43]. Among patients considered at lower risk by AJCC staging (e.g., stage I-IIA CM), those with a 31-GEP class 2 result had significantly lower 3-year survival than those with a class 1 result for RFS (83 vs. 97%; p < 0.001), DMFS (87 vs. 99%; p < 0.001), and overall survival (OS) (90 vs. 98%; p = 0.01) [37]. Similar trends were noted for patients considered to be at higher risk by AJCC staging (e.g., stage IIB-III CM) for 3-year RFS (class 2: 52% vs. class 1: 79%; p = 0.02), DMFS (class 2: 74% vs. class 1: 79%; p = 0.40), and OS (class 2: 74% vs. class 1: 91%; p = 0.02) [43].

In addition, studies found increased precision using 31-GEP subclasses, with the class 2B designation carrying the worst 3-year RFS (60%), DMFS (78%), and OS (74%) compared with class 1A, which denotes the lowest risk [43]. Importantly, clinical trial data have found that stage I-IIA class 2B CM and stage IIB-III CM had similar rates of distant metastasis (21 vs. 24%) and deaths (29 vs. 22%) [43].

In multivariate meta-analyses [48, 49], retrospective studies [32, 33, 39, 41, 42], and prospective studies [43,44,45,46,47], the 31-GEP test has consistently been found to be a significant predictor with a strong negative predictive value (NPV) for RFS (NPV 94%) [43], DMFS (NPV 97%) [43], and OS (NPV 97%) [43] for class 1 patients, independent of AJCC8 prognostic criteria, including Breslow depth, ulceration, and SLN status.
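
For readers less familiar with the metric, the NPV of a class 1 result is the proportion of class 1 patients who remained free of the event in question (e.g., recurrence). A minimal Python sketch of the calculation follows; the function name is illustrative and the counts are hypothetical, chosen only to show the arithmetic, and are not taken from the cited studies.

```python
def negative_predictive_value(true_negatives, false_negatives):
    """NPV = TN / (TN + FN): share of test-negative patients who stay event-free."""
    total_negative_results = true_negatives + false_negatives
    if total_negative_results == 0:
        raise ValueError("NPV is undefined when there are no negative test results")
    return true_negatives / total_negative_results


# Hypothetical cohort: 100 class 1 results, of which 3 patients later had the event.
npv = negative_predictive_value(true_negatives=97, false_negatives=3)
```

In this hypothetical cohort the NPV is 97 / 100 = 0.97, i.e., 97%. Because NPV depends on the underlying event rate, values from different cohorts or endpoints are not directly interchangeable.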

11-GEP Test

The 11-GEP (Melagenix; NeraCare GmbH) uses eight prognostic targets and three reference genes to assess CM via a continuous scoring system in which “0” delineates “low-risk” (i.e., ≤ 0) versus “high-risk” (> 0) [44, 45]. A retrospective study of 291 stage I-III CMs and a prospective study of 245 stage II CMs found that high-risk patients identified with the 11-GEP test have a significantly worse 5-year disease-free survival (DFS) (p < 0.001) and 5-year MSS (p = 0.001) than their low-risk counterparts (Table 4) [36]. In a prospective study of 245 stage II CM, high-risk patients had significantly worse 5-year RFS (p = 0.009), DMFS (p = 0.005), MSS (p = 0.018), and 10-year MSS (p = 0.018) (Table 4) [50]. In both studies, multivariate analysis found the continuous (p < 0.0068) [30] and binary stratification (p = 0.018) [50] 11-GEP to be a significant prognostic factor for MSS, independent of age or Breslow thickness [36, 50]. Of note, initial data also suggest a potential synergy of the 11-GEP with AJCC8 staging [35].
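
The binary stratification described above amounts to thresholding the continuous score at zero. The Python sketch below shows only that cut-off; the function name is hypothetical, and how the score itself is derived from the eight target and three reference genes is not described here and is not modeled.

```python
def classify_11gep_score(score):
    """Map a continuous 11-GEP score to the binary risk class (cut-off at 0)."""
    # Scores of 0 or below are "low-risk"; scores above 0 are "high-risk".
    return "high-risk" if score > 0 else "low-risk"


# Illustrative scores only; these are not patient data.
labels = [classify_11gep_score(s) for s in (-1.2, 0.0, 0.4)]
```

Note that a score of exactly 0 falls in the low-risk group, consistent with the "≤ 0" definition given above.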

8-GEP + CP Test

The 8-GEP + CP test (Merlin, SkylineDx B.V.) is a logistic regression model comprised of eight genes and two clinicopathological factors initially designed and validated to predict SLN status [37, 38]. The model was then retrained for prognostic assessment in stage I-IIA CM [51]. In a retrospective study, the 8-GEP + CP test was found to be an independent predictor of 5-year RFS (stage I–III CM, p < 0.001; stage I-IIA, p = 0.006) and 5-year DMFS (stage I–III CM, p = 0.001; stage I–IIA CM, p = 0.025), but not of 5-year MSS, after accounting for age and Breslow thickness [51]. The 8-GEP + CP separated patients into high- and low-risk prognostic categories by 5-year RFS and DMFS with additional stratification for patients with SLN status and among patients with AJCC8 stages I-IIA (Table 4) [51]. The 8-GEP + CP did not significantly stratify/differentiate MSS between risk classes [51].

Prognostic GEP Tests can Augment Clinical Decision-Making Regarding SLN Biopsies

According to NCCN guidelines, SLNBx is not recommended for patients with < 5% likelihood of a positive node (e.g., T1a CM without additional risk factors), should be discussed with patients with a 5–10% likelihood of positivity (e.g., T1b or T1a with risk factors, including transected base, age < 40, or > 1 mitosis/mm²), and should be offered to patients with > 10% likelihood of SLN positivity (e.g., clinicopathological stage IB or higher) [8]. The panel notes the importance of holistically discussing the risks and benefits of SLNBx, including low detection rates (approx. 12–20%) [52], false-positive and false-negative rates, approximately 10.1% postoperative morbidity (including wound dehiscence, hematoma/seroma formation, and surgical site infection), as well as financial impact, with one study estimating a cost of $47,906 to detect one positive SLN [52,53,54].
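
The three probability bands above can be written as a simple decision rule. The following Python sketch is a simplified illustration of those thresholds only, assuming an estimated probability of SLN positivity is already available; the function name and return labels are hypothetical, and the sketch is not a substitute for the full NCCN guidance or for clinical judgment.

```python
def slnbx_guidance(probability_positive):
    """Map an estimated probability of SLN positivity (0 to 1) to the guidance band."""
    if not 0.0 <= probability_positive <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability_positive < 0.05:
        return "not recommended"
    if probability_positive <= 0.10:
        return "discuss with patient"
    return "offer"


# Illustrative probabilities only.
examples = {p: slnbx_guidance(p) for p in (0.03, 0.07, 0.15)}
```

Tests that refine the estimated probability matter most for patients near the 5% and 10% boundaries, where a small change in the estimate moves a patient between bands.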

31-GEP Test

In a study of prospectively tested patients with T1/T2 CM, a 31-GEP class 1A designation was found to predict a < 5% likelihood of SLN positivity among patients aged ≥ 65 years [46]. Furthermore, the 31-GEP test could further stratify prognosis according to risk when subclass designations (e.g., class 1A or class 2B) are combined with SLN status [46]. Among patients with T1/T2 SLN-negative CM aged ≥ 55 years, those with a 31-GEP class 2B result had significantly worse 5-year RFS (66.7 vs. 91.4%; p < 0.05), DMFS (76.2 vs. 93.4%; p < 0.05), and MSS (85.4 vs. 99.3%; p < 0.05) than patients with a class 1A result [46]. Prognostic differences were starker when patients with T1/T2 tumors, aged ≥ 55 years, who received a class 2B 31-GEP result and were SLN positive were compared to their class 1A counterparts in terms of 5-year RFS (19.1 vs. 91.4%; p < 0.001), DMFS (32.1 vs. 93.4%; p < 0.001), OS (18.5 vs. 96.3%; p < 0.001) and MSS (55.0 vs. 99.3%; p < 0.001) [46]. An additional study published after the SCPWG consensus meeting found an integrated 31-GEP (i31-GEP) test using a continuous score and traditional clinicopathological factors had a 98% NPV, was able to accurately identify up to 27% of 1674 patients with T1-4 CM, and was able to reclassify an additional 63% of patients with an intermediate probability of SLN positivity (5–10%) [55]. The additive prognostic ability and increased sensitivity and specificity of the 31-GEP and SLN status were also demonstrated in additional meta-analyses and clinical trials [43, 49].

8-GEP + CP Test

In development, the 8-GEP + CP test, which incorporates the expression of several tumor cell-adhesion genes found to be associated with SLN metastasis (e.g., integrin-β3 and tissue-type plasminogen activator), was noted to have a high NPV capable of identifying tumors (T1-T3) at a low risk of SLN metastasis [37, 56]. In two separate retrospective cohorts, the 8-GEP + CP had an NPV point estimate of 93.3–100% for T1 CM (T1a n = 8; T1b n = 77), 89.3–93.3% for T2 CM, and 75.0–100% for T3 CM [38, 57]. From a cohort of 208 patients, NPV point estimates were 92.9% for T1 CM and 100% for T2 and T3 CM for those aged ≥ 65 years [38]. Retrospective studies on the 8-GEP + CP also report an “SLNBx reduction rate” (or proportion of patients the 8-GEP + CP assessment alone would designate as “low-risk”) of 60.8% among patients with T1 CM, 24.1% for those with T2 CM, and 2.5% for those with T3 CM. Of note, this metric does not account for the error rate, noted to be 2–3% depending on the CM stage [38].

11-GEP Test

There were no relevant articles found within the literature search regarding the 11-GEP and SLNBx prognosis/prediction.

Incorporating GEPs into Real-World Clinical Management has Positively Impacted Patient Outcomes

In 2020–2021, a reported 27,051 patients were clinically tested using the 31-GEP [58], with a 2020 study estimating that 45.2% of dermatologists [59] ordered the test within the 2020 calendar year. More importantly, studies suggest that integrating 31-GEP results into traditional clinical management has positively impacted patient outcomes [27, 59,60,61,62,63,64,65].

31-GEP Test

One multicenter study found that in patients with stage I-II CM who were cared for by dermatologists and surgical/medical oncologists, 49% had their management directly altered by 31-GEP class results [60]; specifically, 91% of decreases in management intensity occurred in patients with a class 1 result [60]. In comparison, 72% of increased management intensity occurred in patients with a class 2 result, with significant management differences for patients with higher-risk 31-GEP classes regarding frequency of follow-up visits (p < 0.001) [52], imaging (p < 0.001) [52], and laboratory testing (p = 0.04) [60].

A single-center multidisciplinary prospective study also found a significantly increased likelihood of patients with a class 2 designation following up with surgical oncology and receiving a recommendation for adjuvant trial (stage I, 100 vs. 18%, p < 0.001; stage II, 64 vs. 36%, p < 0.05) versus patients with class 1 CM [61].

Additional studies have also found GEP class results affected the frequency of physical exams (p < 0.0001) [54] and of referrals (p < 0.0001) [62], with significantly more physicians altering management and opting to refer CM patients with a 0.7-mm Breslow depth if they also had a GEP class 2 result (p < 0.05) [63]. Furthermore, approximately 65% of class 1 results lead to surveillance intensity similar to AJCC8 stage I-IIA, while 98% of class 2 results lead to increased scrutiny, similar to patients with AJCC8 stage IIB-IV CM [64]. Overall, studies have found that 31-GEP class results had the potential to positively influence management decisions among physicians [63,64,65] and non-physician providers [66], consistent with the augmented risk stratification provided by GEP testing.

11-GEP/8-GEP + CP Test

No relevant articles were found in the literature search on the real-world use of the 11-GEP or 8-GEP + CP.

Discussion

The current AJCC8 [6, 7] and NCCN guidelines [8] provide a framework for a population-based approach for the management of CM. However, while these guidelines do provide a framework for many CM presentations, they may not account for all potential clinically relevant risk factors for an individual patient. As such, the NCCN guidelines recommend that a patient’s individual risk of disease recurrence drive clinical decision-making [8]. To that end, it is important that dermatologists have all the tools available to robustly risk-stratify CM patients to provide adequate follow-up, management, and therapy.

There is a large and growing amount of literature demonstrating the validity, efficacy, and utility of GEP tests in the diagnosis and management of CM [10, 12, 13, 27, 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51, 57, 59,60,61,62,63,64,65,66,67]. In prognostic assessments of morbidity and mortality, GEP tests may provide additional information to current AJCC8 prognostic models for MSS [36, 39,40,41,42, 50]. Studies also suggest that GEP tests may provide more nuanced information to assist dermatologists in managing CM with regard to SLNBx in conjunction with current NCCN guidelines [37, 38, 43, 46, 49, 54] and potentially reduce unnecessary testing, procedures, and annual healthcare expenditures [28, 60,61,62,63,64,65,66]. While meta-analyses are not yet available for all prognostic CM GEP tests, two recent separate studies [49, 67] found similar and robust evidence for the 31-GEP test, which was also supported by clinical recommendations (including A-strength recommendations for the use of the 31-GEP to guide management of patients with negative SLNBx) from an independent consensus panel [10].

Despite these findings, several critical articles have been published regarding GEP testing [14,15,16]. While additional appraisal is important, context must be provided. Although randomized controlled trials (RCTs) are the gold standard for interventions [15, 68], prognostic tests are validated by repeated measurements in large cohorts and by meta-analyses that track consistency across studies (though they can be retrospectively or prospectively applied to cohorts within RCTs, as has been performed for GEP tests for breast cancer) [69, 70]. These articles also question the lack of U.S. Food and Drug Administration approval. While this may be necessary for interventional therapies, prognostic tests are validated by repeated large prognostic studies and are performed only at accredited laboratories (i.e., with Clinical Laboratory Improvement Amendments [CLIA] and College of American Pathologists [CAP] certification).

Additional studies should include larger, prospective, novel cohorts to provide a more robust assessment of consistent endpoints: RFS, DMFS, and MSM/MSS. These studies are most essential among thin, “low-risk” CM, with the goal of providing better individualized care. Additional meta-analyses may also be performed to determine the repeated accuracy of these tests within various clinical contexts. GEP tests should also be further refined and integrated into current clinicopathological prognostic models to provide a more nuanced, graded risk assessment for dermatologists and patients, as opposed to the predominantly binary classifications that are currently widely used. Finally, these additional studies may also include real-world data to determine both physician experience and patient outcome regarding the real-world use of GEP tests.

Conclusion

CM poses a substantial public health risk, with approximately 100,000 new cases of invasive CM diagnosed in the USA annually. While current diagnostic and prognostic models for CM management can provide identification and risk stratification of suspicious pigmented lesions, studies have found that the incorporation of GEP tests into current algorithms may provide an objective, non-invasive method to improve the accuracy of risk prediction to inform clinical management decisions and optimize patient care. By studying the molecular underpinnings of CM, dermatologists will ideally be able to reduce unnecessary costs and morbidity associated with CM while providing more individualized care to patients.