
1 Introduction

The American Cancer Society predicted that 91,270 new melanomas would be diagnosed in the USA in 2018 and that 9320 people would die from the disease during that time [1]. This represents a 46% increase in incidence compared with 2008 [2]. There has also been little change in the prognosis for those diagnosed with melanoma (projected mortality of approximately 13% in 2008 and 10% in 2018) [1, 2].

Historically, practitioners have relied on a variety of clinical and histopathological features for prognostication. Recent advances have led to the introduction of molecular tests that may improve our ability to predict disease course. Synergy between the modern field of genetic medicine and the traditional practice of dermatology may be the key to the most accurate individualized prognostication. Knowledge of disease progression risk is essential for directing patients to the best form of treatment, which should optimize outcomes.

Evidence-based evaluation of the accuracy and clinical impact of prognostic tools is critical for the appropriate guidance of patient management, and well-characterized level of evidence (LOE) systems have been established accordingly. This review seeks to evaluate the LOE for DecisionDx-Melanoma (Castle Biosciences, Inc., TX, USA), a 31-gene expression profile (31-GEP) test for cutaneous melanoma, and rank its validity according to guidelines utilized by major dermatologic organizations: American Joint Committee on Cancer (AJCC), National Comprehensive Cancer Network (NCCN), and American Academy of Dermatology (AAD).

2 Algorithms

The AJCC, NCCN, and AAD are considered major authoritative organizations that many dermatologists rely upon to provide skin cancer guidelines. However, each employs a unique ranking system to assign a LOE for each element as it relates to the diagnosis, prognostication, and management of melanoma.

The AJCC employs a simple system in which, based on the available evidence, a particular disease-related item is ranked on a scale between I (1) and IV (4) (Table 1) [3]. Elements supported by multiple, large, well-designed, prospective or retrospective registry-based studies earn a LOE ranking of I. Those with only one large well-designed study supporting their use are given a ranking of II. Items earning a LOE of III are supported by studies with inconsistent results or of inadequate size or quality. Finally, level IV items lack appropriate studies to support their validity or utility.

Table 1 American Joint Committee on Cancer levels of evidence [3]
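
The AJCC ranking described above can be read as a simple decision rule. The sketch below is only an illustrative paraphrase of the prose in this section, not the AJCC's own wording; the function name `ajcc_level` and its parameters are hypothetical and were introduced here for illustration.

```python
def ajcc_level(n_adequate_studies: int,
               n_other_studies: int = 0,
               results_consistent: bool = True) -> str:
    """Map a summary of the evidence base to an AJCC-style level (I-IV).

    Hypothetical helper that paraphrases the text above:
      I   - multiple large, well-designed studies with consistent results
      II  - exactly one large, well-designed study
      III - studies exist, but results are inconsistent or the studies
            are inadequate in size or quality
      IV  - no appropriate studies
    """
    if n_adequate_studies >= 2 and results_consistent:
        return "I"
    if n_adequate_studies == 1:
        return "II"
    if n_adequate_studies >= 2 or n_other_studies > 0:
        return "III"
    return "IV"


# Example calls with made-up inputs (not an assessment of any real marker).
print(ajcc_level(3))                            # I
print(ajcc_level(1))                            # II
print(ajcc_level(2, results_consistent=False))  # III
print(ajcc_level(0))                            # IV
```

In practice the judgment of what counts as "large" or "well designed" is qualitative, so a rule like this can only summarize a decision that reviewers have already made.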

The NCCN LOE ranking methodology is more complex. Febbo et al. proposed a system that combines conventional and modern ranking schematics to account for the difficulty in conducting prospective trials for tumor markers [4]. This resulted in the LOE of tumor markers being designated by both number and letter (Table 2). The classical numbers system ranges from I (1) to V (5) and is largely based upon study size and prospective vs. retrospective design. The letters spanning A to D allow for flexibility in the definition of “prospective studies,” such that studies using archived samples may still be “prospective” if certain criteria are met.

Table 2 National Comprehensive Cancer Network levels and categories of evidence [4]

Finally, the AAD relies on the Strength of Recommendation Taxonomy criteria [5, 6]. Items are designated a level I, II, or III depending on the focus and quality of the study and, correspondingly, assigned a level A, B, or C recommendation based on the caliber of evidence (Table 3). For prognosis, tumor markers are assigned the highest level, level I, if the evidence is derived from a prospective cohort study with sufficient follow-up or a systematic review/meta-analysis of good-quality cohort studies. Level II is given to those supported by studies that are not prospective or systematic reviews/meta-analyses of lower quality cohort studies. Finally, level III is assigned to items that are supported by other forms of evidence aside from formal studies. As dictated by the LOE, markers are given an A recommendation if the evidence is consistent, high quality, and patient oriented, a B recommendation if the evidence is inconsistent or lower quality, and a C recommendation if it is not based on strict evidence.

Table 3 Strength of Recommendation Taxonomy criteria levels of evidence and recommendation [6]
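
The level assignment for prognostic markers described above can also be sketched as a short rule. This is a hedged illustration only: the function name and the design labels are hypothetical, and placing a prospective cohort study with insufficient follow-up at level II is an assumption of this sketch rather than something stated explicitly in the paragraph above.

```python
def sort_prognosis_level(design: str, high_quality: bool) -> str:
    """Return a SORT-style evidence level (I, II, or III) for prognosis.

    `design` uses hypothetical labels introduced for this sketch:
      "prospective_cohort", "systematic_review", "other_study", "non_study".
    `high_quality` means sufficient follow-up (for a cohort study) or
    good-quality included cohorts (for a systematic review/meta-analysis).
    """
    allowed = {"prospective_cohort", "systematic_review",
               "other_study", "non_study"}
    if design not in allowed:
        raise ValueError(f"unknown design label: {design!r}")
    if design == "non_study":
        return "III"  # evidence other than formal studies
    if design in ("prospective_cohort", "systematic_review") and high_quality:
        return "I"
    # Assumption: anything else based on formal studies falls to level II.
    return "II"


print(sort_prognosis_level("prospective_cohort", True))   # I
print(sort_prognosis_level("prospective_cohort", False))  # II
print(sort_prognosis_level("other_study", True))          # II
print(sort_prognosis_level("non_study", False))           # III
```

The letter recommendation (A, B, or C) is deliberately left out of the sketch, because it depends on a qualitative judgment of consistency and patient orientation across the whole body of evidence.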

3 31-Gene Expression Profile Test

3.1 Overview

The 31-GEP is a molecular test used to classify cutaneous melanomas as high or low risk based on the tumor’s genetic profile. Using 28 prognostic genes and three control genes, as well as a training cohort of 164 melanoma cases in predictive modeling, melanomas are characterized as Class 1 or Class 2, with Class 1 carrying a favorable prognosis and Class 2 a poor outlook in terms of 5-year recurrence-free survival, distant metastasis-free survival, and melanoma-specific survival.

3.2 Literature Review of Clinical Validity

The first 31-GEP study is a detailed account of its inception and a validation of its prognostic accuracy. Gerami et al. utilized previously published genomic data comparing primary vs. metastatic melanoma to select genes that varied considerably and consistently between the two tumor types [7]. This literature-derived genetic profile was applied to the 164-sample training cohort with known outcomes. Using machine learning on this training cohort to classify new samples, the 31-GEP test was then validated in an independent retrospective cohort of 104 archived samples from multiple institutions. A Kaplan–Meier analysis of the two classes identified by the 31-GEP test demonstrated, with significance, a 66% difference in disease-free survival rates. Following validation, the study compared the 31-GEP test to other prognostic indicators, including Breslow thickness, ulceration, mitotic rate, and age. A multivariate analysis found that the 31-GEP classifications were independent predictors of metastatic risk (hazard ratio [HR] 9.55, confidence interval [CI] 2.3–39.5, p = 0.002).

The next study compared the 31-GEP test to the sentinel lymph node biopsy (SLNB) by retrospectively evaluating 217 samples from numerous cancer centers [8]. The investigators found that the positive predictive value of the SLNB was similar to that of the molecular test (55% and 50% with 95% CI of 42–68 and 42–59, respectively). More importantly, the negative predictive value of the 31-GEP test surpassed that of the SLNB (82% and 67% with 95% CI of 71–89 and 59–74, respectively). The strength of the 31-GEP test’s negative predictive value is particularly noteworthy, as many patients who die from metastatic melanoma are initially SLNB negative [9]. This fact was reflected in the study findings, as the combination of SLNB and 31-GEP testing led to significantly improved prognostication when compared with either modality alone [8].
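
For readers less familiar with these metrics, the following minimal sketch shows how positive and negative predictive values are computed from a 2 × 2 table of test result vs. outcome. The counts are hypothetical and were chosen only to make the arithmetic easy to follow; they are not data from the cited study, and the function name is an assumption of this example.

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Return (PPV, NPV) from the four cells of a 2 x 2 table.

    tp: test positive, event occurred      fp: test positive, no event
    fn: test negative, event occurred      tn: test negative, no event
    """
    ppv = tp / (tp + fp)  # fraction of positive calls that had the event
    npv = tn / (tn + fn)  # fraction of negative calls that stayed event free
    return ppv, npv


# Hypothetical counts, NOT taken from any study discussed in this review.
ppv, npv = predictive_values(tp=30, fp=30, fn=20, tn=120)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 50%, NPV = 86%
```

Both values depend on how common the outcome is in the tested population, which is one reason predictive values from different cohorts should be compared with caution.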

Ferris et al. further refined the role of the 31-GEP test in prognostication by evaluating its validity and utility for stage I and II cutaneous melanoma, a subset known to account for the majority of melanoma deaths [9, 10]. Using 205 specimens from multiple centers, a head-to-head comparison of the 31-GEP test vs. the AJCC Individualized Melanoma Patient Outcome Prediction Tool was performed [10]. In regard to all three outcomes of recurrence, distant metastasis, and death, the genetic test was ultimately more sensitive, while the AJCC calculator was more specific.

This study complemented previous articles that found combining molecular testing with traditional clinical markers yielded better prediction of risk than either tool alone [8]. Ferris et al. determined the sensitivity of the AJCC prediction tool combined with the 31-GEP test to be at least 88% for recurrence, 85% for distant metastasis, and 82% for death [10]. This represented a minimum increase of 4% and a maximum increase of 54% in sensitivity from either test alone. As expected, specificity declined when combining the two tools; however, this was at most a 22% decline in specificity in contrast to the 54% increase in sensitivity.
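
The trade-off between sensitivity and specificity when two tools are combined can be illustrated with a small sketch. It assumes an "either-positive" rule, in which a patient is called high risk if either tool flags them; this rule is an assumption made for illustration. The ten patients below are entirely hypothetical and do not correspond to data from the cited study.

```python
def sensitivity_specificity(calls, events):
    """Return (sensitivity, specificity) for binary calls vs. true events."""
    pairs = list(zip(calls, events))
    tp = sum(1 for c, e in pairs if c and e)
    fn = sum(1 for c, e in pairs if not c and e)
    tn = sum(1 for c, e in pairs if not c and not e)
    fp = sum(1 for c, e in pairs if c and not e)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 1 = event (e.g., recurrence), 0 = no event.
events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical high-risk calls (1) from two different tools.
tool_a = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
tool_b = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
# Either-positive combination: high risk if at least one tool says so.
either = [int(a or b) for a, b in zip(tool_a, tool_b)]

for name, calls in [("A", tool_a), ("B", tool_b), ("A or B", either)]:
    sens, spec = sensitivity_specificity(calls, events)
    print(f"{name:7s} sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Under this rule the combination catches every event that either tool catches, so sensitivity cannot fall below that of the better single tool, while every false positive from either tool is retained, so specificity cannot exceed that of the worse single tool.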

The investigators also considered circumstances in which the AJCC calculator and the genetic test did not agree, such that tumors would be classified as Class 1/AJCC high risk or Class 2/AJCC low risk. Of the 43 cases with contrary classifications, the proportion with a study outcome (recurrence, distant metastasis, or death) was 38% in the Class 1/AJCC high-risk group and 46% in the Class 2/AJCC low-risk group. While these findings hint at the increased sensitivity of the 31-GEP test as described above, the small sample size and the use of 124 previously analyzed specimens preclude definitive conclusions from being drawn.

The next assessment of the 31-GEP test was the first prospective evaluation of its prognostic capability. Hsueh et al. evaluated 322 patients from 11 different medical centers in a 1.5-year interim analysis of a 5-year study [11]. In comparison to a positive SLNB and the presence of ulceration, the Class 2 designation was more sensitive for recurrence (40%, 60%, and 80% respectively, and p < 0.0001 for all), distant metastasis (50%, 75%, and 83%, and p < 0.001, < 0.0001, and < 0.0001, respectively), and death (9%, 45%, and 73%, and p = 1, 0.04, and 0.0001, respectively) predictions. Furthermore, a multivariate analysis indicated that Class 2 designations were associated with a significant 7.15 HR for recurrence risk. Analysis of the HR for Class 2 in regard to distant metastasis or overall survival was not significant; however, these two outcomes were limited in number, which may have caused difficulties when assessing for significance. Importantly, ulceration was not associated with a significant HR.

Zager et al. reaffirmed previous findings by increasing the size of the retrospective cohort analyzed [12]. Five hundred and twenty-three previously unreported cutaneous melanomas from 16 facilities were evaluated using the 31-GEP test to determine the risk of recurrence and distant metastasis. As determined previously, Class 1 vs. Class 2 had significantly different levels of risk, with Class 2 associated with a worse prognosis regardless of tumor stage. Notably, this study subdivided Class 1 and Class 2 designations into A and B subclasses, such that Class 1A has the best prognosis, Class 2B has the worst prognosis, and Classes 1B/2A have intermediate prognoses. Similar to Gerami et al. [7], this study demonstrated high accuracy metrics for both classifications and found combining 31-GEP testing with SLNB increased prognostic accuracy more than either alone.

A recent study directly evaluated the utility of the 31-GEP test to identify high-risk lesions amongst tumors traditionally categorized as low risk. Melanomas usually considered to be low risk included: thin tumors ≤ 1 mm (T1), stage I–IIA disease, and those with negative SLNB. Gastman et al. assessed 690 cutaneous melanomas from a pooled cohort that excluded samples previously used for test development [13]. Comparison of tumors with a negative SLNB paired with a Class 1A vs. Class 2B designation found that melanomas with the higher risk molecular classification were associated with a significantly worse prognosis, despite the lesions’ traditionally low-risk profile. These results were echoed in the evaluations of molecular categorizations for other cutaneous melanomas that met the standard criteria of low risk. Furthermore, for lesions classified as thin or stage I–IIA, a multivariate analysis accounting for thickness, ulceration, and mitotic rate found that the 31-GEP Class 2B was a significant predictor of recurrence-free survival.

Finally, Greenhaw et al. retrospectively assessed a registry of 256 patients with cutaneous melanoma who, either at the time of diagnosis or first follow-up visit, received the 31-GEP test as part of their clinical care [14]. This study demonstrated a 99% negative predictive value for a Class 1 designation. The sensitivity of the molecular test was also substantial, with 77% of melanomas that recurred accurately called Class 2.

3.3 Literature Review of Analytic Validity and Clinical Utility

The primary focus of this review is to objectively assess the clinical validity of the 31-GEP test. However, adoption of a molecular test into clinical practice requires additional considerations, such as the analytic validity and clinical utility of the test. The available literature addressing the analytic validity and clinical utility of the 31-GEP test is briefly summarized here.

Cook et al. evaluated the analytical validity, or test reliability, of the 31-GEP test through multiple inter-assay, inter-instrument, and inter-operator studies [15]. One hundred and sixty-eight melanoma samples tested on 2 consecutive days yielded an inter-assay concordance score of 99%. Inter-instrument validity was assessed by comparing probability scores generated by two models of the same machine and two entirely different machines. The total sample size of 43 was associated with a 95% concordance rate between instruments. Finally, inter-operator concordance was evaluated using 298 melanoma samples and generated a concordance value of 100%.

Clinical utility studies, necessary to demonstrate if and how testing impacts patient management decisions by physicians, have also been performed for the 31-GEP assay [16,17,18,19,20,21]. Changes to patient management for physician visits, imaging, laboratory workup, referrals, and SLNB guidance with 31-GEP testing have been assessed by the following study designs: prospective testing of 31-GEP impact on physician recommendations (247 patients, stage I–II at consent) [16] and retrospective chart reviews with prospective testing of cases (156 and 91 patients, stage I–III) [17, 18]. These clinical impact studies reported patient management changes in approximately half of the patients tested with the 31-GEP test. Of these patients, follow-up, surveillance, and interdisciplinary care were generally reduced in intensity or frequency after a Class 1 result and increased with a Class 2 result. The majority of patients who had their management influenced by the 31-GEP test result were stage I–II. While patient outcomes were not assessed as part of these utility studies, the importance and contribution of appropriate clinical follow-up and surveillance for detection of distant disease and its impact on survival has been detailed elsewhere [22]. Intended-use decision studies using hypothetical clinical vignettes and survey responses have demonstrated physicians and nurse practitioners are willing to use the 31-GEP test and re-evaluate management accordingly, particularly in patients with melanomas at least 0.5 mm thick [19,20,21].

The financial impact of the molecular test is also a significant factor to consider prior to implementation into clinical practice. Unfortunately, the current data evaluating the economic ramifications of the 31-GEP test are limited; however, incorporation of the molecular test within projected cost-of-care models suggests that the assay may result in a 31% net reduction in expenditure for those with T1/T2 disease by impacting surveillance and SLNB management [23]. Additional studies are needed to fully assess the economic impact of 31-GEP testing.

4 Level of Evidence Evaluation for the 31-Gene Expression Profile Test

The above summaries of the various studies assessing the 31-GEP test are most useful when understood in the context of the LOE structures championed by the major dermatologic and oncology organizations. All physicians rely on evidence-based medicine to inform clinical decisions and provide the highest level of care to patients [24]. Table 4 provides a summary of the following discussion for the LOE for the 31-GEP test interpreted according to the AJCC, NCCN, and AAD guidelines.

Table 4 Evaluation of the 31-gene expression profile (31-GEP) test according to the American Joint Committee on Cancer (AJCC), National Comprehensive Cancer Network (NCCN), and American Academy of Dermatology (AAD) level of evidence guidelines

4.1 American Joint Committee on Cancer Evaluation

The AJCC’s four-level hierarchy is the simplest to understand and apply. The seven articles meet level II criteria, as even the smallest study had a sample size of 205 subjects from several medical centers [10]. Each study relied on quantitative outcomes assessed with appropriate statistical calculations: multivariate analysis and Kaplan–Meier curves. Despite the strict methodology, the results were overall significantly in favor of the molecular test [7, 8, 10,11,12,13,14].

The consistency maintained across multiple studies nearly qualifies the 31-gene test as a level I prognosticator; however, because its single prospective study has limited follow-up, it would be premature to designate it with the highest degree of evidence [11]. As such, the authors deem the current LOE for the 31-GEP test to be I/II according to AJCC criteria. Of note, the most recently published AJCC guidelines did not rank the molecular marker [3]. This may be attributable to a shortage of studies assessing the 31-GEP test prior to publication of the cancer manual.

4.2 National Comprehensive Cancer Network Evaluation

Although the NCCN did not officially rank the 31-GEP test, understanding the molecular marker in the context of NCCN standards is useful in assessing its value. These guidelines are challenging to navigate, despite their stated goal of facilitating easy evaluation of prognostic biomarkers [4]. Given the high number of retrospective studies employing multivariate analyses, the 31-GEP test meets criteria for an evidence level of III [7, 8, 10, 12,13,14]. It does not, however, meet criteria for level II, as it lacks studies prospectively evaluating the marker as a secondary outcome. This presents a conundrum when considering Hsueh et al.’s prospective trial that primarily evaluated the 31-GEP test [11]. As it is only a 1.5-year interim report of a 5-year study, it does not completely meet criteria for a level I marker, but the strength of the design and associated findings support a designation above level III. Therefore, the authors assign a LOE to the 31-GEP test that lies between I and III.

The second half of the NCCN LOE ranking (the alphabetical designation) is similarly equivocal in its application to this molecular marker. Hsueh et al. falls in the A category of a “prospective trial designed to address the tumor marker” [4]; the remaining articles are not as easily placed. The majority of the studies were centered on analyses of archived samples, which indicates the retrospective nature of the studies. This alone would exclude all but one study from the A, B, and C categorization that heavily relies on prospective classification [7, 8, 10, 12,13,14]. Simon et al., however, note that evaluating prognostic markers in the traditional prospective framework is costly, time consuming, and contingent upon very large sample sizes. Accounting for these factors, a prospective evaluation of a tumor marker may be so hindered that results may not emerge until after its clinical usefulness has expired [25]. As such, it is difficult to penalize investigators for bypassing the traditional clinical trial model, which is best suited to evaluate therapeutic predictive biomarkers, given the poor fit of prospective studies for tumor marker evaluation.

The remaining two subcategories in each class are more applicable to the 31-GEP studies. Each of the six retrospective studies formulated an analytic plan prior to testing the collected specimens [7, 8, 10, 12,13,14]. This single factor eliminates category D as a possible designation (Table 2). The strongly significant findings in all seven studies also discredit the possibility that these results are “very likely to be a play of chance,” a key finding in the C designation [4]. The conflicting evidence categorizations led the authors to assign a B ranking to the 31-GEP test, as this category is the only one without a direct contradiction that also allows for flexibility in trial design. This difficulty highlights the need for more inclusive LOE guidelines that can be applied more broadly to the emerging field of molecular medicine.

4.3 American Academy of Dermatology Evaluation

The Strength of Recommendation Taxonomy criteria used by the AAD also utilize a dual number-alphabet ranking system (Table 3). Hsueh et al. is indeed a prospective study; however, despite promising results, the trial is not yet complete [11]. As such, the authors feel it cannot meet the level I criteria, which demand a “prospective cohort study with good follow-up” [6]. This trial, combined with the remaining six retrospective analyses, meets the level II requirements, allowing the authors to designate the 31-GEP test as such. Additionally, given the consistency of the study findings, the authors believe an A recommendation is also warranted.

This IIA classification stands in contrast to the II/IIIC ranking granted by the AAD in regard to the use of prognostic molecular testing [5]. Importantly, the AAD designation was centered upon only one validation study and one trial comparing the 31-GEP test to SLNB [7, 8]. Five more recent studies were not included. Evaluation of the molecular test on these two trials alone would also lead the authors to assign it a II/IIIC designation.

5 Conclusions

Based on the above studies assessing clinical validity, analytic validity, and clinical utility, the authors find the 31-GEP test to be particularly useful for patients with invasive melanoma or older patients with T1/T2 melanomas. For patients with invasive melanoma, the results of the molecular test may help guide the frequency of skin examinations and utilization of SLNB or imaging following diagnosis. Patients aged older than 65 years diagnosed with T1/T2 melanomas may also benefit from molecular testing, particularly in the assessment of the risks and benefits of a SLNB.

To provide the highest level of care to their patients, dermatologists, like all medical practitioners, must keep abreast of the latest research and best practices. Relying on verified guidelines is critical, but it is equally important to supplement them with recommendations based on the latest quality research, especially when the lag time between updates is prolonged. The authors hope that this simplified literature review and its relationship to the major organizations’ standards of evidence will help clinicians make informed decisions regarding their own practice to better serve their patients.