Level of Evidence Review for a Gene Expression Profile Test for Cutaneous Melanoma
The advent of molecular medicine may allow for individualized cancer prognostication, which should enable better clinical management and, hopefully, improve patient outcomes. A 31-gene expression profile (31-GEP) test is currently available for patients diagnosed with cutaneous melanoma; this test helps inform patients’ individual treatment plans, especially when combined with traditional biomarkers.
The objective of this study was to review the current literature and establish the level of evidence for a cutaneous melanoma 31-GEP test.
A review of seven development and validation studies for the 31-GEP test was conducted. The respective strengths and weaknesses of each study were applied to the level of evidence criteria from major organizations that publish guidelines for melanoma management: American Joint Committee on Cancer, National Comprehensive Cancer Network, and American Academy of Dermatology.
Evaluating each study led to classifying the 31-GEP test as level I/II, I–IIIB, and IIA according to American Joint Committee on Cancer, National Comprehensive Cancer Network, and American Academy of Dermatology criteria, respectively. This stands in contrast to the official unrated status conferred by the American Joint Committee on Cancer and National Comprehensive Cancer Network and the II/IIIC rating designated by the American Academy of Dermatology.
Differences between the authors’ findings and official published ratings may be attributed to chronological issues, as many of the studies were not yet published when the aforementioned organizations conducted their reviews. There was also difficulty in applying the National Comprehensive Cancer Network criteria to this prognostic test, as their guidelines were intended for evaluation of predictive markers. Nevertheless, based upon the most current data available, integration of the 31-GEP test into clinical practice may be warranted in certain clinical situations.
Recent advances in molecular medicine have led to the development of a 31-gene prognostic test for patients with cutaneous melanoma.
The available literature for the 31-gene expression profile test was evaluated in the context of the level of evidence criteria used by the American Joint Committee on Cancer, National Comprehensive Cancer Network, and American Academy of Dermatology.
The 31-gene expression profile test may be warranted in appropriate clinical scenarios.
The American Cancer Society predicted that 91,270 new melanomas would be diagnosed in the USA in 2018 and that 9320 people would die from the disease during that time. This represents a 46% increase in incidence compared with 2008. There has also been little change in the prognosis for those diagnosed with melanoma (13–10% mortality) [1, 2].
Historically, practitioners have relied on a variety of clinical and histopathological features for prognostication. Recent advances have led to the introduction of molecular tests that may improve our ability to predict disease course. Synergy between the modern field of genetic medicine and the traditional practice of dermatology may be the key to the most accurate individualized prognostication. Knowledge of disease progression risk is essential for directing patients to the best form of treatment, which should optimize outcomes.
Evidence-based evaluation of the accuracy and clinical impact of prognostic tools is critical for the appropriate guidance of patient management, and well-characterized level of evidence (LOE) systems have been established accordingly. This review seeks to evaluate the LOE for DecisionDx-Melanoma (Castle Biosciences, Inc., TX, USA), a 31-gene expression profile (31-GEP) test for cutaneous melanoma, and rank its validity according to guidelines utilized by major dermatologic organizations: American Joint Committee on Cancer (AJCC), National Comprehensive Cancer Network (NCCN), and American Academy of Dermatology (AAD).
The AJCC, NCCN, and AAD are considered major authoritative organizations that many dermatologists rely upon to provide skin cancer guidelines. However, each employs a unique ranking system to assign a LOE for each element as it relates to the diagnosis, prognostication, and management of melanoma.
Table 1 American Joint Committee on Cancer levels of evidence
Level of evidence
I: The available evidence includes consistent results from multiple, large, well-designed, and well-conducted national and international studies in appropriate patient populations, with appropriate endpoints and appropriate treatments. Both prospective studies and retrospective population-based registry studies are acceptable; studies should be evaluated based on methodology rather than chronology
II: The available evidence is obtained from at least one large, well-designed, and well-conducted study in appropriate patient populations with appropriate endpoints and with external validation
III: The available evidence is somewhat problematic because of one or more factors, such as the number, size, or quality of individual studies; inconsistency of results across individual studies; appropriateness of the patient population used in one or more studies; or the appropriateness of outcomes used in one or more studies
IV: The available evidence is insufficient because appropriate studies have not yet been performed
Table 2 National Comprehensive Cancer Network levels and categories of evidence
Level of evidence
I: Prospective, marker primary objective. Well powered or meta-analysis
II: Prospective, marker the secondary objective
III: Retrospective, outcomes, multivariate analysis
IV: Retrospective, outcomes, univariate analysis
V: Retrospective, correlation with other markers, no outcomes
Prospective using archived samples
Clinical trial
A: PCT designed to address tumor marker
B: Prospective trial not designed to address tumor marker, but design accommodates tumor marker utility
C: Prospective observational registry, treatment and follow-up not dictated
D: No prospective aspect to study
Patients and patient data
A: Prospectively enrolled, treated, and followed in PRCT
B: Prospectively enrolled, treated, and followed up in clinical trial and, especially if predictive utility is considered, a PRCT addressing the treatment of interest
C: Prospectively enrolled in registry, but treatment and follow-up standard of care
D: No prospective stipulation of treatment or follow-up; patient data collected through retrospective chart review
Specimen collection, processing, and archival
A: Specimens collected, processed, and assayed for specific marker in real time
B: Specimens collected, processed, and archived prospectively using generic SOPs; assayed after trial completion
C: Specimens collected, processed, and archived prospectively using generic SOPs; assayed after trial completion
D: Specimens collected, processed, and archived with no prospective SOPs
Statistical design and analysis
A: Study powered to address tumor marker question
B: Study powered to address therapeutic question and underpowered to address tumor marker question. Focused analysis plan for marker question developed before performing assays
C: Study not prospectively powered at all; retrospective study design confounded by selection of specimens for study. Focused analysis plan for marker question developed before performing assays
D: Study not prospectively powered at all; retrospective study design confounded by selection of specimens for study. No focused analysis plan for marker question developed before performing assays
Validation
A: Results unlikely to be a play of chance. Although preferred, validation not required
B: Results more likely to be a play of chance than A, but less likely than C. Requires one or more validation studies
C: Results very likely to be play of chance. Requires subsequent validation studies
D: Requires subsequent validation studies
Table 3 Strength of Recommendation Taxonomy criteria levels of evidence and recommendation
Level of evidence
I: Good-quality patient-oriented evidence. Prognostic types of study: systematic review/meta-analysis of good-quality cohort studies; prospective cohort study with good follow-up
II: Limited-quality patient-oriented evidence. Prognostic types of study: systematic review/meta-analysis of lower quality cohort studies or with inconsistent results; retrospective cohort study or prospective cohort study with poor follow-up
III: Extrapolations from bench research; disease-oriented evidence (intermediate or physiologic outcomes only); case series for studies of diagnosis, treatment, prevention, or screening
Strength of recommendation
A: Recommendation based on consistent and good-quality patient-oriented evidence
B: Recommendation based on inconsistent or limited-quality patient-oriented evidence
C: Recommendation based on consensus, usual practice, opinion, disease-oriented evidence, and case series for studies of diagnosis, treatment, prevention, or screening
3 31-Gene Expression Profile Test
The 31-GEP is a molecular test used to classify cutaneous melanomas as high or low risk based on the tumor’s genetic profile. Using 28 prognostic genes and three control genes, as well as a training cohort of 164 melanoma cases in predictive modeling, melanomas are characterized as Class 1 or Class 2, with Class 1 carrying a favorable prognosis and Class 2 a poor prognosis in terms of 5-year recurrence-free survival, distant metastasis-free survival, and melanoma-specific survival.
3.2 Literature Review of Clinical Validity
The first 31-GEP study is a detailed account of its inception and a validation of its prognostic accuracy. Gerami et al. utilized previously published genomic data comparing primary vs. metastatic melanoma to select genes that varied considerably and consistently between the two tumor types. This literature-derived genetic profile was applied to the 164-sample training cohort with known outcomes. Using machine learning on this training cohort to classify new samples, the 31-GEP test was then validated in an independent retrospective cohort of 104 archived samples from multiple institutions. A Kaplan–Meier analysis of the two classes identified by the 31-GEP test demonstrated, with significance, a 66% difference in disease-free survival rates. Following validation, the study compared the 31-GEP test to other prognostic indicators, including Breslow thickness, ulceration, mitotic rate, and age. A multivariate analysis found that the 31-GEP classifications were independent predictors of metastatic risk (hazard ratio [HR] 9.55, confidence interval [CI] 2.3–39.5, p = 0.002).
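As a rough consistency check on the hazard ratio quoted above, the following sketch back-calculates the standard error and p value from the published interval. It assumes the interval is a 95% Wald interval that is symmetric on the log scale, which is common for Cox-type models but is not stated in the text; the function name `wald_check` is illustrative only.

```python
import math

def wald_check(hr, ci_low, ci_high, z_crit=1.96):
    """Back-calculate log-scale SE, z, and two-sided p from a hazard ratio and its CI.

    Assumes a Wald interval symmetric on the log scale (an assumption,
    not something the reviewed study states).
    """
    log_se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / log_se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # Geometric mean of the bounds; equals the point estimate for a symmetric log-scale CI
    midpoint = math.sqrt(ci_low * ci_high)
    return log_se, z, p_two_sided, midpoint

# Values reported for the 31-GEP class in the multivariate analysis
log_se, z, p, midpoint = wald_check(9.55, 2.3, 39.5)
print(f"log-scale SE = {log_se:.3f}, z = {z:.2f}, p = {p:.4f}, midpoint = {midpoint:.2f}")
```

Under that assumption, the geometric mean of the bounds falls close to the reported HR and the recovered p value rounds to the reported 0.002, so the three figures appear mutually consistent.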
The next study compared the 31-GEP test to the sentinel lymph node biopsy (SLNB) by retrospectively evaluating 217 samples from numerous cancer centers. The investigators found that the positive predictive value of the SLNB was similar to that of the molecular test (55% and 50% with 95% CI of 42–68 and 42–59, respectively). More importantly, the negative predictive value of the 31-GEP test surpassed that of the SLNB (82% and 67% with 95% CI of 71–89 and 59–74, respectively). The strength of the 31-GEP test’s negative predictive value is particularly noteworthy, as many patients who die from metastatic melanoma are initially SLNB negative. This fact was reflected in the study findings, as the combination of SLNB and 31-GEP testing led to significantly improved prognostication when compared with either modality alone.
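For readers less familiar with these metrics, the sketch below shows how positive and negative predictive values, along with sensitivity and specificity, are derived from a 2×2 table of test calls against outcomes. The counts are hypothetical and chosen only for illustration; they are not data from the reviewed studies.

```python
def predictive_values(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # high-risk calls among patients with the outcome
        "specificity": tn / (tn + fp),  # low-risk calls among patients without the outcome
        "ppv": tp / (tp + fp),          # outcomes among high-risk calls
        "npv": tn / (tn + fn),          # outcome-free patients among low-risk calls
    }

# Hypothetical counts for illustration only (not study data)
metrics = predictive_values(tp=30, fp=30, fn=10, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the outcome is in the cohort, which is one reason they are best compared within the same study population.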
Ferris et al. further refined the role of the 31-GEP test in prognostication by evaluating its validity and utility for stage I and II cutaneous melanoma, a subset known to account for the majority of melanoma deaths [9, 10]. Using 205 specimens from multiple centers, a head-to-head comparison of the 31-GEP test vs. the AJCC Individualized Melanoma Patient Outcome Prediction Tool was performed. In regard to all three outcomes of recurrence, distant metastasis, and death, the genetic test was ultimately more sensitive, while the AJCC calculator was more specific.
This study complemented previous articles that found that combining molecular testing with traditional clinical markers yielded better prediction of risk than either tool alone. Ferris et al. determined the sensitivity of the AJCC prediction tool combined with the 31-GEP test to be at least 88% for recurrence, 85% for distant metastasis, and 82% for death. This represented a minimum increase of 4% and a maximum increase of 54% in sensitivity relative to either test alone. As expected, specificity declined when combining the two tools; however, the decline was at most 22%, in contrast to the increase in sensitivity of up to 54%.
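The trade-off described above can be illustrated with a small sketch. It assumes the combined call is "high risk" whenever either tool flags high risk; the text does not state the exact combination rule, so this is an assumption, and the ten-patient cohort is hypothetical.

```python
def sens_spec(calls, outcomes):
    """Sensitivity and specificity of boolean high-risk calls against boolean outcomes."""
    tp = sum(c and o for c, o in zip(calls, outcomes))
    fn = sum((not c) and o for c, o in zip(calls, outcomes))
    tn = sum((not c) and (not o) for c, o in zip(calls, outcomes))
    fp = sum(c and (not o) for c, o in zip(calls, outcomes))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort of 10 patients (illustrative values, not study data)
outcome   = [True, True, True, True, False, False, False, False, False, False]
gep_high  = [True, True, False, False, True, False, False, False, False, False]
ajcc_high = [False, True, True, False, False, True, False, False, False, False]

# Assumed combination rule: high risk if either tool says high risk
either_high = [g or a for g, a in zip(gep_high, ajcc_high)]

for label, calls in [("31-GEP", gep_high), ("AJCC", ajcc_high), ("either", either_high)]:
    sens, spec = sens_spec(calls, outcome)
    print(f"{label}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With an either-positive rule, every patient flagged by one tool is also flagged by the combination, so sensitivity cannot fall and specificity cannot rise relative to either tool alone.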
The investigators also considered circumstances in which the AJCC calculator and the genetic test did not agree, such that tumors would be classified as Class 1/AJCC high risk or Class 2/AJCC low risk. Of the 43 cases with contrary classifications, the proportion with a study outcome (recurrence, distant metastasis, or death) was 38% in the Class 1/AJCC high-risk group and 46% in the Class 2/AJCC low-risk group. While these findings hint at the increased sensitivity of the 31-GEP test as described above, the small sample size and the use of 124 previously analyzed specimens preclude definitive conclusions from being drawn.
The next assessment of the 31-GEP test was the first prospective evaluation of its prognostic capability. Hsueh et al. evaluated 322 patients from 11 different medical centers in a 1.5-year interim analysis of a 5-year study. In comparison to a positive SLNB and the presence of ulceration, the Class 2 designation was more sensitive for predicting recurrence (40%, 60%, and 80%, respectively, and p < 0.0001 for all), distant metastasis (50%, 75%, and 83%, and p < 0.001, < 0.0001, and < 0.0001, respectively), and death (9%, 45%, and 73%, and p = 1, 0.04, and 0.0001, respectively). Furthermore, a multivariate analysis indicated that Class 2 designations were associated with a significant HR of 7.15 for recurrence risk. The HR for Class 2 in regard to distant metastasis or overall survival was not significant; however, these two outcomes were limited in number, which may have hindered the assessment of significance. Importantly, ulceration was not associated with a significant HR.
Zager et al. reaffirmed previous findings by increasing the size of the retrospective cohort analyzed. Five hundred and twenty-three previously unreported cutaneous melanomas from 16 facilities were evaluated using the 31-GEP test to determine the risk of recurrence and distant metastasis. As determined previously, Class 1 vs. Class 2 had significantly different levels of risk, with Class 2 associated with a worse prognosis regardless of tumor stage. Notably, this study subdivided Class 1 and Class 2 designations into A and B subclasses, such that Class 1A has the best prognosis, Class 2B has the worst prognosis, and Classes 1B/2A have intermediate prognoses. Similar to Gerami et al., this study demonstrated high accuracy metrics for both classifications and found that combining 31-GEP testing with SLNB increased prognostic accuracy more than either alone.
A recent study directly evaluated the utility of the 31-GEP test to identify high-risk lesions amongst tumors traditionally categorized as low risk. Melanomas usually considered to be low risk included: thin tumors ≤ 1 mm (T1), stage I–IIA disease, and those with negative SLNB. Gastman et al. assessed 690 cutaneous melanomas from a pooled cohort that excluded samples previously used for test development. Comparison of tumors with a negative SLNB paired with a Class 1A vs. Class 2B designation found that melanomas with the higher risk molecular classification were associated with a significantly worse prognosis, despite the lesions’ traditionally low-risk profile. These results were echoed in the evaluations of molecular categorizations for other cutaneous melanomas that met the standard criteria of low risk. Furthermore, for lesions classified as thin or stage I–IIA, a multivariate analysis accounting for thickness, ulceration, and mitotic rate found that the 31-GEP Class 2B was a significant predictor of recurrence-free survival.
Finally, Greenhaw et al. retrospectively assessed a registry of 256 patients with cutaneous melanoma who, either at the time of diagnosis or first follow-up visit, received the 31-GEP test as part of their clinical care. This study demonstrated a 99% negative predictive value for a Class 1 designation. The sensitivity of the molecular test was also substantial, with 77% of melanomas that recurred accurately called Class 2.
3.3 Literature Review of Analytic Validity and Clinical Utility
The primary focus of this review is to objectively assess the clinical validity of the 31-GEP test. However, adoption of a molecular test into clinical practice requires additional considerations, such as the analytic validity and clinical utility of the test. The available literature addressing the analytic validity and clinical utility of the 31-GEP test is briefly summarized here.
Cook et al. evaluated the analytical validity, or test reliability, of the 31-GEP test through multiple inter-assay, inter-instrument, and inter-operator studies. One hundred and sixty-eight melanoma samples tested on 2 consecutive days yielded an inter-assay concordance score of 99%. Inter-instrument validity was assessed by comparing probability scores generated by two models of the same machine and two entirely different machines. The total sample size of 43 was associated with a 95% concordance rate between instruments. Finally, inter-operator concordance was evaluated using 298 melanoma samples and generated a concordance value of 100%.
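A concordance figure of this kind can be read as the share of paired samples that receive the same class call on repeat testing. The sketch below assumes that definition, which the text does not spell out, and uses hypothetical paired calls rather than study data.

```python
def concordance(calls_a, calls_b):
    """Fraction of paired samples that received the same class call in both runs."""
    if not calls_a or len(calls_a) != len(calls_b):
        raise ValueError("paired runs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)

# Hypothetical paired class calls from two runs of the same five samples
run_1 = ["Class 1", "Class 1", "Class 2", "Class 2", "Class 1"]
run_2 = ["Class 1", "Class 1", "Class 2", "Class 1", "Class 1"]
print(f"concordance: {concordance(run_1, run_2):.0%}")
```

The same function applies to inter-assay, inter-instrument, or inter-operator comparisons, since each reduces to two aligned lists of calls for the same samples.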
Clinical utility studies, necessary to demonstrate if and how testing impacts patient management decisions by physicians, have also been performed for the 31-GEP assay [16, 17, 18, 19, 20, 21]. Changes to patient management for physician visits, imaging, laboratory workup, referrals, and SLNB guidance with 31-GEP testing have been assessed by the following study designs: prospective testing of 31-GEP impact on physician recommendations (247 patients, stage I–II at consent) and retrospective chart reviews with prospective testing of cases (156 and 91 patients, stage I–III) [17, 18]. These clinical impact studies reported patient management changes in approximately half of the patients tested with the 31-GEP test. Of these patients, follow-up, surveillance, and interdisciplinary care were generally reduced in intensity or frequency after a Class 1 result and increased with a Class 2 result. The majority of patients who had their management influenced by the 31-GEP test result were stage I–II. While patient outcomes were not assessed as part of these utility studies, the importance of appropriate clinical follow-up and surveillance for the detection of distant disease, and its impact on survival, have been detailed elsewhere. Intended-use decision studies using hypothetical clinical vignettes and survey responses have demonstrated that physicians and nurse practitioners are willing to use the 31-GEP test and re-evaluate management accordingly, particularly in patients with melanomas at least 0.5 mm thick [19, 20, 21].
The financial impact of the molecular test is also a significant factor to consider prior to implementation into clinical practice. Unfortunately, the current data evaluating the economic ramifications of the 31-GEP test are limited; however, incorporation of the molecular test within projected cost-of-care models suggests that the assay may result in a 31% net reduction in expenditure for those with T1/T2 disease by impacting surveillance and SLNB management. Additional studies are needed to fully assess the economic impact of 31-GEP testing.
4 Level of Evidence Evaluation for the 31-Gene Expression Profile Test
Table 4 Evaluation of the 31-gene expression profile (31-GEP) test according to the American Joint Committee on Cancer (AJCC), National Comprehensive Cancer Network (NCCN), and American Academy of Dermatology (AAD) level of evidence guidelines
AJCC (level I/II): 31-GEP test is supported by multiple retrospective studies and one interim prospective study. Both types are large and well designed
NCCN (level I–IIIB), numerical designation: 31-GEP test is supported by one well-powered prospective study and multiple retrospective studies that employ multivariate analyses; however, the prospective study is still in progress, which makes it difficult to draw definitive conclusions
NCCN (level I–IIIB), alphabetical designation: evaluation of the tumor marker is the primary objective for all the 31-GEP studies and, given the consistency between studies, the results are unlikely to be attributable to chance
AAD (level IIA): The 31-GEP test is supported by multiple retrospective cohort studies and one prospective study with incomplete follow-up. The results are consistent between studies
4.1 American Joint Committee on Cancer Evaluation
The AJCC’s four-level hierarchy is the simplest to understand and apply. The seven articles meet level II criteria, as even the smallest study had a sample size of 205 subjects from several medical centers. Each study relied on quantitative outcomes assessed with appropriate statistical calculations: multivariate analysis and Kaplan–Meier curves. Despite the strict methodology, the results were overall significantly in favor of the molecular test [7, 9, 10, 11, 12, 13, 14].
The consistency maintained within multiple studies nearly qualifies the 31-gene test as a level I prognosticator; however, because its singular prospective study has limited follow-up, it would be premature to designate it with the highest degree of evidence. As such, the authors deem the current LOE for the 31-GEP test as a I/II according to AJCC criteria. Of note, the most recently published AJCC guidelines did not rank the molecular marker. This may be attributable to a shortage of studies assessing the 31-GEP test prior to publication of the cancer manual.
4.2 National Comprehensive Cancer Network Evaluation
Although the NCCN did not officially rank the 31-GEP test, understanding the molecular marker in the context of NCCN standards is useful in assessing its value. These guidelines are challenging to navigate, despite their stated goal of facilitating easy evaluation of prognostic biomarkers. Given the high number of retrospective studies employing multivariate analyses, the 31-GEP test meets criteria for an evidence level of III [7, 8, 10, 12, 13, 14]. It does not, however, meet criteria for level II, as it lacks studies prospectively evaluating the marker as a secondary outcome. This presents a conundrum when considering Hsueh et al.’s prospective trial that primarily evaluated the 31-GEP test. As it is only a 1.5-year interim report of a 5-year study, it does not completely meet criteria for a level I marker, but the strength of the design and associated findings support a designation above level III. Therefore, the authors assign a LOE to the 31-GEP test that lies between I and III.
The second half of the NCCN LOE ranking (the alphabetical designation) is similarly equivocal in its application to this molecular marker. Hsueh et al. falls in the A category of a “prospective trial designed to address the tumor marker”; the remaining articles are not as easily placed. The majority of the studies were centered on analyses of archived samples, which indicates the retrospective nature of the studies. This alone would exclude all but one study from the A, B, and C categorization that heavily relies on prospective classification [7, 8, 10, 12, 13, 14]. Simon et al., however, note that evaluating prognostic markers in the traditional prospective framework is costly, time consuming, and contingent upon very large sample sizes. Accounting for these factors, a prospective evaluation of a tumor marker may be so hindered that results may not emerge until after its clinical usefulness has expired. As such, it is difficult to penalize investigators for bypassing the traditional clinical trial model, which is best suited to evaluate therapeutic predictive biomarkers, given the poor fit of prospective studies for tumor marker evaluation.
The remaining two subcategories in each class are more applicable to the 31-GEP studies. Each of the six retrospective studies formulated an analytic plan prior to testing the collected specimens [7, 8, 10, 12, 13, 14]. This single factor eliminates category D as a possible designation (Table 2). The strongly significant findings in all seven studies also discredit the possibility that these results are “very likely to be a play of chance,” a key finding in the C designation. The conflicting evidence categorizations led the authors to assign a B ranking to the 31-GEP test, as this category is the only one without a direct contradiction that also allows for flexibility in trial design. This difficulty highlights the need for more inclusive LOE guidelines that can be applied more broadly to the emerging field of molecular medicine.
4.3 American Academy of Dermatology Evaluation
The Strength of Recommendation Taxonomy criteria used by the AAD also utilize a dual number-alphabet ranking system (Table 3). Hsueh et al. is indeed a prospective study; however, despite promising results, the trial is not yet complete. As such, the authors feel it cannot meet the level I criterion that demands a “prospective cohort study with good follow-up.” This trial, combined with the remaining six retrospective analyses, meets the level II requirements, allowing the authors to designate the 31-GEP test as such. Additionally, given the consistency of the study findings, the authors believe an A recommendation is also warranted.
This IIA classification stands in contrast to the II/IIIC ranking granted by the AAD in regard to the use of prognostic molecular testing. Importantly, the AAD designation was centered upon only one validation study and one trial comparing the 31-GEP test to SLNB [7, 8]. Five more recent studies were not included. Evaluation of the molecular test on these two trials alone would also lead the authors to assign it a II/IIIC designation.
Based on the above studies assessing clinical validity, analytic validity, and clinical utility, the authors find the 31-GEP test to be particularly useful for patients with invasive melanoma or older patients with T1/T2 melanomas. For patients with invasive melanoma, the results of the molecular test may help guide the frequency of skin examinations and utilization of SLNB or imaging following diagnosis. Patients aged older than 65 years diagnosed with T1/T2 melanomas may also benefit from molecular testing, particularly in the assessment of the risks and benefits of an SLNB.
To provide the highest level of care to their patients, dermatologists, like all medical practitioners, must keep abreast of the latest research and best practices. Relying on verified guidelines is critical, but it is equally important to supplement them with recommendations based on the latest quality research, especially when the lag time between updates is prolonged. The authors hope that this simplified literature review and its relationship to the major organizations’ standards of evidence will help clinicians make informed decisions regarding their own practice to better serve their patients.
Compliance with Ethical Standards
No external funding was used in the preparation of this article.
Conflict of interest
Danielle P. Dubin has no conflicts of interest that are directly relevant to the content of this article. Scott M. Dinehart and Aaron S. Farberg currently serve on the advisory board of Castle Biosciences.
- 1. American Cancer Society. Cancer facts & figures 2018. Atlanta: American Cancer Society; 2018. https://www.cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/annual-cancer-facts-and-figures/2018/cancerfacts-and-figures-2018.pdf.
- 2. American Cancer Society. Cancer facts & figures 2008. Atlanta: American Cancer Society; 2008. https://www.cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/annual-cancer-facts-and-figures/2008/cancer-factsand-figures-2008.pdf.
- 3. Amin MB, American Joint Committee on Cancer, American Cancer Society. AJCC cancer staging manual, 8th edn. Amin MB, editor-in-chief. Chicago: Springer; 2017.
- 4. Febbo PG, Ladanyi M, Aldape KD, et al. NCCN Task Force report: evaluating the clinical utility of tumor markers in oncology. J Natl Compr Cancer Netw. 2011;9(Suppl. 5):S1–32 (quiz S33).
- 10. Ferris LK, Farberg AS, Middlebrook B, et al. Identification of high-risk cutaneous melanoma tumors is improved when combining the online American Joint Committee on Cancer Individualized Melanoma Patient Outcome Prediction Tool with a 31-gene expression profile-based classification. J Am Acad Dermatol. 2017;76(5):818–25.
- 13. Gastman BR, Gerami P, Kurley SJ, Cook RW, Leachman S, Vetto JT. Identification of patients at risk of metastasis using a prognostic 31-gene expression profile in subpopulations of melanoma patients with favorable outcomes by standard criteria. J Am Acad Dermatol. 2019;80(1):149–57.
Open Access This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.