GPT 3.5 READS NOVITAS L39365 on DECISION DX TESTS
GPT 3.5:
I have received all three parts of the text regarding DecisionDx products (melanoma and squamous).
After analyzing the author's position, reasoning, and conclusions, it is evident that the author raises several concerns and criticisms about the peer-reviewed literature supporting the DecisionDx-Melanoma and DecisionDx-SCC tests. The author highlights various weaknesses and limitations in the studies, including issues with data collection, methodological inconsistencies, inadequate patient follow-up, definitional inconsistencies, and the lack of clinical utility evidence.
In terms of DecisionDx-Melanoma, the author questions the reliability and accuracy of the test due to incomplete data, limited patient follow-up, inconsistencies in definitions and outcomes, and variations in positive and negative predictive values reported across studies. The author also points out that the test's performance was often compared to AJCC staging without considering other clinical factors, potentially leading to misleading conclusions. Furthermore, the author emphasizes the evolving landscape of melanoma management, where other factors such as genetic mutations and targeted therapies have become significant in determining patient outcomes.
Regarding DecisionDx-SCC, the author highlights the complexities of gene expression profiling (GEP) tests and the potential pitfalls associated with them. The author notes that key variables were not addressed during test development and that the published literature lacks the transparency needed for a clear assessment of the test's integrity. Additionally, the author criticizes the limited patient follow-up in validation studies, the lack of evidence demonstrating improved patient outcomes, and the reliance on observational studies and surveys, which have their own limitations.
Overall, the author concludes that the peer-reviewed literature supporting the DecisionDx-Melanoma and DecisionDx-SCC tests is insufficient to establish their analytic validity, clinical validity, and clinical utility in the Medicare population. As a result, the author deems these tests not to meet the medically reasonable and necessary criteria for Medicare patients.
The author's position indicates a cautious and critical stance towards the DecisionDx products, emphasizing the importance of robust evidence, comprehensive data collection, and clinical utility in assessing the value of these tests. The concerns raised suggest a need for further research and rigorous evaluation to establish the reliability, accuracy, and impact of these tests on patient outcomes.
ORIGINAL NOVITAS TEXT L39365
DecisionDx – Castle Biosciences
DecisionDx-Melanoma
PubMed and Google Scholar were searched for peer-reviewed, evidence-based literature which provided information regarding analytic and clinical validity and clinical utility for the DecisionDx-Melanoma test. Key words used to search in combination included: DecisionDx, DecisionDx-Melanoma, Castle Biosciences, molecular testing, melanoma, skin cancer, sentinel lymph node biopsy, SLNB, GEP test, gene expression profile, stage I melanoma, stage II melanoma, stage III melanoma, 31-gene profile test, 28-gene profile test, cutaneous melanoma, and formalin-fixed paraffin embedded (FFPE) tissue.
The DecisionDx-Melanoma prognostic test (from Castle Biosciences) is described in numerous peer-reviewed publications. Forty publications addressing analytic validity, clinical validity, or clinical utility were identified. Five papers were immediately excluded from review because they were published in a non-peer-reviewed journal (not found on PubMed).95-99 Of the remaining 35 papers, three were editorial in nature or responses to editorial comments, and four were evidence review papers without meta-analysis.100-106 Two papers were primarily meta-analyses, and one of these was frequently used as a source of raw data for other publications.107,108 The remaining 26 papers were identified as either validation papers or cohort studies (prospective, retrospective, or both). Of the four validation papers, one described the development of the original GEP test, and the other three described clinicopathologic syntheses with the GEP data that yielded new information beyond the results of the original GEP test.109-112 It should be noted that these latter three papers are relatively new and do not yet have any follow-up publications that further evaluate the clinical validity and/or utility of the combined clinicopathologic and GEP results.110-112 One paper primarily assesses the analytic validity of DecisionDx-Melanoma,113 while seven papers address aspects of clinical utility. The remaining 14 papers mostly assess clinical validity through retrospective and/or prospective patient cohorts, typically evaluating five-year outcomes in recurrence-free survival (RFS), disease-free survival (DFS), distant metastasis-free survival (DMFS), melanoma-specific survival (MSS), and/or overall survival (OS).114-127
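As a quick consistency check on the publication counts above, the sketch below tallies the categories as reported. The category labels are our own shorthand, not terms from the LCD. The tally confirms that the categories account for all 35 reviewed papers, 26 of which are validation or cohort studies.

```python
# Publication counts for DecisionDx-Melanoma as reported in the paragraph above.
IDENTIFIED = 40
EXCLUDED_NON_PEER_REVIEWED = 5  # refs 95-99, not found on PubMed

# Category labels are shorthand for the groups described in the text.
REVIEWED_BY_CATEGORY = {
    "editorial or response to editorial": 3,
    "evidence review without meta-analysis": 4,
    "meta-analysis": 2,
    "validation": 4,               # refs 109-112
    "analytic validity": 1,        # ref 113
    "clinical utility": 7,
    "clinical validity cohort": 14,  # refs 114-127
}

NON_PRIMARY = (
    "editorial or response to editorial",
    "evidence review without meta-analysis",
    "meta-analysis",
)

reviewed = IDENTIFIED - EXCLUDED_NON_PEER_REVIEWED
primary = reviewed - sum(REVIEWED_BY_CATEGORY[c] for c in NON_PRIMARY)

# Every reviewed paper should land in exactly one category.
assert sum(REVIEWED_BY_CATEGORY.values()) == reviewed
print(f"{reviewed} papers reviewed; {primary} validation or cohort studies")
```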
DecisionDx-SCC
PubMed and Google Scholar were searched for peer-reviewed, evidence-based literature which provided information regarding analytic and clinical validity and clinical utility for the DecisionDx-SCC test. Key words used to search in combination included: DecisionDx, DecisionDx-SCC, Castle Biosciences, molecular testing, melanoma, skin cancer, prognostic test, SCC, cutaneous squamous cell carcinoma, cSCC, GEP test, gene expression profile, metastasis, 40-gene profile test, and formalin-fixed paraffin embedded (FFPE) tissue.
Twelve total publications addressing the analytical validity, clinical validity, and/or clinical utility of the DecisionDx-SCC prognostic test (from Castle Biosciences) were identified. All 12 identified studies were funded by, or written by employees of, Castle Biosciences. The papers identified included one panel review and three surveys of medical professionals.128-131 Additionally, two papers were evidence reviews without meta-analyses.132,133 The remaining six papers included two that addressed analytical validity, three cohort studies (both prospective and retrospective) that addressed clinical validity, and one case series that addressed clinical utility.134-139
DX DECISION ANALYSIS AND CONCLUSIONS
DecisionDx – Castle Biosciences
DecisionDx-Melanoma
In order to systematically evaluate such a large body of publications, we will discuss both test design and study design of DecisionDx-Melanoma in the context of what information is offered to providers and patients. Looking at an example report from the Castle Biosciences website, we can see that their test provides prognostic information based on GEP data alone (class assignment) and GEP data in combination with clinicopathologic data (i31-ROR and i31-SLNB).186 We will spend most of the discussion below focused on the class assignment portion of the results since most of the peer-reviewed literature focuses on this result alone and the i31-ROR and i31-SLNB were developed much more recently.
Fundamentally, DecisionDx-Melanoma is a GEP test that analyzes 28 genes of interest (considered by the company to be significantly informative of melanoma prognosis) and anchors these 28 genes to three reference genes. As controls, the three reference genes should provide a consistent baseline across all types of melanoma and non-melanoma tissue. Unfortunately, the literature described in the summary of evidence did not provide insight into the consistency of expression of the three reference genes across tissue types and other pre-analytic variables (for instance, fixation time and the age of formalin-fixed paraffin-embedded [FFPE] tissue). Of note, one of the reference genes (FXR1) serves as a gene of interest (not a control gene) in Castle Biosciences’ DecisionDx-UM.187
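The text above does not describe the exact normalization DecisionDx-Melanoma applies, so the following is only a hedged illustration of why reference-gene stability matters. It assumes the generic qPCR ΔCt convention, in which each target gene's cycle threshold (Ct) is expressed relative to the mean Ct of the reference genes; the function name and all Ct values are hypothetical. If one reference gene drifts (for example, because its expression differs by tissue type or specimen age), every normalized target value shifts even though the target gene itself is unchanged.

```python
from statistics import mean


def delta_ct(target_ct: float, reference_cts: list[float]) -> float:
    """Hypothetical helper: normalize a target gene's Ct to the mean Ct of the
    reference genes (generic qPCR delta-Ct convention, assumed for illustration)."""
    if not reference_cts:
        raise ValueError("at least one reference gene Ct is required")
    return target_ct - mean(reference_cts)


# Hypothetical Ct values: the target gene is identical in both scenarios.
target_ct = 28.0
stable_refs = [22.0, 23.0, 24.0]   # reference genes behave as expected
drifted_refs = [22.0, 23.0, 25.5]  # one reference gene drifts by +1.5 cycles

baseline = delta_ct(target_ct, stable_refs)
shifted = delta_ct(target_ct, drifted_refs)

# A 1.5-cycle drift in one of three reference genes moves the mean by 0.5 cycles.
# Lower delta-Ct means higher relative expression, so the drift alone makes the
# unchanged target gene look more highly expressed.
print(f"stable refs:  delta-Ct = {baseline:.2f}")
print(f"drifted refs: delta-Ct = {shifted:.2f}")
```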
Gene expression profiles are founded on the principle of differential gene expression in a cell of interest, like a cancer cell, when compared to background cells, namely all other cells in the surrounding tissue. Tissues are composed of many different types of cells. Skin, for instance, is composed of a variety of cell types, including melanocytes, keratinocytes, immune cells such as lymphocytes, structural cells such as fibroblasts and fibrocytes, hair-generating cells, and specialized glandular cells such as apocrine cells. Many different factors influence gene expression in a cell, including changes in the cell’s surroundings (as seen in the response to sun damage) and the cell’s stage of development, such as whether the cell is part of a germinal layer (and mitotically active) or terminally differentiated. This means that even among cells of the same type (same lineage), the GEP can differ. Given all of these complexities, a GEP from cells of interest can be difficult to untangle from the GEP of background cells.
In terms of DecisionDx-Melanoma, the test development publication from Gerami and colleagues did not adequately address these complexities inherent to GEPs.109 Many key questions were left unaddressed, such as: how did the relative quantity of tumor versus non-tumor cells affect test sensitivity, did lighter or darker skin tones affect the test outcome, did the test perform differently between histologically distinct melanomas such as acral lentiginous melanoma versus desmoplastic melanoma, and did the presence of sun damage affect test results? As a more specific example, we would expect the presence of tumor-infiltrating lymphocytes (TILs) to affect the outcome of a GEP test. Tumor-infiltrating lymphocytes change the composition of the background tissue, increasing the density of background cells. Additionally, TILs are thought to interact with tumor cells, which suggests that tumor cells may respond with differential gene expression. These factors alone could influence the expression profile of the 28 genes in the DecisionDx-Melanoma test, but until this scenario is tested, its impact is unknown. In summary, understanding the potential pitfalls of GEP testing is critical for understanding the reliability, performance, and accuracy of a GEP test.
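To make the tumor-fraction question concrete, the sketch below assumes a simple linear mixing model, in which the expression measured from a bulk tissue section is the weighted average of tumor-cell and background-cell expression. This is a common simplifying assumption, not a description of how DecisionDx-Melanoma is computed; the function name, expression units, and call threshold are all hypothetical. Under this assumption, a gene whose tumor-cell expression never changes still reads lower as TILs or other background cells make up more of the sample, and a threshold-based call can flip.

```python
def bulk_expression(tumor_fraction: float, tumor_expr: float, background_expr: float) -> float:
    """Hypothetical linear mixing model: the bulk signal is the average of tumor
    and background expression weighted by each population's share of cells."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return tumor_fraction * tumor_expr + (1.0 - tumor_fraction) * background_expr


# Hypothetical values in arbitrary units: tumor cells express the gene at 10,
# background cells (e.g., lymphocytes, keratinocytes) at 2.
TUMOR_EXPR = 10.0
BACKGROUND_EXPR = 2.0
CALL_THRESHOLD = 5.0  # hypothetical cutoff for a "high" call

for fraction in (0.75, 0.5, 0.25):  # falling tumor fraction, e.g., dense TILs
    signal = bulk_expression(fraction, TUMOR_EXPR, BACKGROUND_EXPR)
    call = "high" if signal >= CALL_THRESHOLD else "low"
    print(f"tumor fraction {fraction:.2f}: bulk signal {signal:.1f} -> {call}")
```

In this toy model the tumor cells behave identically in all three samples; only the share of background cells changes, yet the lowest-fraction sample falls below the hypothetical cutoff.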
During the development of any test, consideration and documentation of pre-analytic variables are critical for establishing test accuracy and precision. For instance, RNA extracted from formalin-fixed, paraffin-embedded tissue is a challenging analyte. Among the major macromolecules (DNA, RNA, carbohydrates, lipids, and proteins), RNA is one of the most fragile and unstable, meaning that adverse conditions such as delayed tissue fixation can cause rapid RNA degradation.188 When creating a valid clinical test (such as a GEP that uses RNA extracted from FFPE specimens), the material used to develop the test must be comparable to the material tested in the clinic. One cannot legitimately develop a test for blood using only urine, a test for females using only male specimens, or a test for breast cancer using only prostate cancer specimens. Similarly, a test developed with archived, older specimens cannot be assumed to represent newer specimens less than one year old without, at minimum, comparisons demonstrating that the specimens behave similarly in the test. In the case of DecisionDx-Melanoma, test development and validation used archival specimens up to 14 years old.109 The effect of specimen age on RNA integrity was not thoroughly addressed (only a brief statement about quantity and quality assessed using the NanoDrop 1000 and Agilent Bioanalyzer 2100 was provided), and older material was not differentiated from newer material.
In a later study assessing the analytic validity of DecisionDx-Melanoma, Cook and colleagues presented data supporting the performance and reproducibility of their test.113 In the paper’s introduction, Cook and colleagues stated that they performed their study in accordance with published guidelines, specifically the 2010 AHRQ publication “Quality, Regulation and Clinical Utility of Laboratory-developed Molecular Tests” and the 2011 publication “NCCN molecular testing white paper: effectiveness, efficiency, and reimbursement.”189,190 Several points of concern were identified in the study, a few of which are described below.
First, the RNA stability studies extracted RNA only once from each specimen, relying on downstream analysis of the same pool of RNA (kept at -80°C) to confirm analytic validity.113 This means that the study did not assess the reproducibility and reliability of macrodissection and RNA extraction. Note that, per the AHRQ, “if the assay incorporates an extraction step, reproducibility of the extraction step should be incorporated into the validation studies, and likewise for any other steps of the procedure.”189
Second, while Cook and colleagues did try to evaluate the effect of FFPE block age on GEP testing, the study did not compare GEP results from the same tumor specimen at different time points.113 Instead, the study evaluated whether a GEP result could be obtained from an older FFPE block at all. Interestingly, Figure 3 in the paper diagrams test failure rates in yearly increments (for specimens up to four years old) and then lumps together all data from specimens older than four years. Although 6,772 FFPE specimens were represented in Figure 3, the number of specimens in each age category was not given. The origin and handling of these 6,772 specimens are also unclear. Nowhere else in the paper are such large numbers of specimens (thousands rather than hundreds) evaluated, despite the apparent availability of 6,772 FFPE specimens. Figure 3 not only demonstrates an expected decline in the testability of older specimens but also highlights the quandary of using older, less reliable specimens to develop a test intended for clinical specimens that will invariably be under one year old. Moreover, RNA integrity measurements (as were done in the 2015 Gerami publication) were not provided, even though they would be valuable for comparing specimens of different ages.109 Altogether, the evaluation represented by Figure 3 does not answer whether an older specimen would yield the same test result as a younger version of itself.
Cook and colleagues (along with other studies from Castle Biosciences) also failed to sufficiently address many other pre-analytic variables including, but not limited to:113
• Protocols for central pathologic review of cohort specimens:
o How many pathologists participated?
o What specific features were evaluated in each slide?
o How were discrepancies between outside report and internal review handled?
• Protocols for diagnosis of sentinel lymph nodes:
o How many sections/levels were evaluated per lymph node?
o Was immunohistochemistry used for every specimen to identify occult or subtle tumor deposits?
• How much time passed between biopsy or wide excision and placing the specimen in fixative?
• Was the fixation time (time in fixation before processing to FFPE) consistent for each specimen?
• Was the same fixative (e.g., formalin) used for each specimen?
• What was the time between tissue sectioning for slide creation and RNA extraction? Was this time consistent between different specimens?
• When cDNA was “preamplified” prior to testing, was the process confirmed to consistently amplify all relevant genes to the same, consistent degree or were some amplifications more efficient than others?
These questions address known pitfalls in both the comparability of specimens and the integrity of extracted RNA. Moreover, even if some of the above questions were addressed during test development, the lack of transparency in the published literature prevents clear assessment of the integrity of the test development.
In terms of study design, a prognostic test should ideally evaluate itself in the context of the current standard of care. We would anticipate that a prognostic test for malignancy would both compare its accuracy with the best prognostic standards available and would also compare itself against real world outcomes. Once accuracy is sufficiently established, proving clinical utility becomes crucial. One of the key factors in determining clinical utility is a test’s impact on patient outcome. A test without an improvement in patient outcome is not clinically utile for the purposes of Medicare coverage.
The initial assessment of newly diagnosed melanoma is complicated. For the primary melanoma alone, clinical and pathologic evaluations are critical for developing a proper management plan for the patient. This plan must consider many factors in both the primary melanoma and the surrounding clinical context, including exposure and family histories. The American Joint Committee on Cancer (AJCC) acts as an authority on the grading and staging of primary melanomas based on many clinical, radiologic, and pathologic factors.191,192 Additionally, the AJCC provides extensive prognostic data tied to the factors used in grading and staging melanomas. Many of these factors are assessed during pathologic evaluation and include histologic features such as tumor mitotic rate, surface ulceration, and Breslow depth. At the same time, AJCC staging is only one of many data points that the clinical team considers when developing plans for patient management. For instance, melanoma subtype, which is not explicitly factored into AJCC scoring, can play a significant role in determining patient management. In general, the term “melanoma” represents a category of malignancies that actually comprises a spectrum of subtypes, each with its own etiology, behavior, and properties. For instance, acral lentiginous melanoma is known to be more aggressive than other subtypes of melanoma and to have a poorer prognosis.193 Without consideration of this subtype, a patient could be misclassified as having a less dangerous form of melanoma based on AJCC clinical and pathologic staging alone. For this reason, while AJCC staging is invaluable to patient management in melanoma, it is not the only clinical consideration in patient care.
The development and assessment of DecisionDx-Melanoma relied heavily on comparisons to AJCC clinical and pathologic staging and the factors used in these AJCC scores. Authors of DecisionDx-Melanoma studies often focused primarily on a single factor, such as sentinel lymph node positivity, and compared the prognostic value of that factor with that of DecisionDx-Melanoma. This strategy often set up false dichotomies, since in clinical practice a single prognostic factor such as sentinel lymph node biopsy result is not considered in isolation from other clinical data. Even in more complicated, multifactorial comparisons, studies involving DecisionDx-Melanoma failed to account for the whole clinical and pathologic picture, sometimes evaluating only a limited number of the factors used in AJCC scoring when attempting to establish the prognostic validity of the test. This can be seen in the variability of the demographic and clinical information provided from study to study. In general, most studies at least provided information on patient age, Breslow thickness, presence/absence of tumor ulceration, and AJCC clinical and pathologic stages. Conversely, most studies did not provide information on melanoma subtype, location of the primary tumor, presence/absence of a transected tumor base, or presence/absence of lymphovascular invasion. Moreover, none of the studies identified in the Summary of Evidence provided sufficient information to determine the interrelationships between demographic and clinicopathologic data points. For instance, despite knowing the number of patients with a specific subtype of melanoma, one could not further explore other characteristics within that subtype group, such as its average Breslow thickness or the AJCC clinical stages it represents.116
Of all the clinicopathologic factors used to describe melanoma, Breslow thickness is central, critical to both AJCC clinical and pathologic staging. Measuring Breslow thickness requires histologic identification of both the surface of the melanoma and the deepest point of tumor growth. Transection of the tumor base during biopsy or wide excision therefore compromises accurate measurement of Breslow thickness. Moreover, since AJCC pathologic T staging is primarily based on Breslow thickness (with subgrouping currently based on the presence or absence of ulceration), undermeasurement of Breslow thickness can dramatically affect both clinical and pathologic stage assignment. For instance, according to the AJCC 8th edition, the cutoff between pathologic stage T2 and T3 tumors is a Breslow thickness of 2 mm. Looking only at the pathologic T stage, without considering nodal or metastasis status, a T2 melanoma could be AJCC clinical stage I or II depending on the absence or presence of ulceration.192 However, a T3 melanoma will always be at least clinical stage II. Undermeasurement of a T3 melanoma without ulceration would therefore drop the melanoma at least one clinical stage, from II to I. While this may seem a minor technicality, several DecisionDx-Melanoma studies draw conclusions by comparing clinical stage I and stage II melanomas (such as Zager, 2018).117 Interestingly, most DecisionDx-Melanoma studies do not report how many specimens were transected at the tumor base, although this metric does appear in three more recent studies.110,126,194 The transection rates in these studies are striking: 39.5%, 34.9%, and 53.29% of all specimens, respectively.110,126,194 It is further notable that transected specimens were still used in the papers’ analyses and conclusions.
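The staging arithmetic above can be sketched as a small lookup. The thresholds and stage groups below reflect our reading of the AJCC 8th edition for clinically node-negative, non-metastatic (N0 M0) cutaneous melanoma; the function and table names are hypothetical, and the sketch is an illustration rather than a staging tool. It shows how a non-ulcerated T3 melanoma that is undermeasured below 2 mm because of transection moves from clinical stage IIA to IB.

```python
def t_category(breslow_mm: float, ulcerated: bool) -> str:
    """Map Breslow thickness (mm, recorded to 0.1 mm) and ulceration to an
    AJCC 8th edition T category, T1a through T4b (our reading of the manual)."""
    if breslow_mm <= 0:
        raise ValueError("Breslow thickness must be positive")
    if breslow_mm < 0.8 and not ulcerated:
        return "T1a"
    if breslow_mm <= 1.0:
        return "T1b"  # <0.8 mm with ulceration, or 0.8-1.0 mm either way
    suffix = "b" if ulcerated else "a"
    if breslow_mm <= 2.0:
        return "T2" + suffix
    if breslow_mm <= 4.0:
        return "T3" + suffix
    return "T4" + suffix


# Clinical stage groups for N0 M0 disease (hypothetical table name).
CLINICAL_STAGE_N0M0 = {
    "T1a": "IA", "T1b": "IB",
    "T2a": "IB", "T2b": "IIA",
    "T3a": "IIA", "T3b": "IIB",
    "T4a": "IIB", "T4b": "IIC",
}


def clinical_stage(breslow_mm: float, ulcerated: bool) -> str:
    """Clinical stage group for an N0 M0 melanoma of the given T category."""
    return CLINICAL_STAGE_N0M0[t_category(breslow_mm, ulcerated)]


# A non-ulcerated 2.5 mm melanoma versus the same tumor undermeasured at 1.8 mm
# because the biopsy transected the tumor base.
true_stage = clinical_stage(2.5, ulcerated=False)
measured_stage = clinical_stage(1.8, ulcerated=False)
print(f"true thickness 2.5 mm -> stage {true_stage}")
print(f"transected, measured 1.8 mm -> stage {measured_stage}")
```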
Limited patient follow-up proved to be another critical weakness in many of the DecisionDx-Melanoma studies. DecisionDx-Melanoma advertises its results as five-year prognosticators for risk of recurrence, metastasis, and/or death.186 At a minimum, data supporting this assertion must include five years of follow-up for every patient, even if the patient experiences a recurrence event. A patient who experiences a local recurrence may still develop distant metastases and/or die of melanoma within the five-year time frame, and both events would be relevant to the DecisionDx-Melanoma prognostics. Of all the studies reviewed in the summary of evidence, only one monitored all of its patients for a minimum of five years.117 Even the 2015 publication from Gerami and colleagues that described the development and validation of DecisionDx-Melanoma used specimens from patients with well under five years of follow-up.109 Their training cohort included patients with as little as 0.06 years of follow-up (with a reported median of 6.8 years for all training specimens), and their validation cohort included patients with as little as 0.5 years of follow-up (with a reported median of 7.3 years for all validation specimens).109 Overall, median follow-up for patients without disease recurrence ranged from 1.5 to 7.5 years across studies.115,117
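The follow-up requirement can be checked mechanically. The sketch below, using an invented cohort and a hypothetical `Patient` record, flags patients whose five-year status for a given outcome (for example, distant metastasis) is not actually observed: those with no event for that outcome who were followed for less than five years. Because a local recurrence alone does not settle distant-metastasis or melanoma-specific survival status, `had_event` must refer to the specific outcome being estimated. The example also shows that a median follow-up above five years does not guarantee that every patient reached five years.

```python
from dataclasses import dataclass
from statistics import median

HORIZON_YEARS = 5.0


@dataclass
class Patient:
    """Hypothetical patient record for one outcome (e.g., distant metastasis)."""
    follow_up_years: float  # diagnosis to the outcome event or last contact
    had_event: bool         # True if that specific outcome was observed


def five_year_status_known(p: Patient) -> bool:
    """Status at five years is observed if the event occurred during follow-up
    or the patient was followed event-free for at least five years."""
    return p.had_event or p.follow_up_years >= HORIZON_YEARS


# Invented cohort for illustration only.
cohort = [
    Patient(0.1, False),
    Patient(0.5, False),
    Patient(2.0, True),
    Patient(6.0, False),
    Patient(6.8, False),
    Patient(7.3, False),
    Patient(8.0, False),
]

unknown = [p for p in cohort if not five_year_status_known(p)]
median_follow_up = median(p.follow_up_years for p in cohort)
print(f"median follow-up: {median_follow_up:.1f} years")
print(f"patients without an observed five-year outcome: {len(unknown)} of {len(cohort)}")
```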
Publications involving DecisionDx-Melanoma also lacked consistent definitions from study to study. This definitional inconsistency was well captured by Marchetti and colleagues of the Melanoma Prevention Working Group, which convened in 2020 to discuss prognostic GEP tests for melanoma.108 For instance, the definition of “melanoma recurrence,” as used in the outcome metrics of Disease-Free Survival (DFS) or Recurrence-Free Survival (RFS), differed from study to study. In Hsueh (2017), RFS was defined by regional and distant metastases, while in Zager (2018) RFS included local metastases in addition to regional and distant metastases and excluded sentinel lymph node positivity.115,117 Podlipnik (2019) used the term DFS, defining it by “relapse” without further detail, and Keller (2019) used the term RFS without providing any clear definition.121,122 A majority of studies indicated that the outcome risk estimates represented the first five years following a primary diagnosis of melanoma, with only a few studies limiting the risk estimate to the first three years. Note again that only one study, from Zager and colleagues in 2018, followed all of its patients for a minimum of five years.117
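The effect of inconsistent outcome definitions can be illustrated with the two RFS definitions summarized above. In the hypothetical sketch below, each definition is represented as the set of event types it counts as recurrence (the labels and function name are our own). The same patient history, a local recurrence only, counts as an RFS event under the Zager (2018) definition but not under the Hsueh (2017) definition. The text does not state how Hsueh (2017) treated sentinel lymph node positivity, so that event type is left out of the example.

```python
# Event types counted as "recurrence" under each study's RFS definition, as
# summarized in the LCD text (labels are hypothetical shorthand).
RFS_DEFINITIONS: dict[str, frozenset[str]] = {
    "Hsueh 2017": frozenset({"regional", "distant"}),
    "Zager 2018": frozenset({"local", "regional", "distant"}),  # SLN positivity excluded
}


def is_rfs_event(patient_events: set[str], definition: frozenset[str]) -> bool:
    """A patient has an RFS event if any observed event type is counted by the definition."""
    return not patient_events.isdisjoint(definition)


local_only = {"local"}  # hypothetical patient with a local recurrence only
for study, definition in RFS_DEFINITIONS.items():
    status = "event" if is_rfs_event(local_only, definition) else "no event"
    print(f"{study}: {status}")
```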
As evidenced in the previous paragraphs, DecisionDx-Melanoma studies show weaknesses in both the quality and thoroughness of data collection, as well as methodologic and definitional inconsistencies. The potential consequences of these weaknesses are visible in the studies’ results and conclusions. The PPV and NPV values, for instance, are particularly striking. Not all papers used these metrics when evaluating their results, but when PPV and NPV were provided, their values changed dramatically from study to study. This finding is particularly relevant when examining the latest version of the DecisionDx-Melanoma report.186 The report supplies a three- to four-tier prognostic classification of melanomas (Classes 1A, 1B/2A, 2B). In one of the report’s interpretation tables, the classes (1A, 1B/2A, and 2B) are paired with the AJCC clinical stages (I, II, or III) to provide five-year risk estimates for three potential outcomes: Melanoma-Specific Survival (MSS), Distant Metastasis-Free Survival (DMFS), and RFS. According to the report, this interpretation table can be traced to the 2020 publication by Greenhaw and colleagues.107 In Greenhaw’s meta-analysis, PPV and NPV are provided only for RFS (PPV 46%; NPV 92%) and DMFS (PPV 35%; NPV 93%). Recall that PPV is the number of true positive results divided by all positive results (true and false), and NPV is the number of true negative results divided by all negative results (true and false). This means that among 100 patients with a positive result, 35 will experience a distant metastasis within five years of their original melanoma diagnosis and 65 will not.
Negative predictive value provides the opposite reassurance: a negative result means that 93 of 100 patients will NOT experience a distant metastasis within five years, while 7 of 100 patients will still experience one. The concepts of PPV and NPV are described here in such basic detail to highlight the risks of relying on the Class designation provided by DecisionDx-Melanoma to prognosticate patient outcome. The concern for test accuracy is further compounded by the fact that PPV and NPV vary from study to study. The PPVs for DMFS in the studies described in the Summary of Evidence ranged from 14.6% to 62%.117,124 For reference, the only study with a minimum of five years of follow-up for all patients recorded a PPV of 40% for DMFS.117
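The PPV and NPV arithmetic described above can be written out directly. The counts below are simply the per-100-patient restatement of the DMFS values reported by Greenhaw and colleagues (PPV 35%, NPV 93%), and the loop restates the range of DMFS PPVs across studies (14.6% to 62%) together with the 40% reported in the only study with five years of follow-up for all patients. The function names are our own.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: true positives / all positive results."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: true negatives / all negative results."""
    return true_neg / (true_neg + false_neg)


# DMFS, per 100 positive and 100 negative results (Greenhaw meta-analysis).
dmfs_ppv = ppv(true_pos=35, false_pos=65)  # 35 of 100 positives develop distant metastasis
dmfs_npv = npv(true_neg=93, false_neg=7)   # 7 of 100 negatives still do
print(f"DMFS PPV = {dmfs_ppv:.0%}, NPV = {dmfs_npv:.0%}")

# Expected number of the 100 positive-result patients who develop distant
# metastasis at each DMFS PPV reported across studies.
for reported_ppv in (0.146, 0.35, 0.40, 0.62):
    print(f"PPV {reported_ppv:.1%}: about {reported_ppv * 100:.0f} of 100 positives")
```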
Several studies addressed the clinical utility of DecisionDx-Melanoma. All of them focused on how DecisionDx-Melanoma would affect patient management, typically by measuring to what degree and in what way the test result changed management. Several studies used hypothetical scenarios and polled providers (ranging from residents in training to practicing attendings) on how they would respond to these scenarios with and without DecisionDx-Melanoma results.195-198 These studies did not assess real-world interactions of the test with patient management. Two studies prospectively measured changes in physician behavior and patient management when providers received DecisionDx-Melanoma results for their patients.184,199 However, as defined for the purposes of this LCD, a clinically utile test must positively affect patient outcome. While these six studies together demonstrated changes in physician behavior and/or patient management when DecisionDx-Melanoma was used, none demonstrated that these changes improved patient outcome, i.e., increased patient survival. Clinical utility could be demonstrated in a clinical trial comparing overall survival between patients managed with the test and patients managed without test results. To date, such a trial has not been performed for DecisionDx-Melanoma.
Finally, as discussed in a 2020 editorial review by W.H. Chan, MS, and H. Tsao, MD, PhD, management of cutaneous melanoma has changed dramatically within the past few years.102 Prognosis plays a smaller role in determining patient management when other factors are used to determine predictive (therapy-related) outcomes. For instance, sentinel lymph node biopsy status is used to determine whether a patient should receive adjuvant systemic therapy. Targeted sequence analyses for specific gene mutations (such as BRAF V600E) can now inform clinicians about which targeted therapy would most benefit their patients. This changing landscape appears to be recognized by Castle Biosciences, which most recently added clinicopathologic algorithmic prognostic results to its test.186 Unfortunately, there is currently insufficient peer-reviewed literature to establish the clinical validity and utility of these new features: as of the writing of this LCD there are two papers, both validation papers, one for i31-SLNB and the other for i31-ROR.110,111 Without more published literature, including clinical trials, i31-SLNB and i31-ROR cannot be considered clinically valid or utile for Medicare patients.
It is beyond the scope of this LCD to provide a comprehensive analysis of every individual paper reviewed. While the major concerns regarding the peer-reviewed literature for DecisionDx-Melanoma are characterized above, many other concerns also warrant attention even though they are not detailed in this Analysis of Evidence. Examples include:
• Inadequate information regarding patients with hereditary melanoma disorders
• Inadequate study of the effects of therapies on measured outcomes
• Inadequate information comparing melanomas with different mutational profiles (e.g., tumors with BRAF V600E)
In summary, the body of peer-reviewed literature concerning DecisionDx-Melanoma is insufficient to establish the analytic validity, clinical validity, and clinical utility of this test in the Medicare population. As such, this test does not currently meet medically reasonable and necessary criteria for Medicare patients and will not be currently covered.
DecisionDx-SCC
In 2020, Wysong and colleagues described a 40-gene GEP test (which would later become DecisionDx-SCC) for risk classification of cutaneous squamous cell carcinoma (cSCC).134 Their biomarker study aimed to validate a GEP test that could assess the risk of metastasis in cSCC. Fundamentally, DecisionDx-SCC is a GEP test that analyzes 34 genes of interest (considered by Castle Biosciences to be significantly informative of cSCC prognosis) and six control genes.
Gene expression profiles are founded on the principle of differential gene expression in a cell of interest, like a cancer cell, when compared to background cells, namely all other cells in the surrounding tissue. Tissues are composed of many different types of cells. Skin, for instance, is composed of a variety of cell types, including melanocytes, keratinocytes, immune cells such as lymphocytes, structural cells such as fibroblasts and fibrocytes, hair-generating cells, and specialized glandular cells such as apocrine cells. Many different factors influence gene expression in a cell, including changes in the cell’s surroundings (as seen in the response to sun damage) and the cell’s stage of development, such as whether the cell is part of a germinal layer (and mitotically active) or terminally differentiated. This means that even among cells of the same type (same lineage), the gene expression profile can differ. Given all of these complexities, a GEP from cells of interest can be difficult to untangle from the GEP of background cells.
In terms of DecisionDx-SCC, the test development publication from Wysong and colleagues did not adequately address these complexities inherent to GEPs.134 Many key questions were left unaddressed, such as: How did the relative quantity of tumor versus non-tumor cells affect test sensitivity? Did lighter or darker skin tones affect the test outcome? Did the test perform differently across histologically distinct cSCCs? Did the presence of sun damage affect test results? As a more specific example, we would expect the presence of tumor-infiltrating lymphocytes (TILs) to affect the outcome of a GEP test. TILs change the composition of the background tissue by increasing the density of background cells. Additionally, TILs are thought to interact with tumor cells, which suggests that tumor cells may respond with altered gene expression. These factors alone could influence the expression profile of the 40 genes in the DecisionDx-SCC test, but until this scenario is tested, its impact is unknown. In summary, understanding the potential pitfalls of GEP testing is critical for understanding the reliability, performance, and accuracy of a GEP test.
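To make the tumor-purity concern above concrete, here is a minimal sketch under a simple linear mixing model: the bulk measurement for each gene is assumed to be the tumor fraction times tumor-cell expression plus the remaining fraction times background-cell expression. The model, the function name bulk_expression, and all numbers are hypothetical illustrations; they are not taken from the DecisionDx-SCC assay or from Wysong and colleagues.

```python
# Hypothetical illustration only: assumes bulk expression is a linear mix of
# tumor-cell and background-cell expression, weighted by cell fraction.
# Not the DecisionDx-SCC algorithm.

def bulk_expression(tumor_profile, background_profile, tumor_fraction):
    """Return the per-gene expression a bulk assay would measure."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return [
        tumor_fraction * t + (1.0 - tumor_fraction) * b
        for t, b in zip(tumor_profile, background_profile)
    ]

# Made-up relative expression values for three hypothetical genes.
tumor = [8.0, 1.0, 4.0]       # genes 1 and 2 differ from background
background = [2.0, 3.0, 4.0]  # gene 3 is expressed equally in both

# Lower tumor fractions stand in for specimens with more background cells
# (for example, dense TILs).
for fraction in (0.8, 0.5, 0.2):
    print(fraction, bulk_expression(tumor, background, fraction))
```

Under this assumed model, as the tumor fraction falls, the measured values for differentially expressed genes drift toward the background profile, while a gene expressed equally in both cell types is unchanged. If a classifier's thresholds were set on high-purity samples, such drift could plausibly shift a class call, which is why the effect of cell composition needs to be tested rather than assumed.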
An additional validation study from Borman and colleagues was published in 2022.135 This paper focused primarily on whether the DecisionDx-SCC test would provide “actionable class call outcomes.” The authors did not provide any information regarding patient follow-up or the accuracy of the class call outcomes. While they did test replication and precision, the sample sizes for these assessments were considerably smaller than the overall study cohort. Additionally, as in other studies from Castle Biosciences, they failed to sufficiently address many pre-analytic variables, including, but not limited to:
• Protocols for central pathologic review of cohort specimens:
o How many pathologists participated?
o What specific features were evaluated in each slide?
o How were discrepancies between outside report and internal review handled?
• How much time passed between biopsy or wide excision and placing the specimen in fixative?
• Was the fixation time (time in fixative before processing to FFPE) consistent across specimens?
• Was the same fixative (e.g., formalin) used for each specimen?
• What was the time between tissue sectioning for slide creation and RNA extraction? Was this time consistent between different specimens?
• When cDNA was “preamplified” prior to testing, was the process confirmed to amplify all relevant genes to the same degree, or were some amplifications more efficient than others?
These questions address known pitfalls in both the comparability of specimens and the integrity of extracted RNA. Moreover, even if some of the above questions were addressed during test development, the lack of transparency in the published literature prevents clear assessment of the integrity of the test development.
The additional observational studies produced by Castle Biosciences comprised three cohort studies and a case series, all published between 2020 and 2022.136-139
Farberg and colleagues used the same dataset as the validation study to assess whether the DecisionDx-SCC test could be integrated into the existing NCCN guidelines for the management of patients with cSCC.136 Another paper, from Aaron and colleagues, also used samples from the original validation dataset.137 This paper assessed whether DecisionDx-SCC could predict recurrence and “provide independent prognostic value to complement current risk assessment methods.” The third cohort study, from Ibrahim and colleagues, attempted to clinically validate the DecisionDx-SCC test.138 In general, these studies had the same issues as those outlined above, and the two studies that assessed rates of recurrence did not report patient follow-up data. The studies stated that “cases had a documented regional or distant metastasis, or documented follow-up of at least three years post-diagnosis of the primary tumor without a metastatic event” but gave no further information.136-138 DecisionDx-SCC advertises its results as three-year prognosticators for risk of recurrence, metastasis, and/or death. At a minimum, data supporting this claim must account for three years of patient follow-up, even when the patient experiences a recurrence event. A patient who has a local recurrence may still develop distant metastases or die of cSCC within the three-year window, and both events are relevant to the DecisionDx-SCC prognostics.
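The follow-up requirement described above can be expressed as a simple eligibility rule. The sketch below is a hypothetical illustration, not a method from this LCD or from Castle Biosciences: it assumes a 36-month window and treats death from cSCC as the only event that closes the window early, because a local recurrence or metastasis can still be followed by later metastasis or death. The Case fields and the function name has_adequate_follow_up are invented for illustration.

```python
# Hypothetical illustration only: checks whether a case has enough follow-up
# for its three-year outcomes (recurrence, metastasis, death) to be known.
from dataclasses import dataclass
from typing import Optional

WINDOW_MONTHS = 36  # assumed three-year prognostic window

@dataclass
class Case:
    months_followed: float  # diagnosis to last documented contact
    local_recurrence: bool = False  # recorded, but does not end observation
    metastasis: bool = False        # recorded, but does not end observation
    died_of_cscc_at_month: Optional[float] = None

def has_adequate_follow_up(case: Case) -> bool:
    """True if every three-year outcome for this case is knowable."""
    # Death from cSCC within the window is a final outcome.
    if (case.died_of_cscc_at_month is not None
            and case.died_of_cscc_at_month <= WINDOW_MONTHS):
        return True
    # A recurrence or metastasis alone does not shorten the window, since
    # later metastasis or death could still occur before 36 months.
    return case.months_followed >= WINDOW_MONTHS

cases = [
    Case(months_followed=14, local_recurrence=True),  # recurrence, then lost
    Case(months_followed=40),                         # event-free, 40 months
    Case(months_followed=20, metastasis=True, died_of_cscc_at_month=20),
]
print([has_adequate_follow_up(c) for c in cases])  # [False, True, True]
```

Under this rule, a case that recurs locally at 14 months and is then lost to follow-up would not count as adequately followed, whereas the inclusion criterion quoted above does not state how such cases were handled.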
Au and colleagues described two cases of cSCC, one with fatal recurrence and one without recurrence, along with retrospective DecisionDx-SCC results on tissue samples from each case.139 Although the recurrent case was classified as high risk of recurrence and the non-recurrent case as low risk, two cases are insufficient to provide meaningful insight into how well the test generalizes to the broader patient population. Additionally, there is no evidence that the test results would have changed management decisions for either case or altered eventual patient outcomes.
Aside from the paper from Au and colleagues, papers addressing clinical utility included surveys, a panel review, and literature reviews.128-133 These papers had a number of shortcomings and limitations, including, but not limited to:
• A high likelihood of selection and response bias in the surveys
• No description of survey participant recruitment methods
• An expert panel composed of Castle Biosciences employees, consultants, and researchers
• Reviews that cited the authors’, or their colleagues’, previous work without acknowledgement
• Missing descriptions of methods or, in review papers, of inclusion criteria
Notably, there are no significant studies assessing patient outcomes or clinician treatment decisions in a real-world setting following a DecisionDx-SCC test. Without such data, clinical utility cannot be determined. For example, clinical utility could be demonstrated in a clinical trial comparing overall survival between patients tested with DecisionDx-SCC and patients managed without the test. To date, no such trial has been performed for DecisionDx-SCC.
In summary, the body of peer-reviewed literature concerning DecisionDx-SCC is insufficient to establish the analytic validity, clinical validity, and clinical utility of this test in the Medicare population. As such, this test does not currently meet the medically reasonable and necessary criteria for Medicare patients and will not be covered at this time.