Post-radiotherapy stage III/IV non-small cell lung cancer radiomics research: a systematic review and comparison of CLEAR and RQS frameworks

Background
Lung cancer, the second most common cancer, has a persistently dismal prognosis. Radiomics, a promising field, aims to provide novel imaging biomarkers to improve outcomes. However, clinical translation faces reproducibility challenges, despite efforts to address them with quality scoring tools.
Objective
This study had two objectives: 1) to identify radiomics biomarkers in post-radiotherapy stage III/IV non-small cell lung cancer (NSCLC) patients, and 2) to evaluate research quality using the CLEAR (CheckList_for_EvaluAtion_of_Radiomics_research) and RQS (Radiomics_Quality_Score) frameworks and to formulate an amalgamated CLEAR-RQS tool to enhance scientific rigor.
Materials and methods
A systematic literature review (Jun-Aug 2023, MEDLINE/PubMed/SCOPUS) was conducted concerning stage III/IV NSCLC, radiotherapy, and radiomic features (RFs). Extracted data included study design particulars, such as sample size, radiotherapy/CT technique, selected RFs, and endpoints. CLEAR and RQS were merged into a CLEAR-RQS checklist. Three readers appraised the articles using the CLEAR, RQS, and CLEAR-RQS metrics.
Results
Out of 871 articles, 11 met the inclusion/exclusion criteria. The median cohort size was 91 (range: 10–337), with 9 studies being single-center. No common RFs were identified. The merged CLEAR-RQS checklist comprised 61 items. Most unreported items were within CLEAR's "methods" and "open science" sections, and within RQS's "phantom-calibration," "registry-enrolled prospective-trial-design," and "cost-effectiveness-analysis" sections. No study scored above 50% on RQS. The median CLEAR score was 55.74% (32.33/58 points), and the median RQS score was 17.59% (6.33/36 points). CLEAR-RQS percentage scores fell between the CLEAR and RQS scores and aligned more closely with CLEAR.
Conclusion
Radiomics research in post-radiotherapy stage III/IV NSCLC exhibits variability and frequently low-quality reporting. The formulated CLEAR-RQS checklist may facilitate education and holds promise for enhancing radiomics research quality.
Clinical relevance statement
Current radiomics research on post-radiotherapy stage III/IV NSCLC is heterogeneous and lacks reproducibility, and no imaging biomarker has been identified. Radiomics research quality assessment tools may enhance scientific rigor and thereby facilitate the translation of radiomics into clinical practice.
Key Points
Radiomics research quality in post-radiotherapy stage III/IV non-small cell lung cancer is heterogeneous and low.
Barriers to reproducibility are small cohort sizes, nonvalidated studies, missing technical parameters, and a lack of data, code, and model sharing.
CLEAR (CheckList_for_EvaluAtion_of_Radiomics_research), RQS (Radiomics_Quality_Score), and the amalgamated CLEAR-RQS tool are useful frameworks for assessing radiomics research quality and may provide a valuable resource for educational purposes in the field of radiomics.


Introduction
Lung cancer is the second most prevalent malignancy worldwide, with approximately 2.2 million newly diagnosed cases in 2020 [1], the majority of which (nearly 84%) are non-small cell lung cancer (NSCLC) [2]. NSCLC stages I and II are typically managed surgically, whereas treatment of locally advanced unresectable stage III and metastatic stage IV disease often involves radiotherapy, frequently combined with chemotherapy and sometimes immunotherapy.
Despite therapeutic advances, the 5-year survival rate for stage III/IV disease has improved only marginally, from 24.6% in 2016 to 26.4% in 2020 [2]. Consequently, the research focus has shifted towards screening, diagnosis, and personalized management strategies to improve both quality of life and survival outcomes.
Radiomics, an emerging field, uses noninvasive techniques to extract radiomic features (RFs) from medical images, going beyond standard radiology reporting. RFs, whose analysis is often referred to as texture analysis, capture grey-level intensities and their spatial relationships within the region of interest (ROI) in two-dimensional (2D) pixel and three-dimensional (3D) voxel space, and are hypothesized to be associated with tissue heterogeneity and the tumor microenvironment [3-7]. A primary objective of radiomics is to provide predictive imaging biomarkers that, in conjunction with clinical parameters, could improve diagnosis, treatment prognostication, quality of life, and overall survival (OS), in line with the goals of personalized and precision medicine.
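To make the distinction between intensity-based and spatial (texture) features concrete, the following minimal sketch computes a few first-order statistics and a grey-level co-occurrence matrix (GLCM) for a toy 4 × 4 ROI that is assumed to be already discretized into four grey levels. The function names, the single horizontal pixel offset, and the symmetric pair counting are illustrative assumptions, not the extraction settings of any reviewed study; dedicated toolkits such as PyRadiomics implement standardized definitions with many more options (distances, angles, discretization).

```python
import math
from collections import Counter


def first_order(roi):
    """Intensity statistics over all pixels of a 2D ROI given as a list of rows."""
    values = [v for row in roi for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy (bits) of the grey-level histogram; ignores pixel positions
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(values).values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


def glcm(roi, levels, dx=1, dy=0, symmetric=True):
    """Normalized GLCM counting pixel pairs separated by the offset (dx, dy)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = roi[y][x], roi[ny][nx]
                m[i][j] += 1
                if symmetric:  # count each neighbouring pair in both directions
                    m[j][i] += 1
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m]


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared grey-level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Toy ROI, already discretized to grey levels 0..3 (hypothetical values)
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
stats = first_order(roi)
p = glcm(roi, levels=4)
contrast = glcm_contrast(p)
print(stats["mean"], round(contrast, 4))  # prints: 1.25 0.5833
```

Shuffling the pixel positions of this ROI would leave every first-order statistic unchanged but would generally change the GLCM and its contrast, which is the sense in which texture features encode spatial relationships.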
Despite the substantial volume of NSCLC radiomics research, translation into clinical practice has been constrained by technical and methodological challenges, resulting in studies with low statistical power and reduced replicability, reproducibility, and generalizability [3,8-13]. Quality scoring tools and checklists, such as the Radiomics Quality Score (RQS), with 16 items and a maximum score of 36 points, and the CheckList for EvaluAtion of Radiomics Research (CLEAR), with 58 items but no point scoring, have been developed to address these challenges [10,14]. However, their adoption has been limited, and concerns persist regarding their reliability in uniformly assessing the quality of radiomics research [9,15].
Our study aims 1) to identify promising radiomics biomarkers in the literature on stage III/IV NSCLC treated with radiation and 2) to critically appraise the research pipeline using the recently published CLEAR and the longer-existing RQS systems, and to merge the wording of both frameworks into a comprehensive checklist (CLEAR-RQS), allowing CLEAR-RQS point scores to be compared with CLEAR and RQS scores [9,10]. CLEAR-RQS aims to serve as a valuable resource for radiomics researchers and educators across various disciplines.

Materials and methods
IRB approval was not required for this research because it did not involve human subjects or identifiable private information. The final article selection comprised original research in human studies, written in English, on CT radiomics in post-radiotherapy stage III/IV NSCLC (Table 1). Figure 1 shows the PRISMA flow diagram of the literature search.

Literature data extraction and analysis
Article data extraction included cohort size, radiotherapy/CT technique, radiomics software used, selected RFs, and study endpoints.
Critical appraisal of full-text articles addressed the following research questions: 1) are there commonly selected RFs for treatment response, adverse events, and/or outcomes in patients undergoing radiotherapy? 2) are there factors within the research study design that would impede reproducibility?
Objective 2: critical appraisal of selected articles applying CLEAR and RQS frameworks and development of a comprehensive radiomics assessment checklist (CLEAR-RQS)
All articles were assessed by three readers, D.G. (radiologist with 4 years of general radiology experience), K.T., and H.S.K., using the RQS metrics and the CLEAR/CLEAR-RQS criteria [10,14]. To allow a direct comparison between RQS and CLEAR/CLEAR-RQS, each CLEAR/CLEAR-RQS item was assigned 1 point for a "yes" response and 0 points for a "no" or "NA" response, resulting in a maximum possible score of 58 for CLEAR and 61 for CLEAR-RQS. The mean score of all three readers was used to compare the RQS, CLEAR, and CLEAR-RQS frameworks. To enable a relative comparison between frameworks, the score of each tool was converted to a percentage of its maximum (i.e., 100% corresponded to a CLEAR score of 58, a CLEAR-RQS score of 61, and an RQS score of 36 points).
K.T. and H.S.K. systematically compared the wording and interpretation of all 58 CLEAR and 16 RQS items (Table 2). To prevent redundancy, identical and very similar items were merged, retaining the wording of the more specific source framework (CLEAR or RQS). No new wording was introduced, to ensure adherence to the respective source framework.
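The scoring rules described above (1 point per CLEAR/CLEAR-RQS "yes", 0 for "no" or "NA", averaging across the three readers, and rescaling to each tool's maximum) can be expressed as a short Python sketch. The reader responses are hypothetical placeholders and the function names are illustrative; RQS itself is not scored this way, because its items carry their own weighted points.

```python
from statistics import mean

# Maximum attainable points per framework, as stated in the text
MAX_POINTS = {"CLEAR": 58, "CLEAR-RQS": 61, "RQS": 36}


def checklist_points(responses):
    """CLEAR / CLEAR-RQS item scoring: 'yes' = 1 point, 'no' or 'NA' = 0."""
    allowed = {"yes", "no", "NA"}
    if any(r not in allowed for r in responses):
        raise ValueError("responses must be 'yes', 'no', or 'NA'")
    return sum(1 for r in responses if r == "yes")


def reader_mean(points_per_reader):
    """Mean point score across readers (three readers in this review)."""
    return mean(points_per_reader)


def to_percent(points, framework):
    """Rescale a point score to a percentage of the framework's maximum."""
    return 100 * points / MAX_POINTS[framework]


# Hypothetical CLEAR responses from three readers for one article (58 items each)
readers = [
    ["yes"] * 32 + ["no"] * 20 + ["NA"] * 6,
    ["yes"] * 33 + ["no"] * 22 + ["NA"] * 3,
    ["yes"] * 32 + ["no"] * 26,
]
assert all(len(r) == MAX_POINTS["CLEAR"] for r in readers)

clear_points = reader_mean([checklist_points(r) for r in readers])
print(round(clear_points, 2), round(to_percent(clear_points, "CLEAR"), 2))
```

Rescaling to percentages is what makes the three tools comparable at all: a raw CLEAR total out of 58 and a raw RQS total out of 36 are on different scales.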

Results
Objective 1: PRISMA literature search
After the exclusion of 462 duplicates, 409 article abstracts were screened. This resulted in 22 articles that underwent full-text assessment, of which a further 11 were excluded based on the inclusion and exclusion criteria (Table 1). Finally, 11 articles were included in the systematic review (Supplemental Table S1).

Study endpoints of selected radiomic features
Study endpoints varied: selected RFs related to OS in three studies [17,23,24] and to treatment response in two studies [19,25]. Three studies analyzed both OS and progression-free survival [11,21,22], and two studies examined the treatment-related complication of radiation pneumonitis [18,20]. One study measured RF changes in the NSCLC tumor before and during radiotherapy without association with any clinical endpoint [16].

CT imaging protocol
CT vendor/scanner type and scanning technique varied across studies or were not disclosed.
Regarding CT vendor and scanner models, 6 out of 11 articles reported the scanner model, and 5 of these 6 used a single CT scanner model.
Three studies specified the respiratory cycle timepoint of image acquisition: 2 during free breathing [18,20] and 1 at the end-expiratory phase [21].
Supplemental Table S1 describes the articles' detailed data extraction.
Objective 2: applying CLEAR and RQS point-scoring to selected articles (n = 11) and development of a comprehensive radiomics assessment checklist (CLEAR-RQS)

CLEAR metrics
The median CLEAR point score was 32.33 (55.74%; range: 25.33-48 [47.7-82.75%]). Across all three readers, all studies fulfilled the "manuscript preparation" CLEAR criteria of providing a title, abstract, keywords, introduction, and discussion. All articles failed to report details regarding the items "sample size calculation" and "flowchart for eligibility criteria", as well as the entire domain of "open science". Table 3 details the 44 items for which two or all three readers identified missing data.

RQS metrics
The median RQS point score was 6.33 (17.59%), with a range of 0-16 points (0-44.44%) out of a maximum possible score of 36 points. All readers scored many criteria at 0 or below, as illustrated in Table 4; for example, no study included "phantom calibrations", was a "prospective study registered with a database", or performed a "cost-effectiveness analysis".

Comparing CLEAR and RQS point distribution
Table 5 shows the point distribution for articles evaluated using the CLEAR and RQS criteria. The ranking of the top 3 articles differed between the CLEAR and RQS systems; for example, Chen et al [23] ranked 1st on RQS but 4th according to the CLEAR metrics, whereas Van Timmeren et al [22] ranked 1st on CLEAR but 2nd according to the RQS framework.
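Why can the same articles rank differently under the two tools? In this review each CLEAR item contributed one point, whereas RQS items carry unequal weights; for example, the validation item can score -5, +2, +3, +4, or +5 points, and prospective validation in a trial earns 0 or +7. The sketch below uses two hypothetical articles (names and all point values are invented for illustration) to show how a single heavily weighted RQS item can reverse an ordering produced by unweighted item counts.

```python
# Hypothetical articles: unweighted checklist items met (out of 58) and RQS points,
# split into the validation item and all remaining RQS items. Values are invented.
articles = {
    "Article A": {"checklist_yes": 40, "rqs_validation": -5, "rqs_other": 8},
    "Article B": {"checklist_yes": 33, "rqs_validation": 4, "rqs_other": 6},
}


def checklist_percent(a, max_items=58):
    """Unweighted score: every fulfilled item is worth one point."""
    return 100 * a["checklist_yes"] / max_items


def rqs_percent(a, max_points=36):
    """Weighted score: item points already include their RQS weights."""
    return 100 * (a["rqs_validation"] + a["rqs_other"]) / max_points


def ranking(score_fn):
    """Article names ordered from highest to lowest score."""
    return sorted(articles, key=lambda name: score_fn(articles[name]), reverse=True)


print("checklist:", ranking(checklist_percent))  # Article A ranks above Article B
print("RQS:", ranking(rqs_percent))              # Article B ranks above Article A
```

Article A loses nine points relative to Article B on the validation item alone, which outweighs its higher unweighted item count; no single item in an unweighted checklist has comparable leverage.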
Figure 2 shows the score point values and respective ranking of appraised articles according to the CLEAR and RQS metrics.

Amalgamation of CLEAR and RQS items into a comprehensive assessment checklist (CLEAR-RQS) and comparing CLEAR-RQS with CLEAR and RQS
The wording of the 58 CLEAR and 16 RQS items was compared, and identical or similar items were merged, resulting in a 61-item CLEAR-RQS checklist (Table 2).
When the newly developed CLEAR-RQS checklist was applied, each article's percentage score lay between its CLEAR and RQS scores, closer to its CLEAR score (Supplemental Fig. S1). This is expected, because the 61-item CLEAR-RQS checklist is far closer in composition to the 58-item CLEAR checklist than to the 16-item RQS framework.

Discussion
This systematic literature review of radiomic features in post-radiotherapy stage III/IV NSCLC patients yielded 11 retrospective studies with substantial variation in study design, rendering them incomparable, and failed to identify an RF suitable for clinical translation. Moreover, reporting quality was low under both the CLEAR and RQS frameworks, consistent with findings from other radiomics reviews and meta-analyses [8,15,27,28]. Merging the CLEAR and RQS frameworks into a single CLEAR-RQS checklist aimed to provide the radiomics research community with a comprehensive yet detailed guide for designing studies and critically appraising published research.

Limitations in radiomics study design
This review revealed several shortcomings in research design, potentially diminishing the generalizability and reproducibility of identified RFs.
The heterogeneity of study cohorts and relatively small sample sizes may limit comparability. Notably, two studies featured small sample sizes (n = 10, n = 23), making validation nearly infeasible [16,17].
Data harmonization, particularly of image acquisition and reconstruction settings (referred to as "pre-processing" by CLEAR and RQS), emerged as a key requirement in radiomics research [29,30]. Three studies did not disclose whether CT slice thickness harmonization was performed [19,24,25]. Body habitus, scanner models, and demographic parameters may influence radiomic analysis and therefore need to be specified to enable future validation [30]; this may require further data post-processing to ensure reproducibility [29]. Two studies [17,22] used cone-beam CT (CBCT) images, which makes delineation of the radiomic region of interest challenging because of scattered radiation artifacts [31,32]. Only three studies detailed the use of free-breathing CT images [17,18,20], with the remaining studies not specifying the breathing cycle phase at CT acquisition [11,16,19,21-25]. Free breathing introduces image blurring due to motion artifacts, which is acknowledged to affect radiomics analysis [33]. Consequently, RF extraction from inherently inconsistent or highly variable CT scanning protocols may compromise result interpretation and reproducibility.
Only two studies reported details of a segmentation reliability analysis for feature extraction [11,21]. Describing this step is important, as manual or semiautomated segmentation methods may introduce intra- and inter-observer variability, affecting reproducibility [35].
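The text does not state which reliability statistic the two studies used. One common choice in radiomics is the intraclass correlation coefficient ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed for a single RF extracted from several observers' segmentations of the same lesions. The sketch below assumes a complete table with one row per lesion and one column per observer; the feature values are invented toy data.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data[i][j] is the feature value for lesion i from observer j.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between lesions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data: one feature, 6 lesions (rows) segmented by 4 observers (columns)
feature_values = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(feature_values), 2))  # prints: 0.29
```

A low value such as this indicates that the feature changes substantially with the observer's segmentation; such features are often excluded before modeling, although the cutoff (for example 0.75) is a study-specific choice.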
Certain categories of RFs, including first-order (intensity, shape) and higher-order (GLCM (Grey-Level Co-Occurrence Matrix), GLSZM (Grey-Level Size Zone Matrix)) groups, were more commonly investigated [11, 16-20, 22, 24, 25].
The CLEAR checklist offers a general guideline covering all aspects of the radiomics workflow, whereas the RQS framework comprises 16 criteria with differently weighted point scores. Certain domains, such as "prospective validation in an appropriate trial" (0 or +7 points) and "validation cohorts" (-5, +2, +3, +4, or +5 points), carry more points than others. These items contributed most to the top-scoring papers on RQS, whose RQS ranks did not align with their CLEAR ranks. For instance, the RQS item "validation" lowered the scores of Yang et al (3.67 points) [18], Shi et al (0.67 points) [17], and Zhang et al (0.67 points) [16], ranking them 7th, 10th, and 11th out of 11 articles, respectively. Such large point-score disparities were not observed with the CLEAR criteria, as exemplified by the comparison of Wang et al and Fried et al [21,24]. With RQS, Wang et al ranked 5th (12.67 points) and Fried et al ranked 6th (6.33 points), whereas with CLEAR the point-scoring disparity was less evident, and Wang et al ranked lower (rank 7, 31.33 points) than Fried et al (rank 5, 35.67 points) [21,24].
A recently published quality scoring tool for radiomics research, METRICS (METhodological RadiomICs Score), has been developed by an international panel and endorsed by the European Society of Medical Imaging Informatics (EUSoMII). METRICS contains weighted items that were selected and discussed via a modified Delphi process to ensure a balanced consensus among panelists [36]. This new point-scoring framework aims to facilitate critical appraisal of a broad range of radiomics research, from studies using manual data labeling and feature extraction to deep learning artificial intelligence (AI) pipelines.
Inter-rater variability
D'Antonoli et al's study revealed that the RQS metric is susceptible to inter-rater bias, as its domains can be construed differently depending on the raters' backgrounds [9]. This corresponds to our findings: our three raters (a graduate medical student, a junior radiologist, and a senior radiologist) exhibited minor discrepancies in RQS scores, which were reconciled through consensus. This variability aligns with prior research reporting low RQS scores and poor inter-rater reliability [9,27,28].
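The text reports that discrepancies between the three readers were reconciled by consensus. Where agreement on yes/no checklist items is to be quantified, Fleiss' kappa is one standard statistic for more than two raters. The following is an illustrative sketch with hypothetical answers, not an analysis performed in this review; the `categories` parameter and the 1 = "yes", 0 = "no"/"NA" coding are assumptions.

```python
def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for a fixed number of raters.

    ratings[r][i] is the category that rater r assigned to item i.
    """
    n_raters, n_items = len(ratings), len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters")
    if any(x not in categories for row in ratings for x in row):
        raise ValueError("every rating must be one of the given categories")
    # counts[i][c]: how many raters put item i into category c
    counts = [
        [sum(1 for r in range(n_raters) if ratings[r][i] == c) for c in categories]
        for i in range(n_items)
    ]
    # Observed agreement per item, averaged over items
    p_items = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    p_cat = [
        sum(row[c] for row in counts) / (n_items * n_raters)
        for c in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical answers (1 = "yes", 0 = "no"/"NA") of three readers on six items
answers = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
]
print(round(fleiss_kappa(answers), 3))  # prints: 0.325
```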

Creating a comprehensive CLEAR-RQS checklist to aid future education and research
Ongoing efforts aim to develop robust tools for assessing radiomics research quality, with a focus on machine learning and other AI models [37-39]. The RQS and CLEAR frameworks specifically address radiomics methodology [10,14], a topic that has garnered attention from the Society of Nuclear Medicine and Molecular Imaging, the European Association of Nuclear Medicine [39], and the Scientific Editorial Board of European Radiology [40].
The CLEAR-RQS checklist presented here, developed by an international research group from two academic tertiary institutions, aims to comprehensively evaluate radiomics methodology without sacrificing specificity. It integrates the standards of both the CLEAR and RQS tools, preserving their detailed wording for radiomics researchers, while also serving educational purposes across disciplines. Applying a point-scoring system to the CLEAR-RQS checklist should be avoided: given the intricate complexity of real-world research scenarios, point scores may not be granular enough to capture the nuanced quality of the assessed research.
In conclusion, stage III/IV NSCLC radiomics research suffers from suboptimal reporting quality, hindering the discovery of validated predictive RFs. Technical limitations and a lack of access to source images and model files impede reproducibility. Thorough validation and open access to data and code are essential to increase transparency and raise reporting standards [41,42]. Adoption of the CLEAR-RQS checklist could accelerate the translation of radiomics research into clinical practice. Furthermore, sustained multidisciplinary collaboration for continuous assessment and improvement in this rapidly evolving field is required to ultimately benefit patient outcomes in personalized medicine.

Objective 1: PRISMA literature search to identify radiomics studies in stage III/IV NSCLC patients treated with radiotherapy
We conducted a literature search of the online databases MEDLINE, PubMed, and SCOPUS from June to August 2023. The search terms comprised [Stage III NSCLC OR Stage IV NSCLC OR nonsmall cell lung cancer] AND [radiotherapy OR SABR (stereotactic ablative body radiation) OR SBRT (stereotactic body radiation therapy)] AND [CT radiomic OR [quantitative AND imaging] OR [texture AND feature]]. Initial title and abstract screening was performed by K.T. (3rd-year graduate medical student), with subsequent full-text assessment by K.T. and H.S.K. (radiologist with 20 years of general and 16 years of subspecialty oncological imaging experience).

Fig. 1 PRISMA flow diagram of PubMed, MEDLINE, and SCOPUS literature search

Figure 1 demonstrates the PRISMA diagram, which outlines the literature search. In total, 871 articles were found (PubMed n = 403, MEDLINE n = 249, SCOPUS n = 219).
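The counts reported for the PRISMA flow can be checked for internal consistency with simple arithmetic. In the sketch below, the number of abstracts excluded at screening is derived (409 minus 22) rather than a figure stated in the text.

```python
# Records identified per database, as reported for the PRISMA flow
identified = {"PubMed": 403, "MEDLINE": 249, "SCOPUS": 219}
duplicates_removed = 462
full_text_assessed = 22
full_text_excluded = 11

total_identified = sum(identified.values())
abstracts_screened = total_identified - duplicates_removed
excluded_at_screening = abstracts_screened - full_text_assessed  # derived, not reported
included = full_text_assessed - full_text_excluded

print(total_identified, abstracts_screened, excluded_at_screening, included)
# prints: 871 409 387 11
```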

Fig. 2 RQS and CLEAR percentage score distributions of the assessed radiomics articles in post-radiotherapy stage III/IV NSCLC (n = 11). Red bars represent RQS scores and green bars represent CLEAR scores. Numbers on top of the bars indicate the RQS and CLEAR rank, respectively. The horizontal red bar marks 50%, highlighting that no RQS score was above 50%. Articles are listed in alphabetical order.

Table 1
PRISMA literature search

Table 3
Missing data on CLEAR framework

Table 4
Missing data on RQS framework

Table 5
RQS and CLEAR scores and rankings
RQS and CLEAR scores of 11 articles on radiomics quality in stage III/IV NSCLC treated with radiotherapy (in alphabetical order). The top 3 ranks according to RQS and CLEAR scoring are highlighted with a gray cell background, with the top rank in bold. *Equal 10th rank