Introduction

Response evaluation criteria are crucial in the assessment of the efficacy of cancer drugs in clinical trials [1]. Four decades ago, at the dawn of cross-sectional imaging, the World Health Organization (WHO) introduced the first imaging criteria for the assessment of tumor burden, based on the sum of the products of diameters of the target lesions [2]. In 2000, the Response Evaluation Criteria In Solid Tumor (RECIST) working group published new guidelines, the RECIST version 1.0 [3], introducing new rules to better objectify the evaluation of the tumor burden, as the definitions of minimum size and the number of measurable lesions per organ. The new criteria also introduced unidimensional measurements bringing a simplification with respect to the WHO criteria [4]. A revised version, RECIST 1.1, that incorporates major changes such as a reduction in the number of lesions to assess, a new categorization of lymph nodes based on short axis, new recommendations for the assessment of progressive disease, and so on, was introduced in 2009 [5]. RECIST 1.1 criteria are based upon imaging modalities that are globally available and easily interpretable and are therefore widely used in clinical trials [6, 7].

Though intended for use in the clinical trial setting, oncologists increasingly rely on RECIST 1.1-based measurements for clinical management of patients also in daily clinical practice [8]. The main justifications are as follows: (1) the opportunity for a more standardized and structured approach to response assessment and (2) the increased clarity of the radiological report [9, 10]. Indeed, terms such as measurable disease, tumor burden, target lesions, and response categories are now part of the radiologist’s lexicon. However, in clinical practice, reporting strategies are mostly left to the local radiologist and oncologist, who may issue their own set of rules [11].

The need for a standardized approach, and of universally applicable rules for the assessment of response in daily routine, should be considered an important priority for the oncology and oncologic imaging community. The aim of this study was to investigate and compare the opinions and preferences of radiologists, with a dominant interest in oncologic imaging, on treatment response evaluation in clinical practice to gather information for the development of reporting models and recommendations.

Materials and methods

To gather the opinions and preferences of radiologists, a survey was developed by an expert panel of members of the European Society of Oncologic Imaging (ESOI) board, composed of a radiology resident and two board-certified radiologists, the latter with more than 20 and 6 years of experience in oncologic imaging and expertise in tumor response assessment by RECIST 1.1 and other criteria.

The survey was conducted anonymously and was launched on the ESOI website (ww.esoi-society.org). ESOI members were reached on the same day by an email with a link to the survey; a week later, those who had not responded received a reminder and a final call was sent after 20 days.

The survey consisted of 5 sections with a total of 24 questions. In brief:

  • - The first section gathered demographic information (i.e., geographic distribution, age and site of main professional activity, and field of interest of participants).

  • - The second section focused on how to measure lesions and lymph nodes at the baseline examination and on how to deal with diminutive lesions.

  • - The third section was related to what to report on the baseline examination, which lesions to measure, how to evaluate non-measurable lesions, and which non-oncologic findings should be reported.

  • - The fourth section focused on reporting of follow-up examination and in particular which of the previous studies should be used as the comparator and how to compare previous findings.

  • - The fifth section included questions on the use of specific assessment criteria (mainly RECIST 1.1) in clinical practice. A focus is given on the medical imaging doctors’ practices and preferences including perceived advantages and disadvantages of using RECIST 1.1.

A web-based survey tool (Google Form, Mountain View, CA, USA) was used for the data collection. The results were downloaded and elaborated in Microsoft Excel format (Redmond, Washington, USA). Simple descriptive analyses and graphs were performed using Microsoft Excel 2018® (Microsoft Office, 2018). Proportions were compared using the “N − 1” chi-squared test [12]; a p-value < 0.05 was considered statistically significant.

Results

Two hundred eighty-six completed forms were received and evaluated. The answers to all 19 questions related to Sects. 2–5 are reported in the supplementary material and demographics and professional information (Sect. 1, 5 questions) are summarized in Table 1.

Table 1 Demographic and professional information (286 responders)

In brief, most responders were from Europe (n = 199; 69.6%) and approximately half were aged between 35 and 50 (n = 145; 50.7%). Most responders had a working experience of more than 10 years (n = 171; 60%). The fields of interest of responders are summarized in Fig. 1.

Fig. 1
figure 1

Main field(s) of oncologic imaging involvement of the responders. More than one response was allowed

Baseline assessment

Figure 2 summarizes the preferred measurement criteria for organ lesions and lymph nodes. Mimicking the RECIST 1.1 criteria, most responders measured only the main lesions (182; 63.6%), preferably two per organ if present (132 of 182; 64.7%). Approximately half of the responders (145; 50.6%) reported measuring only the short axis of lymph nodes.

Fig. 2
figure 2

Radiologists’ opinions on measurement of lesions (a) and lymph nodes (b) at baseline in clinical practice

Non-measurable lesions (e.g., pleural effusion, abdominal fluid collection, peritoneal carcinomatosis) were mainly assessed qualitatively (n = 176; 61.5%); however, a minority of responders (n = 103; 36%) preferred a quantitative evaluation whenever possible.

Concerning diminutive lesions, i.e., lesions with a diameter below a certain predefined threshold, more than half of the responders replied that they did not measure lesions smaller than or equal to 5 mm (n = 72; 25.2% if ≤ 5 mm and n = 112; 39.2% if ≤ 3 mm) but would mention them in the report (n = 168; 58.7%).

Most responders (n = 179, 62.6%) affirmed that they would report all non-oncologic findings at the baseline examination, including benign and non-clinically significant ones (e.g., hepatic, or renal cysts), while a minority (n = 103; 36%) would report only the clinically significant findings.

According to the large majority of survey responders, tumor measurements were reported in the text of a narrative report (n = 231; 80.8%). Measurements were transferred through hyperlinks together with the images for 13.6% (n = 39) of responders, while 15.7% (n = 45) of responders adopted a structured report.

Follow-up examination

Figure 3 shows responders’ opinions on which previous time point they use to compare the findings in real-world assessment. Of note, most responders (100 of 286; 35%) replied that they would compare findings with previous exam. In relation to measurable lesions, nearly all the responders report measuring the same lesions as in baseline (n = 249; 87.1%).

Fig. 3
figure 3

Percentage of which previous time point is used to compare the findings annotated in the baseline examination

For non-measurable lesions, 46.2% (n = 132) of participants would continue evaluation only with qualitative assessment; conversely, 40.6% (n = 116) would perform an objective measurement of findings when feasible. With reference to non-oncologic imaging findings, almost half of the responders (n = 137; 47.9%) replied that they would report them only in case of significant changes, a summary sentence being recommended in all other cases (e.g., “all the other findings are unchanged”). Moreover, nearly all the responders (n = 245; 85.7%) conclude the report with their personal impression on the response to treatment.

Use of RECIST 1.1 in clinical practice

Table 2 reports on the frequency of RECIST 1.1 use in real-world assessment. Overall, 74.1% of responders use RECIST 1.1 in their clinical practice either always or in specific cases. Of note, responders from research institutions use imaging criteria significantly less than responders from the remaining institutions. Table 3 summarizes the results on the opinion of responders on whether RECIST 1.1 should be used outside clinical trials. The overall rate of positive replies was 87.4%. Also, in this case, responders from research institution had a less favorable impression of the use of RECIST. Moreover, in reply to the specific question, 71% (n = 203) of responders, oncologists in their institution, consider response evaluation with RECIST 1.1 useful also in clinical practice.

Table 2 Responses to question 20 [Do you apply RECIST 1.1 criteria for response evaluation in clinical practice (not in clinical trials)?]. Results have been dichotomized according to geographic regions (Europe vs other countries), working experience (< 10 years vs > 10 years), and type of institutions (research facilities—university hospital and research institute vs other institutes). A p-value < 0.05 was considered statistically significant
Table 3 Responses to question 21 (Do you think RECIST 1.1 criteria should be applied in clinical practice and not only in clinical trials?). Results have been dichotomized according to geographic regions (Europe vs other countries), working experience (< 10 years vs > 10 years), and type of institutions (research facilities—university hospital and research institute vs other institutes). A p-value < 0.05 was considered statistically significant

Figure 4a and b report on the advantages and the disadvantages of reporting with RECIST 1.1 in clinical practice, respectively. Most responders consider the increased standardization with respect to the conventional report as the most important advantage. Conversely, most responders consider the use of RECIST 1.1 more time-consuming with respect to the narrative report.

Fig. 4
figure 4

The charts show the perceived advantages (a) and disadvantages (b) of reporting with RECIST 1.1 criteria

Discussion

Assessment of treatment response represents an important crossroad for the oncologic patient as it determines whether a specific drug, or ensemble of drugs, is effective or not. Within clinical trials, tumor response assessment relies primarily on the extent to which the sum of diameters of target lesions changes in time. Several imaging criteria have been developed for this purpose [13,14,15,16,17,18], being RECIST 1.1 [5] the most common. Opposite to the clinical trial environment where patients are carefully monitored, in daily clinical practice, the decision on whether to continue an oncologic treatment is left to the local multidisciplinary teams or to oncologists, who base their decision largely on their experience after gathering all useful clinical and imaging information. Cross-sectional imaging narrative reports usually provide fundamental information for decision-making. Unfortunately, narrative reports may lack standardization and clarity, and since recommendations are largely missing in this context, radiologists usually take a personal approach to reporting [19, 20]. Indeed, there is some evidence that narrative reports may not be as accurate as RECIST criteria in the assessment of response to treatment in clinical practice [21,22,23]. Feinderberg et al. showed that narrative reports were associated with an overestimation of treatment response in comparison to RECIST 1.1 among patients with complete response [21]. Schomburg et al. also compared the free text report with a response based on iRECIST criteria in 50 patients with metastatic renal cell carcinoma, finding only a moderate agreement between the two modalities (kappa 0.38 to 0.70), with new lesions frequently not recognized in free text [22]. These works underline the need for more standardized radiological criteria in daily clinical practice.

This survey was performed to gather information on radiologist’s practice in reporting cross-sectional imaging examinations in patients with advanced disease treated with cancer drugs in daily routine, to develop a common and shareable approach to the assessment of response to treatment. When preparing the questionnaire, the expert panel was aware that, since each cancer patient has a different story, conclusions drawn from its results would be difficult to generalize. This survey also aims to highlight the differences and criticalities of radiologists working in different institutions and with different professional backgrounds.

The first important observation is that most responders are inclined to follow RECIST 1.1 rules, with some notable exceptions. A slightly higher number of respondents (39.1% vs 36%) showed a preference for the measurement of two dimensions with respect to lesion maximum diameter only, even though the former is more time-consuming [24, 25]. We hypothesize that responders are more confident with measuring both long and short axis because they feel measures are more representative of the tumor, which often is characterized by an oblong and/or irregular shape. Only 7 respondents (2.4%) suggest measuring lesion volume, which is certainly more accurate, but not yet validated and time-consuming, unless a dedicated segmentation software is available [26]. Short axis diameter was considered the most reliable method to measure lymph nodes by half of the respondents. However, 33.6% of responders still prefer measuring both long and short axis of lymph nodes, probably because this method has been deemed more appropriate in some circumstances, for example in lymphoma assessment [15] or in predicting metastatic lymph nodes in gastric cancer [27]. Responders were not asked to define a cut-off measure for lymph nodes as size criteria are dependent on the lesion site. For example, inguinal lymph nodes may be considered pathological when the short axis is 15 mm or more while in other sites, e.g., mesorectum or mesenteric, the size cut-off is definitely smaller [28]. Moreover, morphology, shape, and borders may also be relevant in lymph node assessment [28].

Most radiologists measure lesions and lymph nodes on the axial plane, as recommended by RECIST 1.1. As for RECIST 1.1, most responders suggest measuring only the main lesions within each organ (63.6% of the cases) and a maximum of two lesions per organ (64.7%), following the selection criteria of target lesions, and to qualitatively evaluate diminutive and non-measurable findings, following the rules for non-target lesions.

According to RECIST 1.1 criteria, the baseline examination must be used as a comparator to define a stable, partial, or complete response to treatment and the nadir should be the reference to evaluate disease progression. Interestingly, in this survey, when deciding which of the previous studies should be considered as the comparator, responders expressed different opinions, some of which were controversial. Almost one third of responders affirmed that comparison should be performed only with the previous examination. In this regard, it must be noted that Weber et al. [11] developed a structured reporting concept for general follow-up assessment of cancer patients in clinical routine, based on RECIST 1.1 principles, but including only the prior tumor measurements, a limitation that the authors considered in their paper. It is the opinion of the authors that even in daily routine, a proper tumor response evaluation should require a careful comparison not only between the current and prior examination but also with older examinations, to avoid evaluation errors, as in the case of slowly growing lesions. One of the reasons why RECIST 1.1 response rules are difficult to apply in routine practice is that patient history is not always readily available and collecting clinical and imaging data is a hard and time-consuming process, especially when health electronic records are not readily accessible. Collaboration between radiologists and referring oncologists in this context is mandatory, and we recommend that controversial cases be discussed within a multidisciplinary framework.

In this survey, a substantial number of responders (41.6%) declare that they systematically use RECIST 1.1 criteria in clinical practice. Moreover, an even higher percentage (60.8%) believed that RECIST 1.1 should always be applied in clinical practice. This finding reflects the highly selected and motivated population of professionals that responded to this survey, mainly imaging specialists involved in oncologic reporting. However, interestingly, responders from research institutions use RECIST criteria less frequently than those working in other health facilities and believe they should be used in real-world assessment to a lesser extent. A non-negligible percentage of responders (32.5%) use RECIST 1.1 criteria in clinical practice only in specific cases. In free text answers, some responders affirmed that they prefer using RECIST 1.1 criteria in patients with mixed response, although the latter represents one of the well-known limitations of the criteria themselves. Other reasons that drive the radiologists to use RECIST 1.1 in daily routine are discrepancies between imaging and clinical data or cases with a high tumor burden and involving different organs, where a qualitative assessment can be difficult or misleading.

According to this survey, the main strengths of RECIST 1.1 are increased standardization, clarity, and improved communication with the oncologist. Opposite, the main concern of responders is the increased reporting time (68.5%). Of note, a minority of responders (16%) believe that the process is less time-consuming. Preference might depend on radiologists’ experience and on the availability of specialized software providing lesion identification, annotation, and allowing retrieval of target lesions from previous time point for comparison. These tools can create automatic reports with visual disease timelines and reduce errors by an automatic check when specific criteria are applied leading to a reduction of reporting time [29].

The main limitation of this study is represented by the specific target population that was addressed, i.e., members of the European Society of Oncologic Imaging, most of whom are imaging doctors with an interest in oncologic imaging. Indeed, general radiologists or oncologists might have different perspectives. Furthermore, responders are just a small proportion of the ESOI community, a fact that might have altered the results.

Conclusion

To overcome the lack of rules, responders suggest either to use RECIST or personal criteria, usually a combination of unidimensional and bidimensional measurements of the most significant target lesions. Differently from RECIST, many responders suggest comparing the last time-point with the previous study, instead of baseline and nadir. A major concern of responders is that structured reporting is more time-consuming than a narrative report; this can be overcome by using specialized software. In conclusion, based on this survey, we believe it is important to define rules for the assessment of tumor response in clinical practice. The broader oncology community should take charge of their implementation.