Introduction

Although improvements in clinical symptoms and survival are considered the ultimate proof of the effectiveness of anticancer drugs, surrogate endpoints based on radiological measurements are increasingly used to assess therapeutic effects. Such radiological measurements provide early and objective information.

In 1979, the World Health Organisation (WHO) published objective tumour response criteria, which were adopted around the world [1, 2]. Due to differences in the interpretation and application of these criteria, a task force proposed a simplified set of standardised criteria in 2000: Response Evaluation Criteria in Solid Tumours (RECIST) [3]. Both WHO criteria and RECIST depend heavily on change in tumour size depicted on imaging studies. RECIST have largely replaced the WHO guidelines. Nevertheless, since their introduction RECIST have also received criticism. The RECIST Working Group acknowledged that RECIST would not solve all issues regarding the adequate monitoring of tumour response. Recently, they have published their revised RECIST guideline (version 1.1) to simplify, optimise and standardise the original criteria [4].

This review article discusses RECIST and their modifications and reviews the issues of concern and debate, focusing on those that are relevant to radiologists.

RECIST 1.0

RECIST 1.0 divides lesions into measurable and non-measurable lesions before the start of therapy [3]. Each patient must have at least one measurable lesion.

Measurable lesions are lesions that can be accurately measured and have a longest diameter at baseline of ≥10 mm on computed tomography (CT) or magnetic resonance imaging (MRI), or ≥20 mm on conventional radiographs. In RECIST 1.0, up to ten lesions are measured, with a maximum of five per organ (target lesions). The sum of the longest diameters of the target lesions (SLD) is calculated, and the same target lesions are measured at each time point.

Baseline SLD, measured ≤4 weeks before start of treatment, is the reference for assessment of tumour response. The ‘nadir’, the smallest SLD during treatment, is the reference for assessment of tumour progression.

The radiologist, who is responsible for selecting the target lesions, should choose the most appropriate ones: preferably the largest lesions, reflecting the overall tumour load (ideally covering all involved organs), and lesions that can be accurately measured and followed.

All other lesions are regarded as non-target lesions. These include: the remainder of the measurable lesions, lesions smaller than required for target lesions, cystic and bone lesions and truly non-measurable lesions, such as effusions, leptomeningeal disease and lymphangitis. The non-target lesions do not need to be measured, but the presence and extent need to be specified in the radiological report at each time point.

CT is the preferred imaging technique, but MRI and chest radiography are also allowed. The same imaging method should be applied throughout the study.

Four response categories for target lesions are defined as: complete response [(CR) complete disappearance of all lesions, confirmed at ≥4 weeks], partial response [(PR) a ≥30% decrease in SLD from baseline, confirmed at ≥4 weeks], progressive disease [(PD) a ≥20% increase in SLD from smallest SLD] and stable disease [(SD) neither PR nor PD].
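The RECIST 1.0 target-lesion categories can be made concrete with a small Python sketch. The function names are hypothetical, SLD values are assumed to be in millimetres, and the ≥4-week confirmation required for CR and PR is deliberately left out. PD is checked before PR because it is judged against the nadir rather than against the baseline.

```python
from typing import Sequence


def sld(longest_diameters_mm: Sequence[float]) -> float:
    """Sum of the longest diameters (SLD) of the target lesions, in mm."""
    return sum(longest_diameters_mm)


def recist10_target_response(baseline_sld: float, nadir_sld: float,
                             current_sld: float, all_disappeared: bool) -> str:
    """Target-lesion category using the RECIST 1.0 thresholds.

    baseline_sld: SLD at baseline (reference for response)
    nadir_sld:    smallest SLD recorded so far (reference for progression)
    The >=4-week confirmation of CR and PR is not modelled here.
    """
    if all_disappeared:
        return "CR"
    # PD: >=20% increase over the nadir; checked first because a lesion set
    # can still be below baseline and yet have progressed from its nadir.
    if current_sld >= 1.2 * nadir_sld:
        return "PD"
    # PR: >=30% decrease from the baseline SLD.
    if current_sld <= 0.7 * baseline_sld:
        return "PR"
    return "SD"


# Example: three target lesions shrink from 72 mm to 48 mm in total.
baseline = sld([23, 18, 31])
print(recist10_target_response(baseline, baseline, sld([15, 12, 21]), False))
```

A 48 mm follow-up SLD is below 70% of the 72 mm baseline (50.4 mm), so the example is classified as PR.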

The overall response in RECIST 1.0 is based on the response of target lesions, the response of non-target lesions and the appearance of new lesions. The response of non-target lesions can only confirm or downgrade the response category of the target lesions.

RECIST 1.0 contains an appendix with specifications for radiological imaging to ensure standardised protocols.

RECIST 1.1

The RECIST 1.0 criteria were initially widely adopted, but several questions and issues have arisen since their publication. The most important concern the potential role of modern imaging modalities, which may also provide a functional rather than anatomical response assessment; the replacement of unidimensional by volumetric tumour measurements; the application of RECIST to non-cytotoxic drugs; the optimal number of lesions to be measured; and the assessment of lymph nodes.

Another drawback is that RECIST are not applicable to patients with malignant lymphoma, for whom alternative international guidelines exist [5].

The RECIST Working Group assembled and analysed a large database of more than 6,500 patients with over 18,000 target lesions from 16 large trials conducted between 1993 and 2005. On the basis of this analysis and a review of the literature, the Working Group addressed the points of debate in the updated RECIST (version 1.1), published in January 2009 [4].

The major changes between RECIST versions 1.0 and 1.1 that are important for radiologists are summarised in Table 1. All issues are discussed below.

Table 1 Summary of the major changes between RECIST 1.0 and 1.1, relevant to radiologists (modified from Appendix I [4]) (LN lymph node, CR complete response, SLD sum of longest diameter, PD progressive disease, PET positron emission tomography)

Number of lesions to be measured

Because the number of ten target lesions in RECIST 1.0 had been chosen arbitrarily, the RECIST Working Group used its database to calculate retrospectively the effect on response and progression outcomes of assessing one, two, three or five target lesions instead of ten [6]. Assessment of three or five lesions did not change the overall response rate or progression-free survival.

A statistical simulation model also showed little difference between response assessments based on five and on ten target lesions, but a smaller number of lesions tended to overestimate the response rate [7].

In a computer analysis simulating all possible combinations of lesions from unidimensional lesion measurements, the variance in response assessment was decreased by 90% if at least four lesions were measured instead of only one [8].

Therefore, in RECIST 1.1 the maximum number of target lesions is five, with a maximum of two per organ [4].
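A greedy selection under the RECIST 1.1 caps (five lesions in total, two per organ) might look as follows. This is only an illustrative sketch: ranking by size alone is a simplifying assumption, whereas in practice the radiologist also weighs reproducibility and how representative a lesion is. The `Lesion` type and `select_target_lesions` function are hypothetical, and the input is assumed to contain only measurable lesions.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Lesion(NamedTuple):
    organ: str
    longest_diameter_mm: float


def select_target_lesions(measurable: List[Lesion], max_total: int = 5,
                          max_per_organ: int = 2) -> List[Lesion]:
    """Greedy size-based selection under the RECIST 1.1 caps.

    Simplification: real selection also considers reproducibility and how
    well each lesion represents the tumour load.
    """
    chosen: List[Lesion] = []
    per_organ = defaultdict(int)  # number of targets already taken per organ
    for les in sorted(measurable, key=lambda x: x.longest_diameter_mm,
                      reverse=True):
        if len(chosen) == max_total:
            break
        if per_organ[les.organ] < max_per_organ:
            chosen.append(les)
            per_organ[les.organ] += 1
    return chosen
```

With three liver lesions (50, 45, 40 mm), three lung lesions (30, 25, 20 mm) and one adrenal lesion (15 mm), the sketch keeps the two largest liver and lung lesions plus the adrenal lesion.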

Some phase III trials allow the inclusion of patients without measurable lesions if the primary endpoint is progression-free survival or time to progression rather than objective tumour response. The revised guideline includes additional notes on the assessment of PD in these patients and an additional table on overall response [4], but these are not discussed further in this article.

Measurements of target lesions

In principle, the longest diameter should always be measured in the axial plane, even if the level or orientation of the longest diameter of the lesion has changed on follow-up examinations [4].

The margin of target lesions should be carefully identified and measurements should include the whole lesion (Fig. 1).

Fig. 1a, b

CT image in the portal-venous phase of a 34-year-old man with liver metastasis of an adrenocortical carcinoma. The hyperdense rim of this liver metastasis (arrows) is only faintly visible on soft tissue settings (a), but better appreciated when window width and window level are adjusted to the liver (b). Not only the necrotic centre but also the rim should be included in measuring this lesion

If a lesion breaks into separate fragments on follow-up, the longest diameters of the fragments should be added together; similarly, if lesions coalesce, the longest diameter of the merged lesion should be measured [4]. With isotropic CT or MRI data, lesions may be measured in the coronal or sagittal plane if this is more appropriate [4].

On CT, it is recommended for purposes of consistency and radiation protection to reconstruct 5-mm slices (or less) contiguously. To avoid partial volume averaging effects and, therefore, inconsistent measurements of the same lesion between serial CT examinations, a measurable lesion should be at least twice the slice thickness at baseline, i.e. 10 mm if slice thickness is 5 mm. This is also applicable for MRI examinations. For chest X-rays, the minimum target lesion size remains 20 mm, provided that the lesion is surrounded by pulmonary parenchyma [4]. Watanabe et al. [9] confirmed that a minimum lesion size increased the reproducibility of unidimensional measurements in non-small cell lung cancer patients.
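The size thresholds in this paragraph can be expressed as a small helper. It is a sketch under two assumptions: the 10 mm minimum still applies when slices are thinner than 5 mm, and the caller has already checked that a chest X-ray lesion is surrounded by lung parenchyma. The function names are hypothetical, and lymph nodes, which are measured by their short axis, are not covered.

```python
from typing import Optional


def min_measurable_size_mm(modality: str,
                           slice_thickness_mm: Optional[float] = None) -> float:
    """Minimum baseline longest diameter for a measurable (non-nodal) lesion.

    CT/MRI: at least twice the slice thickness, never below 10 mm
    (assumption: the 10 mm floor holds for slices thinner than 5 mm).
    Chest X-ray: 20 mm, for lesions surrounded by lung parenchyma.
    """
    if modality in ("CT", "MRI"):
        if slice_thickness_mm is None:
            raise ValueError("slice thickness is required for CT/MRI")
        return max(10.0, 2.0 * slice_thickness_mm)
    if modality == "chest X-ray":
        return 20.0
    raise ValueError(f"unsupported modality: {modality}")


def is_measurable(longest_diameter_mm: float, modality: str,
                  slice_thickness_mm: Optional[float] = None) -> bool:
    """True if the lesion meets the minimum size for the given modality."""
    return longest_diameter_mm >= min_measurable_size_mm(modality,
                                                         slice_thickness_mm)
```

For example, a 12 mm lesion is measurable on 5 mm CT slices, but not if the slices are 8 mm thick (minimum 16 mm).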

If, at follow-up, a target lesion is still visible but ‘too small to measure’, it should be assigned a default measurement of 5 mm to avoid false classification of response or PD secondary to inaccurate measurements [4]. Because inaccurate measurements of small residual lesions may also lead to false classification of PD, RECIST 1.1 requires for PD not only an increase in the SLD of target lesions of ≥20% over the smallest SLD during the study (unchanged from RECIST 1.0) but also an absolute increase in SLD of ≥5 mm [4].
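The effect of the 5 mm default and of the additional absolute criterion is easiest to see with small numbers. The sketch below uses hypothetical function names and assumes that a target that is visible but too small to measure is entered as `None`, while a target that has disappeared completely is entered as 0.

```python
from typing import Optional, Sequence

TOO_SMALL_DEFAULT_MM = 5.0  # default for a target 'too small to measure'


def follow_up_sld(measurements_mm: Sequence[Optional[float]]) -> float:
    """SLD at follow-up; None marks a target that is too small to measure.

    A target that has disappeared completely should be entered as 0.0.
    """
    return sum(TOO_SMALL_DEFAULT_MM if m is None else m
               for m in measurements_mm)


def is_target_pd_recist11(current_sld: float, nadir_sld: float) -> bool:
    """PD needs a >=20% relative AND a >=5 mm absolute increase over the nadir."""
    increase = current_sld - nadir_sld
    return increase >= 0.2 * nadir_sld and increase >= 5.0
```

For example, with a nadir SLD of 10 mm, a follow-up SLD of 13 mm (an 8 mm lesion plus a 5 mm default) is a 30% increase and meets the relative criterion, but it is not PD under RECIST 1.1 because the absolute increase is only 3 mm.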

An exception to measuring the longest diameter applies to patients with malignant pleural mesothelioma, in whom the non-spherical growth pattern has been reported to make reproducible RECIST measurements difficult [10]. Byrne et al. [11] proposed modified RECIST criteria, in which tumour thickness perpendicular to the chest wall, rather than the longest diameter, is measured at fixed positions related to anatomical landmarks (Fig. 2). They found good correlation with outcome, which was subsequently confirmed [12].

Fig. 2

On this CT image of the chest of a 30-year-old female patient with mesothelioma, the longest diameter of the tumour is represented by the arrow, but the measurement perpendicular to the chest wall (dotted line) better represents tumour volume and is therefore more reliable

Unidimensional and volumetric measurements

Whether unidimensional measurements adequately represent total tumour burden has often been questioned. In a retrospective analysis of more than 4,600 patients in 14 trials [3], based on the model proposed by James et al. [13], the RECIST Working Group found no difference in response and progression rates between unidimensional measurements and the bidimensional measurements used in the WHO criteria. This was confirmed by others [14–16]. However, a retrospective statistical simulation in which tumour shape was varied showed decreasing concordance between uni- and bidimensional measurements as lesions became more irregular [17], and Schwartz et al. [18] also found greater discordance between uni- and bidimensional measurements in more ellipsoid lesions than in spherical lesions.

Another concern is the inter- and intra-observer variability of manual measurements. In general, inter-observer variability is greater than intra-observer variability [19], especially in irregular and poorly defined lesions [20]. This may lead to substantial differences in response assessment [21].

With the 3D software tools available on modern CT and MRI equipment, the reproducibility of semi-automated volumetric measurements of metastases has been extensively investigated. Wormanns et al. [22] showed good repeatability of semi-automated volumetric measurements of small lung nodules, but an irregular shape degrades the segmentation process and contributes to larger variability [23]. Marten [24] reported significantly lower inter- and intra-observer variability and better treatment response assessment for semi-automated than for manual volumetric measurements in small pulmonary metastases. Volumetric measurements have also proved reliable in areas of low contrast-to-noise ratio, such as the liver [25], lymph nodes [26] and brain [27]. Although these results are promising, all of these studies were performed in single institutions with a variety of software tools that are not widely available. At present, there are no robust data to justify replacing unidimensional measurements with volumetric assessment [28].

Ultrasound

Ultrasound is operator dependent; measurements on ultrasound are subjective and they cannot be reproduced for independent review. Therefore, it should only be used as an adjunct to clinical measurements, e.g. to measure superficial lymph nodes and subcutaneous lesions [4]. There has been concern from paediatric radiologists about radiation exposure because of the repeated use of CT in paediatric patients, whose prognosis may be fair to good [29]. The RECIST Working Group acknowledges the widespread use of ultrasound in daily paediatric oncology practice, but in the case of a phase II trial of a potential new anticancer drug, reproducibility of measurements is mandatory, thereby disqualifying ultrasound [4, 30].

Assessment of lymph nodes

RECIST 1.0 [3] does not specifically address the issue of lymph nodes. This implies that the longest axial diameter of lymph nodes should be measured according to RECIST 1.0. Since lymph nodes are anatomical structures that are normally shown by CT in various locations, a metastatic deposit in a lymph node that disappears completely will often still be visible as a normal sized lymph node on follow-up, thereby precluding CR, as was pointed out by the International Cancer Imaging Society [31]. Furthermore, it is well recognised that the short axis of lymph nodes is more reproducible than the long axis [32] and is a better predictor than the long axis for the presence of metastatic disease [33] and the response to chemotherapy [18].

RECIST 1.1 [4] addresses these issues. A lymph node is considered metastatic if its short axis is ≥10 mm at baseline. It is measurable (and may serve as a target lesion) if the short axis is ≥15 mm. A lymph node with a short axis ≥10 mm but <15 mm at baseline is considered a non-target lymph node. If the short axis of a lymph node drops below 10 mm on follow-up studies, it is no longer considered pathological, although continued measurement is needed to assess progression of these nodes on follow-up. Consequently, a patient can be classified as CR even though the SLD is not zero: if one or more target lesions are lymph nodes, all of these nodes need only have a short axis <10 mm on follow-up, provided that all non-nodal target lesions have disappeared (Fig. 3) [4].
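The lymph node thresholds, and the consequence that CR does not require an SLD of zero, can be sketched as follows. Function names and return labels are hypothetical; nodal targets are assumed to be recorded by short axis and non-nodal targets by longest diameter, both in millimetres.

```python
from typing import Sequence


def classify_node_at_baseline(short_axis_mm: float) -> str:
    """Baseline category of a lymph node from its short axis (RECIST 1.1)."""
    if short_axis_mm >= 15.0:
        return "measurable (target-eligible)"
    if short_axis_mm >= 10.0:
        return "pathological, non-target"
    return "normal"


def target_lesions_in_cr(nodal_short_axes_mm: Sequence[float],
                         non_nodal_diameters_mm: Sequence[float]) -> bool:
    """CR of target lesions: every non-nodal target has disappeared (0 mm)
    and every nodal target has a short axis below 10 mm."""
    return (all(d == 0 for d in non_nodal_diameters_mm)
            and all(s < 10.0 for s in nodal_short_axes_mm))
```

Two nodal targets shrinking to short axes of 8 and 6 mm, with no non-nodal targets, give CR although the SLD is still 14 mm.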

Fig. 3

This 67-year-old male patient with non-Hodgkin lymphoma had enlarged target (arrows) and non-target (arrowheads) retroperitoneal lymph nodes on this CT image at baseline (a). After chemotherapy (b), these lymph nodes are still visible, but of normal size, categorising this patient as CR according to RECIST 1.1

The RECIST Working Group has applied these new criteria to the patients in their database [34]. As expected, the percentage of patients with CR was higher under RECIST 1.1, because some patients classified as PR under RECIST 1.0 became CR. When only lymph nodes were investigated, there was also a 3.0% increase in overall response compared with RECIST 1.0. This is explained by a greater size reduction of the short axis of lymph nodes (measured in RECIST 1.1) than of the long axis (measured in RECIST 1.0).

Although the normal size of lymph nodes varies considerably depending on their location [32, 35] and a short axis over 10 mm is not specific for metastatic disease [36], the RECIST Working Group considers a short axis of 10 mm an acceptable cut-off point between pathological and normal and has chosen this measurement for reasons of simplicity [4].

Specifications on bone lesions and cysts

In general, bone lesions are not considered ‘measurable’, since often only change in appearance and not in size is noted on follow-up CT (Fig. 4). According to RECIST 1.1, only lytic or mixed lytic-blastic bone lesions with an identifiable soft tissue component may be used as target lesions, provided that the soft tissue component meets the criteria for measurability [4].

Fig. 4a–d

CT images of the spine in a 50-year-old female patient with bone metastases of breast carcinoma. a At baseline, there is an osteolytic lesion in a thoracic vertebral body (arrow), and no visible metastases in one of the lumbar vertebrae (b). After chemotherapy, the thoracic osseous lesion has not changed in size, but has become completely osteoblastic (arrow in c), representing a good response. The ‘new’ sclerotic lesions in the lumbar vertebra (arrowheads in d) are considered responding small osteolytic metastases that the baseline CT failed to identify. This patient also had a PR in soft tissue lesions (not shown)

Skeletal scintigraphy, fluorodeoxyglucose positron emission tomography (FDG-PET) or conventional radiographs can only confirm the presence or disappearance of osseous lesions. MRI has been shown to be useful for monitoring bone metastases in prostate cancer patients during chemotherapy [37], but this has not yet been validated in large trials.

In RECIST 1.0, all cystic lesions are regarded as non-measurable [3]. In RECIST 1.1, cystic and necrotic metastases can be considered as measurable lesions, although non-cystic lesions are preferred if present [4].

Unequivocal progression of non-target lesions

The increase in size of non-target lesions is addressed in detail in RECIST 1.1. A patient with PR or SD based on target lesion response can be categorised as PD for non-target lesions, and therefore as PD overall, only if the non-target lesions increase such that the overall tumour burden increases substantially (see also Table 2). Quantification of (truly non-measurable) non-target lesions is by definition impossible, but the total increase in ‘volume’ of non-target lesions should be comparable to the increase required for target lesions, for example at least 73% (equivalent to an increase in SLD of 20%), to categorise them as PD. A modest increase in one or a few non-target lesions is usually not sufficient to qualify the non-target lesion response as PD and should not lead to discontinuation of treatment (Fig. 5) [4].
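The 73% figure follows from a short derivation, assuming (as a simplification) that lesions are spherical, so that volume scales with the cube of the diameter d:

```latex
V = \frac{\pi}{6}\,d^{3}
\quad\Longrightarrow\quad
\frac{V_{1}}{V_{0}} = \left(\frac{d_{1}}{d_{0}}\right)^{3} = 1.2^{3} = 1.728
\quad\Longrightarrow\quad
\Delta V \approx +73\,\%
```

A 20% increase in diameter therefore corresponds to an increase in volume of about 73%.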

Fig. 5

This 50-year-old woman with a uterine sarcoma has several target pulmonary metastases (arrowheads) on these CT images of the chest at baseline (a, b). After chemotherapy (c, d), the target lesions have disappeared or decreased in size considerably, classifying the target lesions as PR. Only one non-target lesion (arrow in b, d) has increased in size, but this is insufficient for ‘unequivocal progression of non-target lesions’. The non-target lesions should be categorised as non-PD and the overall response is therefore PR

Table 2 Overall response in RECIST 1.1 [4] (CR complete response, PR partial response, SD stable disease, PD progressive disease, NE non-evaluable)

New lesions

Any new malignant lesion on follow-up implies PD. The finding of a new lesion should, therefore, be unequivocal, truly representing a new metastatic deposit and not attributable to other causes.

Equivocal new lesions may be due to a change in the type of examination (e.g. MR after CT) or technique (e.g. 3-T rather than 1-T MRI). An equivocal small new lesion should be followed to confirm whether it truly represents a new lesion [4]. In RECIST 1.1, FDG-PET may complement CT in assessing PD when a new lesion is suspected. A negative FDG-PET at baseline with a positive FDG-PET at follow-up is a sign of PD based on a new lesion. A lesion is considered FDG-PET positive if it is FDG-avid, with an uptake greater than twice that of the surrounding tissue on the attenuation-corrected image. If no FDG-PET was performed at baseline, the interpretation depends on the CT findings [4].
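The FDG-PET rules in this paragraph can be sketched in Python. The uptake values are assumed to be read from the attenuation-corrected image in the same units for lesion and surrounding tissue. Function names and return strings are hypothetical, and the final branch is a simplification, since the rules quoted above only cover the baseline-negative and no-baseline cases.

```python
from typing import Optional


def is_fdg_positive(lesion_uptake: float, background_uptake: float) -> bool:
    """FDG-avid lesion: uptake more than twice that of surrounding tissue."""
    return lesion_uptake > 2.0 * background_uptake


def pet_new_lesion_call(baseline_positive: Optional[bool],
                        follow_up_positive: bool) -> str:
    """baseline_positive is None when no FDG-PET was done at baseline."""
    if baseline_positive is None:
        return "no baseline PET: interpret together with CT"
    if not baseline_positive and follow_up_positive:
        return "PD (new lesion)"
    # Simplification: other combinations are not treated as a
    # PET-detected new lesion.
    return "no PET-based new lesion"
```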

‘New’ sclerotic bone lesions may represent healing lytic lesions that were not detected on previous examinations (Fig. 4).

A ‘new’ lesion may also be detected in an area of the body that was not examined at baseline, e.g. a brain metastasis. In such cases, this lesion is considered a truly new lesion according to RECIST 1.1, converting the response category to PD [4]. It is important, therefore, to image at baseline all predilection sites for metastases in the cancer type studied and also areas based on signs and symptoms of the individual patient.

Overall response

Table 2 contains the overall response for target and non-target lesions in RECIST 1.1, including the response in case of new lesions [4]. Compared with RECIST 1.0, this table is updated with the response category ‘NE’: non-evaluable. This is to be assigned at a time point during follow-up if not all target lesions are evaluated at that time point.
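The overall-response logic summarised in Table 2 can be written as a short lookup. This sketch follows the RECIST 1.1 overall-response table for patients with target lesions [4]; the category strings and the function name are our own hypothetical encoding and should be checked against the guideline before any use.

```python
def overall_response(target: str, non_target: str, new_lesions: bool) -> str:
    """Overall response at one time point for patients with target lesions.

    target:     "CR", "PR", "SD", "PD" or "NE"
    non_target: "CR", "non-CR/non-PD", "PD" or "NE"
    """
    # Any progression, or any new lesion, overrides everything else.
    if new_lesions or target == "PD" or non_target == "PD":
        return "PD"
    if target == "CR":
        # Persisting or unevaluated non-target disease downgrades CR to PR.
        return "CR" if non_target == "CR" else "PR"
    if target in ("PR", "SD"):
        return target
    # Not all target lesions evaluated and no progression: non-evaluable.
    return "NE"
```

For the patient in Fig. 5 (target PR, non-target non-PD, no new lesions) the function returns PR.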

Residual disease may be difficult to distinguish from normal or benign tissue, especially in some tumour types such as non-seminomatous germ cell tumours. In these cases, tissue confirmation (fine needle aspiration or biopsy) or FDG-PET may be used to upgrade the response status to CR [4].

Additional imaging specifications

Appendix II of RECIST 1.1 contains many details on the different imaging modalities and how to perform them [4]. Many issues have already been addressed or discussed above. Other relevant specifications are [4]:

  • Intravenous contrast media administration in CT is mandatory. Although no specific instructions are given on the type, dose and rate of intravenous contrast media and the timing of CT data acquisition, it is stated that typically, CT acquisition should be performed during the portal venous phase and that a single phase is usually sufficient. However, triphasic CT protocols are recommended for hepatocellular and neuroendocrine tumours (Fig. 6). A method of contrast media administration should be chosen to demonstrate metastases to best effect and the method should be used consistently on follow-up CT studies. In case of contra-indications to intravenous contrast media, the decision on whether to perform non-contrast enhanced CT or MRI should depend on the tumour type, the anatomic location of the disease and the findings on prior studies, to allow for optimal comparison and reproducibility of results. Non-contrast chest CT is preferred over MRI or chest X-ray.

  • Administration of oral contrast media is recommended for abdominal CT.

  • If FDG-PET is used in trials, it should include whole-body images 60 min after injection with attenuation correction. Most importantly, the method should be consistent throughout the trial.

  • PET/CT may be used if its CT component is of similar diagnostic quality to a dedicated CT, i.e. performed with oral and intravenous contrast media.

  • MRI acquisition parameters should be specified, optimised and kept consistent throughout the trial. Axial T1- and T2-weighted and gadolinium-chelate enhanced sequences are recommended, preferably with a breath-hold technique and on the same (type of) MR system, although no further specifications are given. Measurements should be performed on the same sequence in serial studies.

Fig. 6

In this 40-year-old man with hepatocellular carcinoma, a CT image of the liver in the late arterial phase shows a hypervascular tumour in the hepatic dome (arrows in a) that is well delineated from the surrounding parenchyma, whereas the same lesion is barely discernible from the liver parenchyma in the portal-venous phase (arrows in b) due to early washout of contrast media

Other issues of debate

New types of anticancer agents and local treatment

New molecular targeted agents often induce growth inhibition rather than tumour regression [38, 39], resulting in limited objective response rates according to RECIST, but improved survival in a number of cancer types [40–42].

Therefore, Llovet et al. [43] have proposed amendments to RECIST for patients with hepatocellular carcinoma (HCC): to assess the response of target lesions, only viable tumour, i.e. tumour showing enhancement in the arterial phase, should be measured. The cut-off percentages for overall response are similar to those in RECIST. New hepatic nodules are considered HCCs if they are ≥10 mm and show hypervascularisation in the arterial phase and wash-out in the portal-venous or delayed phase, or if they show interval growth of ≥10 mm. For lesions that are not typical HCCs, conventional RECIST criteria are applied.
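The rule for new hepatic nodules can be written as a simple predicate. Parameter names are hypothetical, and the imaging features are assumed to be assessed by the reader and passed in as booleans.

```python
def is_new_hcc_nodule(size_mm: float, arterial_hypervascular: bool,
                      washout: bool, interval_growth_mm: float = 0.0) -> bool:
    """New hepatic nodule counted as HCC under the Llovet et al. amendment.

    washout: wash-out in the portal-venous or delayed phase.
    """
    # Typical pattern: >=10 mm with arterial hypervascularisation and wash-out.
    typical_enhancement = size_mm >= 10.0 and arterial_hypervascular and washout
    # Alternatively: interval growth of >=10 mm.
    return typical_enhancement or interval_growth_mm >= 10.0
```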

Choi et al. [44] showed that, in patients with gastrointestinal stromal tumours (GISTs) treated with imatinib mesylate (Glivec), a decrease in tumour volume is likewise not the optimal indicator of antitumour activity. Their modified response criteria for GIST patients (a decrease in GIST size of ≥10% or a decrease in tumour density on CT of ≥15%) correlated better with time to progression than response according to RECIST (Fig. 7); these criteria were subsequently validated [45].
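The Choi thresholds quoted above translate into a short check. This sketch covers only the response definition stated here, not the full Choi criteria; it assumes that size is a sum of longest diameters in millimetres and density a mean attenuation in Hounsfield units with a positive baseline value, and its names are hypothetical.

```python
def choi_partial_response(baseline_size_mm: float, follow_up_size_mm: float,
                          baseline_hu: float, follow_up_hu: float) -> bool:
    """Response by the Choi thresholds: size down >=10% OR density down >=15%."""
    size_drop = (baseline_size_mm - follow_up_size_mm) / baseline_size_mm
    density_drop = (baseline_hu - follow_up_hu) / baseline_hu
    return size_drop >= 0.10 or density_drop >= 0.15
```

A lesion that shrinks only 5% but falls from 60 to 45 HU (a 25% density decrease), as in the cystic-looking metastases of Fig. 7, qualifies as a response, whereas RECIST would classify the size change alone as SD.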

Fig. 7

CT images in the portal-venous phase of liver metastases of a GIST in a 44-year-old man at baseline (arrows in a). The metastases decrease somewhat in size after Glivec (arrows in b), but the most striking difference is a decrease in density, giving the metastases a cystic appearance. This is considered a good response according to the Choi criteria

This indicates that these new cytostatic drugs may require functional and molecular imaging to assess, for example, metabolism, perfusion or diffusion characteristics, rather than anatomical assessment of tumour size.

RECIST cannot be applied to evaluate local treatment of lesions by radiofrequency ablation or cryoablation. In successful ablation, the ablated area is larger than the original tumour and totally includes it, with no or subtle rim enhancement. Often this area gradually decreases in size over time and the rim enhancement disappears. Nodular enhancement suggests residual or recurrent tumour [46, 47].

Alternative imaging modalities

FDG-PET or PET/CT

In recent decades, FDG-PET and PET/CT have been shown in many, often single-centre, trials to provide earlier and more accurate response assessment than CT: it may allow for treatment monitoring after the first or second cycle of chemotherapy (sometimes even within hours after administration of anticancer drugs) and it can differentiate viable tumour from necrosis. However, varying approaches for acquisition and image analysis and the absence of generally accepted criteria make comparison of study results difficult [48]. In Hodgkin’s disease and high-grade non-Hodgkin lymphoma, FDG-PET results have been shown to be very accurate and correlate with survival [49]. FDG-PET has already been incorporated into international response criteria on lymphoma [5]. In non-small cell lung cancer, colorectal and breast cancer, the strength of FDG-PET may lie in early identification of non-responders to chemotherapy rather than assessing (complete) response [50–52]. The main challenge is to implement and maintain standards in larger multi-centre trials [53]. As a starting point for further validation of FDG-PET in multi-centre studies and meta-analyses, Wahl et al. [54] have proposed guidelines for the standardisation of response criteria for FDG-PET, the so-called PET Response Criteria in Solid Tumours (PERCIST).

New developments in MRI

Dynamic contrast-enhanced MRI (DCE-MRI) has been investigated to quantify the effect of drugs on tumour angiogenesis and vascular disruption. Early results show that this is feasible [55, 56], but at present a variety of different biomarkers can be measured with DCE-MRI. A review of the literature has pointed out that acquisition and analysis are very complex and concluded that further research and validation, correlating these results with clinical outcome measures, are needed before DCE-MRI can serve as a new surrogate endpoint [57].

Experience in diffusion-weighted MRI in monitoring treatment response has so far been limited [58]. Recommendations on standardisation of the application have recently been published [59].

Perfusion CT

Perfusion CT is also very promising in monitoring treatment response to anti-angiogenic drugs [60, 61], but at this moment a variety of mathematical methods (including compartmental and deconvolutional analysis) and perfusion parameters (such as blood flow, blood volume, mean transit time and time to peak) are being used, implying a lack of standardisation. Therefore, much more research is needed to define and validate its role [62].

In conclusion, the RECIST Working Group believes there is not sufficient validation or standardisation of FDG-PET and other functional imaging modalities to substitute for the anatomical assessment described in RECIST. It would require a series of prospectively conducted multi-centre clinical trials and a formal meta-analysis to validate FDG-PET(/CT) as an appropriate end-point [28].

Conclusions

RECIST 1.1 has clarified many issues that had arisen after the publication of RECIST 1.0, e.g. by reducing the number of target lesions, specifying the assessment of lymph nodes and new lesions, and providing detailed imaging specifications. At present, the data on molecular imaging and volumetric tumour measurement are not sufficiently validated to be incorporated into response assessment criteria; therefore, unidimensional anatomical assessment of tumour burden remains the best available surrogate endpoint. However, given the emerging data, functional imaging methods are expected to be incorporated in the next RECIST update.