Introduction

Follicular lymphoma is the second most frequently occurring non-Hodgkin lymphoma subtype [1]. In contrast to the 2007 Revised Response Criteria for Malignant Lymphoma [2], which considered computed tomography (CT) the standard method for therapy response assessment in follicular lymphoma, the recently published Lugano Classification recommends using 18F-fluoro-2-deoxy-d-glucose positron emission tomography (FDG-PET) in follicular lymphoma [3, 4]. Moreover, the current guidelines also advocate the use of interim FDG-PET during treatment, although they do not support changing therapeutic strategy on the basis of interim FDG-PET findings outside clinical trials [3, 4]. Despite the recommendations in the new guidelines, there is still considerable debate on the utility of FDG-PET for therapy response assessment in follicular lymphoma and, not surprisingly, wide variability in the use of FDG-PET in these settings among different institutions. This debate and the variability in the application of FDG-PET in follicular lymphoma reflect the need for evidence-based data. Although individual studies can provide more insight, there may be variability among individual studies with regard to internal validity (i.e., risk of bias) and external validity (i.e., generalizability of the study results). To overcome the limitations of individual studies, a systematic review is required. The purpose of this study was therefore to systematically review published data on the prognostic value of interim and end-of-treatment FDG-PET in follicular lymphoma.

Materials and methods

Search strategy

A systematic literature search was performed in the PubMed/MEDLINE database using the search terms that are displayed in Table 1. The search covered the period from start date to March 20, 2015. The reference lists of all studies that were finally included were screened for potentially suitable articles that were not detected by the initial search.

Table 1 Search strategy and results as on March 20, 2015

Study selection

Original studies that reported on the prognostic value of interim or end-of-treatment FDG-PET in indolent (grade 1-3A) follicular lymphoma during or after first-line therapy were included. Only studies written in English, German, Italian, French, Spanish, or Dutch were included. Studies without patient data (guidelines, reviews, meta-analyses, editorials, and letters), studies including fewer than ten patients with follicular lymphoma, duplicate studies, conference abstracts, studies in which data of patients with follicular lymphoma could not be extracted separately from those of patients with other lymphoma subtypes, studies that included patients with refractory disease, studies in which patients received second-line therapy, and studies that tailored the therapy on the basis of the interim or end-of-treatment FDG-PET result were excluded. Titles and abstracts of all studies that were obtained by the PubMed/MEDLINE search were scrutinized, and clearly ineligible studies were excluded at this stage. The full-text versions of the remaining articles were then evaluated, and a final decision was made as to whether or not a study met the criteria to be included in the systematic review.

Study quality

The Quality In Prognosis Studies (QUIPS) tool [5], which was developed to assess the risk of bias in prognostic studies, was used for the methodological quality assessment. The QUIPS tool assesses the risk of bias in six different domains: study participation (“does the study population adequately represent the population of interest?”), study attrition (“do the study data available [i.e., patients not lost to follow-up] adequately represent the study sample?”), prognostic factor measurement (“is the prognostic factor measured in a similar way for all participants?”), outcome measurement (“is the outcome of interest measured in a similar way for all participants?”), study confounding (“have important potential confounding factors appropriately been accounted for?”), and statistical analysis and reporting (“is the statistical analysis appropriate, and are all primary outcomes reported?”) [5]. Risk of bias was scored “low,” “moderate,” or “high” for each of these six domains.

Data analysis

Results of included studies were extracted and descriptively analyzed.

Results

Literature search

The literature search yielded a total of 651 articles (Table 1). After screening titles and abstracts, 41 potentially relevant articles remained and were retrieved and read in full-text format. Of these 41 articles, 16 were excluded because the prognostic value of interim or end-of-treatment FDG-PET status was not reported, 6 were excluded because of inclusion of fewer than 10 patients with follicular lymphoma, 4 were excluded because data of patients with follicular lymphoma could not be separated from those of patients with other lymphoma subtypes, 4 were excluded because of (partial) inclusion of patients with refractory disease, 1 was excluded because it only included patients with transformed lymphomas, 1 was excluded because it reported data that were also used in another (larger) study, and 1 was excluded because it was written in Japanese. Thus, eight studies remained. Of these eight studies, three reported data on the prognostic value of interim FDG-PET and all eight reported data on the prognostic value of end-of-treatment FDG-PET. Characteristics and applied methodology of the eight included studies are displayed in Tables 2 and 3.
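
The study flow described above can be checked with simple arithmetic. The following Python sketch is only an illustration that re-tallies the exclusion counts quoted in the text; the variable names and the shortened reason labels are assumptions made for this example and are not part of the original analysis.

```python
# Re-tally of the full-text exclusions reported in the text.
# Reason labels are shortened paraphrases (assumed wording for illustration).
full_text_assessed = 41

exclusions = {
    "prognostic value of FDG-PET not reported": 16,
    "fewer than 10 follicular lymphoma patients": 6,
    "follicular lymphoma data not separable": 4,
    "patients with refractory disease included": 4,
    "transformed lymphomas only": 1,
    "data also used in a larger study": 1,
    "written in Japanese": 1,
}

total_excluded = sum(exclusions.values())
included = full_text_assessed - total_excluded

print(f"Excluded after full-text review: {total_excluded}")
print(f"Included in the systematic review: {included}")
```

The totals agree with the text: 33 full-text articles excluded and eight studies included.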

Table 2 Characteristics of included studies and patients
Table 3 CT and FDG-PET acquisition/interpretation and patient follow-up methods

Methodological quality assessment

Results of the methodological quality assessment using the QUIPS tool are shown in Table 4. There was high risk of bias in one study for both the domains study participation and study confounding because this study only included cases that were clearly positive or negative and excluded cases that were not unequivocally positive or negative [7]. There was high risk of bias for the domain prognostic factor measurement in four studies [7, 8, 10, 13] because these studies did not report clear criteria for FDG-PET positivity/negativity, and there was moderate risk of bias for the domain prognostic factor measurement in three other studies because one study [9] did not report which PET system (modern hybrid PET/CT or diagnostically inferior stand-alone PET) was used for FDG imaging and two studies [6, 12] reported that (some of) the included patients underwent stand-alone FDG-PET. Finally, there was high risk of bias for outcome measurement in all eight studies [6–13], because none of the eight studies reported clear and reproducible criteria for disease progression/relapse/end of progression-free survival (PFS), three studies [6, 7, 11] did not report whether follow-up visits/examinations were standardized or similar in both the FDG-PET negative and positive group, and five of eight studies [7, 8, 11–13] failed to provide data on the overall survival (OS).
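
The domain-level ratings named in this paragraph can be tabulated as a consistency check. The Python sketch below is a minimal illustration that encodes only the ratings stated in the text, keyed by reference number; ratings the text does not mention are labelled "not stated" rather than guessed, and the data structure and the `tally` helper are assumptions made for this example (Table 4 remains the complete assessment).

```python
from collections import Counter

# Reference numbers [6]-[13] of the eight included studies.
STUDIES = range(6, 14)

# Only ratings explicitly stated in the text are encoded here.
stated = {
    "study participation": {7: "high"},
    "study confounding": {7: "high"},
    "prognostic factor measurement": {
        7: "high", 8: "high", 10: "high", 13: "high",
        6: "moderate", 9: "moderate", 12: "moderate",
    },
    "outcome measurement": {study: "high" for study in STUDIES},
}

def tally(domain):
    """Count ratings per level for one domain (hypothetical helper)."""
    ratings = stated[domain]
    return Counter(ratings.get(study, "not stated") for study in STUDIES)

for domain in stated:
    print(domain, dict(tally(domain)))
```

For prognostic factor measurement this gives four studies at high risk, three at moderate risk, and one whose rating is not mentioned in the text.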

Table 4 Quality assessment of included studies according to the QUIPS tool [5]

Interim FDG-PET

The results of the three studies reporting on the prognostic value of interim FDG-PET are shown in Table 5. The proportion of interim FDG-PET negative and positive patients ranged from 64 to 76 % and 24.3 to 36.4 %, respectively. All three studies investigated the association of the interim FDG-PET result with PFS, with two studies reporting no significant difference in PFS between FDG-PET positive and negative patients [6, 12], whereas one study reported FDG-PET positive patients to have a significantly worse PFS than FDG-PET negative patients [9]. Only two of three studies investigated the association of the interim FDG-PET result with OS, with both studies reporting no significant difference in OS between FDG-PET positive and negative patients [6, 9]. Of interest, none of the studies corrected the interim FDG-PET result for the predictive value of the Follicular Lymphoma International Prognostic Index (FLIPI) in the OS analyses.

Table 5 Results of studies reporting on the prognostic value of interim FDG-PET

End-of-treatment FDG-PET

The results of the eight studies reporting on the prognostic value of end-of-treatment FDG-PET are displayed in Table 6. The proportion of end-of-treatment FDG-PET negative and positive patients ranged from 71.1 to 87.5 % and 12.5 to 28.9 %, respectively. Of the eight studies, six investigated the association of the end-of-treatment FDG-PET result with PFS, with five studies reporting FDG-PET positive patients to have a significantly worse PFS than FDG-PET negative patients [7–11] and one study reporting a non-significant trend towards a worse PFS for FDG-PET positive patients [6]. Yet, another two studies did not report whether there was any significant difference in PFS between FDG-PET positive and negative patients [12, 13]. Only four of eight studies investigated the association of the end-of-treatment FDG-PET result with OS, with three studies reporting FDG-PET positive patients to have a significantly worse OS than FDG-PET negative patients [6, 9, 10], whereas one study did not report whether there was any significant difference in OS between FDG-PET positive and negative patients [7]. Of interest, none of the studies corrected the end-of-treatment FDG-PET result for the predictive value of the FLIPI in the OS analyses.

Table 6 Results of studies reporting on the prognostic value of end-of-treatment FDG-PET

Discussion

This systematic review included three studies on the prognostic value of interim FDG-PET and eight studies on the prognostic value of end-of-treatment FDG-PET in follicular lymphoma. The results show that there is a lack of data supporting the use of interim FDG-PET for the evaluation of first-line therapy in follicular lymphoma, given that interim FDG-PET was reported not to be predictive of PFS in two of three studies and was not found to be predictive of OS in the two available studies on this topic. On the other hand, end-of-treatment FDG-PET seems to have some predictive value based on the available studies, with five studies reporting a significant and one study reporting a nearly significant association with PFS and three studies reporting a significant relationship between end-of-treatment FDG-PET status and OS.

Importantly, however, these results should be interpreted very cautiously due to the overall poor methodological quality of included studies. There was high risk of bias with regard to study participation and confounding in one study (due to the exclusion of equivocal FDG-PET cases), moderate and high risk of bias with regard to prognostic factor measurement in three and four studies, respectively (due to the [possible] use of diagnostically inferior stand-alone PET systems and lack of reporting of FDG-PET interpretation criteria, respectively), and high risk of bias with regard to outcome measurement in all eight studies. The latter is due to several important reasons, the first being the fact that the reference standard may have been different between FDG-PET positive and negative patients in three studies. This is of particular concern in retrospective studies, because it is not uncommon that patients with residual disease according to FDG-PET are monitored more closely, with more follow-up visits and more follow-up imaging and laboratory examinations, which will result in an earlier detection of (asymptomatic) progressions in this group compared to the FDG-PET negative group. This may have seriously biased the reported prognostic values of FDG-PET. Second, none of the included studies reported clear and reproducible criteria for disease progression/relapse/end of PFS. Various definitions are used for end of PFS in follicular lymphoma: first clinical/radiological evidence of disease after attaining complete remission (not applicable in end-of-treatment FDG-PET positive cases), increase in FDG avidity at follow-up FDG-PET, increase in tumor volume at follow-up CT, alterations in laboratory assessments, development of symptomatic disease, histological evidence of high-grade disease transformation, and initiation of second-line therapy or death.
It should also be noted that the majority of included studies only reported the PFS and not the OS or the interval between end-of-treatment FDG-PET and initiation of the next therapy cycle. End of PFS, when defined as (asymptomatic) disease progression detected by radiological studies, is not a sufficient indication for second-line treatment initiation or changing patient management otherwise. None of the studies reported data on the value of FDG-PET in predicting the interval between end-of-treatment and (re)development of symptomatic disease or the time interval between end-of-treatment and the initiation of a subsequent second-line therapy, which can be regarded as perhaps more important outcome measures. Thus, it should be emphasized that the definition of PFS that is employed by many studies is not the most important outcome measure in this disease entity, and that although the results of this systematic review may suggest that a positive end-of-treatment FDG-PET may be associated with a worse PFS, this does not support the use of end-of-treatment FDG-PET in routine clinical practice. Overall, the subjectivity of the outcome measure PFS and the fact that PFS does not provide clinically very relevant data underline that future studies should focus in particular on the OS or the time interval to develop subsequent symptomatic disease requiring second-line therapy. Finally, and surprisingly, none of the included studies corrected the end-of-treatment FDG-PET result for the predictive value of the FLIPI [14] in the OS analyses. Until end-of-treatment FDG-PET is proven to be predictive of OS, independently of the (less expensive) FLIPI, it cannot be recommended yet for routine use in clinical practice.

There are several factors that make the role of FDG-PET in follicular lymphoma less straightforward than that in aggressive non-Hodgkin lymphomas and Hodgkin lymphoma. First, follicular lymphoma is an incurable disease, as a result of which absence of viable tumor at FDG-PET never signifies absence of residual tumor cells but rather that the majority of the tumor bulk has responded and made the lymphoma undetectable by FDG-PET. Disappearance of lymphoma from the “FDG-PET radar” may indicate a long-term (asymptomatic) remission, but not cure. Second, since viable tumor at FDG-PET in follicular lymphoma is not a direct indication for treatment initiation [1], detection of residual disease at end-of-treatment FDG-PET may be less relevant in follicular lymphoma than in aggressive non-Hodgkin lymphoma and Hodgkin lymphoma, where residual disease after first-line therapy will often result in rapid additional diagnostic work-up with eventual consolidation radiation therapy or initiation of second-line therapies [3, 4, 15]. Third, in follicular lymphoma, bone marrow involvement is present in approximately 50 % of patients [14]. However, FDG-PET has proven to be inadequately sensitive for the detection of bone marrow involvement in follicular lymphoma [11, 16]. This in turn implies that FDG-PET cannot be used for the evaluation of treatment response of follicular lymphoma that lodges in the bone marrow either. Fourth, in contrast to the more aggressive histologies, determination of the prognostic value of interim and end-of-treatment FDG-PET for predicting OS requires very long follow-up times, and results are not infrequently biased by variations in the initiation, nature, and intensity of post-first-line therapy, and by other, non-lymphoma-related causes of death.

The present systematic review had limitations. First, it was not possible to meta-analyze the results of individual studies, because there was considerable variation in data reporting, and hazard ratios for PFS and OS were not reported in the majority of included studies. Second, only a minority of studies reported the value of FDG-PET in predicting OS, and no study reported data on the time to initiation of second-line therapy or the time interval to (re)develop symptomatic disease after treatment. Third, applied treatment regimens among the included studies were heterogeneous, which affects interstudy comparisons. Fourth, the included studies reported follow-up times ranging between 6 and 104 months, which is insufficiently long considering the generally prolonged survival of patients with follicular lymphoma after first-line treatment.

In conclusion, the available evidence does not support the use of interim FDG-PET in follicular lymphoma. Although published studies suggest end-of-treatment FDG-PET to be predictive of PFS and OS, they suffer from numerous biases and failure to correct OS prediction for the FLIPI.