Introduction

Giant cell arteritis (GCA), Takayasu arteritis (TAK) and isolated (non-infectious) aortitis are the main forms of large vessel vasculitis (LVV) [1, 2]. GCA is frequently associated with cranial symptoms, such as headache and jaw claudication [3], whereas limb claudication and loss of peripheral pulsations are more common in patients with TAK [4]. Constitutional symptoms can be observed in all three forms of LVV. Arterial occlusion in LVV may lead to ischaemic damage of end organs (eye, brain, internal organs), whereas progressive aortic dilatation poses the risk of aortic dissection. The erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) level are frequently elevated at diagnosis [3]. LVV is typically treated with high-dose glucocorticoids but biological disease-modifying anti-rheumatic drugs are increasingly used in the management of LVV. Anti-interleukin(IL)-6 receptor therapy is effective as maintenance therapy for GCA [5], whereas anti-tumour necrosis factor (TNF)α therapy is widely used in TAK [6]. Monitoring disease activity during treatment can be challenging since none of the clinical symptoms and laboratory markers are entirely specific for LVV. The ESR and CRP levels may remain normal during relapse [7, 8]. Moreover, anti-IL-6 receptor therapy precludes a rise of inflammatory markers by direct interference with the acute phase response. Therefore, additional modalities for the assessment of disease activity are warranted.

Imaging tools are increasingly applied in treatment monitoring of patients with LVV. Ultrasonography of temporal and axillary arteries is recommended as a first-line diagnostic test in patients with suspected GCA [9, 10]. The characteristic halo sign in the temporal arteries appears to gradually disappear upon treatment, whereas this abnormality may persist in the axillary arteries during clinical remission [11]. Magnetic resonance angiography (MRA) and computed tomography angiography (CTA) are extensively used in the diagnostic work-up of GCA, but little is known about their use to monitor treatment response [12]. These imaging modalities are also used for monitoring disease activity in TAK and aortitis, although evidence is even more scarce [13].

Imaging with 2-deoxy-2-[18F]fluoro-d-glucose ([18F]FDG) positron emission tomography combined with low-dose computed tomography ([18F]FDG-PET/CT) is a valuable diagnostic tool in LVV [9, 14]. [18F]FDG accumulates in metabolically active immune cells and stromal cells via the glucose transporter. A growing number of studies have evaluated [18F]FDG-PET/CT during treatment in patients with LVV, but its clinical value remains unclear. In this systematic review and meta-analysis, we aimed to summarize current evidence on the role of [18F]FDG-PET/CT for treatment monitoring in patients with LVV.

Methods

This study is reported in agreement with the Preferred Reporting Items for a Systematic Review and Meta-Analysis (PRISMA) statement. No ethical approval or informed consent was required.

Search strategy

A comprehensive search of records through the PubMed/MEDLINE and Cochrane Library databases was carried out (date of the last search: October 21, 2020). Search terms included ‘giant cell arteritis’, ‘Takayasu arteritis’, ‘aortitis’, ‘FDG’, ‘PET’, ‘positron emission tomography’, ‘follow-up’ and ‘response’. A detailed overview of the full search strategy is provided in Supplemental Table 1. The search was restricted to English language articles. In order to achieve a more comprehensive search, the references of the selected articles were screened manually by two investigators (OG, RS).

Study selection

Two reviewers (KG, GT) independently screened the titles and abstracts. Predefined inclusion criteria were original articles performing [18F]FDG-PET/CT for monitoring treatment response in patients with large vessel vasculitis (i.e. GCA, TAK or non-infectious aortitis). Exclusion criteria were (a) review articles, letters, comments, editorials, study protocols; (b) case reports or small case series (fewer than 10 patients with data of interest); (c) articles not available in English and (d) articles outside the scope of the current review (e.g. articles reporting [18F]FDG-PET without CT, animal studies, studies applying tracers other than [18F]FDG and studies related to infectious aortitis). The following studies were selected for the meta-analysis: (a) studies reporting sufficient data to evaluate the proportion of patients in which [18F]FDG-PET/CT remained positive during clinical remission following an initially positive [18F]FDG-PET/CT at baseline reflecting active disease; (b) studies reporting sufficient data to evaluate the diagnostic accuracy of [18F]FDG-PET/CT to discriminate between clinical relapse and remission; (c) studies in which at least 90% of patients received treatment at the time of the treatment monitoring scan. In case of potential overlap between studies from the same centre, only data from the largest study was used in the meta-analysis. Disagreements were resolved through consensus between the reviewers.

Data extraction

All data extraction was performed by two independent reviewers. The following data was collected: authors, year of publication, country, study design (prospective, retrospective), LVV population, reference standard for LVV, number of scans, age and sex (FJ, OG); hybrid imaging modality, [18F]FDG injected activity, time interval between [18F]FDG injection and image acquisition, scan coverage, image analysis and definition of positive findings (OG, RS); arterial regions examined (AG, MS); additional study design (longitudinal, cross-sectional), disease stage, disease duration, reference standard for disease activity, treatment, main findings related to [18F]FDG uptake during treatment (KG, RS). The authors were not contacted to retrieve unpublished data.

Quality assessment

The revised ‘Quality Assessment of Diagnostic Accuracy Studies’ tool (QUADAS-2) was used to evaluate the quality of all studies in the systematic review. This tool was used to assess the risk of bias for the following criteria: patient selection, index test, reference test and flow/timing, whereas applicability concerns were assessed for patient selection, index and reference test.

Statistical analysis

A meta-analysis of the proportion of patients in which the [18F]FDG-PET/CT remained positive during clinical remission was performed by the Stuart-Ord (inverse double arcsine square root) method and a DerSimonian-Laird (random effects) model. Heterogeneity was evaluated and an I² statistic > 75% precluded evaluation of the pooled proportion. A bivariate model was used to assess the summary estimates of sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (LR+) and negative likelihood ratio (LR−). Pooled data were given with 95% confidence intervals (95% CI) and displayed using forest plots and hierarchical summary receiver operating characteristics (HSROC) plots. Likelihood ratios of more than 2.00 or less than 0.50 with 95% CI not including 1.00 were considered statistically significant. Publication bias was evaluated with an effective sample size (ESS) funnel plot and the associated regression test of asymmetry [15]. A threshold effect was evaluated: (a) by comparing the sensitivity and specificity as determined by the bivariate model to those obtained by a univariate random-effects model (DerSimonian-Laird method) and (b) by evaluating Spearman’s correlation coefficient of the logit of sensitivity and logit of 1 − specificity. Proportion meta-analysis and the associated I² statistic were evaluated with StatsDirect 3.2.109. Bivariate model analysis, HSROC plot and evaluation of funnel plot asymmetry were performed with STATA version 15.1 (metandi and midas commands). Forest plots were constructed in Review Manager version 5.3 and StatsDirect 3.2.109. Assessment of the threshold effect was performed with MetaDiSc 1.4. No additional sub-analyses were performed.
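The pooling of proportions described above can be illustrated with a minimal Python sketch. It assumes the Freeman-Tukey form of the double arcsine transformation, a DerSimonian-Laird estimate of the between-study variance and a simple sin² back-transformation; the StatsDirect implementation used in this study may differ in detail. The function names and the study counts are hypothetical and are not taken from the included studies.

```python
import math


def double_arcsine(events, n):
    """Freeman-Tukey double arcsine transform of a proportion events/n."""
    return (math.asin(math.sqrt(events / (n + 1)))
            + math.asin(math.sqrt((events + 1) / (n + 1))))


def back_transform(t):
    """Approximate inverse of the transform (assumption: sin^2(t/2))."""
    t = min(max(t, 0.0), math.pi)  # keep the angle in the valid range
    return math.sin(t / 2.0) ** 2


def pool_proportions(studies):
    """DerSimonian-Laird random-effects pooling of (events, n) pairs."""
    t = [double_arcsine(x, n) for x, n in studies]
    v = [1.0 / (n + 0.5) for _, n in studies]  # variance of the transform
    w = [1.0 / vi for vi in v]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * ti for wi, ti in zip(w, t)) / sw

    # Cochran's Q and the I^2 statistic (percentage of total variation
    # attributable to between-study heterogeneity)
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, t))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate on the transformed scale
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled_t = sum(wi * ti for wi, ti in zip(w_re, t)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": back_transform(pooled_t),
        "ci_low": back_transform(pooled_t - 1.96 * se),
        "ci_high": back_transform(pooled_t + 1.96 * se),
        "i2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical (positive scans, patients in remission) per study
    example = [(1, 20), (18, 20), (2, 15), (14, 15)]
    result = pool_proportions(example)
    print(result)
    if result["i2"] > 75.0:
        print("I^2 > 75%: pooled proportion should not be interpreted")
```

The final check mirrors the rule stated above: when I² exceeds 75%, the pooled proportion is computed but not interpreted.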

Results

Literature search

A comprehensive database search yielded a total of 444 unique records (Fig. 1). The earliest reference dates from July 1987. Title and abstract screening led to exclusion of 381 records. A further 42 articles were excluded following full-text assessment. Eventually, 21 studies were selected for the qualitative analysis (systematic review) [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36]. Eight studies were included in the meta-analysis since these reports contained sufficient data to either evaluate the accuracy of [18F]FDG-PET/CT for discriminating between active disease and remission during treatment [19, 23, 25, 33] or to evaluate the proportion of patients in which the scan normalized during clinical remission in patients on treatment [17, 20, 22, 36].

Fig. 1 PRISMA flow diagram

Qualitative analysis (systematic review)

Study and patient characteristics

Table 1 summarizes the main characteristics of the 21 included studies. All articles have been published in the last decade with 16 (76%) being published in the past 5 years. Eleven studies (52%) were performed in Europe, 5 studies (24%) in Asia, 4 studies (19%) in the USA and 1 study (5%) in Australia. Studies with a retro- or prospective design were equally distributed (10 studies each), whilst one study consisted of a retrospective subgroup analysis of a large European prospective trial. Eleven studies (52%) reported data on serial [18F]FDG-PET/CT scans in patients with LVV, 8 studies (38%) reported cross-sectional data, whereas 2 studies (10%) contained both longitudinal and cross-sectional data. The vast majority of studies used the 1990 ACR criteria for GCA or TAK as a reference standard for LVV [37, 38]. All studies reporting gender distribution and age showed a female predominance for both vasculitis types and an overall lower mean/median age for patients with TAK compared to patients with GCA. However, several studies reported a mean/median age > 40 years for patients with TAK.

Table 1 Study and patient characteristics

Technical aspects

The technical aspects of [18F]FDG-PET/CT in the 21 studies are summarized in Table 2. [18F]FDG-PET scanning was performed together with low-dose CT except for one study in which part of the scans (65%) was performed without CT [34]. Contrast-enhanced CT was performed in two studies. Four studies from the same centre reported that [18F]FDG-PET/MRI was applied in paediatric patients, whereas the adult patients underwent [18F]FDG-PET/CT [16, 21, 29, 31]. The injected [18F]FDG activity was quite heterogeneous and included both weight-based and fixed activities. The [18F]FDG uptake time was 60 min in 13 studies (62%), < 60 min in 2 studies (10%) and 120–180 min in 4 studies (19%). Two studies (10%) did not report this technical aspect. The vast majority of scans covered the skull (either from the vertex or skull base) to the thigh region, including the (middle)large arteries, whilst some studies also included the feet. Reconstruction algorithms or adherence to EARL were not always specified. [18F]FDG-PET/CT image interpretation was primarily performed by visual analysis in 11 studies (52%) and by a combination of visual analysis and semi-quantitative analysis using the maximum standardized uptake value (SUVmax) in 7 studies (33%). In 8 studies (38%), a target-to-background ratio (TBR) was used: 5 studies (24%) applied a target-to-liver ratio and 5 studies (24%) used the blood pool activity as background. The definition of positive [18F]FDG uptake differed among the included studies, but the majority of studies used the liver as the reference organ. In 6 studies (29%), visual uptake equal to or higher than that of the liver was considered positive, whilst uptake higher than that of the liver (either visually or semi-quantitatively) was defined as positive in 3 studies (14%). One study (5%) used different cutoff points for visual uptake at different arterial regions [33]. Six studies (29%) did not report any specific definition of a positive [18F]FDG-PET/CT finding.

Table 2 [18F]FDG-PET/CT characteristics in the studies

Methodological quality of studies

Patient selection and the reference standard were the main sources of bias in the 21 studies (Fig. 2 and Supplemental Figure 1). Concerns regarding the applicability of the findings were related to the reference standard in studies applying instruments (i.e. NIH criteria, BVAS or ITAS2010) that have not been thoroughly validated for treatment monitoring of patients with LVV [12, 39].

Fig. 2 Overall summary of QUADAS-2 items. Risk of bias and concern of applicability was assessed for 21 studies in the systematic review

Main findings of qualitative assessment

Five longitudinal studies (24%) only contained patients with newly diagnosed LVV, whereas the other longitudinal studies also included patients with relapsing and/or refractory disease (Supplemental Table 2). The median disease duration in the latter studies ranged from 6 months to 4.8 years. The cross-sectional studies mostly contained patients on treatment with a disease duration ranging from 2.0 to 6.9 years (Table 3). Four cross-sectional studies (19%) also contained patients with newly diagnosed LVV: in 1 study, at least 90% of all scans were performed during treatment [34], whereas this was unclear (at least 54%) in the other 3 studies [21, 29, 31]. Clinical disease activity was determined according to standardized instruments (i.e. NIH criteria, BVAS or ITAS2010) in 6 studies (29%). Physician’s clinical assessment (i.e. symptoms, physical signs, with/without inflammatory markers) was used as the reference standard for disease activity in the other 15 studies (71%). [18F]FDG-PET/CT findings were involved in the reference standard for disease activity in 2 studies (10%). Treatment included glucocorticoid therapy, conventional synthetic disease-modifying anti-rheumatic drugs (DMARDs) and/or biological DMARDs. Several studies investigated a specific DMARD: anti-IL-6R therapy (i.e. tocilizumab) (n = 3) [16, 30, 36], anti-TNFα therapy (i.e. infliximab) (n = 2) [16, 28] and cyclophosphamide (n = 1) [22].

Table 3 Main findings in cross-sectional studies on hybrid imaging for monitoring of treatment response

Longitudinal studies indicated that the enhanced arterial [18F]FDG uptake at baseline decreases upon treatment-induced remission of LVV (Supplemental Table 2). A representative example of serial [18F]FDG-PET/CT scans in a patient with LVV is shown in Fig. 3a. In the longitudinal studies, the number of arterial segments with positive [18F]FDG uptake, composite [18F]FDG-PET/CT scores and especially target-to-background ratios (SUVartery/SUVliver) improved when patients were scanned during clinical remission (Fig. 3b and c). Two longitudinal studies indicated that [18F]FDG uptake remains high in patients with relapsing or refractory disease on treatment (Supplemental Table 2) [19, 24]. Three studies reported complete normalization of the scans during long-term follow-up, whereas few scans became normal during remission in another study (Fig. 4). One longitudinal study investigated early [18F]FDG-PET/CT changes after initiation of high-dose glucocorticoid treatment [27]. This study showed that pathological arterial [18F]FDG uptake disappeared in 64% of patients within 10 days after the start of treatment, whereas the scans still showed pathological [18F]FDG uptake after only 3 days of treatment. In essence, the cross-sectional studies indicated that [18F]FDG uptake is higher during clinically active disease than during clinical remission (Table 3).

Fig. 3 Modulation of quantitative [18F]FDG-PET/CT measures upon clinical remission in longitudinal studies. Per scan data or per patient data at baseline and during serial scans were obtained, if the disease activity during the scans was clearly defined in the studies. a Representative [18F]FDG-PET/CT scans of a patient with giant cell arteritis (GCA). Scans were performed at diagnosis and during immunosuppressive treatment. b Timing of follow-up scans and c quantitative PET measures (including number of positive arteries, composite PET scores and target-to-background ratios (TBRs)) in the included longitudinal studies. CRP, C-reactive protein; ESR, erythrocyte sedimentation rate; MTX, methotrexate; PRED, prednisolone; TCZ, tocilizumab (anti-IL-6 receptor therapy)

Fig. 4 The proportion of patients with a positive [18F]FDG-PET/CT during clinical remission in longitudinal studies. Per scan data or per patient data at baseline and during serial scans were obtained, if the disease activity during the scans was clearly defined. a Timing of follow-up scans and b the number of patients with a positive scan during clinical remission in each study

Quantitative analysis (meta-analysis)

Four longitudinal studies (n = 57 patients) provided sufficient data to determine how often the [18F]FDG-PET/CT remained positive during clinical remission. The pooled proportion of patients with a positive follow-up scan during clinical remission was 25.4% (95% CI 0.0 to 77.1%). However, marked heterogeneity was observed in the forest plot (Supplemental Figure 2). The I² statistic was 94.1% (95% CI 88.5–96.3%), indicating that pooling of proportions is not appropriate due to high heterogeneity.

Four cross-sectional studies (n = 111 patients with 136 scans) allowed evaluation of the ability of [18F]FDG-PET/CT to distinguish active disease from clinical remission in patients on treatment. [18F]FDG-PET/CT showed a moderate diagnostic accuracy for detecting active disease with a pooled sensitivity of 77% (95% CI 57–90%) and specificity of 71% (95% CI 47–87%) according to the bivariate model (Table 4). Substantial between-study heterogeneity was observed in the forest and HSROC plots (Fig. 5a and b). Funnel plot analysis was not suggestive of publication bias (Fig. 5c). A threshold effect did not explain the heterogeneity in the studies since meta-analysis with a univariate model (Supplemental Table 3) provided similar estimates of sensitivity and specificity as the bivariate model, and no positive correlation was found between the logit of sensitivity and logit of 1 − specificity (Spearman correlation coefficient −0.40, p value = 0.600).
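The relation between the pooled sensitivity and specificity and the derived accuracy measures (LR+, LR− and DOR) can be sketched in Python as follows. This is plain arithmetic on the rounded point estimates reported above; the bivariate model estimates and confidence intervals in Table 4 are obtained differently and may deviate from these values. The function name is hypothetical.

```python
def accuracy_measures(sensitivity, specificity):
    """Derive LR+, LR- and DOR from sensitivity and specificity (0-1 scale)."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


if __name__ == "__main__":
    # Rounded pooled point estimates from the text (77% and 71%)
    lr_pos, lr_neg, dor = accuracy_measures(0.77, 0.71)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

Under these assumptions, the point estimates lie beyond the prespecified thresholds of 2.00 and 0.50, although statistical significance also depends on the confidence intervals, which this sketch does not reproduce.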

Table 4 Diagnostic accuracy of [18F]FDG-PET/CT for discrimination between active disease and remission during follow-up of large vessel vasculitis
Fig. 5 Heterogeneity and publication bias in meta-analysis of diagnostic accuracy of [18F]FDG-PET/CT during follow-up. Data were obtained from 4 cross-sectional studies in which at least 90% of patients were receiving treatment during the scan. a Forest plot and b HSROC plot of sensitivity and specificity. Pooled sensitivity was 77.3% (95% CI 56.5–89.9), and pooled specificity was 70.9% (95% CI 47.3–86.8). c Effective sample size (ESS) funnel plot and the associated regression test of asymmetry. A p value < 0.10 was considered evidence of asymmetry and potential publication bias

Discussion

Main findings

The current systematic review and meta-analysis provide a comprehensive overview on the value of [18F]FDG-PET/CT for treatment monitoring in patients with LVV. The majority of studies indicates that arterial [18F]FDG uptake improves upon clinical remission in patients treated for LVV. It remains to be elucidated, however, to what extent [18F]FDG-PET/CT completely normalizes during clinical remission. Overall, [18F]FDG-PET/CT has moderate diagnostic accuracy to discriminate between patients with active disease and those in clinical remission. Differences in technical aspects and interpretation of [18F]FDG-PET/CT, as well as clinical differences among the included patients, may have contributed to marked between-study heterogeneity.

Current data indicate that [18F]FDG-PET/CT may aid in monitoring treatment response in patients with LVV, but its findings need to be interpreted in the context of other clinical findings. Careful evaluation of symptoms and laboratory markers remains a critical step in the assessment of disease activity. Due to its moderate diagnostic accuracy in patients on treatment (sensitivity 77%, specificity 71%), a [18F]FDG-PET/CT scan by itself cannot rule in or rule out disease activity. Prior recommendations on imaging in LVV have recognized the potential role of [18F]FDG-PET/CT for monitoring treatment response [9]. Despite the paucity of evidence, other imaging methods such as ultrasonography, MRA and CTA are also often applied to monitor treatment in LVV [12, 13]. Although [18F]FDG-PET/CT has various drawbacks including high cost and radiation exposure, it has several advantages. [18F]FDG-PET/CT is inherently a whole-body imaging method and allows for a comprehensive evaluation of all relevant large arteries in a single scan. Even temporal arteries might be evaluated due to improved resolution on newer camera systems [40, 41]. [18F]FDG-PET/CT also allows evaluation of concomitant polymyalgia rheumatica (PMR) in patients with GCA [42]. Contrast-induced nephropathy is not an issue for [18F]FDG-PET/CT as no contrast is needed. Since [18F]FDG-PET/CT detects metabolic activity in the arterial wall, it could also provide complementary information to other imaging methods. For instance, persistence or gradual worsening of arterial wall thickening in the absence of [18F]FDG uptake might suggest ‘burnt out fibrotic disease’ [43]. Further studies are needed to firmly establish a role for [18F]FDG-PET/CT and other imaging methods in the management of LVV.
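Why a scan with this accuracy cannot by itself rule in or rule out disease activity can be illustrated with Bayes' rule in odds form. The Python sketch below uses the rounded pooled sensitivity and specificity from the text; the pre-test probability of 30% is a purely hypothetical value chosen for illustration, and the function name is an assumption.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio (Bayes' rule)."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    sensitivity, specificity = 0.77, 0.71   # rounded pooled estimates
    pre_test = 0.30                         # hypothetical pre-test probability

    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity

    after_positive = post_test_probability(pre_test, lr_pos)
    after_negative = post_test_probability(pre_test, lr_neg)
    print(f"Probability of active disease after a positive scan: {after_positive:.2f}")
    print(f"Probability of active disease after a negative scan: {after_negative:.2f}")
```

With these illustrative inputs, a positive scan leaves the probability of active disease close to one half and a negative scan leaves it above one in ten, so neither result is decisive without the clinical and laboratory context.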

Factors related to [18F]FDG-PET/CT scanning and interpretation might have contributed to the between-study heterogeneity observed in the current analysis. Differences in methodological aspects of the [18F]FDG-PET/CT scan (e.g. administered activity, time interval between [18F]FDG injection and camera acquisition, scan systems and reconstruction algorithms) could lead to such heterogeneity. In one study, part of the scans was performed with a stand-alone [18F]FDG-PET system (without CT), which precluded inclusion in our meta-analysis [34]. Moreover, variation in scoring systems was observed across the included studies. Most included studies applied a visual uptake scoring system, with or without a semi-quantitative parameter (i.e. SUVmax). In several studies, visual scores at different arterial regions were combined into a composite [18F]FDG-PET/CT score (e.g. PETVAS), but the scan coverage and the examined arterial regions differed across the studies. Furthermore, visual grading systems used either the liver activity or blood pool activity as the reference background. The definition of [18F]FDG positivity on a visual scale as well as the optimal SUV cutoff value differed substantially between the studies and was not even reported in 29% of studies. It remains questionable which reference background is most reliable for treatment monitoring in LVV, given the increased [18F]FDG uptake by the liver due to high-dose glucocorticoids [44], and the higher [18F]FDG blood activity in patients with renal failure. This highlights the need for a standardized scoring system for LVV activity on [18F]FDG-PET/CT in addition to standardization of the scanning protocol itself. Importantly, procedural recommendations for [18F]FDG-PET/CT imaging in LVV have recently been reported [14]. The scarce data using [18F]FDG-PET/MRI in this setting does not allow any further comment on the use of MRI instead of CT, but the combination of [18F]FDG-PET and MRA may be of interest in the future.

Study heterogeneity could be further explained by patients’ characteristics in the studies. First, the number of patients on treatment during the initial scan varied among the studies. Although most cross-sectional studies only contained patients on treatment, longitudinal studies showed substantial variation in the disease stage of the included patients (i.e. newly diagnosed, relapsing and/or refractory LVV). Second, treatment differed among the studies. Marked differences existed in both glucocorticoid dosages and use of DMARDs. This could be explained by the predominant patient population in the studies (i.e. GCA versus TAK, new-onset disease versus relapsing or refractory disease) as well as local hospital preferences. Future studies should compare the effect of glucocorticoid dosage and specific DMARDs on vascular [18F]FDG uptake in patients with LVV. Third, the timing of follow-up scans differed among the studies. Follow-up scans could be performed several months or even years after treatment. It would be interesting to know the disease course preceding the follow-up scans since it might be relevant if LVV is in clinical remission for a few weeks as compared to a few months or years.

Limitations

Our study has several limitations. The number of patients in the systematic review and especially the meta-analysis was relatively small. Various types of bias may have influenced the study findings. The decision to perform [18F]FDG-PET/CT could have introduced selection bias, e.g. for patients with a refractory or relapsing disease course, who may show more persistent arterial [18F]FDG uptake during clinical remission. In some studies, [18F]FDG-PET/CT findings were incorporated into the assessment of disease activity. This could lead to overestimation of its ability to evaluate disease activity. Another limitation was the assessment of disease activity by instruments such as the NIH criteria, ITAS2010 and BVAS in some of the studies. These instruments have not been thoroughly validated for LVV and their ability to evaluate disease activity remains uncertain [12, 39]. Our study highlights a need for large, prospective studies with serial [18F]FDG-PET/CT scans at fixed time points during clinical remission in addition to scans performed at the suspicion of clinical relapse. A rigorous clinical definition of disease activity is required in the conduct of these studies as recognized by recent recommendations on the management of LVV [45].

Conclusion

This systematic review and meta-analysis suggest that [18F]FDG-PET/CT may aid in the assessment of disease activity in patients with LVV. Arterial [18F]FDG uptake decreases during clinical remission, but it remains unclear to what extent it normalizes. [18F]FDG-PET/CT has moderate accuracy to distinguish active disease from remission in patients on treatment. Therefore, [18F]FDG-PET/CT findings should be interpreted in the context of clinical and biochemical findings. This study also highlights the relevance of procedural recommendations for [18F]FDG-PET/CT in LVV.