Introduction

Currently, resection is the only curative option available for patients with gastric cancer [1–4]. Accurate assessment of local tumor depth of invasion (T), regional lymph node involvement (N), and distant metastases (M) is crucial to appropriate surgical and treatment planning [1, 2, 5]. Understaging may lead to positive resection margins, or to an unnecessary laparotomy if metastases are not identified on pre-operative imaging. Overstaging may lead to ineffective care if a patient with potentially curable disease is incorrectly categorized as palliative [5].

Available pre-operative staging modalities include abdominal ultrasound (AUS), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). Current National Comprehensive Cancer Network (NCCN) practice guidelines for gastric cancer [6] suggest using a variety of techniques as part of the workup, including CT of abdomen and pelvis, chest imaging, pelvic ultrasound, PET, PET-CT, esophagogastroduodenoscopy (EGD), and endoscopic ultrasound (EUS). However, the guidelines do not recommend specific modalities or workup pathways [6].

Despite the routine use of the above imaging modalities for pre-operative staging, each modality has limitations. AUS has difficulty evaluating the wall of the gastric fundus and greater curvature, as well as lymphatic spread [7, 8]. It is also highly dependent on patient body habitus and on the operator [9]. Traditional single-detector CT scanners (S-CT) are limited by large section thickness, low image resolution, and slow scanning that causes respiratory motion artifacts, and they cannot provide multi-planar reformations [10, 11]. Multi-detector row CT (MDCT) scanners have difficulty detecting flat-type lesions and have poor soft-tissue contrast resolution [10, 11]. Nodal assessment is limited to size criteria, which allow neither the diagnosis of microscopic nodal invasion nor the exclusion of enlarged reactive nodes [10]. MRI is limited by respiratory motion artifacts, long examination times, high costs, and the lack of a standard gastric protocol [12, 13]. Assessment of nodal status by MRI is also limited to size criteria. Furthermore, MRI is limited in the amount of body coverage that can be achieved in a single examination, making it unsuitable for M staging [14]. 18F-fluoro-2-deoxyglucose positron emission tomography (FDG-PET, referred to simply as PET throughout this paper) uses a semi-quantitative measure, the standardized uptake value (SUV), to assess FDG uptake in a tumor [15]. However, SUVs depend on several factors, including the time after FDG injection, tumor size, the patient's blood glucose level (normoglycemia), and technical parameters [16, 17]. PET performance also depends strongly on the pathological subtype of the cancer, as mucinous tumors may give false-negative results [15].

The limitations of each technique affect the ability of these modalities to stage gastric cancer accurately before surgery [1–4, 7, 8, 10–13, 15–17], yet decisions about patient operability and tumor resectability depend heavily on the quality of pre-operative imaging [5]. Therefore, the purpose of this review is to provide a detailed meta-analysis of the pre-operative TNM staging performance of AUS, CT, MRI, and PET in patients with pathology-confirmed gastric cancer over the past decade.

Methods

Data sources

Electronic literature searches were conducted using Medline and Embase from January 1, 1998 to December 1, 2009, according to the search algorithm presented in Appendix A of the electronic supplementary material. Search terms included: [exp stomach cancer/or (((gastric or stomach) adj1 cancer$) or ((gastric or stomach) adj1 carcinoma) or ((gastric or stomach) adj1 adenocarcinoma) or ((gastric or stomach) adj1 neoplasm$)).mp.] and [cancer staging/or diagnostic imaging/or exp computer assisted tomography/or computer assisted emission tomography/or exp positron emission tomography/or exp nuclear magnetic resonance imaging/or exp barium meal/] and [clinical trial/or controlled clinical trial/or exp comparative study/or meta analysis/or multicenter study/or exp practice guideline/or randomized controlled trial/] not [review or case report/] not [*gastrointestinal stromal tumor/or exp B cell lymphoma/and “marginal zone”.mp.]. A separate search of the Cochrane Central Register of Controlled Trials (1998–2009) was performed using the search term “gastric cancer”. Reference lists from review papers and relevant articles were also examined for additional studies that met our inclusion criteria.

Study selection and review process

To be eligible, studies had to meet the following criteria: (1) investigation of the pre-operative T, N, or M staging performance of AUS, CT, MRI, or PET in patients with newly diagnosed (not recurrent), histopathology-confirmed gastric adenocarcinoma; (2) patients underwent surgery, and pre-operative staging was compared with post-operative pathological staging; (3) studies involved a minimum of 30 human patients; and (4) studies were published in English in peer-reviewed journals. Studies were excluded according to the following criteria: (1) studies that involved animals and/or ex vivo samples; (2) studies that involved patients with mixed cancers, or studies investigating diagnostic performance in other cancers with no separate analysis of gastric cancer subjects; (3) studies that did not provide sufficient information to determine pre-operative T, N, or M staging performance; and (4) review articles, meta-analyses, abstracts, conference proceedings, editorials/letters, and case reports. Studies that reported performance characteristics for more than one imaging technique were included only if the images from each technique were analyzed independently and the reviewers were blinded. All electronic search titles, selected abstracts, and full-text articles were independently reviewed by at least two reviewers (NC, LP, AM, RC, RS). Disagreements on study inclusion/exclusion were resolved at a consensus meeting.

Data extraction

A systematic approach to data extraction was used to produce a descriptive summary of participants, interventions, and study findings. The first reviewer (RS) independently extracted the data, and a second reviewer (RC, CM) checked the data extraction. No attempt was made to contact authors for additional information. The TNM staging categories were extracted from the corresponding publications. Staging classifications for individual studies can be found in Appendices 1–4 of the electronic supplementary material. Modality-specific staging definitions adopted by the majority of studies are shown in Fig. 1. Both the Union Internationale Contre le Cancer (UICC)/American Joint Committee on Cancer (AJCC) staging classifications [18] and the Japanese Gastric Cancer Association (JGCA) 2nd English edition classification system [19] were used (Appendix B of the electronic supplementary material).

Fig. 1

TNM staging criteria for gastric cancer by modality. ᵃAdapted from [42, 43]. ᵇAdapted from [28, 53]. ᶜAdapted from [36]. ᵈAdapted from [50, 56]. All articles adopted the above definitions or slight variations of them when describing their respective modalities. AUS, abdominal ultrasound; CT, computed tomography; JGCA, Japanese Gastric Cancer Association; AJCC, American Joint Committee on Cancer; UICC, Union Internationale Contre le Cancer; MRI, magnetic resonance imaging; PET, positron emission tomography

Data analysis

The included studies used a range of definitions to calculate accuracy, sensitivity, and specificity. Therefore, the following performance characteristics were re-calculated from the original numbers provided in each included publication: detection rate, accuracy, overstaging rate, understaging rate, agreement/kappa statistic (κ), sensitivity, and specificity. Detection rate was defined as the ability to detect the presence of a tumor. Accuracy was defined as the ability to match the pre-operative stage of a given tumor with the post-operative pathological stage (e.g., T1 accuracy = [number correctly staged as T1 by the pre-operative imaging technique/number staged as T1 by pathology] × 100). Overstaging and understaging refer to tumors that were staged higher and lower, respectively, than the post-operative pathological stage. Overall accuracy, overstaging rate, and understaging rate were calculated across all cases combined (T1–T4); for example, overall accuracy = [number of cases correctly staged/number of all cases] × 100. Agreement between the pre-operative imaging technique and pathology was calculated using a 4 × 4 table (corresponding to stages T1, T2, T3, and T4). A 5 × 5 table was used when the pre-operative imaging technique failed to detect the presence of a tumor (stage T0), and a 3 × 3 table was used when two of the stages were combined (e.g., T1–T2). The following interpretation of κ was used: <0 = less than chance agreement; 0.01–0.20 = slight agreement; 0.21–0.40 = fair agreement; 0.41–0.60 = moderate agreement; 0.61–0.80 = substantial agreement; 0.81–0.99 = almost perfect agreement [20]. For pre-operative N staging, the sensitivity and specificity of classifying nodal status as negative (N0) or positive (N+) were determined using a 2 × 2 table (N0 vs. N+).
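The per-study quantities defined above can be illustrated with a short Python sketch. This is not the authors' R code. The function names, the example counts, and the table layout are illustrative assumptions: pathological stage is in the rows and imaging stage in the columns, both ordered from lowest to highest. Under that layout, cells above the diagonal are overstaged cases and cells below it are understaged cases. κ compares the observed agreement with the agreement expected by chance from the row and column totals.

```python
import math


def staging_metrics(table):
    """Per-study staging metrics from a k x k cross-tabulation (assumed layout).

    Rows: pathological (reference) stage; columns: pre-operative imaging stage.
    Both axes run from lowest to highest stage (e.g. T1..T4, or T0..T4 when
    imaging missed some tumors), so cells above the diagonal are overstaged
    and cells below it are understaged.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]  # cases per pathological stage
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]  # per imaging stage
    correct = sum(table[i][i] for i in range(k))
    over = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    under = sum(table[i][j] for i in range(k) for j in range(k) if j < i)

    # Stage-specific accuracy: correctly staged / staged as that stage by pathology.
    # Rows with no pathological cases (e.g. a T0 row) are reported as NaN.
    stage_accuracy = [
        100 * table[i][i] / row_totals[i] if row_totals[i] else math.nan
        for i in range(k)
    ]

    # Cohen's kappa: observed agreement corrected for chance-expected agreement
    p_obs = correct / n
    p_exp = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)

    return {
        "stage_accuracy": stage_accuracy,
        "overall_accuracy": 100 * correct / n,
        "overstaging": 100 * over / n,
        "understaging": 100 * under / n,
        "kappa": kappa,
    }


def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity (%) from a 2 x 2 table, e.g. N+ vs N0."""
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)


# Hypothetical T1-T4 cross-tabulation (illustrative counts, not study data)
example = [[8, 2, 0, 0],
           [1, 10, 3, 0],
           [0, 2, 20, 3],
           [0, 0, 1, 10]]
metrics = staging_metrics(example)
print(f"overall accuracy {metrics['overall_accuracy']:.1f}%, kappa {metrics['kappa']:.2f}")
print(sensitivity_specificity(tp=40, fn=10, fp=5, tn=45))
```

A 5 × 5 table with a T0 column works the same way. The empty pathological T0 row returns NaN for its stage-specific accuracy, and cases imaged as T0 count as understaged.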
For pre-operative M staging, the sensitivity and specificity of classifying metastatic status as negative (M0) or positive (M1) were determined using a 2 × 2 table (M0 vs. M1). Overall sensitivity and specificity for N and M stage were calculated across all cases (N0 vs. N+ and M0 vs. M1). Statistical analyses were conducted using the R statistical package, version 2.10.1 (http://cran.r-project.org/). Data were pooled using the inverse-variance method, with random-effects estimates based on the DerSimonian–Laird method [21]. Only performance characteristics that were re-calculated were included in pooling analyses. Significance within and between imaging techniques was assessed by comparing pooled estimates. A Bonferroni correction was applied when multiple comparisons were made, such that significance was reached when P ≤ α/N (where α = 0.05 and N = number of comparisons/outcomes measured) [22].
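The pooling step can be sketched as follows. The sketch assumes that each study contributes an estimate (for example, an accuracy expressed as a proportion) and its within-study variance. It is a minimal Python illustration of the standard DerSimonian–Laird random-effects estimator with inverse-variance weights, not a reproduction of the authors' R analysis. The helper names and the binomial variance p(1 - p)/n used for proportions are our assumptions.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooling (DerSimonian-Laird) using inverse-variance weights.

    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted dispersion of study estimates around the fixed-effect mean
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def proportion_with_variance(events, n):
    """Study-level proportion and its binomial variance p(1 - p)/n.

    Assumes 0 < events < n: a proportion of exactly 0 or 1 has zero variance
    and would need a continuity correction or transformation before pooling.
    """
    p = events / n
    return p, p * (1 - p) / n


# Hypothetical (correctly staged, total) counts from three studies
studies = [(45, 60), (70, 90), (30, 50)]
estimates, variances = zip(*(proportion_with_variance(e, n) for e, n in studies))
pooled, se, tau2 = dersimonian_laird(estimates, variances)
print(f"pooled accuracy = {100 * pooled:.1f}% ± {100 * se:.1f}%")
```

When the studies agree closely, tau² is truncated to zero and the result reduces to the fixed-effect inverse-variance estimate. When they disagree, the added tau² flattens the weights and widens the standard error.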

Results

Literature search

A total of 5204 titles/abstracts were identified from the electronic searches and reference lists for preliminary review. After removal of duplicates and screening of titles and abstracts for relevance, 167 articles were retrieved for full-text review. A total of 40 articles [23–62] involving 3758 patients met our inclusion criteria and were included in this review (Fig. 2): 29 prospective and 11 retrospective studies.

Fig. 2

Article selection flow

Performance characteristics of pre-operative imaging studies

Overall TNM staging results for each technique (AUS, CT, MRI, and PET) are presented in Tables 1, 2, 3, and 4, respectively, with more detailed (stage/metastatic site specific) analyses found in the correspondingly numbered Appendices 1–4 of the electronic supplementary material. For the evaluation of pre-operative diagnostic AUS (3 studies [37, 42, 43]), a total of 168 patients were assessed for T stage, 149 patients were assessed for N stage, and 101 patients were assessed for M stage pre-operatively by AUS and post-operatively by pathology. For the evaluation of pre-operative diagnostic CT (32 studies [23–35, 37–42, 44–49, 52–59]), a total of 2909 patients were assessed for T stage, 2646 patients were assessed for N stage, and 916 patients were assessed for M stage pre-operatively by CT and post-operatively by pathology. For the evaluation of pre-operative diagnostic MRI (3 studies [36, 49, 51]), a total of 109 patients were assessed for T stage, and 75 patients were assessed for N stage pre-operatively by MRI and post-operatively by pathology. For the evaluation of pre-operative diagnostic PET (9 studies [29, 44, 45, 50, 55, 56, 60–62]), a total of 422 patients were assessed for T stage, 420 patients were assessed for N stage, and 282 patients were assessed for M stage pre-operatively by PET and post-operatively by pathology.

Table 1 Performance characteristics of abdominal ultrasound (AUS) studies
Table 2 Performance characteristics of computed tomography (CT) studies
Table 3 Performance characteristics of magnetic resonance imaging (MRI) studies
Table 4 Performance characteristics of positron emission tomography (PET) studies

Comparison of AUS, CT, MRI, and PET

The pooled TNM performance characteristics of all modalities are reported in Table 5. Overall, MRI had significantly better T staging performance than all CT scanners, as well as better T1 staging performance than AUS. Because PET cannot stage cancers by tumor depth, we calculated the primary tumor detection rate reported in each study. The pooled detection rate was 80.4 ± 4.9%, with study-level detection rates ranging from 58.1 to 95.9% (Appendix 4.3 of the electronic supplementary material). The primary tumor detection rates for AUS, CT, and MRI ranged from 90.7 to 100%, 61.1 to 100%, and 97.8 to 100%, respectively (Appendices 1.3, 2.3, and 3.3 of the electronic supplementary material). For N staging, PET had the lowest sensitivity and the highest specificity. No modality was superior for determining M stage.

Table 5 Comparison of performance characteristics by imaging technique

Pre-operative TNM staging performance by detector number and use of multi-planar images

We compared the pooled TNM performance characteristics of CT scanners with <4 detectors [25, 26, 30–33, 35, 37, 40–42, 46, 49, 52, 57, 59] with those of scanners with ≥4 detectors [23, 24, 27, 28, 34, 38, 39, 47, 48, 53–55, 58] to determine whether using more detector rows to capture images translated into better pre-operative staging performance (Table 6). Overall, CT scanners with ≥4 detectors had significantly better T staging performance than those with <4 detectors. However, detector number did not significantly affect N or M staging performance.

Table 6 Comparison of computed tomography performance characteristics by detector number and MPR images

We compared the pooled TNM performance characteristics of CT scanners using traditional single-plane axial images [25, 28–35, 37, 38, 40–42, 46, 49, 52, 57–59] with those of scanners using multi-planar reformatted (MPR) images [23, 24, 27, 28, 34, 38, 39, 44, 47, 48, 53–56, 58] to determine whether adding multiple image planes translated into better pre-operative staging performance (Table 6). Overall, CT scanners using MPR images had significantly better T staging performance than those using axial images only. However, the addition of MPR images did not significantly affect N or M staging performance.
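The text does not state which test was used to compare pooled scores. One common approach, shown here purely as an assumption, is a two-sided z-test on the difference between two independent pooled estimates, combined with the Bonferroni threshold α/N described in the Methods. The function names and input values below are hypothetical and are not taken from Table 6.

```python
import math


def compare_pooled(est_a, se_a, est_b, se_b):
    """Two-sided z-test for the difference between two independent pooled estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return z, p


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison as significant if P <= alpha / N (N = number of comparisons)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical pooled accuracies (as proportions) and standard errors for two CT subgroups
z, p = compare_pooled(0.80, 0.03, 0.70, 0.04)
print(f"z = {z:.2f}, P = {p:.4f}")
print(bonferroni_significant([p, 0.20, 0.50]))
```

With three comparisons, the Bonferroni threshold drops to about 0.0167. A difference with P ≈ 0.046, which would pass an uncorrected 0.05 cut-off, is therefore not declared significant.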

Discussion

Accurate assessment of pre-operative TNM staging in gastric cancer is crucial for determining appropriate treatment strategies, especially for planning surgery, which remains the foundation for cure [1–4]. We reviewed the pre-operative TNM staging performance reported over the past decade by a total of 40 studies (3758 patients): 3 AUS studies (168 patients) [37, 42, 43], 32 CT studies (2909 patients) [23–35, 37–42, 44–49, 52–59], 3 MRI studies (109 patients) [36, 49, 51], and 9 PET studies (422 patients) [29, 44, 45, 50, 55, 56, 60–62] (Tables 1–4 and Appendices 1–4 of the electronic supplementary material).

TNM staging classifications

Because this review includes studies published over a span of 10 years, several TNM staging classifications were used (Appendix B of the electronic supplementary material). There are no differences between the 3rd and 4th editions of the UICC/AJCC system, which were used by 25% of the included studies. Although the 6th edition divides T2 into T2a and T2b, the studies included in this review did not use this subdivision; for the purposes of this review, the 5th and 6th editions (used by 37.5% of the included studies) are therefore considered the same [18]. The main difference between the 3rd/4th and 5th/6th UICC/AJCC editions is the classification of N stage. The 3rd/4th editions did not have an N3 stage, and defined the N1 and N2 stages according to the distance of the perigastric regional lymph nodes from the edge of the primary tumor [18]. The 5th/6th editions defined the N1, N2, and N3 stages according to the total number of lymph node metastases present. Additionally, the 5th/6th editions considered metastases to the hepatoduodenal nodes as regional lymph node involvement, whereas the 3rd/4th editions considered them distant metastases (M1 disease) [18]. The 2nd English edition of the JGCA classification system was used in 30% of the studies [19]. The main differences between the JGCA and UICC/AJCC systems are the classifications of N and M stage. The JGCA defines the N1, N2, and N3 stages according to lymph node groups defined with respect to the location of the primary tumor [19]. In general, Group 1 nodes are the perigastric nodes, Group 2 nodes are those along the celiac artery and its branches, and Group 3 nodes are the retropancreatic or paraaortic nodes, whereas in the UICC/AJCC classification, retropancreatic and paraaortic nodes are classified as distant metastases (M1 disease).
Furthermore, the 2nd English edition of the JGCA system does not consider peritoneal, liver, and cytological metastases as M1 disease (although the presence of these indicates stage IV disease), whereas the UICC/AJCC system does [18, 19]. Finally, 7.5% of the included studies used other staging classifications, such as those adopted by the World Health Organization (WHO), as well as those created by the former JGCA (the Japanese Research Society for Gastric Cancer; JRSGC) in 1993 and 1995.

Although various classification systems were used, this did not limit our meta-analysis. The T stage breakdown is the same across all editions because the T2a and T2b subdivisions were not used; thus, pooling of data and comparison between studies were not affected. Because N stage classifications varied, our meta-analysis compared only the ability to distinguish N0 from N+ disease, as these definitions are consistent across all systems and therefore allow studies to be compared. With respect to M stage, our meta-analysis used the UICC/AJCC rather than the JGCA definitions, and thus considered peritoneal, liver, and cytological metastases as M1 disease. Re-classification as M1 was possible for the included studies that used the JGCA definitions because the presence of peritoneal, liver, and cytological metastases was reported within the publications.
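The harmonization rules above can be expressed as a small Python sketch: N stage is collapsed to N0 versus N+, and JGCA-staged patients with peritoneal, liver, or cytological metastases are re-classified as M1 under the UICC/AJCC definitions. The record layout and field names are hypothetical and do not describe the authors' extraction forms.

```python
from dataclasses import dataclass


@dataclass
class ExtractedCase:
    """Hypothetical per-patient record extracted from an included study."""
    system: str                 # staging system used by the study, e.g. "UICC/AJCC" or "JGCA"
    n_stage: str                # N stage as reported, e.g. "N0", "N1", "N2", "N3"
    m_stage: str                # M stage as reported, "M0" or "M1"
    peritoneal_mets: bool = False
    liver_mets: bool = False
    positive_cytology: bool = False


def harmonized_n(case: ExtractedCase) -> str:
    """Collapse any N classification to N0 vs N+, the only contrast shared by all systems."""
    return "N0" if case.n_stage == "N0" else "N+"


def harmonized_m(case: ExtractedCase) -> str:
    """Apply UICC/AJCC M definitions, re-classifying JGCA-staged cases where needed."""
    if case.m_stage == "M1":
        return "M1"
    # The JGCA 2nd English edition does not code these findings as M1, but UICC/AJCC does
    if case.system == "JGCA" and (
        case.peritoneal_mets or case.liver_mets or case.positive_cytology
    ):
        return "M1"
    return "M0"


case = ExtractedCase(system="JGCA", n_stage="N2", m_stage="M0", peritoneal_mets=True)
print(harmonized_n(case), harmonized_m(case))  # N+ M1
```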

Evaluation of T staging

The value of AUS in pre-operative T staging remains unclear. We did not find any significant differences between AUS and the other imaging modalities, except for poorer T1 staging performance compared with MRI (Table 5). The lack of significance is most likely attributable to the large standard errors and the small number of published studies. We included only 2 studies that reported pre-operative T staging values (Appendix 1 of the electronic supplementary material): one showed fair agreement (κ = 0.40, 95% confidence interval [CI] 0.20–0.60) and the other substantial agreement (κ = 0.66, 95% CI 0.55–0.77). This variation may be explained by the difficulty of staging tumors located in the gastric fundus and greater curvature [7, 8, 42, 43], as well as by the highly subjective nature of AUS staging and its consequent strong operator dependence [42, 43].

The T staging performance of CT scanners is moderate, with a pooled κ of 0.55, an overall accuracy of 71.5%, and stage-specific accuracies ranging from 63 to 75% (Table 5). However, performance improves when detector number and MPR images are taken into consideration (Table 6). Specifically, the use of scanners with ≥4 detectors results in substantial agreement with pathology (κ = 0.65), an overall accuracy of 80%, and stage-specific accuracies ranging from 75 to 84.5%. These results are supported by other studies that have shown similar improvements in T staging with increased detector number [10, 11, 13, 63]. Therefore, we recommend that pre-operative T staging of gastric cancer be performed on MDCT scanners with ≥4 detectors. If determination of organ invasion is necessary, a higher-capacity scanner may give more accurate results (T1: 75.2 vs. 47.5% and T3: 84.5 vs. 69.3% for scanners with ≥4 and <4 detectors, respectively; Table 6). Accurate staging of T1 versus T2 is important for endoscopists considering endoscopic mucosal resection (EMR) and may be aided by EUS [64, 65], whereas accurate staging of T3 versus T4 is important because T4 disease may require the surgeon to plan a multi-visceral resection. The use of MPR images significantly improved T staging performance compared with axial images alone, resulting in substantial agreement with pathology (κ = 0.67), an overall accuracy of 82%, and stage-specific accuracies ranging from 76 to 85% (Table 6). These results are supported by other studies that have shown similar improvements in T staging with multiple image planes [10, 11, 13, 63]. Therefore, we recommend that MPR images be included in the protocol for pre-operative T staging of gastric cancer when determination of T stage is critical.

Our results show that MRI had the best overall T staging performance of all modalities, with substantial agreement with pathology (κ = 0.73), an overall accuracy of 83%, and stage-specific accuracies ranging from 77 to 87% (Table 5). However, only 3 MRI studies examining 109 patients were included in this review. Therefore, although MRI appears highly accurate for pre-operative T staging, publication bias may be present: all 3 studies reported excellent results (whereas the CT literature includes published studies with poor results), which may have led to an overestimation of its performance. Furthermore, current MRI protocols are breath-hold-dependent [12, 13, 36, 49, 51]; as such, the patient cohorts in these studies may have been better able to comply than the gastric cancer population as a whole.

Although PET cannot stage gastric cancer by tumor depth, it has a pooled primary tumor detection rate of 80%, which suggests a good overall ability to identify a gastric cancer if one exists. Not surprisingly, PET detects advanced gastric tumors (83–100%) more readily than early gastric tumors (26–63%; see Appendix 4 of the electronic supplementary material). However, the ability of PET to detect tumors varies greatly with histological type: intestinal type (65.5–83%), non-intestinal type (41–79%), poorly differentiated adenocarcinoma (61.5–79%), and signet ring cell carcinoma (0–78%; see Appendix 4 of the electronic supplementary material).

It is important to note that the overall accuracy of T staging for a given study (and for the pooled population) depends on the distribution of T stages within the evaluated patient population. T1 and T2 accuracies are generally lower than those for T3 and T4 because of the inability to discriminate depths of invasion in early cancers. The relationship between T1 versus T2 tumors and T3 versus T4 tumors, however, is more complicated and sensitive to operator performance and imaging modality. In our meta-analysis, no significant difference in accuracy was found between T stages (data not shown). Nonetheless, with the exception of MRI, a visible trend toward higher T3 than T1 accuracy was found across modalities (Table 5). However, exceptions to this trend have been documented. For example, Table 2 shows that Ahn et al. [23] had a high overall accuracy of 86.4%, with 88% of the patients staged as T1, whereas Blackshaw et al. [25] had a low overall accuracy of 60%, with 85% of the patients staged as T3/T4. Consistent with the concept of being able to differentiate early from advanced tumors, the distribution of the patient population can also affect the sensitivity and specificity of identifying lymph node involvement. In a patient population with more advanced tumors, sensitivity and specificity for identifying lymph node involvement are likely to be higher than in a population with more early tumors, owing to a higher pre-test probability of nodal involvement.
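Overall accuracy is the case-weighted average of the stage-specific accuracies. As a result, the same stage-specific performance can produce quite different overall accuracies under different T-stage distributions. The Python sketch below illustrates this with hypothetical stage-specific accuracies and two hypothetical cohorts. None of the numbers are taken from Tables 1–6.

```python
def overall_accuracy(stage_accuracy, stage_counts):
    """Overall accuracy (%) as the case-weighted mean of stage-specific accuracies."""
    total = sum(stage_counts.values())
    return sum(stage_accuracy[s] * n for s, n in stage_counts.items()) / total


# Hypothetical stage-specific accuracies (%) for a single imaging technique
accuracy = {"T1": 50.0, "T2": 60.0, "T3": 85.0, "T4": 70.0}

# Two hypothetical cohorts of 100 patients with different case mixes
early_cohort = {"T1": 60, "T2": 20, "T3": 15, "T4": 5}
advanced_cohort = {"T1": 5, "T2": 10, "T3": 60, "T4": 25}

print(overall_accuracy(accuracy, early_cohort))     # 58.25
print(overall_accuracy(accuracy, advanced_cohort))  # 77.0
```

With identical per-stage performance, the early-disease cohort appears almost 20 percentage points less accurate overall. This is why overall accuracies from studies with different case mixes should be compared with caution.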

Evaluation of N staging

The ability to stage lymph node (LN) status pre-operatively in gastric cancer patients remains poor. Our results show that imaging modalities range in overall accuracy from 53% (MRI) to 66% (CT), in sensitivity from 40% (PET) to 85% (MRI), and in specificity from 75% (MRI) to 98% (PET), with no significant differences between modalities (Table 5). The specificities for all modalities were higher than their respective sensitivities. Among CT scanners, neither detector number nor MPR images significantly improved N staging (Table 6). The 85% sensitivity reported for MRI comes from a single study, so it cannot be concluded that MRI is clearly superior to other modalities. PET had the worst sensitivity (40%) for differentiating N0 from N+ disease, but the best specificity (98%), suggesting that it may be used to confirm node-positive patients. These results confirm the analysis of another review, which showed that neither AUS, MDCT, conventional MRI, nor PET could reliably confirm or exclude the presence of LN metastasis [66]. Tumor-positive LNs are not always enlarged, and enlarged LNs are not always tumor-positive but may instead be enlarged due to inflammation; both possibilities make N staging extremely difficult [15, 66]. Moreover, the size criteria used to define LN involvement vary (ranging from >6 mm to >1 cm) [10]. We found that the majority (68%) of the studies used a definition of ≥8 mm for LN involvement, although this criterion was applied to the short-axis diameter in some cases and to the long-axis diameter in others (Appendix 2.4 of the electronic supplementary material). These size-dependent diagnostic criteria for AUS, CT, and MRI may also contribute to the lower specificity of these modalities compared with PET, which uses a metabolic diagnostic criterion.
However, the mean SUV reported for N staging also varies, with overall values ranging from 4.5 to 6.8 (Table 4) and mean SUVs overlapping between N stage categories (N0: 3.5–6.0; N1: 2.7–7.5; N2: 4.5–9.0; N3: 6.2–8.7; Appendix 4.4 of the electronic supplementary material). These inaccuracies in determining true nodal status make pre-operative assessment of disease spread difficult. They must be taken into account in reports of pre-operative staging for neoadjuvant and peri-operative treatments, as well as in the selection of patients for EMR in early gastric cancer. However, progress in molecular biology is likely to contribute to the advancement of pre-operative staging in gastric cancer and should lead to more effective staging strategies in the future; studies in this field have successfully used specific radio-labeled probes to tag and identify tumor antigens and/or receptors [67].
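The argument that a highly specific test can confirm node-positive disease can be made concrete with positive and negative predictive values. The sketch below uses the pooled PET sensitivity (40%) and specificity (98%) for N staging reported in this section. The 50% prevalence of nodal involvement is a hypothetical assumption, and predictive values will shift with the prevalence in any real cohort.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Pooled PET N-staging values from the text; the prevalence is a hypothetical assumption
ppv, npv = predictive_values(sensitivity=0.40, specificity=0.98, prevalence=0.50)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.95, NPV = 0.62
```

Under these assumptions, a PET-positive result is very likely to be a true positive. A PET-negative result, however, leaves a substantial chance of nodal disease, consistent with the poor sensitivity discussed above.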

Evaluation of M staging

Currently, pre-operative M staging of gastric cancer is best assessed by PET and by CT with ≥4 detectors (overall accuracies of 88 and 82%, respectively; Tables 5 and 6). However, only 3 PET studies, compared with 11 CT studies, reported M staging accuracies. The value of AUS in M staging remains unclear: it had a pooled overall accuracy of 65%, but only 2 studies evaluated it (Table 1), resulting in high variation. None of the included studies assessed MRI for M staging. In practice, MRI is not suitable for screening for metastases because of the limited area of the body that can be scanned in a single session [14]; however, it is often used to characterize non-specific liver lesions found by CT [68]. A limitation of this review is that pre-operative staging studies were included only if patients had a post-operative pathology report for comparison. Patients who were not offered curative resection on the basis of metastases found on pre-operative imaging were therefore excluded, so the true false-positive rate for metastatic disease may be higher than reported for all imaging techniques.

Overall

Despite the reasonable T and M staging abilities of CT, MRI, and PET, all are far from perfect. The importance of accurate pre-operative TNM staging has been demonstrated by studies that show pre-operative staging frequently differs from post-operative assessments. Schwarz [5] found that post-operative assessment (based on intra-operative findings and pathology) differed from pre-operative staging in 29% of patients. In 45% of the cases, the changes in curative intent could be traced to uncertainty of diagnosis or disease extent [5]. Furthermore, 45.5% of patients with pre-operative stage assignment were ultimately re-classified into a different pathologic stage category post-operatively, and patients undergoing a curative-intent procedure were re-staged 50.4% of the time intra-operatively [5]. These high re-staging rates support the use of diagnostic laparoscopy (DL) to clarify pre-operative intent. Our review on the use of DL in gastric cancer found that DL changed management in 10–60% of cases [69].

It may be possible to increase the accuracy of pre-operative assessment by combining staging modalities. van Vliet et al. found that CT alone was not sensitive enough to detect distant metastases, whereas combining CT with AUS, neck US, and chest X-ray resulted in higher accuracy in patients with esophageal or gastric cardia cancer [70]. Chen et al. [29] reported that the combined use of PET and CT was more accurate for pre-operative N and M staging than either modality alone; however, the combined pre-operative staging accuracy was still low (66%). Therefore, further research is required to determine whether pre-operative TNM staging is improved by using combined and/or multiple imaging techniques.
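How much combining two modalities could help depends on the decision rule and on how correlated the tests are. As a hedged illustration, the Python sketch below assumes that the two tests are conditionally independent given true disease status. This is unlikely for imaging tests that assess the same anatomy, so real gains would usually be smaller. The first test uses the pooled PET N-staging values reported in the Discussion. The second test's sensitivity and specificity are hypothetical.

```python
def either_positive(se1, sp1, se2, sp2):
    """Call disease if either test is positive ("believe the positive")."""
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2


def both_positive(se1, sp1, se2, sp2):
    """Call disease only if both tests are positive ("believe the negative")."""
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)


# Test 1: pooled PET N-staging values from the text; test 2: hypothetical values.
# Both rules assume conditional independence of the two tests.
pet = (0.40, 0.98)
other = (0.70, 0.80)

print("either positive: sens %.3f, spec %.3f" % either_positive(*pet, *other))
print("both positive:   sens %.3f, spec %.3f" % both_positive(*pet, *other))
```

The trade-off is visible in the output. Requiring either test to be positive raises sensitivity at the cost of specificity, and requiring both does the reverse.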

Finally, the performance of a staging modality is determined by the experience of the investigator and the quality of the protocols, as well as by the equipment. Blackshaw et al. [63] found that pre-operative TNM staging by CT improved significantly with radiologist experience, with lower agreement in the first 75 patients than in the last 25 patients staged. In this examination of the learning curve, the authors reported a twofold improvement in tumor detection and a sevenfold improvement in the detection of suspicious LNs [63]. Callaway and Bailey [71] reported variations in CT scanning protocols and equipment after surveying 5 cancer networks (21 hospitals) covered by the South West Cancer Intelligence Service of the United Kingdom. They found variation in MDCT capabilities, gastric cancer patient volume, number of radiologists in each institution, radiologist specialty, CT scanning protocol, and image type used to evaluate patients [71]. Variations in scanning protocols included the use of different positive (Gastrografin vs. barium) and negative (water vs. milk) oral contrast agents, whether a pre-contrast scan was performed, scan timing (arterial phase vs. portal phase vs. both), and scan coverage (chest vs. abdomen vs. pelvis) [71]. Given these results, pre-operative TNM staging studies clearly contain heterogeneous data, which may explain the high variation in performance characteristics reported by the studies included in our review. Importantly, the results reported in journal articles are likely better than those achieved in routine practice. We advocate the publication of current and clear guidelines/protocols for routinely used imaging techniques.

Conclusion

The agreement between pre-operative TNM staging by radiological imaging and post-operative staging by pathology is far from perfect. For pre-operative T staging, the performance characteristics of AUS and CT were not significantly different; however, MRI performed better, although in a limited number of patients. Among CT scanners, those using ≥4 detectors and MPR images performed better than scanners with <4 detectors and axial images only. For pre-operative N staging, overall accuracy was not significantly different across modalities; however, PET had the worst sensitivity yet the highest specificity. CT performance did not differ significantly by detector number or the addition of MPR images. For pre-operative M staging, performance did not differ significantly by modality, detector number, or the addition of MPR images; however, the lack of significance was most likely due to large standard errors. Operator dependence and heterogeneity of data may account for the variations in staging performance. Physicians should consider the implications of staging inaccuracy, and may want to use multiple imaging modalities and/or DL to confirm the specifics of a tumor before developing treatment strategies for gastric cancer patients.