Non-IgE-mediated gastrointestinal (non-IgE-GI) food-induced allergic disorders encompass numerous and heterogeneous clinical pictures. Some of these are well characterized, such as Food Protein-Induced Allergic Proctocolitis (FPIAP), Food Protein-Induced Enterocolitis Syndrome (FPIES), Food Protein-Induced Enteropathy Syndrome (FPE), and Eosinophilic Gastrointestinal Disorders (EGIDs), including Eosinophilic Esophagitis (EoE), Allergic Eosinophilic Gastroenteritis (AEG), and Eosinophilic Colitis (EC) [1-3]. Others, especially in infancy, may present with less specific symptoms such as acute abdominal discomfort, persistent crying and unsettled behavior, frequent regurgitation or vomiting, and persistent watery diarrhea, often in combination with poor growth or constipation [4]. These latter entities were recently defined as Food Protein-Induced Motility Disorders (FPIMD) [4, 5]: conditions not included among the classical non-IgE-mediated allergies listed above, which improve after dietary elimination of specific food proteins and in which an alteration of gut motility has been hypothesized, although the exact pathogenetic mechanisms remain largely unknown [6, 7]. FPIMD are included in the group of non-IgE-GI food allergies because food-specific IgE is not detected in most cases [8].

The diagnosis of non-IgE-GI food allergy is usually based on clinical features, recovery after an elimination diet, and a subsequent positive oral food challenge. Classical diagnostic tests such as skin prick tests (SPT) and serum-specific IgE (sIgE) are of little help, as they are often negative. For these reasons, the atopy patch test (APT) has been proposed as a tool in the diagnostic work-up. A positive reaction correlates with infiltrating allergen-specific Th2 cells that secrete interleukins 4 and 13 as early as 24 h after application of the allergen [9]; after 48 h, a shift towards a Th1 pattern with secretion of interferon gamma occurs [10], underlining the role of APT in delayed type IV rather than immediate type I reactions.

Positive APT results occur mainly in delayed or mixed reactions (non-IgE gastrointestinal food allergy (FA), atopic dermatitis, EoE), where the test is mainly useful, rather than in IgE-mediated FA. However, its diagnostic accuracy remains controversial, and it is not routinely recommended because of the lack of a standardized procedure and the wide variability in sensitivity and specificity reported in previous studies [11, 12].

Although most studies analyzed APT in groups of patients with both immediate and delayed allergic reactions [13-15], recent evidence [16] suggests that the diagnostic efficiency of APT increases when it is used in better-selected cohorts.

Two systematic reviews [17, 18] have analyzed the accuracy of APT in patients with FA. However, the meta-analysis by Luo et al. [17] considered studies including children with different types of food allergy and, in some cases, with atopic dermatitis. The second review [18] provides little information and does not allow the data from the included studies to be analyzed.

The aim of this systematic review was to evaluate the diagnostic accuracy of the APT compared with the diagnostic gold standard, i.e., the oral food challenge (OFC), in children with non-IgE-GI food allergy, including a separate evaluation of the milk-allergic subgroup.


Search strategy

A comprehensive search was conducted in Medline via PubMed and Scopus (from January 1, 2000, through June 30, 2022), using the keywords “food allergy” and (“patch test” or “atopic patch test”) and (“Food protein-induced enterocolitis syndrome” or “FPIES” or “enterocolitis”), (“Eosinophilic Esophagitis” or “Eosinophilic Colitis” or “Eosinophilic Gastroenteritis”), (“enteropathy” or FPE), (“proctocolitis” or FPIAP), “haematochezia,” “colitis,” “gastritis,” “rectal bleeding,” “failure to thrive,” (“stypsis” or “constipation”).
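The keyword list above reads as a Boolean query. The sketch below shows one plausible way to assemble it; it is a hypothetical reconstruction, and it assumes (this is not stated in the text) that the clinical-term groups are combined with OR and that the date limits were applied as database filters rather than inside the string.

```python
# Hypothetical sketch: assembling the Boolean search string described above.
# ASSUMPTION: terms inside a group are ORed, the clinical groups are ORed
# with each other, and the clinical block is ANDed with the allergy and
# patch-test blocks. Date limits would be applied as database filters.


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


allergy = or_group(["food allergy"])
patch = or_group(["patch test", "atopic patch test"])

clinical_groups = [
    ["Food protein-induced enterocolitis syndrome", "FPIES", "enterocolitis"],
    ["Eosinophilic Esophagitis", "Eosinophilic Colitis", "Eosinophilic Gastroenteritis"],
    ["enteropathy", "FPE"],
    ["proctocolitis", "FPIAP"],
    ["haematochezia"],
    ["colitis"],
    ["gastritis"],
    ["rectal bleeding"],
    ["failure to thrive"],
    ["stypsis", "constipation"],
]
clinical = "(" + " OR ".join(or_group(g) for g in clinical_groups) + ")"

query = f"{allergy} AND {patch} AND {clinical}"
print(query)
```

Database-specific field tags (for example, restricting terms to title or abstract) are deliberately left out, since the text does not specify them.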

Two independent researchers (M.U.A.S. and E.M.) screened the databases. The references were imported into a citation manager software (EndNote 20.2.1®) for initial duplicate removal.

They independently screened the search results, reviewed all abstracts, and agreed on which full-text articles to retrieve to assess potentially eligible studies. Disagreements were resolved through discussion and, when required, a third reviewer (B.C.) mediated the discussion and the consequent decisions. The systematic review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, and its protocol was registered in the PROSPERO database. The authors had no conflicts of interest, and the study did not receive any funding.

Eligibility criteria

We used a PICOS (population, intervention, comparator, outcome, and study design) approach to formulate the eligibility criteria. The following question was set: “Are APT as accurate as OFC for non-IgE-GI food allergy in children?” We considered any kind of OFC, whether open, single-blind, or double-blind. Studies were not restricted by language, publication type, or study design; however, they were limited to children (0 to 18 years).

We included only studies that allowed us to evaluate the diagnostic performance of APT compared with the gold standard for diagnosing food allergy, i.e., OFC, in children affected by FPIAP, FPIES, FPE, EGIDs, or FPIMD. Studies were also included if the relevant data could be extracted, when necessary after contacting the authors for additional information.

Studies were excluded if the information was not specific to the topic of this review, if the clinical diagnosis was not confirmed by OFC, or if data on specificity and/or sensitivity were not provided and could not be extracted.

Data collection and analysis

Two authors independently retrieved and reviewed the following data (if available) from all included studies: year of publication, first author, design, population size, mean age, type of symptoms, number of allergic subjects, number of patients with positive sIgE or SPT, APT method used, and diagnostic accuracy data.

Eligible studies were then classified into two categories: (a) studies including patients with classic, well-defined clinical pictures such as FPIAP, FPIES, FPE, and EGIDs; and (b) studies including patients with FPIMD.

All authors discussed the studies in detail and evaluated them in a standardized and independent manner; methodological quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [19], and any divergence was resolved by discussion and agreement among all reviewers. This instrument judges the risk of bias and the applicability of diagnostic accuracy studies. QUADAS-2 covers four key domains (patient selection, index test, reference standard, and flow and timing), each rated as low, high, or unclear risk of bias; applicability concerns are rated for the first three domains.
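QUADAS-2 judgments are usually summarized, as in Fig. 2, by the proportion of studies rated low, high, or unclear in each domain. A minimal sketch of that tally follows; the judgments in it are hypothetical placeholders, not the ratings of the included studies.

```python
from collections import Counter

# QUADAS-2 domains; risk of bias is rated for all four, applicability
# concerns only for the first three.
DOMAINS = ["patient selection", "index test", "reference standard", "flow and timing"]
LEVELS = ("low", "high", "unclear")


def proportions(ratings, domain):
    """Share of studies judged low, high, or unclear in one domain."""
    counts = Counter(study[domain] for study in ratings)
    return {level: counts[level] / len(ratings) for level in LEVELS}


# HYPOTHETICAL risk-of-bias judgments for three studies (illustration only).
risk_of_bias = [
    {"patient selection": "high", "index test": "low",
     "reference standard": "low", "flow and timing": "unclear"},
    {"patient selection": "high", "index test": "unclear",
     "reference standard": "low", "flow and timing": "low"},
    {"patient selection": "low", "index test": "low",
     "reference standard": "low", "flow and timing": "low"},
]

for domain in DOMAINS:
    print(domain, proportions(risk_of_bias, domain))
```

The same function would apply unchanged to a second list of applicability judgments restricted to the first three domains.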

Statistical analysis

Meta-analysis was performed with the midas command in Stata 16.1. Pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR), with their 95% confidence intervals (95% CI), were calculated using a bivariate mixed-effects regression model. The area under the summary receiver operating characteristic (SROC) curve (AUC) and its 95% CI were also calculated. AUC values were considered low (0.5 ≤ AUC ≤ 0.7), moderate (0.7 < AUC ≤ 0.9), or high (0.9 < AUC ≤ 1). The test was considered highly informative with PLR above 10.0 and NLR below 0.1; moderately informative with PLR of 5–10 and NLR of 0.1–0.2; and only weakly informative with PLR of 2–5 and NLR of 0.2–0.5. The I² statistic was used to evaluate heterogeneity between studies; a value of 0% indicates no observed heterogeneity, while values greater than 50% indicate substantial heterogeneity [20].
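The per-study quantities that feed the pooled model, and the interpretive bands listed above, can be written down directly. The sketch below computes sensitivity, specificity, PLR, NLR, and DOR from a 2x2 table of APT result against OFC outcome and maps values onto the stated bands. The 0.5 continuity correction for zero cells and the treatment of values that fall exactly on a band boundary are assumptions, not details given in the text.

```python
# Minimal sketch: per-study accuracy measures from a 2x2 table
# (APT result vs OFC outcome) and the interpretive bands stated above.
# ASSUMPTIONS: a 0.5 continuity correction when any cell is zero, and the
# handling of values that fall exactly on a band boundary.


def accuracy(tp, fp, fn, tn, cc=0.5):
    """Return sensitivity, specificity, PLR, NLR, and DOR."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    dor = (tp * tn) / (fp * fn)  # algebraically equal to plr / nlr
    return sens, spec, plr, nlr, dor


def auc_band(auc):
    if 0.5 <= auc <= 0.7:
        return "low"
    if 0.7 < auc <= 0.9:
        return "moderate"
    if 0.9 < auc <= 1.0:
        return "high"
    return "outside the defined bands"


def plr_band(plr):
    if plr > 10:
        return "high"
    if plr >= 5:
        return "moderate"
    if plr >= 2:
        return "weak"
    return "not informative"


def nlr_band(nlr):
    if nlr < 0.1:
        return "high"
    if nlr <= 0.2:
        return "moderate"
    if nlr <= 0.5:
        return "weak"
    return "not informative"


# Hypothetical study: 40 true positives, 5 false positives,
# 10 false negatives, 45 true negatives.
sens, spec, plr, nlr, dor = accuracy(40, 5, 10, 45)
print(f"sens={sens:.2f} spec={spec:.2f} PLR={plr:.1f} NLR={nlr:.2f} DOR={dor:.0f}")
print(plr_band(plr), nlr_band(nlr), auc_band(0.90))
```

The continuity correction matters here because several included studies report 100% specificity, which would otherwise make the PLR and DOR undefined.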


The selection and inclusion process is reported in the PRISMA flowchart (Fig. 1). The electronic database search identified 196 citations on PubMed and 261 on Scopus. After screening titles and abstracts, 39 and 16 articles remained, respectively. After removing duplicates (3 papers), 52 articles were assessed for eligibility in full text. A total of 16 articles were selected; the others were excluded because their data did not meet the planned inclusion criteria (Fig. 1). Screening the reference lists of the most relevant studies identified one additional article. In total, 17 studies were included in this review.
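The counts in this flow can be cross-checked with simple arithmetic. The sketch uses only the numbers reported in the text; the number of articles excluded at the full-text stage (36) is derived from them rather than reported.

```python
# Cross-check of the record counts reported above (PRISMA flow, Fig. 1).
pubmed, scopus = 196, 261
kept_after_screening = {"PubMed": 39, "Scopus": 16}
duplicates = 3
selected_from_search = 16
from_reference_lists = 1

identified = pubmed + scopus
full_text_assessed = sum(kept_after_screening.values()) - duplicates
excluded_at_full_text = full_text_assessed - selected_from_search
total_included = selected_from_search + from_reference_lists

print(identified, full_text_assessed, excluded_at_full_text, total_included)
```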

Fig. 1

Studies search flow diagram. The electronic database search identified 196 citations on PubMed and 261 on Scopus. After screening titles and abstracts, duplicates were removed and full texts were retrieved and assessed for eligibility. Sixteen articles were selected; the others were excluded because they reported different outcomes, included patients without gastrointestinal symptoms, did not compare APT with the gold standard OFC, or did not report specificity or sensitivity. One additional paper was identified from the reference lists of relevant articles. In total, 17 articles were included in this study.

We found three studies evaluating children affected by FPIES [21-23] (Table 1), three studies including patients with FPIAP [24-26] (Table 2), and one including patients affected by FPIAP, FPIES, and FPE [27] (Table 3). Ten studies [16, 28-36] included patients affected by FPIMD (Tables 4 and 5).

Table 1 Studies including patients affected by FPIES
Table 2 Studies including patients affected by FPIAP
Table 3 Studies including patients affected by more than one form among FPIAP, FPIES, and FPE

Studies of children with FPIES were conducted on small populations (52 patients overall) and showed very different results: sensitivity ranged from 11.8 to 89% and specificity from 85.7 to 100%.

For one retrospective study, data could be extracted for only eight patients [21]. Two studies [21, 23] received lower QUADAS-2 ratings, mainly in the patient selection domain. Specifically, the increased risk of bias stemmed from the exclusive enrollment of patients not representative of the general population, as they were already known to have FA when OFC and APT were performed. The only prospective study [22] is of higher quality and reported high specificity and positive predictive value (PPV) (both 100%), with lower sensitivity and negative predictive value (NPV) (76.2% and 70.6%, respectively).

Three studies included children with proctocolitis. Based on QUADAS-2, two of them [24, 26] were of low quality because not all diagnoses were confirmed by OFC or because they included only a restricted group of selected patients with severe forms not responsive to therapy. The third [25] is of higher quality and shows high specificity (100%) and an NPV of 80.7%.

The only study [27] that enrolled a mixed population affected by FPIAP, FPIES, and FPE is a retrospective study of good quality. It found high specificity and PPV for APT performed with fresh foods, regardless of the food tested (best with milk, 100% and 100%, respectively, and with egg, 90.9% and 80%). In contrast, sensitivity was low (9.1% for milk and 40.4% for egg).

We identified ten studies performed on patients with FPIMD. These studies enrolled a large number of children (n = 770) and were divided into two groups (Tables 4 and 5) to allow separate evaluation of APT diagnostic performance in patients with negative allergy tests and in mixed populations with sIgE-positive and sIgE-negative forms. Two of these studies [28, 30] allowed separate data extraction and were included in both groups. Thus, three studies included 320 children with FPIMD without specific IgE and with a negative SPT (Table 4), and nine studies included 598 children with FPIMD with or without specific IgE and with positive or negative SPT (Table 5). Nocerino's study [29] was assigned to the FPIMD group even though it included a few patients with enterocolitis and enteropathy (Table 4).

Table 4 Studies including patients affected by FPIMD without specific IgE and with negative SPT
Table 5 Studies including patients affected by FPIMD with or without specific IgE and with positive or negative SPT

Studies that evaluated APT diagnostic accuracy in patients with negative sIgE showed good specificity and PPV (88.3–100% and 82.8–100%, respectively) but poor sensitivity (40–80.9%) (Table 4). However, their quality is poor because they enrolled patients not representative of the general population, as these patients were already known to be allergic at the beginning of the study and/or had severe clinical forms.

Analysis of the second group of FPIMD studies showed that APT has high specificity and PPV regardless of sIgE and/or SPT positivity. Across this subgroup, specificity ranged from 63.6 to 100% and PPV from 66.7 to 100%; the prospective studies, which received better QUADAS-2 ratings, reached specificity and PPV values of 95–100%.

Canani et al. [16] obtained better results when performing APT with fresh food than with food extracts; Alves et al. [25] and Sirin Kose et al. [27] obtained 100% specificity using fresh milk, while most of the other studies did not clearly state which type of allergenic material was used.

Methodological quality


According to the QUADAS-2 criteria, both the studies enrolling patients with well-defined gastrointestinal clinical pictures (FPIAP, FPIES, and FPE) and those including patients with sIgE-negative FPIMD (Tables 1, 2, 3, and 4) received lower ratings, particularly in the patient selection domain (Fig. 2). In most cases, the children were not representative of the general population, as they were already known to be allergic and/or had severe forms. Since inappropriate exclusions may result in overoptimistic estimates or in underestimation of diagnostic accuracy, the risk of bias was judged as high. The same applies when APT results were not collected at an appropriate time interval, ideally at the same time as the OFC, or when not all patients received a diagnosis based on positive objective signs at OFC. Studies enrolling subjects with FPIMD both with and without specific IgE (Table 5) appear to be of higher quality.

Fig. 2

QUADAS-2 results: proportion of studies with low, high, or unclear risk of bias, and proportion of studies with low, high, or unclear concerns regarding applicability.

In general, very few studies were designed to address the specific question of this review [28, 29, 32, 33, 35, 36].

Meta-analysis results

Because of the small sample sizes or the lack of necessary data in the studies enrolling children with FPIAP, FPIES, and FPE, meta-analysis was conducted only in the group of patients with FPIMD. An additional analysis was performed in the subgroup of children with suspected milk allergy, as milk is the most frequently involved food allergen.

For FPIMD, 8 studies including a total of 491 patients were analyzed. Overall, the meta-analysis shows that APT has high specificity, 94% (95% CI: 0.88–0.97), and a moderate PLR of 8.3 (95% CI: 4.1–16.6), whereas the NLR of 0.57 (95% CI: 0.40–0.82) lies above the 0.5 limit of the informative range defined in the statistical analysis, and sensitivity, 46% (95% CI: 0.27–0.66), varies between studies. Two of the three studies with lower APT values (Canani et al. and Yukselen et al. [16, 30]) were performed with commercial extracts. Heterogeneity is high, with I² always greater than 50%. The AUC was 0.90, at the boundary between moderate and high, with a correspondingly high DOR of 14 (95% CI: 6–34).
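The I² values reported here derive from Cochran's Q as I² = max(0, (Q - df)/Q). The sketch below illustrates this with DerSimonian-Laird random-effects pooling of logit sensitivity on hypothetical counts. It is a simpler, swapped-in univariate technique shown only to make the heterogeneity statistics concrete; it is not the bivariate mixed-effects model fitted by midas in this review.

```python
import math

# Illustrative sketch of heterogeneity statistics (Cochran's Q, I^2) with
# DerSimonian-Laird random-effects pooling of logit sensitivity.
# NOTE: this univariate approach is a simpler, swapped-in technique; it is
# not the bivariate mixed-effects model fitted by midas in this review.


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def dersimonian_laird(estimates, variances):
    """Return pooled estimate, tau^2, Cochran's Q, and I^2 (%)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, tau2, q, i2


# HYPOTHETICAL (true positive, false negative) counts for four studies.
studies = [(10, 20), (30, 10), (5, 25), (20, 20)]
y = [logit(tp / (tp + fn)) for tp, fn in studies]
v = [1 / tp + 1 / fn for tp, fn in studies]  # variance of logit sensitivity

pooled, tau2, q, i2 = dersimonian_laird(y, v)
print(f"pooled sensitivity={inv_logit(pooled):.2f}, I^2={i2:.0f}%")
```

Widely scattered study sensitivities, as seen across the FPIMD studies, push Q well above its degrees of freedom and therefore I² above the 50% threshold.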

The subgroup analysis of milk-allergic patients included eight studies from all clinical groups, with 551 subjects. Seven out of ten studies included FPIMD. Results are similar to those for FPIMD, with an even higher pooled specificity of 96% (95% CI: 0.89–0.98) and slightly better accuracy of APT (AUC = 0.93). The remaining estimates were sensitivity 52% (95% CI: 0.31–0.73), PLR 9.7 (95% CI: 4.8–19.6), NLR 0.50 (95% CI: 0.32–0.79), and DOR 19 (95% CI: 8–48). Figure 3 illustrates the results of each meta-analysis.
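At a summary point, the DOR equals PLR divided by NLR, which offers a quick internal check of the pooled estimates. Using the values reported above, the ratio gives about 14.6 for FPIMD and 19.4 for milk allergy, close to the reported DORs of 14 and 19; the small gaps plausibly reflect rounding of the published estimates.

```python
# Consistency check on the pooled estimates reported above: at a summary
# point, DOR = PLR / NLR, so the ratio should roughly match the reported DOR.
reported = {
    "FPIMD": {"PLR": 8.3, "NLR": 0.57, "DOR": 14},
    "milk": {"PLR": 9.7, "NLR": 0.50, "DOR": 19},
}

for group, est in reported.items():
    implied = est["PLR"] / est["NLR"]
    print(f"{group}: PLR/NLR = {implied:.1f}, reported DOR = {est['DOR']}")
```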

Fig. 3

(A) Results of the meta-analysis in the group of FPIMD patients. (B) Results of the meta-analysis in the group of milk-allergic patients. Symbols in both panels: @ dial, & fresh, Ø eggs, # lyophilized, + wheat, § eggs + fresh.


APTs are rarely used in the diagnosis of FA because their diagnostic accuracy and utility in clinical practice are still debated [11, 12]. The present systematic review aimed to investigate the diagnostic accuracy of the APT compared with the diagnostic gold standard, i.e., the OFC, in children with non-IgE-GI food allergy. To date, the available data on APT have been obtained in mixed populations of patients with both immediate and non-IgE-mediated FA. A recent systematic review by Luo et al. [17], including 41 studies, evaluated the diagnostic accuracy of APT in children with different clinical pictures of FA, with or without atopic dermatitis. In a subgroup analysis, they found that APT is specific in children with FA-related gastrointestinal symptoms. In contrast to the meta-analysis by Luo et al., we aimed to evaluate the diagnostic accuracy of APT exclusively in children with non-IgE-mediated food allergy with gastrointestinal symptoms confirmed by OFC.

Most available studies focused on FPIMD [8, 37-39], which were not specifically considered in the previous meta-analysis by Luo et al.

In this group, statistical analysis showed high specificity of APT (94%, 95% CI: 0.88–0.97), especially in the subgroup of milk-allergic patients (96%, 95% CI: 0.89–0.98). Thus, APT could help identify the offending foods in allergic patients and simplify the diagnostic process. Indeed, in non-IgE-GI food allergy, except for acute FPIES, the delay between food ingestion and reaction makes it difficult to suspect the responsible food, whereas in IgE-mediated allergies the short interval between ingestion and reaction facilitates identification of the offending food.

Our search found only a few studies evaluating APT in the well-characterized clinical pictures of gastrointestinal allergy, such as FPIAP, FPIES, and FPE. In detail, studies investigating the diagnostic role of APT in FPIES enrolled only small populations and showed different results, probably owing to differences in methodological accuracy. The most methodologically sound study, by Fogg et al. [22], showed optimal (100%) specificity but lower (76.2%) sensitivity of APT.

Studies enrolling patients with FPIAP also showed very different results. The prospective study obtained a low QUADAS-2 score, and its results contrast with those of the retrospective study, which was methodologically accurate: the former showed 100% sensitivity, the latter 100% specificity, and a third study of unspecified design reported only a low PPV (52.17%).

The study by Sirin Kose et al. [27] enrolled patients with FPIES, FPIAP, and FPE and showed that specificity and PPV were high and varied according to the offending food: both were 100% for cow's milk, and 90.9% and 80%, respectively, for hen's egg. Differences in the tested foods could also account for the different results across studies. Thus, no definitive conclusion can be drawn about the role of APT in patients with FPIAP, FPIES, and FPE, and meta-analysis was not possible. Further methodologically adequate studies are needed.

In summary, our systematic review provides important additional information compared with the two previous meta-analyses. It was designed around this specific outcome, and the search identified 4 more studies. Two of these were published later [26, 27]; the remaining two had been excluded from the previous meta-analysis, and we decided to include them for different reasons. Kalach's study [32] was included after extracting the data for GI allergy and excluding those for atopic dermatitis. Shirogoy's study [34] was included after counting patients not responding to the elimination diet in the group of non-allergic subjects. Our study also provides more in-depth results based on the different clinical pictures of GI-FA.

In the FPIMD group, specificity for APT in general and for cow's milk APT is higher in our study (0.94 and 0.96, respectively) than in Luo's meta-analysis [17] (0.91 and 0.86). Thus, our data, showing a high AUC in FPIMD and an even higher one for milk allergy, indicate that APT is an accurate tool for diagnosing FA in FPIMD, especially in the case of cow's milk allergy (CMA). However, because of its low sensitivity, a negative APT result cannot exclude an FA diagnosis. For this reason, combining APT with sIgE testing or SPT has been suggested as a way to increase its diagnostic performance [16, 36, 40-42].
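The practical meaning of the pooled likelihood ratios can be illustrated with Bayes' theorem in odds form (post-test odds = pre-test odds × LR). The sketch uses the pooled FPIMD values reported in this review and a hypothetical pre-test probability of 50%, chosen only for illustration: a positive APT raises the probability of FA to about 89%, whereas a negative APT lowers it only to about 36%, which is why a negative result cannot rule out FA.

```python
def post_test_probability(pretest_p, lr):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pretest_p / (1 - pretest_p)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


PLR, NLR = 8.3, 0.57   # pooled FPIMD estimates reported in this review
pretest = 0.5          # HYPOTHETICAL pre-test probability, for illustration only

print(f"positive APT: {post_test_probability(pretest, PLR):.2f}")
print(f"negative APT: {post_test_probability(pretest, NLR):.2f}")
```

In clinical use, the pre-test probability would come from the clinical picture, so the post-test values will differ from this example.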

We did not include studies on APT efficacy in patients with EGIDs. These studies were excluded because they did not report sensitivity and specificity [43, 44] or did not compare APT results with OFC [45-47]. Most studies enrolling patients with EoE focus on APT efficacy in guiding the exclusion of the suspected food, and the diagnosis is generally confirmed by symptom relief and histologic remission, without documentation of clinical and histological relapse after reintroduction of the offending food. Only Spergel et al. [48] performed OFC, but sensitivity and specificity could not be calculated.

A limitation of our systematic review is that the analysis does not allow any conclusion regarding test performance with fresh, lyophilized, or commercial allergen extracts, as most studies did not report this information. Another limitation is the high heterogeneity between studies, which may be explained by the lack of a standardized test procedure.


This systematic review suggests that APT may be a useful tool in children with FPIMD, especially in those with cow's milk allergy.