Diagnostic accuracy of artificial intelligence-aided capsule endoscopy (TOP100) in overt small bowel bleeding

Background Capsule endoscopy (CE) is the first-choice exploration in case of overt small bowel bleeding (SBB). An early CE is known to increase diagnostic yield, but long reading times may delay therapeutics. The study evaluates the diagnostic performance of the artificial intelligence tool TOP100 in patients with overt SBB undergoing early CE with Pillcam SB3.
Methods Patients who underwent early CE (up to 14 days from the bleeding episode) for suspected overt SBB were included. One experienced endoscopist performed standard reading (SR) and a second, blinded experienced endoscopist performed a TOP100-based reading (TR). The primary endpoint was TR diagnostic accuracy for lesions with high bleeding potential (P2).
Results A total of 111 patients were analyzed. The most common clinical presentation was melena (64%). CE showed angiodysplasias in 40.5% of patients (45/111). In per-patient analysis, TR showed a sensitivity of 90.48% (95% CI 82.09–95.80), a specificity of 100% (95% CI 87.23–100), a PPV of 100% (95% CI 94.01–100), an NPV of 77.14% (95% CI 63.58–86.71) and a diagnostic accuracy of 92.79% (95% CI 86.29–96.84). At multivariate analysis, adequate intestinal cleansing was the only independent predictor of concordance between TR and SR (OR 2.909, p = 0.019). The median reading time for SR and TR was 23 min (18.0–26.8) and 1.9 min (range 1.7–2.1), respectively (p < 0.001).
Conclusions TOP100 provides a fast-reading mode for early CE in case of overt small bowel bleeding. It identifies most patients with active bleeding and angiodysplasias, aiding in the prioritization of therapeutic procedures. However, its accuracy in detecting ulcers, varices and P1 lesions seems insufficient.


Keywords Artificial intelligence • Hemorrhage • Capsule endoscopy • Intestine • Small
Small bowel bleeding (SBB) accounts for 5 to 10% of all gastrointestinal bleeding and is defined as overt or occult bleeding that originates in the small bowel (SB) in patients with an inconclusive exploration of the upper and lower gastrointestinal tracts (formerly named obscure gastrointestinal bleeding).[1,2] It is a cause of hospital admission and blood transfusion, with a mortality rate of up to 10% [3]. Major guidelines in Europe and the United States recommend small bowel capsule endoscopy (SBCE) as the mainstay technique to explore the SB and detect the cause of bleeding [4,5]. Alternative techniques include CT or magnetic resonance imaging enterography, especially in the case of solid or extramural lesions. On the other hand, balloon and spiral enteroscopy, which are more invasive techniques, are limited to therapeutic use or to a "diagnose and treat" strategy in the case of early (< 72 h) overt bleeding [5].
Nowadays, SBCE is a safe and efficient technique.[6] However, due to long reading times, the procedure may be exhausting for the reader, and its efficiency depends on reader experience and proper reading technique.[7] Recently, the use of artificial intelligence (AI) has spread across all fields of medicine. AI may help identify or diagnose SB lesions, thus reducing reading times [8].
Pillcam SB3 (Medtronic, Minneapolis, USA) was the first capsule endoscopy device to integrate AI tools in its reading software (Pillcam™ Reader, previously Rapid Reader, Medtronic, USA) to assist with video assessment and potentially reduce reading times [9]. TOP100 is an integrated AI tool that selects the 100 most relevant frames, including potential lesions, from the SBCE video recording. In a study by Arieira et al. evaluating patients with SBB, mainly of occult presentation (90.7%), TOP100 was able to identify 83.5% of significant lesions [10]. However, these values are insufficient to rely completely on TOP100 reading for SBCE assessment, especially in the case of occult SBB, which requires an accurate evaluation to indicate further studies or therapeutic procedures.
Little is known about the utility and impact of TOP100 on lesion detection and reading times in acute and urgent settings such as overt SBB, where a rapid diagnosis is needed to schedule the appropriate therapeutic procedure. The present study aimed to evaluate the diagnostic performance of TOP100 in patients undergoing early SBCE due to overt SBB. A new reading was performed for all SBCE videos. The study's primary endpoint was to assess TOP100 diagnostic performance (sensitivity, specificity, positive predictive value, negative predictive value, accuracy) for P2 lesions in patients with overt SBB in comparison with standard reading (SR) as the gold standard. Secondary endpoints were: (1) to compare SBCE reading time between SR and TOP100-based reading (TR); (2) to assess the diagnostic performance of TOP100 for P1 lesions; (3) to detect predictors of concordance between TR and SR.

Study design
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Ethics Committee (IEC) of Hospital Clínic de Barcelona (HCB/2020/1204). Informed consent was waived due to the study design and the use of anonymized data according to IEC requirements and local laws.

Capsule endoscopy protocol
A Pillcam SB3® (Medtronic, Minnesota, USA) was used in all patients. A 12-h clear liquid diet and 8-h fasting from solid food were requested before the procedure. A 1 L polyethylene glycol bowel preparation (Moviprep, Norgine Pharmaceuticals) had to be ingested 2 h before the procedure. A prokinetic drug (metoclopramide) was administered 15 min before capsule ingestion in all inpatients according to local protocols, unless contraindicated. The endoscopic capsule was swallowed with 200 ml of water with 300 mg of simethicone. Patients could start a liquid diet 2 h after capsule ingestion, followed by a normal solid diet after 4 h.

Capsule exploration assessment
SBCE procedures included in the analysis were first anonymized and coded. Two independent endoscopists (AG & BGS), each with experience of more than 1000 capsule endoscopy explorations, performed the new SBCE reading. Both endoscopists were blinded to the previous report on the exploration and to the patients' demographic and clinical background. One endoscopist performed SR according to ESGE (European Society of Gastrointestinal Endoscopy) guidelines [11]: briefly, the anonymized capsule video was marked at the first gastric, first duodenal, and first cecal image and read using Pillcam™ Reader Software at a speed of no more than 10 frames per second. The assessment was limited to the SB (first duodenal image to first cecal image). All detected lesions were registered on a coded digital sheet according to type, number, and location. Reading time was measured with a digital timer and included the marking of the first duodenal and cecal image, SB reading, and lesion identification, whereas image review and report writing were excluded. A second endoscopist performed the reading based on TOP100 images (TR). The video was marked as for SR and, after activating the TOP100 tool, a full-screen reading of the static frames presented by the software was carried out (Fig. 1). Lesion characterization and reading time were assessed as per SR.

Lesion identification and characterization
All detected lesions were classified as per the Saurin classification [12,13]: P0 lesions (with no potential for bleeding: submucosal veins, diverticula without the presence of blood, or nodules without mucosal breaks), P1 lesions (with uncertain hemorrhagic potential: red spots, small or isolated erosions without bleeding), and P2 lesions (with high potential for bleeding: typical angiodysplasias, areas with multiple erosions, ulcers, active bleeding/visible blood, tumors, and varices). Only P2 and P1 lesions were considered for the analysis. When multiple identical lesions were detected, they were differentiated according to three main characteristics: the aspect of the lesion, the surrounding mucosa, and the time interval between lesions.

Study variables definition
Diagnostic yield was defined as the number (percentage) of patients with at least one P2 lesion. The detection rate was defined as the percentage of patients or lesions found by TR in comparison with SR; lesions not confirmed by SR were considered false positives.
SB cleansing was assessed according to a qualitative scale (excellent, good, fair, poor) described by Brotz et al. [14].
Excellent and good cleansing were considered adequate intestinal preparation.

Statistics
Data were analyzed with the Kolmogorov-Smirnov test to assess whether they followed a normal distribution. Qualitative variables were expressed as absolute numbers and percentages; quantitative variables were expressed as median and interquartile range (IQR). Positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy were expressed as percentages with 95% confidence intervals (95% CI). Reading time was measured in minutes and seconds and expressed in decimals. The concordance between SR and TR was calculated with Cohen's kappa coefficient (per-lesion analysis). Univariate analysis was conducted with the Mann-Whitney U test or the Chi-squared test, as appropriate. Predictors of concordance were entered in a multivariate model (forward stepwise method) and analyzed with multiple binary logistic regression. Statistical analysis was performed with SPSS
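The performance measures named above can be illustrated with a short sketch. The per-patient counts used here (76 true positives, 0 false positives, 8 false negatives, 27 true negatives) are not stated in the text: they are an assumption, back-calculated from the percentages reported in the abstract for 111 patients. The exact (Clopper-Pearson) interval is also an assumed choice, since the method used for the confidence intervals is not specified here, and all function names are illustrative.

```python
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 measures, with SR as the reference and TR as the index test."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion k/n, found by bisection."""

    def solve(cdf_k, target):
        # binom_cdf(cdf_k, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(cdf_k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


# Assumed counts, reconstructed from the reported per-patient percentages.
metrics = diagnostic_metrics(tp=76, fp=0, fn=8, tn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")

spec_low, spec_high = clopper_pearson(27, 27)
print(f"specificity 95% CI: {spec_low:.2%} to {spec_high:.2%}")
```

With these assumed counts the sketch gives a sensitivity of 90.48%, specificity of 100%, PPV of 100%, NPV of 77.14% and accuracy of 92.79%, with a lower specificity bound of about 87.23%. Intervals for predictive values are often computed with other methods, so they are not reproduced here.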

Study population
From March 2018 to March 2021, a total of 591 SBCE videos were identified. Of these, 431 were excluded for not meeting the inclusion criteria, and 49 procedures were excluded for being incomplete explorations (n = 9), duplicate patients (n = 20), or having a prior upper gastrointestinal endoscopy or ileo-colonoscopy performed > 14 days before SBCE (n = 20). Finally, 111 SBCE videos were analyzed (Fig. 2). The patient population had a median age of 69 years (56-78), and 63.1% were male; at the time of SBCE, 74.8% were inpatients. In 48.6% of patients (54/111), bleeding had been active in the 24 h before SBCE exploration. The most common presentation of SBB was melena (64%). Globally, 59.5% (64/109) of patients were transfused due to SBB. Anticoagulants (34.2%) and anti-platelet drugs (9.9%) were the most common concomitant medications with bleeding risk. Demographic characteristics are described in detail in Table 1.
Overall, TOP100 demonstrated an accuracy of 92.79% for the identification of patients with P2 lesions, with high

Predictors of concordance between TOP100 and standard reading
TOP100 and SR showed substantial agreement (κ = 0.75).
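As a reminder of how the agreement statistic is obtained, the following minimal sketch computes Cohen's kappa from a square agreement table. The counts in the example are purely hypothetical and are not study data; the function name is illustrative.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table: table[i][j] counts items rated
    category i by the first reader and category j by the second reader."""
    size = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of items on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the marginal proportions.
    row_share = [sum(table[i]) / total for i in range(size)]
    col_share = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    expected = sum(r * c for r, c in zip(row_share, col_share))
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not taken from the study).
example = [[40, 5],
           [5, 50]]
print(round(cohen_kappa(example), 3))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it is lower than the simple percentage of concordant readings.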
In 84 patients, TR and SR were concordant as per lesion type and number of lesions. Potential predictors of concordance were analyzed, as shown in Table 5. At univariate analysis, the number of days from the first bleeding episode, adequate cleansing, and intestinal transit time (ITT) were associated with positive concordance. At multivariate

Discussion
The growing complexity of endoscopic procedures and the advances in optical diagnosis demand higher precision and shorter procedure times, thus opening the door to artificial intelligence as a promising technology for a rapid and effective diagnosis.[15,16] "Suspected Blood Indicator" was one of the first AI tools to be introduced in Pillcam® capsule endoscopy reading software, showing a low sensitivity of 55.3% and specificity of 57.8% for bleeding or potentially bleeding lesions.[17] Current AI tools applied to capsule endoscopy have shown a sensitivity of 80 to 99% and a specificity of 94 to 99% in cases of gastrointestinal hemorrhage, with discordant results depending on AI algorithm complexity and clinical setting.[18] The present study focuses on the efficacy of the AI tool TOP100 for the diagnosis of suspected overt SBB in Pillcam® capsule endoscopy. In per-patient analysis, TR detected 90.5% of patients with significant P2 lesions identified by SR, thus showing a diagnostic accuracy of 92.79% with high specificity (100%) and PPV (100%), independent of lesion type; moreover, considering patients with active bleeding and angiodysplasias, sensitivity ranged from 97% to 100% with a specificity of 100%, thus correctly identifying patients with potentially treatable lesions. In the case of P1 lesions, which have a lower bleeding potential, TOP100 showed inferior performance in per-patient analysis, with a sensitivity of 73.53%, specificity of 97.4%, and diagnostic accuracy of 90.09%. Consequently, a second reading is necessary to fully report P1 lesions.
In per-lesion analysis, TR detected 88.35% of P2 lesions diagnosed by SR. TOP100 showed the highest accuracy for intestinal bleeding (100%), whereas, in the case of angiodysplasias, it showed a sensitivity of 91.3% and a specificity of 74.16% with a total accuracy of 86.15%. Sensitivity decreased with other P2 lesions such as ulcers (79%), areas with multiple erosions (75%), and non-bleeding varices (57%).
To the best of our knowledge, this is the first comprehensive study to illustrate the diagnostic accuracy of TOP100. The present results are in line with a previous study by Arieira et al., who evaluated TOP100 in SBB, showing an 83.5% detection rate for patients with P2 lesions (91.2% for patients with angiodysplasias), although with a lower detection rate (54.5%) for P1 lesions.[10] A similar multicenter study by Saurin et al. described the detection ability for P2 and P1 lesions of another Pillcam® AI tool, Quick-view, reporting a sensitivity of 85.1% and specificity of 84.7% in per-patient analysis and 89.2% and 84.7% in per-lesion analysis [19].
In the case of positive significant findings, TOP100 permits an adequate definition of lesions and a presumptive location based on the Pillcam reading software detector or intestinal transit time, without the need for SR.[20] Moreover, in patients with active bleeding, which represents a clear-cut urgent case, TOP100 allows complete reliance on the AI assessment. These two features allow the physician to make the appropriate therapeutic decision.
Both the per-patient and per-lesion analyses show an insufficient NPV (77.14% and 44.65%, respectively); this miss rate implies that completely negative explorations must undergo a second conventional reading to rule out the presence of significant lesions. Both the present results and previously published studies indicate that TOP100 is mainly designed for the detection of red-colored lesions, showing high sensitivity and NPV for active bleeding and angiodysplasias, probably due to the abundant red color in frames, yet lower results for ulcers and varices, which lie in the white and blue/green color spectrum.[10] Moreover, while bowel cleansing has not been associated with an increased diagnostic yield in the human review of SBCE videos,[5] the analysis of predictors of concordance between TR and SR showed that adequate intestinal cleansing significantly increases lesion detection with TOP100 (OR 2.909, p = 0.019). This finding suggests that the use of a cleansing agent is recommended when employing AI-based reading.
A strength of the study is the description of the impact of AI on reading time. A conventional reading requires between 30 and 60 min.[7,21,12] A previous study by Saurin et al. demonstrated a significant time reduction with the Pillcam AI tool Quick-view, with a median reading time of 11 min.[19] TOP100 shows a radical change, from a median time of 23 min for SR to 1.9 min for TR, a time reduction of 91.7%. In cases of overt SBB, the medical priority is to rapidly identify the potential bleeding source and schedule the appropriate treatment.[9] The great benefit of TOP100 reading appears to be its rapidity, which permits early scheduling of complementary diagnostic or therapeutic procedures.
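The relative time reduction can be checked with a one-line calculation. The sketch below uses the median reading times reported in the abstract (23 min for SR and 1.9 min for TR); the function name is illustrative.

```python
def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


sr_median_min = 23.0  # median SR reading time, as reported in the abstract
tr_median_min = 1.9   # median TR reading time, as reported in the abstract

print(f"{relative_reduction(sr_median_min, tr_median_min):.1%}")  # prints 91.7%
```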
The study's main limitation is its retrospective design; however, two new independent readings were performed based on anonymized videos to reduce potential biases. Moreover, this study was designed to directly compare TOP100 with the standard reading as the gold standard; therefore, it lacks a second review of both the TOP100 and standard readings to confirm or rule out possible false positive and false negative cases, and this might underestimate its real diagnostic performance.
All in all, TOP100 showed the ability to detect 90.5% of patients with significant lesions and 100% of active bleeding with a median reading time of less than 2 min. Rapidity and high sensitivity make TOP100 an ideal tool for a quick review of SBCE videos and for prioritizing therapeutics in patients with overt SBB. A second standard reading is still required in most cases to fully exclude the presence of lesions; however, the clinical impact of this miss rate is still unknown and should be evaluated in future prospective clinical trials.

Fig. 1 Application of the TOP100 Tool in Pillcam™ Reader Software v9. The TOP100 Tool, an integrated AI feature within the Pillcam™ Reader Software v9, enhances the reading process. To begin a TOP100-based reading, the user is required to identify and select the initial duodenal and cecal images (a). By clicking on the TOP100

Fig. 2 Flow-chart of inclusion and exclusion criteria

Table 1
Demographic characteristics of the study population

Table 2
Technical data of SBCE exploration

Table 3
PPV positive predictive value; NPV negative predictive value; CI confidence interval

Table 4
TOP100 diagnostic performance - per-lesion analysis. PPV positive predictive value; NPV negative predictive value