Preparation of training and test image sets
To develop an algorithm for detecting gastric cancer, EGD images were retrospectively obtained from two hospitals (Cancer Institute Hospital Ariake, Tokyo, Japan, and Tokatsu-Tsujinaka Hospital, Chiba, Japan) and two clinics (Tada Tomohiro Institute of Gastroenterology and Proctology, Saitama, Japan, and Lalaport Yokohama Clinic, Kanagawa, Japan) from April 2004 to December 2016. EGD was performed for screening or preoperative examination in daily clinical practice, and images were captured using standard endoscopes (GIF-H290Z, GIF-H290, GIF-XP290N, GIF-H260Z, GIF-Q260J, GIF-XP260, GIF-XP260NS, and GIF-N260; Olympus Medical Systems, Co., Ltd., Tokyo, Japan) and standard endoscopic video systems (EVIS LUCERA CV-260/CLV-260 and EVIS LUCERA ELITE CV-290/CLV-290SL; Olympus Medical Systems).
Images acquired with standard white light, chromoendoscopy using indigo carmine spraying, or narrow band imaging (NBI) were included. Magnified images were excluded, as were poor-quality images resulting from insufficient air insufflation, post-biopsy bleeding, halation, blur, defocus, or mucus. After selection, 13,584 images of 2639 histologically proven gastric cancer lesions were collected as the training image data set. Every image contained at least one gastric cancer lesion, and multiple images were prepared for the same lesion to include differences in angle, distance, and extension of the gastric wall. All images of gastric cancer lesions were marked manually by an author (TH) who is an expert on gastric cancer and a board-certified trainer at the Japan Gastroenterological Endoscopy Society. The author (TH) carefully marked the extent of each cancer lesion using rectangular frames (Figs. 1, 2, 3).
To evaluate the diagnostic accuracy of the constructed CNN, an independent test data set of stomach images was collected from 69 consecutive patients with 77 gastric cancer lesions (62 patients had 1 gastric cancer lesion, 6 had 2 lesions, and 1 had 3 lesions) who underwent EGD at the Cancer Institute Hospital Ariake from 1 to 31 March 2017 during daily clinical practice. All EGD procedures used a standard endoscope (GIF-H290Z) and a standard endoscopic video system (EVIS LUCERA ELITE CV-290/CLV-290SL). During each procedure, the entire stomach was observed, and images of all parts were captured under white light. Chromoendoscopy, NBI, and poor-quality images were excluded. The final test data set comprised 2296 images, with 18–69 images per patient.
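The composition of the test set can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names are illustrative, and the mean number of images per patient is derived here rather than reported in the text.

```python
# Number of patients by lesion count, as stated in the text:
# 62 patients with 1 lesion, 6 with 2 lesions, 1 with 3 lesions.
patients_by_lesion_count = {1: 62, 2: 6, 3: 1}

# Total patients and total lesions implied by that breakdown.
patients = sum(patients_by_lesion_count.values())
lesions = sum(n * count for n, count in patients_by_lesion_count.items())

# Derived value (not reported in the text): average images per patient.
total_images = 2296
mean_images_per_patient = total_images / patients

print(patients, lesions, round(mean_images_per_patient, 1))
```

The totals agree with the 69 patients and 77 lesions reported, and the derived mean falls within the stated range of 18–69 images per patient.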
Constructing a CNN algorithm
To construct an AI-based diagnostic system, we used a deep neural network architecture called the Single Shot MultiBox Detector (SSD, https://arxiv.org/abs/1512.02325), without altering its algorithm. SSD is a deep CNN that consists of 16 layers or more. The Caffe deep learning framework, one of the most widely used frameworks and originally developed at the Berkeley Vision and Learning Center, was then used to train, validate, and test the CNN.
All CNN layers were fine-tuned using stochastic gradient descent with a global learning rate of 0.0001. Each image was resized to 300 × 300 pixels, and the bounding box was resized accordingly so that the CNN could analyze the images optimally. These values were set by trial and error to ensure that all data were compatible with SSD.
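Resizing the bounding box together with the image amounts to scaling its coordinates by the same horizontal and vertical factors. The following is a minimal sketch of that step, not the code used in the study; the function name `scale_box`, the corner-coordinate box format, and the 1280 × 1024 source resolution in the example are assumptions made for illustration.

```python
def scale_box(box, orig_size, target_size=(300, 300)):
    """Scale an (x_min, y_min, x_max, y_max) box from orig_size to target_size.

    orig_size and target_size are (width, height) in pixels.
    The box format and the function itself are illustrative assumptions.
    """
    orig_w, orig_h = orig_size
    target_w, target_h = target_size
    sx = target_w / orig_w  # horizontal scale factor
    sy = target_h / orig_h  # vertical scale factor
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# Hypothetical example: a lesion frame drawn on a 1280 x 1024 image.
scaled = scale_box((320, 256, 640, 512), (1280, 1024))
print(scaled)
```

Because the two scale factors differ whenever the source image is not square, the aspect ratio of the frame changes along with that of the image.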
After constructing the CNN using the training image set, we evaluated its performance on the test image set. When the CNN detected a gastric cancer lesion in a test image, it output a disease name (early or advanced gastric cancer) and the position of the lesion. A detected lesion was displayed with a yellow rectangular frame on the endoscopic image (Fig. 1).
Because some gastric cancer lesions appeared in multiple images, we used the following definitions to perform the test.
When the CNN detected a gastric cancer lesion in at least one of the multiple images of that lesion, the detection was defined as a correct answer (Fig. 2).
Because the demarcation line of gastric cancer was sometimes unclear, a detection that covered only part of a gastric cancer lesion was also regarded as a correct answer.
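The two definitions above can be sketched as a lesion-level scoring rule. This is an illustration under stated assumptions rather than the scoring code used in the study: the text does not specify how partial detection was judged, so the sketch treats any overlap between a detected frame and the marked frame as a partial detection, and the names `boxes_overlap` and `lesion_is_detected` are hypothetical.

```python
def boxes_overlap(a, b):
    """Return True if two (x_min, y_min, x_max, y_max) boxes share any area.

    Assumption: any overlap counts as a partial detection; the source
    does not state an overlap threshold.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return min(ax1, bx1) > max(ax0, bx0) and min(ay1, by1) > max(ay0, by0)


def lesion_is_detected(images):
    """Score one lesion from all of its images.

    images: list of (marked_box, detected_boxes) pairs, one per image.
    The lesion counts as correctly detected if any detected frame in any
    image overlaps the marked frame for that image.
    """
    return any(
        boxes_overlap(marked, detected)
        for marked, detections in images
        for detected in detections
    )


# Hypothetical lesion shown in two images: missed in the first,
# partially covered by a detection in the second.
hit = lesion_is_detected([
    ((10, 10, 50, 50), []),
    ((20, 20, 60, 60), [(50, 50, 90, 90)]),
])
# Hypothetical lesion whose only detection lies outside the marked frame.
miss = lesion_is_detected([((10, 10, 50, 50), [(60, 60, 90, 90)])])
print(hit, miss)
```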
The sensitivity and positive predictive value (PPV) for the CNN’s ability to detect gastric cancer were calculated as follows:
Sensitivity = number of correctly detected gastric cancer lesions / actual number of gastric cancer lesions
PPV = number of correctly detected gastric cancer lesions / number of lesions diagnosed as gastric cancer by the CNN.
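As a small worked example of these two ratios, the sketch below computes both from lesion counts. The function names are illustrative, and the counts in the example are made-up round numbers chosen only to show the arithmetic; they are not results from this study.

```python
def sensitivity(correct_detections, actual_lesions):
    """Correctly detected lesions divided by the actual number of lesions."""
    return correct_detections / actual_lesions


def ppv(correct_detections, cnn_diagnosed_lesions):
    """Correctly detected lesions divided by all lesions the CNN diagnosed as cancer."""
    return correct_detections / cnn_diagnosed_lesions


# Made-up counts for illustration only (not study results):
# 10 actual lesions, 8 of them correctly detected, 16 CNN diagnoses in total.
example_sensitivity = sensitivity(8, 10)
example_ppv = ppv(8, 16)
print(example_sensitivity, example_ppv)
```

The two measures share a numerator but differ in denominator, so a detector that raises many false alarms can have high sensitivity and low PPV at the same time.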
This study was approved by the Institutional Review Board of the Cancer Institute Hospital Ariake (no. 2016–1171) and the Japan Medical Association (ID JMA-IIA00283).