Abstract
Background
Image recognition using artificial intelligence with deep learning through convolutional neural networks (CNNs) has dramatically improved and been increasingly applied to medical fields for diagnostic imaging. We developed a CNN that can automatically detect gastric cancer in endoscopic images.
Methods
A CNN-based diagnostic system was constructed based on Single Shot MultiBox Detector architecture and trained using 13,584 endoscopic images of gastric cancer. To evaluate the diagnostic accuracy, an independent test set of 2296 stomach images collected from 69 consecutive patients with 77 gastric cancer lesions was applied to the constructed CNN.
Results
The CNN required 47 s to analyze 2296 test images. The CNN correctly diagnosed 71 of 77 gastric cancer lesions with an overall sensitivity of 92.2%, and 161 non-cancerous lesions were detected as gastric cancer, resulting in a positive predictive value of 30.6%. Seventy of the 71 lesions (98.6%) with a diameter of 6 mm or more as well as all invasive cancers were correctly detected. All missed lesions were superficially depressed and differentiated-type intramucosal cancers that were difficult to distinguish from gastritis even for experienced endoscopists. Nearly half of the false-positive lesions were gastritis with changes in color tone or an irregular mucosal surface.
Conclusion
The constructed CNN system for detecting gastric cancer could process numerous stored endoscopic images in a very short time with a clinically relevant diagnostic ability. It may be readily applicable to daily clinical practice to reduce the burden on endoscopists.
Introduction
Gastric cancer is the fifth most common form of malignant tumor and the third leading cause of cancer-related death worldwide, with approximately 952,000 new cases and 723,000 deaths per year [1, 2].
The prognosis of patients with gastric cancer depends on the cancer stage at diagnosis [2, 3]. Although patients with advanced gastric cancer have a poor prognosis, the 5-year survival rate of patients with gastric cancer detected at an early stage is greater than 90% [2,3,4,5]. Therefore, endoscopic detection of gastric cancer at an earlier stage is the single most effective measure for reducing gastric cancer mortality. It also offers an opportunity to treat patients with organ-preserving endoscopic therapy such as endoscopic mucosal resection or endoscopic submucosal dissection (ESD) [6,7,8,9,10,11,12].
Although esophagogastroduodenoscopy (EGD) is the standard procedure for diagnosing gastric cancer, the false-negative rate for detecting gastric cancer with EGD is 4.6–25.8% [13,14,15,16,17,18]. Furthermore, inexperienced endoscopists tend to overlook gastric cancer because most cases arise from atrophic mucosa. In addition, some early gastric cancer lesions show only subtle morphologic changes, which are difficult to distinguish from background mucosa with atrophic change [12, 19,20,21]. Therefore, endoscopists require long-term specific training and experience to detect gastric cancer properly.
In recent years, image recognition using artificial intelligence (AI) with machine learning has dramatically improved and been increasingly applied to diagnostic imaging in various medical fields. These fields include skin cancer classification, diagnosis in radiation oncology and diabetic retinopathy, histologic classification of gastric biopsy, and characterization of colorectal lesions using endocytoscopy [22,23,24,25,26].
Deep learning, which represents a new method of machine learning, enables machines to analyze various training images and extract specific clinical features using a backpropagation algorithm [27]. Based on the accumulated clinical features, machines can diagnose newly acquired clinical images prospectively. This type of deep learning system has become possible through the use of convolutional neural networks (CNNs) that logically imitate the structure and activity of brain neurons on a computer. Various kinds of neural networks have been developed, and CNNs are known to be among the best-performing models in the field of image recognition [27, 28].
Automatically fitting the parameter values of a neural network is called learning, and how well these parameters are fitted determines the network's ability. Supervised learning uses data sets consisting of both input and appropriate output information. Thus, deep learning through a CNN using extensive image data has a high potential for clinical application in recognizing clinical images.
To apply deep learning through a CNN to the detection of early and advanced gastric cancer, we constructed an AI-based diagnostic system that was trained on more than 13,000 EGD images. We then tested the diagnostic accuracy of this system in detecting gastric cancer.
Methods
Preparation of training and test image sets
For an algorithm to detect gastric cancer, images of EGD were retrospectively obtained from two hospitals (Cancer Institute Hospital Ariake, Tokyo, Japan, and Tokatsu-Tsujinaka Hospital, Chiba, Japan) and two clinics (Tada Tomohiro Institute of Gastroenterology and Proctology, Saitama, Japan, and Lalaport Yokohama Clinic, Kanagawa, Japan) from April 2004 to December 2016. EGD was performed for screening or preoperative examinations in daily clinical practice, and images were captured using standard endoscopes (GIF-H290Z, GIF-H290, GIF-XP290N, GIF-H260Z, GIF-Q260J, GIF-XP260, GIF-XP260NS, and GIF-N260; Olympus Medical Systems, Co., Ltd., Tokyo, Japan) and standard endoscopic video systems (EVIS LUCERA CV-260/CLV-260 and EVIS LUCERA ELITE CV-290/CLV-290SL; Olympus Medical Systems).
The inclusion criteria were images with standard white light, chromoendoscopy using indigo carmine spraying, and narrow band imaging (NBI). The exclusion criteria were magnified images and poor-quality images resulting from insufficient air insufflation, post-biopsy bleeding, halation, blur, defocus, or mucus. After selection, 13,584 images of 2639 histologically proven gastric cancer lesions were collected as the training image data set. At least one gastric cancer lesion was present in every image, and multiple images were prepared for the same lesion to include differences in angle, distance, and extension of the gastric wall. All images of gastric cancer lesions were marked manually by an author (TH) who is an expert on gastric cancer and a board-certified trainer at the Japan Gastroenterological Endoscopy Society. The author (TH) carefully marked the range of cancer lesions using rectangular frames (Figs. 1, 2, 3).
Fig. 1 Output of the CNN. a A slightly reddish and flat lesion of gastric cancer appears on the lesser curvature of the middle body. b The yellow rectangular frame was marked by the CNN as a possible lesion and to indicate the extent of a suspected gastric cancer lesion. An endoscopist manually marked the location of the cancer using a green rectangular frame. [0–IIc, 5 mm, tub1, T1a(M)]
Fig. 2 Cancer lesion presented in multiple images. An endoscopist manually marked the location of the cancer in each image using a green rectangular frame. The yellow rectangular frame was produced by the CNN to identify a suspected lesion and indicates the extent of gastric cancer. Although the CNN did not identify gastric cancer in the distant view (a), it correctly located gastric cancer in the near view (b). This was counted as a correct answer
Fig. 3 Six lesions missed by the CNN. The green rectangular frames show gastric cancer missed by the CNN. a Greater curvature of the antrum, 0–IIc, 3 mm, tub1, T1a(M). b Lesser curvature of the middle body, 0–IIc, 4 mm, tub1, T1a(M). c Posterior wall of the antrum, 0–IIc, 4 mm, tub1, T1a(M). d Posterior wall of the antrum, 0–IIc, 5 mm, tub1, T1a(M). e Greater curvature of the antrum, 0–IIc, 5 mm, tub1, T1a(M). The yellow rectangular frame shows a pyloric ring, which the CNN misdiagnosed as gastric cancer. f Anterior wall of the lower body, 0–IIc, 16 mm, tub1, T1a(M)
To evaluate the diagnostic accuracy of the constructed CNN, an independent test data set of stomach images was collected from 69 consecutive patients with 77 gastric cancer lesions (62 cases had 1 gastric cancer lesion, 6 had 2 lesions, and 1 had 3 lesions), who underwent EGD at the Cancer Institute Hospital Ariake from 1 to 31 March 2017 during daily clinical practice. All EGD procedures used a standard endoscope (GIF-H290Z) and a standard endoscopic video system (EVIS LUCERA ELITE CV-290/CLV-290SL). During the procedure, the entire stomach was observed, and images of all parts were captured under white light. Chromoendoscopy, NBI, and poor-quality images were excluded. The final test data set included 2296 images in total, with 18–69 images per case.
Constructing a CNN algorithm
To construct an AI-based diagnostic system, we used a deep neural network architecture called the Single Shot MultiBox Detector (SSD, https://arxiv.org/abs/1512.02325), without altering its algorithm. SSD is a deep CNN that consists of 16 layers or more. The Caffe deep learning framework, which is one of the most popular and widely used frameworks originally developed at the Berkeley Vision and Learning Center, was then used to train, validate, and test the CNN.
All CNN layers were fine-tuned using stochastic gradient descent with a global learning rate of 0.0001. Each image was resized to 300 × 300 pixels, and the bounding box was resized accordingly so that the CNN could analyze the images optimally. These values were set by trial and error to ensure that all data were compatible with SSD.
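The bounding-box resizing mentioned above can be illustrated with a minimal sketch. It assumes boxes are stored as pixel corner coordinates; the `Box` type, the `rescale_box` helper, and the 1280 × 1024 source size are hypothetical and are not taken from the study's code.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def rescale_box(box: Box, src_w: int, src_h: int,
                dst_w: int = 300, dst_h: int = 300) -> Box:
    """Map a box drawn on a src_w x src_h image onto the resized dst_w x dst_h image."""
    sx = dst_w / src_w  # horizontal scale factor
    sy = dst_h / src_h  # vertical scale factor
    return Box(box.x_min * sx, box.y_min * sy, box.x_max * sx, box.y_max * sy)


# Example: a frame annotated on an assumed 1280 x 1024 capture.
original = Box(320, 256, 640, 512)
resized = rescale_box(original, src_w=1280, src_h=1024)
print(resized)
```

Because the two axes are scaled independently, a non-square source image changes the aspect ratio of both the image and its frame, but the frame still encloses the same region of the mucosa.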
Outcome measures
After constructing the CNN using the training image set, we evaluated the performance through the test image set. When the CNN detected a lesion of gastric cancer from the input data of test images, the CNN output a disease name (early or advanced gastric cancer) and its position. A detected lesion was displayed with a yellow rectangular frame on the endoscopic images (Fig. 1).
Because some gastric cancer lesions were presented in multiple images, we used the following definitions to perform the test.
- When the CNN detected even one gastric cancer lesion in multiple images of the same lesion, it was defined as a correct answer (Fig. 2).
- Because the demarcation line of gastric cancer was sometimes unclear, when the CNN detected a partial gastric cancer lesion, it was regarded as a correct answer.
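The two definitions above amount to a lesion-level (rather than image-level) scoring rule, which can be sketched as follows. The sketch assumes that "partial detection" means any overlap between a predicted frame and the annotated frame; the paper does not state an overlap threshold, so `boxes_overlap`, `lesion_detected`, and the example coordinates are illustrative assumptions only.

```python
from typing import List, Tuple

BoxT = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def boxes_overlap(a: BoxT, b: BoxT) -> bool:
    """True if two rectangles share any area; partial overlap is enough (assumption)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def lesion_detected(images: List[Tuple[BoxT, List[BoxT]]]) -> bool:
    """images holds (annotated box, predicted boxes) for every image of ONE lesion.

    The lesion counts as a correct answer if any prediction in any of its
    images overlaps the annotated frame.
    """
    return any(
        boxes_overlap(truth, pred)
        for truth, preds in images
        for pred in preds
    )


# Lesion A: missed in the distant view, partially covered in the near view.
lesion_a = [
    ((100, 100, 140, 130), []),
    ((80, 90, 220, 200), [(150, 120, 260, 240)]),
]
# Lesion B: the only prediction lies outside the annotated frame.
lesion_b = [((50, 50, 90, 90), [(200, 200, 250, 250)])]

print(lesion_detected(lesion_a), lesion_detected(lesion_b))
```

Scoring per lesion in this way means that one successful near view is enough, which matches the situation in Fig. 2.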
The sensitivity and positive predictive value (PPV) for the CNN’s ability to detect gastric cancer were calculated as follows:
Sensitivity = detected number of correct gastric cancer lesions/actual number of gastric cancer lesions
PPV = detected number of correct gastric cancer lesions/number of lesions that were diagnosed as gastric cancer by the CNN.
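As a worked example of these two definitions, the short sketch below plugs in the lesion counts reported in the abstract (71 of 77 cancers detected, 161 non-cancerous lesions flagged). The function names are illustrative and are not part of the study's software.

```python
def sensitivity(correct_detections: int, actual_lesions: int) -> float:
    """Correctly detected cancer lesions / actual cancer lesions."""
    return correct_detections / actual_lesions


def positive_predictive_value(correct_detections: int, flagged_lesions: int) -> float:
    """Correctly detected cancer lesions / all lesions the CNN called cancer."""
    return correct_detections / flagged_lesions


correct = 71           # cancer lesions the CNN found
actual = 77            # cancer lesions in the test set
false_positives = 161  # non-cancerous lesions flagged as cancer
flagged = correct + false_positives  # 232 lesions flagged in total

sens = sensitivity(correct, actual)
ppv = positive_predictive_value(correct, flagged)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}")  # sensitivity = 92.2%, PPV = 30.6%
```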
Ethics
This study was approved by the Institutional Review Board of the Cancer Institute Hospital Ariake (no. 2016–1171) and Japan Medical Association (ID JMA-IIA00283).
Results
Of the 2296 test images, 714 (31.1%) showed confirmed gastric cancer. Table 1 presents the patient and lesion characteristics used in the test image set. Fifty-eight cases (84.1%) had moderate to severe gastric mucosal atrophy. Fifty-two lesions (67.5%) were early gastric cancer (T1), and 25 (32.5%) were advanced gastric cancer (T2–T4). The median tumor diameter was 24 mm (range 3 to 170 mm). Most lesions (55, 71.4%) were superficial types (0–IIa, 0–IIb, 0–IIc, 0–IIa + IIc, 0–IIc + IIb, and 0–IIc + III).
The CNN required 47 s to analyze the 2296 test images. The CNN diagnosed 232 lesions in total as gastric cancer: it correctly identified 71 of the 77 gastric cancer lesions, and the remaining 161 were non-cancerous lesions, giving an overall sensitivity of 92.2% and a PPV of 30.6%. The sensitivity by tumor size and depth is shown in Table 2. Seventy of 71 lesions (98.6%) with a diameter of ≥ 6 mm were correctly detected by the CNN. All invasive cancers (T1b or deeper) were correctly detected by the CNN. The details of the six missed lesions are shown in Fig. 3. Five of the six lesions were minute cancers (≤ 5 mm). All missed lesions were superficially depressed and differentiated-type intramucosal cancers that were difficult to distinguish from gastritis even for experienced endoscopists.
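The counts in the paragraph above can be cross-checked with simple arithmetic, which also gives the processing rate implied by 47 s for 2296 images. The subgroup figure for lesions of ≤ 5 mm is derived here from the reported totals rather than quoted from the text, and the variable names are illustrative.

```python
# Lesion counts reported in the Results text.
total_lesions = 77
detected_lesions = 71
large_lesions = 71    # diameter >= 6 mm
large_detected = 70
missed_small = 5      # missed lesions measuring <= 5 mm

missed_total = total_lesions - detected_lesions  # 6 missed lesions
small_lesions = total_lesions - large_lesions    # 6 lesions of <= 5 mm (derived)
small_detected = small_lesions - missed_small    # 1 detected (derived)
missed_large = large_lesions - large_detected    # 1, the 16 mm lesion in Fig. 3f

# The missed lesions must split into the small and large groups.
assert missed_total == missed_small + missed_large

print(f">= 6 mm: {large_detected}/{large_lesions} = {large_detected / large_lesions:.1%}")
print(f"<= 5 mm: {small_detected}/{small_lesions} = {small_detected / small_lesions:.1%}")

# Processing rate implied by the reported timing.
test_images = 2296
elapsed_s = 47
images_per_second = test_images / elapsed_s
ms_per_image = 1000 * elapsed_s / test_images
print(f"{images_per_second:.1f} images/s, {ms_per_image:.1f} ms/image")
```

The timing covers analysis of stored images only; the text does not report the hardware used, so the rate should not be read as a benchmark.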
Table 3 shows the details of false-positive lesions. Nearly half of the misdiagnosed lesions were gastritis with changes in color tone or irregular mucosal surface as shown in Fig. 4a–c. The next most common cause of misdiagnosis was the normal anatomical structures of the cardia, angulus, and pylorus, as shown in Fig. 4d.
Fig. 4 False-positive lesions. The yellow rectangular frames show non-cancerous lesions that the CNN misdiagnosed as gastric cancer. a Intestinal metaplasia with irregularity of surface mucosa. b Whitish mucosa as a result of localized atrophy. c Reddish mucosa as a result of superficial gastritis. d Bending of the angulus
Discussion
To develop an AI-based diagnostic system to detect gastric cancer, we used a CNN, a type of network loosely modeled on neurons in the human brain. Extensive training data are generally required to construct such a system [29], and we used over 13,000 clear endoscopic images that had been stored at our institutions. To the best of our knowledge, this is the first report that evaluates the ability of a CNN to detect gastric cancer in endoscopic images. In this study, the constructed CNN detected 92.2% of gastric cancers in the independent test image set. The lesions detected by the CNN included small intramucosal gastric cancers that are relatively difficult to detect, even for endoscopists. Furthermore, all invasive gastric cancers were detected by the CNN. The six missed lesions were differentiated-type intramucosal cancers that resemble gastritis and are difficult to diagnose even for experienced endoscopists. Because the doubling time of gastric mucosal cancer is considered to be 2–3 years [30], small cancers that are missed would probably still be intramucosal when detected at a subsequent annual EGD, so these misses might not considerably hamper the clinical applicability of the CNN.
By contrast, 69.4% of the lesions the CNN diagnosed as gastric cancer were benign. The most common reasons for misdiagnosis were gastritis with redness, atrophy, and intestinal metaplasia. These findings are sometimes difficult even for endoscopists to distinguish from gastric cancer. Earlier studies reported that the PPV of gastric biopsy without magnifying endoscopy for gastric epithelial neoplasms was only 3.2–5.6% [31, 32]. Considering that the PPV of biopsy by endoscopists is relatively low, and that false negatives are more problematic than false positives in diagnosing cancer, a 30.6% PPV by the CNN would be clinically acceptable. The normal anatomical structures of the cardia, pylorus, and angulus were also misdiagnosed as gastric cancer, although endoscopists are unlikely to misdiagnose them. If the CNN can learn such normal anatomical structures as well as various benign lesions more systematically, the PPV of gastric cancer detection should improve further.
Remarkably, the CNN took only 47 s to analyze more than 2000 test images. Humans cannot recognize and judge images at this speed. In 2016, an endoscopic mass screening program for gastric cancer was started in Japan. This program requires time-consuming double checking of endoscopic images, which places a heavy burden on clinicians. The CNN system could substantially improve this situation if introduced as a supporting tool. Furthermore, the procedure can be performed completely “online,” so the system could serve as a telemedicine tool that addresses the shortage of endoscopists in remote and rural areas as well as in developing countries. Thus, in the near future, an AI-based diagnostic system might generate major global changes in the endoscopic diagnosis of gastric cancer.
This study has several limitations. First, we used only high-quality endoscopic images for the training and test image sets. With insufficient air insufflation, post-biopsy bleeding, halation, blur, defocus, or mucus, the CNN is likely to make errors in judgment (although the same occurs with endoscopists) [33]. Second, we collected a large number of training images from the outset to establish good accuracy of the CNN and did not try other training set sizes. More training images might yield a more accurate CNN, but the association between the number of training images and CNN accuracy was not examined here and remains an issue for future studies. Third, we used only gastric cancer cases for the test image sets, whereas the frequency of gastric cancer cases in an endoscopic mass survey would be extremely low. Fourth, because the 161 false-positive lesions were not histologically proven, occult cancerous lesions may be included among them. Fifth, a single endoscopist manually marked the training and test image sets of gastric cancer, although he has over 10 years of experience working at a cancer specialty hospital and has diagnosed more than 6000 cases of gastric cancer. Sixth, we did not compare the diagnostic accuracy of the CNN with that of endoscopists. Seventh, all test images were obtained with the same type of endoscope (GIF-H290Z) and endoscopic video system (EVIS LUCERA ELITE CV-290/CLV-290SL) and did not include images obtained from other endoscopic devices. Finally, because all the test images came from gastric cancer cases, the CNN system will need to be verified in daily clinical practice on other test images, including those of non-gastric cancer cases. We are currently planning a multicenter trial to address these limitations and further validate the capabilities of the CNN system using endoscopic mass survey screening images.
In conclusion, we developed a CNN system for detecting gastric cancer using stored endoscopic images, which processed extensive independent images in a very short time. The clinically relevant diagnostic ability of the CNN makes it promising for daily clinical practice, both to reduce the burden on endoscopists and to support telemedicine in remote and rural areas and in developing countries where the number of endoscopists is limited.
References
GLOBOCAN 2012. Available from: http://globocan.iarc.fr/Pages/fact_sheets_cancer.aspx on 28 April 2017.
Sano T, Coit DG, Kim HH, Roviello F, Kassab P, Wittekind C, et al. Proposal of a new stage grouping of gastric cancer for TNM classification: international Gastric Cancer Association staging project. Gastric Cancer. 2017;20:217–25.
Katai H, Ishikawa T, Akazawa K, Isobe Y, Miyashiro I, Oda I, et al. Five-year survival analysis of surgically resected gastric cancer cases in Japan: a retrospective analysis of more than 100,000 patients from the nationwide registry of the Japanese Gastric Cancer Association (2001–2007). Gastric Cancer. 2017. https://doi.org/10.1007/s10120-017-0716-7 (Epub ahead of print).
Itoh H, Oohata Y, Nakamura K, Nagata T, Mibu R, Nakayama F. Complete ten-year postgastrectomy follow-up of early gastric cancer. Am J Surg. 1989;158:14–6.
Crew KD, Neugut AI. Epidemiology of gastric cancer. World J Gastroenterol. 2006;12:354–62.
Tsubono Y, Hisamichi S. Screening for gastric cancer in Japan. Gastric Cancer. 2000;3:9–18.
Yeoh KG. How do we improve outcomes for gastric cancer? J Gastroenterol Hepatol. 2007;22:970–2.
Jeon HK, Kim GH, Lee BE, Park DY, Song GA, Kim DH, et al. Long-term outcome of endoscopic submucosal dissection is comparable to that of surgery for early gastric cancer: a propensity-matched analysis. Gastric Cancer. 2017. https://doi.org/10.1007/s10120-017-0719-4 (Epub ahead of print).
Isomoto H, Shikuwa S, Yamaguchi N, Fukuda E, Ikeda K, Nishiyama H, et al. Endoscopic submucosal dissection for early gastric cancer: a large-scale feasibility study. Gut. 2009;58:331–6.
Choi MK, Kim GH, Park DY, Song GA, Kim DU, Ryu DY, et al. Long-term outcomes of endoscopic submucosal dissection for early gastric cancer: a single-center experience. Surg Endosc. 2013;27:4250–8.
Ahn JY, Jung HY. Long-term outcome of extended endoscopic submucosal dissection for early gastric cancer with differentiated histology. Clin Endosc. 2013;46:463–6.
Gotoda T, Iwasaki M, Kusano C, Seewald S, Oda I. Endoscopic resection of early gastric cancer treated by guideline and expanded National Cancer Centre criteria. Br J Surg. 2010;97:868–71.
Menon S, Trudgill N. How commonly is upper gastrointestinal cancer missed at endoscopy? A meta-analysis. Endosc Int Open. 2014;2:E46–50.
Hosokawa O, Hattori M, Douden K, Hayashi H, Ohta K, Kaizaki Y. Difference in accuracy between gastroscopy and colonoscopy for detection of cancer. Hepatogastroenterology. 2007;54:442–4.
Hosokawa O, Tsuda S, Kidani E, Watanabe K, Tanigawa Y, Shirasaki S, et al. Diagnosis of gastric cancer up to three years after negative upper gastrointestinal endoscopy. Endoscopy. 1998;30:669–74.
Amin A, Gilmour H, Graham L, Paterson-Brown S, Terrace J, Crofts TJ. Gastric adenocarcinoma missed at endoscopy. J R Coll Surg Edinb. 2002;47:681–4.
Yalamarthi S, Witherspoon P, McCole D, Auld CD. Missed diagnoses in patients with upper gastrointestinal cancers. Endoscopy. 2004;36:874–9.
Voutilainen ME, Juhola MT. Evaluation of the diagnostic accuracy of gastroscopy to detect gastric tumours: clinicopathological features and prognosis of patients with gastric cancer missed on endoscopy. Eur J Gastroenterol Hepatol. 2005;17:1345–9.
Zhang Q, Chen ZY, Chen CD, Liu T, Tang XW, Ren YT, et al. Training in early gastric cancer diagnosis improves the detection rate of early gastric cancer: an observational study in China. Medicine (Baltimore). 2015;94:e384.
Yamazato T, Oyama T, Yoshida T, Baba Y, Yamanouchi K, Ishii Y, et al. Two years’ intensive training in endoscopic diagnosis facilitates detection of early gastric cancer. Intern Med. 2012;51:1461–5.
Yoshida S, Yamaguchi H, Tajiri H, Saito D, Hijikata A, Yoshimori M, et al. Diagnosis of early gastric cancer seen as less malignant endoscopically. Jpn J Clin Oncol. 1984;14:225–41.
Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115–8.
Bibault JE, Giraud P, Burgun A. Big Data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett. 2016;382:110–7.
Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316:2402–10.
Misawa M, Kudo SE, Mori Y, Takeda K, Maeda Y, Kataoka S, et al. Accuracy of computer-aided diagnosis based on narrow-band imaging endocytoscopy for diagnosing colorectal lesions: comparison with experts. Int J Comput Assist Radiol Surg. 2017;12:757–66.
Yoshida H, Shimazu T, Kiyuna T, Marugame A, Yamashita Y, Cosatto E, et al. Automated histological classification of whole-slide images of gastric biopsy specimens. Gastric Cancer. 2017. https://doi.org/10.1007/s10120-017-0731-8.
Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1. Lake Tahoe, Nevada; 2012. pp. 1097–105. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:1–9.
Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition. 2009:248–55.
Fujita S. Biology of early gastric carcinoma. Pathol Res Pract. 1978;163:297–309.
Yoshimizu S, Yamamoto Y, Horiuchi Y, Omae M, Yoshio T, Ishiyama A, et al. Diagnostic performance of routine esophagogastroduodenoscopy using magnifying endoscope with narrow-band imaging for gastric cancer. Dig Endosc. 2017. https://doi.org/10.1111/den.12916 (Epub ahead of print).
Yao K, Doyama H, Gotoda T, Ishikawa H, Nagahama T, Yokoi C, et al. Diagnostic performance and limitations of magnifying narrow-band imaging in screening endoscopy of early gastric cancer: a prospective multicenter feasibility study. Gastric Cancer. 2014;17:669–79.
Gotoda T, Uedo N, Yoshinaga S, Tanuma T, Morita Y, Doyama H, et al. Basic principles and practice of gastric cancer screening using high-definition white-light gastroscopy: eyes can only see what the brain knows. Dig Endosc. 2016;28(Suppl 1):2–15.
Japanese Gastric Cancer Association. Japanese classification of gastric carcinoma. Gastric Cancer. 2011;14:101–12 (3rd English edition).
Kimura K, Takemoto T. An endoscopic recognition of the atrophic border and its significance in chronic gastritis. Endoscopy. 1969;1:87–97.
Acknowledgements
The authors thank Yuma Endo and other engineers at AI Medical Service, Inc. (Tokyo, Japan), for their cooperation in developing the CNN.
Author information
Contributions
Study concept and design (TH, KA, TT, SI and TT), acquisition of data (TH, SS, TO, TO, KM and TT), analysis and interpretation of data (TH, KA, TT, SI and TT), drafting of the manuscript (TH, KA, TT, SI, SS, TO, MF, JF and TT)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standards
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and later versions. Informed consent or substitute for it was obtained from all patients for being included in the study.
Cite this article
Hirasawa, T., Aoyama, K., Tanimoto, T. et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 21, 653–660 (2018). https://doi.org/10.1007/s10120-018-0793-2
DOI: https://doi.org/10.1007/s10120-018-0793-2