Introduction

Gastric cancer is the third leading cause of cancer death worldwide and the most common malignancy in East Asian countries [1]. Endoscopic submucosal dissection (ESD) has revolutionized the treatment of early gastric cancer (EGC). It offers patients en bloc endoscopic resection with a shorter hospital stay and a 5-year survival rate similar to that of conventional surgery [2, 3]. This technique has become widespread and is currently recommended as the first-line treatment for EGC without lymph-node metastasis [4].

Accurate delineation of cancer margins is the first critical step in treatment, especially for achieving curative endoscopic resection [5]. Previous studies demonstrated the effectiveness of chromoendoscopy (CE) with indigo carmine solution after conventional white-light endoscopy (WLE) in delineating EGC. Although CE/WLE is widely used to delineate the resection margins of EGC before ESD [6, 7], it still delineates the margins inaccurately in approximately 20% of EGC patients [8]. Recently, magnifying endoscopy with narrow-band imaging (ME-NBI) has been reported to be useful for both the diagnosis of EGC and the horizontal delineation of lesions [9, 10]. However, recent studies comparing ME-NBI and CE in EGC delineation reached contradictory conclusions [8, 11]. In addition, although guidelines recommend ME-NBI for delineating the horizontal extent of EGC, the evidence to date remains insufficient [4].

Artificial intelligence (AI) is one of the fastest-growing technologies of recent years. Convolutional neural networks (CNNs) show remarkable performance in recognizing medical images [12, 13], and deep convolutional neural networks (DCNNs) have been used successfully for real-time differentiation of colorectal polyps and classification of skin cancer [14, 15]. In previous studies, we developed a real-time system for automatically detecting EGC and avoiding blind spots during gastroscopy, and verified in a randomized controlled trial that it improves gastroscopy completeness [16, 17]. To date, the application of CNNs to delineating the resection margins of EGC under CE or WLE has rarely been investigated.

In this study, we trained a fully convolutional network (FCN)-based system, ENDOANGEL, to assist endoscopists in determining the resection extent of EGC under CE or WLE. Furthermore, we evaluated the performance of ME-NBI and ENDOANGEL in delineating resection margins, using post-ESD pathology as the gold standard.

Methods

Image data set and preprocessing

We collected ESD images obtained between January 1, 2014, and May 1, 2019, at Renmin Hospital of Wuhan University. The data included images of patients with histologically confirmed EGC and negative lateral margins. Patients with undifferentiated EGC, piecemeal resection, or a positive horizontal margin were excluded. In total, 1244 images from 536 patients were enrolled: 889 images from 304 patients for training and 355 images from 232 patients for testing. To evaluate ENDOANGEL objectively, similar images in the test set were removed, and only images showing lesions from different perspectives were retained. The characteristics of the lesions are shown in Table 1. One EGC expert, who had performed more than 1000 ESDs, reviewed all images and delineated the resection margins on the enrolled images that did not contain ESD knife markers.

Table 1 Characteristics of the lesions

The training set included 546 CE images from 67 patients, and the test set included 34 CE images from 14 patients. For WLE, the training set consisted of 343 images from 260 patients, and the test set consisted of 321 images from 218 patients. All images were resized to 512 × 512 pixels. To increase data set variability and prevent overfitting, the data were augmented by rotation, scaling, transparency adjustment, vertical and horizontal flipping, and random cropping of 0 to 10% of the image. The study protocol was approved by the ethics committee of Renmin Hospital of Wuhan University. For retrospective patients whose endoscopic images had been stored, the requirement for informed consent was waived by the IRB. All prospective patients provided written informed consent before enrollment. All procedures were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and later versions.
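The augmentation steps listed in the preceding paragraph can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the study's pipeline: rotation is limited to multiples of 90° (arbitrary angles would need an image library), "scaling" is represented by the crop-and-resize step, "adjusting transparency" is read as a random intensity scaling, and the function names `augment` and `resize_nearest` are hypothetical.

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (x C) image to size x size."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows][:, cols]

def augment(img: np.ndarray, rng: np.random.Generator, size: int = 512) -> np.ndarray:
    """Hypothetical augmentation: flips, 90-degree rotation, random crop (0-10%),
    intensity scaling, then resize back to size x size."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                              # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                              # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))      # rotate by 0/90/180/270 degrees
    # Random crop removing 0-10% of each dimension (assumed reading of "cutting 0 to 10%")
    h, w = out.shape[:2]
    dh = int(h * rng.uniform(0.0, 0.10))
    dw = int(w * rng.uniform(0.0, 0.10))
    top = int(rng.integers(0, dh + 1))
    left = int(rng.integers(0, dw + 1))
    out = out[top:top + h - dh, left:left + w - dw]
    # Intensity scaling as a stand-in for "adjusting transparency" (assumption)
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)
    return resize_nearest(out, size).astype(np.uint8)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(600, 700, 3), dtype=np.uint8)
print(augment(frame, rng).shape)  # (512, 512, 3)
```

Resizing last guarantees that every augmented sample has the fixed 512 × 512 input size regardless of the crop or rotation applied.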

Training algorithms

FCNs, which are used for semantic segmentation, were introduced by Jonathan Long and colleagues in 2015. Classification networks are converted into fully convolutional networks, so that their learned classification features are transferred to segmentation; a skip architecture then combines deep semantic information with shallow appearance information to produce the segmentation results [18]. UNet++ adds redesigned skip pathways and deep supervision to UNet, meeting the need for more accurate segmentation of medical images [19]. Here, we used UNet++ with a VGG-16 backbone to train the model.
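The redesigned skip pathways of UNet++ [19] can be made concrete by listing which feature maps feed each node. In the UNet++ notation, node X(i,j) sits at encoder level i and position j along the nested skip pathway: for j = 0 it receives the downsampled output of X(i-1,0), and for j > 0 it receives all earlier nodes at the same level, X(i,0) to X(i,j-1), concatenated with the upsampled output of X(i+1,j-1). The sketch below only enumerates this wiring and builds no network; a depth of 5 is an assumption chosen to match the five convolutional stages of VGG-16, and both function names are hypothetical.

```python
def unetpp_inputs(depth: int = 5) -> dict[tuple[int, int], list[str]]:
    """Return, for every UNet++ node X(i,j), the feature maps it takes as input.

    i: encoder level (0 = full resolution); j: position along the nested skip
    pathway. Nodes exist for i + j < depth.
    """
    graph: dict[tuple[int, int], list[str]] = {}
    for i in range(depth):
        for j in range(depth - i):
            if j == 0:
                # Backbone (encoder) node: the input image at level 0,
                # otherwise the downsampled output of the level above
                graph[(i, j)] = ["image"] if i == 0 else [f"down(X{i-1},0)"]
            else:
                # Dense skip connection: every earlier node at this level
                # plus the upsampled node from one level deeper
                same_level = [f"X{i},{k}" for k in range(j)]
                graph[(i, j)] = same_level + [f"up(X{i+1},{j-1})"]
    return graph

def deep_supervision_heads(depth: int = 5) -> list[str]:
    """Full-resolution nodes X(0,1)..X(0,depth-1) that each emit a segmentation map."""
    return [f"X0,{j}" for j in range(1, depth)]

g = unetpp_inputs()
print(len(g))       # 15
print(g[(0, 2)])    # ['X0,0', 'X0,1', 'up(X1,1)']
```

Deep supervision attaches an output head to each top-row node, which is what allows the network to be trained on, and optionally pruned to, shallower sub-networks.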

Testing ENDOANGEL in still images

To evaluate how well ENDOANGEL delineates resection margins, we calculated the overlap ratio, defined as the area of intersection between the expert-delineated region and the region predicted by ENDOANGEL, divided by the area of the expert-delineated region. In the image test set, the expert's manual markings of en bloc resected EGC lesions were histologically confirmed and were therefore used as the gold standard. Threshold overlap ratios were predefined from 0.01 to 1.0 in steps of 0.01. An image was considered correctly delineated when its overlap ratio exceeded the threshold, and a case was considered correctly delineated when all images of that patient were correctly delineated. The accuracy of ENDOANGEL on the test set was calculated as the number of correctly delineated cases divided by the total number of enrolled cases. To choose an appropriate threshold for reporting accuracy, two experts, blinded to the overlap values, reviewed the predictions of ENDOANGEL and selected those they judged acceptable. On this basis, a threshold of 0.6 was chosen to describe the accuracy of ENDOANGEL.
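The evaluation procedure above can be expressed in a few lines of NumPy. This is a sketch, assuming the expert and predicted delineations have already been rasterized into boolean masks of the same shape; the function names and the toy masks are illustrative, not taken from the study's code.

```python
import numpy as np

def overlap_ratio(expert: np.ndarray, predicted: np.ndarray) -> float:
    """|expert AND predicted| / |expert| for boolean masks of equal shape."""
    expert = expert.astype(bool)
    predicted = predicted.astype(bool)
    return float(np.logical_and(expert, predicted).sum() / expert.sum())

def case_accuracy(cases: list[list[float]], threshold: float) -> float:
    """Fraction of cases in which ALL images have an overlap ratio exceeding the threshold."""
    correct = sum(all(r > threshold for r in ratios) for ratios in cases)
    return correct / len(cases)

def accuracy_curve(cases: list[list[float]]) -> dict[float, float]:
    """Case-level accuracy at thresholds 0.01, 0.02, ..., 1.00."""
    thresholds = np.round(np.arange(1, 101) / 100, 2)
    return {float(t): case_accuracy(cases, t) for t in thresholds}

# Toy example: the predicted region covers 24 of the 36 expert pixels
expert = np.zeros((10, 10), dtype=bool); expert[2:8, 2:8] = True
pred = np.zeros((10, 10), dtype=bool); pred[4:10, 2:8] = True
print(overlap_ratio(expert, pred))  # 0.666...
```

Because "exceeding" is read as a strict inequality, no case can be correct at the 1.0 threshold, so the curve always ends at zero.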

Testing ENDOANGEL in unprocessed ESD videos

Ten ESD videos with endoscopy–pathology point-to-point markings and post-ESD ME-NBI observations were collected, seven retrospectively and three prospectively, to test the accuracy of ENDOANGEL in delineating the resection margins of EGC in real time. Images were processed and analyzed at 25 frames per second. The blue dotted line indicates the predicted extent of the resection margins (Fig. 2c; Video 1). Because the observations in the videos were controlled by the endoscopists and some clips contained rapid endoscope movements, a refined analysis of the raw videos was difficult to perform, although the application of ENDOANGEL could be demonstrated. We therefore clipped the videos into sequential images and calculated the accuracy on these. The gold standard for the video test set was the post-ESD pathological result rebuilt onto the endoscopic images. The endoscopists and pathologists performed the point-to-point markings as follows:

  1.

    Management of the post-endoscopic resection pathological samples

    The specimen was flattened and pinned onto a plastic plate with a 2-mm pitch, using enough tension to restore its original shape without damaging the specimen border. After the oral and anal borders, position, and other information were labeled, the plate was immersed in saline and the specimen was examined by ME-NBI as reported previously [20]. The endoscopists examined the integrity of the lesion and marked the suspected region with silk threads, the ends of which were fixed with pins. The suspected cancerous fields framed by the silk threads were colored with a marker pen (Fig. 1a). The specimen was then fixed in 10% buffered formalin solution (Fig. 1b), and the pathologists painted the marked area black with India ink (Phygene, China; Fig. 1c). Pathological diagnosis was categorized according to the revised Vienna classification (C1: negative for neoplasia; C2: indefinite for neoplasia; C3: mucosal low-grade neoplasia [low-grade adenoma/dysplasia]; C4: mucosal high-grade neoplasia [4.1: high-grade adenoma/dysplasia; 4.2: noninvasive carcinoma (carcinoma in situ); 4.3: intramucosal carcinoma]; C5: submucosal invasive carcinoma) [21].

    Fig. 1

    Marking of the lesion on the specimen. a The resection specimen after post-ESD magnifying endoscopy; a marker pen is used to blacken the silk thread to indicate the suspected lesions. b The fixed specimen. c Application of India ink to suspected lesions. d The specimen is cut into pieces with a 2-mm pitch. e Numbering of the cut specimen. f Specimens placed in the corresponding boxes

  2.

    Rebuilding the pathological results on endoscopic images

    A pathologist rebuilt the pathological results on the images of the formalin-fixed specimen according to the guidelines and then rebuilt them on the endoscopic images using ERDAS IMAGINE 9.2 software [22, 23]. This mapping was achieved by fitting a coordinate transformation equation from the coordinates of six pairs of matching points (12 points across the two images). According to the software specification, six points were required in each image regardless of lesion size.

    (1)

      Rebuilding the pathological result on the resected specimen image: because the specimen shrinks after fixation in formalin solution, six evenly distributed matching points were selected on the images of the fixed and the resected specimen (Fig. 2a).

      Fig. 2

      Process of rebuilding the pathological result on the endoscopic image and rebuilding the endoscopic markings on the resection specimen image. a Step 1: Rebuild the pathological result on the resection specimen image. b Step 2: Rebuild the pathological result on the endoscopic image. The six corresponding points are shown in the images. The color change in the images is introduced by the ERDAS IMAGINE 9.2 software and does not affect the rebuilding process. c Step 3: Present the extent delineated by ENDOANGEL and the different marks on the endoscopic image. d Step 4: Present the marks on the resection specimen image, rebuild the margins based on the post-ESD ME-NBI markings on the resection specimen image, and then mark the minimum distance

    (2)

      Rebuilding the pathological result on endoscopic images: six evenly distributed matching points were selected on the resected specimen and endoscopic images. Adobe Photoshop (CC 2019) was used to display the lines of the pathological results on the endoscopic images (Fig. 2b).
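The registration steps above fit a coordinate transformation equation from matched points. The text does not state which transformation ERDAS IMAGINE applied; the sketch below assumes a second-order polynomial, which has six coefficients per output coordinate and therefore needs at least six control-point pairs, in line with the six-point requirement described above. It is a NumPy least-squares illustration with hypothetical function names (`fit_poly2`, `apply_poly2`), not the ERDAS implementation.

```python
import numpy as np

def _design(points: np.ndarray) -> np.ndarray:
    """Second-order polynomial terms [1, x, y, x^2, xy, y^2] for each (x, y) point."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def fit_poly2(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares fit of dst ~ poly2(src) from N >= 6 matched points (N x 2 arrays).

    Returns a 6 x 2 coefficient matrix (one column per output coordinate).
    """
    if len(src) < 6:
        raise ValueError("a second-order polynomial needs at least 6 control points")
    coeffs, *_ = np.linalg.lstsq(_design(src), dst, rcond=None)
    return coeffs

def apply_poly2(coeffs: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map (x, y) points, e.g. a sampled pathological boundary, into the target image."""
    return _design(points) @ coeffs
```

In use, `src` would hold the matching points clicked on one image (for example the fixed specimen) and `dst` the same anatomical points on the other (the resected specimen or endoscopic image); `apply_poly2` then carries every vertex of the pathological boundary across. With exactly six points the fit interpolates them, so the evenly distributed placement described in the text matters for how well the mapping behaves between points.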

Comparison of the minimum distance between the labeled margins and the pathological cancerous boundary on resection specimen images

An expert delineated the resection margins on the post-ESD ME-NBI images, and these markings were then rebuilt onto the resected specimen image, which carries a measurable scale, following the steps above. The minimum distance between the cancerous boundary and the marks was determined from the pixel length and converted to millimeters using this scale (Fig. 2d).
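The minimum-distance measurement above can be sketched as a brute-force nearest-pair search between two boundaries sampled as pixel coordinates. This is a minimal illustration, assuming both boundaries are available as N x 2 point arrays on the same specimen image and that a millimeters-per-pixel factor (`mm_per_pixel`, a hypothetical parameter) has been read from the scale; distances are taken between sampled points only, so dense sampling is assumed.

```python
import numpy as np

def min_boundary_distance_mm(marks: np.ndarray, cancer: np.ndarray,
                             mm_per_pixel: float) -> float:
    """Smallest Euclidean distance between two sampled boundaries, in millimeters.

    marks: N x 2 pixel coordinates of the labeled margin.
    cancer: M x 2 pixel coordinates of the pathological cancerous boundary.
    """
    diff = marks[:, None, :] - cancer[None, :, :]   # N x M x 2 pairwise differences
    d_px = np.sqrt((diff ** 2).sum(axis=-1)).min()  # closest pair, in pixels
    return float(d_px * mm_per_pixel)
```

The pairwise N x M array is simple and exact for the sampled points; for very long contours a spatial index would reduce memory use, but the result is the same.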

Statistical analysis

Positive predictive value (PPV = true-positive region/[true-positive region + false-positive region]), sensitivity (true-positive region/[true-positive region + false-negative region]), and intersection over union (IoU = true-positive region/[true-positive region + false-positive region + false-negative region]) were calculated to evaluate the performance of ENDOANGEL on the image and video data sets [24]. Two-tailed unpaired Student's t tests with a significance level of 0.05 were used to compare the minimum distances between the labeled margins and the cancerous boundary, and to compare the average areas covered by ENDOANGEL and by the ESD knife markings. All calculations were performed with SPSS 20 (IBM, Chicago, IL, USA).
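The three region metrics defined above can be computed pixel-wise from binary masks. This is a sketch, assuming the predicted and reference regions are rasterized to boolean masks of equal shape and that neither denominator is zero; the function name is hypothetical. Note that when the expert region is the reference, sensitivity is the same quantity as the overlap ratio used for the still-image evaluation.

```python
import numpy as np

def region_metrics(pred: np.ndarray, truth: np.ndarray) -> dict[str, float]:
    """Pixel-wise PPV, sensitivity and IoU for binary masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # predicted and truly positive
    fp = np.logical_and(pred, ~truth).sum()   # predicted but outside the reference
    fn = np.logical_and(~pred, truth).sum()   # reference pixels that were missed
    return {
        "ppv": float(tp / (tp + fp)),
        "sensitivity": float(tp / (tp + fn)),
        "iou": float(tp / (tp + fp + fn)),
    }

truth = np.zeros((10, 10), dtype=bool); truth[2:8, 2:8] = True
pred = np.zeros((10, 10), dtype=bool); pred[4:10, 2:8] = True
print(region_metrics(pred, truth))  # ppv and sensitivity 0.666..., iou 0.5
```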

Results

Endoscopic characteristics of the lesions

Lesion characteristics are shown in Table 1. All data from a given patient were assigned to either the training set or the test set, never both. The two sets were balanced in terms of tumor size, location, and macroscopic type.

Performance of ENDOANGEL in delineating resection margins in stationary images

Fig. 3 shows the accuracy of ENDOANGEL across overlap ratio thresholds from 0.01 to 1.0 in steps of 0.01. Compared with the manual markings by the expert, ENDOANGEL achieved an accuracy of 85.7% on the CE images and 88.9% on the WLE images at an overlap ratio threshold of 0.60. The mean IoU was 0.688 for the CE image test set and 0.668 for the WLE image test set. Representative images of ENDOANGEL delineating the resection extent are shown in Fig. 4.

Fig. 3

Performance of ENDOANGEL in delineating the resection margin in the test dataset. a Chromoendoscopy image data set. b White-light endoscopy image data set

Fig. 4

Representative images of ENDOANGEL delineating the resection extent. The dark dotted line is the resection margin predicted by ENDOANGEL. The light dotted line is the resection margin delineated by the expert. a ENDOANGEL delineates the resection extent in the CE images. The overlap ratio is 98.79%. b ENDOANGEL delineates the resection extent in the WLE images. The overlap ratio is 89.49%

Performance of ENDOANGEL in unprocessed ESD videos

Compared with the ESD knife markings, ENDOANGEL reached an accuracy of 100% at an overlap ratio threshold of 0.60 (Table 2). Compared with the rebuilt pathological results, the regions marked by both ENDOANGEL and the ESD knife covered all areas of high-grade intraepithelial neoplasia and cancer. The average areas covered by ENDOANGEL and by the ESD knife markings did not differ significantly (p = 0.33). The minimum distance between the labeled margins and the pathological cancerous boundary was 3.40 ± 1.49 mm for ENDOANGEL and 3.32 ± 2.32 mm for the ESD knife markings made by the experts, with no significant difference between them (p = 0.32). A representative video of ENDOANGEL delineating the resection extent in real time is presented in Video 1.

Table 2 Performance of ENDOANGEL in image data set and video data set in comparison with that of manual markings by experts

Evaluation of ME-NBI and ENDOANGEL in delineating resection margins by post-ESD pathology

In the ten cases with endoscopy–pathology point-to-point markings, the resection margins predicted by ENDOANGEL covered all areas of high-grade intraepithelial neoplasia and cancer, whereas the resection margins based on ME-NBI covered all cancerous regions in only 80% (8/10) of the patients. Furthermore, in the eight cases with negative lateral margins determined by ME-NBI, the minimum distance between the predicted margins and the pathological cancerous boundary was 3.44 ± 1.45 mm for ENDOANGEL and slightly shorter for ME-NBI (3.21 ± 1.72 mm; p = 0.80). Representative images of the performance of ME-NBI and ENDOANGEL in delineating resection margins are shown in Fig. 2c.

Discussion

Precise evaluation of the lateral margins of EGCs is vital for achieving curative resection and reducing complications. WLE combined with CE, which enhances the features of lesions, has been widely used to delineate the horizontal margins of EGCs [25, 26]. More recently, ME-NBI, which determines the demarcation line by observing the microvasculature and microsurface, has also been recommended [9, 10]. Several studies have compared the effectiveness of ME-NBI and CE in delineating EGC, but their conclusions were contradictory [8, 11]. Here, we developed a novel method to objectively evaluate the performance of post-ESD ME-NBI in delineating EGC margins, and built a deep learning-based method to identify the resection extent of EGC under CE or WLE in both still images and sequential images clipped from ESD videos.

Although pathological analysis is routinely performed after ESD, directly comparing pathological findings with the endoscopic appearance of a lesion remains challenging for both pathologists and endoscopists. Researchers have matched pathological results to endoscopic lesions mainly on the basis of typical changes such as noticeable protuberances and indentations [27]. Nagahama et al. also studied EGC resection margins, taking two biopsies 5 mm outside and inside the oral-side margin to compare the ability of ME-NBI and CE to delineate the extent of EGC [8]. The main limitations were that small biopsy specimens could not represent the extent of the whole resected specimen, and that sampling errors and subjective deviations increased possible discrepancies [8]. Here, we developed a novel method to reconstruct post-ESD pathological findings on endoscopic views, which achieved point-to-point matching between endoscopy and pathology and provided an objective, visual means of evaluating performance across endoscopic modalities. More importantly, this method revealed that post-ESD ME-NBI covered the whole cancerous region in only 80% (8/10) of the patients, and that the cutting margin was slightly shorter even in negative-margin cases. These findings were confirmed by the pathological rebuilding. In the other two cases, the margins predicted by post-ESD ME-NBI were assessed as positive when compared with the rebuilt pathological images. Previous studies of undifferentiated cancers showed that deep infiltration of cancer cells beneath covering non-neoplastic epithelium might cause ME-NBI misdiagnosis [28]. In the current study, however, the pathological results of these two cases excluded this possibility, because both were superficial, differentiated EGCs without any deep invasion. It remains unclear whether performing ME-NBI on the resected specimens accounts for the difference. Further studies are needed to investigate objectively the ability of ME-NBI to determine EGC margins and to provide more practical evidence for clinical treatment.

A previous study used a computer-aided system to diagnose and delineate EGCs in retrospective ME-NBI images [29]. However, few studies have focused on delineating EGC margins under CE or WLE. Building on our previous work on AI in EGC [16, 17], we developed a deep learning method, ENDOANGEL, to identify the resection extent of EGC under CE or WLE. Its performance was comparable with that of the expert in both still images and real-time videos. In the cases with pathological results rebuilt on endoscopic images, ENDOANGEL completely encircled the cancerous areas with suitable cutting margins. Furthermore, ENDOANGEL achieved real-time delineation by analyzing video data at 80 ms per frame. In contrast, delineating EGC margins by ME-NBI is time-consuming (a mean operating time of around 45 min per lesion) [30]. Given its much shorter prediction time and freedom from fatigue, ENDOANGEL could not only assist endoscopists in delineating resection margins but also save substantial time and effort. Real-time delineation by ENDOANGEL may also help endoscopists mark EGC lesions during ESD.

It is also worth noting that our pathological rebuilding method greatly facilitated the accurate evaluation of EGC margins. Rebuilding pathological results on endoscopic images could help relocate endoscopic lesions on pathological specimens, directly match pathological findings with endoscopic characteristics in a point-to-point manner, and improve endoscopists' understanding of EGC. These findings may benefit future clinical diagnosis, EGC training, and endoscopic cancer research. In noncurative resection cases, ENDOANGEL is also expected to provide useful information for further endoscopic retreatment.

This study has several limitations. First, we selected patients who underwent en bloc resection to develop ENDOANGEL, which might introduce bias, because lesions that were easier to resect were more likely to be enrolled. This bias may be reduced by increasing the number of cases. Second, the number of cases with rebuilt pathological results in the video test set was limited, and a large-scale, prospective study is needed to further verify the robustness of ENDOANGEL. We are conducting further prospective, multicenter studies that will enroll more patients to validate the model. A system based on ENDOANGEL, comprising input, analysis, and output modules connected to the clinical endoscopic video imaging system, will also be established. Third, the real-time prediction of ENDOANGEL could be affected by artificial disturbances such as cautery marks; its performance should be further refined with more images to limit this impact.

We trained an AI method, ENDOANGEL, to delineate the resection margins of EGC under CE or WLE, and developed an endoscopy–pathology point-to-point matching method to objectively evaluate its performance. ENDOANGEL shows potential for assisting endoscopists in delineating the resection margins of EGC.