Introduction

Deep learning (DL) is a landmark methodology in artificial intelligence (AI) driven by big data, high computing power, and deep network models, and it has achieved state-of-the-art performance in many challenging tasks, such as image classification, natural language processing, audio processing, and playing strategy games [1,2,3,4,5]. DL is capable of extracting features from sparse and noisy medical data, and it has achieved many excellent results in healthcare, especially in image-based areas such as radiology, ophthalmology, and dermatology [6,7,8]. Furthermore, the FDA has approved several DL-based healthcare systems for liver and lung cancer diagnosis (CT and MRI), CT brain bleed diagnosis, X-ray wrist fracture diagnosis, and screening of patients with diabetic retinopathy [6, 9]. Given the advanced development of AI, incorporating DL techniques into the clinical process has become one of the most promising ways to provide scalable, timely, and high-quality cytopathology services [10,11,12].

Peritoneal metastasis arises from intraperitoneal free cancer cells characterized by uncontrolled growth. Positive free cancer cells found in ascites or abdominal lavage fluids are strong diagnostic evidence of stage IV gastric cancer. The literature reports that 3–6% of patients with gastric cancer have ascites as the first symptom of the tumor [13]. Imaging diagnosis is difficult at the early stage of peritoneal metastasis and prone to misdiagnosis, so patients with typical imaging signs are often already at an advanced stage [14]. Therefore, cytologic examination of ascites or abdominal lavage fluids is the gold standard for indicating early peritoneal metastasis. For patients who present with obvious ascites, clinicians can extract ascitic fluid for cytologic examination; for those with advanced gastric cancer but without obvious ascites, laparoscopy and abdominal lavage fluids are required to verify whether peritoneal metastasis is present [15]. A precise assessment of the state of peritoneal metastasis can guide optimal therapeutic strategies, sparing some patients unnecessary surgery [16], whereas patients in whom no tumor cells are found can be actively operated on. Therefore, early and accurate cytologic diagnosis can help patients receive timely and effective treatment while the risk of metastasis is still low [17]. If more attention is paid to abdominal exfoliated cells, patients will have a promising opportunity to achieve longer disease-free survival and extended overall survival (OS) across various cancers.

DL-based models have been successfully applied to the cytology of pleural fluid, reaching the level of pathologists [18]. However, clinical assessments of artificial intelligence for the cytologic identification of ascites and abdominal exfoliated cells are still lacking. Therefore, we aimed to establish a deep learning system, explore its potential, and compare it with the diagnoses of pathologists. In this study, we developed an end-to-end DL-based system for autonomous cytopathology interpretation and validated it with a cytopathology image dataset of ascites cancer cells. Our method detects all cells in the input cytopathology image and classifies them into benign and malignant groups using DL techniques. We believe that, with a reliable automated method of examining tumor cells in gastric cancer, convolutional neural networks can serve as a preliminary cytology screen, greatly lightening the burden on pathologists.

Materials and methods

Patient cohorts

According to the pathologic report system, patients with gastric cancer who had confirmed or suspected peritoneal metastasis and were admitted to Peking University Cancer Hospital from January 2015 to December 2018 were enrolled in this study. Patients were selected based on the following criteria: (1) pathologically diagnosed with gastric adenocarcinoma; (2) with peritoneal metastasis; (3) tumor cells with classic morphology; and (4) clustered tumor cells excluded. In addition, various kinds of ascitic fluid specimens, including ascitic fluids, peritoneal washes, and peritoneal lavage fluids, were examined with both hematoxylin–eosin (HE) staining and Papanicolaou (PAP) staining. In total, 139 patients were involved in this study (52 males and 87 females), with an average age of 52 years.

Neural network structure

We decomposed the problem of autonomous cytopathology interpretation into two main tasks. The first task is to find the locations of all cells in cytopathology images, which we call cell detection and perform with DetectionNet. The second task is to determine whether a target cell is malignant or benign, which we call cell classification and perform with ClassificationNet. Cell detection and classification work together to achieve accurate and reliable cytopathology interpretation. We constructed DetectionNet based on the Faster Region-Based Convolutional Neural Network (Faster R-CNN) [19] to achieve automatic cell detection (Fig. 1a); all detected cells were then processed by ClassificationNet for classification (Fig. 1b). Pre-trained alexnet, vgg16, googlenet, resnet18, and resnet50 models were fine-tuned by transfer learning for cell detection and classification. We used image augmentation to artificially expand the existing dataset through various image processing operations, including translation, rotation, scaling, and random cropping.
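To make the transfer-learning step concrete, the following MATLAB sketch shows how a pre-trained backbone can be adapted to the two-class cell problem. The layer names match the stock resnet50 model shipped with the Deep Learning Toolbox; the optimizer and hyperparameter values are illustrative assumptions, since the exact training settings are not reported here.

```matlab
% Minimal transfer-learning sketch (MATLAB R2019b, Deep Learning Toolbox).
% Hyperparameters are assumed for illustration, not taken from the study.
net    = resnet50;                         % ImageNet-pretrained backbone
lgraph = layerGraph(net);

% Replace the 1000-class ImageNet head with a benign/malignant head.
newFC = fullyConnectedLayer(2, 'Name', 'fc_cells', ...
    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
lgraph = replaceLayer(lgraph, 'fc1000', newFC);
lgraph = replaceLayer(lgraph, 'ClassificationLayer_fc1000', ...
    classificationLayer('Name', 'cls_cells'));

opts = trainingOptions('sgdm', ...         % assumed optimizer and settings
    'InitialLearnRate', 1e-4, 'MaxEpochs', 10, ...
    'MiniBatchSize', 32, 'Shuffle', 'every-epoch');

% imdsTrain (hypothetical) is an imageDatastore of labeled cell patches.
% trainedNet = trainNetwork(imdsTrain, lgraph, opts);
```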

Fig. 1

Deep learning models used to achieve cytopathology interpretation. a Cell detection by DetectionNet based on the Faster R-CNN model. Left, original cytopathology image with multiple cells as the input to DetectionNet. Middle, feature maps extracted by ConvNet and Bboxes for cells determined by the RPN. Right, cell detection results with a yellow bounding box for each detected cell. b Cell classification by ClassificationNet via transfer learning. Upper, ClassificationNet models pre-trained on the ImageNet dataset. Lower, fine-tuned ClassificationNet for cell classification. The gray area denotes the transferred convolutional and pooling layers. c Architecture of the proposed DL-based system for cytopathology interpretation

Overview architecture of DL-based system

The proposed end-to-end DL-based system consisted of a learning section and an application section (Fig. 1c). The learning section included two main parts: cell detection by DetectionNet and cell classification by ClassificationNet. These two models were trained individually and independently. In the application section, the DetectionNet and ClassificationNet models were combined to implement cell detection and classification sequentially, thus achieving autonomous cytopathology interpretation. The deep learning architectures and experiments were implemented in MATLAB 2019b using an Nvidia GeForce GTX 1080 Ti GPU with 11 GB of memory.

Results

Dataset and models for cell detection

We cropped 176 images from the ascites cytopathology images at 40× magnification and manually annotated all 6573 benign and malignant cells with bounding boxes (Bboxes) in each cropped image using the MATLAB 2019b Image Labeler toolbox. The resolution of the cropped images was 0.5 μm per pixel. In the representative image, each annotated yellow square denotes a cell's Bbox (Fig. 2a). The 176 cropped images were split into training and testing datasets. The training dataset consisted of 144 images with 5642 cells in total, and the testing dataset consisted of 32 images with 931 cells in total.

Fig. 2

Performance of cell detection by DetectionNet. a Example image of cell Bbox annotation for cell detection. b Histogram of the diagonal length of Bboxes in the cell detection dataset. c Augmentation of Bbox annotations by eight transformations. d Architecture of DetectionNet based on the Faster R-CNN model using pre-trained alexnet. e Two example images of cell detection results by DetectionNet. Yellow squares, annotated Bboxes. Blue squares, detected Bboxes. f Histogram of IoU for Bboxes detected by the alexnet- and resnet18-based DetectionNets. g Precision-recall curves for the two DetectionNet models

To choose proper preprocessing methods before training, we first analyzed the statistical properties of the annotated Bboxes. The sizes of the Bboxes differ because of the varying sizes of the cell bodies (Fig. 2a). In this study, we used the diagonal length of a Bbox to evaluate the size of a cell. We plotted the distribution of diagonal lengths for all annotated Bboxes (Fig. 2b); most fall in the range of 20–70 pixels (diagonal length: mean ± std, 40.51 ± 8.36 pixels). We augmented the annotated images to improve cell detection accuracy using eight image transformations composed of rotation and reflection operations (Fig. 2c). Notably, only the original image needs annotation; the Bboxes of all augmented images can be determined from the transformation operation and the original annotation. Thus, the image augmentation technique expanded the training dataset without the need for additional annotation.
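A minimal MATLAB sketch of this eightfold augmentation is shown below, assuming `I` is an annotated image and `gtBoxes` an M-by-4 array of [x y w h] Bboxes; it applies the four 90-degree rotations of the image and of its horizontal mirror, and propagates the Bboxes with `imwarp`/`bboxwarp` (Computer Vision Toolbox, R2019b) so that no re-annotation is needed.

```matlab
% Eight augmentations: 4 rotations x 2 reflections, applied jointly to
% the image and its annotations. Affine matrices use MATLAB's
% post-multiply convention ([u v 1] * A).
augImgs = {}; augBoxes = {};
for flipSign = [1 -1]                     % 1: identity, -1: horizontal mirror
    for theta = 0:90:270                  % 0/90/180/270 degree rotations
        A = [flipSign*cosd(theta)  flipSign*sind(theta) 0; ...
             -sind(theta)          cosd(theta)          0; ...
              0                    0                    1];
        tform = affine2d(A);
        rout  = affineOutputView(size(I), tform);  % output view for imwarp
        augImgs{end+1}  = imwarp(I, tform, 'OutputView', rout); %#ok<SAGROW>
        augBoxes{end+1} = bboxwarp(gtBoxes, tform, rout);       %#ok<SAGROW>
    end
end
```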

We constructed DetectionNet based on the Faster R-CNN model using pre-trained alexnet and resnet18 to achieve cell detection (Fig. 2d). The input width and height of the alexnet and resnet18 models were 227-by-227 and 224-by-224 pixels, corresponding to diagonal lengths of 321 and 317 pixels, respectively. To capture more image detail and increase cell detection accuracy, we zoomed in all images and their corresponding annotations with the same amplification factor to ensure identical image resolution. Considering the statistical properties of the Bboxes (Fig. 2b), we magnified all images fourfold using bicubic interpolation before training and testing.
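Assuming the same [x y w h] Bbox convention, the magnification step can be sketched as follows; `bboxresize` keeps the annotations aligned with the upsampled pixels.

```matlab
% 4x magnification of an image and its annotations (bicubic upsampling).
scale = 4;
Ibig  = imresize(I, scale, 'bicubic');   % upsample the pixels
gtBig = bboxresize(gtBoxes, scale);      % scale [x y w h] boxes to match
```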

Quantitative evaluation for cell detection

We validated the performance of the Faster R-CNN model on the testing dataset after training. Two representative testing images showed that most of the annotated (yellow squares) and detected (blue squares) Bboxes largely overlapped in pairs (Fig. 2e). First, we adopted the Intersection over Union (IoU) to evaluate the agreement between annotated and detected Bboxes. For two Bboxes \(A\) and \(B\), the IoU was calculated as the ratio of the intersection area to the union area, \(\mathrm{IoU}=|A\cap B|/|A\cup B|\), giving values in the range of 0–1 (Fig. 2f). For each annotated Bbox, we compared it with all detected Bboxes to find the largest IoU value. We observed that 85.28% and 87.22% of IoUs were greater than 0.5 for the alexnet- and resnet18-based DetectionNets, respectively (Fig. 2f). Furthermore, we used the precision-recall (PR) curve and mean average precision (mAP) to quantitatively evaluate the accuracy of cell detection, with the IoU threshold for judging a positive detection set to 0.5. The mAP values for the alexnet- and resnet18-based models were 0.8049 and 0.8316, respectively (Fig. 2g).
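The evaluation can be sketched in MATLAB as below; `detector`, `testImage`, `gtBoxes`, `detBoxes`, `detResults`, and `gtTable` are hypothetical names for the trained detector, a test image, and its per-image and dataset-level boxes.

```matlab
% Per-image evaluation sketch: best IoU per annotated Bbox (Fig. 2f).
[detBoxes, detScores] = detect(detector, testImage);   % Faster R-CNN output
iou     = bboxOverlapRatio(gtBoxes, detBoxes);         % pairwise IoU matrix
bestIoU = max(iou, [], 2);                             % best match per annotation
fracHit = mean(bestIoU > 0.5);                         % share of IoUs above 0.5

% Dataset-level PR curve and average precision at the 0.5 IoU threshold.
% detResults is a table of Boxes/Scores accumulated over all test images;
% gtTable holds the corresponding ground-truth boxes.
[ap, recall, precision] = ...
    evaluateDetectionPrecision(detResults, gtTable, 0.5);
plot(recall, precision); xlabel('Recall'); ylabel('Precision');
```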

Dataset and model for cell classification

We cropped 487 images of 1064 × 690 pixels from the ascites cytopathology images at 40× magnification (0.5 μm/pixel). We manually annotated several benign and malignant cells in each cropped image using the Leica ImageScope software. In the representative image, the annotated yellow and blue squares denote malignant and benign cells, respectively (Fig. 3a). In the annotation process, we followed several rules to make the annotated cells capture more information, such as keeping the annotated Bboxes away from each other and ensuring that each class's annotated cells covered diverse characteristics, including size, texture, and color pattern. These cropped images contained 18,558 annotated malignant cells and 6089 annotated benign cells in total. The images were split into training and testing datasets by patient. The training dataset consisted of 416 images from 119 patients with 17,117 malignant cells and 5636 benign cells; the testing dataset consisted of 71 images from 20 patients with 1441 malignant cells and 453 benign cells.

Fig. 3

Cell patch extraction protocol for cell classification. a Example image of cell Bbox annotation for cell classification. Yellow square, malignant cell. Blue square, benign cell. The filled red triangle labels cell 1, a malignant cell; the filled black triangle labels cell 2, a benign cell. b Patch extraction results using the annotation in panel a. Each annotated Bbox in panel a was extended to 15 Bboxes using scaling and translation operations. c Extracted cell patches from panel b for cell 1 and cell 2. For each annotated cell, 15 image patches were extracted

Minor changes to the size and position of the Bbox of a target cell do not change the label of that cell. Thus, we extended all annotated Bboxes by scaling and translation operations to increase the robustness of our system (Fig. 3b). For each annotated Bbox, we performed \({N}_{S}=3\) scaling operations with zoom factors of 0.83, 1.0, and 1.20. In addition, for each zoomed Bbox, we applied \({N}_{T}\) random translations in the range of −5 to 5 pixels. Thus, we extended each annotated Bbox to \({N}_{E}={N}_{S}\times {N}_{T}\) Bboxes (here \({N}_{T}=5\), giving \({N}_{E}=15\)) for all benign and malignant cells. Accordingly, we extracted \({N}_{E}\) cell patches by cropping with the \({N}_{E}\) extended Bboxes (Fig. 3c). Notably, the scaling operation on Bboxes was performed before cropping, so all cell patches have the same spatial resolution.
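A sketch of this Bbox extension protocol for a single annotated box `box` = [x y w h] on image `I` is given below; the scale factors and translation range follow the text, while the loop structure is an illustrative assumption.

```matlab
% Extend one annotated Bbox to N_E = N_S x N_T variants and crop patches.
zooms = [0.83 1.0 1.20];                 % N_S = 3 scale factors
NT    = 5;                               % translations per scale (N_E = 15)
patches = {};
cx = box(1) + box(3)/2;  cy = box(2) + box(4)/2;   % box center
for s = zooms
    w = box(3)*s;  h = box(4)*s;                   % scale about the center
    for t = 1:NT
        dx = randi([-5 5]);  dy = randi([-5 5]);   % random shift in [-5, 5] px
        rect = [cx - w/2 + dx, cy - h/2 + dy, w, h];
        patches{end+1} = imcrop(I, rect); %#ok<SAGROW>
    end
end
```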

We analyzed the statistical properties of the benign and malignant cell patches to choose proper processing operations. Most of the diagonal lengths of benign and malignant cells are in the range of 20 to 90 pixels (benign cell patches: mean ± std, 46.88 ± 8.23 pixels; malignant cell patches: mean ± std, 51.58 ± 7.04 pixels; Fig. 4a; a detailed analysis of the diagonal lengths of four types of benign cells is shown in Fig. S1). We normalized all image patches to 57 × 57 pixels (diagonal length of 81 pixels) by center cropping and zero padding before further processing (Fig. 4b). This operation ensures a consistent spatial resolution when these images are upsampled to the input size of the ClassificationNet models.
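The 57 × 57 normalization can be sketched as the following helper, which center-crops oversized patches and zero-pads undersized ones; the implementation details are assumptions consistent with the description above.

```matlab
% Normalize a patch P to target x target pixels (target = 57):
% center-crop dimensions that exceed the target, zero-pad the rest.
function P = normalizePatch(P, target)
    [h, w, c] = size(P);
    % center crop any dimension that exceeds the target
    r0 = max(1, floor((h - target)/2) + 1);
    c0 = max(1, floor((w - target)/2) + 1);
    P  = P(r0:min(h, r0 + target - 1), c0:min(w, c0 + target - 1), :);
    % zero-pad any dimension that falls short
    [h, w, ~] = size(P);
    out = zeros(target, target, c, 'like', P);
    r1 = floor((target - h)/2) + 1;  c1 = floor((target - w)/2) + 1;
    out(r1:r1 + h - 1, c1:c1 + w - 1, :) = P;
    P = out;
end
```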

Fig. 4

Performance of cell classification by ClassificationNet and results of the end-to-end system. a Histogram of the diagonal length of Bboxes in the cell classification dataset. Gray, benign cells. Red, malignant cells. b Image augmentation and class balancing by rotation. The images of malignant and benign cells were augmented 3 and 9 times, respectively. c ROC curves for the seven classification models. d Example TP, FP, FN, and TN images from ClassificationNet. e Results of the DL-based end-to-end system combining cell detection and classification. All cells were detected by the resnet18-based DetectionNet and classified by the resnet50-based ClassificationNet automatically. Yellow square, malignant cell. Blue square, benign cell. f Plot of time consumed versus number of detected cells per image. Two cell detection models and seven cell classification models were tested on 30 images. The target model, consisting of the resnet18-based DetectionNet and the resnet50-based ClassificationNet, used only about 3.7 s per 100 cells. Experiments were implemented in MATLAB 2019b on a Windows 10 system with 16 GB of memory, an i7-7800X CPU, and an Nvidia GeForce GTX 1080 Ti GPU

We implemented offline and online image augmentation for the training dataset to improve cell classification accuracy and reduce overfitting of the ClassificationNet models. First, we implemented offline augmentation by performing \({N}_{r}\) rotations of each extracted patch (Fig. 4b). The imbalance between training benign and malignant cells (5636:17,117 = 1:3.04) would bias classification toward the majority (malignant) class. All rotated images were resized to the input size of the ClassificationNet models by center cropping and zero padding. We set different values of \({N}_{r}\) for benign and malignant cells to generate a balanced dataset: \({N}_{r}\) equaled 9 for benign cells and 3 for malignant cells. The ratio between augmented benign and malignant cell patches became nearly balanced after augmentation (50,724:51,351 = 1:1.01). In addition, we implemented online augmentation during training using translation and rotation: image translations in both dimensions were in the range of −3 to 3 pixels, and image rotations were in the range of −10 to 10 degrees.
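Both augmentation stages can be sketched in MATLAB as follows, where `P` is one extracted cell patch and `imdsTrain` a hypothetical datastore of training patches; the evenly spaced rotation angles in the offline stage are an assumption, while the online ranges follow the text.

```matlab
% Offline, class-balanced rotation augmentation.
% Benign: 5636 x 9 = 50,724; malignant: 17,117 x 3 = 51,351 (ratio ~1:1.01).
Nr = 9;                                  % 9 for benign, 3 for malignant
angles = (0:Nr-1) * (360/Nr);            % assumed evenly spaced angles
rotated = arrayfun(@(a) imrotate(P, a, 'bilinear', 'crop'), angles, ...
                   'UniformOutput', false);

% Online augmentation: random translation (+/-3 px) and rotation
% (+/-10 deg) applied on the fly while training ClassificationNet.
aug = imageDataAugmenter('RandXTranslation', [-3 3], ...
                         'RandYTranslation', [-3 3], ...
                         'RandRotation',     [-10 10]);
augDs = augmentedImageDatastore([224 224 3], imdsTrain, ...
                                'DataAugmentation', aug);
```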

Quantitative evaluation for cell classification

We validated the performance of the ClassificationNet models on the testing dataset after training. We extended each annotated cell Bbox to \({N}_{test}\) cell images by combining the patch extraction operation described above (\({N}_{E}=15\); Fig. 3b, c) with additional random translations of each extended Bbox; in the testing period, \({N}_{test}=150\). We calculated the final score of each cell by averaging the scores of its \({N}_{test}\) images and determined the type of the cell according to this final score. First, we plotted the ROC curves for all seven models (Fig. 4c). Furthermore, we adopted the AUC, precision, and FNR to quantitatively evaluate the performance of cell classification (Table 1). The vgg16_128X64 and resnet50 models achieved the best performance (resnet50: AUC = 0.8851, Precision = 96.80%, FNR = 4.73%; vgg16_128X64: AUC = 0.8776, Precision = 96.69%, FNR = 4.68%). Several example TP, FP, FN, and TN images from the ClassificationNet model are illustrated in Fig. 4d. We further tested the performance of the resnet50-based ClassificationNet in recognizing suspicious and determined malignant cells (Fig. S2). The testing accuracy in recognizing determined malignant cells was 96.88%, whereas the accuracy with which the DL system classified suspicious malignant cells into the malignant class was 85.02%. In addition, we analyzed the cell classification accuracy of the resnet50-based ClassificationNet for PAP-stained and HE-stained images (PAP staining: AUC = 0.8800, Precision = 99.48%, FNR = 9.33%; HE staining: AUC = 0.9048, Precision = 97.71%, FNR = 4.58%; Fig. S3).
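The test-time averaging can be sketched as below, assuming `variantDs` (hypothetical) holds the \(N_{test}\) patch variants of one annotated cell and `trainedNet` is a trained ClassificationNet.

```matlab
% Test-time averaging sketch: the softmax scores of the N_test patch
% variants of one annotated cell are averaged, and the class with the
% higher mean score becomes the cell-level prediction.
scores    = predict(trainedNet, variantDs);   % N_test-by-2 score matrix
meanScore = mean(scores, 1);                  % average over the variants
classes   = trainedNet.Layers(end).Classes;   % e.g., benign/malignant
[~, idx]  = max(meanScore);
cellLabel = classes(idx);                     % final decision for this cell
```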

Table 1 Evaluation of cell classification for seven ClassificationNet models

Application of DL-based system

After training DetectionNet for cell detection and ClassificationNet for cell classification, we combined the two models into an end-to-end DL-based system for autonomous cytopathology interpretation. The system accepted original cytopathology images as input, automatically detected all cells in each input image, and then performed cell classification, finally outputting the results together with representative image patches for physician review (Fig. 4e). In addition, the proposed DL-based system was more efficient than humans at detecting and assessing cells in cytopathology images; the target model, consisting of the resnet18-based DetectionNet and the resnet50-based ClassificationNet, used only about 3.7 s per 100 cells (Fig. 4f). We further conducted image-level classification experiments to show the performance of the DL system in distinguishing whether an ascites image contains malignant cells. The image-level testing dataset consisted of 41 positive images and 100 negative images (Fig. S4). The DL system classified an image as positive (malignant ascites) when malignant cells were found (Precision = 97.50%, FNR = 4.88%).
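A sketch of the application pipeline is given below; the file name and variable names are placeholders, and the overlay rendering is one plausible way to reproduce the presentation of Fig. 4e.

```matlab
% End-to-end sketch: detect all cells with DetectionNet, then classify
% each cropped cell with ClassificationNet.
I  = imread('ascites_image.tif');            % placeholder file name
I4 = imresize(I, 4, 'bicubic');              % same 4x magnification as training
[bboxes, detScores] = detect(detector, I4);  % DetectionNet: find all cells

inputSize = trainedNet.Layers(1).InputSize;  % ClassificationNet input size
labels = strings(size(bboxes, 1), 1);
for k = 1:size(bboxes, 1)
    patch = imcrop(I4, bboxes(k, :));        % crop the detected cell
    patch = imresize(patch, inputSize(1:2)); % upsample to the network input
    labels(k) = string(classify(trainedNet, patch));
end

% Overlay the per-cell labels on the image (cf. Fig. 4e).
out = insertObjectAnnotation(I4, 'rectangle', bboxes, cellstr(labels));
figure; imshow(out);
```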

Discussion

With advanced DL techniques, efficient open-source DL platforms, and enormous amounts of medical data, DL healthcare systems have developed rapidly in recent years. Although several challenges remain in practical clinical implementation, DL healthcare systems have made giant strides forward, especially in interpreting medical images. DL systems for histopathology are well studied [20,21,22,23], while the development of AI techniques for cytopathology lags behind [24]. Cytopathology contributes to basic research and clinical applications in almost all organs, such as the breast [25], lungs [26], stomach [27], liver [28], pancreas [29], and cervix [30]. In the manual procedure, experts examine the cytology glass slides visually and identify abnormalities in the morphology and structure of every single cell, such as cellular morphology, nuclear-cytoplasmic ratio, and texture pattern [31,32,33], which is time-consuming and subjective. In the cytopathology area, advanced whole-slide scanners can translate cytopathology slides into high-resolution images [34, 35], and researchers have been paying increasing attention to the development of DL-based image interpretation [24]. Several studies have reported that DL techniques are useful computer-aided diagnosis (CAD) tools in cytopathology: convolutional neural networks (ConvNets) have achieved satisfying performance in cytoplasm and nuclei segmentation [36, 37], and a DL-based system achieved the classification and segmentation of cell nuclei in pleural fluid cytology [18].

However, existing AI-based systems in cytopathology are still far from autonomous AI interpretation in clinical applications. Part of the reason lies in the challenges of interpreting cytopathology slides [24]. First, there is an enormous unmet demand for cytopathology services in low-income and middle-income countries (LMICs). Second, interpreting cytopathology images is a highly subjective and labor-intensive task in which cells are commonly clustered and malignant cells may be hidden by benign cells or background elements. Third, human experts can check only a small area of a cytopathology image, which may lead to misdiagnosis. Moreover, most previous studies combining DL with cytopathology are hard to integrate into the clinical process because their systems are based on manually segmented cell patches rather than original cytopathology images. Owing to these difficulties, research on artificial intelligence in cytology is still inadequate, especially for ascites.

As a sign of peritoneal metastasis, malignant ascites results from the excessive accumulation of effusion within the abdominal cavity that cannot be absorbed. Because of serious complications such as jaundice and ileus, ascites is one of the leading causes of death in patients with advanced gastric cancer. Thus, there is a huge demand for DL systems for the cytopathology of ascites and abdominal exfoliated cells.

Using DetectionNet, ClassificationNet, and transfer learning (TL) techniques in combination, the DL system we developed achieved accurate and robust diagnosis based on ascites cytopathology images. We adopted Faster R-CNN to implement cell detection. Faster R-CNN has contributed greatly to many medical imaging studies, such as CT [38] and skin disease [39]. We achieved strong cell detection performance, with an mAP of 0.8316 for the resnet18-based Faster R-CNN, which shows great potential for interpreting cytopathology images.

In several other studies, both the location and the contour of each cell were computed by segmentation techniques using traditional image processing methods and DL-based models. These studies were able to extract many cell characteristics from the segmented cells. However, it is more difficult for cell segmentation to reach performance equivalent to that of cell detection. In addition, the subsequent ClassificationNet was able to extract cell features and accomplish cell classification automatically from the original cell patch. Thus, the proposed system is efficient and reasonable.

In recent years, DL has surpassed human-level accuracy in natural image classification and has been widely applied to classification tasks in medical imaging. In this study, we developed ClassificationNet based on pre-trained alexnet, vgg16, googlenet, resnet18, and resnet50 models to automatically classify benign and malignant cells. We compared the classification performance of seven ClassificationNets, and the resnet50 model achieved the best performance (AUC = 0.8851, Precision = 96.80%, FNR = 4.73%). In principle, Faster R-CNN can perform cell detection and classification simultaneously; as described above, it can detect the locations of multiple objects in the PASCAL VOC 2007 and MS COCO datasets [19]. In this research, however, Faster R-CNN was used only for cell detection, and a separate ClassificationNet accomplished cell classification. This design has several advantages. First, the separation of detection and classification makes the DL system easier to evolve by retraining on other datasets. In addition, if the classification model is further developed to grade the degree of malignancy with more classes rather than just two (benign and malignant), the cell detection model does not need to be retrained.

There were still several limitations to this study. First, all the images used in this study were obtained at the same hospital, so the proposed DL system might perform poorly on specimens from other hospitals because of image variations such as color and texture. In fact, cytopathology images obtained from the same hospital may already vary because of differences between ascites specimens and the subjectivity of experts (Fig. S5), and images obtained from different hospitals would show even greater color variation because of different tissue acquisition methods, slide preparation methods, and imaging conditions. As a result, the generalization ability of a DL system trained on a dataset from only one hospital would be weak. In further research, we will conduct a multi-center study to reduce discrepancies between hospitals and make the DL system more robust. Second, we excluded cell clusters in this study to establish a more accurate and sensitive diagnosis system, which makes the diagnostic system less applicable to tumor cells characterized by clustering. Although our observations showed that the proposed DL system was able to recognize clustered cells and has the potential to be applied to cell agglomeration (Fig. S6), more detailed studies are needed to solve this problem, including specific data annotations and investigations of cell density. In future research, we plan to carry out more in-depth investigations of the phenomenon of cell agglomeration.

The proposed system is a typical multidisciplinary integration of AI and healthcare. By embracing AI, many repetitive and rote tasks can be solved efficiently, and physicians can have more time to focus on the human elements of care, especially in pathology. In addition, there is a crucial gap in access to pathology services in LMICs, which host 87% of the world's population. For example, China needs at least 70,000 additional pathologists to match the ratio of pathologists to population in high-income countries [31, 40]. Telepathology techniques based on AI are a promising approach to fill this gap.

In conclusion, we established a novel ascites cytopathology dataset and developed a DL system for cytopathology interpretation with high accuracy. We used DetectionNet and ClassificationNet to achieve cell detection and classification, respectively. The proposed system incorporated TL techniques to increase accuracy and was validated on the ascites cytopathology image dataset. Ultimately, our approach aims to demonstrate the clinical utility and value of computer-aided tools in digital pathology; more studies are required for a prospective clinical evaluation and a thorough assessment of the clinical task in question.