Introduction

Eyelid tumors account for 5–10% of skin tumors [1, 2] and can be divided into benign and malignant lesions according to their tissue or cell of origin. In general, 80–90% of eyelid lesions are benign [3,4,5,6]. However, malignant eyelid lesions can be life-threatening, with 5-year mortality of up to 30% in patients with sebaceous cell carcinoma [7]. Malignant cases include basal cell carcinoma (80–90%), squamous cell carcinoma (5%), sebaceous cell carcinoma (1.0–5.5%), and melanoma (< 1%) [8]. The prevalence and outcomes of eyelid tumor subtypes vary significantly with geographic location, genetic background, socio-economic status, and healthcare policy [9]. Complete surgical excision with intraoperative margin control is the standard treatment for malignant lesions and can reduce the rate of recurrence [8]. It is therefore necessary to accurately differentiate malignant from benign lesions at treatment onset in order to reduce mortality and complications. Diagnosing eyelid lesions requires specific expertise and a biological or pathological workup, which is laborious, time-consuming, and dependent on the experience of the pathologist. There is also a concern that limited access to pathology expertise may result in forgoing intraoperative frozen section diagnosis and relying on gross anatomical features for lesion identification. New, efficient approaches are thus needed to address these limitations.

Over the past few years, artificial intelligence (AI), especially deep learning (DL) [10], has played a major role in medicine, including image recognition [11], auxiliary diagnosis [12], drug development [13], and health care management [14]. DL-based systems have been developed to detect eyelid melanoma and basal cell carcinoma from dermoscopic images [15, 16] or pathological images [17, 18]. However, these systems still require pathological or dermoscopic examination, which is not convenient for screening malignancies by non-experts: pathological examination is invasive, and dermoscopy needs professional equipment.

Digital photography is the most commonly used approach for analyzing facial data because it is convenient and intuitive. DL applied to digital photographs has achieved physician-equivalent classification accuracy for lid position [19,20,21,22] and skin cancer [23, 24]. Moreover, importing photographs into a smartphone enables portable and convenient telemedicine [25] for determining whether emergency medical treatment is required. We aimed to provide a noninvasive means of differentiating malignant eyelid tumors from benign ones without pathological or dermoscopic examination, making it possible for patients to monitor eyelid tumors and identify malignant ones at an early stage. The present study therefore established DL models to automatically differentiate benign and malignant eyelid tumors using common digital photographs, and compared the performance of the DL system with that of ophthalmologists at different levels of experience.

Methods

Study design and participants

This was a single-centre diagnostic study with prospective validation. From 1 to 2017 to 30 September 2020, patients with eyelid tumors who underwent basic ophthalmic examinations at Beijing Tongren Hospital (Beijing, China) were retrospectively collected. Photographs of the eyelid tumors were captured with a digital camera (DSC-F828, Sony, Japan) at the first visit. Patients were asked to look straight ahead, and the camera was positioned in the frontal plane at pupil height, one meter from the patient. For tumors that could not be fully exposed in primary position, a medical cotton swab was used to aid exposure. Photographs were taken in outpatient clinics and inpatient wards, so the lighting and background of the images were not uniform, which adds to the diversity of our datasets. Patients who eventually underwent tumor resection and had a histopathological diagnosis were included in this study. These data formed the developmental dataset, which was randomly partitioned for four-fold cross-validation in a subject-independent manner: the best model was chosen over four repeated rounds of training on a development subset and testing on a validation subset [26,27,28]. Images of tumors from the same patient were never split between the training and testing subsets.
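The subject-independent four-fold partition described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `patient_level_folds`, the file names, and the patient identifiers are hypothetical, and the round-robin assignment is an assumption (the text only states that the split was random and that no patient's images were shared between training and testing).

```python
import random
from collections import defaultdict


def patient_level_folds(image_to_patient, n_folds=4, seed=0):
    """Assign whole patients (not single images) to folds."""
    patients = sorted(set(image_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Deal the shuffled patients round-robin so that fold sizes
    # differ by at most one patient.
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for image, patient in image_to_patient.items():
        folds[fold_of_patient[patient]].append(image)
    return [sorted(folds[k]) for k in range(n_folds)]


# Hypothetical toy data: patient "P2" contributes two photographs.
image_to_patient = {
    "img_01.jpg": "P1", "img_02.jpg": "P2", "img_03.jpg": "P2",
    "img_04.jpg": "P3", "img_05.jpg": "P4", "img_06.jpg": "P5",
}
folds = patient_level_folds(image_to_patient)
for k, held_out in enumerate(folds):
    # Each round validates on one fold and trains on the other three.
    train = [img for j, fold in enumerate(folds) if j != k for img in fold]
    print(f"round {k}: validate on {held_out}, train on {len(train)} images")
```

Because folds are built from patients rather than images, both photographs of "P2" always land in the same fold. The sketch does not stratify by diagnosis; whether the original split did so is not stated.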

To further validate the performance of the DL system, another group of patients was prospectively collected at Beijing Tongren Hospital between 1 and 2020 and 30 June 2021 as the prospective validation dataset. All patients underwent surgical resection and received a histopathological diagnosis. The photographs of these eyelid tumors were captured with the same digital camera. The flow chart of data selection is shown in Fig. 1.

Fig. 1 Schematic diagram of this study

In this study, all procedures were conducted in accordance with the Declaration of Helsinki. The Ethics Committee of Beijing Tongren Hospital approved the study. Written informed consent was obtained from each subject.

Image preprocessing and quality control

To improve the DL analysis, we resized the images to a resolution of 256 × 256 pixels before developing the algorithm. In the quality control process, we assessed image quality and filtered out unqualified images after mask removal. Pictures of poor quality were excluded based on several criteria, such as the readable region ratio, illumination, blurriness, and image content. The pixel values of the selected images were linearly mapped from the range (0, 255) to (0, 1). The tumors on patients’ faces were outlined with the polygon tool and annotated according to the histopathological diagnosis, and the tumor regions were then cropped for training the algorithms. The annotation tool was LabelMe (https://github.com/wkentaro/labelme).
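As a rough illustration of the two numerical steps above (resizing to 256 × 256 and the linear mapping of pixel values to the unit range), a dependency-free sketch is given below. The interpolation method is not stated in the text, so nearest-neighbour sampling is an assumption; the helper names and the tiny grayscale crop are hypothetical, and a real pipeline would operate on RGB arrays with an imaging library.

```python
def resize_nearest(image, size=256):
    """Nearest-neighbour resize of a 2-D list of pixel rows to size x size.

    Assumption: the interpolation used in the study is not reported.
    """
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def rescale(image):
    """Linear map of 8-bit pixel values from [0, 255] to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# Hypothetical 2 x 3 grayscale crop standing in for a cropped tumor region.
crop = [[0, 128, 255], [255, 128, 0]]
prepared = rescale(resize_nearest(crop))
print(len(prepared), len(prepared[0]))
```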

Algorithm development

We applied several convolutional neural networks (CNNs) to automatically classify each eyelid tumor as benign or malignant, with the histopathological diagnosis as ground truth. We first compared the performance of four architectures: ResNet-50, ResNet-101, InceptionV3, and InceptionResNetV2 [29,30,31]. These four CNNs were chosen on the basis of GPU (graphics processing unit) memory and generalization ability. We adopted four-fold cross-validation to develop the models and selected the optimal one, and then used the prospective validation dataset to further test the performance of the DL models. An overview of the deep convolutional neural network-based model training pipeline is illustrated in Fig. 2. All models were developed with TensorFlow 1.10.0 and Keras 2.2.4 on a server with three NVIDIA 1080 GPUs. Rather than training from scratch, we fine-tuned the weights of CNNs pretrained on the ImageNet dataset [32]. To enrich the dataset in the training stage and reduce the possibility of overfitting, we used several data augmentation methods, including horizontal flipping, vertical flipping, and rotation of up to 90°. Because we considered colour and shape to be the most important characteristics, other data augmentation methods that might modify the pixel values or appearance were not adopted. Appearance images of the eyes were used as the input to discern whether the tumor was benign or malignant; samples are shown in Fig. 3. The optimization algorithm was SGD (stochastic gradient descent) [33] with the default hyperparameters in Keras 2.2.4 and a batch size of 15. Class weights were used to offset the imbalanced distribution of the two classes. Based on repeated experiments, different numbers of epochs were applied so that the models were trained without underfitting. Early stopping was applied if the validation loss did not improve over 10 consecutive epochs [34].
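Two of the training safeguards mentioned above, class weighting and early stopping, can be illustrated without any DL framework. The sketch below is an assumption-laden illustration rather than the authors' implementation: the text does not say how the class weights were computed, so the common "balanced" heuristic (total samples divided by the number of classes times the class count) is used here, and the function names, toy labels, and loss values are hypothetical.

```python
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: the study does not report its weighting formula;
    this is the widely used "balanced" heuristic.
    """
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on
    its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical toy labels with a 3:1 benign-to-malignant imbalance.
weights = balanced_class_weights(["benign"] * 3 + ["malignant"])
print(weights)

# Hypothetical loss curve: improvement for two epochs, then a plateau.
stop_at = early_stopping_epoch([0.9, 0.8] + [0.85] * 10)
print(stop_at)
```

With these weights, an error on the minority (malignant) class contributes more to the loss than an error on the majority class, which is the trade-off the class weighting is meant to achieve.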

Fig. 2 Overview of the deep learning-based system to automatically predict eyelid tumors from digital clinical images

Fig. 3 Input samples of benign and malignant tumors

Comparison between human and DL system

Three senior ophthalmologists (with more than 15 years of clinical experience), two junior ophthalmologists (with more than 5 years of clinical experience), and two medical students were invited to independently diagnose the tumors in the prospective validation dataset. Neither the results of the DL system nor the histopathological information was available to any of the human readers. We compared the performance of these human readers with that of the DL system.

Statistical analyses

All statistical analyses were performed using Python 3.7.3 (Wilmington, DE, USA) and MATLAB R2016a (https://www.mathworks.com/). We used accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve to assess the performance of the DL model. The area under the curve (AUC) with its 95% confidence interval (CI) was calculated.
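The metrics named above can be computed from a 2 × 2 confusion table, as in the sketch below. Two points are assumptions rather than statements from the text: the CI method is not reported, so the Wilson score interval is used here, and the confusion counts are inferred from the rates and class sizes given in the Results (15 malignant and 21 benign tumors in the prospective set, with malignant taken as the positive class) rather than taken from a published table. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def report(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity, each with a Wilson 95% CI."""
    rows = {
        "accuracy": (tp + tn, tp + fn + tn + fp),
        "sensitivity": (tp, tp + fn),   # malignant tumors correctly flagged
        "specificity": (tn, tn + fp),   # benign tumors correctly cleared
    }
    return {name: (x / n, *wilson_interval(x, n)) for name, (x, n) in rows.items()}


# Counts inferred (not reported directly) from the prospective validation set.
results = report(tp=14, fn=1, tn=18, fp=3)
for name, (value, low, high) in results.items():
    print(f"{name}: {value:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Under these assumptions the Wilson intervals agree to three decimals with the intervals reported for the best model in the Results, which suggests, but does not confirm, that a Wilson-type interval was used.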

Results

A total of 309 pictures from 229 patients with eyelid tumors were retrospectively gathered for the training, tuning, and internal validation of the DL system (Table 1). The mean age (± standard deviation, SD) was 49.3 ± 17.5 years, and 63.76% of patients were female. In total, 157 patients were histopathologically diagnosed with benign tumors and 72 with malignant tumors. The most common malignant eyelid tumor in our datasets was basal cell carcinoma (60/122, 49.18%), followed by sebaceous adenocarcinoma of the eyelid (32/122, 26.23%), squamous cell carcinoma (12/122, 9.84%), and eyelid melanoma (9/122, 7.38%). The most common benign eyelid tumor was nevus (149/223, 66.82%), followed by cyst (14/223, 6.28%), seborrheic keratosis (13/223, 5.83%), and xanthelasma (10/223, 4.48%). The top two malignant eyelid tumors in the prospective validation dataset were basal cell carcinoma (5/15, 33.33%) and eyelid melanoma (3/15, 20%), and the top two benign eyelid tumors were nevus (7/21, 33.33%) and seborrheic keratosis (2/21, 9.52%). Patients with malignant eyelid tumors were older than those with benign tumors. Seven tumors involved bilateral eyelids, and twenty-five involved both the upper and lower eyelids. Another thirty-six pictures of eyelid tumors from 36 patients were prospectively collected as the prospective validation dataset. Age, sex distribution, and tumor location were similar in the two datasets.

Table 1 Components of the developmental dataset and the prospective validation set

Table 2 shows the performance of the different DL models for the detection of eyelid tumors. All eight models reached an average accuracy greater than 0.958 in internal cross-validation. The average sensitivity and specificity were greater than 0.795 and 0.965, respectively, and the mean AUCs were greater than 0.960. Table 3 shows the performance of these models on the prospective validation dataset: the best model reached an accuracy, sensitivity, specificity, and AUC of 0.889 (95% CI 0.747–0.956), 0.933 (95% CI 0.702–0.988), 0.857 (95% CI 0.654–0.950), and 0.966 (95% CI 0.850–0.993), respectively. The ROC and precision-recall (PR) curves of the eight models are shown in Fig. 4.

Table 2 Performance of models in the internal validation dataset
Table 3 Performance of models in the prospective validation dataset
Fig. 4 Performance of models in cross validation. A, B: Epoch = 80. C, D: Epoch = 60. Model 1: ResNet101; Model 2: ResNet50; Model 3: InceptionResNetV2; Model 4: InceptionV3; Model 5: ResNet101; Model 6: ResNet50; Model 7: InceptionResNetV2; Model 8: InceptionV3

When comparing human ophthalmologists with the DL system, we found that the DL system reached diagnostic performance similar to, or even better than, that of senior ophthalmologists. In general, the DL system performed much better than junior ophthalmologists and medical students (Fig. 5). The feature maps of the four types of CNNs are shown in Fig. 6; they indicate that contour and pixel values were the more important features.

Fig. 5 Performance of models in prospective validation dataset and comparison with human ophthalmologists. A, B: Epoch = 80. C, D: Epoch = 60. Model 1: ResNet101; Model 2: ResNet50; Model 3: InceptionResNetV2; Model 4: InceptionV3; Model 5: ResNet101; Model 6: ResNet50; Model 7: InceptionResNetV2; Model 8: InceptionV3. Red filled rhombus: Senior Ophthalmologist 1, blue filled rhombus: Senior Ophthalmologist 2, red filled circle: Junior Ophthalmologist 1, blue filled circle: Junior Ophthalmologist 2, pink filled circle: Junior Ophthalmologist 3, red filled triangle: Medical student 1, blue filled triangle: Medical student 2

Fig. 6 The feature maps in four types of CNNs

Discussion

In this study, we successfully trained DL models that automatically distinguish benign from malignant eyelid tumors in clinical images. Even with a rather small training set, our CNN algorithms were more accurate than junior ophthalmologists and medical students, reaching 88.89% accuracy, 93.33% sensitivity, and 85.71% specificity in the detection of malignant eyelid tumors on the prospective validation dataset. The DL system performed comparably to senior experts.

Eyelid tumors are common, and distinguishing benign from malignant tumors is important but difficult, because benign tumors sometimes share features such as irregular shape, irregular pigmentation, and telangiectasia with malignant ones. A benign tumor is most commonly described as a well-demarcated, waxy, pigmented lesion that develops at a younger age. A malignant tumor may diffusely infiltrate surrounding tissues, damage the orbit and intraorbital tissues, and cause loss of lashes, central ulceration, and/or destruction of eyelid architecture; distant metastasis may also occur. Combined with a smartphone, our model may help patients to monitor eyelid tumors for malignancy themselves and assist doctors in clinical decision-making. Lord et al. [35] first proposed the novel use of smartphones in ophthalmic imaging, and various researchers later described smartphone-based imaging applications in ophthalmology in detail [36,37,38]. Smartphone-based ophthalmic imaging techniques can be adopted by any clinician to obtain expert opinions and for portable image documentation.

There are numerous imaging devices for ophthalmic examination, most of which are sophisticated, specialized for specific regions of the eye, and require close interaction between patient and clinician. A simple, portable, high-quality alternative imaging tool for routine examination is therefore needed. Clinical images are more convenient and intuitive than the dermoscopic images [17, 39] or histopathological images [18, 40, 41] reported previously. Clinical images have been used to evaluate eyelid disorders [22, 25, 42, 43] as well as ocular motility and strabismus [44,45,46,47].

Table 4 Comparison of the current study and previous methods

Recently, Adamopoulos et al. [48] and Li et al. [49] also developed AI systems to detect eyelid tumors. The details of the comparison with these studies are listed in Table 4. In brief, Adamopoulos et al. [48] used photographic images with a small sample size to identify basal cell carcinoma. However, basal cell carcinoma is not the only malignant eyelid tumor, so their model may fail to identify other malignant masses. In addition, that study did not report some important evaluation metrics, including sensitivity, specificity, and AUC. Li et al. [49] also developed a DL model to distinguish malignant from benign eyelid tumors, with a larger sample size. Before identifying the characteristics of the tumors, they first trained a model to locate the tumor, with an average precision of 76.2%, which meant that about a quarter of masses were wrongly located. In our study, human ophthalmologists instead precisely delineated the tumors before the DL model was applied, both in the development stage and in the real deployment scenario, so that our model showed better performance; this approach might be more appropriate for clinical decision-making. In addition, as an exploratory step, we used a separate prospectively collected dataset to compare the performance of the model with that of human ophthalmologists, which showed that the model surpassed most of the ophthalmologists.

High accuracy and efficiency are the main advantages of applying DL systems in medical diagnosis, since the developed algorithms can capture and integrate information in a fraction of the time, in ways that the human brain cannot. Diagnoses of the same eyelid tumor can differ, because they depend on the patient’s cooperation and the clinician’s experience. DL systems enable automatic risk stratification of tumors and can be used as a triaging tool before clinician assessment, which may reduce unnecessary biopsies [50]. Successful implementation could reduce human error, enable early diagnosis, and consequently reduce the cost of eyelid tumor treatment [51].

The majority of eyelid tumors are benign proliferative lesions. Skin tumors involve the eyelid region in up to 10% of cases, with basal cell carcinomas (BCCs) being the most prevalent in the Chinese population [6]. Various factors affect the incidence of benign and malignant eyelid tumors, including race, geography, and genetics. Suspicion of malignancy should be raised by clinical signs such as loss of lashes, central ulceration, infiltration, gradual enlargement, loss of sensation, induration, irregular or ‘pearly’ borders, destruction of eyelid structure, and telangiectasia. Risk factors such as smoking, a history of skin cancer, excessive sun exposure, previous radiation, and immunosuppression also contribute to the occurrence of eyelid tumors [52,53,54].

In our study, 35.36% of eyelid tumors were malignant. In epidemiological investigations, Sendul et al. [54] found that 87.1% of eyelid tumors were benign and the remaining 12.9% were malignant, and Xu et al. [55] reported that 86.2% were benign using data from the same medical center. The difference may be due to the different sample sizes and to difficult cases being more likely to visit our medical center. Consistent with the literature, malignant eyelid tumors developed at an older age than benign tumors [2, 54, 55]. Similar to some previous studies, we also observed a lower eyelid predominance for malignant eyelid tumors [9, 56]. Prolonged exposure to sunlight seems to be an important predisposing factor for this predominance [57]. No preference of benign tumors for the upper eyelid was observed, as described before [6, 53]. Of the seven patients with bilateral eyelid tumors in our study, six had xanthoma. The similarities and differences in the incidence rates of benign tumors among studies of eyelid tumors can mainly be attributed to racial and regional factors rather than data bias.

Made available to the general public on smartphones, this application could be used for patient self-examination and in community outreach programs, and it could also support junior clinicians in documenting cases and following up patients. It may thereby help to reduce unnecessary biopsies, minimize overdiagnosis and other potential harms associated with screening, ease clinician workload, and improve timely access to specialist care for people requiring urgent attention.

Although the performance of DL systems in data analysis is promising, and several studies have reported that CNN algorithms surpass the classification efficacy of physicians, the real-world performance of DL systems remains unclear. Future work could focus on differentiating subtypes of eyelid tumors to ensure the best classification outcome. Rigorous tests should be performed before implementation, and the technology should be monitored after deployment.

There are some limitations to our study. First, in comparison with skin tumors, eyelashes, eyebrows, and the background colour of the pupil, sclera, and iris are possible confounding factors that may introduce bias. Second, the classification in this study distinguished only benign from malignant tumors rather than specific tumor types. We intend to enrich our dataset for each tumor subtype in order to develop a system that can make specific diagnoses with better outcomes. Third, lesions were identified only from 2-dimensional photographs without any additional clinical information. Combining the algorithm with additional available data, such as clinical, dermoscopic, or pathological images, has been shown to achieve higher accuracy than a single CNN model.

Conclusions

This study shows that a DL system is a convenient way to identify benign and malignant eyelid tumors from common clinical images. Our system represents a medical application of AI with better performance than most ophthalmologists. Compared with related research, our study avoids the object detection procedure and reached better classification performance, which makes it appropriate for clinical use. In the future, combining the DL system with a smartphone may further enable patients to self-monitor eyelid tumors for malignancy and assist doctors in clinical decision-making.