1 Introduction

Melanoma is the most lethal form of skin cancer [1, 2], and its incidence is rising rapidly. In 2018, 287,723 new melanoma cases and 60,712 melanoma deaths were reported worldwide [3]. In the United States, about 96,480 new melanomas were expected to be diagnosed in 2019, and about 7230 people (approximately 4740 men and 2490 women) were expected to die of the disease [1]. Severe UV radiation from the sun accounts for about 90% of melanomas [4]. Malignant melanomas tend to spread into the surrounding tissue and very often metastasize to adjacent lymph nodes, the lungs, and the brain. Although the incidence of melanoma is high, the survival rate is almost 100% if it is diagnosed and treated at an early stage. Early diagnosis and treatment of melanoma are therefore essential, and this has been a demanding area of research. Automated diagnosis of melanoma from dermoscopy images comprises the following steps: collection of a suitable image dataset, pre-processing of the input images, segmentation of the skin lesion, extraction of distinct and effective features, and finally classification of the image under study.

The collected input dermoscopy images are often occluded by hair and may contain other noise. The Dull-Razor algorithm was developed by Lee et al. [5] to remove dark hair from such images. Eshaver [6] is an efficacious hair-removal method for skin images based on the Radon transform. An anisotropic diffusion filter has been employed to smooth lesion images and eliminate artifacts without altering essential lesion characteristics [7]. Image segmentation separates the skin lesion from the surrounding skin so that features can be extracted from the lesion alone. Widely used methods for skin lesion segmentation include the global thresholding method proposed by Otsu [8], an improved adaptive thresholding method developed by Sforza et al. [9], an iterative stochastic region-merging algorithm based on regional statistics [10], an iterative segmentation algorithm using the Canny edge detector [11], active contour segmentation algorithms [12], and the classical Gradient Vector Flow (GVF) approach of Zhou et al. [13].

Numerous feature extraction algorithms have been proposed for melanoma diagnosis from dermoscopy images. The most common methods are the ABCD rule [14], the seven-point checklist [15], the three-point checklist [16], and the Menzies method or CASH algorithm [17]. The most widespread algorithm for detecting melanoma is the ABCD rule [14, 18]. This evaluation process rests on four main criteria: asymmetry of the segmented lesion (A), geometric properties of the lesion border (B), color variegation of the pigmented lesion (C), and diameter of the lesion (D). The features used in the ABCD rule are described below.

  • Asymmetry (A): Asymmetry is evaluated with respect to the lesion’s axes of symmetry [19]. The asymmetry measure is generally computed by dividing the segmented lesion into two sub-regions along an axis of symmetry and then determining the non-overlapping region between the two sub-regions across that axis. If the two sub-regions are very similar, the lesion is considered symmetric; otherwise, it is considered asymmetric. Benign lesions are typically symmetric in shape, whereas malignant lesions are highly asymmetric.

  • Border (B): This criterion measures the irregularity of the lesion’s border [19, 20]. A benign lesion typically has a regular, even border, whereas a malignant lesion has an irregular one.

  • Color (C): This criterion assesses the color variegation of the pigmented dermoscopic lesion. According to this criterion, malignant melanoma lesions typically present non-uniform colors.

  • Diameter (D): If the diameter of the lesion is greater than 6 mm, or the lesion has grown within the preceding month, it is suspected to be malignant melanoma [18]. The diameter is determined either as the major axis length of the best-fitted ellipse [19] or as the longest distance between any two points on the lesion’s border [21].

Shape, color, and texture features were extracted from skin lesions and three classifiers were used to classify melanomas in [22]. The features were analyzed both individually and in combination; on 180 images, a sensitivity of 98% and a specificity of 70% were achieved for the texture features with the random forest algorithm. Kasmi et al. [23] extracted features based on the ABCD rule from 200 dermoscopic images and achieved 91.25% sensitivity and 95.83% specificity. In [24], the authors computed 54 features, including border irregularity, eccentricity, and color histograms, from 173 images (39 melanomas, 14 non-melanoma and 120 benign images), obtaining 97.4% sensitivity and 44.2% specificity. Korjakowska [25] extracted shape, color, and texture features and proposed a feature selection algorithm to classify micro-malignant melanomas with a diameter under 5 mm in their initial stage, achieving 90% sensitivity and 96% specificity with a Support Vector Machine (SVM) classifier on 200 dermoscopic images. In [26], two classifiers (KNN and ANN) were applied to 172 dermoscopic images, with the ANN giving better results than the KNN. The authors of [27] presented an internet-based melanoma screening system using an ANN classifier. ANN classifiers have also been used for melanoma classification in [28,29,30,31].

The novelty of this paper lies in its feature extraction and segmentation approach. Three new features are proposed and combined with existing features that are distinct and effective for classifying malignant and benign images. A two-stage segmentation method, combining Otsu’s method with the Chan–Vese model, is also implemented. Although ABCD features were used in [22,23,24,25,26], our feature extraction method differs from the related literature. Features extracted from 200 images are used to train an ANN classifier model, which is then tested on 22 new images; the classifier detects all test images correctly. The automated diagnostic system improves classification accuracy in terms of both sensitivity and specificity: each of the considered features individually detects melanoma lesions with an accuracy of over 72%, and all features together achieve 98.2% accuracy.

This paper is organized into four sections, including this Introduction (Sect. 1). Section 2 describes the proposed approach, comprising image pre-processing, feature extraction, and image classification. Section 3 presents the results of the conducted experiments and discusses them, and Sect. 4 concludes the paper.

2 Proposed approach

Figure 1 illustrates the pipeline of the proposed approach, comprising the following primary steps: image acquisition, image pre-processing, segmentation, feature extraction, and classification. The input image dataset is collected from the PH2 database [32]. To remove hair and other artifacts from the input images, the Dull-Razor algorithm [5] and a median filter are employed. A two-stage segmentation approach combining the Otsu algorithm [8] and the Chan–Vese model [12] is applied for lesion segmentation. Eight features (seven shape features and one color feature) based on the ABCD rule are extracted from the segmented lesion and passed to an ANN classifier.

Fig. 1 Framework of the proposed approach to detect malignant melanoma

2.1 Image acquisition

Collection of a suitable input dataset is important for implementing the subsequent steps successfully and eventually developing a consistent and robust automated diagnosis system for melanoma lesions. The input database must cover all types of potential lesion images. In this paper, dermoscopy images were collected from the PH2 database, which was constructed through a joint research collaboration between the University of Porto and the Dermatology Service of Pedro Hispano Hospital, Portugal [32]. The database is freely available on the internet for all researchers to use as a benchmark. We collected 200 melanocytic images from this database, divided into two main groups: 160 benign lesions (including common and atypical nevi) and 40 malignant lesions (melanomas). These images were acquired under the same conditions through the Tuebinger Mole Analyzer system and have the following characteristics:

  • 8-bit RGB images.

  • Bitmap format: BMP.

  • Resolution: 768 × 560 pixels.

  • Magnification factor: 20x.

We also collected 22 new images (12 benign and 10 malignant) from other internet sources to test the proposed system.

2.2 Image pre-processing and segmentation

Most of the human body is covered by hair of various colors and textures, and skin images may contain other artifacts as well. Pre-processing is performed to remove these noises from the input images. The Dull-Razor algorithm [5] is employed in this paper to remove dark hair; it comprises a grayscale morphological closing operation, bilinear interpolation, and adaptive median filtering. To remove thin hairs and other noise such as air bubbles and camera acquisition noise, a median filter with a window size of 25 is applied after the Dull-Razor algorithm. The median filter also smooths the border of the segmented image in the subsequent segmentation stage. When the window size is too small, the border of the segmented image becomes zigzagged, jagged, and sharp-cornered, which is nearly impossible to analyze for feature extraction; a window that is too large yields a highly smoothed, polished border that loses important details. The window size should therefore be neither too large nor too small, so that the segmented image can be analyzed properly.
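For concreteness, the sketch below shows one way to realize this pre-processing chain in Python with OpenCV. It is an approximation, not the authors' exact implementation: a morphological black-hat transform detects dark hairs, and cv2.inpaint stands in for Dull-Razor's bilinear-interpolation repair; the kernel size and hair threshold are illustrative choices.

```python
# Sketch of Dull-Razor-style hair removal followed by median filtering.
import cv2

def preprocess(bgr, kernel_size=9, hair_thresh=10, median_window=25):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # A morphological black-hat highlights thin dark structures (hairs).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    # Threshold the black-hat response to obtain a binary hair mask.
    _, hair_mask = cv2.threshold(blackhat, hair_thresh, 255, cv2.THRESH_BINARY)
    # Replace hair pixels by interpolating from their neighbourhood
    # (a stand-in for the bilinear interpolation used in Dull-Razor).
    inpainted = cv2.inpaint(bgr, hair_mask, 3, cv2.INPAINT_TELEA)
    # Median filter (window size 25, as in the paper) removes thin hairs,
    # air bubbles and acquisition noise, and smooths the lesion border.
    return cv2.medianBlur(inpainted, median_window)
```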

Segmentation is one of the most challenging steps because of the high intra-class variance among melanoma images and the low inter-class variation between melanoma and benign lesions. In this work, a two-stage segmentation algorithm is employed: Otsu’s thresholding method [8] in the first step and the Chan–Vese model [12] for the final segmentation. Otsu’s method converts the grayscale image to a binary image by global thresholding. The algorithm assumes that the image contains two classes of pixels following a bimodal histogram (foreground pixels belonging to the object of interest, and background pixels) and determines the optimum threshold separating the two classes so that their intra-class variance is minimal or, equivalently, their inter-class variance is maximal. The inter-class variance is computed for every candidate gray-level threshold, and the value that maximizes it is selected as the optimum threshold separating the image into object and background.
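The following minimal NumPy sketch implements the inter-class-variance maximization just described, assuming an 8-bit grayscale input:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level maximizing the inter-class variance (Otsu [8]).

    Assumes an 8-bit grayscale image (values 0-255)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # gray-level probabilities
    omega = np.cumsum(p)                   # class-0 probability up to level t
    mu = np.cumsum(p * np.arange(256))     # first-order cumulative moment
    mu_t = mu[-1]                          # global mean gray level
    # Inter-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2[np.isnan(sigma_b2)] = 0.0     # empty classes contribute nothing
    return int(np.argmax(sigma_b2))

# Lesions are darker than the surrounding skin, so the binary lesion mask is:
# mask = gray <= otsu_threshold(gray)
```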

The binary image from the Otsu method is used as the initial contour for the Chan–Vese model, which evolves the contour by minimizing the energy functional in Eq. (1).

$$F(c_{1},c_{2},E) = \mu \cdot \mathrm{Length}(E) + \nu \cdot \mathrm{Area}(\mathrm{inside}(E)) + \lambda_{1}\int_{\mathrm{inside}(E)} |u_{0}(x,y) - c_{1}|^{2}\,dx\,dy + \lambda_{2}\int_{\mathrm{outside}(E)} |u_{0}(x,y) - c_{2}|^{2}\,dx\,dy$$
(1)

Here, E is the evolving contour, and c1 and c2 are the mean values of the pixels inside and outside E, respectively. u0 denotes the entire image. μ, ν, λ1, and λ2 are regularizing parameters selected by the user to fit a particular class of images; we set λ1 = λ2 = 1 and ν = 0, as suggested in the original paper. Pre-processing and segmentation results for two sample input images are presented in Fig. 2.
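A possible realization of this second stage with scikit-image's chan_vese function is sketched below. Note that this function omits the area term entirely (effectively ν = 0, as used here), the length weight μ is an assumed default, and parameter names such as max_num_iter vary slightly across scikit-image versions.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.segmentation import chan_vese

def segment_lesion(rgb, otsu_mask):
    """Refine the Otsu mask with the Chan-Vese model of Eq. (1)."""
    gray = rgb2gray(rgb)
    # Use the Otsu result as the initial level set:
    # positive inside the lesion, negative outside.
    init = otsu_mask.astype(float) - 0.5
    # lambda1 = lambda2 = 1, as suggested by Chan and Vese [12].
    return chan_vese(gray, mu=0.25, lambda1=1.0, lambda2=1.0,
                     init_level_set=init, max_num_iter=300)
```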

Fig. 2 a, b Two sample input images, c, d corresponding hair masks, e, f images after hair removal, and g, h their segmented images

2.3 Feature extraction

After separating the region of interest, the next step is to extract pertinent and distinct features from the segmented lesion. Seven shape features and one color feature based on the ABCD rule are extracted from the lesion image. The four ABCD criteria fall into two groups: shape properties (A, B, and D) and chromatic information (C). To compute the shape features, some properties must first be measured on the binary segmented lesion: the area (A), centroid (x0, y0), perimeter (P), orientation angle (θ), and principal axis lengths (D, d) of the best-fitted ellipse are extracted using image moments. Here, the orientation is the angle between the major axis of the lesion’s best-fit ellipse and the x-axis of the coordinate system, and the principal axes are the major axis (D) and minor axis (d) of the best-fitted ellipse.
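These moment-based properties can be obtained, for example, from scikit-image's region properties, as sketched below under the assumption that the lesion is the largest connected component of the binary mask. Note that scikit-image measures orientation relative to the row axis rather than the x-axis, so its sign convention differs from the one described above.

```python
from skimage.measure import label, regionprops

def lesion_properties(mask):
    """Moment-based geometric properties of the largest component of the mask."""
    regions = regionprops(label(mask))
    lesion = max(regions, key=lambda r: r.area)   # keep the largest component
    return {
        "area": lesion.area,                      # A
        "centroid": lesion.centroid,              # (y0, x0)
        "perimeter": lesion.perimeter,            # P
        "orientation": lesion.orientation,        # theta, radians (row-axis convention)
        "major_axis": lesion.major_axis_length,   # D ("axis_major_length" in newer versions)
        "minor_axis": lesion.minor_axis_length,   # d
    }
```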

2.3.1 Shape features

One of the most essential indicators of melanoma is asymmetry. To determine the asymmetry features, the segmented image is first aligned with the image coordinate system by translating its centroid to the origin and rotating the image by its orientation angle so that its major axis lies on the x-axis. Figure 3 shows a sample aligned segmented image. The aligned image is flipped along the x-axis, and the non-overlapping region (ΔLx) between the lesion image (L) and the flipped image (Lx) is computed. The asymmetry score across the x-axis (A1) is calculated by Eq. (2), and the asymmetry across the y-axis (A2) is determined similarly by Eq. (3); a code sketch follows Eq. (3). Figure 4 presents the steps for computing the asymmetry scores from a sample lesion image in detail. Malignant lesions are generally more asymmetric than benign ones.

Fig. 3 a Centroid and orientation angle, b aligned with image coordinate (L)

Fig. 4 a Flipped along x-axis (Lx), b non-overlapping region between Lx and L, c flipped along y-axis (Ly), d non-overlapping region between Ly and L

$$\Delta L_{x} = L \oplus L_{x},\quad A1 = \frac{\mathrm{Area}(\Delta L_{x})}{\mathrm{Area}(L)}$$
(2)
$$\Delta L_{y} = L \oplus L_{y},\quad A2 = \frac{\mathrm{Area}(\Delta L_{y})}{\mathrm{Area}(L)}$$
(3)
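The sketch below computes A1 and A2 in a way that is equivalent in spirit to the align-flip-XOR procedure of Figs. 3 and 4: the foreground coordinates are centered on the centroid, rotated onto the principal axes via an eigendecomposition (an assumed stand-in for the moment-based rotation), rasterized, and compared with their reflections. The rasterization is an illustrative shortcut; rounding may merge a few pixels, which is negligible for these ratios.

```python
import numpy as np

def asymmetry_scores(mask):
    """A1, A2 of Eqs. (2)-(3): relative symmetric-difference areas after
    aligning the lesion's principal axes with the coordinate axes."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)                       # translate centroid to origin
    _, eigvecs = np.linalg.eigh(np.cov(pts.T))    # principal directions
    aligned = pts @ eigvecs                       # axes now coordinate-aligned

    half = np.ceil(np.abs(aligned).max(axis=0)).astype(int)
    shape = tuple(2 * half + 1)                   # grid centered on the centroid

    def rasterize(p):
        # Round aligned coordinates onto a binary grid.
        ij = np.round(p + half).astype(int)
        g = np.zeros(shape, dtype=bool)
        g[ij[:, 0], ij[:, 1]] = True
        return g

    lesion = rasterize(aligned)
    # Reflect across each principal axis and measure the XOR (non-overlap) area.
    a1 = np.logical_xor(lesion, rasterize(aligned * [1, -1])).sum() / lesion.sum()
    a2 = np.logical_xor(lesion, rasterize(aligned * [-1, 1])).sum() / lesion.sum()
    return a1, a2
```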

The other shape features, B1 (area-to-perimeter ratio), B2 (compactness index), B3 (perimeter multiplied by area), D1 (average diameter of the lesion), and D2 (difference between the principal axis lengths [33]), are determined by Eqs. (4), (5), (6), (7), and (8), respectively. Malignant lesions tend to grow large (diameters greater than 6 mm [18]); therefore, B1, B3, D1, and D2 take larger values for malignant lesions. B2, the compactness index, represents the smoothness of the lesion border. Since the most compact shape is a circle, a circle has compactness 1, and all other shapes have compactness between 1 and 0. A melanoma lesion has a ragged, uneven, blurred, and irregular border, so its B2 score approaches zero.

$$B1 = \frac{A}{P}$$
(4)
$$B2 = \frac{4\pi A}{{P^{2} }}$$
(5)
$$B3 = PA$$
(6)
$$D1 = \frac{D1' + D1''}{2},\quad \text{where } D1' = \sqrt{\frac{4A}{\pi}} \text{ and } D1'' = \frac{D + d}{2}$$
(7)
$$D2 = D - d$$
(8)
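These five shape features reduce to a few arithmetic operations on the measured properties, as the following sketch shows:

```python
import numpy as np

def shape_features(area, perimeter, major, minor):
    """B1, B2, B3, D1, D2 of Eqs. (4)-(8) from the lesion's geometric properties."""
    b1 = area / perimeter                     # Eq. (4): area-to-perimeter ratio
    b2 = 4.0 * np.pi * area / perimeter ** 2  # Eq. (5): compactness index (circle = 1)
    b3 = perimeter * area                     # Eq. (6)
    d1_equiv = np.sqrt(4.0 * area / np.pi)    # diameter of the equal-area circle
    d1_axes = (major + minor) / 2.0           # mean of the principal axis lengths
    d1 = (d1_equiv + d1_axes) / 2.0           # Eq. (7): average diameter
    d2 = major - minor                        # Eq. (8)
    return b1, b2, b3, d1, d2
```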

2.3.2 Color feature

This feature (C) measures the color variegation of the pigmented dermoscopic image; its score is the number of colors present in the lesion. One early sign of melanoma is the emergence of variations in lesion color: a benign lesion has a uniform color, while malignant lesions typically contain three or more colors, sometimes even five or six. Following the PH2 database [32], six colors are considered in this paper: white, red, light brown, dark brown, blue-gray, and black. Thresholds on the red, green, and blue channel values constituting these six colors were computed from the 200 input images; the threshold ranges used in this experiment are given in Table 1.

Table 1 Thresholding of red, green and blue channels for six colors

To determine the color score, the segmented RGB image is scanned and, if the number of pixels of an individual color exceeds 5% of the total number of lesion pixels, that color is considered present. The color score (C) is the total number of colors present in the lesion and ranges from 1 to 6. Two sample images with color scores of 1 and 4 are shown in Fig. 5.
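A sketch of this counting procedure follows. The RGB ranges below are placeholders for illustration only; the actual thresholds are those of Table 1, estimated from the 200 PH2 images.

```python
import numpy as np

# Illustrative (low, high) RGB ranges -- placeholders, not the Table 1 values.
COLOR_RANGES = {
    "white":       ((190, 190, 190), (255, 255, 255)),
    "red":         ((140,   0,   0), (255,  90,  90)),
    "light_brown": ((150,  90,  50), (230, 170, 130)),
    "dark_brown":  (( 70,  30,  10), (150,  90,  60)),
    "blue_gray":   (( 50,  60,  80), (130, 140, 170)),
    "black":       ((  0,   0,   0), ( 60,  60,  60)),
}

def color_score(rgb, mask, presence_fraction=0.05):
    """Number of the six reference colors present in the lesion (C, 1..6)."""
    lesion_pixels = rgb[mask]          # N x 3 array of lesion RGB values
    n = len(lesion_pixels)
    score = 0
    for lo, hi in COLOR_RANGES.values():
        inside = np.all((lesion_pixels >= lo) & (lesion_pixels <= hi), axis=1)
        # A color counts as present if it covers more than 5% of the lesion.
        if inside.sum() > presence_fraction * n:
            score += 1
    return score
```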

Fig. 5 Color feature for two sample images

2.4 ANN classifier

After feature extraction, the next step is to train a classifier model on the features extracted from the 200 images. In this paper, a feedforward artificial neural network (ANN) trained with the backpropagation algorithm is employed for classification, with target values supplied during training: 0 denotes benignity and 1 denotes malignancy. The structure of the network is shown in Fig. 6; it consists of one input layer with eight neurons for the eight inputs (A1, A2, B1, B2, B3, C, D1, D2), one hidden layer with 100 neurons, and one output layer. Weights and biases are initialized with small random numbers, the network outputs are computed from these weights, biases, and inputs, and the difference between the actual and desired outputs, the error signal, is minimized by continuously updating the weights and biases. We used the scaled conjugate gradient backpropagation algorithm as the training function, with a gradient descent weight and bias learning function to optimize the weights and biases, and cross-entropy as the performance function.
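As an illustration, a comparable network can be set up with scikit-learn as sketched below. scikit-learn offers no scaled conjugate gradient solver, so "adam" is used here as a stand-in; MLPClassifier does minimize cross-entropy (log-loss), matching the performance function above. The feature matrix is synthetic placeholder data, to be replaced by the 200 × 8 matrix of extracted features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: replace with the extracted features and their 0/1 labels.
rng = np.random.default_rng(0)
X = rng.random((200, 8))                      # [A1, A2, B1, B2, B3, C, D1, D2]
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)     # 0 = benign, 1 = malignant

clf = make_pipeline(
    StandardScaler(),                 # features span very different ranges (e.g. B3 vs. B2)
    MLPClassifier(
        hidden_layer_sizes=(100,),    # one hidden layer with 100 neurons
        activation="logistic",
        solver="adam",                # stand-in for scaled conjugate gradient
        max_iter=2000,
        random_state=0,
    ),
)
clf.fit(X, y)                         # training minimizes cross-entropy (log-loss)
probs = clf.predict_proba(X)[:, 1]    # malignancy probability per lesion
```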

Fig. 6 Neural network structure

The Receiver Operating Characteristic (ROC) curve of the ANN classifier obtained in this experiment is shown in Fig. 7. The diagonal line divides the ROC space; since the points on the curve lie well above the diagonal, the classifier performs well.
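Continuing the classifier sketch above (reusing its y and probs), the ROC curve and its area can be obtained from the predicted malignancy probabilities:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

fpr, tpr, _ = roc_curve(y, probs)     # y, probs from the classifier sketch above
plt.plot(fpr, tpr, label=f"ANN (AUC = {roc_auc_score(y, probs):.2f})")
plt.plot([0, 1], [0, 1], "--", label="chance")   # the diagonal of the ROC space
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```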

Fig. 7 ROC curve of ANN classifier

3 Results and discussion

The diagnostic system in this work is implemented in MATLAB 2016a. The implementation and results of the system are discussed in this section.

3.1 Testing with new images

The proposed diagnostic system has also been tested on 22 new images (12 benign and 10 malignant) from other sources. The training images are from the PH2 database, which contains RGB color images with a resolution of 768 × 560 pixels, so the test images were resized accordingly. The system diagnoses all the test images accurately. Implementation results and feature values for two sample test images are shown in Figs. 8 and 9, respectively. Although some feature values do not lie in the predefined benign/malignant threshold ranges (A2 in Fig. 8 and A1, B2, C in Fig. 9), the ANN classifies these images correctly.

Fig. 8 Implementation result: a a sample benign image [34], b segmented image, c extracted feature values, d detected output

Fig. 9 Implementation result: a a sample malignant image [34], b segmented image, c extracted feature values, d detected output

3.2 Feature analysis

The considered features are extracted from the 200 dermoscopy images in real time. Feature values for ten sample images (five benign and five malignant) randomly selected from the PH2 dataset are presented in Table 2. The threshold range of each feature indicating melanoma is calculated from the considered dataset; if a feature value of a particular image lies in that range, the image is flagged as melanoma by that feature. Since not all features of all images fall within their respective ranges, the final decision is made by the artificial neural network. The average value of each feature for benign and malignant images, calculated from the considered 80 benign and 120 malignant images, is given in Table 3.

Table 2 Extracted feature values from ten sample benign and malignant images

Tables 2 and 3 show that melanoma lesions have greater asymmetry scores along both axes (greater than 0.15 for most malignant cases) than benign lesions. Similarly, B1 and B3 take larger values for malignancy, since malignant lesions are larger; in this experiment, B1 and B3 were found to be ≥ 90 and ≥ 2 × 10⁸, respectively, for nearly all malignant images. B2 quantifies the regularity of the lesion border, so for malignant lesions this value deviates from 1 toward 0; it was below 0.75 for over 60% of the malignant images in this experiment. Both diameter scores (D1, D2) are larger for malignant lesions than for benign ones: for over 70% of the malignant images, D1 and D2 exceed 400 pixels and 90 pixels, respectively. The color score is three or more for most malignant images.

Table 3 Average feature values for benign and malignant images

The next step is to analyze each feature’s individual response for benign and malignant cases, which helps in understanding the impact of that particular feature on melanoma detection. Figures 10, 11, 12, 13, 14, 15, 16 and 17 show the responses of the eight considered features for benign and malignant images; each feature is plotted for 80 images (40 benign and 40 malignant).

Fig. 10 A1 score for 80 lesion images (40 benign and 40 malignant)

Fig. 11 A2 score for 80 lesion images (40 benign and 40 malignant)

Fig. 12 B1 score for 80 lesion images (40 benign and 40 malignant)

Fig. 13 B2 score for 80 lesion images (40 benign and 40 malignant)

Fig. 14 B3 score for 80 lesion images (40 benign and 40 malignant)

Fig. 15 C score for 80 lesion images (40 benign and 40 malignant)

Fig. 16 D1 score for 80 lesion images (40 benign and 40 malignant)

Fig. 17 D2 score for 80 lesion images (40 benign and 40 malignant)

As the per-image plots in Figs. 10, 11, 12, 13, 14, 15, 16 and 17 show, the less the benign and malignant points overlap, the more useful the feature is for melanoma investigation. B3 and D2 have fewer overlapping points for benign and malignant images than the other features. Almost all the features respond well for most of the images, although a portion of the feature values still overlap. This overlap makes it difficult to separate benign and malignant lesions from any single figure; it is overcome by the artificial neural network, which combines all eight features over many training iterations.

3.3 Overall system performance

The performance of the proposed diagnosis system is analyzed with three evaluation metrics: sensitivity, specificity, and accuracy, computed by Eqs. (9), (10), and (11), respectively.

$$Sensitivity = \frac{Tp}{Tp + Fn}$$
(9)
$$Specificity = \frac{Tn}{Tn + Fp}$$
(10)
$$Accuracy = \frac{Tp + Tn}{Tp + Fn + Tn + Fp}$$
(11)

Here, Tp (true positive) is the number of malignant lesions correctly classified as malignant, Fp (false positive) is the number of benign lesions incorrectly classified as malignant, Tn (true negative) is the number of benign lesions correctly classified as benign, and Fn (false negative) is the number of malignant lesions incorrectly classified as benign. The sensitivity, specificity, and accuracy of the training stage, testing stage, and overall system are shown with a confusion matrix in Table 4. With all the proposed features, the system classifies lesions with 98.2% accuracy and 98% sensitivity.
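These three metrics follow directly from the confusion-matrix counts, as sketched below; the counts shown are illustrative only, not the paper's actual confusion matrix.

```python
def performance(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy of Eqs. (9)-(11)."""
    sensitivity = tp / (tp + fn)                    # Eq. (9)
    specificity = tn / (tn + fp)                    # Eq. (10)
    accuracy = (tp + tn) / (tp + fn + tn + fp)      # Eq. (11)
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only:
print(performance(tp=38, fn=2, tn=155, fp=5))       # (0.95, 0.96875, 0.965)
```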

Table 4 Performance analysis of the proposed approach

System performance is also evaluated with the F-score, a combined measure of precision and recall defined as their weighted harmonic mean. The F-score is calculated by Eq. (12) and is 96% in this experiment, which also indicates a satisfactory level of performance.

$$F\text{-}score = \frac{2\,Tp}{2\,Tp + Fp + Fn}$$
(12)
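Given the same confusion-matrix counts, Eq. (12) is a one-liner; the example counts below are illustrative only.

```python
def f_score(tp, fp, fn):
    """Eq. (12): F-score, the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# e.g. f_score(tp=38, fp=5, fn=2) -> 0.9157 (illustrative counts, not the paper's)
```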

3.4 Individual feature accuracy

The percentage of correctly classified malignant images among all malignant images (sensitivity) and the percentage of correctly classified images among all images (accuracy) achieved by each individual feature were also determined with the neural network and are shown in Figs. 18 and 19, respectively. These figures show that every considered feature classifies lesion images with over 72% accuracy, and each feature also achieves a sensitivity of over 72%. Hence, the proposed features give promising results and could be employed clinically for melanoma diagnosis.

Fig. 18 Percentage of correctly classified malignant images (sensitivity)

Fig. 19 Percentage of correctly classified images among all images (accuracy)

4 Conclusion and future works

The incidence of melanoma skin cancer has been rising dramatically for the last few decades; automated diagnosis of melanoma from dermoscopy images, which avoids invasive biopsy, has therefore become a dynamic research field. In this paper, eight distinct features associated with shape, size, and color properties, based on the ABCD rule, are proposed to detect melanoma at an early stage. The implementation results indicate that the diagnostic system achieves satisfactory sensitivity and accuracy. We also applied the features extracted from the 200 considered images to SVM and KNN classifiers and achieved accuracies of 94% and 92%, respectively. The system could therefore assist dermatologists in clinical decision-making. In future work, a larger dataset will be used to further improve training and classification accuracy, convolutional neural networks will be considered for melanoma classification, and the system will be adapted to the requirements of clinical use.