1 Introduction

Skin cancer is the most common type of cancer seen in humans today and can occasionally be fatal [1]. Cancer arises when cells grow uncontrollably, divide rapidly in one area of the body, invade other bodily tissues, and spread throughout the body. Causes of skin cancer include harmful behaviours such as smoking and drinking alcohol, certain types of allergies, illness, and viral infections [2, 3]. Environmental changes (e.g., atmospheric change) can also contribute to this type of cancer. The sun’s ultraviolet (UV) radiation can damage the DNA found inside skin cells. Furthermore, skin cancer can also result from abnormal growths in the human body [4]. In the United States, approximately 5.4 million new cases of skin cancer are reported annually. Skin cancer can be broadly classified into two types: non-melanoma skin cancers, which originate from cells of the epidermis, and melanoma skin cancers, which arise from malfunctioning melanocytes [5].

Melanoma [6] is one of the most aggressive malignancies and the third most prevalent type of skin cancer. Malignant melanoma, another name for melanoma, is a condition in which the skin’s pigment-producing cells malfunction and cause colour changes. The condition is caused by the build-up of melanin granules and their dissemination to the skin’s outermost layer [7]. Although melanoma has a high death rate, it is frequently curable when detected early. However, in the early stages of growth, dermatologists find it difficult to differentiate melanoma from benign moles. Earlier detection of skin cancer therefore enables less intrusive and more successful treatment alternatives, and early detection and treatment of cancer often increase the chances of a successful outcome [8]. Building a non-invasive computer-assisted diagnosis (CAD) system for skin lesion classification is therefore imperative and effective [9]. Furthermore, artificial intelligence (AI), especially deep learning (DL) and machine learning (ML), plays an important role in the earlier detection of skin cancer [10].

In recent times, artificial neural networks (ANNs), an innovative approach within the realm of AI, have gained significant popularity in various domains, including computer vision, digital image processing, and image classification [11]. ANNs consist of multiple layers of interconnected neurons, known as perceptrons [12]. This technology has attracted increasing interest due to its ability to mimic cognitive processes and its remarkable performance in tasks related to visual perception and pattern recognition [13]. Convolutional neural networks (CNNs) represent an advance over ANNs and have shown notable effectiveness in addressing diverse and intricate challenges [14], including image processing, classification [15], and object detection [16]. In dermatology, the efficacy of this approach has been validated through assessments involving 21 board-certified dermatologists using biopsy-proven clinical images. Owing to their commendable performance, CNNs find extensive application in various domains of medical imaging, including the classification of lesions, the fusion of magnetic resonance imaging (MRI) images, the diagnosis of breast cancer and tumours, and comprehensive panoptic analysis [17]. In addition, a better understanding of the value of early detection in enhancing patient outcomes is driving researchers to investigate new technologies and approaches, advancing the fields of oncology and dermatology as a whole. The joint endeavours of multidisciplinary groups of dermatologists, computer scientists, and medical experts highlight the deliberate dedication to improving diagnostic capacities for skin cancer, which will eventually help patients and healthcare systems globally [18]. This paper introduces a novel approach to skin cancer diagnosis, with a focus on early detection. Early detection holds immense importance, as it facilitates timely intervention and offers opportunities for less invasive and more effective treatment modalities. Identifying and addressing cancer in its initial stages frequently improves the chances of positive treatment outcomes. The primary contributions of this paper can be summarized as follows:

  • Proposing a novel, highly accurate deep convolutional neural network (DCNN) model designed for more precise skin cancer classification.

  • The proposed DCNN model demonstrated superior performance in terms of accuracy, recall, precision, F1-score, specificity, and AUC when compared to five different transfer learning models.

  • The proposed DCNN model was also compared with other previous studies, and our model exhibited enhanced performance when evaluated with two distinct datasets.

  • The proposed DCNN effectively addresses the issue of class imbalance, presenting a promising solution to address this challenge. In addition, its potential extends to further assessment with diverse datasets, suggesting its applicability as an early detection and diagnostic method for various diseases beyond skin cancer.

The rest of this paper is organized as follows: Sect. 2 provides a comprehensive overview of the relevant literature in the field. Section 3 elucidates the materials and methods used in this study. Section 4 presents the proposed DCNN model. Section 5 discusses in detail the experimental results of the proposed method. Lastly, Sect. 6 provides the conclusion and describes avenues for future studies.

2 Related work

This section provides a summary of previous work on skin cancer diagnosis. In recent years, numerous researchers [19, 20] have conducted extensive studies aimed at utilizing CNNs for the classification and diagnosis of skin cancer. Jasil et al. [21] introduced an innovative approach to skin cancer classification, combining DenseNet and residual network architectures. This approach was meticulously evaluated against several prominent transfer learning models, such as VGG16, VGG19, and InceptionV3. Impressively, their proposed method outperformed these models, achieving a remarkable accuracy of 95%. A sophisticated healthcare system, proposed in [22], employs entropy-based weighting and first-order cumulative moment (EW-FCM) for segmentation, coupled with wide-ShuffleNet for the classification of skin cancer. The system’s performance was evaluated using two distinct datasets, HAM10000 and ISIC-2019, demonstrating superior effectiveness compared to other transfer learning techniques.

Tlaisun et al. [23] introduced an approach for melanoma detection and classification, employing a ResNet model optimized with the Whale Optimization Algorithm (WOA). They achieved an accuracy of 92% using the HAM10000 dataset. In [24], a DL approach was introduced for the classification of skin cancer. The performance of the proposed model was evaluated on the ISIC-2019 and ISIC-2020 datasets; the DCNN model was compared to various transfer learning models and achieved the highest AUC score of 96.81%. In 2020, a system was introduced in [25] to detect skin cancer and benign tumours through the application of CNN technology. The CNN procedure, utilizing random regulators, achieved an accuracy of 97.49%, effectively discerning diverse skin lesions such as melanoma, carcinoma, and nevus. Augmented data from the ISIC dataset was incorporated to differentiate between malignant and benign skin cancer cases. The resulting model consists of three hidden layers and an output channel, employing various optimizers such as RMSprop, Adam, SGD, and Nadam. Significantly, the CNN model using the Adam optimizer exhibited the highest classification accuracy at 99%. These performance outcomes underscore the potential utility of the proposed model as a valuable tool for medical practitioners in aiding the diagnosis of skin cancer. Keerthana et al. [26] presented an integrated approach involving hybrid CNN models and ML for cancer classification, in which two transfer learning models, DenseNet-201 and MobileNet, are employed for feature extraction. The extracted features are then combined and passed to a Support Vector Machine (SVM) for the classification task. Their methodology was evaluated on the ISBI 2016 dataset, divided into 85% for training and 15% for testing, and achieved the highest accuracy of 88.02%. A computer-aided diagnosis (CAD) system for cancer detection is presented in [27]. The system employs a two-step process: first, the application of a median filter and image segmentation using a CNN optimized through Satin Bowerbird Optimization (SBO) to enhance system performance; second, classification using an SVM classifier. Comparative analysis with previous studies demonstrates superior performance, achieving an accuracy of 95%.

In the same context, the researchers in [28] presented a method for skin lesion classification utilizing ResNet50. The proposed approach was benchmarked against various transfer learning models, such as InceptionV3, Xception, and MobileNet. ResNet50 exhibited the highest accuracy at 90%, surpassing the other models. Ali et al. [29] introduced an advanced DCNN model for the classification of benign and malignant skin cancer. The DCNN was compared with other DL models, specifically AlexNet, ResNet, VGG-16, DenseNet, and MobileNet. The model’s performance was evaluated on the HAM10000 dataset, with the data split into two configurations: first, 70% for training, 20% for validation, and 10% for testing; and second, an 80% training, 10% validation, and 10% testing split. In the second split configuration, the proposed model achieved the highest accuracy at 91.43%, outperforming the other transfer learning models. Popescu et al. [30] proposed a system leveraging DL principles and collective intelligence. They utilized diverse CNN-based models on the HAM10000 dataset, designed to distinguish between various types of skin lesions, including melanoma. The emphasis of their analysis was on preserving a weight matrix across the different CNN models, with the elements aligned to the neural network lesion classes. Their system exhibited an accuracy improvement of around 3%. Le et al. [31] presented an ensemble model utilizing ResNet50, which incorporated focal loss and class weight techniques to handle imbalances within the dataset. Their model achieved an accuracy of 93.00% when evaluated on the HAM10000 dataset. In a study of melanoma skin cancer detection [32], two methods, SVM and CNN, were proposed. The SVM method achieved an accuracy of 86.6%, while the CNN model achieved an accuracy of 91%. Albahar [33] introduced a skin lesion classification method using a CNN with a novel regularizer based on the standard deviation of the classifier’s weight matrix. This regularizer is embedded in the convolution layer and targets the filter values corresponding to the weight matrix; by constraining these values, the complexity of the classifier is diminished. Their method was tested on the ISIC 2018 dataset, achieving an accuracy of 97.49%. It should be noted that the performance of their model is intricately linked to the selection of an appropriate regularization parameter. Zhao et al. [34] utilized StyleGAN and DenseNet201 to classify dermoscopy images, achieving a classification accuracy of 93.64% on the balanced ISIC-2019 dataset. A summary of the empirical literature reviewed on skin cancer classification is shown in Table 1.

Table 1 Summary of empirical literature review

3 Preliminaries

In this section, we present a detailed account of the materials and methods of our proposed skin cancer classification approach. The dataset and preprocessing steps used in this study are also presented, outlining the techniques used to improve the quality and suitability of the data for DL analysis.

3.1 Dataset description

In this study, we assess the proposed architecture using two imbalanced dermoscopy image datasets, described in this subsection. Both the HAM10000 and ISIC-2019 datasets represent real-world scenarios in dermatology and dermatoscopic imaging by capturing the clinical diversity, image variability, diagnostic challenges, and ethical considerations encountered in actual clinical practice. By leveraging these datasets for model evaluation, researchers can assess the performance and generalizability of ML algorithms in addressing real-world dermatological challenges and supporting clinical decision-making.

3.1.1 HAM10000 dataset

This dataset comprises various skin cancer images, and the use of DL approaches requires a substantial volume of data to produce satisfactory results. Acquiring a diverse collection of skin cancer images poses a significant challenge, and the scarcity of training data is a prominent concern when applying DL algorithms. To address these issues, we utilized the HAM10000 dataset (Human Against Machine with 10,000 training images), which consists of 10,015 dermoscopy images [35]. The HAM10000 dataset consists of images depicting skin lesions, categorized into seven classes based on their characteristics: Melanocytic nevi (nv), Melanoma (mel), Benign keratosis-like lesions (bkl), Basal cell carcinoma (bcc), Actinic keratoses (akiec), Vascular lesions (vasc), and Dermatofibroma (df). The HAM10000 dataset is widely used in dermatology research for skin lesion classification tasks. It represents a diverse range of skin conditions encountered in clinical practice, making it a suitable choice for evaluating dermatoscopic image classification models. In Fig. 1, samples from each class are displayed. The HAM10000 dataset is characterized by an imbalance problem, in which certain classes have significantly fewer instances than others. This problem is visually demonstrated in Fig. 2, which provides a clear representation of the unequal distribution of data across the classes.

Fig. 1 Five samples of each class of the HAM10000 dataset

Fig. 2 Description of each class in the HAM10000 dataset

3.1.2 ISIC-2019 dataset

The ISIC-2019 dataset [36] is a collection of dermoscopic images from the International Skin Imaging Collaboration (ISIC), used for the task of classifying skin lesions into nine different diagnostic categories. The goal is to train models to automatically categorize dermoscopic images based on their visual characteristics. The dataset comprises 25,331 images, divided into eight different classes for training. Each image in the dataset is associated with a specific diagnostic category that indicates the type of skin lesion present. The ISIC-2019 dataset is one of the largest publicly available datasets for skin lesion classification. It encompasses a wide range of skin conditions and provides a comprehensive evaluation platform for dermatoscopic image analysis algorithms. Figure 3 illustrates samples from the eight distinct classes of the ISIC-2019 dataset, and Fig. 4 highlights the imbalance issue present in the dataset.

Fig. 3 Five samples from each of the eight classes of the ISIC-2019 dataset

Fig. 4 Description of each class of the ISIC-2019 dataset

3.2 Preprocessing

Medical images often face challenges such as low contrast, blur, noise, artifacts, and diminished colors, underscoring the pivotal role of preprocessing [37]. Serving as a foundational step for achieving highly accurate results, our model incorporates two crucial preprocessing techniques: normalization and image reshaping.

Normalization ensures uniformity in pixel values, preventing certain features from overpowering others. This step is imperative for enhancing the stability of the model and promoting effective convergence during the learning process [38]. By normalizing pixel values, the model becomes less sensitive to variations in image intensity, leading to more reliable and consistent performance across diverse medical images.

Furthermore, all images in the dataset are reshaped to a standardized size of \(64\times 64\) pixels. This resizing not only maintains consistency in image dimensions but also optimizes computational efficiency and memory utilization. Standardizing image sizes is essential for seamless feature extraction, allowing the model to capture relevant information and patterns throughout the dataset [39].
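A minimal Python sketch of these two preprocessing steps is shown below. Scaling pixel values to \([0, 1]\) by dividing by 255 is an assumption, as normalization can be implemented in several ways, and the use of OpenCV here is merely illustrative.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Resize a dermoscopy image to 64x64 and normalize its pixel values."""
    resized = cv2.resize(image, (64, 64), interpolation=cv2.INTER_AREA)
    # Min-max scaling to [0, 1] (assumed normalization scheme)
    return resized.astype("float32") / 255.0
```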

These preprocessing steps collectively play a critical role in improving the robustness and performance of our model. They address the challenges inherent in medical image analysis, ensuring that the model is well-equipped to handle various and complex imaging conditions commonly encountered in the medical field.

3.3 Oversampling

Training classifiers on unbalanced or skewed datasets is a crucial and frequently encountered challenge in classification problems. In such scenarios, the vast majority of samples are labelled as one class, while significantly fewer samples are labelled as the other class, which often represents the most critical class. Traditional classifiers, designed to achieve accurate performance across the entire spectrum of samples, are not suitable for unbalanced learning tasks because of their tendency to classify all data into the majority class, which, in these cases, is typically the less important class [40]. One common method of solving imbalanced problems is oversampling: raising the number of cases in the minority class through the creation of artificial samples or the replication of preexisting ones. This lessens the effects of class disparity and improves the ability of DL models to detect patterns of the minority class [41]. The most common techniques to address the imbalance issue are random oversampling, the synthetic minority oversampling technique (SMOTE), SMOTE combined with edited nearest neighbours (SMOTEENN), and adaptive synthetic sampling (ADASYN) [42]. We apply random oversampling, a widely adopted technique, to effectively address the class imbalance present in both the HAM10000 and ISIC-2019 datasets, as discussed later. This technique ensures that the proposed DCNN model receives a more balanced representation of each class during training, promoting fair learning and enhancing the model’s ability to generalize across diverse classes. By implementing random oversampling, we aim to improve the robustness and overall performance of our models in dermatoscopic image classification tasks.
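A minimal sketch of this step using the imbalanced-learn library is given below; the variable names (x_train, y_train) are illustrative, and resampling is assumed to be applied to the training split only, leaving the test set untouched.

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler

# RandomOverSampler expects 2D input, so each image is flattened into one row
n, h, w, c = x_train.shape                       # e.g., (N, 64, 64, 3)
ros = RandomOverSampler(random_state=42)
x_flat, y_balanced = ros.fit_resample(x_train.reshape(n, -1), y_train)
x_balanced = x_flat.reshape(-1, h, w, c)         # restore image dimensions

# Every class now has as many samples as the original majority class
print(np.bincount(y_balanced))
```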

3.4 Convolution neural network (CNN)

Convolutional neural networks (CNNs) stand out as one of the most remarkable manifestations of ANN architectures. ANNs comprise an extensive network of strategically distributed, interconnected computational nodes that assimilate knowledge from input data and enhance the final output [43]. Within CNNs, neurons are integral components that undergo a learning process to optimize their functionality. This characteristic aligns CNNs closely with traditional ANNs, as both aim to iteratively improve their performance through learning. Specifically designed for tasks involving structured grid-like data, such as images, CNNs excel at capturing intricate patterns and hierarchies within the input, making them particularly effective in image recognition and computer vision applications [44].

Comprising distinct layers, CNNs encompass convolutional layers responsible for feature extraction, pooling layers dedicated to dimensionality reduction, and fully connected layers designed to synthesize high-level abstractions. This layered structure serves as the foundational framework for the CNN architecture, allowing the network to acquire hierarchical representations systematically from the input images. The convolutional layers employ filters to discern spatial patterns, pooling layers streamline information, and the fully connected layers amalgamate these features to facilitate robust decision making. The ability of CNNs to capture hierarchical patterns and features in data has led to their widespread use in various applications, including image and video analysis, natural language processing, and, in particular, medical image analysis [45]. In the context of medical imaging, CNNs have demonstrated remarkable performance in tasks such as disease diagnosis, lesion detection, and segmentation. Their capability to learn intricate representations from data makes CNNs a valuable tool for extracting meaningful information from complex medical images [46]. Figure 5 shows an example of CNN [47].
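As a toy illustration of how a convolutional filter discerns spatial patterns, the sketch below applies a 3x3 vertical-edge filter to a small synthetic image; the filter and image are invented purely for demonstration.

```python
import numpy as np
from scipy.signal import convolve2d

# A 5x5 toy "image" with a vertical edge between the second and third columns
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Sobel-style vertical-edge filter
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel, mode="valid")
print(feature_map)  # large-magnitude responses near the edge, zeros in flat regions
```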

Fig. 5 Convolutional neural network architecture

4 Proposed deep convolution neural network (DCNN)

The main objective of this study is to develop a novel approach for classifying skin cancer. Figure 6 provides an overview of the model’s workflow. Initially, we selected two public datasets, HAM10000 and ISIC-2019. The input images were resized to \(64\times 64\) pixels and normalized to reduce computational complexity, expedite training, enhance robustness to translation and scaling, and improve convergence through standardized data. Upon scrutinizing the dataset characteristics, we identified an imbalance issue, which we addressed through an oversampling step using the random oversampling technique. This intervention became imperative because the initial accuracy of various models, including transfer learning models and those developed from scratch, fell significantly short without it. Incorporating oversampling substantially elevated the effectiveness and performance of the DCNN model. The two datasets were divided into \(80\%\) for training the model and \(20\%\) for testing. The architecture of the proposed DCNN model is illustrated in Fig. 7. The DCNN model comprises eight convolutional layers arranged in four pairs, with each pair followed by ReLU activation functions and batch normalization. The first pair consists of two convolutional layers with 32 filters each and a kernel size of \(3\times 3\), and the number of filters doubles with each subsequent pair, allowing the model to learn more complex features deeper in the network. Batch normalization helps stabilize and speed up the training process by normalizing the input to each layer. After all convolutional layers, a MaxPooling2D layer is added to reduce the spatial dimensions of the feature maps, leading to a more compact representation and faster computation. This is followed by a flattening layer that converts the 3D feature maps into a 1D vector, preparing the data for input into the dense layers. A dropout layer with a dropout rate of 0.5 is added to reduce overfitting by randomly setting a fraction of input units to zero during training. After that, three dense layers with 256, 128, and 64 units, respectively, are added, each followed by a ReLU activation function and batch normalization. Finally, the output layer is added with a softmax activation function, producing class probabilities. A code sketch of this architecture is given below.
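The Keras sketch below is consistent with this description. The placement of a single pooling layer after all convolutional blocks, the ordering of activations and batch normalization within each pair, and the use of "same" padding reflect our reading of the description rather than a verified implementation; num_classes would be 7 for HAM10000 and 8 for ISIC-2019.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dcnn(input_shape=(64, 64, 3), num_classes=7):
    model = models.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    # Four pairs of 3x3 convolutions (32, 64, 128, 256 filters),
    # each pair followed by batch normalization
    for filters in (32, 64, 128, 256):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))   # reduce spatial dimensions
    model.add(layers.Flatten())              # 3D feature maps -> 1D vector
    model.add(layers.Dropout(0.5))           # mitigate overfitting
    # Three dense blocks with ReLU and batch normalization
    for units in (256, 128, 64):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.BatchNormalization())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

Notably, with \(64\times 64\) inputs and a single \(2\times 2\) pooling step, the flattened feature map feeding the first dense layer has \(32\times 32\times 256\) elements, which is roughly consistent with the parameter count reported later in Table 6.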

Fig. 6 Flowchart of the proposed model

Fig. 7 Architecture of the proposed DCNN model on HAM10000

After conducting numerous experiments, the final choice for the hyperparameters of the CNN model was determined. The model is compiled using the Adamax optimizer with a learning rate of 0.001 and categorical cross-entropy loss function, which is suitable for multi-class classification tasks. This selection of optimizer and loss function is intended to efficiently update the model parameters during training while minimizing the difference between predicted and actual class distributions. The model is then trained on the training dataset for 25 epochs with a batch size of 128. Throughout the training process, the model’s performance is assessed on the validation dataset to monitor its generalization ability and prevent overfitting. Additionally, a learning rate reduction callback is employed to dynamically adjust the learning rate during training, potentially enhancing convergence and model performance. Overall, this compilation and training process aims to optimize the model’s parameters and enhance its accuracy in classifying skin cancer images.
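A corresponding compile-and-fit sketch is shown below. The specific ReduceLROnPlateau settings (monitored metric, factor, patience) are assumptions, since only the use of a learning rate reduction callback is stated, and the data variables are placeholders for the preprocessed, oversampled splits.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.optimizers import Adamax

model = build_dcnn(num_classes=7)
model.compile(optimizer=Adamax(learning_rate=0.001),
              loss="categorical_crossentropy",   # multi-class, one-hot labels
              metrics=["accuracy"])

# Dynamically lower the learning rate when validation accuracy plateaus
lr_reduction = ReduceLROnPlateau(monitor="val_accuracy",
                                 factor=0.5, patience=3, min_lr=1e-5)

history = model.fit(x_balanced, y_balanced_onehot,
                    epochs=25, batch_size=128,
                    validation_data=(x_val, y_val_onehot),
                    callbacks=[lr_reduction])
```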

Figure 8 provides a visual representation of the inner workings of our DCNN as it processes data from the skin cancer dataset. In this illustration, each of the 32 feature maps corresponds to a unique set of extracted features from a single randomly selected image. These features are crucial for the network to discern patterns and characteristics indicative of skin cancer during the initial convolutional layer. By displaying these feature maps, we gain insight into how the DCNN detects and highlights relevant information within the input images, aiding in the accurate identification of potential cancerous lesions. Grad-CAM utilizes gradients from the final convolutional layer to generate a rough localization map, emphasizing crucial regions within an image for predicting a target concept. This technique allows us to visually confirm the areas where our network focuses its attention, ensuring it identifies pertinent patterns in the image and activates accordingly around them. Figure 9 shows the Grad-CAM on a random sample of the test set.
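The sketch below shows a standard Grad-CAM computation in TensorFlow; last_conv_layer_name is a placeholder for the name of the final convolutional layer of the trained model.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a coarse localization map with values in [0, 1] for one image."""
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))    # predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)       # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))    # global-average-pooled gradients
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy() # normalize to [0, 1]
```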

Fig. 8 Illustration of 32 feature maps for a random sample input image of the skin cancer dataset

Fig. 9 Grad-CAM on a random sample test image

5 Experimental results and discussions

This section presents a thorough explanation and in-depth analysis of the results, supporting the proposed DCNN model’s performance and resilience in the classification of dermoscopy images associated with skin cancer. The results of the proposed DCNN model on the two datasets are reported systematically.

5.1 Performance metrics

The performance of the proposed DCNN model was evaluated using six evaluation metrics: accuracy, recall, precision, F1-score, specificity, and AUC. These metrics provide a nuanced understanding of the DCNN’s performance across different facets, ensuring a comprehensive evaluation of its effectiveness in skin cancer classification.

Accuracy: the ratio of correctly predicted instances to the total number of instances, accuracy provides an overall measure of the model’s correctness. Accuracy [48] is calculated using Eq. (1).

$$\begin{aligned} Accuracy=\frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(1)

Recall: also known as sensitivity or the true positive rate, recall gauges the model’s ability to correctly identify positive instances among the actual positives [49]. It is calculated using Eq. (2).

$$\begin{aligned} Recall=\frac{TP }{TP + FN} \end{aligned}$$
(2)

Precision: the ratio of correctly predicted positive observations to the total predicted positives, precision assesses the reliability of the model’s positive predictions [50]. It is defined by Eq. (3).

$$\begin{aligned} Precision=\frac{TP }{TP + FP} \end{aligned}$$
(3)

F1-score: the harmonic mean of precision and recall, the F1-score offers a balanced measure that accounts for both false positives and false negatives [51]. It is derived using Eq. (4).

$$\begin{aligned} F1-score = 2 \times \frac{\text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$
(4)

Specificity: the true negative rate (TNR), specificity is the counterpart of recall for the negative class and is calculated as the proportion of correctly classified negative samples relative to the total number of negative samples [52]. It is derived using Eq. (5). For multiclass classification, the specificity is calculated for each class separately and the average of these specificities is then computed.

$$\begin{aligned} TNR=\frac{TN }{TN + FP} \end{aligned}$$
(5)

Area under the ROC curve (AUC): the AUC metric, commonly used in binary classification tasks, can also be extended to multiclass classification. In multiclass classification, the AUC evaluates the model’s ability to rank instances of different classes, typically by employing either the one-vs-all (OvA) or one-vs-one (OvO) strategy to generate multiple binary classification tasks. It is derived using Eq. (6) [53].

$$\begin{aligned} AUC=\frac{\sum TP + \sum TN}{P + N}, \end{aligned}$$
(6)

where TP is the number of instances correctly predicted as positive, TN is the number of instances correctly predicted as negative, FP is the number of instances incorrectly predicted as positive, and FN is the number of instances incorrectly predicted as negative. P and N denote the total numbers of positive and negative instances, respectively.
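Under these definitions, the six metrics can be computed for a multiclass problem as sketched below; macro averaging and the OvA strategy for AUC are assumptions, as the exact averaging scheme is not stated.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_prob):
    """y_true/y_pred: integer labels; y_prob: softmax outputs of shape (N, C)."""
    cm = confusion_matrix(y_true, y_pred)
    # Per-class specificity TN / (TN + FP), then averaged over classes (Eq. 5)
    spec = []
    for k in range(cm.shape[0]):
        fp = cm[:, k].sum() - cm[k, k]
        tn = cm.sum() - cm[k, :].sum() - fp
        spec.append(tn / (tn + fp))
    return {
        "accuracy":    accuracy_score(y_true, y_pred),
        "recall":      recall_score(y_true, y_pred, average="macro"),
        "precision":   precision_score(y_true, y_pred, average="macro"),
        "f1":          f1_score(y_true, y_pred, average="macro"),
        "specificity": float(np.mean(spec)),
        "auc":         roc_auc_score(y_true, y_prob,
                                     multi_class="ovr", average="macro"),
    }
```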

5.2 Analysis of DCNN model for HAM10000 dataset

This subsection presents the results of the proposed DCNN model on the HAM10000 dataset, together with a comparison against other transfer learning models and previous related studies.

In medical diagnosis tasks, rare diseases might have significantly fewer examples than more common ones. This class imbalance can lead to models being biased towards the majority class, resulting in poor performance, particularly in terms of recall and precision for the minority class. To demonstrate the effectiveness of our model in addressing this problem, Table 2 presents the results of the proposed model and the other DL models without the application of the random oversampling method. It reveals that all models achieved low performance in terms of accuracy, recall, precision, and F1-score. In particular, the proposed DCNN achieved the lowest performance, with values of \(66.50\%\), \(66.50\%\), \(69.58\%\), and \(63.33\%\) for accuracy, recall, precision, and F1-score, respectively.

Table 2 The performance on the imbalanced dataset

We tackled the imbalance problem by applying four different resampling techniques: random oversampling, SMOTE, SMOTEENN, and adaptive synthetic (ADASYN) sampling. Table 3 illustrates that random oversampling improved the performance of our model significantly more than the other resampling methods.

Table 3 The performance comparison between the proposed DCNN model with different resampling methods

The DCNN model exhibited exceptional performance during training, achieving an impressive training accuracy of \(99.98\%\) with a corresponding training loss of 0.0029. During the testing phase, the model maintained a high accuracy of \(98.50\%\), with a test loss of 0.067. The performance curve shown in Fig. 10 visually depicts the trends in accuracy and loss for the training and testing data, illustrating the model’s robustness. Additionally, Fig. 11 presents the DCNN model’s performance on the HAM10000 dataset using a confusion matrix, providing insights into its classification capabilities. In a comparative evaluation against five established DL models (VGG16, VGG19, DenseNet121, DenseNet201, and MobileNetV2), the proposed DCNN model surpassed them in terms of recall, precision, F1-score, average specificity, and AUC, achieving scores of \(98.51\%\), \(98.56\%\), \(98.48\%\), \(99.73\%\), and \(99.92\%\), respectively. These results underscore the superior performance of our proposed model compared to the selected transfer learning models, as detailed in Table 4.

Fig. 10 Performance curve based on accuracy and loss per epoch of our proposed DCNN model with HAM10000 dataset

Fig. 11 Confusion matrix of the proposed DCNN model with HAM10000 dataset

Table 4 The performance comparison between the proposed DCNN model and transfer learning models with HAM10000 dataset

Furthermore, the performance of the proposed DCNN model was compared, in terms of accuracy, with other recently published studies on the diagnosis and detection of skin cancer using the HAM10000 dataset (Table 5). In [54], a hybrid CNN technique was introduced for skin cancer classification using DenseNet and the residual network. The model was compared with VGG16, VGG19, and InceptionV3; it outperformed these transfer learning models and achieved accuracy, recall, precision, and F1-score of \(95.0\%\), \(92.0\%\), \(97.0\%\), and \(94.0\%\), respectively. In [55], the authors proposed a DL method for cancer classification that achieved an accuracy of \(82\%\). In [56], a VGGNet-16 model was presented to classify skin lesions, achieving an accuracy of \(85.62\%\). The study in [57] proposed a modified MobileNet architecture to classify skin lesions; the approach achieved higher performance than the original MobileNet, with accuracy, sensitivity, specificity, and F1-score of \(83.23\%\), \(87\%\), \(85\%\), and \(82\%\), respectively. In the study conducted by Garg et al. [58], a multiclass skin cancer classification method using ResNet was introduced, with a reported accuracy of \(90.51\%\). In the study by Ibrahim et al. [59], an improved technique was proposed to fine-tune the EfficientNetB3 model, resulting in a high accuracy of \(91.6\%\). In [60], the EfficientNetB4 model was proposed to classify skin cancer, achieving an accuracy of \(87.90\%\). In the work presented by Nugroho et al. [61], a skin lesion classification approach using CNNs was introduced, achieving an accuracy of \(78\%\). According to the results presented in Table 5, the proposed DCNN model demonstrates superior classification performance, outperforming all comparison methods across various evaluation metrics.

Table 5 The performance comparison between the proposed DCNN model and previous studies with HAM10000 dataset

Table 6 shows the computational performance of the proposed model compared to other DL models in terms of the total number of parameters, inference time, and floating-point operations (FLOPs). The Parameters metric represents the total number of weights and biases in the network; it is a crucial factor because it directly affects the memory required to store the model. Inference time is the time the network takes to process a single input and produce an output, and FLOPs are the number of floating-point operations required for inference, providing an estimate of the network’s computational workload. While VGG16 and VGG19 are moderately sized models with comparable inference times, the DenseNet models have fewer parameters but longer inference times (8.4 and 5.2 units). MobileNetV2 stands out as lightweight, with fast inference times (3.2 units) and low FLOPs (0.05 billion). In contrast, the proposed DCNN has a much larger number of parameters (68,326,247) than the other transfer learning models, takes a considerable amount of time for inference (7.4 units), and performs the highest number of FLOPs (9.7 billion). This indicates that the proposed DCNN is a deeper and more complex model than the compared models, resulting in higher computational requirements. However, in skin cancer diagnosis, the foremost concern remains accuracy rather than computational efficiency; thus, while computational intensity is a consideration, our focus centers on achieving the highest levels of diagnostic accuracy.

Table 6 The computational performance of the proposed model
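As an illustration of how the parameter count and inference time in Table 6 might be obtained for a Keras model (the exact measurement protocol is not stated, and FLOPs would additionally require a profiler such as TensorFlow's), a simple sketch follows; build_dcnn refers to the hypothetical builder shown earlier.

```python
import time
import numpy as np

model = build_dcnn(num_classes=7)            # hypothetical builder from above
print("Parameters:", model.count_params())   # total weights and biases

x = np.random.rand(1, 64, 64, 3).astype("float32")
model(x, training=False)                      # warm-up call
start = time.perf_counter()
for _ in range(100):
    model(x, training=False)                  # single-image inference
print("Mean inference time (ms):",
      (time.perf_counter() - start) / 100 * 1000)
```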

5.3 Analysis of DCNN model for ISIC-2019 dataset

The results of the proposed DCNN model on the ISIC-2019 dataset are detailed in this subsection, along with a comparative analysis against other relevant studies. The proposed DCNN model demonstrated outstanding performance on the ISIC-2019 dataset, achieving an impressive training accuracy of \(99.96\%\) and a training loss of 0.0034, while in testing it showed an accuracy of \(97.11\%\) and a test loss of 0.14. The performance curve, depicted in Fig. 10, provides a visual representation of the accuracy and loss trends for the training and testing data. Furthermore, Fig. 11 illustrates the DCNN model’s performance on the ISIC-2019 dataset using the confusion matrix. Table 7 presents the performance results of the proposed DCNN model compared to other transfer learning models on the ISIC-2019 dataset. The evaluation includes accuracy, sensitivity, specificity, precision, and F1-score metrics. Our DCNN model demonstrated superior performance, achieving the highest accuracy, recall, precision, F1-score, specificity, and AUC of \(97.11\%\), \(97.12\%\), \(97.09\%\), \(97.08\%\), \(99.61\%\), and \(99.82\%\), respectively.

Table 7 The performance comparison between the proposed DCNN model and transfer learning models with ISIC-2019 dataset

Furthermore, Table 8 compares the performance of the proposed DCNN model with other recently published studies on skin cancer diagnosis and detection using the ISIC-2019 dataset. In the study by [36], a skin lesion classification approach using a CNN was introduced. This approach achieved accuracy, sensitivity, specificity, and precision scores of \(94.92\%\), \(79.8\%\), \(97\%\), and \(80.36\%\), respectively. In [65], an image classification and segmentation technique using DL was proposed. The performance of the proposed technique was assessed on the ISIC 2018, 2019, and 2020 datasets, and the model was compared to GoogleNet, DenseNet-201, ResNet152V2, EfficientNetB0, an RBF support vector machine, logistic regression, and random forest. The proposed model outperformed the other methods, achieving accuracies of \(94.59\%\), \(91.89\%\), and \(90.54\%\) on the ISIC 2018, 2019, and 2020 datasets, respectively. In [66], an Inception-V3 model was presented for the classification of skin cancer, achieving an accuracy rate of \(96.40\%\). In [67], the authors introduced a CNN model utilizing three datasets, namely ISIC-2016, ISIC-2019, and PH2. The model’s performance on ISIC-2019 was reported as \(96.7\%\), \(95.1\%\), \(96.3\%\), and \(97.1\%\) in accuracy, precision, sensitivity, and specificity, respectively. In [68], a hybrid model combining VGG19 and SVM was introduced, achieving an accuracy of \(96\%\). Also, in [69], a combination of CNN and VGG16 was proposed, and the model was compared to ResNet50, InceptionV3, AlexNet, and VGG19. The proposed model outperformed the other transfer learning models, achieving an accuracy of \(96.91\%\). There are similar studies [70, 71] that effectively address imbalance problems to diagnose various diseases, but we do not include them in the comparison because they use different datasets.

Table 8 The performance comparison between the proposed DCNN model and previous studies with ISIC-2019 dataset

6 Conclusion and future work

Skin cancer is a serious health risk characterised by the unchecked growth of skin cells as a result of DNA damage, which is frequently caused by UV radiation from the sun or other artificial sources. Skin cancer is dangerous because it can spread to other parts of the body and have serious consequences. Therefore, early detection and intervention are crucial, as advanced stages of skin cancer can be more difficult to treat. In dermatological diagnostics, AI technologies, particularly CNNs, are making great progress. CNNs are excellent at handling and interpreting intricate visual input, making them suitable for tasks such as pattern detection and image classification, and they can be trained on large datasets of dermatoscopic images to identify benign and malignant lesions with exceptional accuracy. In this research, we proposed an effective DCNN approach for the classification of skin lesions. We used two large dermatoscopic image datasets, HAM10000 and ISIC-2019. Both datasets suffer from an imbalance problem, which we handled using the oversampling technique. The performance of the DCNN exceeded that of all comparable models, as assessed through various evaluation metrics. Compared to other state-of-the-art models, including VGG16, VGG19, DenseNet121, DenseNet201, and MobileNetV2, the proposed DCNN model demonstrated superior performance across multiple criteria. The evaluation metrics encompassed essential measures such as accuracy, recall, precision, F1-score, specificity, and AUC. The results indicate that the DCNN model proposed in this study outperforms other DL models applied to the same datasets. Specifically, the proposed DCNN model achieved accuracy, recall, precision, F1-score, specificity, and AUC of \(98.50\%\), \(98.51\%\), \(98.56\%\), \(98.48\%\), \(99.73\%\), and \(99.92\%\), respectively, on the HAM10000 dataset, and \(97.11\%\), \(97.12\%\), \(97.09\%\), \(97.08\%\), \(99.61\%\), and \(99.82\%\), respectively, on the ISIC-2019 dataset. Furthermore, compared to several recent studies utilizing these datasets, the proposed model exhibits superior performance, particularly in terms of accuracy.

In our future endeavours, our goal is to advance our skin cancer classification framework by expanding our datasets to include a more extensive collection of labelled skin lesions. This expansion will facilitate the training of a more robust and successful deep neural network (DNN). Our focus will be on refining preprocessing steps to enhance the model’s predictive capabilities and achieve optimal classification accuracy. We also plan to employ optimization algorithms for hyperparameter tuning, a crucial step in maximizing the performance of our DCNN; the exploration of various optimization techniques holds the promise of further elevating the model’s efficacy. Additionally, we recommend integrating image segmentation as the initial step within our framework, as it is likely to yield more promising solutions. These considerations highlight limitations in our current model that could be further improved. To validate the real-world applicability of our proposed framework, we envision conducting a comparative study with dermatologists. By benchmarking our model against expert human assessments, we aim to provide valuable insights for healthcare institutions. This information can guide decisions about when and how to integrate our framework into clinical practice, serve as a reliable second opinion, or potentially replace certain aspects of human involvement. Furthermore, the proposed model could be evaluated against other challenges present in medical imaging, such as variations in image quality, lighting, and artifacts. In this study, several preprocessing steps, such as noise and artifact removal and dataset quality enhancement, were not applied, yet the model still achieved promising results; it is therefore anticipated that the model will be valuable in addressing other challenges in medical imaging.