1 Introduction

Poxviruses are complex viruses with large double-stranded deoxyribonucleic acid (DNA) genomes, and they cause disease in humans and many other animals [1]. They belong to the family Poxviridae, which is subdivided into two subfamilies: Chordopoxvirinae (poxviruses of vertebrates) and Entomopoxvirinae (poxviruses of insects) [2, 3]. In the subfamily Chordopoxvirinae, the members of the genus Orthopoxvirus that have infected humans include variola (smallpox) [4], monkeypox, and cowpox [1]. Smallpox was eradicated globally in 1979, while human cases of monkeypox and cowpox are still being reported to this day.

The rising number of monkeypox cases reported in countries where the disease is not endemic has alerted the global medical community. The disease is now considered of global public health importance as it spreads worldwide [5, 6]. The symptoms of monkeypox often include fever and rash, similar to smallpox [7]. These similarities are important for our diagnostic study.

Monkeypox, as the name implies, was at one time known to occur only in monkeys [5, 6]. However, the disease was first identified in humans in 1970 in the Democratic Republic of Congo [8, 9]. Since then, monkeypox outbreaks have mostly been confined to countries in West and Central Africa, such as Nigeria, the Central African Republic, Liberia, Sierra Leone, and Cameroon. The first outbreak of monkeypox outside of Africa was in the USA in 2003. In 2022, multiple cases of monkeypox were identified in several regions of the world, including non-endemic countries.

The monkeypox virus is spread through physical contact with the blood, bodily fluid, or lesion material from infected individuals or contaminated materials. Hence, maintaining good personal hygiene such as frequent washing of hands with soap and water, regular washing of clothes and social distancing can prevent disease transmission. Besides human-to-human transmission, animal-to-human transmission may also occur by a bite or scratch from an infected animal, eating the uncooked meat of an infected animal, or handling infected animals [10].

There are no proven treatments specific to monkeypox infection, although two doses of the Jynneos vaccine are reported to be about 87% effective [11]. Studies on the smallpox vaccine have also shown it to provide some protection against other orthopoxviruses, including monkeypox. In addition, approximately 85% of individuals previously vaccinated against smallpox were found to be resistant to the monkeypox virus [12, 13].

Computer vision models are widely used because of their high classification performance [14, 15]. In particular, patch-based models such as vision transformers (ViT) [16], the multilayer perceptron mixer (MLP-Mixer) [17], and ConvMixer [18] have attained high classification performance. Several biomedical image classification models with significant advantages have been proposed. During the COVID-19 pandemic, various computer vision-based COVID-19 detection models were proposed. In this research, we propose a new framework to detect monkeypox from images. Our main objective is to address the time complexity of feature extraction with fixed-size patches, since finding the optimal patch size is a nondeterministic polynomial (NP) problem. Thus, we use nested patches, as discussed below.

1.1 Literature Review

Skin image classification is one of the most popular topics in machine learning. In particular, advances in artificial intelligence and in image-based diagnosis and detection offer important solutions in healthcare. Table 1 summarizes the studies on machine learning-based classification of poxviruses.

Table 1 Poxvirus classification with machine learning using skin images

Ahsan et al. [19] collected the data for their research from Google and performed two tests. The first test compared monkeypox with chickenpox, and the second compared monkeypox with other classes. In the second test, the monkeypox data was subjected to data augmentation. Sitaula and Shahi [20] tested the data collected by Ahsan et al. [19]. They combined 13 pre-trained deep network architectures using majority voting to obtain an ensemble model. Islam et al. [22] created their own dataset consisting of a total of 804 images in 6 classes. In the test process, they applied the 5-fold CV strategy and achieved an average classification accuracy of 83%. Sahin et al. [23] used an open-access dataset and applied binary classification. Abdelhamid et al. [25] performed binary classification with this method and achieved a classification success rate of over 98%. Bala et al. [26] created their own dataset with four classes. They tested this dataset with the hold-out CV method and obtained 98.91% accuracy in the test process with the augmented method. Ozsahin et al. [27] performed binary classification with a self-designed CNN. Yasmin et al. [28] classified monkeypox and others using an open-access dataset. They used a self-designed CNN, enlarged the data with data augmentation, and achieved 100% classification accuracy.

1.2 Motivation and our model

Machine learning and deep learning methods are useful in assisting researchers and medical professionals in the diagnosis and detection of COVID-19 cases [29,30,31]. Hence, we believe these methods could similarly be applied to monkeypox detection. In this paper, we propose to use machine learning for fast and accurate detection of monkeypox from images.

The dataset contains images of healthy skin and images from 4 different viral diseases that cause rash and blisters (monkeypox, chickenpox, smallpox, and zoster zona) obtained from the public database. In this work, we have proposed a novel deep feature engineering model to discern these classes automatically. Our model uses multiple nested patches of different initial sizes (14 × 14, 16 × 16, 28 × 28, 32 × 32, 56 × 56, and 112 × 112). The two deep learning features used (these are obtained from pretrained DenseNet201) are global average pooling and fully connected layers. Using six patch types and two feature extraction functions, 12 (= 6 × 2) feature vectors of various sizes are extracted. Applying three selectors and SVM, 36 (= 12 × 3) predicted vectors are used as input of the IHMV. The IHMV then generates 34 (= 36 - 2) voted predicted vectors and the predicted vector with the highest accuracy.

1.3 Contributions and novelties

The novelty and contributions of this research are given below.

  • Use of a new image dataset of four viral diseases that cause rash and/or blisters (monkeypox, chickenpox, smallpox, and zoster zona).

  • A novel monkeypox image classification model is developed using a deep feature engineering model.

  • A new multiple-patch division model is proposed.

  • We have also proposed a self-organized framework that automatically selects the best-performing model.

2 Materials and Methods

2.1 Dataset

Nine hundred and ten skin images were collected from the web and classified into five categories. The categories are (i) monkeypox, (ii) chickenpox, (iii) smallpox, (iv) healthy, and (v) zoster zona. Skin images of smallpox, chicken pox, and zoster zona were included because these diseases cause similar skin afflictions (rash and/or blisters) to monkeypox. The collected images were stored as JPG and PNG and were of various sizes. The number of images in each category is tabulated in Table 2.

Table 2 Number of images in each category of the skin dataset

Representative skin images from each category are shown in Fig. 1.

Fig. 1 Representative images of the collected dataset: (a) Monkeypox, (b) Chickenpox, (c) Smallpox, (d) Healthy, (e) Zoster zona

2.2 Proposed image classification model

In this work, we have proposed deep feature engineering, with the feature extraction capability of the patch-based models. For a lightweight model, DenseNet201 [32], a popular pretrained CNN [33], was deployed. Global average pooling and fully connected layers of this CNN were used to extract deep features. Six nested patch divisions were utilized to evaluate the features and to find the most appropriate patch size. Three commonly used feature selection functions were used: NCA [34], Chi2 [35], and ReliefF [36]. From this, 36 transfer learning-based feature engineering models were obtained and classified by deploying an SVM [37, 38] classifier. IHMV was employed to obtain voted results, with the best result chosen using a greedy algorithm. Since this model is implemented with variable-sized nested patch divisions and DenseNet201 for image classification, we have named it MNPDenseNet, where the MNP stands for multiple nested patches. The block diagram of the proposed MNPDenseNet model is depicted in Fig. 2.

Fig. 2 Block diagram of the proposed MNPDenseNet model with the input image dataset

The schematic in Fig. 2 shows that the proposed MNPDenseNet is a self-organized architecture that automatically selects the best result from the 70 generated predicted vectors. As shown in Fig. 2, an end-to-end training approach was not used in this research. Instead, the DenseNet201 architecture was used as a feature extractor, so the network was not retrained and its weights were not updated. In the model, the image was first divided into six patch categories, and two separate feature vectors were obtained for each category using two layers of the DenseNet201 architecture. In this way, 12 feature vectors were obtained. An ensemble feature selection approach with three feature selectors (NCA, Chi2, and ReliefF) was used. Applying these selectors to the 12 feature vectors gave a total of 36 (= 3 feature selectors × 12 feature vectors) selected feature vectors. In the classification phase, the 36 selected feature vectors were classified separately using the SVM algorithm, resulting in 36 classification results. These results are the prediction vectors. In the final stage, the prediction vectors were combined using the IHMV algorithm, and the best classification result was chosen using the greedy algorithm. The MNPDenseNet architecture contains six layers:

  (i) preprocessing,

  (ii) feature extraction,

  (iii) feature selection,

  (iv) classification,

  (v) majority voting, and

  (vi) selection of best result.

The details of these six layers are explained below.

  • Preprocessing: The multiple nested patch (MNP) division was applied to the original image. The original image was first resized to 224 × 224 before six nested patch divisions (14 × 14, 16 × 16, 28 × 28, 32 × 32, 56 × 56, and 112 × 112) were used to create six patch categories. These patch categories were then fed into the pretrained DenseNet201. The pseudocode of our MNP-based preprocessing layer is depicted in Algorithm 1.

Algorithm 1 The proposed MNP-based preprocessing method

With these six initial sizes (14 × 14, 16 × 16, 28 × 28, 32 × 32, 56 × 56, and 112 × 112), six patch categories are obtained, containing 16, 14, 8, 7, 4, and 2 patches of varying size, respectively. Features are then extracted from these patches using the pretrained DenseNet201. The first step of the proposed architecture is given below.

  • Step 1: Apply Algorithm 1 to images for creating patches

In this phase, facial areas of images are segmented and the cropped/segmented images are converted to grayscale images. The obtained grayscale images are resized into 224 × 224-dimensional images.
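The patch counts quoted above follow from dividing the 224-pixel image side by each initial size. The sketch below is a minimal Python illustration and not the authors' Algorithm 1. It assumes that the nested patches of one category are squares anchored at the top-left corner whose side grows in steps of the initial size; the text does not specify this anchoring. The names `INITIAL_SIZES`, `IMAGE_SIDE`, and `nested_patches` are hypothetical.

```python
import numpy as np

# Initial patch sizes named in the text; the image is assumed to be 224 x 224.
INITIAL_SIZES = (14, 16, 28, 32, 56, 112)
IMAGE_SIDE = 224


def nested_patches(image, size):
    """Return nested square patches of side size, 2*size, ..., up to the image side.

    Assumption: every patch is anchored at the top-left corner, so each patch
    contains the previous one. The paper's Algorithm 1 may nest differently.
    """
    side = image.shape[0]
    return [image[:s, :s] for s in range(size, side + 1, size)]


image = np.zeros((IMAGE_SIDE, IMAGE_SIDE, 3), dtype=np.uint8)
counts = [len(nested_patches(image, size)) for size in INITIAL_SIZES]
print(counts)  # [16, 14, 8, 7, 4, 2]
```

In such a scheme each patch would presumably be resized to the network input size before feature extraction; that step is not shown here.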

  • Feature extraction: We used a pretrained DenseNet201, trained on ImageNet1K [33], as the feature extractor. The network was used in feed-forward mode only, and the Softmax layer was not used. The outputs of two layers served as feature vectors: the global average pooling (GAP) layer, which yields 1,920 features, and the fully connected (FC) layer, which yields 1,000 features. Two deep feature extractors were thus obtained from the pretrained DenseNet201. The GAP layer calculates the average value of each channel, summarizing the image and capturing general patterns such as the type and prevalence of objects or textures. These features are then processed by the FC layer, which refines them further; each of its 1,000 outputs corresponds to one of the classes on which the network was trained. Together, these features provide the information from which our model learns to recognize important patterns and classify images accurately. The graphical outline of the DenseNet201-based feature extractors is summarized in Fig. 3.

Fig. 3 Outline of the DenseNet201-based feature extractors using GAP and FC layers. Transition layers are denoted as Trans Layer and maximum pooling as Max Pooling

  • Step 2: Extract features from the created patches in the preprocessing layer by deploying the FC layer of the pretrained DenseNet201.

$${c}_{j}^{k}=dfc\left({p}_{j}^{k}\right),\quad k\in \left\{1,2,\dots ,6\right\},\quad j\in \left\{1,2,\dots ,n\right\},\quad n\in \left\{16,14,8,7,4,2\right\}$$
(1)

In this step, features are extracted from the created patches using the Fully Connected (FC) layer of the pretrained DenseNet201. This layer, represented by \(c\), acts as a feature extractor and is defined as the function \(dfc(.)\). It generates 1,000 features from each patch. The notation \({c}_{j}^{k}\) represents the feature vector extracted from the FC layer for the patch \({p}_{j}^{k}\), where \(k\) varies from 1 to 6, \(j\) varies from 1 to \(n\) and \(n\) can take on values 16, 14, 8, 7, 4, or 2.

  • Step 3: Apply the GAP layer of the pretrained DenseNet201 to the generated patches.

    $${g}_{j}^{k}=dgap\left({p}_{j}^{k}\right)$$
    (2)

In this step, the generated patches are further processed by applying the Global Average Pooling (GAP) layer of the pretrained DenseNet201. The features extracted through this process are denoted as \({g}_{j}^{k}\). Each \({g}_{j}^{k}\) feature vector has a length of 1,920, and \(dgap(.)\) represents the function used for GAP-based feature extraction.

  • Step 4: Create features using the concatenation function.

    $${F}^{2k-1}=concat\left({c}_{1}^{k},{c}_{2}^{k},\dots ,{c}_{n}^{k}\right)$$
    (3)
    $${F}^{2k}=concat\left({g}_{1}^{k},{g}_{2}^{k},\dots ,{g}_{n}^{k}\right)$$
    (4)

In this step, merged feature vectors, denoted \(F\), are created by concatenating the patch-level feature vectors of each patch category with the \(concat(.)\) function. For each category \(k\), \({F}^{2k-1}\) merges the FC features and \({F}^{2k}\) merges the GAP features. Altogether, 12 feature vectors are generated through this process.
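To make the indexing of Steps 2 to 4 concrete, the following Python sketch builds the 12 merged vectors and reports their lengths. It is an illustration under stated assumptions: `fake_extractor` is a hypothetical stand-in that only returns zero vectors of the stated lengths (1,000 for FC and 1,920 for GAP), so no DenseNet201 is actually run, and the patches are placeholders.

```python
import numpy as np

FC_LEN, GAP_LEN = 1000, 1920          # feature lengths stated in the text
PATCH_COUNTS = (16, 14, 8, 7, 4, 2)   # n for k = 1..6


def fake_extractor(length):
    """Stand-in for dfc(.) or dgap(.): returns a zero vector of the right length."""
    return lambda patch: np.zeros(length)


def build_feature_vectors(patch_categories, dfc, dgap):
    """Return {h: F^h} with F^(2k-1) from the FC layer and F^(2k) from the GAP layer."""
    features = {}
    for k, patches in enumerate(patch_categories, start=1):
        features[2 * k - 1] = np.concatenate([dfc(p) for p in patches])
        features[2 * k] = np.concatenate([dgap(p) for p in patches])
    return features


# Placeholder patches: only their count per category matters for this sketch.
categories = [[None] * n for n in PATCH_COUNTS]
F = build_feature_vectors(categories, fake_extractor(FC_LEN), fake_extractor(GAP_LEN))
print(len(F), F[1].size, F[2].size)  # 12 16000 30720
```

Under these assumptions the merged vectors differ in length by patch category, which is consistent with the later selection of a fixed number of features from each of them.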

  • Feature selection: We used three common feature selection functions: neighborhood component analysis (NCA), Chi2, and ReliefF feature selectors to investigate the feature selection abilities. The brief explanations of these selectors are given below.

  • NCA: It is a distance-based feature selection function and can therefore be regarded as the feature selection version of the kNN classifier. It uses the L1-norm (Manhattan) distance and a stochastic gradient descent optimizer to compute non-negative feature weights, and the features are ranked by these weights [34].

  • Chi2: It is a statistical feature selector: a Chi2 statistic is computed for each feature, and the indices of the sorted statistics are used to rank the features. The Chi2 selector is one of the fastest feature selectors described in the literature, which is why we used it in our architecture [35].

  • ReliefF: It is an improved version of the Relief feature selection function. Like NCA, it generates feature weights, but these can be both positive and negative. According to ReliefF, features with negative weights are redundant. The best features are therefore selected by sorting the generated weights [39].

We employed these functions in the feature selection phase, where feature vectors were generated using DenseNet-201. The primary goal was to reduce the dimensionality of the feature vectors, subsequently lowering computational complexity.

  • NCA, Chi2, and ReliefF for Feature Selection: NCA, Chi2, and ReliefF were applied to the feature vectors generated by DenseNet-201. These feature selection functions assess the relevance of individual features, allowing us to identify the most informative features in the dataset.

  • Dimensionality Reduction: By utilizing NCA, Chi2, and ReliefF, we aimed to reduce the dimensionality of the feature vectors while preserving the most discriminative attributes. This dimensionality reduction enhances computational efficiency and reduces the risk of overfitting.

These feature selection functions allow working with more manageable feature vectors without compromising the quality of the feature vectors. Therefore, in our research, we have reduced the size of the feature vectors in a way that does not negatively affect the classification success and aimed for high classification success/low computational complexity.

  • Step 5: Apply NCA, Chi2, and ReliefF feature selection functions to the generated 12 feature vectors.

    $$in{d}^{3h-2}=\gamma \left({F}^{h}\right),\quad h\in \{1,2,\dots ,12\}$$
    (5)
    $$in{d}^{3h-1}=\chi \left({F}^{h}\right)$$
    (6)
    $$in{d}^{3h}=\vartheta \left({F}^{h}\right)$$
    (7)

In this step, three different feature selection functions are applied to the 12 feature vectors generated in the previous steps. These functions are represented as \(\gamma \left(.\right),\chi (.)\) and \(\vartheta (.)\) and correspond to NCA, Chi2, and ReliefF feature selection methods. The outcome of this step is the sorted indexes of the features, denoted as \(in{d}^{3h-2}\), \(in{d}^{3h-1}\) and \(in{d}^{3h}\) for each of the 12 feature vectors \({F}^{h}\) (where \(h\) varies from 1 to 12).

  • Step 6: Choose the top 512 features (like ternary pattern) using the generated indexes.

    $${sf}^{3h-2}\left(i,j\right)={F}^{h}\left(i,in{d}^{3h-2}\left(j\right)\right),\quad i\in \left\{1,2,\dots ,dim\right\},\quad j\in \left\{1,2,\dots ,512\right\}$$
    (8)
    $${sf}^{3h-1}\left(i,j\right)={F}^{h}\left(i,in{d}^{3h-1}\left(j\right)\right)$$
    (9)
    $${sf}^{3h}\left(i,j\right)={F}^{h}\left(i,in{d}^{3h}\left(j\right)\right)$$
    (10)

In this step, the top 512 features are selected from each feature vector using the generated indexes. The selected features are represented as \({sf}^{3h-2}\), \({sf}^{3h-1}\), and \({sf}^{3h}\) for each of the 12 feature vectors. The notation \(sf\) signifies the selected features; there are 36 selected feature vectors in total, each with a length of 512. \(dim\) represents the number of images, which is 910 in this dataset.
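Steps 5 and 6 reduce to sorting feature weights and keeping the 512 best columns. The Python sketch below shows only that indexing. The weights come from a placeholder score (column variance) that is not NCA, Chi2, or ReliefF; the function name `select_top_features` and the matrix width of 2,000 are hypothetical choices for illustration.

```python
import numpy as np


def select_top_features(F, weights, top=512):
    """Keep the `top` columns of F with the largest weights (Eqs. 8-10)."""
    ind = np.argsort(-weights, kind="stable")  # indices sorted by descending weight
    return F[:, ind[:top]]


rng = np.random.default_rng(0)
F = rng.normal(size=(910, 2000))   # dim = 910 images; 2000 is an arbitrary width
weights = F.var(axis=0)            # placeholder score, NOT NCA, Chi2, or ReliefF
sf = select_top_features(F, weights)
print(sf.shape)  # (910, 512)
```

Swapping in a real selector would only change how `weights` is computed; the sorting and column selection stay the same.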

  • Classification: We used a shallow classifier to obtain classification results for the 36 generated feature vectors. The shallow classifier utilized is SVM, and we selected it in the MATLAB Classification Learner toolbox. In the MATLAB Classification Learner tool, there are 30 shallow classifiers. We used this toolbox to select the most appropriate classifier and employed the classifiers with default settings. The best-resulting classifier is Cubic SVM [40]. Hence, we classified the generated 36 feature vectors by deploying Cubic SVM with ten-fold cross-validation. In this layer, 36 predicted vectors have been generated.

  • Step 7: Generate predicted vectors by deploying an SVM classifier.

$$pr{v}^{t}=\xi \left(s{f}^{t},y\right),\quad t\in \{1,2,\dots ,36\}$$
(11)

Herein, \(prv\) defines predicted vectors, \(\xi (.,.)\) is a Cubic SVM function, and \(y\) represents real labels. The attributes of the used SVM classifier are:

  • Kernel: Polynomial,

  • Kernel order: 3,

  • Kernel scale: Automatic,

  • Box constraint: 1,

  • Coding: One-vs-all,

  • Validation: 10-fold CV.
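For orientation, a polynomial kernel of order 3 with kernel scale \(s\) is commonly written as follows. This is the standard textbook form, assumed here rather than taken from the paper, and the value that the automatic kernel scale heuristic assigns to \(s\) is not specified in the text.

$$K\left(x,{x}^{\prime }\right)={\left(1+\frac{{x}^{\top }{x}^{\prime }}{{s}^{2}}\right)}^{3}$$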

  • Iterative Hard Majority Voting: The IHMV was proposed by Dogan et al. [41] in 2021 to generate additional voted results. In the IHMV, the mode function is used to obtain voted results. First, the predicted vectors generated in the classification layer are used as input to the IHMV algorithm and sorted by their classification accuracies. Then, a loop running from 3 to 36 votes over the top-ranked vectors at each iteration, so 34 (= 36 - 3 + 1) voted results are created in this layer. The pseudocode of the IHMV is shown in Algorithm 2.

Algorithm 2 IHMV algorithm

  • Step 8: Create 34 voted vectors by deploying the IHMV function.
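The voting loop can be sketched in Python as follows. This is a reading of the textual description, not a transcription of Algorithm 2: predicted vectors are ranked by accuracy, and for each k from 3 to 36 the column-wise mode of the top k vectors forms one voted vector. The tie-breaking rule (smallest label wins) and the synthetic predictions are assumptions, and `mode_vote` and `ihmv` are hypothetical names.

```python
import numpy as np


def mode_vote(predictions):
    """Column-wise mode of a (voters, samples) array of non-negative integer labels.

    Ties go to the smallest label, an assumption the text does not settle.
    """
    n_classes = predictions.max() + 1
    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    return votes.argmax(axis=0)


def ihmv(predicted, y, start=3):
    """Iterative hard majority voting over accuracy-sorted predicted vectors.

    predicted: (n_vectors, n_samples) integer labels; y: true labels.
    Returns one voted vector per k in start..n_vectors, each voting the top k.
    """
    accuracy = (predicted == y).mean(axis=1)
    order = np.argsort(-accuracy, kind="stable")   # best vector first
    ranked = predicted[order]
    return np.array([mode_vote(ranked[:k]) for k in range(start, len(ranked) + 1)])


rng = np.random.default_rng(1)
y = rng.integers(0, 5, size=910)
# Synthetic stand-ins for the 36 SVM outputs: true labels with 15% random noise.
noise = rng.integers(0, 5, size=(36, 910))
predicted = np.where(rng.random((36, 910)) < 0.15, noise, y)
voted = ihmv(predicted, y)
print(voted.shape)  # (34, 910)
```

With 36 inputs and a loop from 3 to 36, the sketch yields 34 voted vectors, which matches the count given in the text.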

  • Selection: In the classification and IHMV layers, 70 (= 36 + 34) predicted vectors were created. In this layer, the most accurate of these 70 vectors is selected with the greedy method. The greedy algorithm is a commonly used selection method and has been implemented in metaheuristic optimization techniques to select the best solution [42]. With this layer, maximum performance is achieved without the need for manual selection, which makes the proposed MNPDenseNet a self-organized architecture. The last two steps of our proposal are given in this layer, i.e.:

  • Step 9: Calculate classification accuracies of the generated 70 predicted vectors.

  • Step 10: Select the most accurate vector as the final result.
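As described, Steps 9 and 10 amount to computing the accuracy of every predicted vector and keeping the maximum. A minimal Python sketch, with the hypothetical helper `select_best` and toy labels chosen only for illustration:

```python
import numpy as np


def select_best(vectors, y):
    """Return (index, accuracy) of the most accurate predicted vector."""
    accuracies = (np.asarray(vectors) == np.asarray(y)).mean(axis=1)
    best = int(np.argmax(accuracies))       # first maximum wins on ties
    return best, float(accuracies[best])


y = np.array([0, 1, 2, 1])
vectors = np.array([[0, 1, 0, 0],    # 2 of 4 correct
                    [0, 1, 2, 0],    # 3 of 4 correct
                    [0, 1, 2, 1]])   # 4 of 4 correct
print(select_best(vectors, y))  # (2, 1.0)
```

In the proposed model the input would be the 70 predicted vectors (36 from the SVM and 34 from the IHMV); note that the index returned here is zero-based.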

The transition of the MNPDenseNet model is summarized in Table 3.

Table 3 Steps involved in MNPDenseNet model

3 Results

3.1 Experimental Setup

The proposed MNPDenseNet model was implemented in MATLAB (ver. 2020a) on a modestly configured PC (16 GB main memory, Intel i7-7700 processor, Windows 11) without the need for graphical or tensor processing units. Seven performance metrics were employed to evaluate the proposed model: accuracy, overall recall (OR), overall precision (OP), overall F1-score (OF1), geometric mean (GM), Cohen's kappa (CK), and Matthews correlation coefficient (MCC) [43, 44].
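The seven metrics can all be derived from a confusion matrix. The Python sketch below uses common definitions that are assumed here, since the paper does not spell them out: the "overall" values are taken as unweighted means of the class-wise values, GM as the geometric mean of the class-wise recalls, and MCC in its multiclass form. The function name `summarize` is hypothetical.

```python
import numpy as np


def summarize(cm):
    """Compute the seven metrics from a confusion matrix (rows = true classes)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    true_per_class = cm.sum(axis=1)    # t_k: samples that truly belong to class k
    pred_per_class = cm.sum(axis=0)    # p_k: samples predicted as class k
    correct = np.trace(cm)

    recall = np.diag(cm) / true_per_class
    precision = np.diag(cm) / pred_per_class
    f1 = 2 * precision * recall / (precision + recall)

    accuracy = correct / total
    expected = (true_per_class * pred_per_class).sum() / total**2
    kappa = (accuracy - expected) / (1 - expected)
    mcc_num = correct * total - (true_per_class * pred_per_class).sum()
    mcc_den = np.sqrt((total**2 - (pred_per_class**2).sum())
                      * (total**2 - (true_per_class**2).sum()))
    return {
        "accuracy": accuracy,
        "OR": recall.mean(),
        "OP": precision.mean(),
        "OF1": f1.mean(),
        "GM": recall.prod() ** (1 / len(recall)),
        "CK": kappa,
        "MCC": mcc_num / mcc_den,
    }


metrics = summarize([[3, 1], [1, 3]])
print(round(metrics["accuracy"], 2), round(metrics["CK"], 2), round(metrics["MCC"], 2))
# 0.75 0.5 0.5
```

The sketch does not guard against classes that are never predicted, which would cause a division by zero in the precision term.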

3.2 Results

The proposed framework generates 70 results, and MNPDenseNet automatically selected the best one, which is the 38th result. The 38th result is the 2nd voted vector and was generated by voting the top four results: the 28th (4² patches from the 56 × 56 division + GAP layer + NCA), the 10th (14² patches from the 16 × 16 division + GAP layer + NCA), the 22nd (7² patches from the 32 × 32 division + GAP layer + NCA), and the 34th (2² patches from the 112 × 112 division + GAP layer + NCA). All four of these feature vectors were generated with the GAP layer of the DenseNet201 as the feature extraction function and NCA as the selector. These top four results were fused to generate the best output. The confusion matrix of the best result, from which the classification performances were calculated, is given in Fig. 4.

Fig. 4 Confusion matrix of our proposal. 1: Monkeypox, 2: Chickenpox, 3: Smallpox, 4: Healthy, 5: Zoster zona

The classification performances of the proposed MNPDenseNet are summarized in Table 4. As seen in Table 4, the proposed MNPDenseNet model attained over 91% for all performance metrics.

Table 4 Performance metrics (%) of the proposed MNPDenseNet model

The class-wise recall, precision, and F1-score for the different classes are tabulated in Table 5. The most accurate class was Healthy, with 98.46% recall for this category, while the least accurate was Chickenpox with 86.61%. It has to be noted that the Chickenpox class had the smallest number of skin images (127).

Table 5 Class-wise recall, precision, and F1-score of the proposed MNPDenseNet model

3.3 Time Complexity

Our model is a lightweight approach compared to classic CNN architectures. The proposed model uses deep feature extraction instead of the end-to-end training phase in classical CNNs. Therefore, there is no need to calculate the weights of the network. In this context, the time complexity of the developed model is calculated in Table 6.

Table 6 Time complexity of the proposed model

4 Discussions

In this study, a new deep feature engineering architecture was developed to automatically detect monkeypox from skin images. A new large dataset of skin images (from healthy individuals and those with monkeypox, smallpox, chickenpox, and shingles) was created from images obtained from a publicly available dataset and supplemented with relevant web images. The proposed MNPDenseNet model generated 70 results; that is, it tested all configurations and selected the prediction vector with the best validation accuracy among the 70 results as the final classification result. The classification accuracies of the generated 70 results are depicted in Fig. 5.

Fig. 5 Plot of calculated classification accuracies per calculated predicted vector

As seen in Fig. 5, the most accurate predicted vector is the 38th predicted vector, generated using IHMV. Thus, the 38th predicted vector is a voted vector. This vector was generated using the most accurate four feature vectors (in the first 36 predicted vectors). These vectors are 28, 10, 22, and 34, and they achieved 91.65%, 91.32%, 91.32%, and 91.32% accuracies, respectively. These results belong to \(s{f}_{28},s{f}_{10},s{f}_{22}\) and \(s{f}_{34}\). All of these features were generated using the GAP layer and NCA selectors. GAP is more effective in this case than the FC layer, and the top four features were created using the NCA selector. Moreover, these four vectors (\(s{f}_{28},s{f}_{10},s{f}_{22},s{f}_{34}\)) were generated using 56 × 56, 16 × 16, 32 × 32, and 112 × 112 sized nested patches, respectively. By voting these results, the best result is achieved. Moreover, the least accurate predicted vector is the 2nd predicted vector - generated using \(s{f}_{2}\) and using FC layer with 14 × 14 sized nested patch and Chi2 selector. Still, this vector resulted in an 81.98% classification accuracy.
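The configuration behind each selected feature vector can be recovered from its index by inverting the numbering of Eqs. 3 to 10 (\(h=2k-1\) for FC and \(h=2k\) for GAP; \(t=3h-2\), \(3h-1\), \(3h\) for NCA, Chi2, and ReliefF). The Python sketch below does this; the helper name `decode` is hypothetical, and the mapping assumes the patch sizes are numbered in the order listed in the text.

```python
import math

PATCH_SIZES = (14, 16, 28, 32, 56, 112)
SELECTORS = ("NCA", "Chi2", "ReliefF")


def decode(t):
    """Map a selected-feature index t (1..36) to (patch size, layer, selector)."""
    h = math.ceil(t / 3)                    # feature vector index, 1..12
    selector = SELECTORS[t - (3 * h - 2)]   # t = 3h-2, 3h-1, 3h
    k = math.ceil(h / 2)                    # patch category, 1..6
    layer = "FC" if h == 2 * k - 1 else "GAP"
    return PATCH_SIZES[k - 1], layer, selector


for t in (28, 10, 22, 34, 2):
    print(t, decode(t))
# 28 (56, 'GAP', 'NCA')
# 10 (16, 'GAP', 'NCA')
# 22 (32, 'GAP', 'NCA')
# 34 (112, 'GAP', 'NCA')
# 2 (14, 'FC', 'Chi2')
```

The decoded configurations agree with those reported above for the four best vectors and for the least accurate one.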

4.1 Ablation of the proposed model

To discuss the effect of the size of the patches, average classification accuracies of the generated predicted vectors are calculated per the used sizes of the patches. Comparison results according to patch sizes are given in Fig. 6.

Fig. 6 Average classification accuracies according to the used patch divisions

Figure 6 shows that the best patch division method is 56 × 56, with an average classification accuracy of 88.15%. Moreover, we have used two feature extractors and three feature selectors to create the feature vectors. The average classification accuracies of the feature vector creation methods are given in Fig. 7.

Fig. 7 Average classification accuracies according to the used feature creation method

As shown in Fig. 7, the best feature creation method is GAP + NCA, with an average classification accuracy of 91.23%. Consistent with this, the best feature vector (\(s{f}_{28}\)) was created using 56 × 56 sized patches, the GAP feature extractor, and the NCA feature selector, and Fig. 5 confirms the high classification result of the 28th selected feature vector. We can therefore say that the best feature extractor and feature selector are the GAP layer of the DenseNet201 and the NCA feature selector, respectively. The effect of using patches was also examined in the ablation tests by comparing patch-based and non-patch-based architectures. The result of this comparison is given in Fig. 8.

Fig. 8 Average classification accuracies according to the patch and non-patch-based method

As shown in Fig. 8, the patch-based method achieved about 20% higher classification performance. Another test conducted within the scope of ablation studies is the measurement of the IHMV effect. In this test process, IHMV vs. Non-IHMV status was compared. The results of the test process are given in Fig. 9.

Fig. 9 Average classification accuracies according to the IHMV and non-IHMV-based method

As shown in Fig. 9, the IHMV-based classification results are higher than the classification results without IHMV. The results given in Figs. 6, 7, 8 and 9 support the effectiveness of the methods used in the model. With these methods, our MNPDenseNet model achieved 91.87% classification accuracy.

4.2 Comparative Results

To obtain comparative results, pretrained AlexNet, MobileNetv2, DarkNet53, and ResNet50 were used. The comparative results from these pretrained networks and our MNPDenseNet model are tabulated in Table 7.

Table 7 Comparative results (%) obtained with other models

In MNPAlexNet, the fc6 and fc7 layers were used as feature extractors. For the remaining CNNs, the GAP and FC layers were used as feature extractors. It can be observed that the most appropriate pretrained CNN for our architecture to classify monkeypox images is DenseNet201. The least accurate model is MNPAlexNet, since AlexNet uses fewer layers than the others. As presented in the previous sections, our model uses SVM for the classification phase. In this research, six different classifiers were considered in the test phase: Decision Tree (DT), Linear Discriminant (LD), Naive Bayes (NB), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), and Neural Network (NN). The performance comparison of these classifiers is given in Fig. 10.

Fig. 10 Average classification accuracies according to the classifier methods

As shown in Fig. 10, the highest classification accuracy was achieved with the SVM algorithm and the lowest classification performance was achieved with DT. To prove the superiority of our model, we compared it with similar studies in the literature. Ahsan et al. [45] conducted a similar study using deep learning methods on monkeypox detection. They used a publicly available dataset containing 171 images with four classes [21, 46]: 1: Monkeypox, 2: Chickenpox, 3: Measles, 4: Healthy. Their deep learning-based model classified monkeypox [45] into two cases of binary classification (Case 1: Monkeypox vs. Chickenpox and Case 2: Monkeypox vs. Others). We applied our MNPDenseNet model to this dataset and the comparative results are summarized in Table 8.

Table 8 Comparative results with the proposed model using a publicly available dataset

Table 8 demonstrates that our proposal attained higher classification accuracies than the VGG16-based image classification model used by Ahsan et al., with over 10% higher classification accuracy than this state-of-the-art method.

There are highly effective and efficient CNN models in the literature. However, the traditional approach involves end-to-end training of these CNN models with new data. Although these models produce high classification results, it is a time-consuming process. Moreover, these models require a lot of data for high classification success. The model proposed in this research is designed to address these and similar challenges. The technical advantages of the proposed model are given below:

  • Reduced Time Complexity: Our model has lower time complexity than end-to-end CNN models. Its architecture is suited to working with limited computational resources.

  • Data Efficiency: Our model can handle a limited amount of data. It can produce very successful results, especially in areas where datasets are limited (e.g., poxviruses).

  • Low Configuration Requirement: End-to-end models require high-specification computers. In contrast, the model proposed in this research is capable of running on low-configuration machines.

In summary, our research provides a practical and efficient solution in terms of time, data and computational resources. In addition, the results show that our model is capable of competing with classical CNN models. The main advantages of our proposed method are given below:

  • Monkeypox is an infectious disease, and to control the global spread of the virus, a new skin image dataset was collected to detect it via machine learning.

  • A transfer learning-based deep feature engineering model called MNPDenseNet is presented.

  • To exploit the effectiveness of patch-based models, a new multiple patch-based algorithm is proposed. We employed a nested patch division, which reduces the complexity of patch-based feature generation because it generates fewer patches than a fixed-size patch division.

  • It is a self-organized computer vision model.

  • The proposed MNPDenseNet attained a 91.87% classification accuracy with our new image dataset and a 94.74% accuracy with the publicly available dataset.

  • Our model consistently outperformed the other computer vision methods.
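The claim that a nested patch division produces fewer patches than a fixed-size division can be illustrated with a small sketch. The scheme below (centered square crops that grow by a constant step) is a hypothetical reading of "nested" chosen for illustration; the exact geometry and the number of divisions used by MNPDenseNet are not reproduced, and `nested_patches`, `fixed_patches`, the 224 × 224 input size, and the step of 56 pixels are all assumptions.

```python
import numpy as np


def nested_patches(image, initial_size=56, step=56):
    """Return centered square crops of growing size (hypothetical scheme)."""
    height, width = image.shape[:2]
    center_y, center_x = height // 2, width // 2
    patches = []
    size = initial_size
    while size <= min(height, width):
        top = center_y - size // 2
        left = center_x - size // 2
        patches.append(image[top:top + size, left:left + size])
        size += step
    return patches


def fixed_patches(image, size=56):
    """Return non-overlapping size x size patches on a regular grid."""
    height, width = image.shape[:2]
    return [
        image[row:row + size, col:col + size]
        for row in range(0, height - size + 1, size)
        for col in range(0, width - size + 1, size)
    ]


# Assumed input resolution of 224 x 224 pixels with three color channels.
image = np.zeros((224, 224, 3), dtype=np.uint8)
nested = nested_patches(image)
fixed = fixed_patches(image)
print(len(nested), len(fixed))  # prints "4 16"
```

Under these assumptions, the nested scheme yields 4 patches against 16 for the fixed grid, so fewer patches need to be passed through the pretrained network.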

The limitations of our proposed method are given below:

  • Few images of skin afflictions caused by monkeypox are available on the web and in the literature. The publicly available monkeypox image dataset is small (N = 171, including the other classes). Although we collected N = 910 images from the web, these images belong to five different categories, and only N = 217 of them depict monkeypox.

  • A larger dataset should be collected in the future to implement a real-time monkeypox detection model.

  • A shallow classifier (SVM) was used to depict the classification capability of the generated features. However, an improved classifier can be used in this model, or the hyper-parameters of the SVM can be optimized to obtain improved classification results.
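The SVM hyper-parameter optimization mentioned in the last limitation could, for example, be approached with a cross-validated grid search. The following is a minimal sketch using scikit-learn; the synthetic features, the parameter grid, and the five-fold stratified protocol are assumptions and were not part of the reported experiments.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the selected deep features (assumption).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=8, n_classes=3, random_state=1
)

# Scaling is placed inside the pipeline so that it is refit on each fold.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Illustrative grid; a real study would choose ranges suited to its features.
param_grid = {
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because the best score is chosen on the same folds used for the search, an unbiased estimate would additionally require a held-out test set or nested cross-validation.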

5 Conclusions

A new image dataset related to monkeypox skin affliction was collected, and a new deep feature engineering architecture was proposed to detect monkeypox using these images. This model is named MNPDenseNet since multiple nested patches (six nested patch divisions) were used. Moreover, two feature extractors were added to this architecture, and these feature extractors were created using a pretrained DenseNet201. The three feature selectors employed were NCA, Chi2, and ReliefF. Our proposed MNPDenseNet yielded a 91.87% classification accuracy. When we applied MNPDenseNet to the publicly available dataset, our architecture yielded a 94.74% accuracy. Furthermore, our presented MNPDenseNet outperformed the other CNN models (AlexNet, MobileNetv2, DarkNet53, and ResNet50). In summary, the critical points about the proposed MNPDenseNet are as follows:

  • The most appropriate initial patch size has dimensions 56 × 56.

  • In our architecture, two feature generators were incorporated: the GAP layer and the FC layer of the pretrained DenseNet201. Of the two, GAP was the better feature extractor for the monkeypox classification problem.

  • NCA, ReliefF, and Chi2 selectors were used in this research, and the best feature selection function was NCA.
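Two of the steps summarized above, GAP-based feature generation and Chi2 feature selection, can be sketched as follows. This is an illustration under stated assumptions, not the MNPDenseNet implementation: the feature maps are random stand-ins with the 7 × 7 × 1920 shape that DenseNet201 produces for a 224 × 224 input, the labels are random, and the number of selected features (256) is hypothetical. NCA and ReliefF are omitted because scikit-learn provides no equivalent selector for them.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Random stand-ins for the final convolutional feature maps (assumption:
# 7 x 7 spatial grid, 1920 channels) and for five class labels.
n_images, channels = 40, 1920
feature_maps = rng.random((n_images, 7, 7, channels))
labels = rng.integers(0, 5, size=n_images)


def global_average_pool(maps):
    """Average each channel over its spatial grid: (N, H, W, C) -> (N, C)."""
    return maps.mean(axis=(1, 2))


features = global_average_pool(feature_maps)

# Chi2 requires non-negative inputs, so features are rescaled to [0, 1].
scaled = MinMaxScaler().fit_transform(features)

# Keep the 256 features with the highest Chi2 scores (hypothetical count).
selector = SelectKBest(score_func=chi2, k=256)
selected = selector.fit_transform(scaled, labels)
print(features.shape, selected.shape)  # prints "(40, 1920) (40, 256)"
```

The selected feature matrix would then be passed to the classifier; with random inputs the selection itself is arbitrary, and only the data flow is meaningful.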

These results and findings confirm the high classification success of the proposed MNPDenseNet model for monkeypox classification using skin images.

In the near future, we plan to acquire more skin images for monkeypox detection and apply our presented MNPDenseNet model to create a trained dataset. As a result, a new generation of automatic monkeypox detection desktop/mobile applications can readily be developed, and these applications can be introduced at medical centers to assist medical professionals. Moreover, attention-based deep networks can be used to obtain higher classification results.