1 Introduction

The incidence of infectious and acute diseases is rising steadily worldwide, and appropriate clinical procedures are crucial for detecting and treating these diseases at an early stage. Untreated diseases are likely to lead to complications, including death, and may place a substantial burden on the healthcare system. Acute diseases are generally more severe than infectious diseases and carry a higher risk of death. Recent studies report that cancer is a severe acute disease responsible for a large number of deaths worldwide [1,2,3].

According to the World Health Organization (WHO), COVID-19 and cancer were among the leading causes of death globally in 2020. Since 2020, many studies have been proposed and implemented worldwide with the objective of controlling and curing COVID-19 and cancer [4, 5].

Globally, approximately 10 million people died in 2020 from cancers affecting various internal and external organs. The leading causes of cancer death were lung cancer (1.80 million deaths), colon cancer (916,000 deaths), liver cancer (830,000 deaths), stomach cancer (769,000 deaths), and breast cancer (BC; 685,000 deaths). According to the same report, which is based on registered clinical reports, the new cases diagnosed included 2.26 million breast cancers, 2.21 million lung cancers, 1.93 million colon cancers, 1.41 million prostate cancers, 1.20 million skin cancers, and 1.09 million stomach cancers. In line with the WHO’s assessment, BC is a top priority, and a number of awareness programs have been launched throughout the world to raise awareness of BC symptoms, screening procedures, and treatment options.

Early detection of BC is essential for reducing the associated treatment burden, and the detection process includes [6, 7]:

  • Initial verification of the suspicious breast section by self-examination or by an experienced doctor,

  • Non-invasive image-supported screening, and

  • Needle biopsy-based sample collection and examination to confirm the severity of the BC.

To confirm the stage of the BC, the biopsy must be examined using an appropriate methodology, which is an essential step in planning and implementing treatment. The histology slide is prepared by treating the biopsy sample with a staining agent of choice, and the slide is then photographed using a digital microscope. To detect the severity and stage of the cancer, these images are examined by an experienced clinician or by a computer algorithm. Because of its importance, the examination of breast histology slides has been the subject of many research studies [8,9,10].

The literature gives considerable attention to machine learning (ML) and deep-learning procedure (DLP)-based analysis of histology slides. Earlier studies have demonstrated that DLP-based methods are more effective than conventional or ML methods for classifying benign and malignant BC from histology images. Therefore, researchers have proposed several DLPs to determine the BC stage from histology slides. The ML-based approaches are simple but need a chosen method to extract the necessary features from the considered image database. When the DLP is executed, a feature vector of size \(1\times 1\times 1000\) is obtained, and it is then used to detect the disease from the chosen medical images [10]. Hence, along with the ML schemes, the chosen DLP with SoftMax and other classifiers is also considered to detect the disease with better accuracy.

This work implements a clinically significant DLP to examine histology slides of benign and malignant tissues. The proposed system comprises five phases: (i) data collection, (ii) deep-feature mining (DFM) using a pre-trained DLP, (iii) machine-feature mining (MFM) using entropy and discrete wavelet transform (DWT) features, (iv) selection of optimal features with particle-swarm optimization (PSO), and (v) validation of the proposed scheme with both individual and dual features. Fivefold cross-validation is performed to demonstrate the merits of the proposed scheme. VGG16, VGG19, ResNet18, ResNet50, and ResNet101 are employed as the pre-trained deep learning methods. Using a k-nearest neighbor (KNN) classifier implemented in Python, the proposed scheme achieves a classification accuracy of > 94%. The results are compared and verified.

The key contributions of the proposed work include:

  i. As a means of avoiding overfitting issues, particle-swarm optimization (PSO)-based feature optimization is used,

  ii. Improving cancer detection accuracy through deep-features and machine learning,

  iii. Verifying the performance of chosen pre-trained DLP with various features on the chosen histology database.

A further description of the associated earlier research is presented in Sect. 2, a methodology is presented in Sect. 3, and the outcome and conclusions are presented in Sects. 4 and 5.

2 Literature Review

Due to its importance, many research works have been proposed to examine BC using bio-medical images collected with different modalities. Ultrasound images [11], magnetic resonance angiograms (MRA) [12], thermal images [13], and histology pictures are widely examined using chosen computerized methods. Compared to the other modalities, examination of breast histology slides is essential to confirm the stage and severity of the disease in a patient.

This section summarizes the works related to examining breast histology pictures using a chosen computerized scheme, and it also presents the merits and limitations of the earlier works.

The work of Celik et al. implements a pre-trained DLP to detect ductal carcinoma using whole-slide pictures; it achieved its best results with ResNet50 (accuracy = 90.96%, F1-score = 94.11%) and DenseNet161 (accuracy = 91.57%, F1-score = 92.38%) [14].

Barsha et al. presented a DLP-based breast cancer examination using DenseNet121 and DenseNet169 and achieved a detection accuracy of 92.70% and an F1-score of 95.70% [15]. A DLP scheme to detect BC is presented in the recent work of Krithiga and Geetha [16], in which the segmentation and classification of breast histology images are treated separately. That work gives a clear summary of the classification works existing in the literature, and this summary is presented in Table 1.

Table 1 Summary of chosen histopathology-based BC detection methods

The studies discussed above confirm that earlier DLP-based works provide superior detection accuracy when examining breast histology slides. The purpose of this study is to develop a novel DLP-based procedure that achieves higher detection accuracy than the methods currently available. To achieve better BC detection accuracy using binary classifiers, a dual-deep scheme [26] and a hybrid image feature scheme [27] are considered.

3 Methodology

This section describes the methodology of the proposed research. The necessary histology slides are collected from the benchmark dataset, and the considered images are then resized to RGB images of \(224 \times 224 \times 3\) pixels. Machine and deep features are extracted from these images to test the effectiveness of the developed scheme. The PSO algorithm is used to select the best features, which are then integrated into a hybrid feature vector used to verify the merits of the binary classifiers. The structure of the developed system is shown in Fig. 1, and Python is used to execute this procedure. Several measures, such as accuracy, precision, sensitivity, specificity, and F1-score, are used to confirm the effectiveness of the strategy.

Fig. 1 Structure of the developed BC detection scheme

3.1 Histology Image Collection

The necessary benign/malignant class histology test pictures are initially collected and then resized to \(224 \times 224 \times 3\) pixels. The histology images available on “Kaggle” [28] consist of a large set of histology patches (277,524) of dimension \(50 \times 50 \times 3\) pixels. To train and test the proposed scheme, 6000 patches (3000 benign and 3000 malignant) are considered, as presented in Table 2. Figure 2 depicts sample test images of this study.

Table 2 Test images used in this research
Fig. 2 Sample test pictures of benign/malignant group

3.2 Pre-trained Deep-Learning Procedures

In recent years, DLP-supported schemes have been widely employed in data analytics tasks due to their superiority and adaptability. In the medical domain, DLP-based disease examination schemes are implemented to examine bio-signals and bio-images of varied modality. Compared to other procedures, these schemes help achieve better disease detection, and most of them can be implemented on a variety of software and hardware platforms. Even though custom models exist, pre-trained models are widely employed to examine a variety of medical data, and earlier works confirm their clinical significance [29, 30].

In this work, well-known DLPs, namely VGG16, VGG19, ResNet18, ResNet50, and ResNet101, are considered to examine the breast histology slices. For each selected DLP, the following settings are assigned: 100 epochs, maximizing accuracy, ADAM optimizer, depth 8, average pooling, SoftMax classifier, and fivefold cross-validation. The merit of each DLP is independently tested during execution with the deep-feature vector described in Eq. (1):

$$ Feature1_{deep\,(1 \times 1 \times 1000)} = DF_{(1,1)} ,DF_{(1,2)} ,...,DF_{(1,1000)} . $$
(1)

Equation (1) confirms that every DLP of this scheme is capable of offering a one-dimensional (1D) feature vector of size \(1 \times 1 \times 1000\); these features are then used to verify the performance with the SoftMax classifier.

3.3 Machine-Learning Features

Machine learning-based medical data assessment is a well-established procedure in the literature, and it helps to detect diseases with better accuracy. The outcome of this procedure depends mainly on the selected features. In the proposed work, entropy (E) features (Kapur, Vajda, Max, Fuzzy, Kittler, and Renyi) are computed from the vertical and horizontal sections of the image, giving the 12 entropy features of Eq. (2); the necessary information about these features is found in the literature [31,32,33]. Along with the entropies, DWT features with dimension \(1 \times 1 \times 240\) are also extracted from the histology images of the benign/malignant classes. The DWT-treated images are presented in Fig. 3: Fig. 3a presents the DWT for a benign class image and Fig. 3b presents the outcome for a malignant image. The features collected from these images are presented in Eqs. (2) and (3).

$$ Entropy_{(1 \times 1 \times 12)} = E_{(1,1)} ,E_{(1,2)} ,...,E_{(1,12)} , $$
(2)
$$ DWT_{ \, (1 \times 1 \times 240)} = DWT_{(1,1)} ,DWT_{(1,2)} ,...,DWT_{(1,240)} . $$
(3)
Fig. 3 Test image processed using DWT
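The machine-feature mining described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: only two entropy measures are shown (Shannon as a generic stand-in, and Renyi of order 2), the "horizontal" and "vertical" sections are taken to be the middle row and middle column of a gray-level image, and a one-level Haar transform stands in for the DWT. All function names and the toy image are hypothetical.

```python
import math

def histogram_probs(pixels, levels=256):
    """Non-zero probabilities of the gray-level histogram of integer pixels."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return [c / len(pixels) for c in counts if c > 0]

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs)

def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1) in bits."""
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def haar_dwt_1d(signal):
    """One-level Haar DWT of an even-length signal: (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail

def section_features(gray_image):
    """Entropies of the middle row and middle column (assumed 'sections')."""
    row = gray_image[len(gray_image) // 2]        # horizontal section
    col = [r[len(r) // 2] for r in gray_image]    # vertical section
    feats = []
    for section in (row, col):
        probs = histogram_probs(section)
        feats += [shannon_entropy(probs), renyi_entropy(probs)]
    return feats

# Toy 4x4 gray image (hypothetical values, not taken from the dataset)
image = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [10, 10, 10, 10],
         [0, 255, 0, 255]]
entropy_features = section_features(image)
approx, detail = haar_dwt_1d(image[0])
```

In practice, a two-dimensional wavelet decomposition from a dedicated library would normally replace the one-dimensional Haar step, and each of the six entropy measures would contribute one value per section.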

3.4 Feature Optimization with PSO

The performance of a computerized disease detection technique relies on the features used to classify the images with a chosen classifier. To avoid over-fitting and to discard poor features, it is necessary to employ a feature selection procedure. Traditional feature selection involves complex mathematical operations (e.g., Student’s t-test); hence, heuristic algorithm (HA)-based selection is executed to select the deep and machine features of this research. The concept of HA optimization is simple, and a considerable number of earlier research works are available in the literature [34, 35]. In the proposed work, feature selection is achieved using the traditional PSO algorithm [36,37,38]. The necessary information and the mathematical expressions for this algorithm can be found in the literature. In this work, the PSO parameters are assigned as follows: number of agents = 30, objective value = Cartesian distance among features, maximum iteration (\(Iter_{\max }\)) = 2500, and stopping criterion = \(Iter_{\max }\). Other relevant information about this procedure can be found in [39,40,41,42].

The proposed technique selects 411 deep features (\(1 \times 1 \times 411\)) and 87 machine features (Entropy + DWT). These are combined to form a new hybrid feature vector of 498 features, as shown in Eq. (4):

$$ Hybrid_{features\,(1 \times 1 \times 498)} = Optimal(Deep + Entropy + DWT). $$
(4)

With the help of this feature vector, the proposed system was trained and verified. Additionally, a duo-deep scheme is presented to assess the BC’s detection accuracy.
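The PSO-based selection described in this subsection can be illustrated with a binary PSO sketch. The exact objective is only stated as a "Cartesian distance among features", so the fitness below is an assumption: the Euclidean distance between the benign and malignant class centroids over the selected features, with a small hypothetical penalty per selected feature. The inertia and acceleration constants, the sigmoid transfer function, and all names are likewise illustrative choices, not the settings of this work.

```python
import math
import random

def fitness(mask, benign, malignant):
    """Centroid distance over selected features minus a size penalty (assumed)."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return float("-inf")
    dist_sq = 0.0
    for j in idx:
        mean_b = sum(row[j] for row in benign) / len(benign)
        mean_m = sum(row[j] for row in malignant) / len(malignant)
        dist_sq += (mean_b - mean_m) ** 2
    return math.sqrt(dist_sq) - 0.01 * len(idx)

def binary_pso(benign, malignant, n_agents=30, max_iter=2500, seed=0):
    """Return a boolean mask marking the selected feature columns."""
    rng = random.Random(seed)
    dim = len(benign[0])
    pos = [[rng.random() < 0.5 for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p, benign, malignant) for p in pos]
    g = max(range(n_agents), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(max_iter):                     # stop at Iter_max
        for i in range(n_agents):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (0.7 * vel[i][j]
                     + 1.5 * r1 * (best_pos[i][j] - pos[i][j])
                     + 1.5 * r2 * (g_pos[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                # Sigmoid transfer: velocity sets the probability of keeping j
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fitness(pos[i], benign, malignant)
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return g_pos

def apply_mask(row, mask):
    """Keep only the selected entries of one feature vector."""
    return [x for x, keep in zip(row, mask) if keep]
```

A hybrid vector in the sense of Eq. (4) would then be obtained by concatenating `apply_mask(deep_row, deep_mask)` with `apply_mask(machine_row, machine_mask)`, where both masks come from separate or joint runs of the selector.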

3.5 Performance Examination

To assess the merit of this BC detection scheme, classifiers are applied to the features extracted from the images. First, the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are computed for each classifier, and the other measures are derived from these values. The best classifier is identified by comparing the results of SoftMax, decision tree (DT), k-nearest neighbor (KNN), Naive Bayes (NB), random forest (RF), and support vector machine (SVM) with a linear kernel. The performance measures are given in Eqs. (5), (6), (7), and (8) [11, 43, 44]:

$$ Accuracy = AC = \frac{TP + TN}{{TP + TN + FP + FN}}, $$
(5)
$$ Sensitivity = SE = \frac{TP}{{TP + FN}}, $$
(6)
$$ Specificity = SP = \frac{TN}{{TN + FP}}, $$
(7)
$$ F1 - Score = FS = \frac{2TP}{{2TP + FN + FP}}. $$
(8)
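As a worked illustration of Eqs. (5) to (8), the short sketch below computes the four measures from confusion-matrix counts. The function name and the counts in the example are hypothetical and are not results of this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and F1-score as in Eqs. (5)-(8)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1_score": 2 * tp / (2 * tp + fn + fp),
    }

# Hypothetical counts for 200 test images (100 per class)
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

With these counts, the accuracy is 170/200 = 0.85, the sensitivity is 90/100 = 0.90, the specificity is 80/100 = 0.80, and the F1-score is 180/210, which is approximately 0.857.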

4 Results and Discussion

Experimental validation of the proposed scheme has been carried out on a workstation equipped with an Intel i5 processor, 12 GB of RAM, and 4 GB of video memory using Python software.

In the proposed experiment, the VGG16-supported method is implemented first, and the histology images are classified using SoftMax. In this work, 80% of the data are used for training, 10% for testing, and the remaining 10% for validation. The other DLPs of this study are then considered in the same way, and fivefold cross-validation is executed in each case. The results in Table 3 confirm that the ResNet18-based technique achieves better accuracy than the other methods. Hence, ResNet18 is considered the best scheme for the chosen database, and the duo-deep feature-based classification is implemented by combining the ResNet18 features with the features of the other DLPs. During this task, a 50% dropout is implemented for each procedure before the duo-deep-based classification is employed. The essential information about the duo-deep method can be found in [26]. The outcomes are shown in Table 4; this procedure achieved a maximum accuracy of 92.17% with ResNet18 and ResNet50 features. Finally, the classification is performed using the PSO-optimized hybrid image features, and the achieved results are presented and discussed.

Table 3 Classification achieved using the deep-features and SoftMax classifier
Table 4 Classification performed using duo-deep features and SoftMax classifier
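The fivefold cross-validation mentioned above can be sketched as a class-balanced split of the 6000 patches. This is an illustrative sketch only: the function name, the shuffling seed, and the round-robin fold assignment are assumptions, and the way each held-out fold would be divided further into test and validation parts is not specified here.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class balance per fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)      # deal indices round-robin to folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# 3000 benign (label 0) and 3000 malignant (label 1) patches, as in Table 2
labels = [0] * 3000 + [1] * 3000
splits = list(stratified_kfold(labels, k=5))
```

Each of the five splits then contains 4800 training indices and 1200 held-out indices, with 600 held-out patches per class.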

During the hybrid image feature-based classification with ResNet18, binary classifiers such as SoftMax, DT, KNN, NB, RF, and SVM are implemented and the results are verified. Initially, the SoftMax classifier performance is verified using fivefold cross-validation and the obtained results are presented. Figure 4 presents the convergence of the DLP learning for both the training and testing processes; Fig. 4a, b shows the achieved accuracy and loss values, respectively.

Fig. 4 Convergence of the training and validation process with hybrid features

Figure 5 presents the outputs of various convolutional layers, in which Fig. 5a–e shows the results from layer1 to layer5. The achieved confusion matrix (CM) and the area under curve (AUC) are shown in Fig. 6: Fig. 6a shows the CM and Fig. 6b presents the AUC, with a value of 0.924. A similar task is executed with the other classifiers, and the achieved results are reported in Table 5. This table confirms that the KNN classifier provides a classification accuracy of > 94%, which is superior to the other results presented in this work.

Fig. 5 Results achieved from various intermediate layers of the ResNet18

Fig. 6 Confusion matrix and the AUC achieved using the hybrid features and SoftMax

Table 5 Classification result with PSO-selected hybrid features and binary classifiers
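For readers unfamiliar with the KNN classifier reported in Table 5, the following minimal sketch shows majority voting among the nearest training vectors. It is not the implementation used in this work: the value of k, the Euclidean distance, and the two-dimensional toy vectors are assumptions, whereas the real inputs would be the 498-dimensional hybrid feature vectors.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label of the majority among the k training vectors nearest to query."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors (hypothetical values)
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["benign"] * 3 + ["malignant"] * 3
prediction_a = knn_predict(train_x, train_y, [0.2, 0.1])
prediction_b = knn_predict(train_x, train_y, [5.5, 5.2])
```

In this toy example the first query lies next to the benign cluster and the second next to the malignant cluster, so the two predictions are "benign" and "malignant", respectively.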

KNN’s accuracy is then compared with the accuracies of the other methods listed in Table 1 to verify its merit (Fig. 7). Compared with similar works in the literature, this comparison confirms the superiority of the proposed scheme. Accordingly, the proposed scheme has clinical significance and can be considered for the evaluation of clinically collected histology slides in the future. In this work, machine learning features were considered and PSO-based optimization was implemented. In the future, when an ensemble of features is used, the results can be compared with those of the state-of-the-art methods available in the literature.

Fig. 7 Comparison of detection accuracy with other existing methods

The proposed work is tested and verified using a public dataset, and the results confirm that the proposed technique delivers an acceptable result (accuracy > 94%). In the future, the detection performance of the proposed scheme can be improved through hyperparameter tuning and/or by using an ensemble of deep-features to classify the chosen images. Further, the merit of this scheme can be tested and verified using real clinical data.

5 Conclusion

Many women suffer from BC, and early detection and treatment can help reduce its impact. To plan and execute treatment, it is indispensable to distinguish benign from malignant cases, and this work uses histopathology image classification to do so. Different features obtained through the DLP and machine-feature mining are used to classify the test images into benign/malignant classes. The results of a fivefold cross-validation experiment confirm that ResNet18 with hybrid image features achieves a disease classification accuracy of > 94%. The results confirm the effectiveness of the presented work in detecting cancer.