Automatic Screening System to Distinguish Benign/Malignant Breast-Cancer Histology Images Using Optimized Deep and Handcrafted Features

Breast cancer (BC) incidence among women has been increasing for a variety of reasons, and prompt detection and management are essential to reducing mortality. In clinical-level BC screening, a needle-biopsy sample is used to generate Breast Histology Images (BHIs), which are then examined to confirm the diagnosis. Using a novel Deep-Learning Plan (DLP), the proposed work classifies BHIs accurately and confirms the severity of BC. The proposed DLP implementation involves four phases: (i) collection and enhancement of images, (ii) extraction of features, (iii) reduction and integration of features, and (iv) binary classification and validation. Deep features and machine features are optimized using particle swarm optimization (PSO). To evaluate the performance of the proposed scheme, we compare the results obtained with individual deep features, dual deep features, and hybrid features. With the hybrid image features, ResNet18 with a k-nearest neighbor (KNN) classifier provides superior classification accuracy (> 94%).


Introduction
The incidence of infectious and acute diseases is steadily increasing throughout the world, and appropriate clinical procedures are crucial to detecting and treating these diseases at an early stage. Untreated diseases are likely to lead to a number of complications, including death, and may also place a substantial burden on the healthcare system. The severity of acute diseases, which may include death, is greater than that of infectious diseases. Recent studies have demonstrated that cancer is a severe acute disease that is responsible for a large number of deaths worldwide [1][2][3].
According to the World Health Organization (WHO), diseases such as COVID-19 and cancer caused more deaths globally in 2020 than any other disease. Since 2020, many studies have been proposed and implemented worldwide with the objective of controlling and curing COVID-19 and cancer [4,5].
Globally, approximately 10 million people died in 2020 from cancers affecting various internal and external body organs. The leading causes of cancer death were lung cancer (1.80 million deaths), colon cancer (916,000 deaths), liver cancer (830,000 deaths), stomach cancer (769,000 deaths), and breast cancer (685,000 deaths). Based on registered clinical reports, the same report lists 2.26 million new cases of breast cancer, 2.21 million of lung cancer, 1.93 million of colon cancer, 1.41 million of prostate cancer, 1.20 million of skin cancer, and 1.09 million of stomach cancer. In accordance with the WHO's disease prediction, BC is a top priority, and a number of awareness programs have been launched throughout the world to raise awareness of BC symptoms, screening procedures, and treatment options.
Early detection of BC is essential for reducing the associated treatment burden, and the detection process includes [6,7]:
• Initial verification of the suspicious breast section by self-examination or by an experienced doctor,
• Non-invasive image-supported screening, and
• Needle biopsy-based sample collection and examination to confirm the severity of the BC.
To confirm the stage of the BC, a biopsy must be examined using an appropriate methodology, which is an essential step in planning and implementing treatment. The histology slide is prepared by treating the biopsy sample with a staining agent of choice, and the slide is then photographed using a digital microscope. To detect the severity and stage of the cancer, these images are examined by a knowledgeable clinician or by a computer algorithm. Because of its importance, many research studies on the examination of breast histology slides have been published in the literature [8][9][10].
The authors of these studies give considerable attention to machine-learning (ML) and deep-learning plan (DLP)-based analysis of histology slides. Earlier studies have demonstrated, however, that DLP-based methods are more effective than conventional or ML methods for detecting benign and malignant BC from histology images. Therefore, researchers have proposed several DLPs to determine the BC stage from histology slides. ML-based approaches are simple but need a chosen procedure to extract the necessary features from the considered image database. When a DLP is executed, it provides a feature vector of size 1 × 1 × 1000, which is then used to detect the disease from the chosen medical images [10]. Hence, along with the ML schemes, the chosen DLP with SoftMax and other classifiers is also considered to detect the disease with better accuracy.
The proposed work implements a clinically significant DLP to examine histology slides of benign and malignant tissues. The proposed system comprises five phases: (i) data collection, (ii) deep-feature mining (DFM) utilizing pretrained DLPs, (iii) machine-feature mining (MFM) utilizing entropy and discrete wavelet transform (DWT), (iv) selection of optimal features with PSO, and (v) validation of the proposed scheme with both individual and dual features. Fivefold cross-validation is performed to demonstrate the merits of the proposed scheme. VGG16, VGG19, ResNet18, ResNet50, and ResNet101 are employed as the pretrained deep-learning methods. Using k-nearest neighbor (KNN), implemented in Python, the proposed scheme achieves a classification accuracy of > 94%. The results are then compared and verified.
The key contributions of the proposed work include:
i. Particle swarm optimization (PSO)-based feature optimization is used to avoid overfitting issues,
ii. Cancer detection accuracy is improved through deep features and machine-learning features,
iii. The performance of the chosen pretrained DLPs is verified with various features on the chosen histology database.
Earlier related research is described in Sect. 2, the methodology is presented in Sect. 3, and the outcome and conclusions are presented in Sects. 4 and 5.

Literature Review
Because of its importance, many research works have been proposed to examine BC using bio-medical images collected with different modalities. Ultrasound images [11], magnetic resonance angiograms (MRA) [12], thermal images [13], and histology pictures are widely examined using chosen computerized methods. Compared to other methods, examination of breast histology slides is essential to confirm the stage and the severity of the disease in a patient.
This section summarizes the works related to examining breast histology pictures using a chosen computerized scheme, and it also presents the merits and limitations of the earlier works.
Barsha et al. presented a DLP-based breast cancer examination using DenseNet121 and DenseNet169 and achieved a detection accuracy of 92.70% and an F1-score of 95.70% [15]. The implementation of a DLP scheme to detect BC is presented in the recent work of Krithiga and Geetha [16], in which the segmentation and classification of breast histology images are presented separately. This work gives a clear summary of the classification works existing in the literature, and this summary is presented in Table 1.
The results of the studies discussed above confirm that earlier DLP-based works provide a superior level of detection accuracy when examining breast histology slides. The purpose of this study is to develop a novel DLP-based procedure that achieves a higher detection accuracy than the methods currently available. To achieve better BC detection accuracy using binary classifiers, a dual-deep scheme [26] and a hybrid image feature scheme [27] are considered.

Methodology
This section presents the methodology of the proposed research. The necessary histology slides are collected from the benchmark dataset, and the considered images are then resized to RGB images of 224 × 224 × 3 pixels. Using appropriate procedures, machine and deep features are extracted from these images to test the effectiveness of the developed scheme. The PSO algorithm is used to select the best features. These features are then integrated into a hybrid feature vector, which is used to verify the binary classifier's merits. The structure of the developed system is shown in Fig. 1, and Python is used to execute this procedure. Several measures, such as accuracy, precision, sensitivity, specificity, and F1 score, are used to confirm the effectiveness of the strategy.

Histology Image Collection
The necessary benign/malignant class histology test pictures are initially collected and then resized to 224 × 224 × 3 pixels. The histology database available in "Kaggle" [28] consists of a large number of histology patches (277,524) of dimension 50 × 50 × 3 pixels. To train and test the proposed scheme, 6000 patches (3000 benign and 3000 malignant) are considered, as presented in Table 2. Figure 2 depicts sample test images of this study.
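The resizing step from 50 × 50 × 3 to 224 × 224 × 3 pixels can be illustrated with a minimal sketch. The interpolation method is not stated in this work, so nearest-neighbour sampling is assumed here purely for illustration; the function name `resize_nearest` and the synthetic patch are hypothetical.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize an H x W x C image (nested lists) with nearest-neighbour sampling.

    Assumption: the paper does not name its interpolation method; nearest
    neighbour is used here only because it is the simplest to show.
    """
    in_h, in_w = len(image), len(image[0])
    resized = []
    for y in range(out_h):
        # Map each output row/column back to the closest source row/column.
        src_y = min(in_h - 1, y * in_h // out_h)
        row = [image[src_y][min(in_w - 1, x * in_w // out_w)] for x in range(out_w)]
        resized.append(row)
    return resized


# Hypothetical 50 x 50 x 3 patch filled with a simple colour gradient.
patch = [[(x, y, (x + y) % 256) for x in range(50)] for y in range(50)]
resized = resize_nearest(patch)
print(len(resized), len(resized[0]), len(resized[0][0]))
```

In practice an image library would normally perform this step with a smoother interpolation; the sketch only shows how the patch dimensions change.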

Pre-trained Deep-Learning Procedures
In recent years, DLP-supported schemes have been widely employed in data analytics tasks because of their superiority and adaptability. In the medical domain, DLP-based disease examination schemes are implemented to examine bio-signals and bio-images of varied modality. Compared to other procedures, these schemes help to achieve better disease detection, and most of them can be implemented on a variety of software and hardware platforms.
Even though custom models exist, pretrained models are widely employed to examine a variety of medical data, and earlier works confirm their clinical significance [29,30].
In this work, well-known DLPs, such as VGG16, VGG19, ResNet18, ResNet50, and ResNet101, are considered to examine the breast histology slices. For the selected DLPs, the following parameters are assigned: 100 epochs, maximizing accuracy, ADAM optimizer, depth 8, average pooling, SoftMax classifier, and fivefold cross-validation. The merit of each DLP is independently tested during execution with the deep-feature vector described in Eq. (1). Equation (1) confirms that every DLP of this scheme is capable of offering a one-dimensional (1D) feature vector of size 1 × 1 × 1000, and these features are then used to verify the performance with the SoftMax classifier.

Machine-Learning Features
Machine learning-based medical data assessment is a common procedure in the literature, and it helps to detect diseases with better accuracy. The outcome of this procedure depends mainly on the selected features. In the proposed work, the necessary features, such as entropies (E) (Kapur, Vajda, Max, Fuzzy, Kittler, and Renyi), are computed from the vertical and horizontal sections of the image, and the necessary information about these features can be found in the literature [31][32][33]. Along with the entropies, DWT features with dimension 1 × 1 × 240 are also extracted from the histology images of the benign/malignant classes. The DWT-treated images are presented in Fig. 3: Fig. 3a presents the DWT for a benign class image and Fig. 3b presents the outcome for a malignant image. The features collected from these images are presented in Eqs. (2) and (3).
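As an illustration of how an entropy feature can be computed from one section of an image, the following sketch evaluates the Renyi entropy of a line of gray levels. The exact formulations used in this work are given in [31][32][33]; the order α = 2, the function names, and the sample row are assumptions made for illustration, and the Shannon entropy is included only as a familiar reference point (it is not one of the entropies listed above).

```python
import math
from collections import Counter


def histogram_probs(pixels):
    """Return the normalised gray-level histogram of a sequence of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return [count / total for count in counts.values()]


def shannon_entropy(probs):
    """Shannon entropy in bits (reference only, not listed in the paper)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1), in bits."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Hypothetical horizontal section (one row) of a grayscale image.
row = [0, 0, 64, 64, 128, 128, 255, 255]
probs = histogram_probs(row)
print(shannon_entropy(probs), renyi_entropy(probs, alpha=2.0))
```

For a uniform histogram such as this one, both entropies equal the logarithm of the number of distinct gray levels, which is a convenient sanity check.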

Feature Optimization with PSO
The performance of a computerized disease detection technique relies on the features used to classify the images with a chosen classifier. To avoid the overfitting issue and to discard poor features, it is necessary to employ a feature selection procedure. The traditional feature selection process involves complex mathematical operations (Student's t-test); hence, heuristic algorithm (HA)-based selection is executed to select the deep and machine features of this research. The concept of HA optimization is simple, and a considerable number of earlier research works are available in the literature [34,35]. In the proposed work, feature selection is achieved using the traditional PSO algorithm [36][37][38]. The necessary information and the mathematical expressions for this algorithm can be found in the literature. In this work, the PSO parameters are assigned as follows: number of agents = 30, objective value = Cartesian distance among features, maximum iterations (Iter_max) = 2500, and stopping criterion = Iter_max.
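A minimal sketch of PSO-based feature selection is given below to make the idea concrete. It is not the implementation used in this work: the binary (sigmoid-transfer) PSO variant, the fitness function (Euclidean distance between class-mean vectors over the selected features, minus a small penalty per selected feature), the coefficients `w`, `c1`, `c2`, the reduced iteration count, and the toy class means are all assumptions chosen for illustration. Only the number of agents (30) follows the setting stated above.

```python
import math
import random


def fitness(mask, mean_a, mean_b, penalty=0.05):
    """Distance between class means over selected features, minus a size penalty.

    Assumption: this stands in for the 'Cartesian distance among features'
    objective; the penalty term is hypothetical and discourages keeping
    features that do not separate the classes.
    """
    dist_sq = sum((a - b) ** 2 for m, a, b in zip(mask, mean_a, mean_b) if m)
    return math.sqrt(dist_sq) - penalty * sum(mask)


def binary_pso(mean_a, mean_b, n_agents=30, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each agent is a 0/1 mask over the feature indices."""
    rng = random.Random(seed)
    dim = len(mean_a)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, mean_a, mean_b) for p in pos]
    best = max(range(n_agents), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for i in range(n_agents):
            for d in range(dim):
                # Standard velocity update pulled toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))
                # Sigmoid transfer turns the velocity into a selection probability.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], mean_a, mean_b)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical class-mean vectors: features 0, 1, 4, 5 differ between classes.
mean_benign = [0.9, 0.1, 0.5, 0.5, 0.8, 0.2, 0.5, 0.5]
mean_malignant = [0.1, 0.9, 0.5, 0.5, 0.2, 0.8, 0.5, 0.5]
mask, score = binary_pso(mean_benign, mean_malignant)
selected = [i for i, bit in enumerate(mask) if bit]
print(selected, round(score, 3))
```

The returned mask marks the retained features; applied to a real feature matrix, the same mask would be used to keep the corresponding columns before classification.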
The proposed technique helps to select 411 deep features (1 × 1 × 411) and 87 machine features (Entropy + DWT). These are united to form a new hybrid feature vector, as shown in Eq. (4). With the help of this feature vector, the proposed system is trained and verified. Additionally, a duo-deep scheme is presented to assess the BC detection accuracy.
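The size of the hybrid vector follows directly from serial concatenation of the two selected sets. The short sketch below assumes plain concatenation (deep features first, then machine features); the helper name `build_hybrid_vector` and the zero-valued placeholder features are hypothetical.

```python
def build_hybrid_vector(deep_features, machine_features):
    """Serially concatenate selected deep and machine features into one vector."""
    return list(deep_features) + list(machine_features)


# Placeholder values standing in for the PSO-selected features of one image.
deep_selected = [0.0] * 411     # 1 x 1 x 411 deep features
machine_selected = [0.0] * 87   # 87 entropy + DWT features
hybrid = build_hybrid_vector(deep_selected, machine_selected)
print(len(hybrid))  # 411 + 87 = 498
```

Under this assumption, each image is represented by a 1 × 1 × 498 hybrid vector at the classifier input.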

Performance Examination
To determine the advantages of this BC detection scheme, the performance of the classifiers is evaluated on the features extracted from the images. In the first step, the primary measures, namely true positives, false positives, true negatives, and false negatives, are computed for each classifier, and the other measures are derived from these values. The best classifier is determined by comparing the results of the various classifiers, such as SoftMax, decision tree (DT), KNN, Naive Bayes (NB), random forest (RF), and support vector machine (SVM) with linear kernel. The measures used for evaluating performance are presented in Eqs. (5), (6), (7), and (8) [11,43,44].
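The derived measures named in this work (accuracy, precision, sensitivity, specificity, and F1 score) can be computed from the four primary counts using their standard definitions, as sketched below. The function name and the counts in the example are hypothetical and are not results of this study.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "sensitivity": safe_div(tp, tp + fn),   # also called recall
        "specificity": safe_div(tn, tn + fp),
        "f1_score": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Toy counts for illustration only (100 test samples).
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```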

Results and Discussion
Experimental validation of the proposed scheme was carried out using Python on a workstation equipped with an Intel i5 processor, 12 GB of RAM, and 4 GB of video memory.
In the proposed experiment, the VGG16-supported method is implemented first, and the classification of the histology images is achieved using SoftMax. In this work, 80% of the data are used for training, 10% for testing, and the remaining 10% for validation, and fivefold cross-validation is executed. The other DLPs of this study are then considered, and the achieved results are presented. The results in Table 3 confirm that the ResNet18-based technique achieves better accuracy than the other methods. Hence, ResNet18 is considered the best scheme for the chosen database, and the duo-deep feature-based classification is implemented by combining the ResNet18 features with the features of the other DLPs. During this task, a 50% dropout is applied to each procedure before the duo-deep-based classification is employed. The essential information about the duo-deep method can be found in [26]. The outcomes of the duo-deep-based classification are shown in Table 4; this procedure achieves a maximum accuracy of 92.17% with ResNet18 and ResNet50 features. Finally, the classification is performed using the PSO-optimized hybrid image features, and the achieved results are presented and discussed.
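The fivefold cross-validation mentioned above can be sketched as follows for the 6000 patches considered in this study. The shuffling seed, the function name `k_fold_indices`, and the absence of class stratification are assumptions; the exact splitting procedure used in the experiments is not described.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(6000, k=5))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(fold_no, len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging a measure over the five folds uses every image once for testing.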
During the hybrid image feature-based classification with ResNet18, binary classifiers such as SoftMax, DT, KNN, NB, RF, and SVM are implemented and the results are verified. Initially, the SoftMax classifier performance is verified using fivefold cross-validation and the obtained results are presented. Figure 4 presents the convergence of the DLP learning for both the training and testing processes; Fig. 4a, b shows the achieved accuracy and loss values, respectively. Figure 5 presents the outputs of various convolutional layers, in which Fig. 5a-e shows the results from layer 1 to layer 5. The achieved confusion matrix (CM) and the area under the curve (AUC) are shown in Fig. 6: Fig. 6a shows the CM and Fig. 6b presents the AUC of value 0.924. A similar task is executed with the other classifiers, and the achieved results are given in Table 5. This table confirms that the KNN classifier provides a classification accuracy of > 94%, which is superior to the other results presented in this work.
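To illustrate how a KNN classifier assigns a class to a feature vector, a minimal sketch is given below. The value k = 3, the Euclidean distance, the function name `knn_predict`, and the two-dimensional toy features are assumptions; the KNN settings used in this work are not reported.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training samples by Euclidean distance to the query vector.
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors standing in for hybrid features.
train_x = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
           [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]

print(knn_predict(train_x, train_y, [0.2, 0.2]))
print(knn_predict(train_x, train_y, [0.8, 0.8]))
```

With the hybrid features of this study, the same procedure would operate on the PSO-selected vectors instead of the toy points.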
The accuracy of KNN is then compared with the accuracies listed in Table 1 to verify its merit (Fig. 7). Compared with similar works presented in the literature, this comparison confirms the superiority of the proposed scheme. Accordingly, the proposed scheme has clinical significance, and it can be considered for the evaluation of clinically collected histology slides in the future. In this work, machine-learning features were considered and PSO-based optimization was implemented. In the future, when an ensemble of features is used, the results can be compared with those of the state-of-the-art methods available in the literature.
The proposed work is tested and verified using a public dataset, and the results confirm that the proposed technique provides an acceptable result (accuracy > 94%). In the future, the detection performance of the proposed scheme can be improved using hyperparameter tuning and/or by considering an ensemble of deep features to classify the chosen images. Further, the merit of this scheme can be tested and verified using real clinical data.

Conclusion
Many women suffer from BC, and early detection and treatment can help reduce its impact. To design and execute treatment, it is indispensable to distinguish benign from malignant cancer, and this work considers histopathology image classification to do so. Through the DLP, different features are used to classify the test images into benign/malignant classes. The results of the fivefold cross-validation experiment confirm that ResNet18 with hybrid image features achieves a disease classification accuracy of > 94%. The results confirm the effectiveness of the presented work in detecting cancer.
Author Contributions YY is the sole author and carried out the entire work.
Funding Not applicable.

Availability of Data and Materials
The data considered in this research work can be accessed from "https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images".

Declarations
Conflict of Interest The authors declare no conflict of interest.
Ethical Approval Not applicable.
Consent to Participate Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 1 Structure of the developed BC detection scheme

Fig. 2 Sample test pictures of benign/malignant group

Fig. 4 Convergence of the training and validation process with hybrid features

Fig. 5 Results achieved from various intermediate layers of the ResNet18

Fig. 6

Fig. 7 Comparison of detection accuracy with other existing methods

Table 1 Summary of chosen histopathology-based BC detection methods
et al. [19]: Fuzzy-c-means clustering for the detection of BC class; 88, 78, 80, 77
Pan et al. [20]: Deep CNN-based detection of the histology slides

Table 2 Test
Benign Malignant

Table 3