1 Introduction

Breast cancer is a serious threat to women's health and life, and studies identify it as one of the most common types of cancer in women worldwide [1]. Radiologists use microscopic images of the breast to identify signs of cancer at an early stage, because early detection and treatment improve the survival rate. Breast tumors fall into two categories: benign and malignant. Most benign tumors do not develop into breast cancer and are considered harmless, whereas malignant tumors are characterized by irregular growth and abnormal cell division.

Medical imaging examination is the most effective method to detect breast cancer tumors. Microscopic image classification has been developed for a variety of purposes, including diagnosing basic patient problems and understanding complex cell processes, and the classification of tissues in the images is critical. Due to the increasing number of mammograms produced by extensive screening, radiologists find it difficult to perform a proper manual evaluation, and manually classifying microscopic images is expensive and time-consuming. As a result, computer-aided diagnosis (CAD) systems were developed to detect breast cancer indicators and to enhance diagnostic accuracy. These systems make diagnosis easier for radiologists, who can use them as a second opinion [2].

The examination of medical images has become simpler and quicker with the help of artificial intelligence (AI) techniques such as traditional machine learning (ML) and, more recently, deep learning (DL). Several AI methods for identifying medical diseases have been proposed in recent years. Examples include breast cancer [3,4,5], brain tumors [6], intestinal and lung diseases [7], heart problems, skin disease, and eye abnormalities. Among modern ML techniques, DL based on convolutional neural networks (CNNs) has achieved the best performance for image classification in recent years [8]. Compared with manual diagnosis and traditional ML, DL-based diagnostic tools have many advantages: they are more accurate, more effective, faster, and easier to use than traditional classification methods.

An artificial neural network (ANN) contains multiple neurons that are connected to each other. The ANN architecture consists of layers of neurons: an input layer, hidden layers, and an output layer. The input layer contains the input vector. The hidden layers connect the input and output layers and may consist of one or more layers of neurons. The output layer contains the output classification vector.

The basic CNN architecture consists of several layers. The convolutional layer is the basic building block of a CNN. It has several filters whose parameters are learned during the training step, and each filter produces an activation map after convolution with the input. As the filter is moved over the image height and width, the products between the elements of the filter and the corresponding input elements are computed and summed at each spatial position. The pooling layer is responsible for reducing the number of parameters and the size of the activation maps. There are two types of pooling: average pooling and max pooling. The dropout layer removes some neurons from the network during the training process; the neurons are dropped at random with a specified probability. The fully connected (FC) layer flattens and combines the output from the previous layers into a single vector. The output of the convolutional layers represents high-level information, and adding an FC layer is an inexpensive way to learn nonlinear combinations of those features. FC layers are usually placed at the end of the CNN architecture.
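As an illustration of the convolution and max-pooling operations described above, the following is a minimal pure-Python sketch. It assumes a single-channel image stored as a list of rows, no padding, and a stride of 1; the function names `conv2d_valid` and `max_pool2d` are hypothetical and are not taken from any library or from the proposed framework.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    At each position the elementwise products are summed. As in most CNN
    libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
activation_map = conv2d_valid(image, kernel)  # 3 x 3 activation map
pooled = max_pool2d(activation_map)           # 1 x 1 after 2 x 2 pooling
```

In a trained CNN the kernel values are learned rather than fixed, and many such filters are applied in parallel to multi-channel inputs.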

Various methods have been developed to classify breast cancer lesions on microscopic images [9,10,11,12,13,14]. Deep convolutional neural networks have been used to overcome the accuracy limits of traditional machine learning techniques and have become one of the most advanced methods for classification [15]. For these reasons, this paper aims to propose a diagnostic tool based on DL approaches for the rapid and accurate detection of breast cancer in its early stages.

CNN systems perform well on large datasets, but they struggle to obtain substantial accuracy on small datasets. Transfer learning is used to exploit deep neural networks so as to enhance CNN performance on small datasets, reduce computing costs, and achieve high accuracy [16]. Combining several CNN structures has been introduced to enhance transfer learning performance and may eventually replace the use of a single CNN model. Accurate and fast image classification models have been developed by combining multiple pre-trained networks. This study aims to provide a diagnostic tool, based on deep learning techniques, that distinguishes images showing a benign tumor from those showing a malignant tumor.

Past methods have some shortcomings, such as low accuracy and large model size. Low accuracy leads to errors in the classification process, and a large model slows down classification. We therefore propose a framework for classifying breast cancer tumors with high accuracy and a small feature size (a small framework size). High classification accuracy means a better model for cancer detection, and a small feature size means a shorter classification runtime. Transfer learning, the combination of features extracted from multiple CNN architectures, and feature selection methods are used to overcome the shortcomings of existing systems for cancer tumor detection and classification.

We can summarize the main contributions of this study as follows:

  1. Investigation and comparison of nine pre-trained deep CNN models with various architectures; these CNNs differ in their main building block and in the number of convolutional layers.

  2. Extraction of the features from each model and examination of them to determine which model has the highest impact on performance.

  3. Concatenation of the extracted features from different CNNs to improve the classification accuracy.

  4. Application of a feature selection method to reduce the large size of the integrated features.

This paper is organized as follows: The paper's methodology is presented in Sect. 2. The experimental results of the technique are discussed in Sect. 3. Finally, Sect. 4 concludes the paper.

2 Methods

In this paper, we propose a framework using nine different deep CNN architectures: VGG16, InceptionV3, ResNet50, Xception, InceptionResNetV2, DenseNet201, MobileNetV2, ResNet101, and ResNet152. These CNNs were previously trained on the ImageNet dataset [17] and are used here for breast cancer tumor detection and classification in histopathology images. The suggested model concatenates various low-level features separately extracted from different CNN architectures. The extracted features are then fed into a fully connected layer to classify the benign and malignant tumors. A feature selection technique is also applied to reduce the dimension of the features after the concatenation step. The proposed system consists of six modules: image pre-processing, transfer learning, feature extraction, feature concatenation, feature selection, and classification. The block diagram of the suggested framework is shown in Fig. 1.

Fig. 1 Block diagram of the proposed framework system

2.1 Image pre-processing stage

In this step, the size of cancer images is modified to be equal to the size of the input layer of each deep CNN architecture used in this paper. Next, the number of images in the dataset is increased by using the image augmentation technique. The augmentation technique is used to avoid the possibility of over-fitting issues that might occur due to the small dataset in the training step [18]. During the training stage, the images are augmented using several methods that include flipping the images in vertical and horizontal directions, translation, scaling, and rotation.
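The augmentation operations listed above can be sketched on a toy image stored as a list of pixel rows. This is an illustration only: the rotation angle (90 degrees), the translation offset, and the function names are assumptions, since the text does not specify them, and scaling is omitted for brevity.

```python
def flip_horizontal(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate the image 90 degrees clockwise (angle is an assumption)."""
    return [list(col) for col in zip(*img[::-1])]


def translate_right(img, dx, fill=0):
    """Shift pixels right by dx >= 0 columns, padding with `fill`."""
    shifted = []
    for row in img:
        d = min(dx, len(row))
        shifted.append([fill] * d + row[:len(row) - d])
    return shifted


def augment(img):
    """Return the original image plus four augmented variants."""
    return [
        img,
        flip_horizontal(img),
        flip_vertical(img),
        rotate90(img),
        translate_right(img, 1),
    ]


variants = augment([[1, 2], [3, 4]])
```

In practice these transformations are usually applied on the fly during training with randomly sampled parameters, so that each epoch sees slightly different versions of the same image.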

2.2 Transfer learning stage

Training a model from scratch to high accuracy requires a large amount of data, but obtaining a large dataset for a specific problem can be difficult. Transfer learning (TL) was introduced to address this. The CNN model is first trained on a task using a large image dataset related to that task and is then transferred to the desired task using a small dataset [19]. Reusing CNNs that were previously trained on large datasets for comparable classification problems helps to avoid over-fitting when the target dataset contains only a small number of images.

In this stage, transfer learning is utilized to modify the previously trained CNNs using the ImageNet dataset to classify breast cancer tumors. Next, instead of using the 1000 classes of ImageNet, the output layers of these pre-trained CNNs are changed to only 2 classes, which is equal to the number of classes in the cancer dataset (benign and malignant).

2.3 Feature extraction stage

Three different types of layers are used to construct DCNNs: a convolutional layer, a pooling layer, and a fully connected (FC) layer. The fully connected layer is used for classification, whereas convolution layers are used for feature extraction.

In this stage, by using transfer learning, the spatial deep features are extracted from the last average pooling layer of each deep CNN architecture. Table 1 lists the size of the features extracted from each CNN architecture. The number of extracted features is 512 for VGG16; 2048 for InceptionV3, ResNet50, ResNet101, ResNet152, and Xception; 1536 for InceptionResNetV2; 1920 for DenseNet201; and 1280 for MobileNetV2.

Table 1 The size of features for different CNN Architectures

2.4 Feature concatenation stage

In this stage, the features extracted from different deep CNN architectures are concatenated to integrate the benefits of the individual models and improve the classification accuracy. Several feature set combinations are evaluated at this stage to find the combination of integrated features that gives the best classification performance.
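The concatenation step can be sketched as joining, for each image, the feature vectors produced by the selected models. The sketch below uses the per-model feature sizes reported for Table 1 and the composition of feature sets 2 and 5 given in the results; the helper names `concatenated_size` and `concatenate_features` are hypothetical.

```python
# Feature sizes per architecture, as reported for Table 1.
FEATURE_SIZES = {
    "VGG16": 512,
    "InceptionV3": 2048,
    "ResNet50": 2048,
    "ResNet101": 2048,
    "ResNet152": 2048,
    "Xception": 2048,
    "InceptionResNetV2": 1536,
    "DenseNet201": 1920,
    "MobileNetV2": 1280,
}


def concatenated_size(model_names):
    """Length of the fused feature vector for the given models."""
    return sum(FEATURE_SIZES[name] for name in model_names)


def concatenate_features(per_model_features, model_names):
    """Join per-image feature vectors from several models.

    `per_model_features` maps a model name to a list of feature vectors,
    one per image, with the same image order for every model.
    """
    n_images = len(per_model_features[model_names[0]])
    fused = []
    for i in range(n_images):
        row = []
        for name in model_names:
            row.extend(per_model_features[name][i])
        fused.append(row)
    return fused


feature_set_2 = ["DenseNet201", "ResNet152"]
feature_set_5 = ["DenseNet201", "ResNet152", "ResNet101",
                 "ResNet50", "InceptionV3"]
size_2 = concatenated_size(feature_set_2)  # 1920 + 2048
size_5 = concatenated_size(feature_set_5)  # 1920 + 4 * 2048
```

The size computed for feature set 5 matches the 10,112 features reported in the results of scheme 3.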

2.5 Feature selection stage

Table 1 shows that the deep feature sizes obtained from each CNN architecture are large, and concatenating the extracted features makes them larger still. It is therefore necessary to reduce the size of the model features. At this stage, the subset of concatenated features that gives the best classification performance is explored. The process of selecting the most relevant features to improve model performance is known as feature selection, also referred to as attribute selection.

Chi-square (χ2) is a statistical method used here to reduce the large set of extracted features to a smaller number that still retains the majority of its information. The relationship of each feature to the target is analyzed using a chi-square test, and the required number of features with the highest chi-square scores is selected. The number of features retained is the one that gives the highest test accuracy. The feature selection technique is applied to each feature set combination.
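A minimal sketch of chi-square scoring and top-k selection is given below. The text does not state which chi-square formulation is used, so this sketch assumes the common count-style statistic for non-negative feature values: for each feature, the observed per-class sum is compared with the sum expected if the feature were independent of the class. The names `chi2_scores` and `select_top_k` are hypothetical.

```python
def chi2_scores(X, y):
    """Chi-square score of each feature column against the class labels.

    X: list of samples, each a list of non-negative feature values.
    y: list of class labels, one per sample.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    class_prob = {
        c: sum(1 for label in y if label == c) / n_samples for c in classes
    }
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:  # an all-zero feature carries no information
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k features with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: feature 1 separates the classes best, feature 2 not at all.
X = [[1, 0, 3], [2, 0, 3], [0, 4, 3], [0, 5, 3]]
y = [0, 0, 1, 1]
scores = chi2_scores(X, y)
kept = select_top_k(scores, 2)
```

Features whose values are distributed across the classes in proportion to the class frequencies receive a score near zero and are discarded first.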

2.6 Classification stage

Several classification strategies can be used to classify tumors as benign or malignant, including artificial neural networks (ANN), support vector machines (SVM), and decision trees (DT) [20, 21]. In this work, nine deep CNN architectures are used to classify breast cancer tumors in the first scheme. In the other schemes, the classification is performed by a support vector machine classifier with different kernels: linear (L-SVM), Gaussian (G-SVM), polynomial (P-SVM), and sigmoid (S-SVM).

The SVM is a supervised learning technique that divides data into categories. The goal of the SVM classifier is to classify data efficiently by constructing hyperplanes [22]. Two groups of data points can be separated by many possible hyperplanes, but the one with the largest margin is the best. The margin is the largest width to which the boundary can expand before it touches data points from either group. The support vectors are the data points that define the hyperplane [23].
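The four kernel types named above can be written as simple functions of two feature vectors. The sketch below uses the standard textbook forms; the parameter names and default values (`gamma`, `coef0`, `degree`) are assumptions for illustration, since the kernel hyperparameters are not stated in the text.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def gaussian_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=0.01, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(u, v)
k_gauss_same = gaussian_kernel(u, u)
k_poly = polynomial_kernel(u, v, degree=2)
k_sigmoid = sigmoid_kernel(u, v)
```

With the linear kernel the decision boundary is a hyperplane in the original feature space, while the other three kernels implicitly map the features to a space where a separating hyperplane corresponds to a nonlinear boundary.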

3 Results and discussion

3.1 Dataset description

The Breast Cancer Histopathological images (BreakHis) dataset [24] is used to evaluate the suggested framework. The dataset includes 7909 histopathology images of breast cancer at four magnification factors (40×, 100×, 200×, and 400×). It comprises 5429 malignant and 2480 benign samples gathered from patients. In this work, the dataset is divided into two parts: 80% of the images (6328 images) form the training set, and the remaining 20% (1581 images) form the testing set. Example images of malignant and benign tumors at various magnification factors are shown in Figs. 2 and 3, respectively.
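The split sizes quoted above can be reproduced with a simple shuffled index split. This is a sketch under assumptions: the text does not state how the images were assigned to each part, so a seeded random shuffle is used here, and the training share is rounded up because that reproduces the reported 6328/1581 counts. The function name `split_indices` is hypothetical.

```python
import math
import random

N_MALIGNANT = 5429
N_BENIGN = 2480


def split_indices(n_total, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into train and test parts."""
    n_train = math.ceil(train_fraction * n_total)  # rounding rule assumed
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


n_total = N_MALIGNANT + N_BENIGN
train_idx, test_idx = split_indices(n_total)
```

Because the two classes are imbalanced (roughly 69% malignant), a stratified split that preserves the class ratio in both parts would be a reasonable alternative.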

Fig. 2 Malignant tumor at various magnification factors

Fig. 3 Benign tumor at various magnification factors

3.2 Results

This section presents the results of the four schemes used in the proposed framework:

  1. In the first scheme, the transfer learning method is applied to all models used. This allows the DCNN architectures to learn generic characteristics from other image datasets without having to train the models from scratch.

  2. In the second scheme, the deep features of all models are extracted from the last layer of each model. The extracted features are then used separately to train and test the support vector machine classifier using different kernels.

  3. In the third scheme, the extracted features of all models are concatenated in different combination sets.

  4. In the fourth scheme, the feature selection method is applied to the concatenated features. This reduces the dimensions of the features after the concatenation step and, consequently, the computational cost and the execution time.

3.2.1 Results of scheme 1

The accuracy results for different DCNN architectures when applying the transfer learning method are shown in Table 2. These architectures include VGG16, InceptionV3, ResNet50, ResNet101, ResNet152, Xception, InceptionResNetV2, DenseNet201 and MobileNetV2.

Table 2 The accuracy for different CNN architectures

The results in Table 2 show that the classification accuracy ranges from 90.43 to 98.35%. The highest accuracy of 98.35% is obtained using the DenseNet201 architecture. This is followed by the ResNet152, ResNet101, ResNet50, and Xception architectures, which achieve accuracies of 97.46, 97.34, 97.27, and 97.15%, respectively. The InceptionV3, InceptionResNetV2, and VGG16 architectures then give accuracies of 96.39, 96.13, and 95.31%, respectively. The MobileNetV2 architecture has the lowest accuracy of 90.43%.

3.2.2 Results of scheme 2

In this scheme, the features of each DCNN architecture are extracted, and the size of the features is shown in Table 1. The extracted features are used to train and test the SVM classifier separately, using different kernel functions, such as Linear, Gaussian, Polynomial, and Sigmoid kernels. The accuracy of CNN architectures when using different SVM kernel functions is shown in Table 3.

Table 3 The accuracy of CNN architectures using different SVM kernel functions

As shown by the results presented in Table 3, the classification accuracy increases when using the SVM classifier, and the accuracy ranges from 90.87 to 98.67%. The highest accuracy is obtained with the DenseNet201 features, which achieve accuracies of 98.67, 98.54, 96.32, and 98.43% using the linear, Gaussian, polynomial, and sigmoid kernels, respectively. The linear kernel achieves the highest accuracy among the SVM kernels.

3.2.3 Results of scheme 3

In this section, the extracted features from the models are concatenated using different combination sets, and the combination of integrated features that gives the best classification performance is investigated. The combination starts with the feature type that gave the highest classification accuracy in the previous step. More feature sets are then added one at a time, and the accuracy of each new combination is tested. If the accuracy increases, the added features are kept; otherwise, they are discarded.
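The accept-if-improved rule described above can be sketched as a greedy loop. The function `evaluate` is a hypothetical stand-in for training and testing the SVM on the concatenated features of the given models; the toy scoring used in the example is invented purely to exercise the loop and does not correspond to any reported result.

```python
def greedy_feature_sets(ranked_models, evaluate):
    """Grow a model combination, keeping an addition only if accuracy rises.

    ranked_models: model names ordered from best to worst single-model
        accuracy, so the first entry seeds the combination.
    evaluate: callable taking a list of model names and returning the
        accuracy obtained with their concatenated features (hypothetical).
    """
    selected = [ranked_models[0]]
    best = evaluate(selected)
    for name in ranked_models[1:]:
        candidate = selected + [name]
        score = evaluate(candidate)
        if score > best:  # keep the added features only on improvement
            selected, best = candidate, score
    return selected, best


# Toy example with invented integer "gains" per model.
toy_gain = {"A": 90, "B": 5, "C": -1, "D": 2}
chosen, toy_score = greedy_feature_sets(
    ["A", "B", "C", "D"],
    lambda models: sum(toy_gain[m] for m in models),
)
```

A greedy search of this kind evaluates only one candidate per added model, which keeps the number of SVM training runs linear in the number of architectures rather than exponential.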

The first feature set contains the features of a single model, DenseNet201. Each subsequent feature set consists of the previous feature set plus the features of one more model. The details of the feature set combinations are shown in Table 4. Feature set 1 contains the features of the DenseNet201 model. Feature set 2 consists of the features of DenseNet201 and ResNet152. Feature set 3 consists of the features of DenseNet201, ResNet152, and ResNet101, and so on for the remaining feature sets. The accuracy of each feature set is shown in Fig. 4.

Table 4 The details of feature set combinations

Fig. 4 The accuracy of different feature set combinations using different SVM kernel functions

These results show that feature set 5 achieves the highest accuracy among all feature sets, 99.24%, with the linear SVM classifier. Feature set 5 has a size of 10,112 features and contains the concatenated features of the DenseNet201, ResNet152, ResNet101, ResNet50, and InceptionV3 networks.

3.2.4 Results of scheme 4

This section presents the classification results for the Chi-Square feature selection method, which is used to reduce the feature dimension of the feature sets that achieved the highest performance in the previous results. The number of features that produces the highest accuracy is determined using a sequential forward technique. This technique starts with 10 features and adds more features iteratively.
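The search for the number of selected features can be sketched as a loop over increasing values of k. Only the starting value of 10 comes from the text; the step size, the stopping point, and the callable `evaluate_top_k` (a stand-in for training and testing the linear SVM on the k highest-scoring features) are assumptions for illustration.

```python
def search_feature_count(evaluate_top_k, n_features, start=10, step=10):
    """Return the smallest k that reaches the best accuracy found.

    evaluate_top_k: callable mapping k to the accuracy obtained with the
        k highest-scoring features (hypothetical stand-in).
    n_features: total number of features in the concatenated set.
    """
    best_k, best_acc = None, float("-inf")
    for k in range(start, n_features + 1, step):
        acc = evaluate_top_k(k)
        if acc > best_acc:  # strict test keeps the smallest such k
            best_k, best_acc = k, acc
    return best_k, best_acc


# Toy accuracy curve that saturates at k = 40 (invented for illustration).
toy_k, toy_acc = search_feature_count(lambda k: min(k, 40), n_features=100)
```

Using a strict comparison means that, among several feature counts with equal accuracy, the smallest one is reported, which matches the goal of reducing the feature size without losing accuracy.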

The Chi-Square feature selection approach is applied to the feature sets that achieve the highest accuracy, namely feature sets 4, 5, 6, and 7. Table 5 shows the classification accuracy, the number of features, and the percentage of reduction obtained with the linear SVM classifier after applying the Chi-Square feature selection approach. The linear SVM is chosen because it has the highest classification performance among the SVM kernels.

Table 5 The classification results after the features selection approach

The results in Table 5 show that the selection approach retains only 3070, 4830, 5009, and 5930 features for feature sets 4 to 7, respectively, while achieving the same classification accuracy as in Scheme 3. Before selection, these feature sets contain 8064, 10,112, 12,160, and 12,672 features, respectively (Table 4). The corresponding reduction percentages are 62, 52, 59, and 53%.
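The reduction percentages follow directly from the feature counts before and after selection. The short check below recomputes them from the numbers quoted above; the variable and function names are illustrative only.

```python
# Feature counts for feature sets 4 to 7, before and after selection.
ORIGINAL = {4: 8064, 5: 10112, 6: 12160, 7: 12672}
SELECTED = {4: 3070, 5: 4830, 6: 5009, 7: 5930}


def reduction_percent(original, selected):
    """Share of features removed, rounded to the nearest whole percent."""
    return round(100 * (1 - selected / original))


reductions = {
    fs: reduction_percent(ORIGINAL[fs], SELECTED[fs]) for fs in ORIGINAL
}
```

The recomputed values agree with the reported reductions of 62, 52, 59, and 53% for feature sets 4 to 7.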

3.3 Discussion of the results

This paper proposed a framework for classifying breast cancer tumors with high accuracy and a small feature size based on deep learning techniques. It consists of four stages that search for the configuration giving the best performance. In the first stage, we investigated the behavior of the different models after applying the transfer learning method to each model. The accuracy obtained from these models is presented in Table 2 and ranges between 90.43 and 98.35%. The DenseNet201 model has the highest accuracy, and MobileNetV2 has the lowest. In the second stage, the features extracted from each model are used individually with an SVM classifier and different kernel functions: linear, Gaussian, polynomial, and sigmoid. The accuracy of the different SVM kernels ranges between 90.87 and 98.67%, as shown in Table 3.

In the third stage, the features extracted from the CNNs were integrated to improve the classification accuracy of the framework. Nine feature sets were produced, each containing a different combination of features, as listed in Table 4. Figure 4 shows that the classification accuracy increases with the size of the deep features up to feature set 5, which achieved the highest accuracy of 99.24%. The linear SVM kernel achieved the highest accuracy in the presented results. The complexity and computational time of the model increase with the size of the combined features. Therefore, in the fourth stage, the size of the integrated features is reduced to improve the framework performance and to reduce the computational time and complexity of the model. A feature selection method is applied to feature sets 4 to 7. As shown in Table 5, the feature size is reduced while the classification accuracy stays the same.

3.4 Comparison with related studies

Finally, to assess the proposed framework, its result is compared with those of related studies. The comparison is shown in Table 6. The accuracy achieved by the proposed framework is 99.24%, which is higher than the accuracy of the other related methods, which ranges from 90.91 to 98.7%. These results indicate that the suggested framework outperforms the related methodologies. Given this performance, the framework can be used as a diagnostic tool to help pathologists classify breast cancer more accurately.

Table 6 Comparison between the proposed framework and other methods

4 Conclusion

This paper presented a computer-aided system based on deep learning techniques for classifying breast cancer tumors on histopathological images. The proposed framework contains six modules: image pre-processing, transfer learning, feature extraction, feature concatenation, feature selection, and classification. By comparing the concatenated deep features with the individual features, the study examined whether deep feature concatenation could enhance the classification accuracy. Deep feature concatenation improved the performance of the SVM classifier. The Chi-Square feature selection method was then applied to reduce the dimensions of the concatenated features. The results confirmed that the feature selection method succeeded in reducing the dimensions of the features while maintaining the same classification accuracy of 99.24%. The performance of the proposed framework was compared with various related classification methods, and the results verified the efficiency of the suggested framework. The proposed framework can therefore serve as a reliable diagnostic tool that helps pathologists classify breast cancer tumors with high accuracy.