1 Introduction

Breast cancer affects a large number of women worldwide every year and is a leading cause of death among women. According to the World Health Organization (WHO), breast cancer was the most common type of cancer worldwide in 2020 (Organization WH 2022), as indicated in Fig. 1. Survival rates vary considerably among countries, from 80% in North America and 60% in Japan and Sweden to 40% in low-income nations (Masud et al. 2020). The incidence rates shown in Fig. 2 and the mortality rates shown in Fig. 3 differ per country, depending on many circumstances including the environment, the availability of modern medical care, socioeconomic levels, and so on (Francies et al. 2020). As shown in Fig. 4, mortality rates in nations with a larger low- to middle-income population are rising every year due to the inability to obtain adequate resources. Several affluent countries, such as Australia, are also seeing an increase in the number of cases. Therefore, raising awareness about breast cancer and encouraging women to be screened is critical, because early detection and diagnosis can save lives (Zuluaga-Gomez et al. 2021). As shown in Fig. 5, Egypt had one of the highest numbers of new breast cancer cases in Africa in 2020, and breast cancer was the second most prevalent malignancy in Egypt in 2020, as shown in Fig. 6.

Fig. 1

Approximate number of new cases of different cancer types in 2020 (Organization WH 2022)

Fig. 2

Approximate number of new breast cancer cases by continent in 2020 (Organization WH 2022)

Fig. 3

Approximate number of breast cancer deaths by continent in 2020 (Organization WH 2022)

Fig. 4

Approximate number of breast cancer deaths by income level in 2020 (Organization WH 2022)

Fig. 5

Approximate number of new breast cancer cases by country in Africa in 2020 (Organization WH 2022)

Fig. 6

Approximate number of new cancer cases by type in Egypt in 2020 (Organization WH 2022)

Breast cancer is a disease in which the cells of the breast multiply uncontrollably (For Disease Control 2022). The main elements of the breast are the (1) ducts, (2) lobules, and (3) connective tissues. A duct is a tube that transports breast milk to the nipple. Lobules are the glands responsible for producing milk. Connective tissues consist of fibrous and fatty tissues and hold all the components of the breast together (Lawrence 2022). Breast cancer most commonly arises in the lobules or the ducts of the breast (Charishma et al. 2020). It begins in the breast tissue and, like other malignant tumors, can invade and spread to the tissues around the breast. It may also propagate to other organs of the body, leading to the formation of additional tumors, a process called metastasis (Clinic 2022). It is vital to keep in mind that the majority of breast lumps are not cancerous (i.e., they are benign). Benign breast tumors are abnormal masses that remain localized in the breast and are rarely dangerous; however, some of them do increase a woman's risk of developing breast cancer (Society 2022).

Symptoms of breast cancer vary from patient to patient, and many people notice no indicators at all (Melekoodappattu et al. 2022). The most obvious sign of breast cancer is a new tumor or mass in the breast tissue (Bakker et al. 2019); a lump in the breast or armpit is the most prevalent symptom. Skin changes, soreness, a nipple that pulls inward, and unusual discharge from the nipple are among the other symptoms (Benson et al. 2020). The risk of developing breast cancer rises with age: every year, more than 80% of women diagnosed with breast cancer are above 45 years old, and around 43% are 65 years old or older (Duffy et al. 2020).

Mammography is the gold standard for routine screening. After the screening data are collected, it is critical to examine them and deliver a diagnosis as accurately and quickly as possible (Zuluaga-Gomez et al. 2021; Melekoodappattu and Subbian 2020). Experts use mammographic and ultrasound images to discover malignancies, which requires specialist radiologists (Indra and Manikandan 2021). The most commonly used characteristics, such as shape, texture, and density, are hand-crafted according to the experience of the physician, that is, they are subjective. Although this traditional diagnosis approach is widely utilized, its accuracy can still be improved (Wang et al. 2019). Consequently, computer-aided diagnosis (CAD) systems are now widely employed to assist radiologists in making decisions when diagnosing malignancies (Ahmed et al. 2020). CAD systems can reduce radiologists’ workload and decrease the number of false-positive and false-negative diagnoses (Elter and Horsch 2009).

With deep learning’s exceptional performance in detecting and recognizing visual objects, among other applications, deep learning techniques that help radiologists interpret mammographic scans more accurately have attracted considerable interest (Kim et al. 2018; Hamidinekoo et al. 2018). According to recent research, deep learning-based CAD systems perform comparably to radiologists in standalone mode and even improve radiologists’ performance in assisted mode (Shen et al. 2019). The Convolutional Neural Network (CNN) is a deep learning algorithm frequently applied to challenging problems. It is a representation learning algorithm that can automatically extract meaningful information from the original image without manually designed feature descriptors (Khan et al. 2020). It thus overcomes a drawback of classic machine learning techniques: traditional machine learning algorithms require feature extraction, which needs the assistance of a domain expert (Zheng et al. 2014), and choosing the right features for a specific situation is a daunting task. Deep learning solves the feature selection problem by automatically extracting relevant features from the original input without pre-selected features (Indolia et al. 2018). Due to recent performance improvements in image segmentation, detection, and classification, CNNs have been successfully applied to medical imaging challenges (Sohail et al. 2021).

In the present work, a novel hybrid framework for the segmentation and classification of breast cancer images is proposed. The framework is composed of two phases, namely the classification phase and the segmentation phase. In the classification phase, the model classifies breast images into two categories (i.e., benign or malignant). To train the framework, four different datasets representing different modalities (i.e., MRI, mammographic, ultrasound images, and histopathology slides) are used. This variety of data ensures that the model can be used with multiple image types. Each of these datasets is divided into a different number of classes; hence, the framework can perform both binary and multi-class classification. Five pre-trained CNN architectures, namely MobileNet, MobileNetV2, NasNetMobile, VGG16, and VGG19, are used in the classification phase. To refine the performance of the different models, the Aquila Optimizer (AO) is used to tune the hyperparameters of the different CNN architectures. In the segmentation phase, five different segmentation models, namely U-Net, Swin U-Net, Attention U-Net, U-Net++, and V-Net, are used to identify the region of interest in the ultrasound breast images.

1.1 Paper contributions

The key contributions of the current study are:

  • Proposing a novel hybrid framework for the classification and segmentation of breast cancer images.

  • Using four different datasets for training purposes.

  • Supporting MRI, mammographic, and ultrasound images as well as histopathology slides with the proposed model.

  • Using five pre-trained CNN architectures for breast image classification.

  • Using AO to tune the hyperparameters of the different CNN architectures.

  • Using five models for the segmentation of ultrasound breast cancer images.

1.2 Paper organization

The remainder of the article is divided into five sections. Section 2 surveys current studies on the use of CNNs for detecting breast cancer and on the different segmentation techniques. Section 3 presents background on the techniques used in the proposed framework. Section 4 explains the proposed framework for the classification and segmentation of breast cancer images in detail, while Sect. 5 gives the experimental results and their discussion. Section 6 presents the conclusion, the limitations of the current study, and directions for future work.

2 Related studies

This section presents a state-of-the-art survey of the use of CNNs in the diagnosis of breast cancer, followed by a survey of the different segmentation techniques applied to breast cancer images.

2.1 Related studies using CNN

Melekoodappattu et al. (2022) developed a system for diagnosing breast cancer using a CNN and image texture attribute extraction. They achieved accuracies of 98% and 97.9% on the MIAS and DDSM repositories, respectively. Wang et al. (2021) proposed a boosted EfficientNet CNN architecture for automatically detecting cancer cells in breast cancer pathology tissue as a solution to low image resolution. Sharma and Kumar (2021) created a deep learning system to identify breast cancer from histopathology images, using the DenseNet201 CNN model to extract features and classifying the images as malignant or benign. Salama and Aly (2021) used images from three different datasets, namely the Digital Database for Screening Mammography (DDSM), the Mammographic Image Analysis Society (MIAS) dataset, and the Curated Breast Imaging Subset of DDSM (CBIS-DDSM). These images were classified as benign or malignant using various models such as DenseNet121, InceptionV3, VGG16, ResNet50, and MobileNetV2; the best-achieved accuracy was 88.87% using InceptionV3 with data augmentation. Chorianopoulos et al. (2020) applied three CNN models, namely MobileNet, VGG16, and AlexNet, on two different datasets, i.e., ultrasound and histopathological images. The best accuracy on the ultrasound dataset was 96.82%, achieved by VGG16, while MobileNet achieved the best accuracy of 91.04% on the Invasive Ductal Carcinoma dataset. Hameed et al. (2020) used four distinct CNN models based on the pre-trained VGG16 and VGG19 structures, namely VGG16 fully trained, VGG16 fine-tuned, VGG19 fully trained, and VGG19 fine-tuned, to classify histopathological images of non-cancerous and cancerous breast tissue in their collected dataset. They found that the best accuracy was 92%, achieved by the fine-tuned VGG19 model.

Dabeer et al. (2019) used a CNN to classify breast cancer cells into benign and malignant classes, obtaining an accuracy of 99.86%. Alghodhaifi et al. (2019) experimented with two CNN models, one using depthwise separable convolution (IDCDNet) and one using standard convolution (IDCNet). Several activation functions were investigated, including Sigmoid, TanH, and ReLU; the best accuracy, 87.13%, was achieved by standard convolution (IDCNet) with the ReLU activation function. Saikia et al. (2019) compared multiple fine-tuned transfer learning classification approaches based on CNNs to diagnose cell samples. Their suggested method was examined on a dataset containing 212 images, of which 113 are malignant; this dataset was extended to 2120 images, of which 1130 are malignant. Four CNN architectures, namely ResNet50, VGG16, VGG19, and GoogLeNetV3, were used in training, and the fine-tuned GoogLeNetV3 achieved the best accuracy of 96.25%. Ismail et al. (2019) compared two deep learning models, namely VGG16 and ResNet50, for identifying breast cancer, applying them to the IRMA dataset to classify benign and malignant tumors. In terms of accuracy, VGG16 outperformed ResNet50 with a score of 94% compared to 91.7%. Mehra (2018) used three popular CNN models, namely ResNet50, VGG16, and VGG19, for both full training and transfer learning to classify histological images of breast cancer; the best accuracy was 92.60%, achieved by VGG16 with Logistic Regression (LR). Gao et al. (2018) proposed a Shallow-Deep CNN to classify patients as benign or cancerous from mammography images. The shallow CNN recovers the recombined images from low-energy images, while a deep CNN extracts the unique features from these images. Their best-achieved accuracy was 90%.

2.2 Related studies using segmentation

Salama and Aly (2021) used a modified U-Net model for segmenting the breast cancerous area from mammographic images. Their proposed model performs both segmentation and classification, achieving an accuracy of 98.87% using InceptionV3 plus a modified U-Net model with data augmentation. Byra et al. (2020) proposed a deep learning method using U-Net for breast mass segmentation in ultrasonography and achieved an overall accuracy of 97.6%. El Adoui et al. (2019) built two CNNs based on SegNet and U-Net for the automatic segmentation of breast tumors in dynamic contrast-enhanced magnetic resonance imaging. The SegNet architecture achieved a mean intersection over union of 68.88%, whereas the U-Net architecture achieved 76.14%. Li et al. (2019) used mass segmentation to enhance the accuracy of diagnosing breast cancer and lower the mortality rate, because a breast mass is one of the most characteristic markers for the diagnosis of breast cancer. In their experiments, they used U-Net, Attention U-Net, and DenseNet for segmentation and achieved accuracies of 74.37%, 74.83%, and 77.93%, respectively. Alom et al. (2018) applied the Recurrent Residual U-Net to nuclei segmentation from high-resolution histopathology images to extract the fine features of nuclear morphometrics, achieving a Dice coefficient of 92.15%. Dalmia et al. (2018) compared the accuracy of different algorithms, namely VGGNet, U-Net, and V-Net. They showed that a larger dataset combined with parameter adjustment allows a model to generalize more efficiently to previously unseen examples, resulting in better training and validation outcomes, and achieved accuracies of 81.6%, 99.5%, and 99.6% for the three models, respectively.

3 Background

This section presents background about the techniques used in the proposed framework.

3.1 Convolutional Neural Networks (CNN)

CNNs are a class of deep learning models designed for handling image data (Bingli et al. 2021). They are inspired by the visual cortex of animals (Jogin et al. 2018) and are designed to automatically and adaptively learn hierarchical and spatial features, from low-level to high-level patterns (Balaha et al. 2021).

3.1.1 CNN layers

A CNN is usually made up of three types of layers: convolution layers, pooling layers, and fully connected layers. The first two types (i.e., the convolutional and pooling layers) extract features, while the last type (i.e., the fully connected layer) maps the extracted features to the final output space (Balaha et al. 2021).
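
For illustration, the following is a minimal Keras sketch of this three-layer-type structure; the filter counts and input size are illustrative, not the tuned configuration described in Sect. 4.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A minimal CNN: convolution and pooling layers extract features,
# and the fully connected (Dense) layer maps them to the output space.
model = keras.Sequential([
    layers.Input(shape=(100, 100, 3)),              # RGB input image
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution layer
    layers.MaxPooling2D((2, 2)),                    # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),          # fully connected layer (2 classes)
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```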

3.1.2 Parameters optimization

The choice of the right optimization method and the efficient tuning of its hyperparameters strongly influence the training speed and the final performance of the learned model (Zhang et al. 2021). The current study uses the Adam, AdaGrad, NAdam, AdaDelta, AdaMax, RMSProp, and SGD optimizers. The Adaptive Moment Estimation (Adam) optimizer is effective when dealing with large problems containing many parameters (Kingma and Ba 2014). The AdaGrad optimizer adapts the learning rate per parameter, making smaller updates for parameters associated with frequently occurring features and larger updates for parameters associated with infrequent features (Luo et al. 2019). Nesterov Adaptive Momentum (NAdam) applies the momentum (velocity) step before computing the gradient (Dozat 2016). The AdaDelta optimizer extends AdaGrad in an attempt to reduce its aggressive, monotonically decreasing learning rate by restricting the window of accumulated past squared gradients instead of accumulating all of them (Dogo et al. 2018). The AdaMax optimizer is a variant of Adam based on the infinity norm (Vani and Rao 2019). The RMSProp optimizer was proposed around the same time as AdaDelta to address AdaGrad’s diminishing learning rate and is identical to AdaDelta’s first update vector (Wu et al. 2016). The Stochastic Gradient Descent (SGD) optimizer is an iterative technique for optimizing an objective function with suitable smoothness properties; it updates the parameters for each training example (Bottou 2012).
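
All seven optimizers are available in Keras; the following sketch shows how candidate instances can be prepared for the experiments (the learning rates shown are library defaults, not the tuned values of Sect. 5).

```python
from tensorflow.keras import optimizers

# The seven parameter optimizers compared in this study
# (hyperparameter values shown are Keras defaults, not tuned values).
candidates = {
    "Adam":     optimizers.Adam(learning_rate=0.001),
    "AdaGrad":  optimizers.Adagrad(learning_rate=0.001),
    "NAdam":    optimizers.Nadam(learning_rate=0.001),
    "AdaDelta": optimizers.Adadelta(learning_rate=0.001),
    "AdaMax":   optimizers.Adamax(learning_rate=0.001),
    "RMSProp":  optimizers.RMSprop(learning_rate=0.001),
    "SGD":      optimizers.SGD(learning_rate=0.01),  # nesterov=True gives SGD Nesterov
}
# Each candidate can be passed to model.compile(optimizer=...) in turn.
```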

3.2 Aquila Optimizer (AO)

The behavior of the Aquila (a genus of eagles) while capturing prey in the wild is the main inspiration for the AO algorithm. Accordingly, the optimization procedure of the AO algorithm is expressed in four methods. The first method selects the search area by high soaring with a vertical stoop. The second method explores within a divergent search area by contour flight with a short glide attack. The third method exploits within a convergent search area by low flight with a slow descent attack on slow prey. The fourth method consists of walking and grabbing the prey (AlRassas et al. 2021). The selection among the four methods is made based on specific parameters.

3.3 Image segmentation

Image segmentation is a class of digital image processing that splits an image into various parts on the basis of the image’s properties and qualities (Singh and Singh 2010). The fundamental reason for image segmentation is to simplify the image for ease of analysis (Norouzi et al. 2014). In diagnosing patients with cancer, the shape of the cancer cells is important in determining the severity of the disease. Image segmentation technologies have had a significant impact in this area, allowing cancer cells to be identified correctly and accurately (Senthil Kumar et al. 2019).

Segmentation using the U-Net model: The U-Net model was created for biomedical image segmentation (Ronneberger et al. 2015). The U-Net architecture is essentially an encoder network followed by a decoder network (Habijan et al. 2019).

Segmentation using the Swin U-Net model: The use of transformers (Vaswani et al. 2017) has extended from natural language processing (NLP) tasks to vision and segmentation tasks. Swin U-Net (Cao et al. 2021) embeds a pure transformer structure into the U-Net architecture for segmentation tasks.

Segmentation using the Attention U-Net model: The Attention U-Net is built upon the well-known U-Net. The network consists of a contracting path for extracting local features and an expanding path for resampling the image map using contextual information (Abraham and Khan 2019).

Segmentation using the U-Net++ model: U-Net++ is a general-purpose image segmentation architecture that tries to address the shortcomings of U-Net (Zhou et al. 2019). U-Net++ is made up of multiple U-Nets of different depths, with the decoders densely coupled at the same resolution via redesigned skip connections (Lu et al. 2021).

Segmentation using the V-Net model: The V-Net architecture consists of two main parts, i.e., left and right sections. The left section contains the compression path and is divided into stages that operate at different resolutions, each stage having one to three convolution layers. The right section decompresses the signal until its original size is reached (Abdollahi et al. 2020).
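
To make the encoder-decoder idea concrete, the following is a minimal one-level U-Net sketch in Keras; the depths and filter counts are illustrative, not the configurations evaluated later.

```python
from tensorflow.keras import layers, Model, Input

def tiny_unet(input_shape=(256, 256, 3)):
    """A minimal one-level U-Net sketch: an encoder path, a decoder path,
    and one skip connection between them (depths are illustrative)."""
    inputs = Input(shape=input_shape)
    # Encoder: extract local features and downsample
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    # Bottleneck
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
    # Decoder: upsample and fuse encoder features via the skip connection
    u1 = layers.UpSampling2D(2)(b)
    u1 = layers.Concatenate()([u1, c1])   # skip connection
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(u1)
    # One-channel sigmoid mask for binary segmentation
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)
    return Model(inputs, outputs)
```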

3.4 Performance metrics

All learning algorithms require metrics to evaluate performance (Balaha et al. 2021b). The most commonly used performance metrics are TN, TP, FN, FP, accuracy, recall (sensitivity), precision, F1-score, specificity, AUC (area under the curve), IoU coefficient, Dice coefficient, cosine similarity, hinge, and squared hinge (Balaha and Saafan 2021; Abdulazeem et al. 2021). TN (true negative) counts the correctly estimated negative values, where the actual class is false and the estimated class is also false. TP (true positive) counts the correctly estimated positive values, where the actual class is true and the estimated class is also true. FN (false negative) counts the cases where the actual class is true but the estimated class is false. FP (false positive) counts the cases where the actual class is false but the estimated class is true.

Accuracy is the ratio of correctly estimated observations to all observations. Precision is the ratio of correctly estimated positive observations to all estimated positive observations. Sensitivity (or recall) is the ratio of correctly estimated positive observations to all actual positive observations. F1-score is the harmonic mean of recall and precision. Specificity is the proportion of cases identified as negative among all the real negative cases. Area under the curve (AUC) is the area under the receiver operating characteristic (ROC) curve. Intersection over union (IoU) is also known as the Jaccard index. The Dice coefficient is a measure of the similarity of two objects. Cosine similarity measures the cosine of the angle between two vectors. Hinge loss is a loss function used for maximum-margin classification, and squared hinge loss is the square of the hinge loss.

4 Methodology

The suggested approach is trained on four different datasets from four different modalities for versatility. This is important to guarantee the robustness of the model for all types of images. Each dataset has a different number of classes; for this reason, the proposed framework can perform both binary and multi-class classification. The purpose of the current work is to suggest a novel hybrid framework for the classification and segmentation of breast cancer images. The phases of the proposed framework are presented in Fig. 7 and described in the next subsections.

Fig. 7

The hybrid framework for the classification and segmentation of breast cancer images

4.1 Datasets acquisition phase

In the current work, four different datasets with different modalities (i.e., MRI, mammographic, ultrasound images, and histopathology slides) are used to train the models. Magnetic Resonance Imaging (MRI) is recommended when soft tissue imaging is required, so it is used to expose lesioned regions (Yurttakal et al. 2020). Mammography is the most commonly used technique for breast cancer diagnosis; it is an accurate technique that uses low-dose X-rays to display the inner texture of the breast (Maitra et al. 2012). Ultrasound imaging is preferred due to many advantages, including low cost and acceptable accuracy (Feng et al. 2017). A biopsy, on the other hand, is the process of extracting a sample or portion of a mass in the human body, usually called the biopsy sample, for further examination (Preetha and Jinny 2021). Histopathology means the analysis of the biopsy sample by a specialist, usually called a pathologist; histopathology images are therefore microscopic images of the tissues of masses taken from the human body (Aswathy and Jagannath 2017).

The first dataset is “Breast Cancer Patients MRI’s” from Kaggle, which can be retrieved from https://www.kaggle.com/uzairkhan45/breast-cancer-patients-mris. It contains 1,480 MRI images classified into Healthy (Benign) and Sick (Malignant). The second dataset is the “Breast Cancer Dataset” from Kaggle, which can be retrieved from https://www.kaggle.com/anaselmasry/breast-cancer-dataset; it contains histopathology slides. The third dataset is “Dataset_BCD_mammography_images_out”, which can be retrieved from https://www.kaggle.com/anwarsalem/dataset-bcd-mammography-images-out; it consists of 8 classes representing the severity of breast cancer. The fourth dataset is the Breast Ultrasound Images Dataset (Al-Dhabyani et al. 2020), containing 780 ultrasound images of 600 female patients classified into normal, benign, and malignant; it can be downloaded from https://www.kaggle.com/aryashah2k/breast-ultrasound-images-dataset. Table 1 presents a summary of the datasets used in the current study, and samples from the acquired datasets are displayed in Fig. 8.
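
Assuming each downloaded archive is unpacked into one sub-folder per class (the local path below is hypothetical), the images can be loaded, for example, with TensorFlow's directory utility:

```python
import tensorflow as tf

# Hypothetical local path after downloading and extracting a Kaggle archive;
# each class (e.g., benign/, malignant/) sits in its own sub-folder.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/breast_mri",          # assumed extraction path
    label_mode="categorical",   # one-hot labels for binary/multi-class tasks
    image_size=(100, 100),      # classification input size (Sect. 4.2)
    batch_size=32,
)
print(train_ds.class_names)     # e.g., ['benign', 'malignant']
```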

Table 1 Summary of the datasets used in the current work
Fig. 8

Samples from the acquired datasets

4.2 Pre-processing phase

The images in the used datasets do not all have the same size; hence, they are resized to (100, 100, 3) for classification and (256, 256, 3) for segmentation in RGB mode. The present work uses four distinct scaling techniques: (1) normalization, (2) standardization, (3) min–max scaling, and (4) max–abs scaling. Their equations are shown in Eqs. 1 to 4, respectively, where \(X_{input}\) is the input image, \(X_{scaled}\) is the scaled output image, \(\mu\) is the mean of the input image, and \(\sigma\) is the standard deviation of the input image. The used datasets are imbalanced, as shown in Table 1. To overcome this problem, data balancing using a data augmentation approach is employed. The present work uses rotation, shifting, shearing, zooming, flipping, and brightness-changing augmentation techniques. Table 2 shows the different augmentation techniques and the corresponding configurations.

$$\begin{aligned} {X_{scaled}}= & {} \frac{X_{input}}{\max {(X)}} \end{aligned}$$
(1)
$$\begin{aligned} {X_{scaled}}= & {} \frac{X_{input}-\mu }{\sigma } \end{aligned}$$
(2)
$$\begin{aligned} {X_{scaled}}= & {} \frac{X_{input}-\min {(X)}}{\max {(X)}-\min {(X)}} \end{aligned}$$
(3)
$$\begin{aligned} {X_{scaled}}= & {} \frac{X_{input}}{|\max {(X)}|} \end{aligned}$$
(4)
Table 2 The different augmentation techniques and the corresponding configurations
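
A minimal NumPy sketch of the four scaling operations of Eqs. 1-4 follows, together with an illustrative Keras augmentation configuration; the augmentation ranges are placeholders, not the values of Table 2.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def normalize(x):        # Eq. 1: divide by the maximum value
    return x / np.max(x)

def standardize(x):      # Eq. 2: zero mean, unit standard deviation
    return (x - np.mean(x)) / np.std(x)

def min_max_scale(x):    # Eq. 3: rescale into [0, 1]
    return (x - np.min(x)) / (np.max(x) - np.min(x))

def max_abs_scale(x):    # Eq. 4: divide by the maximum absolute value
    return x / np.abs(x).max()

image = np.random.randint(0, 256, size=(100, 100, 3)).astype("float32")
scaled = {f.__name__: f(image)
          for f in (normalize, standardize, min_max_scale, max_abs_scale)}

# Data balancing via augmentation (rotation, shifting, shearing, zooming,
# flipping, brightness); the ranges below are illustrative placeholders.
augmenter = ImageDataGenerator(
    rotation_range=15, width_shift_range=0.1, height_shift_range=0.1,
    shear_range=0.1, zoom_range=0.1, horizontal_flip=True,
    brightness_range=(0.8, 1.2))
```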

4.3 Segmentation phase

Segmentation is important for labeling the area of the tumor (i.e., the region of interest) to facilitate the diagnosis for the physician. Hence, the first processing phase of the framework is segmentation. In this phase, five different segmentation models (i.e., U-Net (Ronneberger et al. 2015), Swin U-Net (Cao et al. 2021), Attention U-Net (Abraham and Khan 2019), U-Net++ (Zhou et al. 2019), and V-Net (Abdollahi et al. 2020)) are used to identify the region of interest in the ultrasound breast images.
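
Since keras-unet-collection is among the packages used in the experiments (Sect. 5), the segmentation models can be instantiated along the following lines; the keyword arguments follow the library's documented interface, but the filter settings here are illustrative assumptions rather than the tuned settings of Table 3.

```python
from keras_unet_collection import models

# Illustrative instantiation of two of the five models; filter sizes
# are placeholders, and a one-channel sigmoid head is used for the
# binary (tumor vs. background) masks of the ultrasound dataset.
unet = models.unet_2d(
    (256, 256, 3), filter_num=[32, 64, 128, 256], n_labels=1,
    output_activation="Sigmoid", batch_norm=True)

att_unet = models.att_unet_2d(
    (256, 256, 3), filter_num=[32, 64, 128, 256], n_labels=1,
    output_activation="Sigmoid", batch_norm=True)

# Analogous constructors exist for the remaining models:
# models.unet_plus_2d, models.swin_unet_2d, and models.vnet_2d.
```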

4.4 Classification and hyperparameters optimization phase

Classification of medical images into their correct classes helps physicians in their diagnosis. Hence, the second processing phase of the framework classifies breast images as either benign or malignant. Because medical data always suffer from scarcity, five different pre-trained architectures (i.e., MobileNet (Howard et al. 2017), MobileNetV2 (Sandler et al. 2018), NasNetMobile (Addagarla et al. 2020), VGG16 (Swasono et al. 2019), and VGG19 (Carvalho et al. 2017)) are used. As mentioned, different optimizers are applied to tune the parameters of the different CNN architectures (i.e., the Adam, AdaGrad, NAdam, AdaMax, AdaDelta, RMSProp, and SGD optimizers). The corresponding update equations are Eq. 5 for Adam, Eq. 6 for NAdam, Eq. 7 for AdaGrad, Eq. 8 for AdaDelta, Eq. 9 for AdaMax, Eq. 10 for RMSProp, and Eq. 11 for SGD.

$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \frac{\eta }{\nu + \epsilon } \times m_t \end{aligned}$$
(5)
$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \frac{\eta }{\nu + \epsilon } \times \left( \beta _1 \times m_t + \frac{\left( 1 - \beta _1\right) \times g_t}{1 - \beta _1^t}\right) \end{aligned}$$
(6)
$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \frac{\eta }{G_t + \epsilon } \odot g_t \end{aligned}$$
(7)
$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \frac{RMS[\Delta \theta ]_{t-1}}{RMS[g]_t} \times g_t \end{aligned}$$
(8)
$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \frac{\eta }{\max ({\beta \times \gamma _{x-1}}, g_t)} \times m_t \end{aligned}$$
(9)
$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \frac{\eta }{\sqrt{E[g^2]_t + \epsilon }} \times g_t \end{aligned}$$
(10)
$$\begin{aligned} \theta _{t+1}= & {} \theta _{t} - \eta \times \nabla _\theta \times J\left( \theta ; x^{(i)}; y^{(i)}\right) \end{aligned}$$
(11)

where \(m_t\) is the mean and \(\nu\) the uncentered variance of the gradients, \(\eta\) is the step size, \(\epsilon\) is a small quantity used to prevent division by zero, \(G_t \in \mathbb {R}^{(d \times d)}\) is a diagonal matrix, RMS is the root mean square, \(g_t\) is the gradient of the loss function, \(E[g^2]_t\) is the average of the squared gradients, \(\beta _1\) and \(\beta _2\) are hyperparameters, \(\gamma _{x-1}\) is the average of the squared past gradients, and \(x^{(i)}\) and \(y^{(i)}\) are an input–output pair.
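
For illustration, the following is a sketch of the transfer-learning setup for one of the five backbones (MobileNet shown; the other architectures are swapped in analogously, and the head size follows the number of classes in the target dataset).

```python
from tensorflow import keras
from tensorflow.keras import layers

# Transfer learning with one of the five pre-trained backbones:
# ImageNet weights, original classification head removed.
base = keras.applications.MobileNet(
    weights="imagenet", include_top=False, input_shape=(100, 100, 3))
base.trainable = False                      # keep pre-trained features fixed

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),  # 2 classes; e.g., 8 for the mammography dataset
])
model.compile(optimizer=keras.optimizers.Adam(),  # any optimizer from Eqs. 5-11
              loss="categorical_crossentropy", metrics=["accuracy"])
```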

For better selection of the different hyperparameters, AO is introduced in the learning phase. As mentioned in Sect. 3.2, it depends on four updating methods: expanded exploration \((X_1)\), narrowed exploration \((X_2)\), expanded exploitation \((X_3)\), and narrowed exploitation \((X_4)\).

For the expanded exploration \((X_1)\) method, Eq. 12 is used, where \(X_M(t)\), calculated using Eq. 13, is the average location of all Aquilas in the ongoing iteration, \(X_{best}(t)\) is the best location, t is the number of the ongoing iteration, T is the total count of iterations, N is the population size, and \(r_1\) is an arbitrary number in the range 0 to 1.

For the narrowed exploration \((X_2)\) method, Eq. 14 is used, where \(X_R(t)\) is the location of an arbitrarily chosen Aquila, D is the dimension size, and \(r_2\) is an arbitrary value in the range 0 to 1. LF(D) is the Levy flight, calculated as shown in Eqs. 15 and 16, where s and \(\beta\) are fixed at 0.01 and 1.5, respectively, and v and u are arbitrary numbers in the range 0 to 1. x and y represent the helix movement during the search and are calculated as shown in Eqs. 17 to 20, where \(r_3\) is the count of search cycles in the range 1 to 20, \(D_1\) consists of integers from 1 to D, and w is 0.005.

For the expanded exploitation \((X_3)\) method, Eq. 21 is used, where \(\alpha\) and \(\delta\) are adaptation parameters set to 0.1, LB and UB are the lower and upper limits of the problem, and \(r_4\) and \(r_5\) are arbitrary values in the range 0 to 1.

For the narrowed exploitation \((X_4)\) method, Eq. 22 is used, with QF(t) calculated using Eq. 23, \(G_1\) using Eq. 24, and \(G_2\) using Eq. 25, where \(r_6\), \(r_7\), and \(r_8\) are arbitrary values in the range 0 to 1, X(t) is the current location, QF(t) is the value of the quality function used to stabilize the search, \(G_1\) is an arbitrary value in the range − 1 to 1 expressing the Aquila’s movement during prey pursuit, and \(G_2\) decreases linearly from 2 to 0 and represents the flight slope when hunting prey.

$$\begin{aligned} X_1(t+1)= & {} X_{best}(t) \times \left( 1-\frac{t}{T}\right) +\left( X_M(t) - X_{best}(t) \times r_1\right) \end{aligned}$$
(12)
$$\begin{aligned} X_M(t)= & {} \frac{\sum _{i=1}^{N}{X_i(t)}}{N} \end{aligned}$$
(13)
$$\begin{aligned} X_2(t+1)= & {} X_{best}(t) \times LF(D) + X_R(t) + (y - x) \times r_2 \end{aligned}$$
(14)
$$\begin{aligned} LF(D)= & {} s \times \frac{u \times \sigma }{|v|^{\frac{1}{\beta }}} \end{aligned}$$
(15)
$$\begin{aligned} \sigma= & {} \frac{\Gamma (1+\beta ) \times \sin {\left( \frac{\pi \times \beta }{2}\right) }}{\Gamma \left( \frac{1+\beta }{2}\right) \times \beta \times 2^{\frac{\beta - 1}{2}}} \end{aligned}$$
(16)
$$\begin{aligned} x= & {} r \times sin(\theta ) \end{aligned}$$
(17)
$$\begin{aligned} y= & {} r \times cos(\theta ) \end{aligned}$$
(18)
$$\begin{aligned} r= & {} r_3 + 0.00565 \times D_1 \end{aligned}$$
(19)
$$\begin{aligned} \theta= & {} -w \times D_1 + 1.5 \times \pi \end{aligned}$$
(20)
$$\begin{aligned} X_3(t+1)= & {} \left( X_{best}(t) - X_M(t)\right) \times \alpha - r_4 + \left( (UB-LB) \times r_5 + LB\right) \times \delta \end{aligned}$$
(21)
$$\begin{aligned} X_4(t+1)= & {} QF(t) \times X_{best}(t) - \left( G_1 \times X(t) \times r_6\right) - G_2 \times LF(D) + r_7 \times G_1 \end{aligned}$$
(22)
$$\begin{aligned} QF(t)= & {} t^{\frac{2 \times rand - 1}{(1-T)^2}} \end{aligned}$$
(23)
$$\begin{aligned} G_1= & {} 2 \times r_8 - 1 \end{aligned}$$
(24)
$$\begin{aligned} G_2= & {} 2 \times \left( 1 - \frac{t}{T}\right) \end{aligned}$$
(25)
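
As a complement to Eqs. 12-25, the following is a compact NumPy sketch of the four AO position-update rules. It is our own simplified paraphrase, not the exact implementation used in the experiments: the helix term \((y - x) \times r_2\) of Eq. 14 is omitted for brevity, and the fitness evaluation that drives \(X_{best}\) is assumed to happen outside this function.

```python
import numpy as np
from math import gamma

def levy_flight(dim, beta=1.5, s=0.01):
    # Levy flight step (Eqs. 15-16), using the standard Mantegna scheme.
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)))
    u = np.random.randn(dim) * sigma
    v = np.random.randn(dim)
    return s * u / np.abs(v) ** (1 / beta)

def aquila_step(X, X_best, t, T, lb, ub):
    """One AO population update (t starts at 1); simplified from Eqs. 12-25."""
    N, D = X.shape
    X_mean = X.mean(axis=0)                       # Eq. 13
    G2 = 2 * (1 - t / T)                          # Eq. 25
    new = np.empty_like(X)
    for i in range(N):
        if t <= (2 / 3) * T:                      # exploration phase
            if np.random.rand() < 0.5:            # expanded exploration, Eq. 12
                new[i] = X_best * (1 - t / T) + (X_mean - X_best * np.random.rand())
            else:                                 # narrowed exploration, Eq. 14
                new[i] = X_best * levy_flight(D) + X[np.random.randint(N)]
        else:                                     # exploitation phase
            if np.random.rand() < 0.5:            # expanded exploitation, Eq. 21
                new[i] = ((X_best - X_mean) * 0.1 - np.random.rand()
                          + ((ub - lb) * np.random.rand() + lb) * 0.1)
            else:                                 # narrowed exploitation, Eqs. 22-24
                QF = t ** ((2 * np.random.rand() - 1) / (1 - T) ** 2)  # Eq. 23
                G1 = 2 * np.random.rand() - 1                          # Eq. 24
                new[i] = (QF * X_best - G1 * X[i] * np.random.rand()
                          - G2 * levy_flight(D) + np.random.rand() * G1)
    return np.clip(new, lb, ub)                   # keep candidates inside the bounds
```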

4.5 System evaluation phase

Different performance metrics are used in the current study, as mentioned in Sect. 3.4. The corresponding equations are: (1) accuracy (Eq. 26), (2) precision (Eq. 27), (3) recall (i.e., sensitivity) (Eq. 29), (4) specificity (Eq. 28), (5) F1-score (Eq. 30), (6) AUC, (7) IoU (Eq. 31), (8) Dice coefficient (Eq. 32), and (9) cosine similarity.

$$\begin{aligned} \text {Accuracy}= & {} \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(26)
$$\begin{aligned} \text {Precision}= & {} \frac{TP}{TP+FP} \end{aligned}$$
(27)
$$\begin{aligned} \text {Specificity}= & {} \frac{TN}{TN+FP} \end{aligned}$$
(28)
$$\begin{aligned} \text {Recall}= & {} \text {Sensitivity} = \frac{TP}{TP+FN} \end{aligned}$$
(29)
$$\begin{aligned} \text {F1-score}= & {} \frac{2 \times \text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$
(30)
$$\begin{aligned} \text {IoU}= & {} \frac{TP}{TP+FP+FN} \end{aligned}$$
(31)
$$\begin{aligned} \text {Dice}= & {} \frac{2 \times TP}{2 \times TP+FP+FN} \end{aligned}$$
(32)
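
A small self-contained sketch that evaluates Eqs. 26-32 from the four confusion-matrix counts follows; the counts in the example call are made up for illustration.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. 26-32 from confusion-matrix counts."""
    precision = tp / (tp + fp)                        # Eq. 27
    recall = tp / (tp + fn)                           # Eq. 29 (sensitivity)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. 26
        "precision": precision,
        "specificity": tn / (tn + fp),                # Eq. 28
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),  # Eq. 30
        "iou": tp / (tp + fp + fn),                   # Eq. 31 (Jaccard index)
        "dice": 2 * tp / (2 * tp + fp + fn),          # Eq. 32
    }

# Example with made-up counts:
print(confusion_metrics(tp=90, tn=85, fp=10, fn=5))
```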

4.6 Pseudocode of the proposed framework

The learning and optimization steps are repeated for a predefined number of iterations \(T_{max}\). After the learning iterations are executed, the best combined configuration is reported and can be used in any further systems. Algorithm 1 summarizes the introduced overall parameter learning and AO hyperparameter optimization approach.

Algorithm 1 The overall parameter learning and AO hyperparameter optimization approach
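
Since the original listing is reproduced as an image, the following Python-style paraphrase sketches the loop it describes; all helper names (e.g., build_pretrained_cnn, aquila_step_population) are ours and hypothetical, not identifiers from the paper.

```python
def optimize_framework(datasets, T_max, population_size):
    # Each individual encodes one combined configuration: a pre-trained
    # architecture, a parameter optimizer, a loss, and a scaling technique.
    population = init_random_configurations(population_size)    # hypothetical helper
    best_config, best_score = None, float("-inf")
    for t in range(1, T_max + 1):
        for config in population:
            model = build_pretrained_cnn(config)                # hypothetical helper
            score = train_and_validate(model, datasets, config) # hypothetical helper
            if score > best_score:
                best_config, best_score = config, score
        # Move the population with the AO update rules (Eqs. 12-25).
        population = aquila_step_population(population, best_config, t, T_max)
    return best_config    # the best combined configuration, reported at the end
```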

5 Experiments and discussions

This section presents the results of the experiments applied to the proposed framework. Python is the scripting language used, and the major packages are TensorFlow, Keras, keras-unet-collection, NumPy, OpenCV, and Matplotlib. The working environment is Google Colab with GPU (i.e., Intel(R) Xeon(R) CPU @ 2.00 GHz, Tesla T4 16 GB GPU, CUDA v.11.2, and 12 GB RAM).

5.1 Segmentation phase experiments and discussion

Table 3 summarizes the common configurations of the segmentation experiments, and Table 4 summarizes the results of the segmentation phase experiments. The “Attention U-Net” outperforms the other models in terms of loss, accuracy, F1-score, precision, specificity, AUC, IoU coefficient, and Dice coefficient, while the “V-Net” achieves the best recall (i.e., sensitivity). Figure 9 summarizes the segmentation results graphically. Figure 10 shows the result of applying the Attention U-Net to a sample image; the region of interest in the predicted mask closely matches that of the original mask.

Table 3 The segmentation phase experiments configurations
Table 4 The segmentation phase experiments results
Fig. 9

The segmentation phase experiments graphical summarization

Fig. 10

The result of applying the Attention U-Net on a sample image

5.2 Classification phase experiments and discussion

Table 5 summarizes the classification phase experiment configurations. The best combined configurations for every model applied to the “Breast Cancer Dataset” are documented in Table 6: the Categorical Crossentropy loss is the best choice for two models, the SGD and SGD Nesterov parameter optimizers are each the best choice for two models, and standardization is the best scaler for two models. The best configurations for the “Dataset_BCD_mammography_images_out” dataset are documented in Table 7: the KLDivergence loss is the best choice for three models, the SGD and AdaMax parameter optimizers are each the best choice for two models, and the min–max scaler is the best choice for three models. The best configurations for the “Breast Cancer Patients MRI’s” dataset are documented in Table 8: the Categorical Crossentropy and KLDivergence losses are each the best choice for two models, the AdaGrad and SGD Nesterov parameter optimizers are each the best choice for two models, and standardization and min–max scaling are each the best choice for two models.

Table 9 presents multiple performance indices for the “Breast Cancer Dataset”; among the models, the pre-trained VGG19 gives the topmost results. Table 10 presents the performance indices for the “Dataset_BCD_mammography_images_out” dataset, where the pre-trained MobileNet gives the topmost results. Table 11 presents the performance indices for the “Breast Cancer Patients MRI’s” dataset, where the pre-trained MobileNet, VGG16, and VGG19 models are the best compared to the others. Figures 11, 12, and 13 present graphical summaries of the performance metrics for the “Breast Cancer Dataset”, “Dataset_BCD_mammography_images_out”, and “Breast Cancer Patients MRI’s” datasets, respectively.

Table 5 The classification phase experiments configurations
Table 6 The best combined configurations for the “Breast Cancer Dataset” dataset
Table 7 The best combined configurations for the “Dataset_BCD_mammography_images_out” dataset
Table 8 The best combined configurations for the “Breast Cancer Patients MRI’s” dataset
Table 9 The “Breast Cancer Dataset” dataset experiments performance metrics
Table 10 The “Dataset_BCD_mammography_images_out” dataset experiments performance metrics
Table 11 The “Breast Cancer Patients MRI’s” dataset experiments performance metrics
Fig. 11

Graphical summarization of the performance metrics concerning the “Breast Cancer Dataset” dataset

Fig. 12

Graphical summarization of the performance metrics concerning the “Dataset_BCD_mammography_images_out” dataset

Fig. 13

Graphical summarization of the performance metrics concerning the “Breast Cancer Patients MRI’s” dataset

5.3 Comparative study

The results of the proposed framework compared against the related studies are shown in Table 12. As seen from these results, the proposed framework (BCSF) achieves a 100% classification accuracy on MRI data, which is higher than the accuracies recorded in the related studies. For segmentation, the achieved accuracy is better than that of most of the recent studies.

Table 12 A comparative study between the proposed framework and the related studies

6 Conclusion and future work

This study introduced a hybrid framework for both the classification and segmentation of breast images to diagnose breast cancer using CNNs. The framework includes two phases. The first is the segmentation phase, during which the area of the tumor is detected to facilitate the diagnosis for the physician. Five different segmentation models, namely U-Net, Swin U-Net, Attention U-Net, U-Net++, and V-Net, are used to identify the region of interest in the ultrasound breast images. The performance metrics used are accuracy, recall, precision, specificity, F1-score, AUC, sensitivity, IoU coefficient, Dice coefficient, hinge, and squared hinge. The second is the classification phase, in which breast images are classified. Five pre-trained CNN architectures, namely MobileNet, MobileNetV2, NasNetMobile, VGG16, and VGG19, are applied. The parameters of the CNN architectures are tuned using seven different optimizers, namely the Adam, AdaGrad, NAdam, AdaMax, AdaDelta, RMSProp, and SGD optimizers, while the Aquila Optimizer is used to choose the hyperparameters of the various CNN architectures. For training the classification models, three different datasets with different modalities are used to allow the diagnosis of breast cancer. Due to the differences between the datasets, a different number of classes is available for each type; therefore, the suggested framework can perform both binary and multi-class classification. The performance metrics used are accuracy, recall, precision, specificity, F1-score, AUC, sensitivity, IoU coefficient, Dice coefficient, TP, TN, FP, FN, and cosine similarity. The proposed framework achieves a classification accuracy of 100% on MRI images, and the best recorded segmentation accuracy is 95.58% using Attention U-Net. The main limitations of the current work are: (1) the limited data available for training and (2) that segmentation was applied to ultrasound data only. As future work, we will apply the segmentation techniques to other types of images, namely MRI images. We also hope to apply other optimization techniques, such as the Red Deer Algorithm (RDA) and the Sine Cosine Algorithm (SCA), and we aim to use the suggested hybrid framework in other medical imaging problems.