1 Introduction

Breast cancer is one of the most prevalent and harmful malignancies in the world. Owing to its complex biological roots in the mammary gland, the disease has become a significant public health concern, highlighting the critical need for accurate and sophisticated diagnostic tools. A closer look at the nature of breast cancer reveals that comprehension of the disease's molecular underpinnings is essential to appreciating its complexity.

An intricate combination of epithelial cells, stromal elements, and an extensive network of ductal structures, the mammary gland is an organ that is universally recognized for its dual functionalities in breastfeeding and as a repository of glandular tissue. Genetic, hormonal, and environmental variables interact intricately to induce neoplastic transformation of the mammary gland [1]. Significantly, breast cancer is mostly caused by the malignant transformation of epithelial cells, which are the backbone of the mammary gland. These cells experience gradual genetic changes that trigger the uncontrolled expansion and multiplication that characterizes cancer.

Pre-invasive lesions, such as Ductal Carcinoma in Situ (DCIS) and Lobular Carcinoma in Situ (LCIS), are the first signs of breast cancer. In these lesions, the epithelial cells remain confined within the ductal or lobular boundaries but exhibit aberrant proliferation and concerning atypia [2]. If left unchecked, these cancerous precursors have the potential to penetrate the surrounding tissue and spread to other organs. Thus, a fundamental challenge in the study of breast cancer biology is comprehending the transition from benign epithelial cells to highly invasive and metastatic cancer cells.

Distinct molecular subtypes of breast cancer are further subdivided by the presence or absence of certain receptors on the cancer cells. These receptors, including the Human Epidermal Growth Factor Receptor 2 (HER2), Progesterone Receptor (PR), and Oestrogen Receptor (ER), provide valuable information on the biology of the tumour and have an impact on therapeutic strategies [3]. For example, HER2-positive breast cancers overexpress the HER2 protein, making them sensitive to targeted therapy, whereas hormone receptor-positive breast cancers depend on oestrogen and progesterone signalling.

Concurrently, the tumour microenvironment, comprising stromal cells, extracellular matrix, and immunological components, has attracted considerable attention. Tumour growth and treatment outcomes are significantly influenced by the dynamic interaction between cancer cells and their microenvironment [4]. This microenvironment also enables tumour invasion, angiogenesis, and immune evasion.

Research endeavours in the modern era have harnessed the potential of innovative technology in their pursuit of early diagnosis and intervention. Among these, one of the most critical weapons in the fight against breast cancer is medical imaging, especially mammography. A key component of early detection is mammography, which examines the anatomical and morphological features of breast tissue.

With an emphasis on advanced Convolutional Neural Networks (CNNs), a deep learning technique that leverages the potential of computational biology, this research study focuses on the realm of breast cancer diagnosis [5]. By combining the complexity of breast cancer biology with the accuracy of modern artificial intelligence, our work has the potential to significantly advance the early detection and treatment of this debilitating illness.

In this work, we propose a new method using CNNs to analyse mammography images, allowing for more accurate detection of cancerous areas [6]. We aim to improve clinical decision-making and, ultimately, patient outcomes by bringing the molecular complexities of breast cancer into the digital domain.

The outcomes of this study are summarized as follows:

  1. Developing a CNN model with integrated EarlyStopping and ReduceLROnPlateau callbacks for improved breast cancer classification.

  2. Enhancing the model's interpretability and feature extraction capabilities to ensure accurate identification of intricate cancerous tissue patterns.

  3. Optimizing the training process to minimize computational costs and expedite the diagnostic timeline for streamlined clinical decision-making.

  4. Validating the CNN model's efficacy across diverse patient demographics and imaging modalities to ensure its applicability in a real-world clinical setting.

The remainder of the paper is structured as follows: Sect. 2 provides a review of the literature. Section 3 introduces the workflow of the proposed methodology. Section 4 presents the results and discussion, along with a comparison to state-of-the-art existing works, and Sect. 5 concludes the study with future directions.

2 Related Work

Breast cancer detection systems have seen significant advancements in recent years, driven by the potential to enhance early diagnosis and improve patient outcomes. These systems typically employ various medical imaging techniques, such as mammography, ultrasound, and magnetic resonance imaging (MRI), to identify potential malignant regions within breast tissue. Machine learning and deep learning approaches have been instrumental in refining the accuracy and efficiency of these systems.

Recent research has focused on developing more accurate and robust breast cancer detection systems. For instance, researchers have designed hybrid models that combine CNNs with other techniques, such as transfer learning and attention mechanisms, to improve the model's ability to distinguish between benign and malignant tissue. Additionally, the integration of multimodal data, including both imaging and clinical data, has shown promise in enhancing the performance of breast cancer detection systems.

Several studies have also explored the potential of explainable AI models, which not only provide accurate predictions but also generate interpretable results. This is crucial for medical professionals who need to understand the basis of a system's diagnosis.

The Wisconsin Breast Cancer Diagnostic Data Set contains fine needle aspirate (FNA) cytology samples of breast tissue, serving as a valuable resource for distinguishing between benign and malignant samples based on extracted features. Similarly, the Breast Cancer Wisconsin (Original) Data Set offers digitized histological images of breast tissue, enabling researchers to develop classification models for the differentiation of benign and malignant breast tumours.

In addition, the Breast Cancer TCGA Provisional Clinical Data Release provides a comprehensive dataset that includes both clinical and genomic information from breast cancer patients. This dataset serves as a fundamental tool for understanding the correlations between various clinical characteristics, molecular features, and patient outcomes. On the other hand, the CBIS-DDSM Breast Cancer Dataset offers high-resolution mammograms alongside corresponding pathology reports, playing a critical role in the development and evaluation of computer-aided diagnosis (CAD) systems for breast cancer. Finally, the MIAS Mammographic Database consists of digitized mammograms and pathology reports, serving as a crucial resource for the research and development of computer-aided detection and diagnosis systems, as well as for the evaluation of various image processing and machine learning algorithms. These datasets collectively provide a solid foundation for the development of advanced diagnostic tools and models, aiming to improve the early detection and treatment of breast cancer, thereby making significant strides in the fight against this life-threatening disease. Table 1 shows the different datasets used for breast cancer classification.

Table 1 Different Publicly Available Breast Cancer Datasets

Building on these datasets, several researchers have carried out notable studies; the most significant of these are summarized in Table 2.

Table 2 Research in Breast Cancer Detection Using CNN

In previous methodologies, it was observed that the dataset can be refined and made more accurate [14]. With modest changes through masking and cropping, it was observed that fusing complementary techniques into the CNN can reduce outliers and make the technique more accurate [15].

A close look at the existing methodologies shows that several architectures have been proposed, but these architectures often do not fit the available datasets well, or the datasets prove too heterogeneous for them.

Over the span of 2021 to 2023, a series of studies focused on refining breast cancer detection methods. These investigations were characterized by a diverse array of methodologies and dataset applications. Employing sophisticated techniques such as deep learning models, transfer learning strategies, and specialized processing algorithms tailored to unique datasets, these studies made significant strides in improving the accuracy of breast cancer detection. The reported accuracies varied notably, ranging from 74% to an impressive high of 98.02%. Researchers utilized a spectrum of models and approaches, including U-Net, Mask RCNN, ensemble techniques, and the introduction of novel CNN architectures specifically designed to accommodate and leverage the distinct features of various datasets. This variety in methodologies and datasets underscores the continuous endeavor to optimize detection accuracy and highlights the multifaceted nature of breast cancer detection research during this period.


3 Methodology

This study optimizes breast cancer detection using Convolutional Neural Networks (CNNs). Diverse medical images underwent uniform preprocessing. The CNN model, designed for precision, included essential layers and dropout. The training involved loss functions and dynamic learning rate adjustment. Statistical analysis provided insights into the model's performance, laying the groundwork for future research. Figure 1 depicts the workflow of the proposed model.

Fig. 1
figure 1

Workflow of the proposed model

3.1 Data Collection and Pre-processing

The first phase of our methodology involved the acquisition of a diverse and comprehensive dataset encompassing various breast imaging modalities, including mammograms, histopathological images, and radiological scans. The dataset was sourced from reputable repositories and clinical databases, ensuring the inclusion of high quality, annotated medical images representative of diverse patient populations and disease subtypes. The dataset encompassed a wide range of patient demographics, including age, ethnicity, and clinical history, thereby enabling a holistic understanding of the multifaceted nature of breast cancer and its manifestations across different population groups. Figure 2 shows some sample images from the dataset.

Fig. 2
figure 2

Sample images from dataset

Subsequently, the acquired dataset underwent a meticulous pre-processing phase, where a series of standardized procedures were implemented to ensure data uniformity, quality, and compatibility for subsequent analysis. The pre-processing pipeline encompassed the following key steps:

3.1.1 Data Cleaning and Standardization

The raw dataset was thoroughly inspected for any inconsistencies, missing values, or data irregularities. Any erroneous or incomplete entries were meticulously removed or corrected to ensure the integrity and reliability of the dataset. Moreover, standardized protocols were applied to normalize the data formats, resolutions, and image orientations, facilitating seamless integration and analysis across different imaging modalities. The abnormalities in cancer are recorded in Table 3 and Fig. 3 depicts the same.

Table 3 Abnormalities in cancer
Fig. 3
figure 3

Abnormalities in cancer

3.1.2 Image Rescaling and Normalization

Given the variations in image resolutions and pixel intensities across different datasets, a standardized image rescaling and normalization procedure was implemented to maintain uniformity and consistency in the dataset. Images were rescaled to a predefined resolution, ensuring uniform dimensions and pixel densities for all samples. Furthermore, intensity normalization techniques were employed to enhance the comparability and interpretability of image features, minimizing any potential biases introduced by variations in pixel intensities. Figure 4 shows the cropped image.

Fig. 4
figure 4

Cropped image
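The rescaling and intensity normalization described above can be sketched with NumPy as follows. This is illustrative only: the paper does not specify the exact resampling method or normalization, so a nearest-neighbour resize and min-max normalization are assumed here (production pipelines typically use library resizing with interpolation).

```python
import numpy as np

def rescale_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to a fixed resolution (stand-in for library resizing)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h   # source row index for each output row
    cols = np.arange(out_w) * w // out_w   # source column index for each output column
    return img[rows][:, cols]

def normalize_intensities(img):
    """Min-max normalize pixel intensities to [0, 1]."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
```

Applying both functions to every image yields samples of uniform shape and comparable intensity range, as required for batching.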

3.1.3 Annotation and Labelling

Each image in the dataset was meticulously annotated and labelled by experienced radiologists and pathologists to facilitate supervised learning and classification tasks. Annotations encompassed a comprehensive assessment of key pathological features, including tumour characteristics, tissue textures, calcification patterns, and lesion margins. The detailed annotations provided crucial ground truth information for training and validating the CNN model, enabling the accurate classification and identification of cancerous and non-cancerous breast tissue. Fig. 5 shows the annotated images.

Fig. 5
figure 5

Annotated image

3.1.4 Data Augmentation

To mitigate potential overfitting and enhance the robustness of the CNN model, data augmentation techniques were implemented to artificially expand the dataset and introduce variations in the training samples. Augmentation strategies included random rotations, flips, zooms, and shifts, thereby generating augmented images with diverse perspectives and orientations. The augmented dataset served to enrich the learning capacity of the CNN model, enabling it to generalize effectively to unseen data and variations in imaging parameters. Figure 6 shows the mammogram image. Table 4 shows the dataset images and Fig. 7 depicts the distribution.

Fig. 6
figure 6

Mammogram image

Table 4 Dataset images
Fig. 7
figure 7

Dataset distribution

Image Rotation: The rotation operation rotates an image by a specified angle.

Image Shift: The image shift operation shifts an image by a specified number of pixels along the width and height dimensions.

Shear Transformation: The shear transformation changes the shapes of objects in an image, like tilting.

Image Zoom: The zoom operation zooms into or out of an image by a specified factor.

Horizontal Flip: The horizontal flip operation flips the image along the y-axis.
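The operations above can be sketched with NumPy primitives. This is a minimal illustration under simplifying assumptions: rotation is restricted to 90-degree steps and the shift wraps around, whereas an augmentation library applies arbitrary angles with interpolation and padding.

```python
import numpy as np

def augment(img, rng):
    """Randomly flip, rotate (90-degree steps), and shift a 2-D image."""
    if rng.random() < 0.5:
        img = np.fliplr(img)                      # horizontal flip along the y-axis
    img = np.rot90(img, k=rng.integers(0, 4))     # rotation
    dy, dx = rng.integers(-2, 3, size=2)
    img = np.roll(img, (dy, dx), axis=(0, 1))     # shift along height and width
    return img
```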

3.2 Feature Extraction and Representation

Upon completion of the data preprocessing phase, we employed advanced feature extraction and representation techniques to capture salient visual patterns and discriminative image features indicative of breast cancer. The feature extraction process involved the following methodological steps:

3.2.1 Filter Banks and Convolutional Kernels

A comprehensive set of filter banks and convolutional kernels were applied to the pre-processed images to extract localized visual features and texture descriptors. The filter banks encompassed a diverse array of spatial filters, including Gabor filters, Sobel filters, and Laplacian of Gaussian (LoG) filters, enabling the extraction of edge orientations, texture gradients, and fine-scale image structures. The convolutional kernels facilitated the convolutional feature mapping of the input images, capturing hierarchical representations of local and global image features essential for accurate cancer classification. Figure 8 shows the ROI Masked Image.

Fig. 8
figure 8

ROI masked image

3.2.1.1 Feature Descriptor Encoding

The extracted image features were encoded and represented using various feature descriptor encoding techniques, including histogram of oriented gradients (HOG), scale-invariant feature transform (SIFT), and local binary patterns (LBP). The feature descriptor encoding facilitated the quantization and aggregation of local image descriptors, enabling the construction of robust and discriminative feature vectors representing the underlying visual patterns and textures characteristic of cancerous tissue regions. Additionally, feature vector normalization and dimensionality reduction techniques were applied to enhance the efficiency and effectiveness of the subsequent classification models.
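As an illustration of one of these descriptors, a basic 8-neighbour local binary pattern can be computed as below. This is a didactic sketch; libraries such as scikit-image provide optimized and rotation-invariant LBP variants.

```python
import numpy as np

def lbp_basic(img):
    """Basic LBP: for each interior pixel, build an 8-bit code from
    comparisons with its eight neighbours (clockwise from top-left)."""
    h, w = img.shape
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(int) << bit
    return code
```

A histogram of these per-pixel codes over a tissue region then serves as the texture feature vector.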

3.2.1.2 Feature Fusion and Dimensionality Reduction

To enhance the richness and discriminative power of the extracted image features, feature fusion and dimensionality reduction methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), were employed. Feature fusion techniques enabled the integration of multi-modal image features, including texture, shape, and intensity-based descriptors, into a unified feature representation, facilitating the comprehensive characterization of cancerous tissue regions. Furthermore, dimensionality reduction techniques were applied to reduce the computational complexity and memory footprint of the feature vectors, enabling efficient model training and classification without compromising the representational capacity of the extracted features. The different calcification types are shown in Table 5.
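The PCA step, for instance, can be implemented via the singular value decomposition of the centred feature matrix. This is one standard formulation; LDA and the feature-fusion step follow analogous linear-algebra recipes.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature vectors (rows of X) onto their top principal components."""
    Xc = X - X.mean(axis=0)                      # centre each feature dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores in the reduced space
```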

Table 5 Calcification type

3.3 Model Architecture and Development

Following the feature extraction and representation phase, we proceeded to develop a sophisticated CNN architecture tailored specifically for the automated classification of breast cancer. The CNN model was designed to extract intricate image features, patterns, and textures indicative of malignant tumours, enabling the accurate discrimination between cancerous and non-cancerous breast tissue. The CNN architecture comprised the following key components and layers:

3.3.1 Convolutional Layers

A series of convolutional layers were integrated into the CNN model to perform feature extraction from the input images as seen in the below equation. Each convolutional layer utilized a set of learnable filters to convolve across the input image, capturing spatial hierarchies and local patterns indicative of cancerous lesions and tissue abnormalities. The convolutional layers were equipped with rectified linear unit (ReLU) activation functions to introduce non-linearities and enhance the model's representational capacity for complex image features. The convolution operation can be defined as shown in equation (1).

$$S\left(i,j\right)=\left(I*K\right)\left(i,j\right)={\sum }_{m}{\sum }_{n}I\left(i+m,j+n\right)\cdot K\left(m,n\right)$$
(1)

where S(i,j) is the output of the convolution operation, I is the input image, and K is the kernel (or filter).
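Equation (1) translates directly into code; the sketch below is a didactic NumPy implementation of the valid-mode convolution (deep learning frameworks perform the same operation with heavily optimized kernels).

```python
import numpy as np

def conv2d_valid(I, K):
    """S(i, j) = sum_m sum_n I(i + m, j + n) * K(m, n), as in equation (1)."""
    kh, kw = K.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S
```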

3.3.1.1 Pooling Layers

To reduce the spatial dimensions and computational complexity of the feature maps, pooling layers, including max-pooling and average-pooling operations, were incorporated into the CNN architecture. The pooling layers enabled the downsampling of the feature maps, retaining salient features while minimizing the effects of spatial variations and noise. The pooling layers contributed to the model's ability to capture robust and invariant representations of cancerous tissue characteristics, facilitating enhanced generalization and model performance.
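A 2 × 2 max-pooling operation, for example, halves each spatial dimension; a minimal NumPy sketch is shown below (average pooling is identical with `mean` in place of `max`).

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]          # trim so windows tile exactly
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))
```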

3.3.1.2 Batch Normalization

To stabilize and accelerate the training process, batch normalization layers were integrated into the CNN architecture. The batch normalization layers facilitated the normalization of input activations within each mini batch, reducing internal covariate shifts and accelerating the convergence of the optimization process. By standardizing the input distributions, batch normalization enhanced the overall stability and efficiency of the CNN model, leading to improved gradient flow and convergence rates during training. The batch normalization operation for each dimension d of the input can be defined as in equation (2).

$$\widehat{{x}^{\left(d\right)}}=\frac{{x}^{\left(d\right)}-E\left[{x}^{\left(d\right)}\right]}{\sqrt{Var\left[{x}^{\left(d\right)}\right]+\epsilon }}*{\gamma }^{\left(d\right)}+{\beta }^{\left(d\right)}$$
(2)

where x(d) is the input to the batch normalization layer for a specific dimension d, x̂(d) is the normalized output for dimension d, E[x(d)] denotes the mean of the values in dimension d of the input over the current mini-batch, Var[x(d)] represents the variance of the values in dimension d of the input over the current mini-batch, γ(d) is a learnable scaling factor for dimension d, β(d) is a learnable shifting factor for dimension d, and ϵ is a small value added to the variance to avoid division by zero.
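Equation (2) can be verified with a small NumPy sketch that normalizes each dimension over a mini-batch (illustrative; frameworks additionally track running statistics for use at inference time).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Equation (2): rows of x are samples in the mini-batch, columns are dimensions d."""
    mean = x.mean(axis=0)                        # E[x^(d)]
    var = x.var(axis=0)                          # Var[x^(d)]
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```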

3.3.1.3 Fully Connected Layers

A series of fully connected layers were appended to the CNN architecture to facilitate the integration of extracted features and enable high-level abstractions necessary for the final classification task. The fully connected layers encompassed a configurable number of neurons, each connected to every neuron in the preceding layer, enabling comprehensive feature aggregation and nonlinear transformations. Furthermore, the fully connected layers were equipped with dropout and activation functions to mitigate overfitting and introduce non-linear decision boundaries, respectively.

Max Pooling Layer: The max pooling operation can be defined as selecting the maximum value from a local region. For example, if the pooling size is 2x2, it selects the maximum value from a 2x2 window.

Flatten Layer: The flatten operation reshapes the 2D output of the previous layer into a 1D array.

Dense Layer: The dense layer is a fully connected layer in the neural network. The output of a dense layer can be calculated using equation (3).

$$y=f\left(Wx+b\right)$$
(3)

where W is the weight matrix, x is the input vector, b is the bias vector, and f is the activation function.

Dropout Layer: The dropout layer randomly sets a fraction of input units to 0 at each update during training, which helps prevent overfitting.
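Equation (3) and the dropout rule can be sketched together in NumPy. Inverted dropout, which rescales the surviving units so the expected activation is unchanged, is assumed here; the paper does not state which variant was used.

```python
import numpy as np

def dense(x, W, b, activation=lambda z: np.maximum(z, 0.0)):
    """Equation (3): y = f(Wx + b); ReLU is used as the default f."""
    return activation(W @ x + b)

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```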

3.3.1.4 Model’s Output Layer

The output layer of the CNN model comprised a configurable number of neurons corresponding to the target classification categories, namely, cancerous, and non-cancerous breast tissue. The output layer was activated using a SoftMax activation function, enabling the computation of class probabilities and the final classification decision. The predicted class probabilities facilitated the identification of the class label for a given input image, thereby enabling the accurate and automated detection of breast cancer with high sensitivity and specificity. Table 6 shows the Layer-wise Parameters for CNN Training and Figure 9 depicts the layered structure.
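The SoftMax activation that produces these class probabilities can be written compactly; subtracting the maximum logit before exponentiation is a standard numerical-stability trick.

```python
import numpy as np

def softmax(z):
    """Map logits z to probabilities that are positive and sum to one."""
    e = np.exp(z - z.max())
    return e / e.sum()
```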

Table 6 Layer-wise parameters for CNN training
Fig. 9
figure 9

Detailed network structure

3.4 Model Training and Optimization

The CNN model was trained using a combination of supervised learning algorithms and optimization techniques to optimize the model parameters and enhance its predictive capabilities. The model training and optimization phase encompassed the following key procedures:

3.4.1 Loss Functions

Cross-entropy loss functions were employed to quantify the dissimilarity between the predicted class probabilities and the ground truth labels. The cross-entropy loss facilitated the computation of the model's prediction errors and enabled the adjustment of the model parameters to minimize classification discrepancies and enhance predictive accuracy.

3.4.1.1 Optimization Techniques Employed

Stochastic gradient descent (SGD) and its variants, including Adam and RMSprop, were employed as the primary optimization algorithms for training the CNN model. The optimization algorithms facilitated the iterative update of the model parameters based on the computed loss gradients, enabling the minimization of the loss function and the enhancement of the model's predictive performance. Additionally, learning rate schedules and momentum adjustments were incorporated to optimize the convergence rates and prevent the model from getting stuck in local minima.

The model is compiled using the Adam optimizer as in equation (4) and binary cross-entropy loss function as in equation (5).

$${\uptheta }_{{\text{t}}+1}={\uptheta }_{{\text{t}}}-\frac{\upeta }{\sqrt{{{\text{v}}}_{{\text{t}}}}+\upepsilon }\odot {{\text{m}}}_{{\text{t}}}$$
(4)

where θt is the parameter vector at time step t, η is the learning rate, vt is the exponentially weighted average of the squared past gradients, mt is the exponentially weighted average of the past gradients, ⊙ denotes the element-wise product, and ϵ is a small constant to avoid division by zero.

$$L\left(y,\hat{y}\right)=-\left(y\log\left(\hat{y}\right)+\left(1-y\right)\log\left(1-\hat{y}\right)\right)$$
(5)

where y is the true label (either 0 or 1) and ŷ is the predicted probability of the positive class.
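Equation (5), averaged over a batch, is a few lines of NumPy; the clipping guard against log(0) is an implementation detail, not part of the equation.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Equation (5) averaged over samples; y in {0, 1}, y_hat in (0, 1)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))))
```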

Early Stopping: Early stopping is a technique to prevent overfitting. It stops training the model if there is no improvement in the validation loss for a certain number of epochs.

EarlyStopping: monitor = 'val_loss', patience = 3, restore_best_weights = True

Learning Rate Reduction (ReduceLROnPlateau): Learning rate reduction adjusts the learning rate if there is no improvement in the validation loss for a certain number of epochs.

Learning Rate Reduction: monitor = 'val_loss', factor = 0.2, patience = 5, min_lr = 0.001
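The core logic behind these two callbacks can be sketched in plain Python. This is an illustrative re-implementation, not the Keras classes themselves; note that with the settings above, the stopping patience of 3 is reached before the learning-rate patience of 5.

```python
class TrainingController:
    """Tracks validation loss per epoch; stops early and shrinks the learning rate on plateaus."""

    def __init__(self, lr=0.01, es_patience=3, lr_patience=5, factor=0.2, min_lr=0.001):
        self.lr = lr
        self.best = float("inf")
        self.wait = 0
        self.es_patience = es_patience
        self.lr_patience = lr_patience
        self.factor = factor
        self.min_lr = min_lr

    def on_epoch_end(self, val_loss):
        """Return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
            return False
        self.wait += 1
        if self.wait >= self.lr_patience:          # plateau long enough: reduce the learning rate
            self.lr = max(self.lr * self.factor, self.min_lr)
        return self.wait >= self.es_patience       # no improvement for es_patience epochs: stop
```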

3.4.2 Hyperparameter Tuning

A systematic hyperparameter tuning process was conducted to optimize the CNN model's architecture and learning parameters. Hyperparameters, including learning rates, batch sizes, dropout rates, and activation functions, were fine-tuned to achieve optimal model convergence and generalization capabilities. The hyperparameter tuning process involved the implementation of grid search and random search strategies, enabling the identification of the most effective parameter configurations for maximizing the model's classification accuracy and robustness.
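A minimal grid-search driver is sketched below over a hypothetical search space; the paper does not report its exact grids, so the values shown are placeholders.

```python
from itertools import product

# Hypothetical search space -- placeholder values, not the paper's actual grid.
GRID = {
    "learning_rate": [1e-2, 1e-3],
    "batch_size": [16, 32],
    "dropout_rate": [0.25, 0.5],
}

def grid_search(evaluate, grid):
    """Evaluate every hyperparameter combination; return the best config and score."""
    best_cfg, best_score = None, float("-inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)                     # e.g. validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Random search replaces the exhaustive product with a fixed number of randomly sampled configurations, which scales better when the grid is large.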

3.4.2.1 Model Evaluation

The trained CNN model was rigorously evaluated using a comprehensive set of performance metrics, including accuracy as shown in equation (6), precision as shown in equation (7), recall as shown in equation (8), F1 score as shown in equation (9), model’s accuracy as shown in equation (10) and model’s loss as shown in equation (11). The performance metrics provided quantitative insights into the model's classification efficacy, sensitivity to true positives, specificity to true negatives, and overall discriminative power [16,17,18].

  • True Positive (TP): The count of instances correctly predicted as positive.

  • True Negative (TN): The count of instances correctly predicted as negative.

  • False Positive (FP): The count of instances incorrectly predicted as positive when they are actually negative (Type I error).

  • False Negative (FN): The count of instances incorrectly predicted as negative when they are actually positive (Type II error).

Accuracy:

$${\text{Accuracy}}=\frac{{\text{TP}}+{\text{TN}}}{{\text{TP}}+{\text{TN}}+{\text{FP}}+{\text{FN}}}$$
(6)

Precision:

$${\text{Precision}}=\frac{{\text{TP}}}{{\text{TP}}+{\text{FP}}}$$
(7)

Recall:

$${\text{Recall}}=\frac{{\text{TP}}}{{\text{TP}}+{\text{FN}}}$$
(8)

F1 Score:

$${\text{F}}1\mathrm{ Score}=2\times \frac{{\text{Precision}}\times {\text{Recall}}}{{\text{Precision}}+{\text{Recall}}}$$
(9)

Model Accuracy:

$$\mathrm{Model\, Accuracy}=\frac{\mathrm{Number\, of\, Correct\, Predictions}}{\mathrm{Total\, Number\, of\, Predictions}}$$
(10)

Model Loss:

$$\mathrm{Model\, Loss}=\mathrm{Loss \,Function}\left(\mathrm{True\, Values},\mathrm{Predicted\, Values}\right)$$
(11)
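Equations (6) to (9) follow directly from the four confusion-matrix counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score per equations (6)-(9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```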

3.5 Proposed Algorithm

The algorithm outlines the process for building a CNN model used for identifying patterns in medical images, particularly for breast cancer detection. The first step involves setting up the structure of the CNN model, which includes defining layers responsible for learning features from the input images. These layers work together to understand the complex patterns within the images, aiding in accurate classification.

To optimize the training process, the algorithm incorporates two vital components: ReduceLROnPlateau and EarlyStopping. ReduceLROnPlateau dynamically adjusts the learning rate during training to ensure the model continues to improve its performance even as it approaches a plateau. On the other hand, EarlyStopping helps prevent the model from overfitting by stopping the training process when the model's performance no longer improves significantly.

During the training phase, the model learns from a dataset of labeled images, continuously updating its internal parameters to improve its ability to classify between cancerous and non-cancerous images. The algorithm evaluates the model's performance using various metrics to assess its accuracy. Additionally, it visualizes the model's learning progress over time, providing insights into how well the model is learning and whether it might be overfitting to the training data. These measures collectively ensure that the model is effective and reliable for breast cancer detection in real-world scenarios.

figure a

3.6 Statistical Analysis and Interpretation

The outcomes of the CNN-based breast cancer classification were subjected to rigorous statistical analysis and interpretation to derive meaningful insights and conclusions from the experimental results. Descriptive statistics, including mean values, standard deviations, and confidence intervals, were computed to summarize the distribution and variability of the performance metrics across different model configurations and datasets. Moreover, inferential statistical tests, such as t-tests, ANOVA, and chi-square tests, were employed to assess the statistical significance of observed differences and variations in the model's predictive performance. The statistical analysis enabled the identification of key trends, patterns, and correlations in the experimental data, facilitating the formulation of evidence-based conclusions and recommendations for future research directions.

Through the comprehensive methodology outlined above, we aim to establish a robust and reliable framework for the development and evaluation of CNN-based diagnostic models for breast cancer detection. By integrating advanced data preprocessing techniques, sophisticated CNN architectures, and rigorous performance evaluation protocols, we endeavour to contribute significantly to the ongoing advancements in AI-driven medical imaging and diagnostics. The methodology serves as a foundational framework for the subsequent analysis and discussion of the experimental results, enabling a comprehensive understanding of the implications and potential applications of CNN-based breast cancer detection in clinical practice and research settings.

4 Experimental Results and Discussion

The experimental results of our study revealed the exceptional performance and robustness of the proposed CNN-based breast cancer classification model in accurately differentiating between cancerous and non-cancerous breast tissue. The model achieved outstanding classification accuracy, with an overall accuracy of 95.2% on the independent testing dataset, demonstrating its superior capacity to effectively identify and classify malignant lesions with high precision and sensitivity. The precision and recall scores for cancerous tissue classification were 94.8% and 95.6%, respectively, underscoring the model's ability to minimize false positives and false negatives and enable reliable cancer detection with minimal diagnostic errors. Table 7 depicts the confusion matrix.

Table 7 Confusion matrix
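To make the reported metrics concrete, the following sketch computes a confusion matrix, accuracy, precision, and recall with scikit-learn. The labels below are small synthetic stand-ins, not the counts in Table 7.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Synthetic ground-truth and predicted labels (1 = cancerous, 0 = benign);
# illustrative only, not the study's data.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted class
print(cm)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
```

Precision controls the false-positive rate and recall the false-negative rate, which is why the paper reports both alongside overall accuracy.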

Moreover, the CNN model generalized well, maintaining consistent and reliable performance across patient demographics, imaging modalities, and disease subtypes. This consistency across diverse population groups and imaging datasets supports its potential for widespread clinical applicability in real-world diagnostic settings. Figure 10 plots accuracy against training epoch, and Figure 11 plots loss against training epoch.

Fig. 10 Model accuracy during training and testing

Fig. 11 Model loss during training and testing
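Curves like those in Figures 10 and 11 are typically produced from the per-epoch training history. A minimal sketch is shown below; the history values are synthetic placeholders, not the study's measurements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt

# Hypothetical per-epoch metrics (e.g. as returned by a framework's
# training history); illustrative values only.
epochs = list(range(1, 11))
history = {
    "accuracy":     [0.70, 0.78, 0.83, 0.87, 0.90, 0.92, 0.93, 0.94, 0.95, 0.952],
    "val_accuracy": [0.68, 0.75, 0.80, 0.85, 0.88, 0.89, 0.91, 0.92, 0.93, 0.94],
}

plt.plot(epochs, history["accuracy"], label="training")
plt.plot(epochs, history["val_accuracy"], label="testing")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("accuracy_curve.png")  # analogous code yields the loss curve
```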

Furthermore, our comparative analysis showed that the proposed CNN model surpasses conventional machine learning algorithms and existing deep learning architectures in classification accuracy and predictive performance. The CNN outperformed traditional machine learning models, including support vector machines (SVMs) and random forests, by a significant margin, reflecting its ability to exploit complex image features and hierarchical representations for precise and reliable cancer classification. It also outperformed sequence-oriented deep learning architectures such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), confirming its suitability for image-based diagnostic tasks that require high-dimensional feature extraction and analysis.
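A baseline comparison of this kind can be sketched as follows, using scikit-learn's built-in (tabular) breast-cancer dataset as a stand-in for the paper's image pipeline, which cannot be reproduced here. The resulting accuracies are illustrative only and are not the paper's numbers.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Tabular stand-in dataset (30 features per sample, binary labels).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit each baseline and record held-out accuracy.
results = {}
for name, clf in [("SVM", SVC()),
                  ("RandomForest", RandomForestClassifier(random_state=42))]:
    clf.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: {results[name]:.3f}")
```

For the image data of the study, the baselines would instead be fed flattened pixels or hand-crafted features, which is precisely the representational disadvantage the CNN avoids.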

The interpretability and explainability analysis showed that specific visual features and texture patterns indicative of breast cancer, including irregular tissue margins, spiculated masses, and microcalcifications, were the key discriminative factors behind the model's classification of malignant tumours. The CNN captured and emphasized these visual cues, identifying subtle pathological characteristics that are often imperceptible to the human eye. Table 8 shows the classification report. By exposing the model's decision-making process, the interpretability analysis improves the transparency and trustworthiness of its diagnostic predictions and supports clinical acceptance and adoption.
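One common way to probe which image regions drive a classifier's prediction is occlusion sensitivity: slide a masking patch over the image and record how much the model's score drops. The sketch below uses a dummy `score` function as a stand-in for the trained CNN's malignancy probability, and a synthetic image; it is illustrative only and is not the attribution method used in the study unless stated there.

```python
import numpy as np

def score(img):
    # Hypothetical model: responds to bright pixels near the centre,
    # mimicking a lesion-sensitive classifier.
    return img[16:32, 16:32].mean()

# Synthetic 48x48 image with a bright "lesion" region.
img = np.zeros((48, 48))
img[20:28, 20:28] = 1.0

patch, stride = 8, 4
positions = range(0, 48 - patch + 1, stride)
base = score(img)
heatmap = np.zeros((len(positions), len(positions)))

for i, y in enumerate(positions):
    for j, x in enumerate(positions):
        occluded = img.copy()
        occluded[y:y + patch, x:x + patch] = 0.0  # mask one patch
        heatmap[i, j] = base - score(occluded)    # importance = score drop

print(heatmap.shape)  # large entries mark regions that drove the prediction
```

Regions whose occlusion causes the largest score drop correspond to the discriminative features the model relies on, which is the kind of evidence behind the visual cues discussed above.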

Table 8 Classification report

Figure 12 depicts the classification report on the Breast Histopathology Images (BHS) dataset.

Fig. 12 Classification report on BHS

Table 9 compares the proposed model with existing state-of-the-art works; the proposed model outperforms all of them with an accuracy of 95.2%.

Table 9 Comparative analysis

Overall, the experimental results indicate that the proposed CNN-based classification model is a reliable, accurate, and interpretable tool for automated breast cancer detection and diagnosis. Its robustness, generalization capacity, and strong performance, combined with its interpretability and explainability, make it a promising means of improving the efficiency, accuracy, and accessibility of breast cancer screening and early detection, and thereby of improving patient outcomes and clinical decision-making in oncology and radiology.

5 Conclusion

In summary, this research presented and evaluated a CNN model for breast cancer classification. With an accuracy of 95.2%, the model shows strong potential for accurate and reliable detection, offering a valuable tool for early diagnosis and improved patient outcomes. By leveraging complex image features together with interpretability, it provides transparent and trustworthy results, marking a meaningful step forward in diagnostic capability for breast cancer. This study lays a foundation for future AI-driven diagnostic solutions, contributing to the evolution of precision medicine and personalized healthcare.