1 Introduction

Breast cancer is a life-threatening disease among women [37]. Early detection of breast cancer can enable better treatment and save lives. Breast tumours are identified using mammograms, but determining whether a tumour is malignant requires microscopic analysis in which the characteristics and distribution of nuclei are studied. To this end, breast histopathological image (BHI) analysis is indispensable and is considered the “gold standard” for determining the malignancy of a tumour. Histopathological analysis begins with a biopsy, in which a small tissue sample is extracted from the tumour region. Stained glass slides are then prepared and analysed for malignancy. Hematoxylin and eosin (H&E) is the most popular staining choice; it stains nuclei dark purple and other tissues light pink. A pathologist then analyses these stained images to determine malignancy. However, manual analysis of BHIs is tedious and laborious. It is also subjective and varies from laboratory to laboratory. In this regard, computer-aided tools can facilitate better analysis of BHIs and ease the workload of pathologists. Further, such a system would reduce subjective interpretation of the images and assist in better diagnosis of the disease [23].

The recent past has seen an upsurge in the development of computer-aided tools for the analysis of BHIs [23]. Generally, traditional and deep learning-based methods are explored for this purpose. Traditional methods follow a pipeline of pre-processing, feature extraction and classification [4, 6, 28, 32]. Lately, deep learning-based approaches have been widely developed for analysing BHIs, as they have proven to provide better accuracy than traditional methods [20, 50]. However, deep learning-based approaches require large annotated datasets, and the absence of a standard 400x magnification BHI dataset limits their development. In addition, high-resolution BHIs pose a significant challenge to the design of deep learning models, as they demand more computation and memory. This issue is seldom discussed in the literature and has to be addressed to improve the performance of the models [51]. Moreover, the literature lacks models that focus on potential regions of interest, such as nuclei, when deciding on malignancy. Hence, there is still scope for the development of effective feature extractors to handle 400x BHIs. Also, the heterogeneous characteristics of malignant samples, illumination and colour variations, the presence of artefacts, etc. make it challenging to develop a robust computer-aided tool for the analysis of BHIs. Hence, the present study aims at developing a computer-aided system for the detection of breast cancer from high-resolution 400x BHIs.

The present study proposes a new approach based on medical image analysis techniques to classify 400x magnification BHIs. In particular, the paper focuses on the detection of fibroadenoma and ductal carcinoma (DC), as these are among the most common breast tumour types. At 400x magnification, a pathologist analyses the structure and characteristics of nuclei to decide on the malignancy of the tumour; more specifically, the pathologist looks for the presence of prominent nucleoli, mitosis, hyperchromatism, etc. Hence, an algorithm designed to classify 400x magnification BHIs should be capable of extracting features concerning these important regions. However, no published reports in the literature concentrate on these regions at 400x magnification. To this end, this work presents Channel-Wise Attention Net (CWA-Net), a new CNN architecture that uses a colour channel attention module to enhance the extracted features concerning the nuclei regions, thereby facilitating accurate classification of 400x BHIs. Further, this study proposes a new approach to extract features at different levels of the image, which allows the model to compensate for the loss of information that occurs when images are resized. The heterogeneous characteristics of malignant tumours make it difficult for a single classifier to identify accurate decision boundaries; in this context, the present work explores an ensemble learning algorithm to classify 400x BHIs. Also, as part of this study, 400x BHIs are collected and annotated. The contributions of the proposed work are as follows:

  • This study proposes a novel feature extractor that effectively extracts features from high-resolution BHIs. This feature extractor is designed such that it allows the model to offset the loss of information that occurs when the images are resized.

  • The study introduces a novel colour channel attention module that enables the model to focus on potential regions of interest. This attention module generates an attention mask that highlights the prominent regions of interest such as nuclei. This is further utilized to enhance the extracted features.

  • The study proposes a new CNN architecture termed CWA-Net for the classification of 400x BHIs. The architecture is designed to extract larger contextual features that are essential for the accurate determination of malignancy.

The paper is organised as follows: Sect. 2 summarizes recent literature concerning BHI classification. Section 3 describes the adopted methodology for the classification of BHI. Section 4 demonstrates the various experiments carried out to evaluate the developed model. Finally, the conclusion of the study is presented in Sect. 5.

2 Related work

Several studies have been carried out for the classification of BHIs [42, 43, 46, 49], yet some challenges remain. At 400x magnification, the pathologist analyses the nuclear enlargement, nuclear-cytoplasmic ratio, mitotic count, pleomorphism, the structure and pattern of the nuclei, etc. to decide on the malignancy. Hence, studying the characteristics of potential regions of interest such as nuclei can reveal the aggressiveness of cancer. However, only limited reports in the literature focus on these potential regions of interest. This section presents a summary of the computer vision-based methods proposed in the literature for the classification of BHIs.

2.1 Traditional methods

Before the deep learning era, traditional methods were extensively used for the analysis of BHIs. These approaches relied on handcrafted features to describe the data, and the focus was on feature engineering, as it has a significant impact on the classification results. To this end, texture features such as LBP, Gabor filters and GLCM were used predominantly. In [11], the authors relied on Gabor filters for feature extraction and spectral clustering for dimensionality reduction. In a separate study [32], the authors used texture features such as local binary patterns along with a random decision tree to detect stromal regions. The authors in [13] utilized GLCM and GLRLM to extract texture features from identified nuclei regions; subsequently, the KNN algorithm was used for classification of the images. A few studies such as [3, 25] and [10] relied on clustering algorithms such as K-means for the detection of potential regions of interest. It is also seen that support vector machines (SVM) and decision trees were widely investigated for the classification task [4, 6, 28]. However, traditional methods depend on handcrafted features, which may fail to capture all the characteristics of the dataset; hence, these models may not generalize well to new test cases. Further, the design of such features requires domain knowledge.
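As a concrete illustration of such a handcrafted pipeline, the sketch below extracts GLCM texture descriptors and classifies them with a KNN classifier. It is only indicative: the chosen GLCM properties, distances, angles and the value of K are assumptions and are not taken from the cited works.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def glcm_features(gray_image, distances=(1, 2), angles=(0, np.pi / 4, np.pi / 2)):
    """Compute GLCM texture descriptors from an 8-bit grayscale image."""
    glcm = graycomatrix(gray_image, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical usage with pre-loaded grayscale images and labels:
# X = np.array([glcm_features(img) for img in train_images])
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, train_labels)
```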

Fig. 1 Overview of the proposed model for classification of 400x BHIs. The input colour normalized image is divided into a uniform grid and each image patch is passed to the patch-level feature extractor. The colour normalized input image is also resized and passed through the image-level feature extractor. Finally, these features are concatenated and given to the ensemble classifier for classification

2.2 Deep learning methods

The recent past has seen an upsurge in the usage of deep learning models for the analysis of histopathological images [17, 21, 38, 45], owing to their capability to model complex data and patterns. Consequently, several reports in the literature focus on the task of image classification and rely purely on deep learning frameworks such as VGG-16 [36], VGG-19 [36], ResNet [19] and Inception [40]. In a separate study, the authors designed a deep learning model based on a fully convolutional neural network for BHI classification [15]. The authors in [7] proposed a DCNN architecture that efficiently computes the model weights and achieves faster backpropagation. Han et al. [18] introduced a novel DCNN architecture for multi-class classification of BHIs. Roa et al. [9] worked on the detection of breast cancer from WSI images using CNNs. However, these methods focus on global image features and may ignore the features concerning the nuclei regions, which may result in incorrect analysis of the images.

A few studies focused on providing attention to nuclei regions before deciding on the classes, as these regions contribute largely to the decision. For example, in [41], the authors introduced the BreastNet architecture, which combines attention modules and hypercolumn techniques to achieve better classification accuracy. In a separate study, the authors proposed a new framework that focuses on nuclei regions [52]. A few studies focused on feature extraction at different magnifications for the classification of BHIs [26, 49]. In [26], the authors introduced a new CNN architecture to handle class variance and feature extraction at varying magnifications. Yan et al. [49] proposed a new approach to extract features at multiple levels by combining CNN and RNN. A few studies focused on designing attention modules that allow the model to focus on specific regions [48, 50]. Our work is similar to these approaches; however, the present study proposes a simpler attention module that uses colour channel information to direct attention to nuclei regions.

Fig. 2 A few sample illustrations of colour normalization. The first row shows input images and the second row shows the corresponding colour normalized images. The input images have different colour distributions, whereas the colour normalized images have similar colour distributions, which facilitates accurate classification of BHIs

Recently, transfer learning has been found to improve the performance of deep learning models in the absence of a large dataset [1, 2, 8, 12, 22, 35, 44]. To this end, in [22], the authors utilized a transfer learning strategy to detect and classify BHIs. A similar study on transfer learning was carried out by Du et al. [12] for the classification of BHIs. The pre-trained architectures ResNet50 and DenseNet-161 were used by Celik et al. [8] to extract features and detect invasive ductal carcinoma. Alzubaidi et al. [2] utilized transfer learning to handle the inadequacy of training datasets. In a separate study [35], the authors demonstrated the significance of layer-wise fine-tuning of the pre-trained network. A transfer learning approach with block-wise fine-tuning was utilized to learn the best features from the images for magnification-dependent and magnification-independent binary and eight-class classification problems [5]. Recently, generative adversarial networks (GANs) have gained considerable attention for addressing the problem of data imbalance [16]; in [33], the authors developed a GAN-based model to tackle this problem.

Despite these developments in the application of deep learning models to the analysis of BHIs, the deep learning models are dependent on the availability of large datasets. Moreover, providing annotations to BHIs is challenging and requires domain knowledge. In addition, deep learning models are limited by the input image size which often leads to resizing of the images. This can cause a loss of essential information that is required for the accurate classification of BHIs. To this end, this paper presents a new model to capture features at different levels of the image. Also, an attention module is proposed to enhance the features of the potential regions of interest.

3 Methodology

This study proposes a novel CNN model named CWA-Net for the classification of fibroadenoma and DC at 400x magnification. In the case of fibroadenoma, the nuclei are generally attached to the basement membrane in the form of single sheets. In the case of DC, however, the nuclei penetrate the membrane and spread into the stromal regions, so the nuclei are distributed across the stroma. The present study therefore introduces a new approach designed to focus on nuclei regions. A hybrid approach is adopted in which features are extracted using a CNN and classified using ensemble methods. The details of the proposed method are described in this section.

3.1 Overview of the proposed system

The given input image I is colour normalized using the Reinhard method [31]. The colour normalized image is then processed for feature extraction. The present work proposes to extract the patch-level and image-level features required for an accurate representation of the BHI. The input image I is divided into a uniform grid of image patches \(I_p=\{I_1, I_2, \ldots, I_n\}\). The patch-level feature extractor takes the smaller image patches \(\{I_1, I_2, \ldots, I_n\}\) as input, while the image-level feature extractor takes the resized input image \(I_r\). The patch-level feature extractor extracts local patch features \(f_p\) and the image-level feature extractor extracts global image features \(f_i\). The patch-level feature extractor allows the model to compensate for the loss of information that occurs when the images are resized. Also, a colour channel attention module is used to refine the extracted features. The colour channel attention module produces an attention mask that highlights the potential regions of interest, which is crucial for the classification of 400x BHIs. Finally, the features \(f_p\) and \(f_i\) are concatenated and given to a classifier; an ensemble-based classifier is adopted for classifying the BHI. The overview of the proposed system is shown in Fig. 1.
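To make the data flow concrete, a minimal sketch of the pipeline is given below. All callables are passed in as arguments: normalize, to_channels and to_patches correspond to the steps sketched in the following subsections, while patch_net, image_net and dcs are hypothetical trained models. The resized image size and the averaging of per-patch features are assumptions, as the paper does not specify them.

```python
import cv2
import numpy as np

def classify_bhi(image_bgr, template_bgr, normalize, to_channels, to_patches,
                 patch_net, image_net, dcs, image_size=(224, 224)):
    """High-level sketch of the pipeline in Fig. 1 (illustrative only)."""
    norm = normalize(image_bgr, template_bgr)             # Reinhard colour normalization
    stacked = to_channels(norm)                           # 6 colour channels (Sect. 3.3)

    patches = to_patches(stacked)                         # uniform grid of patches
    f_p = np.mean([patch_net(p) for p in patches], axis=0)    # patch-level features f_p

    resized = to_channels(cv2.resize(norm, image_size))   # resized 6-channel image
    f_i = image_net(resized)                              # image-level features f_i

    features = np.concatenate([f_p, f_i])                 # concatenate f_p and f_i
    return dcs.predict(features[np.newaxis, :])[0]        # ensemble (DCS) classification
```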

3.2 Colour normalization

In general, BHI images undergo colour variations due to the difference in staining procedure, slide preparation, etc. These colour variations often alter the end results. In this regard, a pre-processing step known as the colour normalization procedure is popularly utilized for minimizing the colour variations. Colour normalization transfers the colour distribution of the reference/template image to the input image. The present study uses the Reinhard method [31] to normalize the colour distribution of input images. A brief overview of this method is described here.

The Reinhard method operates in LAB colour space, which allows the model to handle colour and illumination variations. Initially, the mean and standard deviation of the L, A and B colour channels of the input and template images are estimated. Subsequently, the estimated channel means are subtracted from the pixel intensity values of the input image. The standard deviation of each template channel is divided by the standard deviation of the corresponding input channel, and this ratio is used to scale the mean-subtracted pixel values; finally, the template channel means are added back. A few sample input images and their corresponding colour normalized images are shown in Fig. 2.
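A minimal sketch of this procedure is shown below. It follows the channel statistics transfer described above but uses OpenCV's LAB conversion as an approximation, and the small epsilon guard against division by zero is an implementation assumption.

```python
import cv2
import numpy as np

def reinhard_normalize(src_bgr, template_bgr):
    """Transfer the colour statistics of a template image to a source image
    (Reinhard-style normalization); a sketch using OpenCV's LAB space."""
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    tmp = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)

    src_mean, src_std = src.reshape(-1, 3).mean(0), src.reshape(-1, 3).std(0)
    tmp_mean, tmp_std = tmp.reshape(-1, 3).mean(0), tmp.reshape(-1, 3).std(0)

    # Per channel: subtract source mean, scale by std ratio, add template mean.
    out = (src - src_mean) * (tmp_std / (src_std + 1e-8)) + tmp_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```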

Fig. 3 Architecture of the proposed model for classification of BHI at 400x magnification

3.3 Feature extractor

CNNs are widely used to extract features, as they are capable of extracting complex features that represent the dataset [24, 27, 34]. However, the computation and memory requirements of a CNN model depend on the size of the input image: as the input image grows, the memory and computation involved increase. In general, BHIs are captured at high resolution, making it challenging to design CNN models for them. To this end, several methods proposed in the literature resize the images to reduce their dimensions, but this results in a loss of information and can lead to incorrect classification of the images. Therefore, in the present study, a new feature extractor is proposed to compensate for the loss of information that occurs when the images are resized. The proposed feature extractor has two modules, namely a patch-level and an image-level feature extractor. The input image is divided into a grid of uniform patches, and each of these image patches is given to the patch-level feature extractor, which focuses on local patch features. However, the patch-level features do not capture the long-range spatial correlations between pixels. To this end, the image-level feature extractor, which takes the resized image as input, is responsible for extracting features from the entire image. Both features are finally concatenated to obtain the final feature vector that describes the input image.
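A minimal sketch of the uniform grid split is shown below; the patch size and the handling of borders (discarding incomplete patches) are assumptions, as the paper does not specify them.

```python
import numpy as np

def split_into_patches(image, patch_h=400, patch_w=400):
    """Split an HxWxC image into non-overlapping patches on a uniform grid.
    Borders that do not fill a whole patch are discarded (an assumption)."""
    h, w = image.shape[:2]
    rows, cols = h // patch_h, w // patch_w
    patches = []
    for r in range(rows):
        for c in range(cols):
            patches.append(image[r * patch_h:(r + 1) * patch_h,
                                 c * patch_w:(c + 1) * patch_w])
    return patches
```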

In the case of 400x magnification BHIs, a pathologist studies the characteristics of nuclei rather than their distribution. Hence, the feature extractor needs to be designed such that it extracts features from these regions more effectively. In general, the colour information of BHIs plays an important role, since H&E stains nuclei dark purple and other tissues light pink. For example, the G plane of RGB colour space highlights nuclei more accurately than the R and B planes. Similarly, the L and S channels from the LAB and HSI colour spaces highlight nuclei regions; an example is illustrated in Fig. 4. In general, each colour channel highlights different colour characteristics of the image, and including these channels for feature extraction aids in a better representation of the image. Consequently, the present study utilizes multiple colour channels for feature extraction. Specifically, 6 colour channels are considered: R, G and B from RGB colour space, L and A from LAB colour space and S from HSI colour space. Hence, the input to the feature extractor has a depth of 6 colour channels. The architecture of the proposed feature extractor is shown in Fig. 3.
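A sketch of the six-channel stacking is given below. OpenCV does not expose an HSI conversion, so the saturation channel of HSV is used here as a stand-in for the S channel of HSI; this substitution, the channel ordering and the scaling to [0, 1] are assumptions.

```python
import cv2
import numpy as np

def stack_colour_channels(bgr):
    """Stack the six colour channels used as network input: R, G, B, L, A and S."""
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)   # S of HSV as a proxy for HSI's S channel

    channels = [rgb[..., 0], rgb[..., 1], rgb[..., 2],   # R, G, B
                lab[..., 0], lab[..., 1],                # L, A
                hsv[..., 1]]                             # S
    return np.stack(channels, axis=-1).astype(np.float32) / 255.0
```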

Fig. 4 Visualization of different colour channels

3.3.1 Image-level feature extractor

The input to this module is a resized image of dimension \(H \times W \times C\), where H is the height of the image, W is the width and C is the number of colour channels. The proposed feature extractor has 4 conv-blocks. Each conv-block has 3 convolutional layers, each followed by a batch normalization layer and a ReLU activation layer, and each convolution layer applies a filter of size \(3 \times 3\). Maxpool layers are popularly used to reduce the dimensions of the feature maps and to propagate the most prominent features to the deeper layers; however, this results in a loss of spatial resolution. Hence, in the present study, strided convolution is used to reduce the dimensions of the feature maps: the last convolution layer in each conv-block uses a stride of 2. The resulting feature maps are passed through atrous spatial pyramid pooling with dilation rates of 2 and 4 at the bottleneck, which captures larger contextual information that is essential for identifying malignant cases. Furthermore, the output of the colour channel attention module is added to the extracted features to refine them. The refined features are passed to a global average pooling layer, which captures the average representation of the feature maps. Finally, a sigmoid layer is applied to perform classification.
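A PyTorch sketch of this branch is shown below. The channel widths (32 to 256) and the single-channel attention mask that is broadcast over the feature maps are assumptions, as the paper does not specify them; this is an illustrative reconstruction, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    """Three 3x3 conv layers, each followed by BN and ReLU; the last one
    uses stride 2 to downsample instead of max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class ImageLevelExtractor(nn.Module):
    """Sketch of the image-level branch: 4 conv-blocks, ASPP (rates 2 and 4),
    attention refinement, global average pooling and a sigmoid classifier."""
    def __init__(self, in_channels=6, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = [in_channels, *widths]
        self.blocks = nn.Sequential(*[conv_block(chans[i], chans[i + 1]) for i in range(4)])
        self.aspp2 = nn.Conv2d(widths[-1], widths[-1], 3, padding=2, dilation=2)
        self.aspp4 = nn.Conv2d(widths[-1], widths[-1], 3, padding=4, dilation=4)
        self.fuse = nn.Conv2d(2 * widths[-1], widths[-1], 1)
        self.head = nn.Linear(widths[-1], 1)

    def forward(self, x, attention_mask=None):
        f = self.blocks(x)
        f = self.fuse(torch.cat([self.aspp2(f), self.aspp4(f)], dim=1))
        if attention_mask is not None:
            # Attention mask from the colour channel attention module, resized to
            # the feature map resolution and added to refine the features.
            f = f + F.interpolate(attention_mask, size=f.shape[-2:],
                                  mode="bilinear", align_corners=False)
        feat = f.mean(dim=(2, 3))              # global average pooling -> feature vector f_i
        prob = torch.sigmoid(self.head(feat))  # sigmoid classification output
        return feat, prob
```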

3.3.2 Patch-level feature extractor

The input image is divided into uniform image patches, and each patch is processed for feature extraction. This allows the model to extract features without reducing the image dimensions and hence reduces the loss of information. In the present study, the patch-level feature extractor is similar to the image-level feature extractor; however, it does not use the atrous spatial pyramid pooling layer, as larger contextual information is redundant at the patch level.

Fig. 5 Proposed channel-wise attention module

3.4 Colour channel attention module

Each colour component highlights different aspects of the image, such as nuclei and background. The convolution layers extract features from the image; however, these features may not capture the prominent characteristics of the regions of interest. Hence, an attention module that refines these feature maps can improve the representation of the image. In the present study, a colour channel attention module is designed to capture the correlation between the colour channels. This attention module is inspired by the approach proposed in [14].

The proposed attention module is shown in Fig. 5. The attention module takes the stacked input image I with 6 colour channels of dimension \(H\times W\times C\), where H is the height, W is the width and C is the number of colour channels (here, 6). The input image is first reshaped to \(C\times HW\), denoted \(I'\), which is then transposed to produce \(I'^{T}\). Multiplying \(I'\) by \(I'^{T}\) yields a \(C\times C\) matrix that captures the inter-dependencies between the colour channels. This output is passed through a softmax layer to obtain the channel-wise attention map, which is transposed and multiplied with the reshaped input \(I'\). Finally, a \(1\times 1\) convolution layer is applied to produce the final attention mask. This attention mask is added to the features extracted by the convolution layers; the addition enhances the features of potential regions of interest, which is crucial for the classification of 400x BHIs.
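A PyTorch sketch of this module is given below. It follows the reshape-multiply-softmax procedure described above; the single-channel output of the \(1\times 1\) convolution is an assumption made so that the mask can be broadcast-added to feature maps of any depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ColourChannelAttention(nn.Module):
    """Sketch of the colour channel attention module: channel-wise attention
    over the 6 stacked colour channels, followed by a 1x1 convolution."""
    def __init__(self, in_channels=6, out_channels=1):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                                 # x: (B, C=6, H, W)
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                        # I'        : (B, C, H*W)
        energy = torch.bmm(flat, flat.transpose(1, 2))    # I' x I'^T : (B, C, C)
        attn = F.softmax(energy, dim=-1)                  # channel-wise attention weights
        out = torch.bmm(attn.transpose(1, 2), flat)       # re-weight the colour channels
        out = out.view(b, c, h, w)
        return self.conv1x1(out)                          # attention mask, (B, 1, H, W)
```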

3.5 Classifier

The heterogeneous characteristics of malignant samples make it difficult to define the decision boundaries properly. In this context, ensemble methods are a popular choice for classification, as they use multiple models to decide on malignancy. The present work uses dynamic classifier selection (DCS) [47] to classify BHIs. The DCS algorithm utilizes a pool of classifiers to predict the class label of a test sample. The K-nearest neighbour (KNN) algorithm is used to identify the K nearest neighbours of the test sample in the training set, and the best model is then identified by evaluating the pool of classifiers on these K neighbours. In the present study, two pools of classifiers are compared. The first pool (pool-1) consists of 10 decision tree (DT) classifiers. The second pool (pool-2) consists of SVM, decision tree, random forest and Gaussian Naive Bayes classifiers. Both pools are evaluated for the classification of BHIs.
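A self-contained sketch of DCS based on overall local accuracy is shown below, using a pool of decision trees trained on bootstrap samples; the bootstrapping and the specific scikit-learn components are assumptions made for illustration, while the paper follows [47].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

class SimpleDCS:
    """Dynamic classifier selection via overall local accuracy: for each test
    sample, the pool member most accurate on its K nearest training
    neighbours is chosen to predict the label."""
    def __init__(self, n_estimators=10, k=5, random_state=0):
        self.rng = np.random.RandomState(random_state)
        self.pool = [DecisionTreeClassifier(random_state=self.rng.randint(1 << 30))
                     for _ in range(n_estimators)]
        self.k = k

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for clf in self.pool:                     # bootstrap samples (an assumption)
            idx = self.rng.choice(len(X), len(X), replace=True)
            clf.fit(X[idx], y[idx])
        self.nn_ = NearestNeighbors(n_neighbors=self.k).fit(X)
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        X = np.asarray(X)
        neigh = self.nn_.kneighbors(X, return_distance=False)
        preds = np.empty(len(X), dtype=self.y_.dtype)
        for i, nb in enumerate(neigh):
            # Local accuracy of each pool member on the K neighbours.
            local_acc = [np.mean(clf.predict(self.X_[nb]) == self.y_[nb])
                         for clf in self.pool]
            best = self.pool[int(np.argmax(local_acc))]
            preds[i] = best.predict(X[i:i + 1])[0]
        return preds
```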

Fig. 6 A few sample images of KMC breast histopathological image dataset. First row shows benign samples and second row shows malignant samples

Table 1 Details of KMC breast histopathological image dataset
Fig. 7 Accuracy and loss plots of the training and validation sets. a and b show the accuracy and loss plots of the patch-level feature extractor; c and d show the accuracy and loss plots of the image-level feature extractor

Fig. 8 Image-level heatmap visualization of the colour channel attention module on benign BHIs. a, b, c and d show benign samples; e, f, g and h show the corresponding heatmaps

Fig. 9 Image-level heatmap visualization of the colour channel attention module on malignant BHIs. a, b, c and d show malignant samples; e, f, g and h show the corresponding heatmaps

Fig. 10 Patch-level heatmap visualization of the colour channel attention module on benign BHIs. a, b, c and d show benign samples; e, f, g and h show the corresponding heatmaps

Fig. 11 Patch-level heatmap visualization of the colour channel attention module on malignant BHIs. a, b, c and d show malignant samples; e, f, g and h show the corresponding heatmaps

4 Results

This section first presents the performance metrics and the details of the datasets used to evaluate the proposed model. Subsequently, a discussion of the model training, the colour channel attention module and the selection of parameters is presented. Finally, the proposed method is compared with other methods and the obtained results are reported.

4.1 Performance metrics

Metrics such as precision, recall, F1-score and accuracy are used to quantitatively evaluate the developed model [30]. These metrics are calculated from the true positive, true negative, false positive and false negative counts. In general, precision, recall and F1-score suffice to evaluate a model; however, these metrics may not convey the performance of the model in the presence of class imbalance. Hence, in the present study, Cohen's Kappa statistic is also used to compare the observed accuracy with the accuracy expected by chance.
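The sketch below shows how these metrics can be computed with scikit-learn; the label encoding (0 = benign, 1 = malignant) is an assumption.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_recall_fscore_support)

def evaluate(y_true, y_pred):
    """Report accuracy, per-class precision/recall/F1 and Cohen's kappa."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1], zero_division=0)   # 0 = benign, 1 = malignant (assumed)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": dict(zip(["benign", "malignant"], precision)),
        "recall": dict(zip(["benign", "malignant"], recall)),
        "f1": dict(zip(["benign", "malignant"], f1)),
        "kappa": cohen_kappa_score(y_true, y_pred),        # agreement beyond chance
    }
```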

4.2 Datasets

The proposed model is evaluated on two datasets. KMC dataset: In the present study, breast histopathological images at 400x magnification were collected manually from the Department of Pathology, Kasturba Medical College (KMC), Manipal, India. A total of 1,516 images were collected at a resolution of 1600x1200 pixels using an OLYMPUS CX31 microscope. The images were collected for fibroadenoma and ductal carcinoma, as these are among the most common types of breast tumour. Further, the images were annotated under the supervision of a domain expert. The collected images comprise 759 benign and 757 malignant cases. The dataset is divided into train and test splits of 916 and 600 images, respectively; the test set consists of 300 benign and 300 malignant cases chosen at random. The details of this dataset are given in Table 1, and a few benign and malignant samples from this dataset are shown in Fig. 6.

BreakHis dataset: This dataset consists of breast histopathological images collected at different magnifications (40x, 100x, 200x and 400x). In the present study, the 400x images are considered. The images are provided at a resolution of 700x400 pixels. The train and test splits consist of 635 and 550 images, respectively; the test split contains 250 benign and 300 malignant samples. The details of this dataset can be found in [37].

4.3 Model training

In the present study, the image-level and patch-level feature extractors are trained independently. The weights of these models are initialized from a normal distribution, and batch normalization layers are used to normalize the output of each convolution layer. Both models are trained with the binary cross-entropy loss function. Training and validation loss and accuracy curves are widely used to demonstrate model convergence and to identify overfitting; hence, these curves are plotted for both feature extractors (Fig. 7). For the patch-level feature extractor, the training and validation loss decreases and plateaus at around 20 epochs, and the training and validation accuracy increases and saturates at around 20 epochs. A similar pattern is observed for the image-level feature extractor. It is also important to note that, for both feature extractors, the training and validation accuracy curves stay close to each other, indicating that the models have not overfitted. The DCS classifier is trained independently on the training data; the selection of its hyperparameters is described in Sect. 4.5.
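A minimal training-loop sketch for one branch is given below, referring to the ImageLevelExtractor sketched earlier; the optimizer, learning rate and number of epochs are assumptions, as the paper does not report them.

```python
import torch
import torch.nn as nn

def train_branch(model, train_loader, val_loader, epochs=30, lr=1e-4, device="cuda"):
    """Train one feature-extractor branch with binary cross-entropy.
    Optimizer, learning rate and epoch count are assumptions."""
    model.to(device)
    # Normal-distribution weight initialization, as stated in the paper.
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, mean=0.0, std=0.02)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.float().to(device)
            _, prob = model(x)                      # model returns (features, probability)
            loss = bce(prob.squeeze(1), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Validation accuracy, used to monitor convergence and overfitting.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_loader:
                _, prob = model(x.to(device))
                correct += ((prob.squeeze(1) > 0.5).cpu() == y.bool()).sum().item()
                total += len(y)
        print(f"epoch {epoch + 1}: val acc = {correct / total:.3f}")
```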

4.4 Evaluation of colour channel attention module

This study proposes an attention module that captures the inter-dependencies between colour channels. The proposed colour channel module takes multi-colour channels as input and produces attention masks that are further utilized for refining the extracted features. This enables the proposed model to focus on potential regions of interest such as nuclei to decide on the malignancy of the tumour.

The proposed colour channel attention module is used in both the patch-level and image-level feature extractors and is trained along with each of them. The module is evaluated by visualizing the attention masks on both malignant and benign samples. The image-level heatmap visualizations of the attention masks are shown in Fig. 8 and Fig. 9, while Fig. 10 and Fig. 11 show the attention masks produced by the patch-level feature extractor. It is observed that the activations over the nuclei regions are higher than over other regions. Specifically, in the benign cases, the activations near the membrane region to which the nuclei are attached are higher than elsewhere; similarly, in the malignant cases, the activations are higher near the nuclei regions. Hence, adding this attention mask to the feature maps of the feature extractor enhances the features of these potential regions of interest, which in turn enables the model to decide accurately on the malignancy of the tumour. This result is significant, as it demonstrates the effectiveness of the colour channel attention module in highlighting potential regions of interest such as nuclei at 400x magnification.

Fig. 12 Accuracy and Kappa value comparison of pool-1 classifiers with different values of K. a On KMC dataset. b On BreakHis dataset

Fig. 13 Accuracy and Kappa value comparison of pool-2 classifiers with different values of K. a On KMC dataset. b On BreakHis dataset

Table 2 Performance comparison of pool-1 classifiers on KMC dataset. C’ represents Decision tree classifiers
Table 3 Performance comparison of pool-2 classifiers on KMC dataset
Table 4 Performance comparison of pool-1 classifiers on BreakHis dataset. C’ represents Decision tree classifiers
Table 5 Performance comparison of pool-2 classifiers on BreakHis dataset
Table 6 Performance comparison of proposed and other methods on KMC dataset
Table 7 Performance comparison of proposed and other methods on BreakHis dataset

4.5 Parameter selection

The heterogeneous characteristics of malignant samples make it difficult to identify accurate decision boundaries, and a single classifier may not be sufficient to classify these images. In this context, the present work uses an ensemble learning strategy, namely DCS, to classify 400x BHIs. This is an ensemble-based approach that uses a pool of classifiers to decide on the class labels, allowing the model to select the best classifier in the local neighbourhood. However, the composition of the pool of classifiers is crucial. In the present study, two pools of classifiers are investigated for the classification of 400x BHIs. The first pool consists of 10 decision tree classifiers, as this configuration has been shown to provide better performance than other choices [47]. However, since the best choice of classifiers depends on the data, a second pool consisting of SVM, random forest, decision tree and Gaussian NB classifiers is also evaluated.

The DCS algorithm considers a local neighbourhood from the training set to select the best classifier from the pool; to this end, a KNN classifier is utilized. Hence, the selection of the K value plays an important role in the overall performance of the model. A large K may reduce the performance of the classifier, since it may include samples belonging to the other class, so K should be chosen carefully. In the present study, K is selected from the set \(\{5, 15, \ldots, 95\}\), i.e. at intervals of 10. Both pools of classifiers are evaluated on the KMC and BreakHis datasets for these values of K. Figure 12 and Fig. 13 show the accuracy and Kappa values obtained with the pool-1 and pool-2 classifiers, respectively, on the KMC and BreakHis datasets. From the figures, it is evident that the accuracy and Kappa value decrease as K is increased from 5 to 95. Hence, in the present study, K is set to 5.
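A sketch of such a K sweep is shown below; it reuses the SimpleDCS and evaluate() sketches from the previous sections and assumes pre-computed feature arrays, so it illustrates the selection procedure rather than the authors' exact experiment.

```python
def sweep_k(X_train, y_train, X_test, y_test, k_values=range(5, 96, 10)):
    """Evaluate the DCS classifier for K = 5, 15, ..., 95 and report accuracy
    and Cohen's kappa for each setting (uses SimpleDCS and evaluate() above)."""
    results = {}
    for k in k_values:
        dcs = SimpleDCS(n_estimators=10, k=k).fit(X_train, y_train)
        y_pred = dcs.predict(X_test)
        scores = evaluate(y_test, y_pred)
        results[k] = (scores["accuracy"], scores["kappa"])
        print(f"K={k:2d}: accuracy={scores['accuracy']:.3f}, kappa={scores['kappa']:.3f}")
    # The K giving the best accuracy/kappa (K = 5 in the paper) would be retained.
    return results
```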

Further, we evaluated the performance of the individual classifiers and compared them with DCS on both the KMC and BreakHis datasets. The results of this study are given in Tables 2, 3, 4 and 5. The pool-1 classifiers, consisting of decision tree classifiers, outperform the pool-2 classifiers; it is observed that the decision tree classifiers produce better accuracy and Kappa values. Also, pool-1 contains 10 classifiers, whereas pool-2 contains only four. Furthermore, the DCS classifier, which uses the overall local accuracy to select the best classifier in the local neighbourhood, performs better than the individual classifiers of both pools. DCS with the pool-1 classifiers produced accuracies of 0.95 and 0.96 on the KMC and BreakHis datasets, respectively. Hence, in the present study, DCS with the pool-1 classifiers is selected for the classification of BHIs.

4.6 Comparison with other methods

In the present study, the proposed method is compared with popular classification models, namely VGG-16 [36], ResNet-50 [19], Inception V3 [40], Inception-ResNet V2 [39] and the method of [29]. VGG-16 is a 16-layer convolutional neural network with a classifier at the end; it extracts global image-level features using convolutional and maxpool layers. ResNet-50 has 50 layers organised into residual blocks whose skip connections propagate features to deeper layers. Inception uses convolution layers with different receptive fields to capture larger contextual information. However, the input to these models must be resized, as larger inputs drastically increase the number of model parameters and the memory requirements. Further, we also compare against the method proposed in [29], where image-level and patch-level features are extracted for BHI classification.

Evaluation on KMC dataset: Table 6 shows the performance comparison of the proposed and other methods. It is observed that VGG-16 [36] and the method proposed in [29] perform competitively on the KMC dataset. VGG-16 achieved a higher recall of 0.95 for the malignant class but performed poorly on the benign class, producing a higher number of false negatives for that class. Inception-ResNet V2 produced poor results owing to the small amount of training data. Further, these models extract only image-level features, and the input images are resized, which results in a loss of information and also contributes to the poorer results. In contrast, the proposed method, which uses both image-level and patch-level features with the pool-1 classifier, achieved a recall of 0.97 on the malignant class and 0.94 on the benign class. This result is significant, as it shows that the proposed method produces fewer false negatives, which is crucial for a critical decision support system such as this. Furthermore, the proposed method (with pool-1 and pool-2) produced higher accuracy than the other methods. Table 6 also shows the Kappa value comparison of the proposed and other methods; this value quantifies the improvement of a classifier over one that guesses at random. The proposed model achieved Kappa scores of 0.90 and 0.71, which are considerably higher than those of the other compared methods, highlighting its superior performance. The ROC curves of the proposed and other models for malignant cases on the KMC dataset are shown in Fig. 14.

Evaluation on BreakHis dataset: Table 7 presents the performance comparison of the proposed and compared methods on the BreakHis dataset. VGG-16 and ResNet-50 fail to capture larger contextual information and therefore achieve lower accuracy on the BreakHis dataset. In contrast, the proposed method, which captures larger contextual information, achieves higher accuracy. This result is important, as it highlights the influence of contextual information on the performance of CNN models for the analysis of 400x BHIs. This is further supported by Inception V3, which captures larger contextual information and achieved an accuracy of 0.77; this is along expected lines, since larger contextual information aids the classifiers. Moreover, the images of the BreakHis dataset have a lower resolution (700x400 pixels) than those of the KMC dataset, so resizing results in a smaller loss of information. The proposed method achieved accuracies of 0.91 and 0.96 with the pool-2 and pool-1 classifiers, respectively, and Kappa values of 0.82 and 0.91, which are considerably higher than those of the other methods. This result is significant, as it shows that the performance of the proposed model is consistent across both datasets, confirming the effectiveness of the proposed feature extractor. The method proposed by Nazeri et al. [29] also performed consistently on both datasets, which further substantiates the importance of patch-level and image-level features in the classification of 400x BHIs. The ROC curves of the proposed and other models for malignant cases on the BreakHis dataset are shown in Fig. 15. The area under the curve obtained by the proposed model is higher than that of the other methods, indicating higher accuracy. Also, it is important to note that the proposed model outperforms the other methods on both datasets, which demonstrates its reliability and robustness.

Fig. 14 ROC curve of the proposed and other models on KMC dataset

Fig. 15 ROC curve of the proposed and other models on BreakHis dataset

5 Conclusion

The present work develops an automated system using medical image processing techniques for the detection of fibroadenoma and ductal carcinoma from 400x BHIs. In this regard, we manually collected and annotated the 400x BHIs required for evaluating and validating the developed model. The study proposes a novel approach for the effective extraction of features from high-resolution BHIs, thereby reducing the dependency on hardware. Specifically, the proposed model uses image-level and patch-level features to compensate for the loss of information that occurs when high-resolution images are resized. Furthermore, a colour channel attention module is introduced to enhance the features concerning the potential regions of interest. The classification of the images is achieved using an ensemble learning strategy. The study reveals the impact of resizing high-resolution images on the performance of the models. The proposed colour channel attention module is quantitatively and qualitatively evaluated on private and public datasets, which demonstrates its effectiveness. Quantitative evaluation of the classifiers confirmed that ensemble-based classifiers are better suited to the classification of BHIs at 400x magnification. Further, the experimental evaluation of the proposed model on the private (accuracy = 0.95) and public (accuracy = 0.96) datasets demonstrates its validity and reliability. In future, the proposed model can be extended to process whole slide images (WSIs). Also, multiple magnifications can be considered holistically to improve the performance of the model.