1 Introduction

Breast cancer is one of the most serious types of cancer affecting women around the world. It is also one of the leading causes of mortality in middle-aged and elderly women. The International Agency for Research on Cancer (IARC) estimates that more than 1 million cases of breast cancer occur worldwide each year, with some 580,000 cases occurring in developed countries and the remainder in developing countries. The risk of a woman developing breast cancer during her lifetime is approximately 11% [1]. Early detection of breast cancer is vital for successful treatment, the main goal being to increase the probability of survival for patients. Currently, the most reliable and practical method for early detection and screening of breast cancer is mammography. Microcalcifications (MCs) can be an important early sign of breast cancer; they are calcium deposits that appear as bright spots. Individual MCs are sometimes difficult to detect because of the surrounding breast tissue and variations in shape, orientation, brightness and diameter [2]. MCs are potential primary indicators of malignant types of breast cancer, so their detection can be important in preventing and treating the disease. However, it is still difficult to detect all MCs in mammograms because of their poor contrast against the surrounding tissue.

Many methodologies have been proposed to detect the presence of MCs in mammograms, involving image processing techniques, pattern recognition methods and artificial intelligence approaches. Vega-Corona et al. [3] proposed a method for detecting MCs in digitized mammograms. The method consists of image enhancement by adaptive histogram equalization to improve the visibility of MCs with respect to the background; multiscale wavelet processing and gray-level statistical techniques for feature extraction; k-means clustering for MC detection; and, finally, feature selection and classifiers based on a general regression neural network (GRNN) and a multilayer perceptron (MLP) to classify MCs. Papadopoulos et al. [4] compared five image enhancement algorithms for improving MC cluster detection in mammography. Halkiotis et al. [5] proposed using mathematical morphology to extract MCs from a non-uniform background; in this scheme, a set of features extracted from the original mammograms is used to test two neural network classifiers, an MLP and a radial basis function (RBF) neural network. Fu et al. [6] proposed a two-stage method. The first stage locates the suspected MCs, using mathematical morphology and border detection to segment them. The second stage extracts and selects features from the MCs located in the first stage; these features are then used as input vectors to test two classifiers, one based on a GRNN and the other on a support vector machine (SVM).

In this paper, a method for detecting MCs in regions of interest (ROIs) extracted from digitized mammograms is presented. The main purpose of this method is to provide an automatic MC detection system that can help radiologists improve the diagnosis of breast cancer at an early stage. The method is based on image processing, pattern recognition and artificial intelligence techniques. Its stages are as follows: image enhancement based on mathematical morphology operations; a novel image sub-segmentation approach based on the possibilistic fuzzy c-means (PFCM) algorithm, which is compared with image segmentation by the k-means algorithm; feature extraction based on window-based features such as the mean and standard deviation; and, finally, a classifier based on an artificial neural network (ANN) to automatically detect MCs. Figure 1 shows a block diagram of the proposed method.

Figure 1 Block diagram of the proposed method.

2 ROI image enhancement

Over the past several years, methodologies have been developed for the detection and/or classification of MCs, but the interpretation of MCs remains a difficult task, mainly because of their fuzzy nature, low contrast and low distinguishability from their surroundings. The difficulty of MC detection depends on morphological factors such as size, shape and distribution. Another factor that makes MC detection difficult is that MCs are often located on non-homogeneous backgrounds and, owing to their low contrast against the background, their intensity may be similar to that of noise or other structures [7, 8]. For these reasons, image enhancement is applied in this paper before segmentation.

Mathematical morphology is a discipline within the field of image processing that involves the structural analysis of images. The geometrical structure of an image is determined by locally comparing it with a predefined elementary set called a structuring element (SE). Image processing using morphological transformations is a process of information removal based on size and shape. In this process, irrelevant image content is selectively eliminated; thus, essential image features can be enhanced. Morphological operations are based on the relationships between two sets: an input image, I, and a processing operator, the SE, which is usually much smaller than the input image. By selecting the shape and size of the structuring element, different results can be obtained in the output image. The fundamental morphological operations are erosion and dilation.

The contrast can be defined as the difference in intensity between an image structure and its background. By combining morphological operations, several image processing tasks can be performed; however, in this work, we focus on those morphological operations that achieve contrast enhancement. In [8], a contrast enhancement technique using mathematical morphology is presented, called morphological contrast enhancement. Morphological contrast enhancement is based on morphological operations known as top-hat and bottom-hat transforms, which were proposed in [9]. A top-hat is a residual filter that preserves those features in an image that can fit within the structuring element and removes those that cannot; in other words, the top-hat transform is used to segment objects that differ in brightness from the surrounding background in images with uneven background intensity. The top-hat transform is defined by the following equation:

$$I_T(x, y) = I(x, y) - \left[ \left( I(x, y) \ominus SE \right) \oplus SE \right] \tag{1}$$

where $I(x, y)$ is the input image, $I_T(x, y)$ is the transformed image, SE is the structuring element, $\ominus$ represents the morphological erosion operation, $\oplus$ represents the morphological dilation operation, and $-$ represents image subtraction. The term $[(I(x, y) \ominus SE) \oplus SE]$ is the morphological opening of the image by SE. In previous works such as [8, 10], this technique was used to obtain satisfactory results in MC detection.
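The top-hat transform of Equation (1) can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: images are lists of lists of gray levels, the flat disk-shaped SE is approximated by the integer offsets inside a circle of radius `size // 2` (the helper `disk_offsets` is hypothetical), and neighbours outside the image are simply ignored at the border, which is only one of several possible conventions.

```python
def disk_offsets(size):
    """Offsets (dy, dx) of a flat disk-shaped SE inscribed in a size x size square (size odd)."""
    r = size // 2
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]


def _flat_morph(img, offsets, reduce_fn):
    """Apply min (erosion) or max (dilation) over the SE neighbourhood of every pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbours falling outside the image are skipped (border convention).
            vals = [img[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = reduce_fn(vals)
    return out


def erode(img, offsets):
    return _flat_morph(img, offsets, min)


def dilate(img, offsets):
    # Dilation uses the reflected SE; for a symmetric disk this is the same set.
    return _flat_morph(img, [(-dy, -dx) for dy, dx in offsets], max)


def top_hat(img, offsets):
    """Equation (1): I_T = I - ((I eroded by SE) dilated by SE), i.e. I minus its opening."""
    opened = dilate(erode(img, offsets), offsets)
    return [[a - b for a, b in zip(row_i, row_o)] for row_i, row_o in zip(img, opened)]


# A single bright, MC-like pixel on a flat background survives; the background is removed.
roi = [[10] * 7 for _ in range(7)]
roi[3][3] = 50
enhanced = top_hat(roi, disk_offsets(3))
print(enhanced[3][3], enhanced[0][0])  # 40 0
```

Because the opening removes bright structures into which the SE does not fit, larger SEs (5 × 5, 7 × 7) keep progressively larger bright details in $I_T$.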

3 Image segmentation by partitional clustering algorithms

Image segmentation is an important task in the field of image processing and computer vision and involves the identification of objects or regions with the same features in an image. The aim of image segmentation is to divide an image into non-overlapping subregions that are homogeneous with respect to some features such as gray-level intensity or texture. The level to which the subdivision is carried out depends on the problem being solved [11].

Depending on the specific application, several methods based on different principles have been used for image segmentation, such as histogram thresholding [12, 13], edge detection [14, 15], region growing [16–18], fractal models [19–22], ANNs [23], swarm-based algorithms [24] and clustering techniques [3, 25–29].

In this paper, partitional clustering algorithms are considered for image segmentation because of the close similarity between segmentation and clustering, although clustering was developed for feature space, whereas segmentation was developed for the spatial domain of an image.

Clustering techniques perform unsupervised pattern classification into groups or classes. Partitional clustering techniques are based on cluster analysis, which is the organization of a set of patterns (each a vector of measurements or a point in a d-dimensional space) into clusters based on similarity [30]. In the context of image segmentation, the set of patterns can be represented by an image in a d-dimensional space, where d depends on the number of features used to represent the pixels, and each point in this space is called a pixel pattern. In the same context, the clusters correspond to regions with some semantic meaning in the image, which are referred to as objects. Therefore, the main goal of the clustering process is to obtain groups or classes from an unlabeled data set based on their similarities, to facilitate further knowledge extraction. Similarity is evaluated according to a distance measure between the patterns and the prototypes, or centers, of the groups, and each pattern is assigned to the nearest or most similar prototype. However, this process must distribute all of the data among the groups, even if some pixels are not very representative of the group as a whole [26]. In medical imaging, segmentation plays an important role because it facilitates the delineation of anatomical structures and other regions of interest. For the specific case of MC detection, several works based on image segmentation using partitional clustering algorithms have been proposed, such as [3, 25–27]. In this paper, two partitional clustering algorithms are compared with the aim of improving MC detection.

3.1 k-means

The k-means or hard c-means (HCM) algorithm [31] is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The objective of clustering algorithms is to partition a given data set into several groups such that the data within a group are more similar to one another than to those in other groups. Achieving such a partition requires a similarity measure that takes two vectors and returns a value reflecting their similarity. The k-means algorithm partitions a given data set $Z = \{z_1, \ldots, z_N\}$ into c clusters and computes the cluster centers $V = [v_1, v_2, \ldots, v_c]$ so that the following objective function is minimized:

$$J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik} \lVert z_k - v_i \rVert^2 \tag{2}$$

where $\lVert z_k - v_i \rVert^2$ is the chosen distance measure between a data point $z_k$ and the cluster center $v_i$, and $\mu_{ik} \in \{0, 1\}$ indicates whether $z_k$ belongs to cluster i, so that J is an indicator of the distance of the data points from their cluster prototypes. $V = [v_1, v_2, \ldots, v_c]$ is the vector of prototypes of the c clusters, which are calculated according to:

$$v_i = \frac{1}{|A_i|} \sum_{z_k \in A_i} z_k \tag{3}$$

where $A_i$ is the set of data points assigned to cluster i and $|A_i|$ represents the number of data points belonging to it.

The steps of the k-means algorithm are as follows:

  1. Initialize the cluster centers $v_i$, $i = 1, \ldots, c$. This is typically achieved by randomly selecting c points from the data set.

  2. Determine $\mu_{ik}$, $i = 1, \ldots, c$, $k = 1, \ldots, N$, by Equation (4):

     $$\mu_{ik} = \begin{cases} 1 & \text{if } \lVert z_k - v_i \rVert^2 \le \lVert z_k - v_j \rVert^2 \ \ \forall j \ne i, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

  3. Compute the objective function according to Equation (2). Stop if it has converged or if the improvement over the previous iteration is below a threshold.

  4. Update the cluster centers $v_i$ using Equation (3), and then return to Step 2.
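A minimal plain-Python sketch of Steps 1 to 4 follows, for patterns given as tuples of features (for example, one tuple per pixel). The function names, the convergence tolerance and the rule of keeping a center unchanged when its cluster becomes empty are illustrative choices, not part of the original description.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(Z, V):
    """Step 2 / Equation (4): index of the nearest prototype for every pattern."""
    return [min(range(len(V)), key=lambda i: sq_dist(z, V[i])) for z in Z]


def objective(Z, V, labels):
    """Equation (2) with hard memberships: mu_ik = 1 only for the assigned cluster."""
    return sum(sq_dist(z, V[i]) for z, i in zip(Z, labels))


def kmeans(Z, c, max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    V = [list(z) for z in rng.sample(Z, c)]   # Step 1: c randomly chosen data points
    prev_J = float("inf")
    for _ in range(max_iter):
        labels = assign(Z, V)                 # Step 2
        J = objective(Z, V, labels)           # Step 3: stop when J no longer improves
        if prev_J - J < tol:
            break
        prev_J = J
        for i in range(c):                    # Step 4 / Equation (3)
            members = [z for z, lab in zip(Z, labels) if lab == i]
            if members:                       # keep the old center if a cluster empties
                V[i] = [sum(col) / len(members) for col in zip(*members)]
    else:
        # max_iter reached: make labels and J consistent with the last update
        labels = assign(Z, V)
        J = objective(Z, V, labels)
    return labels, V, J


Z = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
labels, V, J = kmeans(Z, 2)
print(labels, sorted(v[0] for v in V))
```

Since the result depends on the random initialization, a practical run would usually repeat the procedure with several seeds and keep the partition with the lowest J.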

3.2 PFCM clustering algorithm

The PFCM is one of the most recently developed partitional clustering algorithms, and it combines the advantages of the fuzzy c-means (FCM) and possibilistic c-means (PCM) algorithms. The FCM imposes the constraint that the memberships of each data point across all clusters sum to one, which makes it very sensitive to outliers. To overcome this limitation of the FCM, Krishnapuram and Keller [32] developed the PCM clustering algorithm, which makes it possible to identify the degree of typicality that a data point has with respect to the group to which it belongs. However, the PCM sometimes produces coincident cluster prototypes, generating erroneous partitions of the feature space; for this reason, the PCM is not always successful. To solve the problems of both the FCM (outlier sensitivity) and the PCM (coincident clusters), Pal et al. [33] proposed a hybrid PFCM clustering model, in which the function to be optimized is given by Equation (5):

$$J_{pfcm}(Z; U, T, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right) \lVert z_k - v_i \rVert^2 + \sum_{i=1}^{c} \gamma_i \sum_{k=1}^{N} \left( 1 - t_{ik} \right)^{\eta} \tag{5}$$

subject to the constraints $\sum_{i=1}^{c} \mu_{ik} = 1 \ \forall k$ and $0 \le \mu_{ik}, t_{ik} \le 1$, with the constants $a > 0$, $b > 0$, $m > 1$ and $\eta > 1$. The values of a and b represent the relative importance of the membership and typicality values, respectively, in the computation of the prototypes. The parameters m and η represent the absolute weight of the membership value and the typicality value, respectively. To reduce the effect of outliers, one can set $b > a$ and $m > \eta$.

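Theorem PFCM [33]: If $D_{ikA} = \lVert z_k - v_i \rVert > 0$ for every i and k, $m > 1$, $\eta > 1$, and Z contains at least c distinct data points, then $(U, T, V) \in M_{fcm} \times M_{pcm} \times \mathbb{R}^{c \times N}$ may minimize $J_{pfcm}$ only if:

$$\mu_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{D_{ikA}}{D_{jkA}} \right)^{2/(m-1)} \right]^{-1}, \quad 1 \le i \le c;\ 1 \le k \le N \tag{6}$$

$$t_{ik} = \frac{1}{1 + \left( \frac{b}{\gamma_i} D_{ikA}^{2} \right)^{1/(\eta - 1)}}, \quad 1 \le i \le c;\ 1 \le k \le N \tag{7}$$

$$v_i = \frac{\sum_{k=1}^{N} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right) z_k}{\sum_{k=1}^{N} \left( a \mu_{ik}^{m} + b t_{ik}^{\eta} \right)}, \quad 1 \le i \le c \tag{8}$$

$$\gamma_i = K \frac{\sum_{k=1}^{N} \mu_{ik}^{m} \lVert z_k - v_i \rVert^2}{\sum_{k=1}^{N} \mu_{ik}^{m}} \tag{9}$$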

The iterative process of this algorithm is presented in [33].
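The iteration can be sketched in plain Python by alternating Equations (6), (7) and (8) until the prototypes stop moving. This is a hedged illustration rather than the exact procedure of [33]: as assumptions made for brevity, $\gamma_i$ is computed once from Equation (9) with K = 1 using the fuzzy memberships of the initial prototypes, squared distances are floored at a small `eps` to avoid division by zero, and the optional `V0` argument (hypothetical) lets the caller supply initial prototypes. The default parameters match the values used later in this paper (a = 1, b = 2, m = 2, η = 2).

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pfcm(Z, c, a=1.0, b=2.0, m=2.0, eta=2.0, K=1.0,
         V0=None, max_iter=100, tol=1e-10, seed=0, eps=1e-12):
    """Return (U, T, V): memberships and typicalities as c x N lists, and prototypes."""
    N = len(Z)
    if V0 is not None:
        V = [list(v) for v in V0]
    else:
        V = [list(z) for z in random.Random(seed).sample(Z, c)]

    def dists(V):
        # D2[i][k] = D_ikA^2 = ||z_k - v_i||^2, floored so no ratio divides by zero
        return [[max(sq_dist(z, v), eps) for z in Z] for v in V]

    def memberships(D2):
        # Equation (6); (D_ik / D_jk)^(2/(m-1)) equals (D2_ik / D2_jk)^(1/(m-1))
        return [[1.0 / sum((D2[i][k] / D2[j][k]) ** (1.0 / (m - 1)) for j in range(c))
                 for k in range(N)] for i in range(c)]

    def typicalities(D2, gamma):
        # Equation (7)
        return [[1.0 / (1.0 + (b / gamma[i] * D2[i][k]) ** (1.0 / (eta - 1)))
                 for k in range(N)] for i in range(c)]

    D2 = dists(V)
    U = memberships(D2)
    # Equation (9), evaluated once from the initial memberships (assumption)
    gamma = [K * sum(U[i][k] ** m * D2[i][k] for k in range(N))
             / sum(U[i][k] ** m for k in range(N)) for i in range(c)]

    for _ in range(max_iter):
        T = typicalities(D2, gamma)
        new_V = []
        for i in range(c):  # Equation (8): prototypes weighted by a*mu^m + b*t^eta
            w = [a * U[i][k] ** m + b * T[i][k] ** eta for k in range(N)]
            new_V.append([sum(wk * z[d] for wk, z in zip(w, Z)) / sum(w)
                          for d in range(len(Z[0]))])
        shift = max(sq_dist(v, nv) for v, nv in zip(V, new_V))
        V = new_V
        D2 = dists(V)
        U = memberships(D2)
        if shift < tol:
            break
    return U, typicalities(D2, gamma), V


U, T, V = pfcm([(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)], 2, V0=[[1.0], [9.0]])
print([round(v[0], 3) for v in V])
```

Unlike k-means, every pattern receives both a membership (which sums to one over the clusters) and a typicality (which does not), and the typicalities are what the sub-segmentation approach thresholds.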

To segment the MCs in ROI images, a novel technique based on the PFCM clustering algorithm is used. This technique is called image sub-segmentation and was proposed by Ojeda-Magaña et al. [26].

Proposed approach for the detection of MCs by sub-segmentation

  1. Obtain the data vector.

  2. Assign values to the parameters (a, b, m, η).

  3. Segment the image by taking into account the number of most representative regions, which in this case is two: the suspicious region containing the MCs ($S_1$) and normal tissue ($S_2$); the $S_2$ region is considered to be devoid of MCs.

  4. Run the PFCM algorithm to obtain:

     • The membership matrix U.

     • The typicality matrix T.

  5. Obtain the maximum typicality value for each pixel:

     $$T_{max} = \max_{i} \left[ t_{ik} \right], \quad i = 1, \ldots, c \tag{10}$$

  6. Select a value for the threshold α.

  7. Using α and the $T_{max}$ matrix, separate all of the pixels into two sub-matrices ($T_1$, $T_2$), with the first matrix

     $$T_1 = T_{max} \ge \alpha \tag{11}$$

     containing the typical pixels of both regions ($S_{\text{typical}\,1}$ and $S_{\text{typical}\,2}$), and the second matrix

     $$T_2 = T_{max} < \alpha \tag{12}$$

     containing the atypical pixels of both regions ($S_{\text{atypical}\,1}$ and $S_{\text{atypical}\,2}$). In this case, the atypical pixels are of most interest, especially the atypical pixels of $S_1$.

  8. From the labeled pixels $z_k$ of the $T_1$ sub-matrix, the following subregions can be generated:

     $$T_1 = \left\{ S_{\text{typical}\,1}, \ldots, S_{\text{typical}\,i} \right\}, \quad i = 1, \ldots, c \tag{13}$$

     and from the $T_2$ sub-matrix:

     $$T_2 = \left\{ S_{\text{atypical}\,1+i}, \ldots, S_{\text{atypical}\,2i} \right\}, \quad i = 1, \ldots, c \tag{14}$$

     such that each region $S_i$, $i = 1, \ldots, c$, is defined by:

     $$S_i = S_{\text{typical}\,i} \cup S_{\text{atypical}\,i+c} \tag{15}$$

  9. Select the sub-matrix $T_1$ or $T_2$ of interest for the corresponding analysis. In this work, $T_2$ is the sub-matrix of interest.
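Steps 5 to 9 can be sketched as follows, given U and T from a PFCM run as c × N lists with one column per pixel. Assigning each pixel to the region of maximum membership is an assumption, since the steps above do not state how pixels are attributed to regions; the zero-based label layout (0, ..., c − 1 for typical pixels and c, ..., 2c − 1 for atypical pixels) mirrors the indexing of Equation (15). Both function names are hypothetical.

```python
def sub_segment(U, T, alpha):
    """Label each pixel k: i if typical in region i, i + c if atypical (zero-based i)."""
    c, N = len(U), len(U[0])
    labels = []
    for k in range(N):
        # Region of the pixel: the cluster with maximum membership (assumption).
        region = max(range(c), key=lambda i: U[i][k])
        # Equation (10): maximum typicality of the pixel over all clusters.
        t_max = max(T[i][k] for i in range(c))
        # Equations (11)-(12): typical if t_max >= alpha, atypical otherwise.
        labels.append(region if t_max >= alpha else region + c)
    return labels


def atypical_pixels(labels, region, c):
    """Indices of the atypical pixels of one region (the T_2 part kept in Step 9)."""
    return [k for k, lab in enumerate(labels) if lab == region + c]


U = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
T = [[0.5, 0.01, 0.03], [0.02, 0.6, 0.01]]
labels = sub_segment(U, T, alpha=0.04)
print(labels)                          # [0, 1, 2]
print(atypical_pixels(labels, 0, 2))   # [2]
```

Lowering α (for example from 0.04 to 0.02) moves fewer pixels into the atypical sub-matrix, which is the effect explored in Figure 4.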

4 Microcalcification classification by ANN

Artificial neural networks (ANNs) are biologically inspired networks based on the neuron organization and decision-making process of the human brain [34]; in other words, they are simplified mathematical models of the brain. ANNs are used in a wide variety of data processing applications where real-time data analysis and information extraction are required. One advantage of the ANN approach is that most of the intensive computation takes place during training: once an ANN has been trained for a particular task, it operates relatively fast, and unknown samples can be rapidly identified in the field. Because an ANN can approximate functions with multiple inputs and outputs, ANNs can be used for a variety of applications, including classification in medical applications [3, 5, 23, 35], descriptive modeling, clustering, function approximation, time series prediction [36] and sonar or radar detection [37]. Classification is one of the most frequently encountered decision-making tasks in human activity. A classification problem occurs when an object needs to be assigned to a predefined group or class based on a number of observed attributes related to that object. In this paper, an ANN-based classifier is used to classify each pattern as corresponding either to a pixel of healthy tissue or to a pixel of a microcalcification; we call these the normal tissue class (NT) and the MCs class, respectively. For this purpose, a multilayer perceptron (MLP) is used. The MLP is the most popular ANN for many practical applications, such as pattern recognition. The MLP is trained with a learning algorithm, back propagation (BP) [38], which is based on the method of steepest descent [39] and is the algorithm most commonly used by the ANN community to update connection weights.
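The MLP described above can be sketched in plain Python: one hidden layer of sigmoid units, a single sigmoid output in [0, 1], random initial weights, and back propagation by steepest descent on the squared error. This is a hedged illustration, not the authors' code; per-pattern (online) weight updates, the uniform(−0.5, 0.5) initialization and the 0.5 decision cut-point are assumptions, while the defaults for the learning rate (1), maximum epochs (2000) and target MSE (0.001) follow Section 5.3.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


class MLP:
    """Single hidden layer, sigmoid activations, one sigmoid output in [0, 1]."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds one neuron's input weights followed by its bias.
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in self.W1]
        out = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.W2[-1])
        return h, out

    def predict(self, x, cut=0.5):
        return 1 if self.forward(x)[1] >= cut else 0

    def train(self, X, y, lr=1.0, epochs=2000, target_mse=1e-3):
        """Online BP; stops after `epochs` or when the epoch MSE <= target_mse.
        Returns the MSE accumulated during the last epoch."""
        mse = float("inf")
        for _ in range(epochs):
            sse = 0.0
            for x, t in zip(X, y):
                h, out = self.forward(x)
                err = t - out
                sse += err * err
                # Local gradients (deltas) of E = 0.5 * err^2 for sigmoid units.
                d_out = err * out * (1.0 - out)
                d_hid = [d_out * self.W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
                # Steepest-descent updates: output layer, then hidden layer.
                for j, hj in enumerate(h):
                    self.W2[j] += lr * d_out * hj
                self.W2[-1] += lr * d_out
                for j, row in enumerate(self.W1):
                    for i, xi in enumerate(x):
                        row[i] += lr * d_hid[j] * xi
                    row[-1] += lr * d_hid[j]
            mse = sse / len(X)
            if mse <= target_mse:
                break
        return mse


X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y = [0, 0, 1, 1]
net = MLP(n_in=2, n_hidden=3)
print(net.train(X, y), [net.predict(x) for x in X])
```

In the setting of this paper, `n_in` would be 6 (the attributes of FV) and `n_hidden` would be taken from Table 4.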

5 Methodology and results

To test our method, a set of ten ROI images was selected from several mammograms of the mini-MIAS database provided by the Mammographic Image Analysis Society (MIAS) [40]. Each mammogram in this database measures 1,024 × 1,024 pixels, with a spatial resolution of 200 μm/pixel. These mammograms were reviewed by an expert radiologist, and all abnormalities were identified and classified. The areas in which abnormalities such as MCs were located were taken as ROIs. In this work, ROI images measuring 256 × 256 pixels were used. Figure 2a shows some of the ROI images used in this work.

Figure 2 (a) Original ROI images. (b) ROI images processed by the top-hat transform.

5.1 Morphological enhancement

The morphological top-hat transform is used to enhance the ROI images by detecting objects that differ in brightness from the surrounding background; in our case, it was used to increase the contrast between the MCs and the background. During image enhancement, the top-hat transform was applied with the same SE at three different sizes: 3 × 3, 5 × 5 and 7 × 7. The SE used in this work was a flat disk-shaped SE. Figure 2b shows the ROI images processed by the top-hat transform with an SE of size 7 × 7.

5.2 Image segmentation by clustering

5.2.1 Data vector creation

For each ROI, a data vector Z is generated from the images obtained in the previous stage. First, each top-hat image is mapped, pixel by pixel, to a one-dimensional vector $x_{se}$ as follows:

$$\left[ \left\{ I_T(x, y) \right\}_{1 \le x \le R,\ 1 \le y \le C} \right]_{se} \rightarrow x_{se} = \left\{ x_{se}(q) \right\}_{q = 1, \ldots, R \times C} \tag{16}$$

where se is the size of the SE, $x_{se}(q)$ is the gray level of the qth pixel of $I_T$ when the image is scanned row by row, and R and C are the numbers of rows and columns of the image. The data vector Z can then be written as follows:

$$Z = \left[ x_{3 \times 3},\ x_{5 \times 5},\ x_{7 \times 7} \right]^{T} \tag{17}$$
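To make Equations (16) and (17) concrete, the sketch below flattens each top-hat image row by row and pairs the three values of every pixel into a three-dimensional pattern, i.e. one column of Z. The function names are hypothetical, the images are plain lists of lists, and indices are zero-based; in practice each input would be the top-hat image obtained with the 3 × 3, 5 × 5 or 7 × 7 SE.

```python
def flatten_rows(img):
    """Equation (16): x_se(q) is the gray level of the q-th pixel, scanning row by row."""
    return [v for row in img for v in row]


def build_patterns(tophat_images):
    """Equation (17): stack x_3x3, x_5x5, x_7x7 and return one 3-D pattern per pixel."""
    columns = [flatten_rows(img) for img in tophat_images]
    return list(zip(*columns))


def pixel_of(q, n_cols):
    """Map a zero-based pattern index q back to its (row, column) position."""
    return divmod(q, n_cols)


x3 = [[1, 2], [3, 4]]
x5 = [[5, 6], [7, 8]]
x7 = [[9, 10], [11, 12]]
Z = build_patterns([x3, x5, x7])
print(Z)               # [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
print(pixel_of(3, 2))  # (1, 1)
```

Keeping the row-major order is what allows the cluster labels computed on Z to be mapped back to pixel positions when the segmented ROI images are displayed.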

The two clustering techniques are then applied to the data vector Z to assign each pattern (pixel) a label indicating the cluster of the feature-space partition to which it belongs. Only one cluster corresponds to MCs, which generally comprise just a few patterns (pixels); the remaining clusters correspond to normal (healthy) tissue.

The initial conditions and results for each proposed clustering technique are presented below.

5.2.2 Segmentation by k-means

The initial conditions for this approach are as follows:

  • Cluster number: 2 to 4.

  • Prototypes: initialized as random values.

  • Distance measure: Euclidean distance function.

Figure 3 shows the segmented ROI images obtained with different numbers of clusters after applying the k-means algorithm to the data vector Z.

Figure 3 Image segmentation by k-means. (a) Original ROI images. (b) Results obtained from the 2nd partition. (c) Results obtained from the 3rd partition.

5.2.3 Sub-segmentation by PFCM

In this case, the approach presented in Section 3.2 is applied, and the initial conditions are as follows:

  • Cluster number: 2.

  • Prototypes: initialized as random values.

  • Distance measure: Euclidean distance function.

  • a = 1, b = 2, m = 2, η = 2, and threshold values α = 0.04, 0.03 and 0.02.

Figure 4 shows segmented ROI images with different threshold values (α) obtained after applying the approach presented in Section 3.2 to the data vector Z.

Figure 4 (a) Original ROI images. (b) Segmentation of the ROIs into 2 groups using the membership matrix U. Final image sub-segmentation by PFCM using the typicality matrix T and threshold values of α = (c) 0.04, (d) 0.03, (e) 0.02.

Table 1 shows the number of patterns assigned to the MCs and NT classes by the k-means and PFCM clustering processes for our set of ten ROI images.

Table 1 Number of patterns assigned to MCs and NT

5.2.4 Feature extraction

Two window-based features, namely the mean and the standard deviation, defined in Equations 18 and 19, respectively, are extracted.

$$I_{\mu}(x, y) = \frac{1}{R \times C} \sum_{x=1}^{R} \sum_{y=1}^{C} f(x, y) \tag{18}$$

$$I_{\sigma}(x, y) = \left[ \frac{1}{R \times C} \sum_{x=1}^{R} \sum_{y=1}^{C} \left( f(x, y) - I_{\mu}(x, y) \right)^{2} \right]^{1/2} \tag{19}$$

where $I_{\mu}$, $I_{\sigma}$ and $f(x, y)$ represent the mean, the standard deviation and the gray-level value of the pixel located at (x, y), respectively. These features are extracted from the original ROI images within rectangular windows, so in Equations 18 and 19 R × C is the size of the window; in this work, we used three different window sizes (ws): 3 × 3, 5 × 5 and 7 × 7. Each image obtained by this process is considered a feature that can be used to generate a set of patterns, some of which represent the MCs class and the rest the NT class. We refer to this set of patterns as the feature vector (FV). We know a priori that each image used in this work contains pixels belonging to both the MCs and NT classes. The FV is used as the input vector for the classifier and is formed as follows:

$$FV = \left[ i_{\mu}^{3 \times 3},\ i_{\sigma}^{3 \times 3},\ i_{\mu}^{5 \times 5},\ i_{\sigma}^{5 \times 5},\ i_{\mu}^{7 \times 7},\ i_{\sigma}^{7 \times 7} \right] \tag{20}$$

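where:

$$\left[ \left\{ I_{\mu}(x, y) \right\}_{1 \le x \le R,\ 1 \le y \le C} \right]_{ws} \rightarrow i_{\mu}^{ws} = \left\{ i_{\mu}^{ws}(q) \right\}_{q = 1, \ldots, R \times C} \tag{21}$$

$$\left[ \left\{ I_{\sigma}(x, y) \right\}_{1 \le x \le R,\ 1 \le y \le C} \right]_{ws} \rightarrow i_{\sigma}^{ws} = \left\{ i_{\sigma}^{ws}(q) \right\}_{q = 1, \ldots, R \times C} \tag{22}$$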
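The window-based features of Equations (18) to (22) can be sketched in plain Python as follows: for every pixel, the mean and the population standard deviation of the gray levels in a ws × ws window centred on it, flattened row by row and stacked into one six-dimensional pattern per pixel. Clipping the window at the image border is an assumption (the text does not specify border handling), and the function names are hypothetical.

```python
import math


def window_mean_std(img, ws):
    """Equations (18)-(19): mean and population standard deviation in a ws x ws
    window centred on each pixel; windows are clipped at the border (assumption)."""
    h, w = len(img), len(img[0])
    r = ws // 2
    mean = [[0.0] * w for _ in range(h)]
    std = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            mu = sum(vals) / len(vals)
            mean[y][x] = mu
            std[y][x] = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return mean, std


def feature_vectors(img, sizes=(3, 5, 7)):
    """Equation (20): one pattern per pixel, [mu_3, sd_3, mu_5, sd_5, mu_7, sd_7]."""
    columns = []
    for ws in sizes:
        mean, std = window_mean_std(img, ws)
        columns.append([v for row in mean for v in row])  # Equation (21), row by row
        columns.append([v for row in std for v in row])   # Equation (22), row by row
    return list(zip(*columns))


roi = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
FV = feature_vectors(roi)
print(len(FV), len(FV[0]), FV[4][0])  # 9 6 5.0
```

Each pattern in FV is then paired with the MCs or NT label produced for the same pixel by the clustering stage.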

The labels of the two classes of the FV were obtained by the previous (clustering) process. Because the patterns that do not belong to the MCs class greatly outnumber those that do, the two classes were balanced. Table 2 shows the subsets of patterns for the MCs and NT classes.
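The text does not state how balancing was carried out. One simple option, shown here purely as an assumption, is random undersampling of the majority (NT) class down to the size of the MCs class; the function name and the fixed seed are illustrative.

```python
import random


def undersample(patterns, labels, seed=0):
    """Randomly keep, for every class, as many patterns as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for p, lab in zip(patterns, labels):
        by_class.setdefault(lab, []).append(p)
    n = min(len(ps) for ps in by_class.values())
    kept = [(p, lab) for lab, ps in by_class.items() for p in rng.sample(ps, n)]
    rng.shuffle(kept)  # mix the classes before any train/test split
    return [p for p, _ in kept], [lab for _, lab in kept]


# 10 MC patterns (label 1) and 90 NT patterns (label 0)
P, L = undersample(list(range(100)), [1 if p < 10 else 0 for p in range(100)])
print(len(P), L.count(1), L.count(0))  # 20 10 10
```

Oversampling the minority class or weighting the error term are alternative ways to balance classes; the choice affects how many NT patterns are available for training.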

Table 2 Results of balancing

5.3 Microcalcification classification by ANN

An MLP was used to classify the patterns as NT or MCs, with the purpose of automatically identifying MCs in ROIs extracted from mammograms. To evaluate the classifiers comparatively, different network structures were trained and tested with the same training data set and the same testing data set. The best results were obtained with the following structure and parameters:

  1. Number of input neurons equal to the number of attributes in FV: 6.

  2. Number of hidden layers: 1.

  3. Hidden neurons: see Table 4.

  4. Output neurons: 1 (all classifications involve two classes).

  5. Learning rate: 1.

  6. Activation function: sigmoidal, with output values in [0, 1].

  7. All weights randomly initialized.

  8. Training algorithm: back propagation (BP).

  9. Training stopping conditions:

     (a) epochs: 2000.

     (b) mean squared error (MSE): 0.001.

In this paper, we used patterns extracted from the FV set to train and test our classifiers: 80% of the patterns were used for training, and 20% of the patterns were used for testing (see Table 3).
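A minimal sketch of the 80%/20% split follows. Whether the split was stratified by class is not stated, so a plain random split of the (already balanced) patterns is assumed; the function name and seed are hypothetical.

```python
import random


def split_patterns(patterns, labels, train_fraction=0.8, seed=0):
    """Shuffle pattern indices and split them into training and testing subsets."""
    idx = list(range(len(patterns)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_fraction * len(idx)))
    train, test = idx[:n_train], idx[n_train:]
    return ([patterns[i] for i in train], [labels[i] for i in train],
            [patterns[i] for i in test], [labels[i] for i in test])


X_tr, y_tr, X_te, y_te = split_patterns(list(range(50)), [p % 2 for p in range(50)])
print(len(X_tr), len(X_te))  # 40 10
```

Using the same split for every network structure, as described above, keeps the comparison between structures fair.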

Table 3 Number of patterns used for training and testing for each classifier

Table 4 shows the optimal network structure and parameters for each FV.

Table 4 The best network structure and parameters for each database

A confusion matrix was built to determine the probability of MC detection versus the probability of false MC detection. Table 5 shows the performance of the classifiers presented in this work. The performance of the proposed method was also evaluated by means of receiver operating characteristic (ROC) curve analysis. The ROC curve is a two-dimensional measure of classification performance and is widely used in biomedical applications to assess the performance of diagnostic tests. It plots the sensitivity (true positive rate) against 1 − specificity (false positive rate) for the different possible cut-points of a diagnostic test. Figure 5 shows the ROC curves and the area under the curve (AUC) for the classifiers with the different network structures used in this work.
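The quantities above can be computed as sketched below: confusion-matrix counts, sensitivity and specificity at one cut-point, and an ROC curve obtained by sweeping the cut-point over the classifier's output scores, with the AUC estimated by the trapezoidal rule. This is an illustrative plain-Python sketch with hypothetical function names, using label 1 for the MCs class and 0 for NT; it is not the evaluation code behind Table 5 or Figure 5, and it assumes both classes are present.

```python
def confusion(y_true, y_pred):
    """Counts (TP, FN, FP, TN) with 1 = MCs (positive) and 0 = NT (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def sensitivity_specificity(y_true, y_pred):
    tp, fn, fp, tn = confusion(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true, scores):
    """(1 - specificity, sensitivity) for every distinct cut-point, from (0, 0) to (1, 1)."""
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(y_true, [1 if s >= cut else 0 for s in scores])
        points.append((1.0 - spec, sens))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

An AUC of 1.0 corresponds to a cut-point that separates all MC patterns from all NT patterns, while 0.5 corresponds to chance-level ranking.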

Table 5 Confusion matrices and performance of the classifiers

Figure 5 ROC curves of the classifiers when FV is labeled by (a) k-means and (b) PFCM.

Finally, Figure 6 shows the results of MC detection in the ROIs using the methodology proposed in this paper.

Figure 6 (a) Original ROI images. MC detection using (b) image segmentation by k-means and the classifier with structure 1, and (c) image sub-segmentation by PFCM and the classifier with structure 2.

6 Discussion and conclusions

According to the performance of the classifiers as determined by means of the ROC curves (Figure 5) and the final images obtained (Figure 6), the proposed method is a promising alternative for automatically detecting MCs in ROIs extracted from digitized mammograms. The method combines several techniques that contribute to the MC detection stage. Image segmentation is one of the most difficult stages when partitional clustering algorithms are used, because these algorithms are applied in feature space. Therefore, if the image contains noise or is not very homogeneous, image segmentation by clustering can be inaccurate; to address this problem, an image enhancement technique based on mathematical morphology was applied. In the segmentation stage, two partitional clustering algorithms were used: k-means and PFCM. The k-means algorithm is the most popular of these techniques, and its advantages and drawbacks are well known. With the PFCM, a new image segmentation method called image sub-segmentation was used, in which the degree of typicality of each data point was used to partition an image into two regions: one region with tissue suspected of harboring MCs and the other with normal (healthy) tissue. Then, the most atypical data points (pixels) of each region were identified. These data include possible abnormalities present in the regions, especially in the region suspected of containing MCs, because these atypical data represent the pixels belonging to potential MCs. For the ROI images used in this paper, both clustering algorithms gave good segmentation results, although these results depend largely on good feature extraction and, in this paper, on the image enhancement stage. Once the MC pixels had been detected, window-based features such as the mean and standard deviation were extracted from the original ROIs and then used as input vectors to a classifier.
ANNs proved to be an effective alternative for this classification task; in this paper, a classifier based on the MLP was used. In the ROI images, the MCs class accounted for a much smaller percentage of pixels than the healthy (normal) tissue class. Therefore, the patterns belonging to the MCs and NT classes were balanced to obtain better results in the classification stage. Finally, the results obtained by applying the proposed method to these ROI images show that it can distinguish pixels corresponding to microcalcifications from those corresponding to healthy tissue, thus fulfilling the aim of this paper.