Background

Breast cancer is the most common cancer affecting women worldwide. It is visible in two forms: microcalcifications (small calcium deposits appearing as small bright dots on the mammogram) and massive lesions. These forms exhibit different typical characteristics, such as density, size, shape and number. For this reason, the algorithms implemented to detect both types of lesions have to be different.

The main difficulties for radiologists to detect microcalcifications are due to their small size (0.1–1 mm, mean diameter ~ 0.3 mm) and low contrast compared to the background of the images. In this context, a CAD system may help physicians to improve their performance [13]: a good CAD system must be able to suppress the noise in the image in order to improve the contrast between the Region Of Interest (ROI hereafter) and the background, and extract/select the lesions for a correct classification process.

In a CAD system, an unidentified microcalcification at the initial phase of image processing is considered completely lost, with severe limitations for later stages; for this reason several filtering techniques are applied to reduce the signal to noise ratio (SNR) [4]. But an equally important role is carried out by the classification phase, where training based procedures such as artificial neural networks are used, and a feature reference pattern set is required in order to correctly assign an unknown pattern, related to a lesion or to a particular membership class [59].

The clusters of microcalcifications are an important warning sign for breast cancer and they are present in 30 - 50% of screening mammography cases [10], therefore, their identification plays an important role in the phase image processing. The clustering is a type of classification imposed on a finite set of objects, characterized as points in a d-dimensional metric space in which every dimension corresponds to a feature (e.g. color, size, shape, position) [11].

Many techniques may be used for microcalcifications clustering: some are based on the simple Euclidean distance evaluation, others are related to the most significant features.

The procedure described by Nishikawa et al. [12] was applied to signals defined by a single pixel: signals with several pixels in area are reduced to single pixel by means of a recursive transformation. The number of signals within a small region, typically 3.2 mm × 3.2 mm, are counted: only if three or more signals are present within such a region, they are preserved in the output image. 78 mammograms were examined containing 41 clusters and a reduction in the false positives rate detection from 4.2 to 2.5 per image was found, while maintaining a sensitivity of approximately 85%.

Estevez et al. [13] proposed an algorithm (interactive selective and adaptive clustering, ISAAC) for assisting the radiologist in looking for small clusters of microcalcifications. This algorithm can be divided into two successive steps: selective clustering and interactive adaptation. The first step reduces the false positives number by identifying the microcalcifications subspace in the feature space; the second step allows the radiologist to improve results by identifying interactively additional false positive or true negative samples. The algorithm was tested on a 15 mammograms database. Performance of the method have been evaluated not by numerical parameter but by asking to three radiologists to determine, on a scale of 1 to 10 how helpful the method was in locating suspicious microcalcifications cluster areas, and by valuing capability of two other radiologists of identifying clusters observing the mammograms before and after the ISAAC application.

Mao et al. [14] proposed a distance-based and dense-to-sparse grouping method: the basic idea is to group the microcalcifications close enough to each other by examining the distance among them. The most closely distributed regions can be grouped into clusters first and relatively more widely distributed regions can be gradually grouped from evident cluster centers if they are still near enough to the centers. Several experiments were performed on a set of 30 mammograms containing 40 micro-calcification clusters. The method yields a result independent of the distribution orientation of clusters.

Arodz et al. [15] performed a sequence of morphological operations on the filtered image with the goal of eliminating small or isolated objects: an area opening operation (removal of objects smaller than threshold), followed by dilation (for removing isolated objects). The technique was applied to 50 mammograms, each of them showing a region of the breast with an area of 25 cm2 containing a suspicious lesion, and displayed higher efficiency levels for those clusters with a probability of 40% to be malignant. Performance of the method was evaluated by valuing the average number of clusters detected on mammograms processed by the system but not detected on original mammograms and by valuing the average estimate of detection improvement resulting from using the system.

Cihan et al. [16] used the subtractive clustering; this subtractive clustering is a fast one-pass algorithm for estimating the number of clusters and the cluster centers in a dataset, if no prior knowledge of number of clusters is available. The point with the highest number of neighbours may be a center point and is selected as the first cluster. The method has been applied to 34 mammograms with a total of 72 micro-calcification clusters. The results show a success rate of 93% for the proposed algorithm.

Riyahi-Alam et al. [17] proposed an automated segmentation of suspicious clustered microcalcifications on digital mammograms. The algorithm consists of three main processing steps for this purpose. In the first step, the improvement of the microcalcifications appearance by using the “a trous wavelet” transform which could enhance the high frequency content of breast images were performed. In the second step, individual microcalcifications were segmented using wavelet histogram analysis on overlapping subplanes. Then, the extracted histogram features for each subplane used as an input to a fuzzy rule-based classifier to identify subimages containing microcalcifications. In the third step, subtractive clustering was applied to assign individual microcalcifications to the closest cluster. Finally, features of each cluster were used as input to another fuzzy rule-based classifier to identify suspicious clusters. The results of the applied algorithm for 47 images containing 16 benign and 31 malignant biopsy cases showed a sensitivity of 87% and the average of 0.5 false positive clusters per image.

Cordella et al. [18] proposed a method based on a graph-theoretical cluster analysis for automatically finding clusters on mammographic images. The proposed method starts by describing with a graph all the microcalcifications detected by an automatic algorithm: the graph nodes correspond to microcalcifications, while the edges of the graph encode the spatial relationships between microcalcifications. Each micro-calcification is linked, by an edge, to all the other ones. The weight of each edge is the Euclidean distance in the 2D space between the nodes connected by that edge. After such a graph is obtained, the GTC analysis is employed to remove all the tree edges with weights greater than a threshold value: in this way, the GTC method automatically groups vertices (microcalcifications) into clusters. Successively, clusters with less than three nodes are eliminated. The approach has been tested on a standard database of 40 mammographic images and turned out to be very effective even when the detection phase gives rise to several false positives. Performance of the method were measured in terms of Precision and Recall giving rise to a Precision value of 1 and a Recall value of 0.94.

Wang et al. [19] presented an approach based on fuzzy clustering to detect small lesions, such as microcalcifications and other masses, that are hard to recognize in breast cancer screening. A total of 180 mammograms were analyzed and classified by radiologists into three groups (n = 60 per group): those with microcalcifications; those with tumors; and those with no lesions. Analysis by fuzzy clustering achieved a mean accuracy of 99.7% compared with the radiologists’ findings.

Quintanilla-Dominguez et al. [20] presented a method for the automatic detection of microcalcifications implemented by feature extraction and sub-segmentation steps. The feature extraction step is improved using a top-hat transform such that microcalcifications can be highlighted. In a second step a sub-segmentation method based on the possibilistic fuzzy c-means clustering (PFCM) algorithm is applied in order to segment the images and as a way to identify the atypical pixels inside the regions of interest as the pixels representing microcalcifications. Once the pixels representing these objects have been identified, an ANN model is used to learn the relations between atypical pixels and microcalcifications, such that the model can be used for aid diagnosis, and a medical could determine if these regions of interest are benign or malignant. The classifier presented in this work has been tested on four different combination of features, obtained the following results: Sensitivity: 98.21%, 98.70%, 88.93%, 88.73%; Accuracy: 99.54%, 99.56%, 98.22%, 98.18%.

Malar et al. [21] proposed an approach for detection and classification of mammographic microcalcifications using wavelet analysis and Extreme Learning Machine (ELM). A total of 55 mammograms, including normal and microcalcifications images have been used, producing an Accuracy of 94%.

Cheng et al. [22] presented an approach to microcalcification detection based on fuzzy logic and scale space techniques (FLSS). First, they employ fuzzy entropy principal and fuzzy set theory to fuzzify the images. Then, they enhance the fuzzified image. Finally, scale-space and Laplacian-of-Gaussian filter techniques are used to detect the sizes and locations of microcalcifications. A dataset of 40 mammograms containing 105 clusters of microcalcifications is studied. Experimental results demonstrate that the proposed method can archive an accuracy greater than 97% with the FP rate of three clusters per image.

In the following paragraphs we will explain the algorithm and its application to a set of digital images. The findings will be subsequently presented.

Materials and methods

According to the report of the radiologist, a cluster of microcalcifications must have precise geometrical and morphological requirements: a cluster is a set of localized microcalcifications, therefore a potential micro-calcification cannot be associated with a cluster that is spatially “distant”.

As a further observation, the spatial association of microcalcifications is a necessary but not sufficient condition for proper clustering: a good microcalcifications clustering must be able to gather objects not only spatially “near”, but also “near” from a point of view of the form and the visual information contained in them.

Moreover, another frequently encountered difficulty in developing clustering algorithms is the lack of knowledge of the number of clusters present in an image with microcalcifications. Within this context, this paper describes a method that aims to achieve two objectives: the first is to use a more powerful clustering process (FCM-WF) based on the standard FCM algorithm appropriately modified with the addition of some features; the second objective is to determine the optimal number of microcalcifications clusters that may be present in an image.

In order to better evaluate the efficiency of the method, i.e. the correct identification of the cluster and the number of micro belonging to it, testing the algorithm on pathological images reported by the radiologists could not be sufficient because we need to know exactly both the correct position of the cluster and the number of microcalcifications belonging to the cluster. For this reason in this paper we have chosen to test the algorithm on images obtained by healthy images with an artificial injection of microcalcifications. The microcalcifications clusters used for simulation are extracted from real pathological images reported by several radiologists.The procedure followed in this preliminary phase is shown in Figure 1.

Figure 1
figure 1

Procedure scheme.

The first step of the procedure is the segmentation (Section Microcalcifications Segmentation), which is preliminary to other phases of the process as in all CAD system. Each cluster, with a number of micro variables depending on the lesions structure, is stored to build a database of real clusters (gold database). The mode of creating this "gold database" makes it possible to maintain in the cluster all relevant information present in the original image, with the consequent possibility to use them at a later stage (Section Databases). Afterwards, some clusters, or part of these, are injected on healthy images by creating a sample of pathological images "artificial" that will become our new "gold database artificial". Finally, with a new segmentation process performed on sample images, we obtain N objects used as a starting point to test the FCM-WF algorithm and to determine the number of clusters, as described in Section The Fuzzy C-MEANS Implemented With Features (FCM-WF) and Section K Clusters Best Value. The procedure was tested on images belonging to a private anonymous database collected in the Policlinic Hospital of Palermo. Policlinic Hospital is a hospital firm of University of Palermo in which formation, scientific research and health service are well integrated. Policlinic Hospital assess that every research involving human being is carried out in compliance with the Helsinki Declaration and correctly informing the patients previously. Policlinic Hospital of Palermo, in the person of Dr. Raffaele Ienzi who provided us the images, assessed that every precaution has been taken to protect the privacy of the patients and the confidentiality of their personal information. Only the images belonging to patients who given a free informed consent in writing have been used to create the database.

Anyway, since each paper described in the state-of-the-art is tested on different image databases and as such does not provide a perfectly fair comparison, in order to provide strong justification for the effectiveness of our work, we applied the FCM-WF algorithm even on the publicly available MIAS database (/http://peipa.essex.ac.uk/info/mias.html). The characteristics of this database will be described in Section The MIAS database.

Microcalcifications Segmentation

In a CAD system, an important role is performed by the ROI extraction phase [23], because at this level a missed microcalcifications is definitely lost. The algorithms designed to enhance the contrast of microcalcifications may improve the ROI extraction performance [24].

It is possible to recognize microcalcifications within an image using mammography features edge detection [25]. The enhancement stage must be sensitive enough to emphasize low contrast objects while, at the same time, it must have the required specificity to suppress the background [26]. Usually the background corresponds to some smoothed regions of the image which don’t give relevant information about pathologies in many cases. Background suppression can be implemented by using high pass filtering. In order to detect contours, the Laplacian of Gaussian (LoG) is often used in practice; it combines the Gaussian filter with the search properties of the edges of the Laplacian. The filter equation is represented by:

2 H σ x , y = 1 π σ 2 x 2 + y 2 2 σ 2 2 σ 4 e x 2 + y 2 2 σ 2
(1)

The parameter σ of Gaussian controls the effect of the LoG: the rise of its value increases the smoothing effect but with the lost of the ability to discriminate the details; therefore it is closely correlated to the size of the objects to identify. In equation 1), standard deviation σ is related to the target of size [27]. The relationship between the parameter σ and the dimensions of the target to identify has as effect, for microcalcifications recognition, a variability in the ability of individualization of the filter inside the range of dimensions of microcalcifications. This dependence implies that, if the size of the object to identify varies as in our case, we must use a method of analysis that uses several forms of convolution, with different values of σ. Conversely, in a multiscale approach, the final result can derive from mean or from maximum value among different obtained results (different values of σ) but in both cases the same problems occur: in the first case the scales different from that with the best coupling between the value of σ and the size of the object have as a result the reduction of detection power; in the second case we have a high sensitivity to noise.

To overcome the dependence from target dimensions and to optimize the recognition efficiency, it has been necessary to study and implement a spatial filter with a form that allows the detection of small and large microcalcifications. This form-filter, designed with the sum of the weights equal to zero in order to get the effect on the derivative, is characterized by three regions:

  1. 1.

    a first central region, circular (of radius equal to R1), whose pixels have positive intensity, and area equal to the smallest size of a single micro, with the shape of the positive part of the Laplacian of Gaussian (LoG);

  2. 2.

    a second region, also circular, adjacent to the first region and outer, whose pixels have zero intensity, which extends up an area equal to the maximum size of a micro detectable (distance R2);

  3. 3.

    a third region, the most external and quite narrow, with negative values obtained from the same LoG function and renormalized so that the sum of the intensities of its pixels, negative, is equal to the sum of the intensity of the first region, positive.

By assuming ΔR = R2 - R1, mathematically the three regions of developed filter F(x, y) may be defined as follows:

F x , y = Log x , y where : x 2 + y 2 < R 1 2 2 0 where : R 12 2 x 2 + y 2 R 22 2 α Log x Δ R , x Δ R where : x 2 + y 2 R 2 2

where α is a normalization factor chosen such that the volumetric integral IS1, within the circle of radius R1, is equal to the volumetric integral IS2 of the surface outer to the circle of radius R2.

From an analytic perspective, the filter function here realized is a LoG translated function: the negative LoG tail (that is: the third region of the developed filter) has been translated with ΔR, and the subsequent emptied region (the filter's second region, a circular crown of width ΔR) has been assigned a null value. Actually, the second region of the filter has the objective to make negligible the convolution contribution of the pixels characterized by distances between R1 and R2 and thus to obtain a constant detection performance on the pathology interval [R1, R2]; in fact, all the pixels of a ROI falling in the interval [R1, R2] and geometrically lying on a circular crown, will give no contribution in the computing of the convolution. So we can strongly reduce the dependence of the recognition process from the variability of the dimensions of microcalcifications. Since the aforementioned translation occurs in the plane and quadratically increases the number of interested pixels, a renormalization (performed by acting on the parameter α from the third region) becomes compulsory in order to fulfill the null sum condition (as required by the derivative filter definition). If one analyzes the LoG filter and the result of its application on a generic (x,y) position of an image, one notes that for obtaining a positive value in (x,y) the ROI must correspond to the positive part of the filter, while the background must overlap with the negative part. The filter developed here allows also partial overlaps of the ROI and the positive part of the filter. In fact, for ROI characterized by linear dimensions between R1 and R2, the positive part which will overlap with the filter still remains the part of linear dimensions R1.Figure 2 shows the 3-D representation of the filter, Figure 3 shows the 2-D projection and its renormalization due to the effect of placing the region with null values. In our case, application to a discrete image, the 3-D integrals are equivalent to the sum of the pixels intensities of the two affected areas, positive and negative, respectively.Figures 4 and 5 show how the filter implemented can emphasize the presence of microcalcifications: Figure 4 shows original mammograms in which microcalcifications clusters are hidden by breast tissues; Figure 5 shows the microcalcifications that remain after the segmentation process: in this figure it is possible to recognize the five clusters hidden in Figure 4. The image colours are inverted for better visualization.

Figure 2
figure 2

The Spatial filter implemented.

Figure 3
figure 3

2D projection for the spatial filter.

Figure 4
figure 4

Digital mammography.

Figure 5
figure 5

Filtered image.

Databases

Gold Database Implementation and Injection Process

A medical image data set is the starting point for important epidemiological and statistical studies. Usually, such a set is used to develop and test algorithms for computer aided detection (CAD) systems but it is used also for teaching and training medical students or as an archive of cases. The development of a CAD system is intimately linked to collection of a dataset of selected images [28]. The purpose of this step was the creation of a gold database of mammographic images [29] containing microcalcification clusters to be used for the evaluation of the clustering process. We used a private database because we need to know all the truth and the details of microcalcifications clusters, i.e. not only the healthy or pathological nature of microcalcifications, but even all the useful properties like the mean intensity over the background, the number and the position inside a cluster, the geometrical and spatial distribution of clusters on breast tissue and so on.

Two expert radiologists took place to the segmentation process: after the comparison, they segmented semi-automatically the ROI they though being microcalcifications. The health microcalcifications have been confirmed by a 2 years follow up and the pathological ones with histological exam.

Among all the ROI extracted by segmentation process, only those “cluster of ROI” identified by the physician as pathological cluster are kept to constitute the gold database of microcalcifications, while all other ROI are discarded. For all the microcalcification of each cluster we have recorded: mean intensity, number and coordinates of pixels, and membership cluster.

Since the properties that make possible to classify a group of microcalcifications as a cluster certainly take into account the parameters of size, intensity over the background, relative distances of the microcalcifications and cluster geometry itself, each simulated cluster must retain all informations of original cluster. From these data, we injected the cluster of microcalcifications in healthy images according to the following steps:

  1. 1.

    random extraction of one microcalcification from a cluster (mother cluster);

  2. 2.

    random selection of a position within the breast and injection of the extracted microcalcification;

  3. 3.

    random extraction of a new microcalcification from the same “mother cluster” and its injection on the breast retaining angles and distance with the previous microcalcification.

Step 3 is repeated until injection of a microcalcification number n in the range [P,M], where M is the micro number within the mother cluster and P is the nearest integer to M*0.8 (20% less), with condition M ≥ 4.

To not interfere with the healthy breast tissue, this injection method takes into account the average intensity of each cluster related to the intensity of the parenchyma where the cluster is injected.

At the end of the injection process of microcalcification, the healthy image is segmented with the same method used for the pathological images. The number N of objects segmented will be used for the following step, that is the clustering algorithm.

The MIAS database

The images collected from the Mammographic Image Analysis Society (MIAS), an organization of research groups in UK, are available via the Pilot European Image Processing Archive (PEIPA) at the University of Essex. The database contains 322 digitized films of 161 subjects with both right and left breast images and it also includes the so called ground truth on the locations of any abnormalities that may be present [21].To test the method we have to know position and number of microcalcifications belonging to the clusters, so among all the images contained in this database only the 20 images in which centre locations and radii of clusters are known have been used. Particularly, since the database provide only the centre locations and radii of clusters, and we need to know even position and number of every microcalcification belonging to it, we considered as “truth” about pathological microcalcifications the objects found inside the indicated circle after the segmentation process. An example is shown in Figure 6 and in Figure 7: Figure 6 shows the original mammogram (image mdb209), Figure 7 shows the same mammogram after the segmentation process, in which the centroids of all the segmented objects are indicated. In both figures the circle of cluster is present. We considered as microcalcifications belonging to the cluster only the objects found in the circle.

Figure 6
figure 6

Original mammogram (im. mdb209).

Figure 7
figure 7

Mammogram after segmentation process.

The images of MIAS database used contain one cluster, except images mdb223, mdb239, mdb249 (two clusters) and image mdb227 (three clusters), for a total of 25 clusters.

The Fuzzy C-MEANS Implemented With Features (FCM-WF)

As already explained in Paragraph 1, clusters of microcalcifications are an important warning sign for breast cancer so their identification plays an important role in developing a CAD system. Anyway, detecting all microcalcifications clusters is not an easy task, as there is often poor contrast on mammograms between microcalcifications and the surrounding tissue. Since microcalcifications belonging to the same cluster have similar properties (or features) such as grey level intensity, the main objective of a clustering process is to evaluate this similarity according to a distance measure between the microcalcifications and the prototypes of groups, and each microcalcification is assigned to the group with the nearest or most similar prototype [20].

Several algorithms should be proposed to solve the problem of microcalcifications clustering. Among them, partitional clustering methods are useful in this contest because they generate a single partition of the data in an attempt to recover the natural groups present in the data. Partitional methods are especially appropriate for the efficient representation and compression of large databases, as in the case of microcalcifications clustering [11].

If the clusters are compact and well separated, there is no uncertainty in assigning the objects to one cluster, but if the clusters are touching or overlapping, as is the case with microcalcifications clusters, the assignment of objects to clusters is difficult. So clustering methods based on fuzzy logic, according to which an object can belong to more than one cluster, are the better choice to handle the mammograms [22].

In the fuzzy clustering methods an object can belong to more than one cluster with a degree of membership continuously variable between 0 and 1. For ordinary clusters, the degree of membership for an object x is 1 if it belongs to the cluster and 0 if it does not; for the fuzzy clusters, larger is the membership degree to the clusters, greater will be the confidence level that the object belongs to that cluster [3033]. The output of a fuzzy algorithm not only includes a partition but also additional information in the form of membership value.

The most known fuzzy clustering algorithm is the Fuzzy C- Means (FCM), fuzzy version of the K-means algorithm, which usually takes into account as feature only the position of microcalcification for the clustering process.

The method proposed here, instead, in addition to the spatial information used in the standard Fuzzy C-Means makes use of seven other features (in the following we refer to our method as Fuzzy C-Mean – With Features, FCM-WF) that add information about the structure of each single micro [3436]. Three of these features take into account the geometric shape of an individual micro: area, perimeter, eccentricity. The other four take into account the pixels intensity variability of each micro: average, standard deviation, skewness and kurtosis of the intensity. In FCM-WF algorithm, clustering is performed not in Euclidean space but in multidimensional features space F.

Given a set of N object, fuzzy partitioning in K clusters is carried out through an iterative optimization process that minimizes the following objective function

J m = i = 1 N j = 1 K u ij m X i C j 2
(2)

where Xi is the i-th object of the set X = {X1, X2, …XN} used to perform the clustering, with XiR F , i = 1 ….. N; similarly Cj is the j-th centroid of the set C = {C1, C2, …CK} with CjR F, j = 1 ….. K. Finally uij, is equal to the degrees of membership of the Xi object to the cluster Cj. In Figure 8 we report the flow chart of the process. The initial values of the matrix U elements uij are randomly assigned; correspondently the objective function assumes the initial value J0. The iteration continues until reaching the absolute minimum of the objective function. The corresponding configuration of the clusters is considered as the best final result of the procedure used.

Figure 8
figure 8

Minimization process of the objective function.

It has been necessary to normalize all the features [37] because they are not dimensionless quantities, variable in a predetermined range; so, if x is a generic feature, its normalization in the range (0–1) can be carried out with the formula (3):

x norm = x min max min
(3)

where, min = (mean - 4 *σ) and max = (mean + 4 *σ) and mean and σ are the mean value and the standard deviation of the feature x respectively. In Figure 9 the distributions of values for pathological and non-pathological segmented objects are compared for each feature.

Figure 9
figure 9

Features comparison: pathological/healthy.

K Clusters Best Value

One of the problems in developing clustering algorithms is the a priori lack of knowledge of the number of clusters present in images. In Paragraph 5 we described the FCM-WF method, but we must remember that the number of clusters we are looking for is an input parameter to the clustering algorithm [34], Kini; it follows that it will be found always a number of clusters equal to the imposed value. For this reason we proposed a method to automatically determine the number of clusters present in an image based on the result of segmentation process: if the number of objects after the segmentation process is equal to N, since a disease cluster cannot have less than 3 micro, we can safely assume that the maximum number of clusters present in an image cannot exceed N/3. Moreover, we have a limit on cluster size because a pathological cluster has a maximum size of about 250 pixel (1 px = 0.07 mm).

If we start the clustering process on an image with the limit value Kini = N/3 on the number of clusters, the minimization of the objective function will lead to a final configuration with N/3 clusters, some of which may have less than 3 micro (and they will be removed since they are not pathological) and other clusters with sizes that exceed the maximum size.

For this reason, at the end of the clustering process, the input value Kini is reduced by Ndown (number of clusters with less than 3 micro) and increased by Nup (number of clusters with spatial dimensions greater than the maximum), finding, at last, according to the equation 4, a final number W of residual clusters that may be less than Kini.

W = K ini N down + N up
(4)

Furthermore, since the clustering process starts with a random position of the centroids and with random values of the U matrix, the final number W of residues clusters could be dependent from the initial conditions. To get a more reliable estimate of W, we repeat the FCM-WF process several times starting with the same value of Kini (for every image there is a different value of Kini, dependent from the number of segmented object present in the image), and calculate the average value W ¯ and the corresponding standard deviation σW. Finally, to improve the significance of the average value W ¯ , we introduce a rejection criterion: the elimination of the W i values at a distance greater than 3* σW from W ¯ . After this rejection, we recalculate the new W ¯ and σW values starting from a new value of Kini, according to the equation 5:

K ini = W ¯ + 3 σ W
(5)

If the average value of W differs from Kini for less than 3* σW, the iterative process stops and a final run start with K ini = W ¯ .

The iterative process just described in the flow chart in Figure 10 that starts with Kini = N/3 and repeat the process NLoop times (in the example NLoop = 50).

Figure 10
figure 10

Iterative process to get the best value of cluster number.

Figure 11 shows the histogram of the values W(Lp), with Lp = 1,…NLoop, obtained using an image with 261 segmented object and Kini equal to 87. Figure 12 shows the histogram of the value W(Lp) for the same image and positioned in the same range of variability as before but with a Kini value equal to 72. This makes the process independent from the initial value of Kini, provided that this is greater than the number of residual final clusters.

Figure 11
figure 11

Distribution of cluster values starting from K ini = 87.

Figure 12
figure 12

Distribution of cluster values starting from K ini = 72.

Figure 13 shows the histogram of the value W(Lp) obtained starting from an initial condition (value of Kini) closest to the final expected (stability condition).

Figure 13
figure 13

Distribution of cluster values starting from K ini = 52.

Results and discussions

The procedure described in the previous paragraphs was tested on 39 images of healthy patients belonging to a private database collected in the Policlinic Hospital of Palermo; every image has dimension of 4096 × 3328 and pixel spacing of 0.07 mm. The procedure was tested also on 20 images of the MIAS database; the size of all the images is 1024 × 1024 pixels.

Since in all the images the FCM-WF algorithm was able to recognize the clusters but the reconstruction turned out to be characterized by the lack of a few micros and the presence of some more objects (noise), we defined a Merit Figure as follows:

F M = N ini N miss N ini + N noise
(6)

In which is the number of micros belonging to the clusters, is the number of micros missing and is the number of noise objects added to the reconstructed cluster.

The performance of FCM-WF algorithm were compared with that of standard FCM. Indeed, to compare the results obtained by our proposed method with the existing ones, we used the following parameters:

TP = number of pathological micro correctly associated to the injected cluster;

FP = number of healthy segmented object erroneously associated to the injected cluster;

FN = number of pathological micro lost;

TN = number of healthy segmented object correctly associated to healthy clusters.

And evaluated also the following merit figures:

  •  Sensitivity = TP TP + FN

  •  Accuracy = TP + TN TP + TN + FP + FN

  •  FP/Image = FP/tot. of segmented objects

  •  Precision = TP TP + FP

We reported in Table 1 the comparison between the performance of our method and that of the method already described with more details in paragraph 1, while the obtained results are presented in Section Results on private database and Section Results on MIAS database.

Table 1 Performance comparison of clustering methods for breast cancer detection

Results on private database

Every image of the private database was injected with a pathological cluster with the mechanism described in Paragraph 4.1. The performance of the FCM-WF method has been evaluated by examining its capability of correctly recognize the injected clusters, i.e. by counting the number of microcalcifications correctly associated to the clusters. For this reason, as explained in Paragraph 4.1, it has been necessary to create a gold database in order to know all the truth and the details of microcalcifications clusters.

In Figure 14 each point's coordinates represent the values of the Merit Figure obtained with the two methods: FCM (abscissa) and FCM-WF (ordinate); it is visible the improvement of the cases corresponding to the points above the diagonal line; the solid line corresponds to the cases that do not benefit in any way from the implemented method. Figure 15 highlights the relative performance (ratio) improvement as a function of the number of segmented objects in an image. If we denote by FM1 and FM2 the Merit Figures for the two methods FCM and FCM-WF respectively, the mean value of the two Merit Figures are 0.61 and 0.67 respectively, with an increase of about 10% in favor of the FCM-WF.

Figure 14
figure 14

Comparison between the Merit Figures.

Figure 15
figure 15

Relative performance improvement vs. total injected object.

For an expert system, such as a CAD, the clustering process is a preliminary step done before the classification stage. Therefore, while the presence of injected disease clusters is definitely positive for both methods FCM and FCM-WF because it does not alter the efficiency of disease detection, the presence of a large number of clusters in an image may have a negative effect by increasing the risk to accept an excessive number of false positives after the classification stage. Practically, with the method FCM-WF there has been a steady reduction in the number of residual clusters.In Figure 16 a histogram is displayed which illustrates the capacity of the FCM-WF method to reduce the number of residual clusters compared to the standard method FCM; in only one case there is a consistent increase in the number of residual clusters. In Figure 17 the reduction of the number of clusters is displayed as a function of the number of segmented objects present in the image before the clustering processes. It is easy to see from this graph that the reduction of the residual clusters does not depend on the number of objects initially present in the image. If we calculate the false positives number in the two methods FCM and FCM-WF for each of the 39 images used, we obtain a false positives average number reduction equal to about 22%.The histogram in Figure 18 illustrates the ability of the two methods to maintain the pathological cluster: the results point out that FCM-WF achieves a better performance with respect to the conventional method. In particular, among the injected clusters, 70% remain unaffected if the reconstruction is performed with the FCM-WF.

Figure 16
figure 16

Residual clusters number reduction histogram.

Figure 17
figure 17

Residual clusters number reduction vs. total object on the breast image.

Figure 18
figure 18

Number of lost injected micros.

The method was also evaluated in terms of Sensitivity, Accuracy, FP/image and Precision obtaining the following results:

  •  Sensitivity = TP TP + FN = 93 %

  •  Accuracy = TP + TN TP + TN + FP + FN = 95 %

  •  FP/Image = FP/tot. of segmented objects = 4%

  •  Precision = TP TP + FP = 62 %

Results on MIAS database

The FCM-WF method has been tested also on 20 images belonging to the mini-MIAS database. As explained in Section The MIAS database, the database provide only the centre locations and radii of clusters, and since we need to know even position and number of every microcalcification belonging to it, we considered as “truth” about pathological microcalcifications the objects found inside the indicated circle after the segmentation process.

For this database we defined also a “segmentation efficiency” as the number of cluster found in the images after the segmentation process, which is equal to 80%.

In Figure 19 each point's coordinates represent the values of the Merit Figure obtained with the two methods: FCM (abscissa) and FCM-WF (ordinate); even for this database it is visible the improvement of the cases corresponding to the points above the diagonal line. Figure 20 highlights the relative performance (ratio) improvement as a function of the number of segmented objects in an image. If we denote by FM1 and FM2 the Merit Figures for the two methods FCM and FCM-WF respectively, the mean value of the two Merit Figures are 0.45 and 0.50 respectively, with an increase of 5% in favor of the FCM-WF.

Figure 19
figure 19

Comparison between the Merit Figures.

Figure 20
figure 20

Relative performance improvement vs. total injected object.

If we calculate the false positives number in the two methods FCM and FCM-WF for every image of the MIAS database used, we obtain a false positives average number reduction equal to about 10%.

The method was also evaluated in terms of Sensitivity, Accuracy, FP/image and Precision obtaining the following results:

  •  Sensitivity = TP TP + FN = 82 %

  •  Accuracy = TP + TN TP + TN + FP + FN = 94 %

  •  FP/tot. of segmented objects = 4 %

  •  Precision = TP TP + FP = 65 %

Conclusions

In this paper we presented a clustering method for microcalcifications based on fuzzy logic. This method, called Fuzzy C-Mean With Features (FCM-WF) allows microcalcifications clustering not only according to their distance but also to their relevant features.

The method was tested on a database of simulated images obtained by injecting a pathological cluster on healthy images: in this way we know the “truth” about the exact position of a cluster and the number of microcalcifications belonging to it.

Thanks to this database and to the informations contained in it regarding every single microcalcification, we tested the developed clustering method with great accuracy: actually, for every cluster, we verified the difference between the desired and obtained result with the Merit Figure of equation 6. In particular, we verified that 70% of the injected clusters remained unaffected if the reconstruction is performed with the FCM-WF.

Moreover, the automatic determination of the number of clusters in an image result in further improvements of the overall performance of the FCM-WF method over the standard FCM.

Finally, we want to put in evidence in this paper that the technique of placing the micro image in a healthy described here is not equivalent to a classic simulation process by injection of small spots of pixels of variable amplitude and positions randomly, but consists in the placing of real pathological cluster that retain and transfer the original information through the features used here, allowing us to consider the database simulated as a real database of reported image (Gold Standard Database).

In order to provide strong justification for the effectiveness of our work, we applied the FCM-WF algorithm even on the publicly available MIAS database. However, since to test the method we have to know position and number of microcalcifications belonging to the clusters, we didn’t use all images of the database but only the images in which centre locations and radii of clusters are known. We considered as “truth” about pathological microcalcifications the objects found inside the indicated circle after the segmentation process, so we defined a “segmentation efficiency” as the number of cluster found in the images after the segmentation process, which is equal to 80%. For this database we obtained an increase of 5% of the Merit Figure of the FCM-WF compared to that of FCM and a false positives average number reduction equal to about 10%.