Abstract
Microcalcifications (MCs) are the main signs of precancerous cells. The development of computer-aided systems for their detection remains a challenge for researchers in this field. In this paper, we propose a system for MC detection based on the multifractal approach that classifies mammographic ROIs as normal (healthy) or abnormal (containing MCs). The proposed method is divided into four main steps. The first is a mammogram pre-processing step based on breast selection, breast density reduction using a haze removal algorithm, and contrast enhancement using multifractal measures. The second step consists of extracting the normal and abnormal ROIs and calculating the multifractal spectrum of each ROI. The third step is the extraction of the multifractal features from the multifractal spectrum and of the GLCM characteristics of each ROI. The last step is the classification of ROIs, for which three classifiers are tested (KNN, DT, and SVM). The system is evaluated on images from the INbreast database (308 images) with a total of 2688 extracted ROIs (1344 normal, 1344 with MCs) from different BI-RADS classes. In this study, the SVM classifier gave the best classification results, with a sensitivity, specificity, and precision of 98.66%, 97.77%, and 98.20%, respectively. These results compare favorably with those reported in the literature.
Introduction
More than two million women worldwide are affected by breast cancer, and more than five hundred thousand die of this disease each year [1]. Awareness, screening, and early detection remain the only means of fighting this scourge. Mammography is the most effective and most widely used X-ray imaging exam for breast cancer detection. However, the complex architecture of the breast, the distribution of breast density, the image quality, and the radiologist’s level of experience all influence the reading and analysis of the mammogram. As a result, the radiologist can miss small abnormal details such as microcalcifications (MCs), which are calcium deposits representing the main signs of precancerous cells. Hence, researchers have devoted great effort to developing computer-aided detection or diagnosis (CAD) systems that assist radiologists in detecting abnormalities and diagnosing breast cancer. These systems increase the cancer detection rate at an early stage and reduce false positive interpretations.
The classification of mammographic regions of interest (ROIs) has been the subject of several scientific studies and remains a challenge for researchers in this field, given the considerable assistance it offers radiologists in the early detection of breast cancer [2]. Most of the proposed CAD systems for the classification of mammographic ROIs go through four major stages:
- Pre-processing: This stage consists of removing noise and artifacts from the mammogram, selecting the breast, removing the pectoral muscle and the background, and, finally, enhancing the contrast. Many techniques for mammogram enhancement have been proposed in the literature. Table 1 summarizes all these techniques [3].
- ROI extraction: This is done either by cropping (with the help of the ground truth or the annotation of the expert) or by segmentation approaches. Different ROI sizes were considered in different papers.
- Feature extraction: Different sets of features are computed in the literature: statistical features (mean, standard deviation, skewness, uniformity, kurtosis, smoothness…), textural features using gray-level co-occurrence matrix (GLCM) or gray-level run length matrix (GLRLM) (contrast, homogeneity, correlation, entropy, sum average, sum entropy, difference variance, difference entropy, inverse difference moments…), shape features (area, perimeter, convexity, circularity, rectangularity, compactness, area ratio, perimeter ratio…), and multiscale features (wavelets coefficients, directional sub-bands coefficients…).
- Classification: Extracted features are fed to a machine-learning algorithm to classify the ROI (normal/abnormal, benign/malignant). A large number of machine learning algorithms have been proposed in the literature, and the most frequently used models are support vector machines (SVM) and artificial neural networks (ANN) [2].
However, we also find in the literature some CAD systems that use the whole mammogram to detect the suspicious region without going through the stage of ROI extraction. Some other CAD systems do not go through the pre-processing stage. Other CAD systems initially extract a large number of features, then go through the stage of relevant feature selection.
Moreover, different issues have been studied in the classification of mammographic ROIs. We find the classification of normal and abnormal ROIs and the classification of benign and malignant ROIs where suspicious ROIs are either masses [4,5,6,7], or MCs [8,9,10,11], or without specifying the type of lesion [12,13,14,15,16,17,18].
In this work, we are interested in the detection of ROIs containing MCs, in other words, in classifying ROIs as normal or abnormal (containing MCs). We therefore review some recent works that addressed the same problem. In [19], the authors proposed a system based on the wavelet transform for the classification of normal and abnormal ROIs of MCs, followed by the classification of benign and malignant MCs. Without a pre-processing step, they worked directly on the cropped ROIs, from which they extracted statistical and multi-scale features based on the Haar wavelet transform, interest points, and corners. In total, 50 features were fed to the random forest classifier. The system was tested on the BCDR database with a total of 192 ROIs (96 normal, 96 abnormal). The obtained accuracy, sensitivity, and specificity of classification were 95.83%, 96.84%, and 95.09%, respectively. In [20], the authors proposed a system based on pattern recognition and size prediction of MCs. The pattern of an MC was found based on its physical characteristics (the reflection coefficient and mass density of the lesion). Then, the detected MC pattern was projected as a 3D image to find the size of the MC. This was tested on 100 images from the DDSM database (100 abnormal, 10 normal). The obtained accuracy, sensitivity, and specificity of classification were 99%, 99%, and 100%, respectively. In [21], the authors compared three feature extraction techniques: the rotation invariant local frequency (RILF), the local binary pattern (LBP), and segmented fractal texture analysis (SFTA); the RILF technique gave the best results. The IRMA database was used with a total of 1620 ROIs (932 normal, 688 abnormal). Extracted histogram features were fed to the SVM classifier. The obtained accuracy, sensitivity, and specificity of classification were 91.10%, 98.04%, and 81.17%, respectively.
In [22], the authors proposed a system based on transfer learning for the classification of normal and abnormal ROIs containing MCs or masses. They started with a pre-processing step in which they used morphological operations, binarization, and component selection. ROIs were cropped and resized. Features were extracted and classified using ResNet. The system was tested on the mini-MIAS database, and it achieved an accuracy of 95.91%. In [23], the authors proposed a system for the classification of normal and abnormal ROIs with MCs, followed by the classification of benign and malignant MCs. They started by extracting the ROIs, which were then pre-processed using the pixel assignment-based spatial filter to enhance the visibility of MCs in mammograms. Then, six statistical features were extracted and fed to the SVM, multilayer perceptron neural network (MLPNN), and linear discriminant analysis (LDA) classifiers. This was tested on 219 mammograms extracted from the DDSM database (124 normal, 95 abnormal). The obtained accuracy, sensitivity, and specificity were 90.9%, 98.4%, and 81.3%, respectively. In [24], the authors proposed an improvement of the previous work [19] by using only two features (interest points and interest corners) with a total of 260 ROIs (130 normal, 130 abnormal). The obtained accuracy, sensitivity, and specificity were 97.31%, 94.62%, and 100%, respectively, using the random forest classifier.
In this paper, we propose a novel approach to the classification of normal and abnormal ROIs containing MCs based on multifractal analysis. First, the original mammogram is pre-processed using the mammogram enhancement approach proposed in [25], which is based on multifractal measures. Then, the ROI is cropped automatically according to the expert annotation. For each ROI (normal and abnormal), the multifractal spectrum is computed using two different methods, and five features are extracted. Finally, the multifractal features combined with GLCM features are fed to three classifiers (SVM, KNN, and DT) for the classification of normal and abnormal ROIs.
The paper is organized as follows: the “Materials and Methods” section presents the multifractal theory and the proposed approach; the “Results and Discussion” section discusses the obtained results; and, finally, the “Conclusion” section closes the paper.
Materials and Methods
Fractal and Multifractal Theories
The mathematician Benoît Mandelbrot introduced the concept of fractals to describe complex objects that Euclidean geometry cannot describe. These objects are characterized by scale invariance, also called self-similarity, where similar structures are seen at all scales, i.e., the object is composed of smaller parts, each of which is a smaller copy of the whole. The main parameter used to describe the geometry and heterogeneity of these irregular objects and to quantify their internal structure repeated over a range of scales is the fractal dimension (\({D}_{F}\)). For a fractal object, the number of features of a certain size \(\varepsilon\), \(N(\varepsilon )\), varies as [26]:
\[N(\varepsilon ) \sim {\varepsilon }^{-{D}_{F}} \tag{1}\]
Equation (1) is the scaling (or power) law that describes the size distribution of the object.
One of the simplest ways to calculate the fractal dimension \({D}_{F}\) is to take the logarithmic ratio of the change in detail (\(N\)) to the change in scale (\(S\)): \({D}_{F} = (\log N)/(\log S)\). The most widely used method to estimate \({D}_{F}\) is the box-counting technique. This method uses sampling elements, which are arrays of pixel points, and the scale \(S\) relates to the size of the sampling element. It consists of covering a measure with boxes of size \(L\) and counting the number of boxes \(N(L)\) that contain at least one pixel of the object under study. \({D}_{F}\) is then estimated as [26]:
\[{D}_{F} = -\lim_{L\to 0}\frac{\log N(L)}{\log L} \tag{2}\]
Using Eq. (2), the box-counting dimension \({D}_{F}\) can be determined as the negative slope of \(\log N(L)\) versus \(\log(L)\) measured over a range of box sizes.
However, the main limitation of monofractal analysis is its inability to describe the local fractal behavior of images, which motivated the development of multifractal analysis.
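The box-counting estimate described above can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows and box sizes given in pixels; the function names (`least_squares_slope`, `count_boxes`, `box_counting_dimension`) are illustrative and do not come from the original work, which was implemented in MATLAB.

```python
import math

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def count_boxes(mask, box):
    """Number of box-by-box cells that contain at least one object pixel."""
    rows, cols = len(mask), len(mask[0])
    occupied = 0
    for r0 in range(0, rows, box):
        for c0 in range(0, cols, box):
            if any(mask[r][c]
                   for r in range(r0, min(r0 + box, rows))
                   for c in range(c0, min(c0 + box, cols))):
                occupied += 1
    return occupied

def box_counting_dimension(mask, box_sizes):
    """Estimate D_F as the negative slope of log N(L) versus log L.

    Assumes the mask contains at least one object pixel, so N(L) > 0.
    """
    log_l = [math.log(b) for b in box_sizes]
    log_n = [math.log(count_boxes(mask, b)) for b in box_sizes]
    return -least_squares_slope(log_l, log_n)

# Two toy objects on a 16 x 16 grid: a filled square and a single row.
filled = [[1] * 16 for _ in range(16)]
line = [[1 if r == 0 else 0 for _ in range(16)] for r in range(16)]
sizes = [1, 2, 4, 8]
d_filled = box_counting_dimension(filled, sizes)  # close to 2
d_line = box_counting_dimension(line, sizes)      # close to 1
```

The two toy objects recover the expected Euclidean dimensions, which is a quick sanity check before applying the estimator to real textures.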
Multifractals are a generalization of fractals. A multifractal object is more complex and it is always invariant by translation. Multifractal analysis studies the global regularity from the local regularity of the signal. It is based on the estimation of two sets of coefficients: the Hölder exponents, which quantify the local regularity of the signal, and the multifractal spectrum, which quantifies the multifractality of the signal. It consists of decomposing the signal into subsets having the same regularity, and then measuring the “size” of the subsets thus obtained. The local regularity of a signal \(X\) at any point \(t\) is determined by the Hölder exponent \({\alpha }_{X}\left(t\right)\) defined as [27]:
Since the exponent \({\alpha }_{X}\) is defined at every \(t\), the original signal \(X\) can be associated with its Hölder function, \(t \to {\alpha }_{X} (t)\), which describes how the regularity of \(X\) varies: the smaller \({\alpha }_{X}\left(t\right)\), the more irregular \(X\) is at \(t\), and vice versa [27].
The second step in multifractal analysis is to study the sets [27]:
\[{E}_{\alpha } = \left\{t : {\alpha }_{X}(t) = \alpha \right\} \tag{4}\]
where \({E}_{\alpha }\) is the set of points (pixel locations for a 2D signal) of the same regularity. The measurement of these sets gives a global description of the distribution of the singularities of the signal \(X\). This can be done using a geometrical or a statistical approach. The result is, in all cases, a “multifractal spectrum,” i.e., a function \(\alpha \to f(\alpha )\), which describes “how many” points of the signal have a regularity equal to \(\alpha\). In the geometrical approach, the multifractal spectrum represents the dimension of the set of all points having the exponent \(\alpha\), i.e., \({E}_{\alpha }\). In the statistical approach, the multifractal spectrum is estimated from the probability of encountering a pixel whose regularity is of the order of \(\alpha\) [27]. In this paper, we use the statistical approach.
Several methods for estimating the multifractal spectrum have been proposed in the literature. There are methods based on box counting, methods based on wavelets and methods based on detrended fluctuation analysis (DFA). A summary of these methods is presented in Table 2. In this paper, we use the generalized fractal dimensions method and the moving average DFA method for the classification of mammographic ROI.
Generalized Fractal Dimensions and Legendre Spectrum
In practice, in order to quantify the local densities of a considered set (an image), the mass probability in the ith box is estimated as [35]:
\[{P}_{i}(L) = \frac{{N}_{i}(L)}{{N}_{T}} \tag{5}\]
and varies as:
\[{P}_{i}(L) \sim {L}^{{\alpha }_{i}} \tag{6}\]
where \({N}_{i}(L)\) is the number of pixels in the ith box, \({N}_{T}\) is the total mass of the set, and \({\alpha }_{i}\) is the Hölder exponent that reflects the local behavior of the mass probability \({P}_{i}(L)\) around the center of each box of size \(L\). It can be estimated as [35]:
\[{\alpha }_{i} = \frac{\log {P}_{i}(L)}{\log L} \tag{7}\]
The number of boxes \(N(\alpha )\) where the probability \({P}_{i}\) has similar exponent values between \(\alpha\) and \(\alpha +\Delta \alpha\) follows a power law with the box size \(L\) and the multifractal spectrum \(f(\alpha )\) [35]:
\[N(\alpha ) \sim {L}^{-f(\alpha )} \tag{8}\]
Since multifractals are affected by distortions, the process of multifractal analysis is equivalent to applying warp filters to an image to analyze imperceptible features. Warp filters are a set of arbitrary exponents, traditionally denoted by the symbol “\(q\)” and usually taken from a range that brackets 0 symmetrically (e.g., from − 5 to 5). Therefore, a characterization of multifractal measures can be made through the scaling of the qth moments of the \({P}_{i} (L)\) distributions in the form:
\[\sum_{i=1}^{N(L)}{P}_{i}^{q}(L) \sim {L}^{\tau (q)} \tag{9}\]
The exponent \(\tau (q)\) is called the mass exponent of the qth order moment, and \({D}_{q}\), \(q \in \mathbb{R}\), are the generalized fractal dimensions defined from Eq. (9) as:
\[{D}_{q} = \frac{\tau (q)}{q-1} = \frac{1}{q-1}\lim_{L\to 0}\frac{\log \sum_{i=1}^{N(L)}{P}_{i}^{q}(L)}{\log L}, \quad q \ne 1 \tag{10}\]
where \(\sum_{i=1}^{N(L)}{P}_{i}^{q}(L)\) is the sum over all boxes of the distorted probability mass for a size \(L\), and the generalized dimension \({D}_{q}\) is determined for each \(q\), each mass being distorted by being raised to the power \(q\). So, \(q\) can be considered as a “microscope” that makes it possible to explore different regions of the \({P}_{i}\) distribution. Low values of \(q\) favor boxes with low \({P}_{i}(L)\) (low irregularities) and high values of \(q\) favor boxes with high values of \({P}_{i}(L)\) (high irregularities). In other words, for \(q > 1\), \({D}_{q}\) represents the more singular regions, and for \(q < 1\), it accentuates the less singular regions [36].
The relation between the power exponents \(f(\alpha )\) and the exponent \(\tau (q)\) is established via the Legendre transformation:
\[\alpha (q) = \frac{d\tau (q)}{dq} \tag{11}\]
and
\[f(\alpha ) = q\,\alpha (q) - \tau (q) \tag{12}\]
However, determining \(f(\alpha )\) in this way requires smoothing the \({D}_{q}\) curve before applying the Legendre transformation. The smoothing operation introduces errors in the estimation of \(f(\alpha )\) and misses phase transitions when the spectrum exhibits discontinuities. To avoid these numerical errors, Chhabra and Jensen proposed a method for the direct calculation of the multifractal spectrum using the following formulas [28]:
First, a family of normalized measures \({\mu }_{i}(q,L)\) is constructed, where the probabilities in the boxes of size \(L\) are:
\[{\mu }_{i}(q,L) = \frac{{P}_{i}^{q}(L)}{\sum_{j=1}^{N(L)}{P}_{j}^{q}(L)} \tag{13}\]
Then, the direct computation of the \(f(q)\) and \(\alpha (q)\) values is:
\[f(q) = \lim_{L\to 0}\frac{\sum_{i=1}^{N(L)}{\mu }_{i}(q,L)\log {\mu }_{i}(q,L)}{\log L} \tag{14}\]
\[\alpha (q) = \lim_{L\to 0}\frac{\sum_{i=1}^{N(L)}{\mu }_{i}(q,L)\log {P}_{i}(L)}{\log L} \tag{15}\]
For each \(q\), values of \(\alpha (q)\) and \(f(q)\) are obtained from the slope of plots of the numerators of Eqs. (14) and (15) vs. \(log L\) over the entire range of \(L\) values considered. The \(f(q)\) and \(\alpha (q)\) functions obtained over a given \(\Delta q\) were used to construct the \(f(\alpha )\)-spectrum as an implicit function of \(q\) and \(L\).
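As an illustration of the direct method, the sketch below estimates \(\alpha (q)\) and \(f(q)\) as regression slopes against \(\log L\). It is only a minimal sketch under stated assumptions: the image is a list of rows of non-negative intensities, the mass of a box is taken as the sum of its intensities, empty boxes are ignored, and box sizes are expressed in pixels. The names `box_probabilities` and `chhabra_jensen` are hypothetical.

```python
import math

def slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def box_probabilities(image, box):
    """Mass probability P_i(L) of every non-empty box of side `box` pixels."""
    rows, cols = len(image), len(image[0])
    total = sum(sum(row) for row in image)
    probs = []
    for r0 in range(0, rows, box):
        for c0 in range(0, cols, box):
            mass = sum(image[r][c]
                       for r in range(r0, min(r0 + box, rows))
                       for c in range(c0, min(c0 + box, cols)))
            if mass > 0:
                probs.append(mass / total)
    return probs

def chhabra_jensen(image, box_sizes, q_values):
    """Return the lists alpha(q) and f(q) estimated by the direct method."""
    log_l = [math.log(b) for b in box_sizes]
    alphas, spectrum = [], []
    for q in q_values:
        num_alpha, num_f = [], []
        for box in box_sizes:
            p = box_probabilities(image, box)
            weights = [pi ** q for pi in p]
            norm = sum(weights)
            mu = [w / norm for w in weights]  # normalized measure mu_i(q, L)
            num_alpha.append(sum(m * math.log(pi) for m, pi in zip(mu, p)))
            num_f.append(sum(m * math.log(m) for m in mu))
        # alpha(q) and f(q) are the slopes of the numerators versus log L.
        alphas.append(slope(log_l, num_alpha))
        spectrum.append(slope(log_l, num_f))
    return alphas, spectrum

# A uniform image is monofractal: alpha = f = 2 for every q.
uniform = [[1.0] * 16 for _ in range(16)]
alpha_q, f_q = chhabra_jensen(uniform, [1, 2, 4, 8], [-2.0, 0.0, 2.0])
```

For the uniform image the spectrum collapses to the single point \((2, 2)\), as expected for a homogeneous two-dimensional support; a textured ROI would instead produce a range of \(\alpha\) values.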
Multifractal Detrended Moving Average (MFDMA)
The two-dimensional multifractal detrended moving average (2D-MFDMA) algorithm is described as follows [33]: Consider a surface with possible multifractal properties, which can be denoted by a two-dimensional matrix \(X({i}_{1},{i}_{2})\), with \({i}_{1}=1,2,...,{N}_{1}\), and \({i}_{2}=1,2,...,{N}_{2}\).
Step 1: Calculate the sum \(Y({i}_{1},{i}_{2})\)
The first step consists of calculating the sum \(Y({i}_{1},{i}_{2})\) in a sliding window of size \({n}_{1} \times {n}_{2}\), where \({n}_{1}\le {i}_{1}\le {N}_{1}-\lfloor ({n}_{1}-1){\theta }_{1}\rfloor\) and \({n}_{2}\le {i}_{2}\le {N}_{2}-\lfloor ({n}_{2}-1){\theta }_{2}\rfloor\). \({\theta }_{1}\) and \({\theta }_{2}\) are two position parameters that vary in the range \([0, 1]\). Specifically, we extract a sub-matrix \(Z=X({u}_{1},{u}_{2})\) with size \({n}_{1}\times {n}_{2}\) from the matrix \(X\), where \({i}_{1}-{n}_{1}+1 \le {u}_{1}\le {i}_{1}\) and \({i}_{2}-{n}_{2}+1 \le {u}_{2}\le {i}_{2}\). We can calculate the sum \(Y({i}_{1},{i}_{2})\) of \(Z\) as follows:
\[Y({i}_{1},{i}_{2}) = \sum_{{j}_{1}=1}^{{n}_{1}}\sum_{{j}_{2}=1}^{{n}_{2}} Z({j}_{1},{j}_{2}) \tag{16}\]
Step 2: Determine the moving average function \(\tilde{Y }\left({i}_{1},{i}_{2}\right)\).
First, we extract a sub-matrix \(W=X({u}_{1},{u}_{2})\) with size \({n}_{1} \times {n}_{2}\) from the matrix \(X\), where \({k}_{1}-\lceil ({n}_{1}-1)(1-{\theta }_{1})\rceil \le {u}_{1}\le {k}_{1}+\lfloor ({n}_{1}-1){\theta }_{1}\rfloor\) and \({k}_{2}-\lceil ({n}_{2}-1)(1-{\theta }_{2})\rceil \le {u}_{2}\le {k}_{2}+\lfloor ({n}_{2}-1){\theta }_{2}\rfloor\), with \(\lceil ({n}_{1}-1)(1-{\theta }_{1})\rceil +1\le {k}_{1}\le {N}_{1}-\lfloor ({n}_{1}-1){\theta }_{1}\rfloor\) and \(\lceil ({n}_{2}-1)(1-{\theta }_{2})\rceil +1\le {k}_{2}\le {N}_{2}-\lfloor ({n}_{2}-1){\theta }_{2}\rfloor\). Then, we calculate the cumulative sum \(\tilde{W }({m}_{1},{m}_{2})\) of \(W\):
\[\tilde{W }({m}_{1},{m}_{2}) = \sum_{{d}_{1}=1}^{{m}_{1}}\sum_{{d}_{2}=1}^{{m}_{2}} W({d}_{1},{d}_{2}) \tag{17}\]
where \(1\le {m}_{1}\le {n}_{1}\) and \(1\le {m}_{2}\le {n}_{2}\).
The moving average function \(\tilde{Y }\left({i}_{1},{i}_{2}\right)\) can be calculated as follows:
\[\tilde{Y }({i}_{1},{i}_{2}) = \frac{1}{{n}_{1}{n}_{2}}\sum_{{m}_{1}=1}^{{n}_{1}}\sum_{{m}_{2}=1}^{{n}_{2}} \tilde{W }({m}_{1},{m}_{2}) \tag{18}\]
Step 3: The residual matrix \(\varepsilon \left({i}_{1},{i}_{2}\right)\)
The matrix is detrended by removing the moving average function \(\tilde{Y }\left({i}_{1},{i}_{2}\right)\) from \(Y\left({i}_{1},{i}_{2}\right)\) to obtain the residual matrix \(\varepsilon \left({i}_{1},{i}_{2}\right)\):
\[\varepsilon ({i}_{1},{i}_{2}) = Y({i}_{1},{i}_{2}) - \tilde{Y }({i}_{1},{i}_{2}) \tag{19}\]
Step 4: The detrended fluctuation function.
The residual matrix \(\varepsilon \left({i}_{1},{i}_{2}\right)\) is partitioned into \({N}_{{n}_{1}}\times {N}_{{n}_{2}}\) disjoint rectangular segments of the same size. Each segment can be denoted by \({\varepsilon }_{{v}_{1},{v}_{2}}\) such that \({\varepsilon }_{{v}_{1},{v}_{2}}\left({i}_{1},{i}_{2}\right)=\varepsilon \left({I}_{1}+{i}_{1},{I}_{2}+{i}_{2}\right)\) for \(1\le {i}_{1}\le {n}_{1}\) and \(1\le {i}_{2}\le {n}_{2}\), where \({I}_{1}=({v}_{1}-1){n}_{1}\) and \({I}_{2}=({v}_{2}-1){n}_{2}\). The detrended fluctuation \({F}_{{v}_{1},{v}_{2}}\left({n}_{1},{n}_{2}\right)\) of segment \({\varepsilon }_{{v}_{1},{v}_{2}}\left({i}_{1},{i}_{2}\right)\) can be calculated as follows:
\[{F}_{{v}_{1},{v}_{2}}^{2}({n}_{1},{n}_{2}) = \frac{1}{{n}_{1}{n}_{2}}\sum_{{i}_{1}=1}^{{n}_{1}}\sum_{{i}_{2}=1}^{{n}_{2}} {\varepsilon }_{{v}_{1},{v}_{2}}^{2}({i}_{1},{i}_{2}) \tag{20}\]
Step 5: The qth order overall fluctuation function \({F}_{q}(n)\) is calculated as follows:
\[{F}_{q}(n) = {\left\{\frac{1}{{N}_{{n}_{1}}{N}_{{n}_{2}}}\sum_{{v}_{1}=1}^{{N}_{{n}_{1}}}\sum_{{v}_{2}=1}^{{N}_{{n}_{2}}} {F}_{{v}_{1},{v}_{2}}^{q}({n}_{1},{n}_{2})\right\}}^{1/q} \tag{21}\]
where \({n}^{2}={(n}_{1}^{2}+{n}_{2}^{2})/2\). \({F}_{q}\left(n\right)\) is a vector obtained from the detrended fluctuation of each segment for each value of \(q\); it is then used to calculate the multifractal scaling exponent \(\tau (q)\). Mathematically, \(q\) can take any real value except \(q=0\). In practice, for \(q=0\), \({F}_{q}\left(n\right)\) is given by: \({F}_{0}\left(n\right)=\exp \left\{\frac{1}{2{N}_{{n}_{1}}{N}_{{n}_{2}}}\sum_{{v}_{1}=1}^{{N}_{{n}_{1}}}\sum_{{v}_{2}=1}^{{N}_{{n}_{2}}}\log {F}_{{v}_{1},{v}_{2}}^{2}({n}_{1},{n}_{2})\right\}\)
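Step 5 can be sketched in a few lines. The sketch assumes that the squared detrended fluctuations of all segments at one scale have already been computed and flattened into a list; for \(q \ne 0\) it takes the q-th root of the mean q-th power of the segment fluctuations, and for \(q=0\) it uses the logarithmic average given above. The function name `overall_fluctuation` is illustrative.

```python
import math

def overall_fluctuation(segment_f2, q):
    """q-th order overall fluctuation from per-segment squared fluctuations.

    `segment_f2` holds the values F^2 of every segment at one scale
    (all assumed strictly positive).
    """
    count = len(segment_f2)
    if q == 0:
        # Limit case: exponential of half the mean of log(F^2).
        return math.exp(sum(math.log(f2) for f2 in segment_f2) / (2 * count))
    # F^q = (F^2)^(q/2); average over segments, then take the q-th root.
    mean_q = sum(f2 ** (q / 2) for f2 in segment_f2) / count
    return mean_q ** (1 / q)

# Two segments with fluctuations F = 1 and F = 4 (so F^2 = 1 and 16).
f2_values = [1.0, 16.0]
f_minus2 = overall_fluctuation(f2_values, -2)
f_zero = overall_fluctuation(f2_values, 0)   # geometric mean of 1 and 4
f_plus2 = overall_fluctuation(f2_values, 2)
```

Negative orders emphasize the segments with small fluctuations and positive orders those with large fluctuations, so the values increase with \(q\) whenever the segments differ.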
Step 6: By varying the segment sizes \({n}_{1}\) and \({n}_{2}\), we can determine the power-law relation between the q-order overall fluctuation function \({F}_{q}\left(n\right)\) and the scale \(n\):
\[{F}_{q}(n) \sim {n}^{h(q)} \tag{22}\]
For each \(q\), we can get the corresponding traditional \(\tau \left(q\right)\) function through:
\[\tau (q) = q\,h(q) - {D}_{f} \tag{23}\]
where \({D}_{f}\) is the fractal dimension of the geometric support of the measure (\({D}_{f}=2\) for an image), and obtain the singularity strength function \(\alpha (q)\) and the multifractal spectrum \(f(\alpha )\) via the Legendre transform.
Comparison Between the Two Methods
The generalized fractal dimensions method is the most widely used in the literature owing to its simple implementation and its robustness. Compared to MFDMA, its computation time is shorter (Table 3) and only two parameters need to be adjusted (Table 4):
- Box sizes \(L\): powers of two, i.e., \(L = 1, 2, 4, \ldots, {2}^{p}\), where \(p\) is the smallest integer such that \(max(size(Image)) \le {2}^{p}\).
- q-orders: described previously. In this paper, we took \(q=[-4:0.1:4]\).
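The two parameters can be generated as follows. This is a small sketch in which the helper names `box_sizes` and `q_orders` are illustrative; the q range reproduces the MATLAB-style notation \([-4:0.1:4]\), i.e., 81 values from −4 to 4 in steps of 0.1.

```python
def box_sizes(height, width):
    """Powers of two 1, 2, 4, ..., 2**p, where p is the smallest integer
    such that max(height, width) <= 2**p."""
    p = (max(height, width) - 1).bit_length()
    return [2 ** k for k in range(p + 1)]

def q_orders(q_min=-4.0, q_max=4.0, step=0.1):
    """Evenly spaced q values from q_min to q_max inclusive."""
    count = round((q_max - q_min) / step) + 1
    # Rounding removes the floating-point drift of repeated additions.
    return [round(q_min + k * step, 10) for k in range(count)]

sizes_64 = box_sizes(64, 64)   # box sizes for a 64 x 64 ROI
q_values = q_orders()
```

Using the integer bit length avoids floating-point logarithms when the image side is an exact power of two.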
Database
Several mammographic image databases are used by researchers in the breast analysis field. In this paper, we used the INbreast database, which was acquired at the Breast Center of Porto. The image matrices are 3328 × 4084 or 2560 × 3328 pixels and are saved in DICOM format. The database contains full-field digital mammograms (FFDM) from screening, diagnostic, and follow-up cases, with a total of 115 cases (410 images), of which 90 cases (MLO and CC) are from women with both breasts affected (four images per case) and 25 cases are from mastectomy patients (two images per case). It includes several types of lesions: masses, calcifications, asymmetries, architectural distortions, and multiple findings (Fig. 1), and it provides information regarding the patient’s age at the time of image acquisition, family history, BI-RADS classification of the breast density (Fig. 2), and abnormality [37].
The database contains 308 images of calcifications of different BI-RADS breast density classes (Fig. 2, Table 5). In total, 6880 calcifications were individually identified in 299 images. Annotations were made by a specialist in the field and validated by a second specialist (Fig. 3). Both specialists are experts in reading mammograms. A detailed contour of each finding was made, and an ellipse enclosing the entire cluster was adopted to annotate the clusters of MCs [37].
Proposed Method
In this paper, we propose an approach based on multifractal analysis for the characterization and classification of normal and abnormal ROIs to detect MCs. Figure 4 shows the global diagram of the proposed approach. Our method is divided into five steps: pre-processing, ROI extraction, multifractal analysis, feature extraction, and classification.
Pre-Processing
Image Filtering
Images of the INbreast database are of high quality and do not contain labels or artifacts, so they do not require a filtering step. However, they are large and contain many empty rows and columns, which represent the background of the mammogram (shown in black). So, in order to reduce the computation time of the next steps, this black background is cropped from the mammogram by removing the empty rows and columns and keeping only the breast [25].
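A minimal sketch of this background removal is given below. It assumes that the image is a list of rows of gray levels and that background pixels are those not exceeding a `threshold` (0 by default); the function name and the threshold parameter are illustrative and not taken from [25]. The sketch keeps the bounding box of the non-empty rows and columns, so that the breast region stays contiguous.

```python
def crop_background(image, threshold=0):
    """Keep the bounding box of the rows and columns that contain at least
    one pixel strictly above `threshold`."""
    rows = [r for r, row in enumerate(image)
            if any(value > threshold for value in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows or not cols:
        return []  # the image contains only background
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]

# Toy mammogram: a small bright region surrounded by black background.
toy = [
    [0, 0, 0, 0, 0],
    [0, 0, 5, 7, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_background(toy)
```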
Image Enhancement
Considering breast density as haze in mammography, we use a low-light image enhancement algorithm based on the haze removal technique [38] to reduce breast density and highlight MCs in relation to density [25].
α-image (Multifractal Measures)
The α-image is constructed from the multifractal measures and the estimated Hölder exponents \({\alpha }_{p}\). To estimate the Hölder exponents \({\alpha }_{p}\), the natural logarithms of the measure value \(\mathrm{ln}({\mu }_{p}\left(m,n\right))\) and of the window size \(\mathrm{ln}(L)\) (Eq. 2) are calculated, and the corresponding points are plotted in a bi-logarithmic diagram \(\mathrm{ln}({\mu }_{p}\left(m,n\right))\) vs. \(\mathrm{ln}(L)\), where three boxes of 1 × 1, 3 × 3, and 5 × 5 pixels in size were considered. Then, the limiting value of \(\alpha (m,n)\) is estimated as the slope of the linear regression line. All steps are detailed in the previous work [25]. The final result represents the enhanced mammogram.
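The regression that yields the Hölder exponent at a pixel can be sketched as follows. The multifractal measure actually used is defined in [25] and is not restated here, so this sketch assumes, purely for illustration, a sum measure (the sum of the intensities inside the window) and strictly positive intensities; border pixels are skipped. The names `holder_exponent` and `alpha_image` are hypothetical.

```python
import math

def slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def holder_exponent(image, row, col, windows=(1, 3, 5)):
    """Slope of ln(mu) versus ln(L) at one pixel, where mu is the (assumed)
    sum of the intensities in the L-by-L window centered on the pixel."""
    log_l, log_mu = [], []
    for size in windows:
        half = size // 2
        mu = sum(image[r][c]
                 for r in range(row - half, row + half + 1)
                 for c in range(col - half, col + half + 1))
        log_l.append(math.log(size))
        log_mu.append(math.log(mu))
    return slope(log_l, log_mu)

def alpha_image(image, windows=(1, 3, 5)):
    """Hölder exponents of all pixels far enough from the border."""
    margin = max(windows) // 2
    rows, cols = len(image), len(image[0])
    return [[holder_exponent(image, r, c, windows)
             for c in range(margin, cols - margin)]
            for r in range(margin, rows - margin)]

# A flat region versus an isolated bright spot on a dark background.
flat = [[10.0] * 7 for _ in range(7)]
spot = [[1.0] * 7 for _ in range(7)]
spot[3][3] = 100.0
alpha_flat = holder_exponent(flat, 3, 3)   # close to 2 (regular region)
alpha_spot = holder_exponent(spot, 3, 3)   # much smaller (singular point)
```

Under this assumed measure, a smooth region gives an exponent near 2, whereas an isolated bright point, which is how an MC appears locally, gives a much lower exponent.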
ROI Extraction and Multifractal Analysis
Normal ROIs and abnormal ROIs containing MCs are extracted from the enhanced image. Based on the expert’s annotations, the coordinates of the MCs are extracted, and the ROI is then cropped such that these coordinates represent the center of the ROI. For a normal ROI, the coordinates of the center are chosen randomly where there is no annotation. The ROIs are non-overlapping and of size 64 × 64 pixels. Then, for each extracted ROI, the multifractal spectrum is calculated using the two different methods described in the previous section.
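The cropping step can be sketched as below, assuming the image is a list of rows and the center is given as (row, column) pixel coordinates. How ROIs that would extend beyond the image border are handled is not stated in the text; this sketch simply rejects them. The helper names `crop_roi` and `rois_overlap` are illustrative.

```python
def crop_roi(image, center_row, center_col, size=64):
    """Crop a size-by-size ROI around the given center.

    Returns None when the ROI would extend beyond the image (an assumption
    of this sketch).
    """
    half = size // 2
    top, left = center_row - half, center_col - half
    if (top < 0 or left < 0
            or top + size > len(image) or left + size > len(image[0])):
        return None
    return [row[left:left + size] for row in image[top:top + size]]

def rois_overlap(center_a, center_b, size=64):
    """True when two size-by-size ROIs with these centers share pixels."""
    return (abs(center_a[0] - center_b[0]) < size
            and abs(center_a[1] - center_b[1]) < size)

# Toy image whose pixel value encodes its own position.
toy = [[r * 100 + c for c in range(100)] for r in range(100)]
roi = crop_roi(toy, 50, 50)
```

Since 64 is even, the annotated pixel lands at index 32 of the ROI in each direction rather than at an exact geometric center.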
Feature Extraction
Multifractal Features
From the multifractal spectrum of each ROI, five multifractal parameters are extracted: the Hölder exponent \({\alpha }_{0}\) (corresponding to the maximum of the spectrum), the spectrum width (\(w={\alpha }_{max}- {\alpha }_{min}\)), \(R= {\alpha }_{0}- {\alpha }_{min}\), \(L= {\alpha }_{max}- {\alpha }_{0}\), and the asymmetry (\(A=({\alpha }_{0}- {\alpha }_{min})/( {\alpha }_{max}- {\alpha }_{0})\)) (Fig. 5).
The Hölder exponent \({\alpha }_{0}\) measures the intensity of the local irregularities present in the image [39]. The spectrum width \(w\) (also written Δα) measures the degree of multifractality: a smaller width indicates that the function tends to be monofractal, and a larger one indicates stronger multifractality [40]. Asymmetry \(A\) measures the symmetry of the spectrum. It is null for symmetrical shapes, positive or negative for left and right asymmetric shapes, respectively [41]. An asymmetric spectrum on the right corresponds to a concentration of irregularities in large structures and conversely, an asymmetrical spectrum on the left corresponds to a concentration of irregularities in fine structures [42].
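The five parameters follow directly from the sampled spectrum. The sketch below assumes that the spectrum is available as two equally long lists, `alpha` and `f`, and applies the definitions of \({\alpha }_{0}\), \(w\), \(R\), \(L\), and \(A\) given above; the function name and the dictionary keys are illustrative.

```python
import math

def spectrum_features(alpha, f):
    """Five descriptors of a sampled multifractal spectrum f(alpha)."""
    peak = max(range(len(f)), key=lambda i: f[i])
    alpha_0 = alpha[peak]                  # position of the spectrum maximum
    alpha_min, alpha_max = min(alpha), max(alpha)
    r_part = alpha_0 - alpha_min           # R in the text
    l_part = alpha_max - alpha_0           # L in the text
    return {
        "alpha_0": alpha_0,
        "w": alpha_max - alpha_min,
        "R": r_part,
        "L": l_part,
        # Ratio form of the asymmetry; infinite when the maximum sits at
        # the right end of the spectrum.
        "A": r_part / l_part if l_part else math.inf,
    }

# Toy spectrum (hypothetical values chosen for easy arithmetic).
toy_alpha = [1.0, 1.5, 2.0, 2.5, 4.0]
toy_f = [0.5, 1.5, 2.0, 1.0, 0.25]
features = spectrum_features(toy_alpha, toy_f)
```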
Gray Level Co-occurrence Matrix (GLCM)
GLCM is defined as the distribution of co-occurring values at a given offset. It identifies patterns of pairs of pixels separated by a distance \(d\) in a direction \(\theta\) by counting how often a pixel with a gray-level value \(i\) occurs horizontally (\(0^\circ\)), vertically (\(90^\circ\)), or diagonally (\(-45^\circ\) or \(-135^\circ\)) adjacent to a pixel with the value \(j\).
In this paper, since MCs are small structures of different sizes and shapes, we extracted GLCM metrics for single-pixel distances (an offset \(d=1\)) along the horizontal direction (\(\theta =0^\circ\)). With larger distances or other directions, the GLCM can miss detailed information. The chosen settings also allow us to minimize execution time.
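The construction of the GLCM for \(d=1\) and \(\theta =0^\circ\) can be sketched as follows, assuming an image already quantized to integer gray levels in the range 0 to `levels - 1`. The three statistics computed here (contrast, energy, and homogeneity) are common GLCM descriptors chosen for illustration; they are an assumption of this sketch and not necessarily the exact set used in this work.

```python
def glcm_horizontal(image, levels):
    """Normalized co-occurrence matrix for offset d = 1, theta = 0 degrees.

    Entry (i, j) is the relative frequency with which a pixel of level i has
    a right-hand neighbor of level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def glcm_features(p):
    """Contrast, energy, and homogeneity of a normalized GLCM."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }

# Toy 4 x 4 image with four gray levels.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm_horizontal(toy, levels=4)
texture = glcm_features(p)
```

The toy image yields 12 horizontal pixel pairs, so every entry of `p` is a multiple of 1/12, which makes the result easy to verify by hand.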
After the creation of the GLCM, several statistics can be derived providing information about the texture of the image [43]:
where
Classification
Classification based on machine learning techniques is a two-step process: the learning step, where the model is developed from given training data, and the prediction step, where the trained model is used to predict the response for given data. Generally, it is a supervised machine-learning task that uses training data and associated labels during the model learning process. The objective is to predict the output labels of input data based on what the model has learned during the training phase. Therefore, each output response belongs to a specific class. Many supervised machine-learning algorithms have been proposed in the literature [44]. The most popular are support vector machines (SVM), K-nearest neighbors (KNN), and decision trees (DT), which are used in this research to classify ROIs as either normal (healthy) or abnormal (MCs).
Support Vector Machine (SVM)
Support vector machines (SVM) are among the main supervised machine-learning algorithms and are both accurate and highly robust. In the SVM algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. The objective is to find the most appropriate classification function, namely the separating hyperplane that passes midway between the two classes. The role of SVM is to maximize the margin between the two classes (the distance between the hyperplane and the closest data points of each class) [44, 45].
Decision Tree (DT)
A decision tree is a hierarchical supervised learning model. The goal of using a DT is to create a training model that can be used to predict the class of the target variable by learning simple decision rules (in the form of yes or no questions) inferred from prior data (training data). A DT is made up of internal decision nodes and terminal leaves. Each decision node uses a test function that labels the branches with discrete scores. A test is applied to the input at every node, and one of the branches is chosen depending on the result. This process starts from the root with the complete training information, and the best split must be found at each phase. The split divides the training data into two or more subsets. Then, the division continues recursively on the relevant subset until there is no longer any need to split; at this stage, a leaf node is generated and labeled [44].
K-Nearest Neighbors (KNN)
In k-nearest neighbors classification, examples are classified based on the class of their nearest neighbors (more than one neighbor). The KNN classification has two stages: the first is the determination of the nearest neighbors, and the second is the determination of the class using those neighbors. When an unlabeled object needs to be classified, the distance between each labeled object and the unlabeled object is calculated, and the k nearest neighbors are selected based on this distance metric. The class labels of these nearest neighbors are then used to identify the class label of the object [44, 46].
To assess the performance of our approach, the extracted features are fed to the classifiers and classification accuracy, sensitivity, and specificity are calculated. The SVM classifier was trained using standardized predictors and the second-order polynomial kernel. The KNN classifier was trained using standardized Euclidean distance metric. The DT classifier was trained without using specific options.
In order to split our database into training and testing sets, the k-fold cross validation method is used. It consists of splitting the database into k groups; each group in turn is taken as the test data while the remaining groups are taken as training data. This process (the cross validation process) is therefore repeated k times. In this paper, each classifier was tested using 5-fold cross validation.
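The splitting scheme can be sketched with the standard library only. The shuffling, the fixed seed, and the absence of class stratification are assumptions of this sketch; the function name `k_fold_indices` is illustrative.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    The indices are shuffled once, then dealt into k folds; each fold is
    used exactly once as the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 2688 ROIs split into 5 folds, as in this study.
splits = list(k_fold_indices(2688, k=5))
```

Each ROI appears in exactly one test fold, so the performance measures averaged over the five folds are computed on every ROI once.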
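Let TP, FP, TN, and FN be the numbers of true positives, false positives, true negatives, and false negatives, respectively. Sensitivity, specificity, and accuracy are defined by:
\[\mathrm{Sensitivity} = \frac{TP}{TP+FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN+FP}, \qquad \mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}\]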
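In code, the three measures follow directly from the four counts, using the usual definitions (sensitivity as the fraction of abnormal ROIs correctly detected, specificity as the fraction of normal ROIs correctly rejected, and accuracy as the overall fraction of correct decisions). The counts in the example are hypothetical and are not results of this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```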
Results and Discussion
The proposed system was tested and validated using MATLAB R2019b, on a personal computer with an Intel (R) Core (TM) i5-3230 M CPU processor and 8 Go RAM, running under Windows 7 operating system.
After the pre-processing step (Fig. 6), normal ROIs and abnormal ROIs (containing MCs) of size 64 × 64 pixels were extracted from the pre-processed mammograms based on the expert’s annotations. The size of the ROI influences the multifractal spectrum and its parameters. If the ROI’s size is larger, the ROI contains more singularities (structures); therefore, it can negatively impact the classification. If the ROI’s size is smaller, the classification results will be better. Also, the choice of this size allows us to reduce the computation time of the multifractal spectrum. Figures 7 and 8 compare some examples of ROIs obtained before (without pre-processing step) and after pre-processing (with pre-processing step). The influence of pre-processing was explained, evaluated, and discussed in the previous work [25].
The next step is to compute the multifractal spectrum of each ROI (normal and abnormal) and extract the parameters described in the previous section. To illustrate this, we took 10 examples of normal ROIs and 10 examples of abnormal ROIs (Fig. 9). For each ROI, we calculated the multifractal spectrum using the generalized fractal dimensions method before and after pre-processing, and then extracted the multifractal parameters. Figure 10 shows the multifractal spectra obtained for the ROIs before and after pre-processing. The extracted parameters are presented in Tables 6 and 7.
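For readers who want to experiment, the following Python sketch shows one common box-counting (moment) formulation of the generalized dimensions approach: the partition function Z(q, ε) = Σ p_i(ε)^q is computed for several box sizes, τ(q) is the slope of log Z against log ε, D_q = τ(q)/(q − 1), and the spectrum follows from the Legendre transform α = dτ/dq, f(α) = qα − τ(q). This is not the authors' implementation. The choice of pixel intensity as the measure, the box sizes, the q range, the finite-difference derivative, and all function names are assumptions, and the image is assumed to be square with a power-of-two side.

```python
import math


def box_measures(image, box):
    """Normalised intensity mass of each non-empty box x box cell."""
    n = len(image)
    total = sum(map(sum, image))
    masses = []
    for r in range(0, n, box):
        for c in range(0, n, box):
            mass = sum(sum(row[c:c + box]) for row in image[r:r + box])
            if mass > 0:               # empty boxes carry no measure
                masses.append(mass / total)
    return masses


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def multifractal_spectrum(image, qs):
    """Return tau(q), D_q, alpha(q) and f(alpha) for a square image."""
    n = len(image)
    boxes = [b for b in (1, 2, 4, 8, 16, 32) if b < n]
    log_eps = [math.log(b / n) for b in boxes]
    measures = [box_measures(image, b) for b in boxes]
    tau = []
    for q in qs:
        log_z = [math.log(sum(p ** q for p in m)) for m in measures]
        tau.append(slope(log_eps, log_z))      # Z(q, eps) ~ eps ** tau(q)
    # Generalised dimensions; q = 1 is skipped because it needs the entropy limit.
    dq = [t / (q - 1) if q != 1 else None for q, t in zip(qs, tau)]
    # Legendre transform with central differences on the interior q values.
    alpha = [(tau[i + 1] - tau[i - 1]) / (qs[i + 1] - qs[i - 1])
             for i in range(1, len(qs) - 1)]
    f_alpha = [qs[i] * a - tau[i] for i, a in enumerate(alpha, start=1)]
    return tau, dq, alpha, f_alpha


# Uniform 64 x 64 patch: a monofractal, so every D_q should be close to 2.
flat = [[1.0] * 64 for _ in range(64)]
qs = [-3, -2, -1, 0, 1, 2, 3]
tau, dq, alpha, f_alpha = multifractal_spectrum(flat, qs)
print([None if d is None else round(d, 3) for d in dq])

# Same patch with one bright pixel: the range of alpha widens.
spot = [row[:] for row in flat]
spot[32][32] = 1000.0
_, _, alpha_spot, _ = multifractal_spectrum(spot, qs)
print("spectrum width:", round(max(alpha_spot) - min(alpha_spot), 3))
```

The uniform patch is a useful sanity check because its spectrum collapses to a single point, while a single bright pixel on the same background already spreads the range of α.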
From the figures and tables above, we notice that before pre-processing, the normal ROIs were confused with the abnormal ROIs containing only one MC, and the multifractal spectrum could not differentiate between them. After pre-processing, however, the multifractal spectrum and the extracted parameters discriminated these ROIs very well.
According to Table 8, before pre-processing, the variation intervals of each parameter are very close for the normal and abnormal ROIs. After pre-processing, a large difference between these intervals is recorded, which gives a better characterization of, and discrimination between, the normal and the abnormal ROIs.
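The idea of comparing variation intervals can be sketched in Python as follows. The numbers below are hypothetical and were chosen only to illustrate overlapping and disjoint intervals; they are not the values of Table 8, and the function names are assumptions.

```python
def variation_interval(values):
    """Return the (min, max) interval spanned by a parameter."""
    return min(values), max(values)


def interval_overlap(a, b):
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical parameter values, chosen only to illustrate the idea.
normal_before, abnormal_before = [0.80, 0.95, 1.10], [0.85, 1.00, 1.20]
normal_after, abnormal_after = [0.40, 0.50, 0.60], [1.10, 1.30, 1.50]

before = interval_overlap(variation_interval(normal_before),
                          variation_interval(abnormal_before))
after = interval_overlap(variation_interval(normal_after),
                         variation_interval(abnormal_after))
print(f"overlap before: {before:.2f}, after: {after:.2f}")
```

An overlap of zero means that a simple threshold on that parameter would already separate the two classes in the sample.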
Results of ROI Classification
The proposed approach was applied to each BI-RADS class of the INbreast database (Table 9). After extraction of the normal and abnormal ROIs, where only abnormal ROIs containing an individual MC were considered, the multifractal spectrum was calculated using the two methods described in the previous section. The extracted parameters were then used to classify the ROIs with the three classifiers. The results obtained are reported in Tables 10, 11, and 12. Table 13 compares our results with the state of the art.
The classification results obtained after the pre-processing step are convincing, which supports the conclusion that the proposed mammogram pre-processing method is effective in enhancing the contrast of MCs by reducing the effect of breast density.
Comparing the two methods, the MFDMA method is more complicated to implement, since several parameters need to be adjusted, and takes more computation time; it also gives poorer classification results than the generalized fractal dimensions method.
In most studies, the analyzed MC ROIs are ones in which the MCs (generally clusters) are clearly visible and well discriminated from the surrounding tissue. Critical cases where the MCs are invisible or masked by high breast density (BI-RADS C and D) are also not analyzed in those studies. In our work, the analyzed abnormal ROIs contain only one MC; since the multifractal spectrum was sensitive to a single MC, the results should be better for ROIs containing more than one MC (a cluster, for example) or a mass.
The BI-RADS classification categorizes mammograms according to the density of the breast tissue, from the least dense to the densest. The denser the tissue, the more difficult detection is, because breast density masks lesions, especially small lesions such as MCs. In this paper, all BI-RADS cases have been taken into consideration (BI-RADS A, B, C, and D). The classification accuracy obtained for classes A, B, and C exceeds 97%; for class D, the obtained accuracy of 93.33% is lower than for the other classes, but it remains a promising result in detecting MCs. Grouping all cases, the classification accuracy reached 98%, which is a very satisfactory rate compared with the literature.
The proposed algorithm could also be applied to other databases, such as BCDR or DDSM. The pre-processing step may not work the same way on these databases, since they do not have the same image quality, and further filtering steps would be required; multifractal analysis nevertheless remains a powerful tool for singularity analysis.
Conclusion
In this paper, we have proposed a computer-aided system for the detection of MCs and the classification of mammographic ROIs into normal or abnormal ROIs based on multifractal features. The approach relies on multifractal analysis: multifractal measures were used to enhance the contrast of the MCs, and the multifractal spectrum was then computed to extract the multifractal attributes used for the classification of the ROIs.
Generally, multifractal analysis does not require a pre-processing step, because it is a pointwise analysis that measures the local regularity and studies its variation from one point to another. In this paper, the pre-processing step was necessary for a better characterization and classification of the ROIs. A brief comparison of two methods of spectrum computation was made, in which the generalized fractal dimensions method gave the best results. The combination of the multifractal features and the GLCM features gives better classification results, with the SVM classifier being the most efficient. In comparison with the literature, the proposed system has given very satisfactory and promising results. The particularity of our work lies in the fact that all cases of breast density have been taken into consideration, which, to our knowledge, has not been done in the literature. In addition, the abnormal ROIs analyzed in our study contained only one MC, i.e., the spectrum was sensitive to a single MC.
Finally, since this approach has been effective for the detection of individual MCs (ROIs containing one MC), it should also be effective for the detection and classification of ROIs containing other abnormalities such as clusters of MCs or masses. This approach could also be used for the discrimination and classification of benign and malignant breast abnormalities.
Data Availability
The INbreast dataset used in this work was provided by the database manager under a signed transfer agreement.
Code Availability
Some of the MATLAB code used in this work is available in MATLAB toolboxes; the code written by the authors is personal and is not publicly available.
References
“Cancer.” https://www.who.int/fr/news-room/fact-sheets/detail/cancer (accessed Jun. 06, 2021).
S. B. Yengec Tasdemir, K. Tasdemir, and Z. Aydin, “A review of mammographic region of interest classification,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 10, no. 5. Wiley-Blackwell, Sep. 01, 2020, https://doi.org/10.1002/widm.1357.
V. Bhateja, M. Misra, and S. Urooj, Non-Linear Filters for Mammogram Enhancement: A Robust Computer-aided Analysis Framework for Early Detection of Breast Cancer, Studies in Computational Intelligence, vol. 861. [Online]. Available: http://www.springer.com/series/7092.
A. Elmoufidi, K. El Fahssi, S. Jai-andaloussi, A. Sekkaki, Q. Gwenole, and M. Lamard, “Anomaly classification in digital mammography based on multiple-instance learning,” IET Image Process., vol. 12, no. 3, pp. 320–328, Mar. 2018, https://doi.org/10.1049/iet-ipr.2017.0536.
A. Gautam, V. Bhateja, A. Tiwari, and S. C. Satapathy, “An improved mammogram classification approach using back propagation neural network,” in Advances in Intelligent Systems and Computing, 2018, vol. 542, pp. 369–376. https://doi.org/10.1007/978-981-10-3223-3_35.
N. Tavakoli, M. Karimi, A. Norouzi, N. Karimi, S. Samavi, and S. M. R. Soroushmehr, “Detection of abnormalities in mammograms using deep features,” J. Ambient Intell. Humaniz. Comput., 2019. https://doi.org/10.1007/s12652-019-01639-x.
D. Muduli, R. Dash, and B. Majhi, “Automated breast cancer detection in digital mammograms: A moth flame optimization based ELM approach,” Biomed. Signal Process. Control, vol. 59, May 2020, https://doi.org/10.1016/j.bspc.2020.101912.
K. Hu, W. Yang, and X. Gao, “Microcalcification diagnosis in digital mammography using extreme learning machine based on hidden Markov tree model of dual-tree complex wavelet transform,” Expert Syst. Appl., vol. 86, pp. 1339–1351, Nov. 2017, https://doi.org/10.1016/j.eswa.2017.05.062.
Z. Suhail, E. R. E. Denton, and R. Zwiggelaar, “Classification of micro-calcification in mammograms using scalable linear Fisher discriminant analysis,” Med. Biol. Eng. Comput., vol. 56, no. 8, pp. 1475–1485, Aug. 2018. https://doi.org/10.1007/s11517-017-1774-z.
B. Singh and M. Kaur, “An approach for classification of malignant and benign microcalcification clusters,” Sādhanā, vol. 43, 2018, https://doi.org/10.1007/s12046-018-0805-2.
J. G. Melekoodappattu and P. S. Subbian, “A Hybridized ELM for Automatic Micro Calcification Detection in Mammogram Images Based on Multi-Scale Features,” J. Med. Syst., vol. 43, no. 7, Jul. 2019, https://doi.org/10.1007/s10916-019-1316-3.
M. Dong, Z. Wang, C. Dong, X. Mu, and Y. Ma, “Classification of Region of Interest in Mammograms Using Dual Contourlet Transform and Improved KNN,” J. Sensors, vol. 2017, 2017, https://doi.org/10.1155/2017/3213680.
K. U. Sheba and S. Gladston Raj, “An approach for automatic lesion detection in mammograms,” Cogent Eng., vol. 5, no. 1, Jan. 2018, https://doi.org/10.1080/23311916.2018.1444320.
F. Mohanty, S. Rup, B. Dash, B. Majhi, and M. N. S. Swamy, “Mammogram classification using contourlet features with forest optimization-based feature selection approach,” Multimed. Tools Appl., vol. 78, no. 10, pp. 12805–12834, May 2019, https://doi.org/10.1007/s11042-018-5804-0.
H. Kaur, J. Virmani, Kriti, and S. Thakur, “A genetic algorithm-based metaheuristic approach to customize a computer-aided classification system for enhanced screen film mammograms,” in U-Healthcare Monitoring Systems, Elsevier, 2019, pp. 217–259.
S. A. Agnes, J. Anitha, S. I. A. Pandian, and J. D. Peter, “Classification of Mammogram Images Using Multiscale all Convolutional Neural Network (MA-CNN),” J. Med. Syst., vol. 44, no. 1, Jan. 2020, https://doi.org/10.1007/s10916-019-1494-z.
F. Mohanty, S. Rup, B. Dash, B. Majhi, and M. N. S. Swamy, “An improved scheme for digital mammogram classification using weighted chaotic salp swarm algorithm-based kernel extreme learning machine,” Appl. Soft Comput. J., vol. 91, Jun. 2020, https://doi.org/10.1016/j.asoc.2020.106266.
F. Mohanty, S. Rup, B. Dash, B. Majhi, and M. N. S. Swamy, “Digital mammogram classification using 2D-BDWT and GLCM features with FOA-based feature selection approach,” Neural Comput. Appl., vol. 32, no. 11, pp. 7029–7043, Jun. 2020, https://doi.org/10.1007/s00521-019-04186-w.
L. Losurdo et al., “A combined approach of multiscale texture analysis and interest point/corner detectors for microcalcifications diagnosis,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, vol. 10813 LNBI, pp. 302–313, https://doi.org/10.1007/978-3-319-78723-7_26.
G. R. Jothilakshmi, A. Raaza, V. Rajendran, Y. Sreenivasa Varma, and R. Guru Nirmal Raj, “Pattern Recognition and Size Prediction of Microcalcification Based on Physical Characteristics by Using Digital Mammogram Images,” J. Digit. Imaging, vol. 31, no. 6, pp. 912–922, Dec. 2018, https://doi.org/10.1007/s10278-018-0075-x.
S. Paramkusham, K. M. M. Rao, and B. V. V. S. N. P. Rao, “Comparison of rotation invariant local frequency, LBP and SFTA methods for breast abnormality classification,” Int. J. Signal and Imaging Syst. Eng., vol. 11, no. 3, pp. 136–150, 2018.
X. Yu and S. H. Wang, “Abnormality Diagnosis in Mammograms by Transfer Learning Based on ResNet18,” Fundam. Informaticae, vol. 168, no. 2–4, pp. 219–230, 2019, https://doi.org/10.3233/FI-2019-1829.
M. Hekim, A. A. Yurdusev, and C. Oral, “The detection and classification of microcalcifications in the visibility-enhanced mammograms obtained by using the pixel assignment-based spatial filter,” Adv. Electr. Comput. Eng., vol. 19, no. 4, pp. 73–82, 2019, https://doi.org/10.4316/AECE.2019.04009.
A. Fanizzi et al., “A machine learning approach on multiscale texture analysis for breast microcalcification diagnosis,” BMC Bioinformatics, vol. 21, Mar. 2020, https://doi.org/10.1186/s12859-020-3358-4.
N. Kermouni Serradj, S. Lazzouni, and M. Messadi, “Mammograms enhancement based on multifractal measures for microcalcifications detection,” Int. J. Biomed. Eng. Technol (in press).
A. N. D. Posadas, D. Giménez, R. Quiroz, and R. Protz, “Multifractal characterization of soil pore systems,” Soil Sci. Soc. Am. J., vol. 67, pp. 1361–1369, 2003.
J. L. Véhel, “Fractal and multifractal processing of images,” Traitement du Signal, vol. 20, pp. 303–311, 2003. [Online]. Available: http://www-rocq.inria.fr/fractales.
A. Chhabra and R. V. Jensen, “Direct Determination of the f(α) Singularity Spectrum,” Phys. Rev. Lett., vol. 62, no. 12, pp. 1327–1330, 1989.
S. G. De Bartolo, R. Gaudio, and S. Gabriele, “Multifractal analysis of river networks: Sandbox approach,” Water Resour. Res., vol. 40, no. 2, 2004, https://doi.org/10.1029/2003WR002760.
M. Broniatowski and P. Mignot, “A self-adaptive technique for the estimation of the multifractal spectrum,” Statistics and Probability Letters, vol. 54, pp. 125–135, 2001.
A. Arneodo, B. Audit, P. Kestener, and S. Roux, “Multifractal Formalism based on the Continuous Wavelet Transform.” Scholarpedia, vol. 3, 2007.
S. Jaffard, B. Lashermes, and P. Abry, “Wavelet leaders in multifractal analysis,” in Wavelet Analysis and Applications, Q. Tao, I. V. Mang, and Y. Xu, Eds. Switzerland: Birkhauser Verlag Basel, 2006, pp. 219–264.
C. Xi, S. Zhang, G. Xiong, and H. Zhao, “A comparative study of two-dimensional multifractal detrended fluctuation analysis and two-dimensional multifractal detrended moving average algorithm to estimate the multifractal spectrum,” Phys. A Stat. Mech. its Appl., vol. 454, pp. 34–50, Jul. 2016, https://doi.org/10.1016/j.physa.2016.02.027.
F. Wang, Q. Fan, and H. E. Stanley, “Multiscale multifractal detrended-fluctuation analysis of two-dimensional surfaces,” Phys. Rev. E, vol. 93, no. 4, Apr. 2016, https://doi.org/10.1103/PhysRevE.93.042213.
B. Yao, F. Imani, A. S. Sakpal, E. W. Reutzel, and H. Yang, “Multifractal Analysis of Image Profiles for the Characterization and Detection of Defects in Additive Manufacturing,” J. Manuf. Sci. Eng. Trans. ASME, vol. 140, no. 3, Mar. 2018, https://doi.org/10.1115/1.4037891.
R. Lopes and N. Betrouni, “Fractal and multifractal analysis: A review,” Med. Image Anal., vol. 13, no. 4, pp. 634–649, Aug. 2009, https://doi.org/10.1016/j.media.2009.05.003.
I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso, “INbreast: Toward a Full-field Digital Mammographic Database,” Acad. Radiol., vol. 19, no. 2, pp. 236–248, Feb. 2012, https://doi.org/10.1016/j.acra.2011.09.014.
“Low-Light Image Enhancement - MATLAB & Simulink Example - MathWorks France.” https://fr.mathworks.com/help/images/low-light-image-enhancement.html (accessed Jun. 06, 2021).
A. Ouahabi, Signal and image Multiresolution Analysis. Wiley-ISTE, 2012.
O. E. Dick and I. A. Svyatogor, “Potentialities of the wavelet and multifractal techniques to evaluate changes in the functional state of the human brain,” Neurocomputing, vol. 82, pp. 207–215, Apr. 2012, https://doi.org/10.1016/j.neucom.2011.11.013.
L. Telesca, G. Colangelo, V. Lapenna, and M. Macchiato, “Monofractal and multifractal characterization of geoelectrical signals measured in southern Italy,” Chaos, Solitons and Fractals, vol. 18, no. 2, pp. 385–399, 2003, https://doi.org/10.1016/S0960-0779(02)00655-0.
S. Oudjemia, “Analyse des signaux biomedicaux par des approches multifractales et entropiques: Application à la variabilité du rythme cardiaque foetal,” Mouloud Mammeri University, Tizi-Ouzou, 2015.
“Properties of gray-level co-occurrence matrix - MATLAB graycoprops - MathWorks France.” https://fr.mathworks.com/help/images/ref/graycoprops.html (accessed Jun. 06, 2021).
A. Subasi, “Machine learning techniques,” in Practical Machine Learning for Data Analysis Using Python, Elsevier, 2020, pp. 91–202.
S. Huang, N. Cai, P. Penzuti Pacheco, S. Narandes, Y. Wang, and W. Xu, “Applications of support vector machine (SVM) learning in cancer genomics,” Cancer Genomics and Proteomics, vol. 15, no. 1. International Institute of Anticancer Research, pp. 41–51, Jan. 01, 2018, https://doi.org/10.21873/cgp.20063.
P. Cunningham and S. J. Delany, “k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples),” Apr. 2020, [Online]. Available: http://arxiv.org/abs/2004.04523.
Ethics declarations
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Competing Interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kermouni Serradj, N., Messadi, M. & Lazzouni, S. Classification of Mammographic ROI for Microcalcification Detection Using Multifractal Approach. J Digit Imaging 35, 1544–1559 (2022). https://doi.org/10.1007/s10278-022-00677-w
DOI: https://doi.org/10.1007/s10278-022-00677-w