Background

Breast cancer is one of the most common diseases among women worldwide. About 1.68 million cases and 522,000 deaths caused by breast cancer were registered in 2012 [1]. Computer-aided diagnosis (CAD) was designed to detect breast cancer at an early stage [2]. A number of imaging techniques have been used for this purpose, such as mammography [3], ultrasound [4], magnetic resonance imaging (MRI) [5], PET/CT scan [6], SPECT, thermogram [7], and tomography [8]. Mammography is one of the most recommended imaging modalities for detecting breast tumors at an early stage. In screening mammography [9], two views of each breast are acquired: the cranial-caudal (CC) view and the mediolateral-oblique (MLO) view, as shown in Fig. 1.

Fig. 1 Two views of the left and right mammograms: a left CC, b left MLO, c right CC and d right MLO

The CC view shows the breast from top to bottom, and the MLO view shows it from the side. The difficulty with the MLO view is that the pectoral muscle occupies a large area and has a complex contour and structural volume. The pectoral muscle is a dense and prominent region in the mammogram, but it does not provide any valuable diagnostic information. Moreover, it interferes with segmentation, feature extraction, and classification, which leads to a high false positive rate.

In recent years, many automatic pectoral muscle removal methods have been developed [10]. However, due to variations in the size, shape, intensity, and contrast of the pectoral muscle, most of the existing techniques [8,9,10,11] fail to remove the muscle region accurately from the mammogram. These problems may lead to false negative and false positive findings and to unnecessary biopsies [11]. The advantages of the proposed method are: 1) the likelihood of detecting the muscle is improved, even in low-contrast images, 2) the shape of the pectoral muscle is tracked without heuristic thresholding, and 3) the boundary of the breast is identified.

The rest of the paper is organized as follows. “Related work” reviews existing approaches to pectoral muscle extraction. “Proposed methodology” describes the proposed method for approximating the skin-line boundary of the breast body. “Experiments and results” provides the simulation results and discussion, and conclusions are presented in “Conclusion”.

Related work

Mammography is the most recommended imaging modality for detecting breast cancer at an early stage [12]. The pectoral muscle is a mass of tissue that supports the breast body. It usually appears together with the breast tissue in the medio-lateral oblique (MLO) view of a mammogram. As a result, segmenting the pectoral muscle with an accurate contour along the breast tissue has become a challenging task in computer-aided diagnosis (CAD) systems [13]. Because the texture and pixel intensities of the pectoral muscle and the breast tissue are similar, it is very difficult to find the exact region of interest (the breast body), which may lead to incorrect CAD results. Usually, the pectoral muscle boundary is approximated by a straight line at an angle between 45° and 90°. The Hough transform (HT) has been applied, through its accumulator cells, to estimate a straight line for the pectoral muscle from the detected edges [14]. Another approach finds the pectoral muscle by combining a cliff detection technique with a straight-line estimation method [15]. An automatic procedure based on morphological operators and a polynomial function has been proposed for finding the pectoral muscle [16]. Various multi-resolution techniques have been presented for extracting the pectoral muscle [17]. A multi-resolution approach classifies the pectoral muscle of MLO mammograms in the wavelet domain [18]. A hybrid approach highlights the pectoral muscle and the breast border using the wavelet transform and bit-depth reduction [19]. A pixel constancy constraint method at multiple resolution levels has been introduced for removing the pectoral muscle [20].

Other techniques locate the pectoral muscle edges using contour detection and graphs. A combination of minimum spanning trees and an active contour approach was presented for precise delineation of the pectoral muscle [21]. A pectoral muscle identification method with rates of 92% (DDSM database) and 90.06% (MIAS database) is presented in [22]. A method based on regression via RANSAC with edge detection has been proposed for contouring the muscle area [23]. A Bezier curve method was established for leveling the region of the pectoral muscle using its control points [24]. An automated method based on the normalized graph cuts segmentation technique is presented in [25]. Another muscle contour detection method adopts the shortest path, with the contour end points trained by support vector models [26].

An active contour technique has been combined with a discrete time Markov chain (DTMC) for boundary detection of the pectoral muscle region. The DTMC handles two important properties of the pectoral muscle edges: continuity and uncertainty. An active contour model is then applied to the rough boundary to increase the detection rate of the affected part of the mammogram [27]. An intensity-based approach with a newly designed enhancement filter and a threshold method has been presented to locate the contour of the pectoral muscle [28].

Various existing methods extract the pectoral muscle boundary [29,30,31,32,33,34,35]. Most of these techniques rely on the pixel differences between the breast body and the pectoral muscle tissue. Intensity-based segmentation methods use the intensity variations of the tissue; however, they may produce inconsistent segmentation results [29]. Recently, a number of researchers have applied a variety of methods to achieve an adequate segmentation rate using suitable intensity features [29,30,31,32,33,34]. Besides purely intensity-based segmentation methods, histogram-based techniques have also been discussed [14,15,16]. Furthermore, intensity-based methods are designed on the hypothesis that the gray-scale values of the pectoral muscle, across its various structures, are higher than those of its neighboring tissues [35,36,37,38,39,40,41,42,43,44,45,46].

Methods

The input data for the proposed method are taken from the benchmark MIAS dataset. These images may contain labels and machine artifacts with high intensity values at the top. Let Pϱ be the original mammogram on which segmentation is performed. A flow chart of the method is presented in Fig. 2.

Fig. 2 Proposed methodology

Segmentation of the pectoral muscle

The main aim of this research was to remove unnecessary areas, such as the pectoral muscle, from the breast region in a cost-effective manner. The brightest pixels of a mammogram lie in the pectoral muscle region. To avoid false positive results (the mammogram indicates cancer when in fact there is none), the pectoral muscle region should be removed efficiently. Whether the pectoral muscle tissue lies on the left or the right depends on the front-side view of the given mammogram. A labeled mammogram from the mini-MIAS data is displayed in Fig. 3.

Fig. 3 Labeled mammogram from mini MIAS

Label and artifacts removal

The background area of a mammographic image usually contains radiopaque artifacts, markers, and chocks. Let f() be a label removal function applied to the image Pϱ that yields the binary image Iκ, as shown in Eq. 1. f() removes the undesired labels by amplifying the high-intensity areas and segmenting them using a seed. The seed point is initialized on the convex hull, and the map is eroded until it converges on the edges of these areas, which preserves the edge geometry. The result is the binary image Iκ, as described in [46,47,48,49,50,51].

$$ {I}_{\kappa }={P}_{\varrho}\leftarrow f\left(\nabla \right). $$
(1)
$$ {I}_{\psi}\leftarrow {I}_{\kappa}\cdot {P}_{\varrho } $$
(2)

where Iκ is combined with Pϱ to preserve the original intensities and restore the gray scale image (Iψ). In this way the X-ray machine labels and other artifacts are removed from the image and the object of interest is extracted, as shown in Fig. 4.
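The masking step in Eq. (2) can be illustrated with a minimal NumPy sketch. It assumes that the product in Eq. (2) is an element-wise multiplication of the binary mask with the original image; the function name `restore_intensities` is hypothetical and not taken from the paper.

```python
import numpy as np

def restore_intensities(mask: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Keep the original gray levels where the binary mask is set, zero elsewhere.

    Assumption: Eq. (2) is read as an element-wise product of mask and image.
    """
    if mask.shape != image.shape:
        raise ValueError("mask and image must have the same shape")
    return image * (mask > 0)

# Toy 2 x 2 example: only the pixels selected by the mask keep their intensity.
image = np.array([[10, 200], [30, 40]], dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]])
restored = restore_intensities(mask, image)
```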

Fig. 4 Label and artifact removal: a and c original mammograms, b and d after label and artifact removal

Boundary detection

Detecting the boundary in order to separate the pectoral muscle from the breast parenchyma is an important step of the proposed method. The pectoral muscle can be recognized within a mammogram by means of edge detection. To detect contours, differential operators such as the isotropic, Sobel, and Prewitt operators are often used in practice. These operators compute the horizontal and vertical differences of local sums, which reduces the effect of noise. The pixel location (α, β) is declared an edge location if φ(α, β) exceeds some threshold τ, where 0 < τ < 1. The threshold τ, which lies between 0 and 1, is used as a power feature to manage scrambled edges.

The locations of the edge points constitute an edge map Ρ(η, θ) which is defined as

$$ \mathrm{P}\left(\eta, \theta \right)=\left\{\ \begin{array}{c}1,\kern1em \left(\alpha, \beta \right)\in {I}_{\varphi}\\ {}0,\kern4em else\end{array}\right.,\kern1.5em where\kern0.5em {I}_{\varphi }=\kern0.5em \left\{\left(\alpha, \beta \right);\kern0.5em \varphi \left(\alpha, \beta \right)>\tau \right\},\kern0.5em $$
(3)

The edge map provides significant information about the boundaries in an image. Usually, the threshold τ is selected from the cumulative histogram of φ(α, β) so that the pixels with the largest gradients are represented as sharp edges. A general edge detector is presented in Fig. 5. Results of the various edge detectors are shown in Fig. 6.
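The thresholding in Eq. (3) and the histogram-based choice of τ can be sketched as follows. This is only an illustration under stated assumptions: the gradient magnitude is normalized to [0, 1] before comparison with τ, and τ is taken as a quantile of the normalized gradient so that a chosen fraction of pixels is kept. The names `edge_map`, `tau_from_cumulative_histogram`, and `keep_fraction` are hypothetical.

```python
import numpy as np

def edge_map(grad: np.ndarray, tau: float) -> np.ndarray:
    """Binary edge map: 1 where the normalized gradient exceeds tau, else 0."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    peak = grad.max()
    if peak <= 0:
        return np.zeros(grad.shape, dtype=np.uint8)
    return ((grad / peak) > tau).astype(np.uint8)

def tau_from_cumulative_histogram(grad: np.ndarray, keep_fraction: float = 0.1) -> float:
    """Choose tau so that roughly keep_fraction of the pixels become edges.

    Assumption: the quantile of the normalized gradient stands in for reading
    the threshold off the cumulative histogram.
    """
    peak = grad.max()
    if peak <= 0:
        return 0.5
    return float(np.quantile(grad / peak, 1.0 - keep_fraction))

# Synthetic gradient magnitudes 0..99: keeping 10% retains the 10 largest values.
grad = np.arange(100.0).reshape(10, 10)
tau = tau_from_cumulative_histogram(grad, keep_fraction=0.1)
edges = edge_map(grad, tau)
```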

Fig. 5 Edge detection map

Fig. 6 Detection of edges: a Canny, b Prewitt, c Sobel, d Roberts and e Laplacian

The performance of the various edge detectors is assessed using the peak signal-to-noise ratio (PSNR), the mean square error (MSE), and the structural similarity index measure (SSIM). All these measures are used for quality assessment of the mammographic image. The edge detector with the highest PSNR and SSIM and the lowest MSE is the best choice [52,53,54,55,56,57,58]. Performance measures of the various edge detectors on mammograms taken from the mini-MIAS are given in Table 1.

Table 1 Performance measures of the various edge detectors on mammograms

For a noise-free monochrome image I of size ι × ω and its noisy approximation κ, the MSE, PSNR, and SSIM are defined in Eqs. (4), (5), and (6), respectively.

$$ MSE=\frac{1}{\iota \omega}{\sum}_{i=0}^{\iota -1}{\sum}_{j=0}^{\omega -1}{\left[{I}_{\left(i,j\right)}-{\kappa}_{\left(i,j\right)}\right]}^2, $$
(4)
$$ PSNR=10{\log}_{10}\left(\frac{\gamma^2}{MSE}\right), $$
(5)

where γ is the maximum possible pixel value of the image (for example, 255 for an 8-bit image).
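Eqs. (4) and (5) can be computed directly with NumPy. The sketch below assumes 8-bit images, so the peak value defaults to 255; the function names are illustrative only.

```python
import numpy as np

def mse(reference, approx) -> float:
    """Mean square error between two images of equal size (Eq. 4)."""
    ref = np.asarray(reference, dtype=np.float64)
    apx = np.asarray(approx, dtype=np.float64)
    if ref.shape != apx.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((ref - apx) ** 2))

def psnr(reference, approx, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (Eq. 5); infinite for identical images."""
    err = mse(reference, approx)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

# Toy example: a constant offset of 5 gray levels gives MSE = 25.
clean = np.zeros((2, 2))
noisy = np.full((2, 2), 5.0)
example_mse = mse(clean, noisy)
example_psnr = psnr(clean, noisy)
```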

$$ SSIM={\left[I\left(\iota, \omega \right)\right]}^{\alpha}\cdot {\left[\epsilon \left(\iota, \omega \right)\right]}^{\beta}\cdot {\left[s\left(\iota, \omega \right)\right]}^{\gamma }, $$
(6)

where \( \alpha =\beta =\gamma =1 \), \( I\left(\iota, \omega \right)=\frac{2{\delta}_{\iota }{\delta}_{\omega }+{\epsilon}_1}{{\delta_{\iota}}^2+{\delta_{\omega}}^2+{\epsilon}_1} \), \( \epsilon \left(\iota, \omega \right)=\frac{2{\sigma}_{\iota }{\sigma}_{\omega }+{\epsilon}_2}{{\sigma_{\iota}}^2+{\sigma_{\omega}}^2+{\epsilon}_2} \), and \( s\left(\iota, \omega \right)=\frac{{\sigma}_{\iota \omega}+{\epsilon}_3}{{\sigma}_{\iota }{\sigma}_{\omega }+{\epsilon}_3} \) [53].
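A single-window version of Eq. (6) is sketched below. Several choices here are assumptions rather than details from the paper: the statistics are computed over the whole image instead of local windows, the structure term uses the standard form (σ_xy + c3)/(σ_x σ_y + c3), and the stabilizing constants follow the common convention c1 = (0.01 L)², c2 = (0.03 L)², c3 = c2/2 with L the peak value.

```python
import numpy as np

def global_ssim(x, y, peak: float = 255.0) -> float:
    """SSIM from whole-image statistics with alpha = beta = gamma = 1."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * peak) ** 2          # assumed constant for the luminance term
    c2 = (0.03 * peak) ** 2          # assumed constant for the contrast term
    c3 = c2 / 2.0                    # assumed constant for the structure term
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return float(luminance * contrast * structure)

# An image compared with itself scores 1; a vertically flipped ramp scores lower.
ramp = np.arange(16, dtype=np.float64).reshape(4, 4) * 16
same_score = global_ssim(ramp, ramp)
flipped_score = global_ssim(ramp, ramp[::-1])
```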

I(ι, ω) is the luminance comparison function, which measures the closeness of the two images on the basis of the mean luminances δι and δω of the 2-D images ι and ω. The maximum value of I(ι, ω) is 1, attained if and only if δι = δω. The second term, ϵ(ι, ω), compares the contrast on the basis of the standard deviations σι and σω. The third term, s(ι, ω), measures the correlation between the two images, where σιω is the covariance. The SSIM lies in the range [0, 1]: a value of 1 indicates that the two images have the same quality, and a value of 0 indicates that there is no correlation between the two mammographic images. According to the quality analysis of the images after applying the various edge detection techniques, the Sobel and Prewitt operators are good choices. Both the Prewitt and the Sobel filters use 3 × 3 masks to detect the gradient in the x and y directions; the only difference between them is the spectral response. The Prewitt filter is well suited to enhancing both high and low frequencies along the edges of an image. The Sobel operator is a good choice for horizontal borders or edges, whereas the Prewitt operator detects vertical borders better. Since the pectoral muscle usually appears with a vertical border, the Prewitt operator is the best option for the proposed work. It uses 3 × 3 convolution masks to compute the gradient (φ) in the two-dimensional case as follows:

$$ \varphi =\sqrt{{\varphi_I}^2+{\varphi_{\mathrm{Y}}}^2}, $$
(7)
$$ \left|\varphi\ \right|=\left|{\varphi}_I\right|+\left|{\varphi}_{\mathrm{Y}}\right|, $$
(8)
$$ \theta =\arctan \frac{\varphi_I}{\varphi_{\mathrm{Y}}}. $$
(9)
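Eqs. (7) to (9) can be sketched with the usual 3 × 3 Prewitt masks. The code assumes that φ_I is the horizontal difference and φ_Y the vertical difference, that image borders are handled by edge replication, and that `arctan2` is an acceptable stand-in for the ratio in Eq. (9) because it avoids division by zero. The helper names are hypothetical.

```python
import numpy as np

# Standard Prewitt masks: KX responds to vertical borders, KY to horizontal ones.
KX = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)
KY = KX.T

def correlate3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a 3 x 3 mask over the image, replicating the border pixels."""
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return out

def prewitt(image: np.ndarray):
    """Return (Euclidean magnitude, L1 magnitude, direction) of the gradient."""
    gx = correlate3x3(image, KX)      # assumed to play the role of phi_I
    gy = correlate3x3(image, KY)      # assumed to play the role of phi_Y
    magnitude = np.hypot(gx, gy)      # Eq. (7)
    magnitude_l1 = np.abs(gx) + np.abs(gy)  # Eq. (8)
    theta = np.arctan2(gx, gy)        # Eq. (9), ratio phi_I / phi_Y
    return magnitude, magnitude_l1, theta

# A vertical step edge: the response is concentrated on the two middle columns.
step = np.array([[0, 0, 1, 1]] * 4, dtype=np.float64)
mag, mag_l1, theta = prewitt(step)
```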

Let Iψ be the image obtained after label removal, and let f(φ) be the edge detection function applied to the image Iψ with a threshold.

$$ {I}_{\vartheta }={I}_{\psi}\leftarrow f\left(\varphi \right). $$
(10)

The resulting images (Iϑ) have distorted boundaries, because the area with the highest intensity variation becomes part of the background. A few such images are shown in Fig. 7. The output image Iϑ with broken edges is processed with the morphological ‘closing’ operation to obtain a sealed and accurate boundary. Closing fills selected background regions of the mammogram with the foreground value. Whether a region is filled depends on whether a suitably shaped structuring element fits it, which determines whether the pectoral muscle structure is preserved or removed. To join the edges of a broken boundary, morphological closing is used with a disk-shaped structuring element Ωv. Closing is the dual of opening; it consists of the dilation (⨁) of Iϑ by Ωv followed by the erosion (⊝) by Ωv, as shown in Eq. (11).

$$ {I}_{\vartheta}\cdotp {\Omega}_v=\left({I}_{\vartheta}\bigoplus \kern0.5em {\Omega}_v\right)\circleddash {\Omega}_v, $$
(11)

where \( {I}_{\vartheta}\bigoplus {\Omega}_v=\bigcup \limits_{b\in {\Omega}_v}{\left({I}_{\vartheta}\right)}_b \) and \( {\left({I}_{\vartheta}\right)}_b \) denotes Iϑ translated by b. Let f(Iϑ · Ωv) be the closing operation performed on the image Iϑ; the resulting binary image is Iβ.

$$ \kern0.5em {I}_{\beta }={I}_{\vartheta}\leftarrow f\left({I}_{\vartheta}\cdotp {\Omega}_v\right). $$
(12)

Fig. 7 Edges of various mammograms obtained with the Prewitt edge detector
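The closing in Eq. (11) can be sketched in plain NumPy as a dilation followed by an erosion with the same disk. The border handling is an assumption: pixels outside the image count as background for dilation and as foreground for erosion, so that closing never removes an original foreground pixel. The radius and the function names are illustrative.

```python
import numpy as np

def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element of the given radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (xx ** 2 + yy ** 2) <= radius ** 2

def _shifted_views(padded: np.ndarray, se: np.ndarray, shape):
    """Yield one shifted copy of the image per active element of the disk."""
    rows, cols = shape
    for dy, dx in zip(*np.nonzero(se)):
        yield padded[dy:dy + rows, dx:dx + cols]

def dilate(img: np.ndarray, se: np.ndarray) -> np.ndarray:
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=False)
    out = np.zeros(img.shape, dtype=bool)
    for view in _shifted_views(padded, se, img.shape):
        out |= view          # union of translated copies
    return out

def erode(img: np.ndarray, se: np.ndarray) -> np.ndarray:
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=True)
    out = np.ones(img.shape, dtype=bool)
    for view in _shifted_views(padded, se, img.shape):
        out &= view          # intersection of translated copies
    return out

def closing(binary: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Dilation followed by erosion with the same structuring element."""
    img = np.asarray(binary).astype(bool)
    return erode(dilate(img, se), se)

# A filled square with a one-pixel hole: closing with a radius-1 disk seals it.
square = np.zeros((9, 9), dtype=bool)
square[2:7, 2:7] = True
holed = square.copy()
holed[4, 4] = False
closed = closing(holed, disk(1))
```

A larger radius seals larger gaps but also merges nearby structures, so the radius would need to be tuned to the image resolution.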

Feature mapping

The convex hull is widely used in computer graphics, CAD, and pattern recognition [37]. In this work, the convex hull is used to extract the silhouette of the breast by applying a topographic map to the binary image. To do this, a topographic map (Iσ) is generated by computing the feature set of four corners for all the foreground pixels of the binary image from the previous step. A convex hull image (IΔ) is then generated from the map Iσ. IΔ has a shape-shifting property: when it is superimposed on the four corners of the binary image (Iβ), it adapts its shape to the map of the binary image and extracts the silhouette of the breast body. The pixels of the resulting image (Iδ) are mapped onto the original gray scale image to obtain the segmented breast profile image (Isτ), which has the original intensities of the breast area without the pectoral muscle.

$$ {I}_{\updelta}\leftarrow {I}_{\Delta}+{I}_{\beta }, $$
(13)
$$ {I}_{\mathrm{s}\uptau}\leftarrow {I}_{\updelta}+{I}_{\psi }. $$
(14)
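The role of the convex hull in this step can be illustrated with a simplified sketch. It does not reproduce the topographic map or the four-corner feature set described above, whose details are not given here. Instead it assumes a plain convex hull of the foreground pixels (Andrew's monotone chain), fills the hull, and maps the filled region back onto the gray scale image. All function names are hypothetical.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_mask(binary: np.ndarray) -> np.ndarray:
    """Boolean mask of all pixels inside the convex hull of the foreground."""
    fg = np.asarray(binary).astype(bool)
    ys, xs = np.nonzero(fg)
    hull = convex_hull(list(zip(xs.tolist(), ys.tolist())))
    if len(hull) < 3:                 # degenerate hull: nothing to fill
        return fg.copy()
    yy, xx = np.mgrid[0:fg.shape[0], 0:fg.shape[1]]
    inside = np.ones(fg.shape, dtype=bool)
    for (ax, ay), (bx, by) in zip(hull, hull[1:] + hull[:1]):
        # A pixel is inside when it lies on the left of every hull edge.
        inside &= ((bx - ax) * (yy - ay) - (by - ay) * (xx - ax)) >= 0
    return inside

def map_to_gray(gray: np.ndarray, binary: np.ndarray) -> np.ndarray:
    """Keep the gray levels inside the hull of the binary foreground."""
    return gray * hull_mask(binary)

# Three corner pixels span a triangle covering the lower-left half of the grid.
corners = np.zeros((5, 5), dtype=bool)
corners[0, 0] = corners[4, 0] = corners[4, 4] = True
filled = hull_mask(corners)
profile = map_to_gray(np.full((5, 5), 7, dtype=np.uint8), corners)
```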

Experiments and results

We tested the method on mini-MIAS and contrast enhanced digital mammographic images [58,59,60,61,62,63,64] to remove the pectoral muscle and undesired artifacts. The proposed algorithm is assessed subjectively in two ways: by visual inspection and by comparison with a ground truth provided by an experienced radiologist. In the first method, the segmentation of a mammogram is categorized as successful, acceptable, or unacceptable. A segmentation is successful when the edge of the entire breast is visible and no undesired part, such as the pectoral muscle, remains with the breast region, as shown in Fig. 8. A result is acceptable when only some edges of the pectoral muscle remain with the breast region. A result is unacceptable when half or more of the pectoral muscle remains in the breast region. Examples of these results are presented in Figs. 8, 9, 10, 11, 12, 13, 14.

Fig. 8 Successful application of the proposed algorithm to image mdb001: a original image, b edge detection using Prewitt, c removal of unnecessary edges, d edge smoothing, e superimposition of edge pixels to complete the boundary, f feature mapping and g output image

Fig. 9 Successful application of the proposed method to image mdb012: a original image, b label removal, c edge detection using Prewitt, d removal of unnecessary edges, e edge smoothing, f superimposition of edge pixels to complete the boundary, g feature mapping and h output image

Fig. 10 Successful application of the proposed algorithm to image mdb052: a original image, b label removal, c edge detection using Prewitt, d removal of unnecessary edges, e edge smoothing, f superimposition of edge pixels to complete the boundary, g feature mapping and h output image

Fig. 11 Successful application of the proposed algorithm to image mdb104: a original image, b label removal, c edge detection using Prewitt, d removal of unnecessary edges, e edge smoothing, f superimposition of edge pixels to complete the boundary, g feature mapping and h output image

Fig. 12 Successful application of the proposed algorithm to image mdb320: a original image, b label removal, c edge detection using Prewitt, d removal of unnecessary edges, e edge smoothing, f superimposition of edge pixels to complete the boundary, g feature mapping and h output image

Fig. 13 Acceptable application of the proposed algorithm to image mdb002: a original image, b label removal, c edge detection using Prewitt, d removal of unnecessary edges, e edge smoothing, f superimposition of edge pixels to complete the boundary, g feature mapping and h output image

Fig. 14 Successful application of the proposed algorithm to image PT28_CEDMROSE_2013–08-27_LMLO_1.2.840.113681.1377641361.40.1377641361.1 [38]: a original image, b edge detection using Prewitt, c removal of unnecessary edges, d edge smoothing, e superimposition of edge pixels to complete the boundary, f feature mapping and g output image

Performance evaluation matrix

A mammogram (Pϱ) is represented by the pixel set ρ = {ρ1, …, ρn} with |Pϱ| = row × col, where row is the width and col is the length of the matrix on which the image is defined. Let the ground truth segmentation provided with the data set be denoted by \( {I}_{g\omega}^k \). The overlap metrics are defined with respect to the ground truth partition \( {I}_{g\omega}^k=\left\{{I}_{g\omega}^1,{I}_{g\omega}^2\right\} \) of Pϱ with the assignment function \( {F}_{\gamma}^{\kappa}\left(\rho \right) \). The segmentation produced by the proposed algorithm is the partition \( {I}_{\mathrm{s}\uptau}=\left\{{I}_{s\tau}^1,{I}_{s\tau}^2\right\} \) of Pϱ with the assignment function \( {F}_{\delta}^i\left(\rho \right) \), which gives the membership of ρ in the partition \( {I}_{s\tau}^{\nu } \). The four basic cardinalities, TP, TN, FP, and FN, are computed for each pair of subsets \( \lambda \in {I}_{g\omega}^k \) and \( \eta \in {I}_{s\tau}^{\nu } \). The sum of the weighted values (ωλη) that gives the basic cardinalities is defined in Eq. (15) and Table 2.

$$ {\omega}_{\lambda \eta}={\sum}_{h=1}^{\left|{P}_{\varrho}\right|}{F}_{\gamma}^{\lambda}\left({\rho}_h\right){F}_{\delta}^{\eta}\left({\rho}_h\right),\kern0.5em \mathrm{where}\ TP={\omega}_{11},\kern0.5em TN={\omega}_{12},\kern0.5em FP={\omega}_{21},\mathrm{and}\ FN={\omega}_{22}. $$
(15)
Table 2 Confusion matrix
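The overlap counts behind Eq. (15) amount to summing products of membership indicators over all pixels. The sketch below uses the conventional definitions of the four cardinalities for two binary masks (foreground = pectoral muscle), which is an assumption and may not match the exact index assignment in Eq. (15); the function name is hypothetical.

```python
import numpy as np

def overlap_cardinalities(ground_truth, segmented) -> dict:
    """Count TP, FP, FN, and TN pixels for two binary masks of equal shape."""
    gt = np.asarray(ground_truth).astype(bool)
    sg = np.asarray(segmented).astype(bool)
    if gt.shape != sg.shape:
        raise ValueError("masks must have the same shape")
    return {
        "TP": int(np.sum(gt & sg)),     # muscle in both masks
        "FP": int(np.sum(~gt & sg)),    # segmented as muscle, not in ground truth
        "FN": int(np.sum(gt & ~sg)),    # muscle missed by the segmentation
        "TN": int(np.sum(~gt & ~sg)),   # background in both masks
    }

# Toy 2 x 2 masks with exactly one pixel in each category.
counts = overlap_cardinalities([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```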

In addition to TP(ω11), TN(ω12), FP(ω21), and FN(ω22), the proposed algorithm is evaluated using the Hausdorff distance. It measures the gap between the part of Pϱ given by the ground truth data \( {I}_{g\omega}^k \) and the part given by the segmented pectoral muscle \( {I}_{s\tau}^{\nu } \), and is formulated as:

$$ HD\left({P}_{\varrho}\left({I}_{g\omega}^k\right),{P}_{\varrho}\left({I}_{s\tau}^{\nu}\right)\right)=\underset{\lambda }{\max}\left(\underset{\eta }{\min}\ dist\left(\lambda, \eta \right)\right), $$
(16)

where \( \lambda \in {I}_{g\omega}^k \) and \( \eta \in {I}_{s\tau}^{\nu } \), and dist(λ, η) is the Euclidean distance between the two points λ and η:

$$ dist\left(\lambda, \eta \right)=\sqrt{{\left({\lambda}_1-{\eta}_1\right)}^2+\kern0.5em {\left({\lambda}_2-{\eta}_2\right)}^2\ }. $$
(17)
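Eqs. (16) and (17) can be sketched for two sets of boundary points given as (row, column) coordinates. The directed form follows Eq. (16) as written; the symmetric version, which takes the larger of the two directed distances, is a common convention added here as an assumption. Distances are in pixels, so a conversion to millimetres would require the pixel spacing.

```python
import numpy as np

def directed_hausdorff(a, b) -> float:
    """max over points in a of the Euclidean distance to the nearest point in b."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    diff = a[:, None, :] - b[None, :, :]          # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=2))       # Eq. (17) for every pair
    return float(dist.min(axis=1).max())          # Eq. (16)

def hausdorff(a, b) -> float:
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy boundaries: the far point (3, 4) dominates the symmetric distance.
truth = [(0, 0), (0, 1)]
found = [(0, 0), (3, 4)]
hd_forward = directed_hausdorff(truth, found)
hd_symmetric = hausdorff(truth, found)
```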

The performance of the proposed method is evaluated using all the performance measures discussed above, and the results are presented in Table 3. A total of 322 images from the standard mini-MIAS benchmark dataset and 20 images selected from the contrast enhanced digital mammogram (CEDM) images are used to evaluate the proposed algorithm.

Table 3 Performance measurement of the proposed algorithm

According to the Hausdorff distance, the proposed method achieves a mean value of 3.51 mm on the CEDM images, slightly smaller than the 3.52 mm obtained on the MIAS images; both are considered good measurements for removing the pectoral muscle.

Discussion

Mammograms from the mini-MIAS dataset are used for quantitative evaluation of the proposed method. The FP and FN rates are 0.99% and 5.67%, and the standard deviation and mean of the Hausdorff distance are 1.59 and 3.52 mm, respectively. A comparison of the well-known Hough-based, Gabor-based, and shape-based pectoral muscle segmentation methods with the proposed algorithm is presented in Table 4.

Table 4 A brief comparison of the performance of various existing methods and the proposed algorithm

In automated breast tumor detection, the false positive rate is considered more important than the false negative rate. For the Gabor filter based and Hough transform based methods, the detection results in both the FP > 10% and the FN > 10% categories are higher than for the proposed method. The proposed method attains the smallest mean Hausdorff distance (HD): 3.52 mm on the MIAS and 3.51 mm on the CEDM images. This is considered a good measurement for eliminating the unwanted pectoral muscle.

Conclusion

A novel automatic method is presented for locating the boundary of the pectoral muscle. Unlike the methods in the existing literature, the proposed method does not rely strictly on straight-line detection to remove the pectoral muscle. First, a differential operator is used to detect the edge boundaries and to approximate the gradient of the intensity function. Then, an accurate edge boundary of the breast body is determined. Based on the end points of the breast body edges, a convex image is generated. Finally, a convex hull function produces a topographic map from the convex image and the breast body boundary, which is applied to the preprocessed mammograms to remove the unwanted pectoral muscle. The proposed technique is applied to 322 mammograms from the benchmark MIAS dataset and 20 contrast enhanced digital mammographic images in order to achieve high accuracy for pectoral muscles of varying size.