Evaluation of modified adaptive k-means segmentation algorithm

Segmentation is the act of partitioning an image into different regions by creating boundaries between them. k-means is one of the simplest and most prevalent approaches to image segmentation. However, segmentation quality is contingent on the initial parameters (the cluster centers and their number). In this paper, a convolution-based modified adaptive k-means (MAKM) approach is proposed and evaluated using images collected from different sources (MATLAB, the Berkeley image database, VOC2012, BGH, MIAS, and MRI). The evaluation shows that the proposed algorithm is superior to the k-means++, fuzzy c-means, histogram-based k-means, and subtractive k-means algorithms in terms of image segmentation quality (Q-value), computational cost, and RMSE. The proposed algorithm was also compared to state-of-the-art learning-based methods in terms of IoU and MIoU, and achieved a higher MIoU value.


Overview
Segmentation is the act of partitioning an image into different regions by creating boundaries that keep regions apart. It is one of the most widely used steps for zoning the pixels of an image [1]. After segmentation, pixels belonging to the same partition have higher similarity to one another, but higher dissimilarity with pixels in other partitions. Segmentation is used in many fields, including health care, image processing, traffic imaging, and pattern recognition. According to the review in Ref. [1], image segmentation techniques can be categorized into two types: layered-based segmentation and block-based segmentation. In layered-based segmentation, the image is divided into layers such as background, foreground, and mask layers. Reconstruction of the final image is decided using the mask layer [2]. This method is not widely applicable to medical image segmentation. Block-based segmentation divides the image into unequal blocks using attributes such as color, histogram, pixels, wavelet coefficients, texture, and gradient [1,2]. Block-based segmentation can be further grouped into methods based on discontinuity or similarity in the image. It can also be divided into three categories: region-based, edge- or boundary-based, and hybrid techniques [1,2].

Edge-based segmentation
The discontinuous nature of pixels characterizes all algorithms in the edge-based segmentation family [2]. In this type of image segmentation, images are partitioned based on abrupt changes in gray intensity. In most cases, edge-based segmentation techniques can identify corners, edges, points, and lines in the image. However, pixel miscategorization errors are the main limitation of this category. The edge detection technique is an example of this class of segmentation method [2].

Region-based segmentation
Edge-based segmentation techniques use the discontinuous nature of pixels, whereas region-based techniques use the similarity of pixels in the image. Edges, lines, and points are attributes that decide the effectiveness of region-based techniques. Algorithms such as clustering, splitting and merging, normalized cuts, region growing, and thresholding belong to the region-based segmentation family [1,2]. Our main interest is in clustering algorithms. Schwenker and Trentin [3] presented traditional machine learning as supervised and unsupervised learning: supervised learning associates every observation in the samples with a target label, whereas this is not the case in unsupervised learning. Clustering algorithms, which belong to the unsupervised category, are very important, especially for classifying large unlabeled datasets [3]. There is also a third machine learning approach, partially supervised learning, which lies between unsupervised and supervised learning. A detailed review is given in Ref. [3].

k-means segmentation
There has been much research on image segmentation for different application areas, using various techniques ranging from conventional to learning-based methods. Among the many segmentation algorithms, k-means is one of the simplest for generating a region of interest [10-12]. It has a time complexity of O(n) for n samples [13]. However, it is sensitive to outliers and to its initialization parameters [14]. As a result, it gives different clustering results for different cluster numbers and initial centroid values [12]. Much research has considered how to initialize the centers for k-means with the intention of maximizing the efficiency of the algorithm. In the k-means clustering algorithm, each pixel belongs to only one cluster and center, so it is a hard clustering algorithm [10,11]. Recent work on clustering-based and deep learning based segmentation is addressed in Sections 2.2 and 2.3 respectively.
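As a concrete illustration of the hard-assignment behaviour described above, the following is a minimal Python sketch of k-means on grayscale intensities (the paper's experiments used MATLAB; the function name is our own, and the deterministic seeding here is purely for reproducibility, whereas the standard algorithm seeds randomly, which is the initialization sensitivity under discussion):

```python
import numpy as np

def kmeans_segment(gray, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on grayscale pixel intensities.

    Each pixel is hard-assigned to exactly one cluster.  Centers are seeded
    deterministically (evenly spaced over the intensity range) only so this
    sketch is reproducible; standard k-means uses random seeding.
    """
    pixels = gray.reshape(-1).astype(float)
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iters):
        # Hard assignment: each pixel goes to its nearest center in intensity.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Center update: mean of the pixels assigned to each cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels.reshape(gray.shape), centers
```

With a bimodal image the two intensity groups separate into two labels; with random seeding the result can change from run to run.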

Clustering methods for segmentation
In Ref. [15], adaptive k-means clustering is introduced to ameliorate the performance of k-means. Here, the initialization parameters remain consistent over several iterations. However, the initial seed point is computed simply by taking the ordinary mean of all data values in the input image, followed by simple post-processing to obtain good quality image segmentation.
A first attempt to ameliorate the deficiencies of k-means clustering with respect to outliers occurred three decades ago. Bezdek [16] came up with a new algorithm named fuzzy c-means (FCM) in 1981. This algorithm is a membership-based soft clustering algorithm.
Faußer and Schwenker [17] proposed an algorithm that divides the samples into subsets to perform clustering in parallel, and merges the output repeatedly. In their approach they used several kernel-based FCM clustering algorithms. Two datasets (the Breast Cancer database from the UCI repository and Enron Emails) were used for evaluation. The experimental analysis showed that the algorithm has high accuracy and works well for large real-life datasets. Benaichouche et al. [18] introduced a region-based image segmentation algorithm using an enhanced spatial FCM. Lei et al. [19] explained that traditional FCM is susceptible to noise, and described improvements based on the addition of local spatial information; this solves the robustness problem but greatly increases the computational complexity. They first used morphological reconstruction to smooth images and enhance robustness, and then applied FCM. They also modified FCM by using faster membership filtering instead of the slower distance computation between pixels within local spatial neighborhoods and their cluster centers. The gray-level histogram of the morphologically reconstructed image is used for clustering, and a median filter is employed to remove noise from the fuzzy membership matrix generated using the histogram. The paper demonstrated that the proposed algorithm is faster and more efficient than FCM and its other modifications.
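For contrast with k-means' hard assignment, the standard FCM updates (Bezdek's membership and center equations) can be sketched for one-dimensional data; `fcm_1d` and its defaults are our own illustrative choices, not code from any of the cited papers:

```python
import numpy as np

def fcm_1d(data, c=2, m=2.0, iters=50, eps=1e-9):
    """Fuzzy c-means on 1D data: every point gets a membership in every
    cluster (soft clustering), unlike the hard assignment of k-means.

    m > 1 is the fuzzifier; eps guards against division by zero when a
    point coincides with a center.
    """
    data = np.asarray(data, dtype=float)
    centers = np.linspace(data.min(), data.max(), c)
    for _ in range(iters):
        d = np.abs(data[None, :] - centers[:, None]) + eps   # (c, n) distances
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        # Center update: weighted mean with weights u^m.
        w = u ** m
        centers = (w * data[None, :]).sum(axis=1) / w.sum(axis=1)
    return u, centers
```

Every column of the membership matrix sums to one, which is exactly the "soft" property the hard k-means assignment lacks.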
Arthur and Vassilvitskii [20] introduced a new algorithm called k-means++ that improves the initial selection of centroids. Seeding starts by selecting one initial center uniformly at random. Each subsequent cluster center is then selected with a probability determined by "D² weighting": the probability of choosing a point is proportional to its squared distance to the closest already-chosen center. The paper shows that k-means++ outperforms the original k-means method both in speed and in achieving a lower intra-cluster sum of squared distances. The number of clusters is still chosen by the user, but the algorithm is faster and more effective, and is even provided as a built-in function in MATLAB.
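The D² weighting rule can be sketched as follows; `dsquared_seeding` is a hypothetical helper for 1D data (the real k-means++ seeds points of arbitrary dimension in the same way):

```python
import numpy as np

def dsquared_seeding(points, k, rng=None):
    """k-means++ seeding: the first center is uniform at random; each
    subsequent center is drawn with probability proportional to D(x)^2,
    the squared distance to the nearest center already chosen."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # D(x)^2 for every point, given the centers chosen so far.
        d2 = np.min([(points - c) ** 2 for c in centers], axis=0)
        probs = d2 / d2.sum()
        # Already-chosen points have D(x)^2 = 0, hence probability zero.
        centers.append(points[rng.choice(len(points), p=probs)])
    return np.array(centers)
```

Because chosen points have zero probability of being redrawn, the seeds are always distinct, and far-apart groups are strongly favoured.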
Zhang et al. [21] used Mahalanobis distance instead of Euclidean distance to allocate every data point to the nearest cluster. Using their new clustering algorithm, PCM clustering, they got better segmentation results. However, their algorithm also has high computational cost and the challenge of initializing parameters.
Purohit and Joshi [22] presented a new approach to improve k-means with the aim of reducing the mean square error of the final clustering and attaining minimum computation time. Yedla et al. [23] also introduced an enhanced k-means clustering algorithm with better initial centers, which associates data points with appropriate clusters effectively and with reduced computation time compared to standard k-means.
Dhanachandra et al. [12] initialized k-means clustering using a subtractive clustering approach, which attempts to find optimal centers based on data point density. The first center is chosen as the point with the highest density value. After the first center is selected, the potential of the data points near it is reduced, and the algorithm finds further centers in the same way until the potential of all grid points falls below some threshold. The algorithm is effective at finding centers, but its computational cost grows rapidly as the number of data points increases. The standard k-means algorithm is then initialized with these centers. Since the aim of the paper was the segmentation of medical images, which suffer from poor contrast, a partial contrast stretching enhancement technique was applied first, and filtering was applied after segmentation to remove unwanted regions and noise. The paper illustrated that subtractive clustering based k-means outperforms normal k-means, but did not compare it to other methods. Subtractive clustering has a higher computational time than other clustering methods, which is the main drawback of this technique.
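The density-based center selection described above can be sketched for 1D data as follows; this is a simplified illustration of subtractive clustering, and the radii `ra` and `rb` and the function name are our own assumptions, not values from Ref. [12]:

```python
import numpy as np

def subtractive_centers(data, n_centers=2, ra=1.0, rb=1.5):
    """Subtractive clustering sketch: each point's potential is the sum of
    Gaussian contributions from all points; the densest point becomes a
    center, the potential around it is then suppressed, and the process
    repeats.  The pairwise distance matrix makes this O(n^2), which is the
    computational drawback noted above."""
    data = np.asarray(data, dtype=float)
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d2 = (data[:, None] - data[None, :]) ** 2          # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)        # density measure per point
    centers = []
    for _ in range(n_centers):
        best = np.argmax(potential)
        centers.append(data[best])
        # Suppress the potential near the newly chosen center.
        potential = potential - potential[best] * np.exp(-beta * (data - data[best]) ** 2)
    return np.array(centers)
```

On data with two dense groups, the first center lands in the denser group and the second in the other, since the first group's potential has been subtracted away.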
Küçükkülahlı et al. [24] tried to initialize k-means by finding both k and centroid locations. First, they used histogram values to determine peaks and pits. Then, by calculating the distances between adjacent peaks and pits, a vertical sweep is done to find the highest peak within some threshold distance. Horizontal sweeping is followed to group peaks that are close to each other, replacing them with a representative by calculating the mean distance and choosing peaks which are above the mean. Once k and centroids have been obtained, the standard k-means method is used for clustering. Even though the approach is dependent on human involvement for assigning the threshold for the vertical sweep, the algorithm automated the k-means method.
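The peak-counting idea can be illustrated with a much-simplified sketch; this omits the vertical and horizontal sweeps of Ref. [24] and simply counts histogram peaks above the mean bin height (the function name and threshold choice are ours):

```python
import numpy as np

def estimate_k_from_histogram(gray, levels=256):
    """Very rough histogram-driven estimate of k: count local maxima of
    the gray-level histogram whose height exceeds the mean bin height.
    Ref. [24]'s sweep-based grouping of nearby peaks is not reproduced."""
    hist = np.bincount(np.asarray(gray, dtype=np.int64).ravel(),
                       minlength=levels)
    mean_h = hist.mean()
    peaks = [i for i in range(1, levels - 1)
             if hist[i] > hist[i - 1]          # rises into the bin
             and hist[i] >= hist[i + 1]        # does not rise further
             and hist[i] > mean_h]             # tall enough to count
    return max(len(peaks), 1)
```

A strongly bimodal image yields k = 2, which would then seed the standard k-means step.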

Deep learning in image segmentation
Recently a number of deep learning models have shown astounding results in semantic segmentation [4,25,26].
According to Ref. [26], deep learning has shown its success in handwritten digit recognition, speech recognition, image categorization, and object detection in images. It has also been applied to screen content image segmentation, and proven its application to semantic pixel-wise labeling [25]. Badrinarayanan et al. [26] proposed SegNet, a method for semantic pixel-wise segmentation of road scenes, and tested their algorithm using the CamVid road scenes dataset. They used three popular performance evaluation parameters: global accuracy, class average accuracy, and mean intersection over union (MIoU) over all classes.
Minaee and Wang [25] introduced an algorithm for segmentation of screen content images into two layers (foreground and background). The foreground layer mainly consists of text and lines and the background layer consists of smoothly varying regions. They compared their results with two algorithms (hierarchical k-means clustering in DjVu, and a shape primitive extraction and coding (SPEC) method) in terms of precision and recall values using five test images. The proposed approach scored 91.47% for precision and 87.73% for recall.
Chen et al. [4] proposed a technique that embeds multiscale features in a fully connected convolutional neural network to perform pixel-based semantic segmentation through pixel-level classification. They introduced an attention model to softly determine the weight of multi-scale features at each pixel location. They trained the FCN with multiscale features obtained from multiple resized images using a shared deep network. The attention model played the role of average pooling and max-pooling. Besides feature reduction, the attention model overcame one of the challenges of deep learning: it enabled the authors to visualize the features at different positions along with their level of importance. They proved the effectiveness of their approach using three datasets: PASCAL-Person-Part, VOC 2012, and MS-COCO 2014.
As reviewed by Minaee and Wang [27], various algorithms are in use to separate text from its background. Approaches include clustering-based algorithms, sparse decomposition based methods, and morphological operations.
In the same paper, they proposed an alternating direction method of multipliers (ADMM) algorithm for this problem, and adapted it to separate moving objects from the background. In a comparison with the hierarchical k-means approach and sparse decomposition, their proposed method scored higher precision (95%), recall (92.5%), and F1 (93.7%) values. The sparse decomposition approach was proposed by the same authors in Ref. [28].

Dataset
Images used in this paper come from several sources: the MATLAB image database, the Berkeley segmentation dataset (BSD) [29], the VOC2012 challenge dataset, mammography images from Bethezatha General Hospital (BGH) and MIAS, and MRI images.

Proposed approach
Our proposed segmentation approach is convolution based, as indicated in Fig. 1. First, the histogram distribution (a_li) of the grayscale image is generated for each image. Second, from the generated histogram values we select only those with a_li ≥ 1. Third, we compute the ratio of the sum of the selected histogram values to the number of occurrences (l_0) of such values to obtain the amplitude threshold (T_p). Fourth, we sum the histogram values ≥ T_p and divide by α to obtain the window size; see Algorithm 1, where the computation of α is indicated at the end. This is followed by a 2D convolution operation. Finally, the mean of the convolution result is set as the initial seed, from which the other seed values serving as cluster centers are generated and used to perform clustering. The parameters used in the proposed algorithm are constant for each image, and the segmentation result is consistent over repeated runs, which is not true of the other clustering algorithms (FCM, HBK, SC, and k-means++) used for comparative analysis. The pseudocode of the proposed algorithm is given in Algorithm 2.

Algorithm 1 Pseudocode for window size generation
Get input image and convert to gray image:
    Image = readimg()
    if channels(Image) = 3 then
        grayimage = rgb2gray(Image)
    else
        grayimage = Image
    end if
Calculate the amplitude histogram threshold:
    Hist = histogram(grayimage)
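As we read Algorithm 1 and the description above, the window-size and initial-seed computation can be sketched in Python as follows (the derivation of α is not reproduced in the excerpt of Algorithm 1, so it is left as a parameter; the zero-padded box filter stands in for MATLAB's conv2 with 'same' output, exact for odd window sizes):

```python
import numpy as np

def window_size(gray, alpha):
    """Sketch of Algorithm 1 as we read it: the amplitude threshold T_p is
    the mean height of the occupied histogram bins, and the window size is
    the mass of the bins at or above T_p divided by alpha."""
    hist = np.bincount(np.asarray(gray, dtype=np.int64).ravel(), minlength=256)
    occupied = hist[hist >= 1]                    # bins with a_li >= 1
    tp = occupied.sum() / occupied.size           # amplitude threshold T_p
    w = int(round(hist[hist >= tp].sum() / alpha))
    return max(w, 1)

def initial_seed(gray, w):
    """Mean-filter the image (conv2 with ones(w)/w^2, 'same') and take the
    global mean of the result as the initial seed point."""
    padded = np.pad(gray.astype(float), w // 2, mode='constant')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (w, w))
    smoothed = windows.mean(axis=(2, 3))[:gray.shape[0], :gray.shape[1]]
    return smoothed.mean()
```

For w = 1 the filter is the identity, so the seed is just the image mean, matching the adaptive k-means initialization of Ref. [15] as a special case.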

Algorithm 2 Pseudocode for modified adaptive k-means
Input: window size w and gray image
Perform convolution operation: gray = conv2(gray, ones(w)/w^2, 'same')
Place the convolution result in an array: array = gray(:)
Initialize iteration counters: i = 0, j = 0
while true do
    Initialize seed point: seed = mean(array)
    Increment counter for each iteration: i = i + 1
    while true do
        Increment counter for each iteration: j = j + 1
        Compute distance between seed and gray values: dist = sqrt((array − seed)^2)
        Compute bandwidth for cluster center: distth = sqrt(sum((array − seed)^2)/numel(array))
        Check whether values are in the selected bandwidth: qualified = dist < distth
        Update mean: newseed = mean(array(qualified))
        if seed = newseed or j > 10 then        ▷ condition for termination
            j = 0
            Remove values assigned to a cluster
            Store center of cluster: center(i) = newseed
            break
        end if
        Update seed: seed = newseed
        if isempty(array) or i > 10 then        ▷ check maximum number of clusters
            i = 0
            break
        end if
    end while
    Sort centers
    Compute distances between adjacent centers
    Find minimum distance between centers
    Discard cluster centers closer than this distance
    Make a clustered image using these centers
end while
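A Python sketch of Algorithm 2 follows. It is illustrative only: the center-merging post-processing at the end of the pseudocode is omitted, the zero-padded box filter stands in for conv2, and the function name and loop limits are our own choices:

```python
import numpy as np

def makm_centers(gray, w, max_clusters=10, max_refine=10):
    """Sketch of Algorithm 2: repeatedly take the mean of the remaining
    values as a seed, refine it over the values inside an RMS-distance
    bandwidth, then remove the assigned values and continue."""
    # conv2(gray, ones(w)/w^2, 'same') approximated by a zero-padded box filter.
    padded = np.pad(gray.astype(float), w // 2, mode='constant')
    win = np.lib.stride_tricks.sliding_window_view(padded, (w, w))
    array = win.mean(axis=(2, 3))[:gray.shape[0], :gray.shape[1]].ravel()

    centers = []
    for _ in range(max_clusters):
        if array.size == 0:
            break
        seed = array.mean()                                   # initial seed point
        distth = 0.0
        for _ in range(max_refine):
            dist = np.abs(array - seed)                       # distance to seed
            distth = np.sqrt(np.mean((array - seed) ** 2))    # cluster bandwidth
            qualified = dist < distth
            if not qualified.any():
                break
            newseed = array[qualified].mean()
            if newseed == seed:                               # converged
                break
            seed = newseed
        centers.append(seed)
        # Remove the values assigned to this cluster and continue.
        array = array[np.abs(array - seed) >= max(distth, 1e-12)]
    return np.array(sorted(centers))
```

On a two-tone image this yields one center per intensity group without the user specifying k, which is the point of the method.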
The Q-value measures image segmentation quality taking into consideration both small and large regions in the final segmented image. The Q evaluation function used in this paper is

Q(I) = (1/(10000 N M)) √R Σ_{i=1..R} [ e_i²/(1 + log A_i) + (R(A_i)/A_i)² ]

where N and M are the numbers of rows and columns in the image respectively, R is the total number of regions in the segmented image, A_i is the area of the i-th region, R(A_i) is the number of regions with area equal to A_i, and e_i is the color difference between the original and segmented image over region i, computed from r(x, y) and t(x, y), the grayscale values at position (x, y) in the raw and segmented images. The MSE between the raw and segmented images is

MSE = (1/(N M)) Σ_x Σ_y (r(x, y) − t(x, y))²

and RMSE is the square root of the MSE [33,34]:

RMSE = √MSE    (4)

For both Q and RMSE, a smaller value means higher image segmentation quality.
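The RMSE computation is straightforward; a minimal Python sketch (the function name is ours):

```python
import numpy as np

def rmse(raw, seg):
    """Root-mean-square error between raw and segmented grayscale images:
    MSE averaged over all N x M pixels, then square-rooted."""
    raw = np.asarray(raw, dtype=float)
    seg = np.asarray(seg, dtype=float)
    mse = np.mean((raw - seg) ** 2)   # mean squared pixel difference
    return np.sqrt(mse)               # RMSE = sqrt(MSE)
```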
For the VOC2012 challenge datasets, popular performance evaluation parameters include global accuracy, class average accuracy, and mean intersection over union (MIoU) over all classes [26]. Global accuracy is the percentage of pixels correctly classified in the dataset, whereas class average accuracy is the mean of the predictive accuracy over all classes. In this paper, we use MIoU to compare the performance of our algorithm with learning-based segmentation methods. MIoU is defined as

MIoU = (Σ_{i=1..N} IoU_i)/N    (5)

where the intersection over union (IoU) is defined in Eq. (6) and N is the number of objects considered from the dataset for a particular experiment:

IoU = area of overlap / area of union    (6)

where the area of overlap is the area shared by the predicted bounding box and the ground-truth bounding box, and the area of union is the total area covered by the predicted and ground-truth bounding boxes together.
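IoU and MIoU can be sketched for binary masks (a bounding box rasterized to a mask gives the same overlap and union areas; function names are ours):

```python
import numpy as np

def iou(pred, truth):
    """Intersection over union of two boolean masks:
    area of overlap divided by area of union."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    union = np.logical_or(pred, truth).sum()
    return np.logical_and(pred, truth).sum() / union if union else 1.0

def miou(pairs):
    """Mean IoU over the N (prediction, ground-truth) pairs considered
    in an experiment."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)
```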

Results and discussion
The outcomes of the experiments conducted using the proposed technique show that the gray image segmentation task can be carried out efficiently while parameter initialization is performed automatically. All experiments were performed in MATLAB version 2016b and run on a 3.00 GHz Intel Core i7-4601M CPU under the Microsoft Windows 10 operating system. The performance of the proposed segmentation algorithm was evaluated using RMSE, image segmentation quality (Q-value), and computational cost.
[Table: numbers of clusters obtained by each algorithm for the Experiment-1 images: MRI (128×128), Bag (250×189), Cameraman (256×256), Coins (246×300), Moon (537×358), Pout (291×240), and Glass (181×282).]
The overall achievement of the proposed modified adaptive k-means is superior to the other clustering algorithms in terms of image segmentation quality (Q-value), computational cost, and RMSE. For further analysis, we considered additional images from the VOC2012 challenge dataset and mammography images from Bethezatha General Hospital (BGH) and MIAS.
Four randomly selected images (dog, airplane, plant, and person) from VOC2012 were used to compare the proposed algorithm to three clustering algorithms (AKM, FCM, K++) in terms of Q, time, MAE, E, and PSNR. For all images our proposed algorithm scored better for Q, computation time, and PSNR compared to other clustering algorithms, but not for MAE and entropy: see Table 8. In the case of the "person" image, our proposed algorithm scored minimum MAE compared to other algorithms, indicating good performance for this particular image.
Comparative performance of the proposed algorithm for two randomly selected MRI images is given in Table 9. The proposed algorithm performs better in terms of MSE, Q, and computation time for both MRI images. However, the second MRI image recorded a higher IoU value than the first, as shown in Table 10, which compares the proposed algorithm with clustering algorithms in terms of IoU and MIoU for the two MRI images. Segmentation results for the second MRI image are given in Fig. 11. A comparison of the proposed algorithm with learning-based and clustering algorithms is presented in Table 11. The comparison in terms of IoU and MIoU indicates that the proposed algorithm scored higher IoU and MIoU for the plant and person images, but for the dog and plane images, k-means++ scored higher. Fig. 8 shows examples of annotated and extracted regions with cancer for breast mammographic images from the BGH and MIAS datasets using the proposed method.

Conclusions
In this study, we presented a convolution-based modified adaptive k-means algorithm, to get the best out of the normal k-means method during image segmentation. Firstly, an automatic window size generation approach was designed to perform the convolution process to get the central value for every convolution step, and the mean of these values is assigned as the initial seed point. Then, using this seed point, the cluster centers and number of clusters are determined as initial parameters and given to the adaptive k-means algorithm. A comparative analysis of the proposed modified adaptive k-means with K++, HBK, and SC methods was made in terms of image segmentation quality (Q), RMSE, and time.
The results obtained confirmed the advantages of our proposed modified adaptive k-means algorithm.  Furthermore, an objective comparison of the proposed modified adaptive k-means algorithm with another soft clustering algorithm, FCM, also proved the advantages of our proposed technique.
To evaluate the robustness of our algorithm, we ran additional experiments using the VOC2012 challenge dataset and MRI images, comparing the proposed segmentation algorithm with learning-based methods in terms of IoU and MIoU. These experiments showed that our algorithm outperforms learning-based methods in terms of MIoU on the VOC2012 challenge dataset.
In future work, we hope to apply our method to breast cancer image analysis. After segmentation, texture features (quantized compound change histogram, Haralick descriptors, MPEG-7 edge histogram, Gabor features, gray-level co-occurrence matrix, and local binary patterns) and shape features (centroid distance function signature, chain code histogram, Fourier descriptors, and pyramid histogram of oriented gradients) can be extracted and used as input to various classifiers to distinguish between normal and abnormal mammograms.

Friedhelm Schwenker is a senior lecturer and researcher at the Institute of Neural Information Processing, Ulm University, Germany. His research interests are in artificial neural networks, pattern recognition, data mining, and affective computing.
Samuel Rahimeto is an M.Sc. student at Addis Ababa Science and Technology University, Ethiopia. His research interests are in digital image processing and machine learning.
Dereje Yohannes is a director of the Artificial Intelligence Excellence Center. He holds M.Sc. and Ph.D. degrees in computer engineering. His main research interests are in network security and wireless. He is also working in the area of data mining and machine learning.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.