1 Introduction

Automatic inspection of fruits is the subject of many grading and sorting systems that aim to decrease production costs and increase product quality in the agro-industry. In packing lines, where most external quality attributes are currently inspected visually, machine vision systems are powerful tools for performing this task automatically. These systems not only substitute for human inspection but also extend its capabilities beyond the limited human capacity to evaluate long-term processes objectively or to perceive events that take place outside the visible electromagnetic spectrum, which the human eye is unable to see [1]. Such a system was developed in [2] for automatic grading of date fruits using digital reflective near-infrared imaging.

The goal of many fruit inspection systems based on computer vision is to extract features of the fruits of interest and relate them to quality, which is normally associated with the absence of defects on the fruit peel. This makes the detection of defects on fruit peel the target of many studies, such as automatic citrus skin defect detection using multivariate image analysis in [3], detection of blemishes in potatoes in [4], and in-line detection of apple defects in [5]. The core technique in this task is always related to image analysis and processing, which depends largely on the segmentation procedure.

Image segmentation is usually the first step in detecting flaws in fruits, and its result strongly affects the successive stages of the process. It is the process of partitioning an image into regions such that each region is homogeneous and no union of two adjacent regions is homogeneous [6]. In general, automated segmentation is one of the most difficult tasks in image analysis, because a false segmentation will degrade the measurement process and the interpretation may therefore fail [7]. A variety of segmentation approaches for fruit defect detection have been developed in the literature. The existing techniques can be categorized into four classes ([6]): edge-based approaches, clustering-based approaches, region-based approaches, and split and merge approaches. However, as stated in [8], histogram-based thresholding is still the most referenced among segmentation methods.

The thresholding method uses a threshold value to turn a gray-scale image into a binary image. The key to this method is the selection of a single threshold value or of multiple threshold levels. Riquelme et al. [8] and Liming and Yanchao [9] used the simple Otsu’s method to separate the fruit area from the background. Mery and Pedreschi [7] proposed estimating a global threshold using a statistical approach and morphological operations to segment food images. Blasco et al. [10] used different thresholds for the color channels in RGB color space to separate different categories of objects of interest from the background.
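As an illustration of single-threshold selection, the following minimal sketch computes Otsu’s threshold from a gray-level histogram by maximizing the between-class variance. It is not the implementation used in [8] or [9]; the function name `otsu_threshold` and the assumption of integer gray levels in [0, 255] are illustrative only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Assumes `pixels` is a flat iterable of integers in [0, levels).
    Pixels <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))

    best_t, best_score = 0, -1.0
    n_dark, sum_dark = 0, 0
    for t in range(levels):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        n_bright = total - n_dark
        if n_dark == 0:
            continue          # no dark pixels yet
        if n_bright == 0:
            break             # every pixel is already in the dark class
        mean_dark = sum_dark / n_dark
        mean_bright = (sum_all - sum_dark) / n_bright
        # Between-class variance up to the constant factor 1 / total**2.
        score = n_dark * n_bright * (mean_dark - mean_bright) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

A binary image is then obtained by comparing each pixel with the returned threshold, for example `[p > t for p in pixels]`.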

Most edge-based approaches use a differentiation filter to approximate the first-order image gradient or the image Laplacian; candidate edges are then extracted by thresholding the gradient or Laplacian magnitude. Lopez [11] used a boundary detection method based on the Sobel gradient mask to segment defect areas in citrus, and the boundaries of objects of interest were then identified using neighborhood and gradient thresholds. A threshold for the Sobel operator was used in [2] to adjust the sensitivity of skin delamination detection. The main advantage of these approaches is their short computation time. However, they suffer from serious difficulties in setting appropriate thresholds and in producing continuous, one-pixel-wide contours [6].
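The general idea of thresholding a gradient magnitude can be sketched as follows. This is only an illustration of the standard 3 × 3 Sobel operator on a gray-scale image stored as a list of rows, not the procedure of [11] or [2]; the function name `sobel_edges` and the choice to leave border pixels unmarked are assumptions.

```python
import math


def sobel_edges(img, thresh):
    """Mark pixels whose Sobel gradient magnitude exceeds `thresh`.

    `img` is a list of equally long rows of gray levels.
    Border pixels are left as False (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) \
               - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) \
               - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1])
            edges[y][x] = math.hypot(gx, gy) > thresh
    return edges
```

On a vertical step edge the response is two pixels wide, which illustrates why additional processing is needed to obtain one-pixel-wide contours.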

In clustering-based approaches, image pixels are clustered according to their intensities or colors based on a pre-defined number of clusters. The number of obtained regions is usually greater than the number of clusters because pixels in the same cluster may not be adjacent. Several clustering-based approaches have been proposed, such as k-means and fuzzy c-means (FCM). The main advantage of these approaches is that the difficult threshold-setting problem can be avoided by using iterative processes, although the result depends heavily on the number of clusters and on the initial clusters. Moreover, the segmented contours are always continuous and one-pixel wide [6]. Because local properties of pixels are not used, over-segmentation may occur. Usually, a merge process is then applied to solve this problem.

Region-based approaches are useful because the segmented contours are always continuous and one-pixel wide. The goal in region-based approaches is to detect regions that satisfy certain pre-defined homogeneity criteria [6]. A region-oriented segmentation was proposed in [12] for detecting the most common peel defects of citrus fruits and was able to correctly detect 95 % of the defects. The difficulty of the region-growing approach lies in setting the threshold used to measure similarity, to which the result is sensitive. A similar method of the watershed type, a flooding algorithm, was employed in [5] and [13] for apple defect detection. However, different similarity threshold settings may lead to different segmentation results and may cause over-segmentation.

In split and merge approaches, the input image is first subdivided into a set of homogeneous primitive regions. Then, similar neighboring regions are merged according to certain decision rules. Several split approaches are available, such as pyramidal segmentation, watershed ([14]), FCM, and k-means, which usually produce over-segmented results. The region adjacency graph (RAG) and the nearest neighbor graph (NNG) are both structures available for the merge process ([15]). RAG and NNG are usually applied with a greedy process that merges adjacent regions until a pre-defined stop condition is satisfied.

The increase in computer power at affordable prices and the introduction of multiple-core processors make it possible to process complex images in a short time and to use more complex algorithms. In this paper, based on the split and merge approach, we propose a new hybrid algorithm that uses k-means clustering and a graph-based technique for splitting and merging image regions to segment fruit images. This combination is an efficient way to exploit the local and global characteristics of the image intensities, as shown in the experimental results.

The objective of this work is to develop a general algorithm that effectively segments objects in images to facilitate fruit defect detection. To split an image, a segmentation scheme based on k-means clustering is used to over-segment the original image, because k-means is known to give good segmentation results with good time efficiency. A Region Adjacency Graph (RAG) is then used to represent the region structure and facilitate the merge procedure, in which similar regions are iteratively merged into new homogeneous ones based on a minimum spanning tree algorithm. The approach is not intended for all kinds of fruits; we aim only to contribute one more choice among the variety of algorithms for the segmentation task, and the selection of a suitable algorithm is application dependent. The difficulty of fruit defect segmentation lies not only in the kinds of fruits but also in the defect types of each kind. Some kinds of defects cannot be detected by non-destructive or appearance-based methods, and agricultural expert knowledge is essential when designing an algorithm for a specific application. As academic research in the field of computer vision, our work is limited to proposing a new hybrid algorithm for the segmentation task that can hopefully be applied successfully to some specific kinds of fruits; the fruits used in our experiments are for illustration only.

The rest of the paper is organized as follows: The proposed hybrid method is detailed in Sect. 2. In Sect. 3, system setup to acquire fruit images and experimental results are described. The conclusion is in Sect. 4.

2 Segmentation method

There exist several image analysis methods for defect detection, including global gray-level or gradient thresholding, simple background subtraction, statistical classification, and color classification [16]. Blemish segmentation is a difficult problem, because various types of blemishes with different sizes and extents of damage may occur on fruit surfaces [13]. In general, a thresholding method may fail in the case of a slightly discolored blemish on a light-colored surface.

For image segmentation, the split and merge approach is an efficient way to exploit the local and global characteristics of the color intensities of an image. The method initially subdivides an image into a set of arbitrary, disjoint regions using a fast over-segmentation algorithm, which produces regions that are parts of the objects of interest. Those regions are then iteratively merged until the homogeneity condition is satisfied or no further merging is possible. An important characteristic of the graph-based method, as stated in [17], is its ability to preserve detail in low-variability image regions while ignoring detail in high-variability regions.

Because of noise and high variation in the original image, the image obtained after applying the k-means algorithm may include many small regions, which should be merged into nearby regions to speed up the merge procedure. These small regions are filtered out using a threshold on region size: in our algorithm, regions whose size is smaller than a pre-defined constant are merged into the biggest nearby region.

The overall process is shown in the flowchart in Fig. 1.

Fig. 1 Flowchart of the proposed algorithm

2.1 Split procedure

k-means is used in the split procedure to produce the initial regions. It is one of the simplest algorithms that solve the clustering problem, and it groups together pixels whose feature vectors are close to each other. Here, it is used to group the pixels of an image into a specified number of clusters, which are then separated into various small regions based on the adjacency relation.

The dimension of the feature vectors depends on the number of color channels used. Lee [6] and Riquelme et al. [18] used gray-level-based k-means for segmenting images. However, different color spaces such as RGB or Lab can be used to obtain more accurate results. In this paper, the \(L^*a^*b^*\) (CIE-Lab) color space is used for k-means clustering. The \(L^*a^*b^*\) color space has the advantage of being more perceptually uniform than other spaces such as RGB. Uniform changes of components in the \(L^*a^*b^*\) color space aim to correspond to uniform changes in perceived color, so the relative perceptual difference between any two colors in \(L^*a^*b^*\) can be approximated by treating each color as a point in a three-dimensional space (with three components: \(L^*\), \(a^*\), \(b^*\)) and measuring the Euclidean distance between the two points.

The split algorithm is composed of the following steps:

Step 1: Place \(K\) points, one for each of the \(K\) clusters, into the space represented by the objects (pixels) that are to be clustered. These points represent the initial group centroids \(Z_{k}\).

Step 2: Assign each object (pixel) to the group that has the closest centroid

$$\begin{aligned} x\; \in \;C_{i} \;\;\;\;\;\;\;\;{\text {if }}\mathrm{dist}(x,Z_{i} )\, \le \,\mathrm{dist}\left( {x,\,Z_{j} } \right) ,\\ {\text { for}}\;j\, = \,1, 2, \ldots , K(i\, \ne \,j) \end{aligned}$$

where \(C_{i }\) is the \(i\)th cluster with the centroid \(Z_{i}.\)

Step 3: When all objects have been assigned, recalculate the cluster center \(Z_{k}\) for every cluster \(C_{k}\) using the equation:

$$\begin{aligned} Z_k \;=\;\frac{1}{\left| {C_k } \right| }\sum \limits _{s\in C_k } {X_s ,\;\;\;\;k=\,1,\ldots ,\,K} \end{aligned}$$

where \(s\) is a member of \(C_{k}\), \(X_{s}\) is the feature vector of \(s\), and \(\vert C_{k}\vert \) is the number of members in \(C_{k}\).

Step 4: Repeat Steps 2 and 3 until the centroids no longer move (\(Z_{k}\) remains unchanged).
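Steps 1 to 4 can be sketched in plain Python as follows. This is a minimal illustration, not the experimental program: the function name `kmeans`, the iteration cap `max_iter`, and the rule of keeping the previous centroid when a cluster becomes empty are assumptions that the text does not specify.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Cluster `points` (tuples of features) around the given initial centroids.

    Returns (labels, centroids), where labels[n] is the cluster index of points[n].
    """
    centroids = [tuple(c) for c in centroids]   # Step 1: initial centroids Z_k
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))
            for x in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))
                ))
            else:
                new_centroids.append(old)       # assumption: keep an empty cluster's centroid
        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

For an image, `points` would hold one \((L^*, a^*, b^*)\) tuple per pixel, and the returned labels would be reshaped into a cluster map of the image size.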

The selection of the initial cluster centers is very important, because different sets of initial centers lead to different results. In this paper, we divide the range of each color channel into \(K+1\) equal sections, and the end point of each of the first \(K\) sections is chosen to be a component of a center. The initial cluster centers \(Z_{i} = (Z_{i}^{L}, Z_{i}^{a}, Z_{i}^{b})\) are determined as follows:

$$\begin{aligned} Z_{i}^{C} = C_{\mathrm{min}} + i\,\frac{C_{\mathrm{max}} - C_{\mathrm{min}}}{K + 1}\;\;{\text {for}}\;i\, = \,1,\,2,\,\ldots ,\,K \end{aligned}$$

where \(C\) is a color channel (\(L^*\), \(a^*\), or \(b^*\)), \(K\) is the preferred number of clusters, and \(C_\mathrm{max}\) and \(C_\mathrm{min}\) are, respectively, the maximum and minimum values of the color channel \(C\).
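The initialization rule above amounts to placing the \(K\) centers at equally spaced interior points of each channel range. A minimal sketch, assuming the pixels are given as a list of \((L^*, a^*, b^*)\) tuples (the function name `initial_centroids` is illustrative):

```python
def initial_centroids(pixels, K):
    """Return K initial centers Z_1..Z_K, one tuple of channel values per center.

    For every channel C, Z_i^C = C_min + i * (C_max - C_min) / (K + 1), i = 1..K.
    """
    n_channels = len(pixels[0])
    lo = [min(p[c] for p in pixels) for c in range(n_channels)]
    hi = [max(p[c] for p in pixels) for c in range(n_channels)]
    return [
        tuple(lo[c] + i * (hi[c] - lo[c]) / (K + 1) for c in range(n_channels))
        for i in range(1, K + 1)
    ]
```

With \(K = 4\) and a channel ranging from 0 to 100, for example, the channel components of the four centers are 20, 40, 60, and 80.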

The behavior of the k-means algorithm is influenced by the specified number of clusters \(K\). Experiments in [19] showed that the proper number of clusters for most images is \(K = 4.\) After splitting, the number of segmented regions may be larger than the number of clusters \(K\) because pixels in a cluster may not be adjacent.

The result of the k-means algorithm can include many small regions that are of no interest. These small regions are filtered out using a threshold on region size: regions whose size is smaller than a pre-defined constant are merged into the biggest nearby region.
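The two operations described above, separating clusters into spatially connected regions and absorbing small regions, can be sketched as follows. The sketch rests on assumptions that the text leaves open: 4-connectivity defines adjacency, "biggest nearby region" is read as the adjacent region with the most pixels, and small regions are reassigned in a single pass. The function names are illustrative.

```python
from collections import deque


def label_regions(cluster_map):
    """Split a 2-D map of cluster labels into 4-connected regions.

    Returns (region_map, number_of_regions).
    """
    h, w = len(cluster_map), len(cluster_map[0])
    region = [[-1] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if region[r][c] != -1:
                continue
            region[r][c] = next_id
            queue = deque([(r, c)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and region[ny][nx] == -1
                            and cluster_map[ny][nx] == cluster_map[y][x]):
                        region[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return region, next_id


def merge_small_regions(region, min_size):
    """Reassign every region smaller than `min_size` to its largest neighbor."""
    h, w = len(region), len(region[0])
    sizes, neighbors = {}, {}
    for y in range(h):
        for x in range(w):
            a = region[y][x]
            sizes[a] = sizes.get(a, 0) + 1
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and region[ny][nx] != a:
                    b = region[ny][nx]
                    neighbors.setdefault(a, set()).add(b)
                    neighbors.setdefault(b, set()).add(a)
    target = {
        rid: max(neighbors[rid], key=lambda n: sizes[n])
        for rid, size in sizes.items()
        if size < min_size and rid in neighbors
    }
    return [[target.get(rid, rid) for rid in row] for row in region]
```

A single isolated pixel of one cluster surrounded by another cluster, for instance, forms its own region after labeling and is absorbed by the surrounding region when `min_size` is larger than 1.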

One limitation of the k-means algorithm is that, during the space partitioning process, it does not take into consideration the local connections between data points (the color components of each pixel) and their neighbors. This restricts the application of clustering algorithms to complex color-textured images, since the output will be over-segmented. A merge procedure is therefore applied to obtain the final segmentation.

2.2 Merge procedure

2.2.1 Build region adjacency graph (RAG) of regions

To merge over-segmented regions into better ones, a graph called the region adjacency graph (RAG) is built. The RAG is a common data structure used to represent region neighborhood relations in a segmented image. It is a weighted undirected graph G = {V, E, W}, where each node in \(V\) represents a region of the over-segmented image, each edge in \(E\) connects two adjacent regions, and each weight in \(W\) is a symmetric dissimilarity between the two regions joined by the edge. The weights in \(W\) are defined by taking into account the mean brightness of the regions.
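A RAG of this kind can be built in one pass over the region map. The sketch below is only illustrative: it assumes 4-connectivity for adjacency and takes the edge weight to be the absolute difference between the mean intensities of the two regions, which is one way of "taking into account the mean value of brightness"; the function name `build_rag` is hypothetical.

```python
def build_rag(region, intensity):
    """Build a region adjacency graph from a region map and an intensity image.

    Returns (means, edges): means[rid] is the mean intensity of region rid and
    edges[(a, b)] with a < b is the dissimilarity of adjacent regions a and b.
    """
    h, w = len(region), len(region[0])
    total, count = {}, {}
    for y in range(h):
        for x in range(w):
            rid = region[y][x]
            total[rid] = total.get(rid, 0) + intensity[y][x]
            count[rid] = count.get(rid, 0) + 1
    means = {rid: total[rid] / count[rid] for rid in total}

    edges = {}
    for y in range(h):
        for x in range(w):
            a = region[y][x]
            # Looking only down and right visits every 4-neighbor pair once.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and region[ny][nx] != a:
                    b = region[ny][nx]
                    key = (min(a, b), max(a, b))    # undirected edge
                    edges[key] = abs(means[a] - means[b])
    return means, edges
```

Storing each edge under the ordered pair `(min, max)` keeps the graph undirected and the dissimilarity symmetric.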

Figure 2a, b shows an example of a synthetic image and its corresponding RAG.

Fig. 2 Region adjacency graph. a Image regions, b corresponding RAG

RAG-based methods only consider local information for the region-merging task. As stated in [17], the image partitioning task is inherently hierarchical and it would be appropriate to develop a top-down segmentation strategy which returns a hierarchical partition of the image instead of a flat partition. The proposed method shares this perspective with a tree-based image bipartition using minimum spanning tree (MST).

2.2.2 Merge regions

Regions with similar intensity are desired to be merged. One of the issues in merging regions is the order of regions to be merged. In graph-based segmentation, this order is determined through the technique used in MST.

The use of Kruskal’s algorithm to build minimum spanning trees for segmentation reflects the local properties of the image. A predicate is defined for measuring the merging criteria and the algorithm makes greedy decisions to produce the final segmentation.

After RAG is built, regions are merged when the following simple condition is satisfied:

$$\begin{aligned} \left| {\frac{1}{\left| {R_i } \right| }\mathop \sum \limits _{p\in R_i } I_p -\frac{1}{\left| {R_j } \right| }\mathop \sum \limits _{p\in R_j } I_p } \right| <T \end{aligned}$$

where \(T\) is a global threshold on the dissimilarity between regions, \(p\) is a pixel in region \(R_{i}\) or \(R_{j}\), and \(I_{p}\) is the intensity value of \(p.\)

For fruit defect detection, defects usually appear darker than the normal skin of the fruit. The selection of the threshold \(T\) depends on the contrast between the defects and the skin. Image intensities are therefore used in the merge procedure to deal with image contrast and to overcome lighting effects on the fruit peel.

Two neighboring regions should be merged if the new combined region is homogeneous. Consequently, each region is expected to grow as large as possible under the merge condition, and the total number of regions is reduced, as shown in the experimental results.
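The merge procedure can be sketched as a Kruskal-style pass over the RAG edges with a union-find structure. The details below are assumptions made for illustration and are not specified in the text: edges are sorted once by the initial difference of region means, the merge condition is re-evaluated on the current means of the merged regions, and region means are updated from accumulated intensity sums and pixel counts. The function name `merge_regions` is hypothetical.

```python
def merge_regions(sums, counts, edges, T):
    """Greedily merge adjacent regions whose mean intensities differ by less than T.

    sums[rid] and counts[rid] give the intensity sum and pixel count of region rid;
    edges is an iterable of (a, b) pairs of adjacent regions.
    Returns a dict mapping every original region id to its merged region id.
    """
    sums, counts = dict(sums), dict(counts)      # work on copies
    parent = {rid: rid for rid in counts}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]        # path halving
            r = parent[r]
        return r

    def mean(r):
        return sums[r] / counts[r]

    # Kruskal-style ordering: most similar adjacent regions first.
    ordered = sorted(edges, key=lambda e: abs(mean(e[0]) - mean(e[1])))
    for a, b in ordered:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                             # already in the same region
        if abs(mean(ra) - mean(rb)) < T:         # merge condition on region means
            parent[rb] = ra
            sums[ra] += sums[rb]
            counts[ra] += counts[rb]
    return {rid: find(rid) for rid in parent}
```

Relabeling each pixel with the returned mapping gives the final segmentation; a larger \(T\) merges more aggressively and yields fewer regions.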

3 Experimental results

The system setup used to acquire fruit images for the experiments is shown in Fig. 3. The fruit to be photographed was placed in the center of a box whose inner surface is flat white to provide diffuse and uniform light. Four fluorescent tubes were used as the light source and were positioned to provide good, uniform illumination, so that the shadow of the fruit is removed. A Canon CCD camera (from Japan), with a lens of 2.8–4 focal length, was mounted 25 cm above the fruit. To obtain images showing the entire fruit surface, the fruit was manually rotated, which allowed us to check whether defects exist. The pixel resolution was set at 2,353 \(\times \) 1,568, and the acquired images were then resized to a width of 256 pixels to serve as input to the experimental program.

Fig. 3 Experimental setup for image acquisition

For the k-means algorithm, \(K = 4\) was used, as it is known to yield good segmentation results. The other parameters, including the minimum region size and the threshold value used in the merge process, were set in each test.

The final results were evaluated by human inspection of the objects in the images. The number of regions and the segmentation quality as judged by human observation were considered in the evaluation.

For the experiments, we used some images that are commonly used in segmentation tests as well as some fruit images. The segmented regions in the result images are re-colored randomly to aid observation.

Table 1 compares the results of three related segmentation algorithms: graph-based segmentation, k-means segmentation, and the proposed method. Because k-means segmentation produces an over-segmentation, the dramatic decrease in the number of regions in the proposed method, together with the good results under visual observation, shows that it can improve on k-means segmentation, even though it is slightly more time consuming. Figures 4 and 5 show some results on common images and fruit images that are very good in terms of visual assessment.

Fig. 4 Segmentation results for common images: the tested images (left), segmented results by the k-means algorithm (middle), and segmented results by the proposed method (right)

Fig. 5 Segmentation results for fruit images: fruit images (left), results by k-means (middle), and results by the proposed algorithm (right)

Table 1 Comparison of three segmentation methods

In comparison with graph-based segmentation, the proposed method is more efficient in processing time, as shown in Table 1. This is because the merging process in the proposed method starts from the regions obtained in the split procedure instead of the individual pixels used in the graph-based technique. In addition, the processing time of the graph-based method depends strongly on the number of pixels in the image. Hence, graph-based segmentation was the most time-consuming method in the experiment, especially on large images.

To illustrate the benefit of using the \(L^*a^*b^*\) color space in the method, two variants of the k-means step were compared: k-means on the gray image and k-means on the \(L^*a^*b^*\) color image. The merge process is the same for both variants. Figure 6 shows some results for the two color representations. The results obtained with the \(L^*a^*b^*\) color space are more accurate than those obtained with gray levels. However, using gray levels has an advantage in computational time.

Fig. 6 Comparison of the method for gray-scale images and for \(L^*a^*b^*\) color images. The fruit images are in the first column, results for gray-level images are in the second column, and results using \(L^*a^*b^*\) color space are in the last column. (Fruit images were from [9])

The proposed method has some advantages in comparison with Otsu’s method. The result of Otsu’s method is a binary image in which the pixels belonging to defects are scattered, while the proposed method gives regions with continuous contours. An illustration of the better quality of the proposed method is shown in Fig. 7. Moreover, to find defects with Otsu’s method, the background in the fruit image must be removed beforehand. When the background is not too complex, the proposed method can detect both background and defects from the same result image.

Fig. 7 A comparison with Otsu’s method. a Fruit image, b segmented result by Otsu’s method, c segmented result by the proposed method

4 Conclusion

This paper has introduced a split and merge approach for image segmentation that aims to detect defects on fruit peel. The method first over-segments the original image with the k-means clustering technique in \(L^*a^*b^*\) color space. Small regions are then filtered out by merging them into nearby regions. Next, a RAG is built from the regions obtained in the previous stage to serve the merging process, and regions are iteratively merged based on a minimum spanning tree technique. The experiments showed that the proposed method is faster than the graph-based algorithm, because it uses the regions resulting from the split procedure as the initial universe instead of pixels, and that it gives higher quality than the k-means-only method. For general image segmentation in which segmentation quality is the aim, the method uses \(L^*a^*b^*\) color space in the k-means implementation to obtain more accurate results, as shown in the experiments. Moreover, the proposed method is superior to Otsu’s method in terms of contour continuity and quality. The method can be adapted to different kinds of fruits by selecting more suitable starting centroids for k-means or a more suitable stop condition for merging. Fully automated segmentation is important in real applications; as future work, we propose improving the merging condition so that it requires no parameters.