1 Introduction

China is a major agricultural country with a long farming history, and fruit and vegetable production plays an important part in it. Harvesting accounts for about 40 % of the total workload in fruit and vegetable production, so harvesting automatically with robots can reduce labor cost and fruit damage while improving harvesting efficiency. The American scholars Schertz and Brown proposed robotic harvesting of fruit and vegetables in the 1960s, and a variety of fruit and vegetable picking robots have been studied since then. However, harvesting robots still cannot be put into practice because of low recognition and picking rates and recognition problems such as occlusion and overlapping fruits [1–6].

Because apples grow in varied poses, the images captured by an apple harvesting robot also vary: single apples, overlapping apples, foliage occlusion and so on. Research on the identification and location of single fruit is relatively mature [7–9]. However, the identification and location of overlapping apples remains a problem that restricts the development of harvesting robots. To solve it, many scholars have studied the identification and location of overlapping objects in images, and some progress has been made [10–17]. Among them, Petros [10] used the watershed transform and gradient paths to segment touching and overlapping chromosomes. Tested on 183 Multiplex Fluorescence In Situ Hybridization images, the success rate was 90.6 % for touching chromosomes and 80.4 % for overlapping chromosomes. Julio [11] proposed approximating the leaf shape with an ellipse to reduce its complexity, and then used active shape models to recognize clusters of leaves. Shape models of experimental plants with 2, 3 and 4 leaves were tested, and the results indicated that the method could identify overlapping leaves when less than 32 % of the area was overlapped. Xu [12] used the histogram of oriented gradients (HOG) descriptor together with a support vector machine (SVM) classifier to detect slightly overlapping strawberries. However, these studies mostly address static conditions and process single frames, whereas the operation of a harvesting robot is dynamic, so they are not fully applicable to picking while the robot is moving. Dynamic recognition has begun to attract attention, and encouraging results have been achieved [18, 19]. For example, Lv [18] made a preliminary study of the dynamic recognition of fruits; the results indicated that exploiting the correlation between images could effectively reduce processing time. Overall, however, there is still much room for improvement in the dynamic recognition of overlapping fruit.

This study focuses on the dynamic identification and location of overlapping apples. First, an improved Otsu threshold algorithm is used to segment the image, and the result is then processed by morphological operations so that the relevant features can be extracted. Next, the center and radius of each overlapping fruit are determined by the local maxima method, and the template is extracted from them. Finally, the robot motion path is predicted by combining software and hardware, and the NCC method is used to confirm that the tracked target is the same one and to locate the target fruit accurately, so that the overlapping fruit can be tracked dynamically. A total of 11 images taken continuously by the harvesting robot at a constant speed in a natural scene were tested, and the results show that the new method locates the centers and radii of overlapping apples correctly. In addition, the processing time of the improved method is 48.1 % lower than that of the original one, which improves the real-time performance of the harvesting robot and better meets practical requirements.

2 Idea of Fast Tracking and Recognition

Fruit tracking and recognition is based on image sequences. The dynamic characteristics of the fruits are obtained from the correlation and difference between successive images, so that the subsequent movement can be predicted and the region to be processed in the next image can be estimated, which reduces the recognition time of the robot and improves picking efficiency. The difficulty in tracking and recognizing overlapping fruits lies in determining the center and radius of each fruit. In this paper, a method based on local maxima is used to overcome this problem.

The process is as follows:

  1. Step 1: Collect eleven overlapping apple images continuously;

  2. Step 2: Segment the images with the improved Otsu color difference segmentation algorithm;

  3. Step 3: Refine the images by mathematical morphology operations;

  4. Step 4: Determine the centers by the local maximum method;

  5. Step 5: Determine the radii as the minimum distance from each center to the edge of the contour;

  6. Step 6: Extract the templates according to the centers and radii;

  7. Step 7: Predict the motion path by the least squares method;

  8. Step 8: Extract the subsequent processing region according to the centers and radii;

  9. Step 9: Apply the normalized cross correlation (NCC) fast matching algorithm for image matching.

3 Image Segmentation

3.1 Otsu Segmentation Algorithm

Segmenting overlapping fruit is a major difficulty in tracking and recognition, because the segmentation result has a critical influence on the subsequent steps. Fixed threshold segmentation has the advantage of real-time performance, but the colors of fruits and background vary considerably, so the Otsu method based on color characteristics is used here.

The Otsu method [20] was proposed by the Japanese scholar Otsu. Its basic idea is to divide the image into two classes by a threshold and to choose as the optimal threshold the one that maximizes the between-class variance. For a given image \( f(x,y) \) with gray-scale range \( [0, L-1] \), let the probability of gray level \( i \) be \( P_{i} \). A threshold \( t \) divides the image into two classes \( C_{0} \) (gray levels \( 0 \) to \( t \)) and \( C_{1} \) (gray levels \( t+1 \) to \( L-1 \)), namely the background part and the foreground part. The probability of \( C_{0} \) is \( \omega_{0} = \sum\limits_{i = 0}^{t} {P_{i} } \) and that of \( C_{1} \) is \( \omega_{1} = 1 - \omega_{0} \); their average gray values are \( \mu_{0} \) and \( \mu_{1} \) respectively, and the total average gray value is \( \mu = \omega_{0} \mu_{0} + \omega_{1} \mu_{1} \). The between-class variance of the two parts is

$$ \sigma^{2} ({\text{t}}) = \omega_{0} (\mu_{0} - \mu )^{2} + \omega_{1} (\mu_{1} - \mu )^{2} = \omega_{0} \omega_{1} (\mu_{0} - \mu_{1} )^{2} $$
(1)
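The second equality in Eq. (1) follows from the definition of \( \mu \) together with \( \omega_{0} + \omega_{1} = 1 \). A short derivation, added here for completeness:

$$ \mu_{0} - \mu = \omega_{1} (\mu_{0} - \mu_{1} ),\quad \mu_{1} - \mu = - \omega_{0} (\mu_{0} - \mu_{1} ) $$

$$ \omega_{0} (\mu_{0} - \mu )^{2} + \omega_{1} (\mu_{1} - \mu )^{2} = (\omega_{0} \omega_{1}^{2} + \omega_{1} \omega_{0}^{2} )(\mu_{0} - \mu_{1} )^{2} = \omega_{0} \omega_{1} (\mu_{0} - \mu_{1} )^{2} $$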

The value of t that maximizes \( \sigma^{2} (\text{t}) \) is the optimum threshold T, which separates the target from the background.

Although the Otsu method can separate most of the target from the background, it easily causes over-segmentation.
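The threshold search of Eq. (1) can be sketched in a few lines. The following is a minimal pure-Python illustration, not the implementation used in the experiments (which were run in Matlab); the function name `otsu_threshold` and the toy pixel values are assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance of Eq. (1).

    Pixels with value <= t form class C0, the rest form class C1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    n0 = 0  # number of pixels in C0
    s0 = 0  # sum of gray values in C0
    for t in range(levels - 1):
        n0 += hist[t]
        s0 += t * hist[t]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so the variance is undefined
        w0, w1 = n0 / total, n1 / total
        mu0, mu1 = s0 / n0, (sum_all - s0) / n1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Hypothetical bimodal data: dark background values and bright fruit values.
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [205] * 60
t = otsu_threshold(pixels)
foreground = [p for p in pixels if p > t]
```

Working with integer counts rather than accumulated probabilities avoids rounding drift in the class weights; any threshold between the two clusters separates them equally well, and the sketch returns the first such level.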

3.2 Improved Otsu Segmentation Algorithm

The Otsu color difference segmentation algorithm can segment the apples from the background to a certain extent, but it easily causes over-segmentation or under-segmentation. For example, over-segmentation appears in Fig. 1(b). To overcome this problem, an improved Otsu color difference segmentation algorithm is used in this paper. Its main idea is to stretch or shrink the R component of the image by applying gamma conversion to it, so that the difference between the R and G components is increased.

Fig. 1. Otsu segmentation

Gamma conversion is a nonlinear gray-scale transformation given by

$$ y = (x + K_{esp} )^{\gamma } $$
(2)

where \( x \in [0,1] \), \( y \in [0,1] \), \( K_{esp} \) is the compensation coefficient and \( \gamma \) is the gamma coefficient.

  • When \( \gamma > 1 \), the contrast in the high gray-scale region is enhanced and the contrast in the low gray-scale region is reduced;

  • When \( \gamma = 1 \), the image is unchanged;

  • When \( \gamma < 1 \), the contrast in the low gray-scale region is enhanced and the contrast in the high gray-scale region is reduced.

Figure 2 shows the results of Otsu color difference segmentation after gamma conversion. In Fig. 2(a), \( \gamma \) = 0.5; in Fig. 2(b), \( \gamma \) = 1; in Fig. 2(c), \( \gamma \) = 1.5. Across several experiments, segmentation worked best with \( \gamma \) = 0.68.

Fig. 2. Improved Otsu segmentation
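As an illustration of Eq. (2) applied to the R component before the color difference is formed, consider the sketch below. The helper names, the pixel values and the choice \( K_{esp} = 0 \) are assumptions for this example; the paper does not state the compensation coefficient it used.

```python
def gamma_transform(x, gamma=0.68, k_esp=0.0):
    """Eq. (2): y = (x + K_esp) ** gamma for x in [0, 1], clipped to [0, 1]."""
    base = max(x + k_esp, 0.0)  # guard against a negative base
    return min(base ** gamma, 1.0)


def colour_difference(r, g, gamma=0.68, k_esp=0.0):
    """R - G difference after gamma conversion of R; r and g are 8-bit values."""
    r_stretched = gamma_transform(r / 255.0, gamma, k_esp)
    return r_stretched - g / 255.0


# Hypothetical pixels: a reddish fruit pixel and a greenish leaf pixel.
fruit = colour_difference(180, 60)
leaf = colour_difference(70, 120)
```

The resulting difference image would then be thresholded with the Otsu method of Sect. 3.1.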

3.3 Image Improvement

Holes may appear at the calyx of an apple after segmentation because the color of the calyx differs greatly from that of the body of the apple. There may also be other holes, burrs and noise. Therefore, mathematical morphology operations [21, 22] are applied after segmentation to refine and de-noise the image.

The basic idea of mathematical morphology is to use structural elements of a certain form to measure and extract the corresponding shapes in the image, so that only the image features similar to the structural elements are kept. The specific steps are:

  • Step 1 First, the image is dilated with a disk-shaped structural element of radius 1, so that boundary points are expanded and some holes are filled;

  • Step 2 Then a flood-fill operation is used to fill the remaining holes;

  • Step 3 After that, the largest connected region is retained so that isolated burrs are removed;

  • Step 4 Finally, the image is eroded to eliminate the noise around the boundary.

The results of this refinement are shown in Fig. 3.

Fig. 3. Apple image perfection

Figure 3 shows that, after dilation, hole filling, largest connected region extraction and erosion, the apples are almost completely separated from the background and the result is satisfactory.
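The four morphological steps can be sketched on a small binary mask as follows. This is a pure-Python illustration under stated assumptions: the radius-1 disk is taken to be the 4-neighborhood cross, pixels outside the image count as background, and all function names and the test mask are hypothetical.

```python
from collections import deque

CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # radius-1 disk (assumed)


def _hits(img, r, c):
    """Values of img under the structuring element centred at (r, c)."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in CROSS]


def dilate(img):
    return [[int(any(_hits(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img):
    return [[int(all(_hits(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def _flood(img, seeds, value):
    """4-connected flood fill over pixels equal to `value`."""
    h, w = len(img), len(img[0])
    seen = {s for s in seeds if img[s[0]][s[1]] == value}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for dr, dc in CROSS[1:]:
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen
                    and img[nr][nc] == value):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def fill_holes(img):
    """Background not reachable from the image border is treated as a hole."""
    h, w = len(img), len(img[0])
    border = [(r, c) for r in range(h) for c in range(w)
              if r in (0, h - 1) or c in (0, w - 1)]
    outside = _flood(img, border, 0)
    return [[0 if (r, c) in outside else 1 for c in range(w)] for r in range(h)]


def largest_region(img):
    best, seen = set(), set()
    for r in range(len(img)):
        for c in range(len(img[0])):
            if img[r][c] and (r, c) not in seen:
                region = _flood(img, [(r, c)], 1)
                seen |= region
                if len(region) > len(best):
                    best = region
    return [[int((r, c) in best) for c in range(len(img[0]))]
            for r in range(len(img))]


def clean_mask(img):
    """Dilation, hole filling, largest connected region, erosion (Steps 1-4)."""
    return erode(largest_region(fill_holes(dilate(img))))


# Hypothetical mask: a 7x7 block with a 3x3 hole and an isolated noise pixel.
noisy = [[int(3 <= r <= 9 and 3 <= c <= 9 and not (5 <= r <= 7 and 5 <= c <= 7))
          for c in range(13)] for r in range(13)]
noisy[0][12] = 1
cleaned = clean_mask(noisy)
```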

4 Template Extraction

After automatic threshold segmentation the apples are almost separated from the background. The center and radius of each overlapping apple are then obtained so that the matching templates can be extracted from the image.

4.1 Determine the Center

The center is taken as the point inside the contour whose minimum distance to the edge of the contour is the largest. However, computing this for all points inside the contour takes a lot of memory and gives poor real-time performance, so an improved scanning method is used.

Define four scanning directions: A(x+,y+), B(x−,y+), C(x−,y−), D(x+,y−).

For a point E(m,n) inside the contour of an image, the distances of its left (m − 1, n) and lower (m, n − 1) neighbors are compared in direction A; those of its right (m + 1, n) and lower (m, n − 1) neighbors in direction B; those of its right (m + 1, n) and upper (m, n + 1) neighbors in direction C; and those of its left (m − 1, n) and upper (m, n + 1) neighbors in direction D.

The minimum distance of each point within the outline is obtained by comparing it with its four-neighborhood, and these distances make up the minimum distance function. Its three-dimensional surface is shown in Fig. 4.

Fig. 4. Three-dimensional map of minimum distance function

In Fig. 4, the two maxima of the minimum distance function are marked with red circles; they are the two centers of the overlapping apples.
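The idea that the centers are the maxima of the minimum distance function can be illustrated with the sketch below. It deliberately uses a brute-force distance computation instead of the four-direction scan described above, and it picks peaks greedily (take the largest value, then ignore pixels closer to that peak than its own distance value); both choices, and all names and the synthetic two-disc mask, are assumptions made for this example.

```python
import math


def two_disc_mask(h=25, w=37, centres=((12, 12), (12, 24)), radius=8):
    """Synthetic mask of two overlapping discs (stand-in for segmented apples)."""
    return [[int(any((r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2
                     for cr, cc in centres))
             for c in range(w)] for r in range(h)]


def distance_map(mask):
    """Distance from each foreground pixel to the nearest background pixel."""
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc)
                                 for br, bc in background)
    return dist


def find_centres(dist, count=2):
    """Greedy peak picking on the minimum distance function."""
    candidates = [(dist[r][c], r, c)
                  for r in range(len(dist)) for c in range(len(dist[0]))
                  if dist[r][c] > 0]
    centres = []
    for _ in range(count):
        if not candidates:
            break
        d, r, c = max(candidates)
        centres.append((r, c, d))
        # Suppress everything inside the circle already explained by this peak.
        candidates = [(dv, rr, cc) for dv, rr, cc in candidates
                      if math.hypot(rr - r, cc - c) > d]
    return centres


mask = two_disc_mask()
centres = find_centres(distance_map(mask))
```

On this synthetic mask the two peaks coincide with the disc centers used to build it; on real segmented images the peaks would only approximate the fruit centers.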

4.2 Determine the Radius

The radius is determined from the center. For overlapping apples, however, it cannot be taken as the maximum distance from the center to the edge of the outline, because the outline also encloses the neighboring fruit. Instead, the distances from the center to the edge of the contour are computed, and the minimum distance is used as the radius. The steps involved are shown in Fig. 5.

Fig. 5. Flow chart of radius determination
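A minimal sketch of the radius rule, assuming the contour is the set of foreground pixels with at least one 4-connected background neighbor; the function names and the synthetic two-disc mask are hypothetical. It returns both the minimum and the maximum center-to-contour distance to show why the maximum is unsuitable for overlapping fruit.

```python
import math


def contour_pixels(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    h, w = len(mask), len(mask[0])
    edge = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                    edge.append((r, c))
                    break
    return edge


def centre_to_contour(mask, centre):
    """Return (minimum, maximum) distance from the centre to the contour."""
    cr, cc = centre
    dists = [math.hypot(r - cr, c - cc) for r, c in contour_pixels(mask)]
    return min(dists), max(dists)


# Two overlapping discs of radius 8 centred at (12, 12) and (12, 24).
mask = [[int((r - 12) ** 2 + (c - 12) ** 2 <= 64
             or (r - 12) ** 2 + (c - 24) ** 2 <= 64)
         for c in range(37)] for r in range(25)]
radius, farthest = centre_to_contour(mask, (12, 12))
```

For the left disc the minimum distance stays close to the true radius of 8 pixels, while the maximum reaches across the neighboring disc.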

4.3 Template Extraction

The templates, cropped on the basis of the center, the radius and a certain reserve value, are shown in Fig. 6.

Fig. 6. Template extraction

It can be seen from Fig. 6(b) that the centers and radii are found accurately by the above method. The templates extracted in this way are used in the subsequent experiments.

5 Matching Recognition

5.1 Robot Motion Path Anticipation

To reduce the image processing time and speed up picking, the motion path is predicted from the apple centers in the collected images, and the locations of the centers over a series of images are used to narrow the region of the subsequent image that has to be processed.

Because overlapping fruits are complicated, only the case of two overlapping apples is studied in this paper. The processing steps for two overlapping apples are as follows:

  1. Step 1

    Determine the centers of the two overlapping fruits in each image by the method described in Sect. 4.1. The robot motion path is then fitted by polynomial fitting for each of the two centers. The two centers in the next frame are estimated from the fitted paths combined with the robot speed and sampling time.

  2. Step 2

    Determine the radii by the method described in Sect. 4.2, and let rmax be the larger radius of the two fruits. Let A(ax, ay) and B(bx, by) be the two estimated centers in the next frame. The subsequent processing area is a square with side length 4*rmax whose starting point is C(cx, cy), where cx = min{ax,bx} − rmax − m, cy = max{ay,by} + rmax + m, and m is a certain reserve value. Figure 7 shows the schematic diagram of cutting the subsequent processing area.

Fig. 7. Subsequent processing region extraction
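The starting point and size of the subsequent processing region in Step 2 amount to a few arithmetic operations. The sketch below simply evaluates the formulas as stated; the function name and the numerical values are hypothetical, since the paper does not report rmax or m.

```python
def processing_region(centre_a, centre_b, r_max, margin):
    """Starting point C(cx, cy) and side length of the square region (Step 2)."""
    ax, ay = centre_a
    bx, by = centre_b
    cx = min(ax, bx) - r_max - margin
    cy = max(ay, by) + r_max + margin
    side = 4 * r_max
    return cx, cy, side


# Hypothetical predicted centres, maximum radius and reserve value (pixels).
cx, cy, side = processing_region((100, 80), (130, 76), r_max=25, margin=5)
```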

The least squares method is used in Step 1 to fit the motion path of the robot. The specific procedure is as follows: require the sum of squared errors of the fitting curve to be less than 5 and find the best order of the fitting curve under this constraint. The motion curve is then fitted by the least squares method with that order, giving the polynomial of the curve. Finally, the coordinates of the centers in the next frame are determined by combining the polynomial with the sampling interval of the robot.
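The order selection and prediction can be sketched as follows. This is an illustrative pure-Python version that solves the normal equations directly; fitting each coordinate against the frame index, the names `polyfit`, `polyval` and `fit_lowest_order`, and the synthetic trajectory are all assumptions for this example and are not taken from the experiments.

```python
def polyfit(ts, vs, order):
    """Least-squares coefficients c[0] + c[1] t + ... via the normal equations."""
    n = order + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def polyval(coeffs, t):
    return sum(c * t ** i for i, c in enumerate(coeffs))


def fit_lowest_order(ts, vs, max_sse=5.0, max_order=5):
    """Lowest polynomial order whose sum of squared errors is below max_sse."""
    for order in range(1, max_order + 1):
        coeffs = polyfit(ts, vs, order)
        sse = sum((polyval(coeffs, t) - v) ** 2 for t, v in zip(ts, vs))
        if sse < max_sse:
            break
    return order, coeffs


# Synthetic centre coordinates in ten frames (not measured data).
frames = list(range(10))
xs = [100 + 12 * t for t in frames]
ys = [50 + 2 * t - 0.5 * t * t for t in frames]
order_x, coeff_x = fit_lowest_order(frames, xs)
order_y, coeff_y = fit_lowest_order(frames, ys)
predicted = (polyval(coeff_x, 10), polyval(coeff_y, 10))  # eleventh frame
```

Fitting against the frame index keeps the normal equations well conditioned for the low orders involved; with a constant sampling interval, evaluating the polynomial at the next index gives the predicted center.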

5.2 Normalized Cross Correlation Matching

After the above steps, the subsequent processing region of the image has been extracted. The apples can then be located by normalized cross correlation matching between this region and the templates.

Normalized cross correlation (NCC) matching [23, 24] is the most classical of the matching methods based on gray-level features. Its basic principle is to compare the gray matrix of the template with that of the image to be searched in order to find the location of the most relevant match. The algorithm is simple and insensitive to illumination changes, so it is well suited to apple images taken under different light intensities. In addition, NCC matching tolerates slight displacement and rotation of the image relative to the original one, which makes it suitable for dynamic image tracking and matching.

Let I denote the image to be matched, of size M × N pixels, and T the template, of size m × n pixels. The normalized correlation coefficient is defined as follows:

$$ R(x,y) = \frac{\sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [I(x + u,y + \gamma ) - \overline{I}_{x,y} ][T(u,\gamma ) - \overline{T} ]}{\sqrt{\sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [I(x + u,y + \gamma ) - \overline{I}_{x,y} ]^{2} \; \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [T(u,\gamma ) - \overline{T} ]^{2} }} $$
(3)

where \( (x,y) \) is the coordinate of the top left corner of the sub-graph in the image and \( (u,\gamma ) \) is the coordinate of the pixel in the template.

$$ \overline{I}_{x,y} = \frac{1}{mn}\sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} I(x + u,y + \gamma ) $$
(4)

is the average pixel value of the sub-graph \( I_{x,y} \).

$$ \overline{T} = \frac{1}{mn}\sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} T(u,\gamma ) $$
(5)

is the average pixel value of the template T.

\( R(x,y) \) lies in \( [-1,1] \); the greater the coefficient is, the higher the similarity between the template and the sub-graph.

However, the NCC algorithm requires a large amount of computation and has poor real-time performance, so a fast NCC matching algorithm is used in this paper. The specific steps are as follows:
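A direct evaluation of Eqs. (3) to (5) can be sketched as below. The example embeds a brightened copy of the template in an otherwise empty image to show that the coefficient is unaffected by a linear change of brightness; the names `ncc` and `best_match` and the test data are assumptions for this illustration.

```python
import math


def ncc(image, template, x, y):
    """Normalized correlation coefficient R(x, y) of Eq. (3)."""
    m, n = len(template), len(template[0])
    window = [image[x + u][y + v] for u in range(m) for v in range(n)]
    tvals = [template[u][v] for u in range(m) for v in range(n)]
    i_mean = sum(window) / len(window)  # Eq. (4)
    t_mean = sum(tvals) / len(tvals)    # Eq. (5)
    num = sum((i - i_mean) * (t - t_mean) for i, t in zip(window, tvals))
    den = math.sqrt(sum((i - i_mean) ** 2 for i in window)
                    * sum((t - t_mean) ** 2 for t in tvals))
    return num / den if den > 0 else 0.0  # flat windows carry no information


def best_match(image, template):
    """Exhaustive search for the position with the largest coefficient."""
    m, n = len(template), len(template[0])
    scores = {(x, y): ncc(image, template, x, y)
              for x in range(len(image) - m + 1)
              for y in range(len(image[0]) - n + 1)}
    position = max(scores, key=scores.get)
    return position, scores[position]


template = [[1, 5, 2], [8, 3, 9], [4, 7, 6]]
image = [[0] * 8 for _ in range(6)]
for u in range(3):
    for v in range(3):
        image[2 + u][3 + v] = 2 * template[u][v] + 10  # brighter copy at (2, 3)

position, score = best_match(image, template)
```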

Step 1 Set \( T^{'} (u,\gamma ) = T(u,\gamma ) - \overline{T} \), then the numerator of formula (3) can be written as

$$ \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} I(x + u,y + \gamma )T^{'} (u,\gamma ) - \overline{I}_{x,y} \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} T^{'} (u,\gamma ) $$
(6)

where \( \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} T^{'} (u,\gamma ) = \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [T(u,\gamma ) - \overline{T} ] = 0 \), so the numerator can be simplified as

$$ R(x,y)_{numerator} = \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} I(x + u,y + \gamma )T^{'} (u,\gamma ) $$
(7)

By the correlation theorem of the Fourier transform, the numerator can be computed for all positions at once as

$$ R(x,y)_{numerator} = F^{ - 1} \left\{ {F\{ I\} \bullet F^{*} \{ T^{'} \} } \right\} $$
(8)

Step 2 For the denominator, since the template is known, \( \sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [T(u,\gamma ) - \overline{T} ]^{2} \) is a fixed value, which means it does not affect the position of the optimal match.

It therefore does not have to be calculated, and the denominator of formula (3) can be simplified as

$$ R(x,y)_{denominator} = \sqrt {\sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [I(x + u,y + \gamma ) - \overline{I}_{x,y} ]^{2} } $$
(9)

In summary, the normalized correlation coefficient can be simplified as

$$ R_{1} (x,y) = \frac{F^{ - 1} \left\{ F\left\{ I \right\} \bullet F^{*} \left\{ T^{'} \right\} \right\}}{\sqrt {\sum\limits_{u = 0}^{m - 1} \sum\limits_{\gamma = 0}^{n - 1} [I(x + u,y + \gamma ) - \overline{I}_{x,y} ]^{2} } } $$
(10)

After this simplification, the amount of computation of the NCC algorithm is reduced significantly and the real-time performance is improved effectively.
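The simplified coefficient of Eq. (10) can be checked numerically on a small example. In the sketch below a naive discrete Fourier transform stands in for a fast Fourier transform routine so that the example has no dependencies; the zero-mean template is zero-padded to the image size, and only positions where the template fits entirely inside the image are evaluated. All names and the test data are assumptions for this illustration.

```python
import cmath
import math


def dft2(a, inverse=False):
    """Naive 2-D discrete Fourier transform (stand-in for an FFT routine)."""
    rows, cols = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for k in range(rows):
        for l in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = 2 * math.pi * (k * x / rows + l * y / cols)
                    total += a[x][y] * cmath.exp(sign * 1j * angle)
            out[k][l] = total / (rows * cols) if inverse else total
    return out


def direct_numerator(image, template, x, y):
    """Eq. (7) evaluated by explicit summation."""
    m, n = len(template), len(template[0])
    t_mean = sum(map(sum, template)) / (m * n)
    return sum(image[x + u][y + v] * (template[u][v] - t_mean)
               for u in range(m) for v in range(n))


def fast_ncc(image, template):
    """Simplified coefficient R1(x, y) of Eq. (10) for all valid positions."""
    M, N = len(image), len(image[0])
    m, n = len(template), len(template[0])
    t_mean = sum(map(sum, template)) / (m * n)
    # T' = T - mean(T), zero-padded to the image size.
    t_pad = [[template[u][v] - t_mean if u < m and v < n else 0.0
              for v in range(N)] for u in range(M)]
    fi, ft = dft2(image), dft2(t_pad)
    product = [[fi[k][l] * ft[k][l].conjugate() for l in range(N)]
               for k in range(M)]
    numerator = dft2(product, inverse=True)  # Eq. (8)
    scores = {}
    for x in range(M - m + 1):
        for y in range(N - n + 1):
            window = [image[x + u][y + v] for u in range(m) for v in range(n)]
            mean = sum(window) / (m * n)
            denom = math.sqrt(sum((p - mean) ** 2 for p in window))  # Eq. (9)
            scores[(x, y)] = numerator[x][y].real / denom if denom > 0 else 0.0
    return scores, numerator


template = [[1, 5, 2], [8, 3, 9], [4, 7, 6]]
image = [[0] * 8 for _ in range(6)]
for u in range(3):
    for v in range(3):
        image[2 + u][3 + v] = 2 * template[u][v] + 10  # brighter copy at (2, 3)

scores, numerator = fast_ncc(image, template)
best = max(scores, key=scores.get)
```

Because the template term dropped from the denominator is the same for every position, the location of the maximum of \( R_{1} \) is the same as that of \( R \), although the values themselves are no longer confined to \( [-1,1] \).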

6 Results and Analysis

6.1 Robot Motion Path Prediction

The operating environment of the experiment is as follows. Hardware: Intel(R) Core(TM)2 Duo CPU E7500 2.93 GHz processor, 2 GB memory. Software: Windows 7 operating system, Matlab R2013a.

The latest 11 images collected continuously by the robot moving at a uniform speed in a natural scene are processed. The trajectory of the robot is fitted from the first ten images and used to predict the subsequent path. The specific steps are as follows.

First, the centers of the two overlapping apples are found in each of the 10 images, and polynomial fitting is carried out on the motion path of the centers. The experimental results are shown in Fig. 8.

Fig. 8. Curve fitting diagram of robot motion path

Take the left apple as an example. With the sum of squared errors of the fitting curve required to be less than 5, a third-order curve turns out to fit the robot trajectory, so it is used in this paper. From the polynomial of this curve, the coordinates of the two centers of the overlapping fruits in the eleventh image are predicted from the first ten images combined with the sampling time of the robot. In this experiment, the coordinates of the predicted centers are (214,60.6) and (304,56.3) respectively.

6.2 Image Matching

According to the predicted positions of the centers, a square with side length 4*rmax is cropped as the area to be processed.

Figure 9(a) is a 320 × 240 pixel image in which the apples account for approximately 18.5 % of the whole image. The results show that the new method finds the apples in the eleventh image accurately.

Fig. 9. Match identification process

Comparison experiments show that the average matching time for the two apples is 0.181 s without pre-judgment and 0.094 s with pre-judgment, which means that the matching time is reduced by 48.1 %. These results indicate that pre-judgment significantly accelerates image matching while the matching accuracy remains high.
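The reported figure is the relative reduction of the average matching time, which can be checked from the two measured values:

$$ \frac{0.181 - 0.094}{0.181} \times 100\,\% \approx 48.1\,\% $$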

6.3 Different Proportion of Apple Area Discussion

This paper studies the tracking and recognition problem caused by the mutual occlusion of fruits. The approach is simple but effective: the results indicate that it can successfully recognize overlapping apples and track them under dynamic conditions with good real-time performance. However, the proportion of apple area has a large influence on the matching time.

The method was applied to Fig. 10(a), (b) and (c), which have the same size; the relationship between the proportion of apple area and matching time is shown in Table 1.

Fig. 10. Images of experiments on influence of proportion of apple area

Table 1. Relationship between the proportion of apple area and matching time

It can be seen from Table 1 that the smaller the proportion of apple area is, the greater the acceleration achieved.

6.4 Different Proportion of Apple Overlapping Area Discussion

Besides the proportion of apple area, the proportion of overlapping area between the apples also has a great impact on the matching time.

Comparison experiments were carried out with the same two apples under different overlapping conditions (Fig. 11).

Fig. 11. Images of experiments on influence of proportion of apple overlapping area

Table 2 shows that the larger the proportion of overlapping area is, the less matching time is needed.

Table 2. Relationship between the proportion of apple overlapping area and matching time

7 Conclusion

The tracking and recognition of overlapping fruits is studied in this paper, and the process of dynamic tracking and recognition is described. First, an improved Otsu method based on color difference is used to segment the image, and morphological operations are then applied to the segmented image. Second, the centers of the overlapping fruits are determined from the local maxima of the minimum distance function, the radii are determined from the centers, and the templates are extracted according to the centers and radii. Finally, fast normalized matching is carried out to track and recognize the overlapping fruits. Tests show that the matching time is reduced significantly after prediction and that the real-time performance is improved effectively. The method requires no additional hardware, has a low cost and is generally applicable to robots picking spherical fruits and vegetables. Further study is required on the following problems.

Recognizing overlapping fruits under heavier occlusion. The method based on local maxima of the distance applies only when the profile of each apple is a relatively complete circle; it cannot be used if the profile of the apple is incomplete.

Improving the real-time performance of template matching. The NCC algorithm copes well with different lighting and with slight rotation and translation, but it requires a great deal of computation. Although the image processing time is reduced by pre-judgment, the computation is still heavy. Further real-time improvement is left for subsequent studies.