Introduction

Potatoes are one of the most important crops in the Netherlands, grown on a total area of 180,000 ha. Unfortunately, they are vulnerable to disease, especially to outbreaks of late blight caused by Phytophthora infestans. Late blight is one of the most important potato diseases and is spread, for instance, by volunteer potatoes. Volunteer potatoes are potato plants that have survived the winter due to a lack of frost. They can be responsible for up to 80,000 volunteer plants/ha in the year following crop rotation. In this way, volunteer potatoes spread pests and disease to regular potato crops in neighbouring fields (Turkensteen et al. 2000; Boydston 2001). In the Netherlands, farmers are under a statutory obligation to remove volunteer potatoes from the field by the 1st of July. There is therefore a definite need for methods to selectively detect and remove volunteer potatoes. At present, no selective chemicals are available to eliminate potato tubers or volunteer potatoes in sugar beet fields (Boydston 2001). The existing practice of manually removing volunteer potatoes, requiring up to 30 h/ha of labour, is too time-consuming and therefore too costly (Paauw and Molendijk 2000). Besides manual removal, band spraying machinery is used to apply glyphosate between rows of sugar beets. However, the effectiveness of band sprayers is limited: only between 20% and 80% of volunteer potatoes are removed, while up to 25% of sugar beets may be unintentionally killed (Reijnierse 2004).

In 2004, a project was initiated with the goal of developing an economically attractive automatic volunteer potato detection and control system. This paper discusses one part of such a system: a colour-based technique to detect volunteer potato plants in sugar beet fields using machine vision. The objective was to develop a method based on a short, one-time learning process for a field under certain circumstances, and to subsequently classify the image pixels and plants from that field. Colour vision was chosen as the detection means because of the reasonable price of the hardware and its proven applicability in other agricultural applications (Lee et al. 1999). With colour vision, several features can be chosen to create a plant-specific sensor. Shape, colour and texture are commonly used features for the detection of plants in images (Woebbecke et al. 1995). Compared to shape- and texture-based detection, colour-based detection algorithms are faster and less complex (Perez et al. 2000). However, a colour-based detection system needs to overcome the challenge of operating under natural lighting conditions during the various crop growth stages between April and July.

Earlier research (Nieuwenhuizen et al. 2005) showed that, with a 3-CCD camera, volunteer potato plants could be distinguished based on colour only. One method used a combination of K-means clustering, a Bayes classifier and a resulting colour lookup table; another was a neural network-based classification routine. With the lookup table method, 96% of the volunteer potato plants could be detected in a sugar beet crop. In that approach, however, the plant objects in the images were classified by human inspection of the pixel classification result.

The research reported in this paper built on those results by testing the performance of the two colour-based detection algorithms in two fields. In addition, a low-cost Bayer filter CCD camera was used and the operator-based visual object classification was automated.

Materials and methods

Image acquisition

Image acquisition was performed with a Basler A301f colour camera with a 4.2 mm lens, mounted perpendicular to the soil surface on an in-house built three-wheel platform as shown in Fig. 1. Image acquisition was triggered by a distance sensor on one of the wheels, such that images were taken every 0.5 m in the driving direction. The camera was mounted such that an image covered one beet row and two thirds of the soil area between two adjacent rows. Images (640 × 480 pixels) were stored on a Pentium III PC. During image acquisition, the colour gains and the shutter time of the camera were adjusted continuously based on a grey reference plate placed at the bottom of the camera's field of view. This adaptive grey balance maintained a constant quality of the acquired images under variable outdoor light conditions.

Fig. 1
figure 1

Measurement setup during the field experiment. A: Grey reference plate; B: camera; C: desktop PC; D: wheel trigger
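
The adaptive grey balance is only described functionally above. The following minimal sketch illustrates the underlying idea; the plate position, the target grey value and the per-channel gain interface are assumptions of this sketch, not details from the paper (on the real Basler A301f the gains would be set through its driver API).

```python
import numpy as np

TARGET_GREY = 128                               # assumed target plate value
PLATE_ROI = (slice(440, 480), slice(0, 640))    # assumed plate location (bottom rows)

def grey_balance_gains(image, current_gains):
    """Return updated per-channel gain factors so the grey plate stays neutral.

    image         : H x W x 3 uint8 RGB frame
    current_gains : (r, g, b) gain factors currently applied by the camera
    """
    plate = image[PLATE_ROI].reshape(-1, 3).astype(float)
    means = plate.mean(axis=0)                  # mean R, G, B on the plate
    # Scale each channel gain so the plate mean moves towards the target grey.
    corrections = TARGET_GREY / np.maximum(means, 1.0)
    return tuple(g * c for g, c in zip(current_gains, corrections))
```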

Experiments

In spring 2005, the platform was pushed forward by hand at approximately 1 m/s and images were acquired in two fields with sandy soil. On May 26, 100 images were acquired under sunny conditions in field 1, where the sugar beet plants were in the two- to four-leaf stage. On June 2, another 220 images were acquired under cloudy conditions in field 2, where the sugar beet plants were in the four-leaf stage. Figure 2 shows illustrative examples of images taken in fields 1 and 2; the effects of the different lighting conditions are clearly visible. About 25% of the images did not contain any volunteer potato plants.

Fig. 2
figure 2

Sugar beet plants (SB) and volunteer potato plants (VP) in field 1 (left) acquired under sunny conditions and field 2 (right) acquired under cloudy conditions. The grey reference plate is shown at the bottom of the images. The growth stage of the sugar beets in field 2 was more advanced than in field 1

Image processing and volunteer potato classification

Image processing consisted of three main steps: image pre-processing, pixel classification and plant object classification.

Image pre-processing

The first step of image processing was to correct the images for lens distortion using a nonlinear calibration routine. This resulted in a correct representation of the plant areas in the images used for learning and classification. Secondly, the green plant material was segmented from the soil background, to reduce the calculation time when classifying plant parts into volunteer potato and non-volunteer potato regions. For this segmentation, the excessive green parameter (Woebbecke et al. 1995) (Eq. 1) was thresholded. The threshold for the excessive green value was set at 20, based on the interclass variance in the histograms of the images. A single static threshold could be used because the intensity and colour of the images were kept constant using the reference plate shown in Fig. 1.

$$ \text{ExcessiveGreen} = 2G - R - B $$
(1)

where G = Green pixel value, R = Red pixel value, B = Blue pixel value.
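
As an illustration, the segmentation of Eq. 1 with the threshold of 20 can be written compactly; this sketch assumes an 8-bit RGB image array:

```python
import numpy as np

def excessive_green_mask(rgb, threshold=20):
    """Segment green plant material from soil using Eq. 1.

    rgb : H x W x 3 uint8 image, (R, G, B) channel order assumed.
    Returns a boolean mask that is True for plant pixels.
    """
    rgb = rgb.astype(np.int16)   # avoid uint8 overflow/underflow in 2G - R - B
    excessive_green = 2 * rgb[..., 1] - rgb[..., 0] - rgb[..., 2]
    return excessive_green > threshold
```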

After background elimination, the remaining plant pixels were transformed using the EGRBI transformation matrix (Steward and Tian 1998) as defined in Eq. 2. This transformation separates the intensity information from colour information and allows further analyses based on colour only.

$$ \begin{bmatrix} EG \\ RB \\ I \end{bmatrix} = \begin{bmatrix} -\dfrac{1}{\sqrt{6}} & \dfrac{2}{\sqrt{6}} & -\dfrac{1}{\sqrt{6}} \\[4pt] \dfrac{1}{\sqrt{2}} & 0 & -\dfrac{1}{\sqrt{2}} \\[4pt] \dfrac{1}{\sqrt{3}} & \dfrac{1}{\sqrt{3}} & \dfrac{1}{\sqrt{3}} \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix} $$
(2)

where EG = Excessive Green, RB = Red minus Blue, I = Intensity.
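
A direct sketch of Eq. 2 applied to an array of plant pixels (the matrix values follow Eq. 2 exactly):

```python
import numpy as np

# Rows: EG, RB, I (Eq. 2). Columns act on (R, G, B).
EGRBI = np.array([
    [-1 / np.sqrt(6), 2 / np.sqrt(6), -1 / np.sqrt(6)],   # excessive green
    [ 1 / np.sqrt(2), 0.0,            -1 / np.sqrt(2)],   # red minus blue
    [ 1 / np.sqrt(3), 1 / np.sqrt(3),  1 / np.sqrt(3)],   # intensity
])

def to_egrbi(rgb_pixels):
    """Transform an N x 3 array of (R, G, B) values to (EG, RB, I)."""
    return rgb_pixels.astype(float) @ EGRBI.T
```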

The distributions of the EG and RB values of the plant pixels in sample images from field 1 and field 2 are shown in Fig. 3. Two colour groups are visible, which shows the possibility of using EG and RB values to segment potato pixels from sugar beet pixels. This visually separable distribution of the sugar beet and volunteer potato colour groups in the EG-RB plane was the reason for choosing EG and RB as features for volunteer potato detection.

Fig. 3
figure 3

EG and RB pixel values for potato and sugar beet plants from field 1 (left) and field 2 (right); larger circles represent more pixels with identical values

Pixel classification

For each field, classification was based on five randomly chosen learning images; both classification methods used the same learning images. Because the learning images were chosen at random, the results could indicate whether static or adaptive methods would classify volunteer potato plants better. In the results section, the two fields, five learning images and two methods thus yielded 20 classification runs.

For pixel classification two methods were used. The first method was a combination of K-means clustering and a Bayes classifier (Tang 2002). For clustering of image pixels, the EG and RB features were used together with the Euclidean distance measure. The plant pixels were clustered using the K-means algorithm with eight randomly chosen cluster centres as the starting point. Volunteer potato clusters were identified in the EGRB-clustered image and labelled manually in the learning image. The corresponding RGB values of the labelled clusters were input as a priori data, representing the volunteer potato class for that specific field, to a Bayes classification routine as described by Gonzalez and Woods (1992). After that, all possible (256³ = 16,777,216) RGB colour values were input to the Bayes decision function and a lookup table (LUT) was generated, consisting of all RGB values and a Boolean value for membership of the volunteer potato class. Finally, all pixels in the images from field 1 and field 2 were classified using, per field, the five LUTs generated from that field's five learning images.
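
A condensed sketch of this pipeline is given below. It uses scikit-learn's KMeans and Gaussian class-conditional densities as one plausible reading of the Bayes routine in Gonzalez and Woods (1992); the manual cluster labelling is replaced by a `potato_cluster_ids` argument, which is an assumption of the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_lut(plant_rgb, plant_egrb, potato_cluster_ids, n_clusters=8):
    """Sketch of the K-means/Bayes lookup-table construction.

    plant_rgb         : N x 3 RGB values of the plant pixels in a learning image
    plant_egrb        : N x 2 corresponding (EG, RB) features
    potato_cluster_ids: cluster indices labelled as volunteer potato
                        (labelled manually in the paper; passed in here)
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(plant_egrb)
    is_potato = np.isin(km.labels_, potato_cluster_ids)

    # Gaussian class-conditional densities as the Bayes decision function.
    def fit_log_posterior(x, prior):
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(3)   # regularised
        inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        def log_post(v):
            d = v - mean
            return np.log(prior) - 0.5 * (np.einsum('...i,ij,...j', d, inv, d) + logdet)
        return log_post

    n = len(plant_rgb)
    potato = fit_log_posterior(plant_rgb[is_potato].astype(float), is_potato.sum() / n)
    other = fit_log_posterior(plant_rgb[~is_potato].astype(float), (~is_potato).sum() / n)

    # Evaluate the decision for all 256^3 RGB values, chunk-wise to bound memory.
    lut = np.zeros(256 ** 3, dtype=bool)        # index: r*65536 + g*256 + b
    step = 1 << 20
    for start in range(0, 256 ** 3, step):
        idx = np.arange(start, min(start + step, 256 ** 3))
        rgb = np.stack([idx // 65536, (idx // 256) % 256, idx % 256], axis=1).astype(float)
        lut[idx] = potato(rgb) > other(rgb)
    return lut
```

Classifying an image then reduces to one table lookup per pixel, which also explains the speed advantage of the LUT over the neural network noted in the discussion.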

The second method was to train an Adaptive Resonance Theory 2 (ART2) neural network for Euclidean distance-based clustering (Pao 1989) and then use its weights to form a classifier. An ART2 neural network is an unsupervised learning method that adaptively clusters continuously valued input patterns according to the distribution of the dataset. The iterative learning process decides to which cluster an input pattern of EGRB pixel colour values belongs. In contrast with the fixed number of clusters in K-means clustering, an ART2 network produces a variable number of clusters in accordance with the distribution of the data in the learning image; a vigilance parameter guards the cluster-splitting process. The weights of the neural network contain the cluster representation in EGRB colour space and were saved together with the manually identified volunteer potato clusters in the learning images. Finally, these ten weight files were used for classification of all pixels in the images from field 1 and field 2.
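
Full ART2 involves several interacting processing layers; the following strongly simplified, ART-style sketch shows only the vigilance-guarded cluster creation and adaptation on EGRB patterns, not the actual ART2 dynamics used in the paper.

```python
import numpy as np

def art_like_clustering(patterns, vigilance=0.1, lr=0.5):
    """Strongly simplified ART-style clustering (not full ART2 dynamics).

    patterns  : N x 3 array of (EG, RB, I) pixel values, assumed scaled to [0, 1]
    vigilance : maximum Euclidean distance for a pattern to join a cluster
    lr        : learning rate for moving the winning cluster centre
    """
    centres = [patterns[0].astype(float)]
    for p in patterns[1:]:
        dists = np.linalg.norm(np.array(centres) - p, axis=1)
        winner = np.argmin(dists)
        if dists[winner] <= vigilance:
            # Resonance: adapt the winning centre towards the pattern.
            centres[winner] += lr * (p - centres[winner])
        else:
            # Vigilance failed: create a new cluster, so the number of
            # clusters grows with the distribution of the data.
            centres.append(p.astype(float))
    return np.array(centres)
```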

After the 20 pixel classification runs with both methods, the classification results were evaluated. For this purpose, reference data were needed. After the excessive green thresholding described earlier, all 320 images of fields 1 and 2 were visually evaluated and their objects labelled as volunteer potato or sugar beet; Fig. 4 shows a representative example of these 320 evaluated images. These images were used as a reference to evaluate the performance of the classification and to determine true positive and false positive classified pixels, as defined in Eqs. 3 and 4.

$$ \text{True positive pixels (\%)} = \frac{\text{potato pixels classified as potato pixels}}{\text{total reference potato pixels}} \cdot 100\% $$
(3)
$$ \text{False positive pixels (\%)} = \frac{\text{sugar beet pixels classified as potato pixels}}{\text{total reference sugar beet pixels}} \cdot 100\% $$
(4)
Fig. 4
figure 4

Sugar beet plants (SB) and volunteer potato plants (VP) in an image after correction for lens distortion (left) and binary reference image (right)

The number of classified potato and sugar beet pixels in Eqs. 3 and 4 was derived from the classification results and the total number of potato and sugar beet pixels was calculated from the binary reference images.
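
For one image, these rates follow directly from Boolean masks; a minimal sketch, assuming the reference masks are non-empty:

```python
import numpy as np

def pixel_rates(classified_potato, reference_potato, reference_beet):
    """Eqs. 3 and 4 for one image, from boolean H x W masks.

    classified_potato : pixels the classifier labelled as volunteer potato
    reference_potato  : potato pixels in the binary reference image
    reference_beet    : sugar beet pixels in the binary reference image
    """
    true_pos = 100.0 * (classified_potato & reference_potato).sum() / reference_potato.sum()
    false_pos = 100.0 * (classified_potato & reference_beet).sum() / reference_beet.sum()
    return true_pos, false_pos
```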

Plant object classification

More importantly, the results were evaluated at plant object level, since the interest lies not in detected pixels but in detected volunteer potato plants. A plant object was classified either as a potato plant or as a sugar beet plant. This decision was based on the percentage of pixels within the object classified as potato and a threshold, as defined in Eq. 5.

$$ \begin{aligned} \%\text{ classified pixels in object} &\ge \text{threshold} \Rightarrow \text{object} \in \text{potato plants} \\ \%\text{ classified pixels in object} &< \text{threshold} \Rightarrow \text{object} \in \text{sugar beet plants} \end{aligned} $$
(5)

As in every classification problem, the threshold level in Eq. 5 involves a trade-off between correct classification and misclassification. We decided to accept a misclassification rate of 5% for the sugar beet plants, based on the fact that current band spraying machinery, which is not plant specific, may remove up to 25% of the sugar beet plants. The threshold was set at the level where the misclassification of sugar beet plants was as close as possible to 5%; exactly 5% could not always be attained because of the integer number of sugar beet plants in the images.
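
A sketch of this threshold selection and of the Eq. 5 decision, assuming the per-object percentages of potato-classified pixels have already been computed from the reference images:

```python
import numpy as np

def object_threshold(beet_fractions, target_fp=5.0):
    """Choose the Eq. 5 threshold so that sugar beet misclassification is as
    close as possible to target_fp percent.

    beet_fractions : per sugar beet object, the percentage of its pixels
                     classified as potato (from the reference images)
    """
    candidates = np.unique(np.concatenate([beet_fractions, [100.0]]))
    best_t, best_gap = 100.0, np.inf
    for t in candidates:
        # Beets with at least t percent potato pixels would be called potato.
        fp = 100.0 * (beet_fractions >= t).mean()
        if abs(fp - target_fp) < best_gap:
            best_t, best_gap = t, abs(fp - target_fp)
    return best_t

def classify_objects(object_fractions, threshold):
    """Eq. 5: an object is a potato plant if its classified-pixel
    percentage is at or above the threshold."""
    return np.asarray(object_fractions) >= threshold
```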

For each of the 20 runs, the percentages of true positive and false positive plant classifications were calculated according to Eqs. 6 and 7. The total numbers of potato and sugar beet plants were calculated from the binary reference images.

$$ \text{True positive objects (\%)} = \frac{\text{potato plants classified as potato plants}}{\text{total potato plants}} \cdot 100\% $$
(6)
$$ \text{False positive objects (\%)} = \frac{\text{sugar beet plants classified as potato plants}}{\text{total sugar beet plants}} \cdot 100\% $$
(7)

Results

Pixel classification

The results of pixel classification for the two fields are given in Table 1, and an example of pixel classification is shown in Fig. 5. Firstly, the true positive classification in field 1 shows that between 3% and 41% of the potato plant pixels were classified correctly. Within field 1, the neural network (NN) approach classified a higher percentage of volunteer potato pixels than the K-means/Bayes approach (LUT). Similarly, in field 2, between 11% and 52% of the pixels were correctly classified, and again the NN gave the higher percentages.

Table 1 Pixel classification results for field 1 (100 images) and field 2 (220 images) using two classification methods and five learning images
Fig. 5
figure 5

Sugar beet plants (SB) and volunteer potato plants (VP) in an image with classified pixels in black (left image) and its corresponding image with classified plant objects based on the threshold from Eq. 5 (right image)

Secondly, the false positive classification shows that in field 1 between 5% and 22% of the pixels were misclassified. In contrast, in field 2 the misclassification of sugar beet pixels was much smaller, between 1% and 7%, when learning image 6 is disregarded. Learning image 6 showed almost no visual colour differences between volunteer potato and sugar beet plants. Therefore, it was hard to choose clusters representing the green colours of the volunteer potato plants but not those of the sugar beet plants. As a result, the false positive classification rate was higher than the true positive rate. Finally, the pixel classification results show that the choice of learning image influenced the true and false positive percentages.

Plant object classification

Pixel classification results showed in general a higher true positive rate for volunteer potato pixels than the false positive rate for sugar beet pixels. Therefore, volunteer potato and sugar beet can be distinguished based on the classification percentage, and this information was used to set up the plant object classification routine. Table 2 shows the true and false positive plant classification percentages, as well as the threshold used to classify objects as volunteer potato or sugar beet using Eq. 5. Because the number of crop plants is an integer, a misclassification rate of exactly 5% for the sugar beets could not always be achieved; the closest approximation is given in Table 2. The true positive rate in field 2 was much higher than in field 1, except for learning image 6. The zero percent classification rate for image 6 in field 2 was caused by the poor pixel classification result, where the false positive percentage was larger than the true positive percentage. This negatively affected the plant classification results, and with the threshold level of 38% still 0.0% of the volunteer potato plants were classified.

Table 2 Plant classification results for both fields

Discussion

Pixel classification

The main reason for the differences in classification results between field 1 and field 2 was the overlapping distributions in EG-RB space of the field 1 images (Fig. 3). In field 1 the two classes were not well separated; therefore, the false and true positive classification results were closer to each other. The differences within the fields were caused by the quality and contents of the learning images. Although the learning images were chosen randomly, they may not have represented the actual colour distribution of the two classes for the complete field, of which learning image 6 was an example. Regarding the difference between the two classification methods, larger differences in performance between the Bayes classifier and the neural network were expected, because the latter could adapt itself better to the variation of image conditions during the clustering process. However, in line with the findings of Marchant and Onyango (2003), there were no large differences in pixel classification performance between the Bayesian classifier and the neural network routine. A reason for the similar performance is that both algorithms use the Euclidean distance between the pattern and the cluster centres as the decision measure for cluster membership.

Plant object classification

In field 2, a higher proportion of the volunteer plants was classified true positive. These higher true positive rates were reached with lower threshold levels on the plant pixels classified, indicating that a relatively larger share of the volunteer potato pixels had already been classified when 5% misclassification of sugar beets was reached. Field 1, on the other hand, gave lower true positive rates. This might be due to smaller colour differences between sugar beet and volunteer potato plants, as shown in Fig. 3, caused by direct sunlight, which often results in specular effects and colour fading on plant pixels. The neural network performed slightly better with learning images 3, 6, 9 and 10, indicating that the adaptive clustering was successful for these learning images. Possibly, using multiple learning images would improve the classification results, but this was not within the objectives of this research. Learning image 6 from field 2 yielded no classified volunteer potato plants when the LUT was used. This was due to the high amount of misclassification of the sugar beet plants: at the threshold corresponding to 5% sugar beet misclassification, no volunteer plant had more pixels classified than the threshold level of 38%. Overall, this resulted in true positive classification rates between 11% and 49% in field 1, and between 56% and 97% in field 2 when learning image 6 is omitted. With the automatic classification procedure described in this paper, it was possible to reach over 95% true positive classification, as previously predicted (Nieuwenhuizen et al. 2005).

General

The results show a discrepancy in classification performance between the two sampling days. Several factors are responsible for this. Firstly, the outdoor lighting conditions differed between the days. In field 1 the images were acquired under sunny conditions. This caused shadows in the images, and shadowed leaves have different colours than leaves in the sun or under overcast conditions. Such shadow effects within plants are not corrected for by changing and updating the white balance. Direct sunlight also causes colour fading in images. The images taken under overcast conditions did not have shadow effects, which largely explains the better classification results. Secondly, the growth stage of the plants changed between the days of image acquisition. Figure 2 shows that the sugar beet plants were larger in field 2; therefore, the number of pixels available as training data was larger, which resulted in a better representation of the two classes.

The algorithms applied in this research were based on colour only and were not adaptive to colour changes of the plants in the field: the classifiers were trained once on five learning images, resulting in static classifiers. The changing thresholds in Table 2, needed to maintain a constant misclassification rate of approximately 5%, indicate that adaptive methods are needed to classify volunteer potatoes and sugar beets correctly in a field situation. Possible improvements on the current classification scheme can therefore be made in several ways. Firstly, the detection algorithms could be made adaptive to colour changes, for example by iteratively re-learning the lookup table or the neural network. Using the average colour of plant objects might also be more efficient for learning and classification, as it is computationally less intensive. Secondly, more plant features, such as texture, shape and near-infrared reflection properties, could be used. Hemming and Rath (2001) also included crop row distances and morphological features of the plant objects to improve classification results. Especially an adaptive method that accounts for changing plant parameters in the field should be able to outperform static classification methods based on single static learning images.

In the software, applying a lookup table was four times faster than the neural network implementation, although neither application was optimised for processing speed. The reason for this difference is that indexing a lookup table is computationally less expensive than evaluating a neural network-based classifier.

Some mixed binary objects, due to occluded leaves, were present in our data. In field 1, two volunteer plants occluded sugar beet plants, which was 1.1% of the total number of plants appearing in the images. In field 2, eleven plants were occluded, which was 2.2% of the total number of plants. This amount was higher in field 2 due to the more advanced growth stage of the crop and the volunteer plants. This small number of occlusions could not have had a major influence on the results. In any case, the occluded objects were not taken into account in the calculation of the results, as they were labelled as a separate group when the reference images were made.

Conclusions

In this research, two colour-based classification schemes, namely an adaptive neural network (ART2) scheme and a K-means clustering/Bayes classification scheme, were developed and field tested for volunteer potato plant detection in sugar beet fields. Up to 97% of the volunteer potato plants could be detected in a test field under cloudy conditions using the neural network classification. In another test field under sunny conditions, up to 49% of the potato plants could be detected by both the neural network and the Bayes classification scheme. The colour-based algorithms alone were thus not yet able to achieve this detection rate consistently across different field situations. The performance of the volunteer potato detection algorithm under outdoor field conditions depended on both the plant growth stage and the light conditions. The results showed that an improved adaptive method is needed to achieve a consistent classification performance across fields. Adaptive methods for plant object classification are currently being incorporated and evaluated under practical conditions.