Comparison of Machine Learning and Deep Learning Methods for Grape Cluster Segmentation

Automatic grape detection is one of the first steps towards automatic yield estimation. This step is often performed with a computer vision algorithm using the classic feature extraction and classification approach. Many grape bunch detection algorithms have been proposed in the last decade, and most of them follow this standard approach. An alternative is semantic segmentation with deep learning models. The main objective of this work is to compare existing machine learning algorithms with encoder-decoder segmentation models (UNet and PSPNet). The comparison was performed on two challenging datasets of white grape varieties in natural lighting conditions. The UNet model performed best on both datasets, with an IoU score of up to 0.76 (compared to 0.59 IoU for the second-best model). UNet was combined with a linear model to estimate the total number of grape bunches on 200 plants and reached 86% counting accuracy. The results show that deep learning models are more robust for white grape detection than classic segmentation techniques. This is an important property for early yield estimation before veraison.
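For readers unfamiliar with the metric, the IoU (Intersection over Union) score reported above can be illustrated with a minimal sketch. This is not the evaluation code used in this work; the function name `iou_score`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration only.

```python
def iou_score(pred_mask, true_mask):
    """Intersection over Union between two binary masks of equal shape.

    Masks are nested lists (rows of 0/1 or False/True values), where a
    truthy value marks a pixel labelled as grape. Hypothetical helper,
    not the evaluation code of the paper.
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels labelled grape in both masks
    union = 0         # pixels labelled grape in at least one mask
    for pred_row, true_row in zip(pred_mask, true_mask):
        if len(pred_row) != len(true_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, true_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            union += p or t

    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy 2x3 example: 2 pixels overlap, 4 pixels are covered in total.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(iou_score(predicted, ground_truth))  # 0.5
```

An IoU of 1 means the predicted and annotated grape regions coincide exactly, while 0 means they do not overlap at all.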


  • Grape detection
  • Precision viticulture
  • Deep learning
  • Semantic segmentation

This work was carried out within the project AI4DI: Artificial Intelligence for Digitizing Industry, under grant agreement No 826060. The project is co-funded by grants from Germany, Austria, Finland, France, Norway, Latvia, Belgium, Italy, Switzerland, and the Czech Republic, and by the Electronic Component Systems for European Leadership Joint Undertaking (ECSEL JU).

We want to thank Vranken-Pommery Monopole, our partner in the AI4DI project, for allowing image collection in their vineyards. We also thank the ROMEO Computing Center of Université de Reims Champagne-Ardenne, whose Nvidia DGX-1 server allowed us to accelerate the training steps and compare several model approaches.

