Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’

Published in Precision Agriculture.


The performance of six existing deep learning architectures was compared for the task of detecting mango fruit in images of tree canopies. Images of trees (n = 1 515) from across five orchards were acquired at night using a 5 megapixel RGB digital camera and 720 W of LED flood lighting in a rig mounted on a farm utility vehicle operating at 6 km/h. The two-stage deep learning architectures Faster R-CNN(VGG) and Faster R-CNN(ZF), and the single-stage techniques YOLOv3, YOLOv2, YOLOv2(tiny) and SSD, were trained with both original-resolution and 512 × 512 pixel versions of 1 300 training tiles, except that YOLOv3 was run only with 512 × 512 pixel images, giving a total of eleven models. A new architecture was also developed, based on features of YOLOv3 and YOLOv2(tiny) and on the design criteria of accuracy and speed for the current application. This architecture, termed ‘MangoYOLO’, was trained using: (i) the 1 300-tile training set, (ii) the COCO dataset before training on the mango training set, and (iii) a daytime image training set from a previous publication, to create the MangoYOLO models ‘s’, ‘pt’ and ‘bu’, respectively. Average Precision plateaued with use of around 400 training tiles. MangoYOLO(pt) achieved an F1 score of 0.968 and an Average Precision of 0.983 on a test set independent of the training set, outperforming the other algorithms, with a detection speed of 8 ms per 512 × 512 pixel image tile while using just 833 MB of GPU memory per image on the NVIDIA GeForce GTX 1070 Ti GPU used for in-field application. The MangoYOLO model also outperformed the other models in processing of full images, requiring just 70 ms per 2 048 × 2 048 pixel image (i.e., capable of processing ~ 14 fps) with use of 4 417 MB of GPU memory. The model was robust in use with images of other orchards, cultivars and lighting conditions. MangoYOLO(bu) achieved an F1 score of 0.89 on a daytime mango image dataset.
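The F1 scores reported above combine detection precision and recall in the standard way. A minimal sketch of that computation, using hypothetical true-positive/false-positive/false-negative counts chosen only to illustrate the formula (not the paper's actual data):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of detection precision and recall."""
    precision = tp / (tp + fp)  # correct detections / all detections
    recall = tp / (tp + fn)     # correct detections / all ground-truth fruit
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts that happen to yield the reported F1 of 0.968.
print(round(f1_score(tp=968, fp=20, fn=44), 3))  # → 0.968
```

Average Precision additionally sweeps the detection confidence threshold, summarising the whole precision-recall curve rather than one operating point.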
With use of a correction factor estimated from the ratio between the human count of fruit in images of the two sides of sample trees per orchard and a hand-harvest count of all fruit on those trees, MangoYOLO(pt) achieved orchard fruit load estimates that were within 4.6–15.2% of packhouse fruit counts for the five orchards considered. The labelled images (1 300 training, 130 validation and 300 test) of this study are available for comparative studies.
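The orchard-level correction described above is a simple ratio adjustment. A sketch under the assumption that the correction factor is the harvest-count-to-image-count ratio on the sample trees (the function name and all numbers are invented for illustration, not taken from the paper):

```python
def orchard_estimate(machine_count_all_trees: float,
                     human_image_count_sample: float,
                     harvest_count_sample: float) -> float:
    """Scale the machine-vision fruit count for the whole orchard by the
    ratio of hand-harvest count to human in-image count on the same
    sample trees, compensating for fruit hidden from the camera."""
    correction = harvest_count_sample / human_image_count_sample
    return machine_count_all_trees * correction

# Hypothetical numbers: 10 000 fruit detected across the orchard; on the
# sample trees, 450 fruit were visible in images vs 600 at harvest.
print(round(orchard_estimate(10_000, 450, 600), 1))  # → 13333.3
```

The correction is estimated per orchard because canopy density, and hence occlusion, differs between orchards.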




Acknowledgements

This work received funding support from the Australian Federal Department of Agriculture and Water, and from Horticulture Innovation Australia (Project ST15005, Multiscale monitoring of tropical fruit production). AK acknowledges receipt of an Australian Regional Universities Network scholarship, and ZW was funded by a CQU Early Career Fellowship. Farm support from Chad Simpson and Ivan Philpott is appreciated. The assistance of Jason Bell of the CQUniversity High Performance Computing cluster is acknowledged. The work also benefitted from discussion with AlexeyAB.

Author information

Correspondence to A. Koirala.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Koirala, A., Walsh, K.B., Wang, Z. et al. Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’. Precision Agric 20, 1107–1135 (2019).
