Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’

Koirala, A.; Walsh, K. B.; Wang, Z.; McCarthy, C.

doi:10.1007/s11119-019-09642-0


Published: 28 February 2019

Volume 20, pages 1107–1135 (2019)

Precision Agriculture

A. Koirala (ORCID: orcid.org/0000-0001-7376-2143)¹, K. B. Walsh¹, Z. Wang¹ & C. McCarthy²

7513 Accesses
263 Citations
6 Altmetric

Abstract

The performance of six existing deep learning architectures was compared for the task of detecting mango fruit in images of tree canopies. Images of trees (n = 1 515) from across five orchards were acquired at night using a 5-megapixel RGB digital camera and 720 W of LED flood lighting in a rig mounted on a farm utility vehicle operating at 6 km/h. The two-stage architectures Faster R-CNN(VGG) and Faster R-CNN(ZF), and the single-stage techniques YOLOv3, YOLOv2, YOLOv2(tiny) and SSD, were trained with both the original-resolution and the 512 × 512 pixel versions of 1 300 training tiles. The exception was YOLOv3, which was run only with 512 × 512 pixel images, giving a total of eleven models. A new architecture was also developed, based on features of YOLOv3 and YOLOv2(tiny), with accuracy and speed for the current application as the design criteria. This architecture, termed ‘MangoYOLO’, was trained using (i) the 1 300 tile training set, (ii) the COCO dataset before training on the mango training set, and (iii) a daytime image training set from a previous publication, to create the MangoYOLO models ‘s’, ‘pt’ and ‘bu’, respectively. Average Precision plateaued with the use of around 400 training tiles. MangoYOLO(pt) achieved an F1 score of 0.968 and an Average Precision of 0.983 on a test set independent of the training set, outperforming the other algorithms. It achieved a detection speed of 8 ms per 512 × 512 pixel image tile while using just 833 Mb of GPU memory per image (on an NVIDIA GeForce GTX 1070 Ti GPU, as used for in-field application). The MangoYOLO model also outperformed the other models in processing full images (2 048 × 2 048 pixels), requiring just 70 ms per image (i.e., capable of processing about 14 fps) and using 4 417 Mb of GPU memory. The model was robust when used with images from other orchards, cultivars and lighting conditions. MangoYOLO(bu) achieved an F1 score of 0.89 on a daytime mango image dataset.
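The detection results above are reported as F1 scores and as processing times per image. The helpers below show how those figures relate to raw counts and latencies. This is a minimal sketch under stated assumptions. The function names are hypothetical, and the step that matches detections to labelled fruit (for example, an IoU threshold) is not part of it: the true-positive, false-positive and false-negative counts are assumed to be given.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from already-matched detection counts.

    Hypothetical helper: how detections are matched to labelled fruit
    (e.g. an IoU threshold) is assumed to have been done beforehand.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def images_per_second(ms_per_image: float) -> float:
    """Convert a per-image processing time in milliseconds into throughput."""
    if ms_per_image <= 0:
        raise ValueError("processing time must be positive")
    return 1000.0 / ms_per_image


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(precision_recall_f1(tp=95, fp=3, fn=4))
    # 70 ms per 2 048 x 2 048 image, the figure reported in the abstract.
    print(round(images_per_second(70), 1))  # 14.3
```

The second call shows the arithmetic behind "about 14 fps": 1000 ms / 70 ms ≈ 14.3 images per second. This assumes that no other per-frame overhead is counted.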
Using a correction factor estimated, for each orchard, from the ratio of the human count of fruit in images of both sides of sample trees to a hand-harvest count of all fruit on those trees, MangoYOLO(pt) produced orchard fruit load estimates that differed from packhouse fruit counts by between 4.6 and 15.2% for the five orchards considered. The labelled images (1 300 training, 130 validation and 300 test tiles) from this study are available for comparative studies.
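The correction-factor step can be sketched as follows. The abstract does not say whether the factor was pooled across sample trees or averaged per tree, nor exactly how it was applied to the detector output. This sketch therefore assumes three things. First, the factor is pooled: total harvest count divided by total image count. It is written as harvest ÷ image so that it multiplies, which is equivalent to dividing by the image ÷ harvest ratio. Second, it scales the summed two-sided MangoYOLO counts for the orchard's trees. Third, the result is compared with the packhouse count as an absolute percentage difference. All function names are hypothetical.

```python
def correction_factor(image_counts: list[int], harvest_counts: list[int]) -> float:
    """Pooled ratio of hand-harvest fruit count to fruit counted in images
    of both sides of the same sample trees (pooling is an assumption)."""
    if not image_counts or len(image_counts) != len(harvest_counts):
        raise ValueError("need matching, non-empty per-tree counts")
    total_in_images = sum(image_counts)
    if total_in_images == 0:
        raise ValueError("no fruit counted in sample-tree images")
    return sum(harvest_counts) / total_in_images


def orchard_load_estimate(detected_per_tree: list[int], factor: float) -> float:
    """Scale the summed two-sided detector counts (assumed application)."""
    return factor * sum(detected_per_tree)


def percent_difference(estimate: float, packhouse_count: int) -> float:
    """Absolute difference from the packhouse count, as % of that count."""
    return abs(estimate - packhouse_count) * 100.0 / packhouse_count
```

Consider a hypothetical orchard whose sample trees showed 100 fruit in their images and yielded 150 fruit at hand harvest. The factor is 1.5, so 600 detections across the orchard would give an estimate of 900 fruit. A packhouse count of 1 000 would then correspond to a 10% difference.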


References

Anderson, N., Underwood, J., Rahman, M., Robson, A., & Walsh, K. (2018). Estimation of fruit load in mango orchards: tree sampling considerations and use of machine vision and satellite imagery. Precision Agric. https://doi.org/10.1007/s11119-018-9614-1.
Bargoti, S., & Underwood, J. (2017a). Deep fruit detection in orchards. In Proceedings of the IEEE international conference on robotics and automation (pp. 3626–3633). https://doi.org/10.1109/icra.2017.7989417.
Bargoti, S., & Underwood, J. P. (2017b). Image segmentation for fruit detection and yield estimation in apple orchards. J Field Robot, 34, 1039–1060. https://doi.org/10.1002/rob.21699.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 248–255). https://doi.org/10.1109/cvpr.2009.5206848.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. Int J Comput Vis, 88, 303–338. https://doi.org/10.1007/s11263-009-0275-4.
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448). https://doi.org/10.1109/iccv.2015.169.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 580–587). https://doi.org/10.1109/cvpr.2014.81.
Gongal, A., Amatya, S., Karkee, M., Zhang, Q., & Lewis, K. (2015). Sensors and systems for fruit detection and localization: a review. Comput Electron Agric, 116, 8–19. https://doi.org/10.1016/j.compag.2015.05.021.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988). https://doi.org/10.1109/iccv.2017.322.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 770–778).
Herold, B., Kawano, S., Sumpf, B., Tillmann, P., & Walsh, K. B. (2009). Chapter 3. VIS/NIR spectroscopy. In M. Zude (Ed.), Optical monitoring of fresh and processed agricultural crops (pp. 141–249). Boca Raton, USA: CRC Press.
Hung, C., Underwood, J., Nieto, J., & Sukkarieh, S. (2015). A feature learning based approach for automated fruit yield estimation. In A. Zelinsky (Ed.), Field and service robotics (pp. 485–498). Cham: Springer. https://doi.org/10.1007/978-3-319-07488-7_33.
Jimenez, A., Ceres, R., & Pons, J. (2000). A survey of computer vision methods for locating fruit on trees. Trans ASAE, 43, 1911–1920. https://doi.org/10.13031/2013.3096.
Kadir, M. F. A., Yusri, N. A. N., Rizon, M., Bin Mamat, A. R., Jamal, A. A., & Makhtar, M. (2015). Automatic mango detection using texture analysis and randomised Hough transform. Appl Math Sci, 9, 6427–6436. https://doi.org/10.12988/ams.2015.53290.
Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: a survey. Comput Electron Agric, 147, 70–90. https://doi.org/10.1016/j.compag.2018.02.016.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017a). Feature pyramid networks for object detection. In IEEE conference on computer vision and pattern recognition (pp. 936–944). https://doi.org/10.1109/cvpr.2017.106.
Lin, T.-Y., et al. (2014). Microsoft COCO: common objects in context. In European conference on computer vision (pp. 740–755). Springer.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2999–3007). https://doi.org/10.1109/iccv.2017.324.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: single shot multibox detector. In European conference on computer vision (pp. 21–37). Springer. https://doi.org/10.1007/978-3-319-46448-0_2.
Nanaa, K., Rizon, M., Rahman, M. N. A., Ibrahim, Y., & Aziz, A. Z. A. (2014). Detecting mango fruits by using randomized Hough transform and backpropagation neural network. In Proceedings of the international conference on information visualisation (pp. 388–391). https://doi.org/10.1109/iv.2014.54.
Payne, A., & Walsh, K. (2014). Chapter 16. Machine vision in estimation of fruit crop yield. In Y. Ibaraki & S. D. Gupta (Eds.), Plant image analysis: fundamentals and applications (pp. 329–374). Boca Raton, FL, USA: CRC Press.
Payne, A., Walsh, K., Subedi, P., & Jarvis, D. (2014). Estimating mango crop yield using image analysis using fruit at ‘stone hardening’ stage and night time imaging. Comput Electron Agric, 100, 160–167. https://doi.org/10.1016/j.compag.2013.11.011.
Payne, A. B., Walsh, K. B., Subedi, P., & Jarvis, D. (2013). Estimation of mango crop yield using image analysis–segmentation method. Comput Electron Agric, 91, 57–64. https://doi.org/10.1016/j.compag.2012.11.009.
Qureshi, W. S., Payne, A., Walsh, K. B., Linker, R., Cohen, O., & Dailey, M. N. (2017). Machine vision for counting fruit on mango tree canopies. Precis Agric, 18, 224–244. https://doi.org/10.1007/s11119-016-9458-5.
Redmon, J. (2018). Darknet: open source neural networks in C. https://pjreddie.com/darknet/. Accessed 23 March 2018.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: unified, real-time object detection. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 779–788). https://doi.org/10.1109/cvpr.2016.91.
Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263–7271). https://doi.org/10.1109/cvpr.2017.690.
Redmon, J., & Farhadi, A. (2018). YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91–99).
Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., & McCool, C. (2016). Deep fruits: a fruit detection system using deep neural networks. Sensors. https://doi.org/10.3390/s16081222.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014). Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806.
Stein, M., Bargoti, S., & Underwood, J. (2016). Image based mango fruit detection, localisation and yield estimation using multiple view geometry. Sensors. https://doi.org/10.3390/s16111915.
Syal, A., Garg, D., & Sharma, S. (2013). A survey of computer vision methods for counting fruits and yield prediction. Int J Comput Sci Eng, 2, 346–350.
Underwood, J. P., Rahman, M. M., Robson, A., Walsh, K. B., Koirala, A., & Wang, Z. (2018). Fruit load estimation in mango orchards: a method comparison. Paper presented at the ICRA 2018 workshop on robotic vision and action in agriculture, Brisbane, Australia.
Walsh, K., & Wang, Z. (2018). Monitoring fruit quality and quantity in mangoes. In V. Galán Saúco & P. Lu (Eds.), Achieving sustainable cultivation of mangoes (pp. 313–338). Cambridge, UK: Burleigh Dodds Science Publishing.
Wang, Z., Underwood, J., & Walsh, K. B. (2018). Machine vision assessment of mango orchard flowering. Comput Electron Agric, 151, 501–511. https://doi.org/10.1016/j.compag.2018.06.040.
Wang, Z., Walsh, K. B., & Verma, B. (2017). On-tree mango fruit size estimation using RGB-D images. Sensors. https://doi.org/10.3390/s17122738.
Zeiler, M. D. (2014). Visualizing and understanding convolutional networks. LNCS. https://doi.org/10.1007/978-3-319-10590-1_53.

Acknowledgement

This work received funding support from the Australian Federal Department of Agriculture and Water, and from Horticulture Innovation Australia (Project ST15005, Multiscale monitoring of tropical fruit production). AK acknowledges receipt of an Australian Regional Universities Network scholarship, and ZW was funded by a CQU Early Career Fellowship. Farm support from Chad Simpson and Ivan Philpott is appreciated. The assistance of Jason Bell of the CQUniversity High Performance Computing cluster is acknowledged. The work also benefitted from discussion with AlexeyAB (https://github.com/AlexeyAB).

Author information

Authors and Affiliations

Institute for Future Farming Systems, Central Queensland University, Building 361, Bruce Highway, Rockhampton, QLD, 4701, Australia
A. Koirala, K. B. Walsh & Z. Wang
Centre for Agricultural Engineering (Operations), University of Southern Queensland, Building P9-132, West Street, Toowoomba, QLD, 4350, Australia
C. McCarthy


Corresponding author

Correspondence to A. Koirala.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Koirala, A., Walsh, K.B., Wang, Z. et al. Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’. Precision Agric 20, 1107–1135 (2019). https://doi.org/10.1007/s11119-019-09642-0


Issue Date: December 2019
DOI: https://doi.org/10.1007/s11119-019-09642-0
