Abstract
The performance of six existing deep learning architectures was compared for the task of detecting mango fruit in images of tree canopies. Images of trees (n = 1 515) from across five orchards were acquired at night using a 5 Mega-pixel RGB digital camera and 720 W of LED flood lighting in a rig mounted on a farm utility vehicle operating at 6 km/h. The two-stage deep learning architectures Faster R-CNN(VGG) and Faster R-CNN(ZF), and the single-stage techniques YOLOv3, YOLOv2, YOLOv2(tiny) and SSD, were trained with both original resolution and 512 × 512 pixel versions of 1 300 training tiles, except that YOLOv3 was run only with 512 × 512 pixel images, giving a total of eleven models. A new architecture was also developed, based on features of YOLOv3 and YOLOv2(tiny), with accuracy and speed for the current application as the design criteria. This architecture, termed ‘MangoYOLO’, was trained using: (i) the 1 300 tile training set, (ii) the COCO dataset before training on the mango training set, and (iii) a daytime image training set from a previous publication, to create the MangoYOLO models ‘s’, ‘pt’ and ‘bu’, respectively. Average Precision plateaued with use of around 400 training tiles. MangoYOLO(pt) achieved an F1 score of 0.968 and an Average Precision of 0.983 on a test set independent of the training set, outperforming the other algorithms, with a detection speed of 8 ms per 512 × 512 pixel image tile while using just 833 Mb of GPU memory per image (on an NVIDIA GeForce GTX 1070 Ti GPU used for in-field application). The MangoYOLO model also outperformed the other models in processing full images, requiring just 70 ms per 2 048 × 2 048 pixel image (i.e., about 14 fps) with use of 4 417 Mb of GPU memory. The model was robust in use with images of other orchards, cultivars and lighting conditions. MangoYOLO(bu) achieved an F1 score of 0.89 on a daytime mango image dataset.
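As a rough illustration of two of the figures quoted above, the following Python sketch shows how an F1 score is obtained from detection counts and how a per-image processing time converts to a frame rate. The function names and the detection counts are hypothetical and are not taken from the study; only the 70 ms per full image figure comes from the text.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_pos == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def frames_per_second(ms_per_image: float) -> float:
    """Convert a per-image processing time in milliseconds to a frame rate."""
    return 1000.0 / ms_per_image


# Hypothetical detection counts, chosen only to exercise the formula.
example_f1 = f1_score(true_pos=90, false_pos=5, false_neg=10)

# 70 ms per full image is the figure reported in the abstract.
full_image_fps = frames_per_second(70.0)

print(f"Example F1 score: {example_f1:.3f}")
print(f"Full-image throughput: {full_image_fps:.1f} fps")
```

The frame rate computed from 70 ms per image is consistent with the approximate 14 fps stated in the abstract.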
Using a correction factor for each orchard, estimated from the ratio between a human count of fruit in images of the two sides of sample trees and a hand harvest count of all fruit on those trees, MangoYOLO(pt) achieved orchard fruit load estimates within 4.6 to 15.2% of packhouse fruit counts for the five orchards considered. The labelled images (1 300 training, 130 validation and 300 test) of this study are available for comparative studies.
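The correction-factor step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the factor is the hand harvest count divided by the image-based count (so that multiplying a machine vision count by it accounts for fruit not visible in images), and that the comparison with the packhouse count is an absolute percentage difference. All function names and numbers are hypothetical.

```python
from typing import Sequence


def correction_factor(harvest_counts: Sequence[int],
                      image_counts: Sequence[int]) -> float:
    """Ratio of hand harvest count to image-based count over the sample trees.

    Assumption: the factor is harvest / image count, pooled over sample trees.
    """
    total_image = sum(image_counts)
    if total_image == 0:
        raise ValueError("image counts must sum to a positive number")
    return sum(harvest_counts) / total_image


def estimate_orchard_load(detected_total: int, factor: float) -> float:
    """Scale the machine vision count for the whole orchard by the factor."""
    return detected_total * factor


def percent_difference(estimate: float, packhouse_count: int) -> float:
    """Absolute difference between estimate and packhouse count, in percent."""
    return abs(estimate - packhouse_count) / packhouse_count * 100.0


# Hypothetical sample trees: hand harvest counts and dual-side image counts.
harvest = [120, 150, 90]
in_images = [80, 100, 60]

factor = correction_factor(harvest, in_images)
estimate = estimate_orchard_load(detected_total=10_000, factor=factor)
difference = percent_difference(estimate, packhouse_count=16_000)

print(f"Correction factor: {factor:.2f}")
print(f"Estimated orchard load: {estimate:.0f} fruit")
print(f"Difference from packhouse count: {difference:.2f}%")
```

Pooling the counts over all sample trees before taking the ratio is one possible design choice; averaging per-tree ratios would weight small and large trees equally and may give a different factor.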
Acknowledgement
This work received funding support from the Australian Federal Department of Agriculture and Water, and from Horticulture Innovation Australia (Project ST15005, Multiscale monitoring of tropical fruit production). AK acknowledges receipt of an Australian Regional Universities Network scholarship, and ZW was funded by a CQU Early Career Fellowship. Farm support from Chad Simpson and Ivan Philpott is appreciated. The assistance of Jason Bell of the CQUniversity High Performance Computing cluster is acknowledged. The work also benefitted from discussion with AlexeyAB (https://github.com/AlexeyAB).
Cite this article
Koirala, A., Walsh, K.B., Wang, Z. et al. Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’. Precision Agric 20, 1107–1135 (2019). https://doi.org/10.1007/s11119-019-09642-0
DOI: https://doi.org/10.1007/s11119-019-09642-0