
Bunet: An effective and efficient segmentation method based on bilateral encoder-decoder structure for rapid detection of apple tree branches

Published in Applied Intelligence.

Abstract

Automatic apple-harvesting robots have received much research attention in recent years as a way to lower harvesting costs. A fundamental problem for harvesting robots is how to detect branches quickly and accurately, with limited hardware resources, so as to avoid collisions. In this paper, we propose a lightweight, highly accurate, real-time semantic segmentation network, the Bilateral U-shape Network (BUNet), to segment apple tree branches. BUNet consists mainly of a U-shaped detail branch and a U-shaped semantic branch: the former captures spatial details and the latter supplements semantic information. The two U-shaped branches complement each other, retaining the high accuracy of an encoder-decoder backbone while preserving the efficiency and effectiveness of a two-pathway backbone. In addition, a Simplified Attention Fusion Module (SAFM) is proposed to effectively fuse the different levels of information from the two branches for pixel-wise prediction. Experimental results on our self-constructed dataset show that BUNet achieves the highest Intersection over Union (IoU) and F1-score in branch segmentation, 75.96% and 86.34% respectively, with the fewest parameters (0.93M) and 11.94G floating-point operations (FLOPs). Meanwhile, BUNet runs at 110.32 frames per second (FPS) with an input image size of 1280×720 pixels. These results confirm that the proposed method can effectively detect branches and can therefore be used to plan obstacle-avoidance paths for harvesting robots.
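The abstract reports segmentation quality as Intersection over Union (IoU) and F1-score. As a minimal sketch of the standard definitions of these two metrics for binary masks (not the authors' evaluation code; the function name and toy masks below are illustrative):

```python
def iou_and_f1(pred, target):
    """IoU and F1 for flat binary masks (sequences of 0/1), standard definitions."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)       # true positives
    fp = sum(1 for p, t in zip(pred, target) if p and not t)   # false positives
    fn = sum(1 for p, t in zip(pred, target) if t and not p)   # false negatives
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return iou, f1

# Toy example: tp=2, fp=1, fn=0, so IoU = 2/3 and F1 = 0.8
iou, f1 = iou_and_f1([1, 1, 1, 0], [1, 1, 0, 0])
```

For binary segmentation these two metrics are linked by F1 = 2·IoU/(1 + IoU), which is consistent with the reported figures: 2 × 0.7596 / 1.7596 ≈ 0.8634.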


Availability of data and materials

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Correspondence to Zeming Fan.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Shanshan Zhang and Hao Wan contributed equally as co-first authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, S., Wan, H., Fan, Z. et al. Bunet: An effective and efficient segmentation method based on bilateral encoder-decoder structure for rapid detection of apple tree branches. Appl Intell 53, 23336–23348 (2023). https://doi.org/10.1007/s10489-023-04742-x
