MASPP and MWASP: multi-head self-attention based modules for UNet network in melon spot segmentation

Tran, Khoa-Dang; Ho, Trang-Thi; Huang, Yennun; Le, Nguyen Quoc Khanh; Tuan, Le Quoc; Ho, Van Lam

doi:10.1007/s11694-024-02466-1

Original Paper
Published: 29 March 2024


Journal of Food Measurement and Characterization

Khoa-Dang Tran¹,
Trang-Thi Ho ORCID: orcid.org/0000-0001-7541-3932²,
Yennun Huang¹,
Nguyen Quoc Khanh Le³,
Le Quoc Tuan⁴ &
…
Van Lam Ho⁵

Abstract

Sweet melon, and spotted melon in particular, is one of the most profitable fruit crops for farmers in the international market. Because the spot ratio affects a melon’s visual appeal, it plays a significant role in shaping consumers’ first impressions and their decision to purchase a spotted melon. However, accurately determining the spot area on a melon’s skin is challenging because spot sizes and colors vary widely among melon types. In this study, novel networks based on the UNet model are proposed to accurately determine the spot area on melon skins after harvest. First, a Mask R-CNN model is employed to isolate the melons from unwanted objects and backgrounds. Then, novel variants of Atrous Spatial Pyramid Pooling (ASPP) and Waterfall Atrous Spatial Pooling (WASP) are developed using multi-head self-attention (MHSA) to enhance the original structures. Finally, the proposed modules are integrated into a VGG16-UNet network to segment the spots on the melon skin. Experimental results show that the proposed methods achieve a mean IoU of 89.86% and an accuracy of 99.45% across all classes, outperforming other existing models.
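
For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of how mean IoU, pixel accuracy, and a spot ratio are commonly computed from segmentation masks. It is not the authors' implementation: the label scheme (0 = background, 1 = skin, 2 = spot), the function names, and the definition of the spot ratio as spot pixels divided by all melon pixels are assumptions made for illustration, and the paper may define its metrics differently.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D grid of integer class labels

# Hypothetical label scheme, assumed for illustration only.
BACKGROUND, SKIN, SPOT = 0, 1, 2


def per_class_counts(pred: Mask, target: Mask,
                     num_classes: int) -> Tuple[List[int], List[int]]:
    """Count intersection and union pixels for each class."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return inter, union


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Average IoU over the classes that occur in either mask."""
    inter, union = per_class_counts(pred, target, num_classes)
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label equals the target label."""
    correct = sum(p == t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(len(row) for row in target)
    return correct / total


def spot_ratio(mask: Mask) -> float:
    """Spot pixels divided by all melon (non-background) pixels."""
    melon = sum(label != BACKGROUND for row in mask for label in row)
    spots = sum(label == SPOT for row in mask for label in row)
    return spots / melon if melon else 0.0


if __name__ == "__main__":
    target = [[0, 1, 1, 2],
              [0, 1, 2, 2]]
    pred = [[0, 1, 1, 2],
            [0, 1, 1, 2]]
    print(f"mean IoU:       {mean_iou(pred, target, 3):.4f}")    # 0.8056
    print(f"pixel accuracy: {pixel_accuracy(pred, target):.4f}")  # 0.8750
    print(f"spot ratio:     {spot_ratio(target):.4f}")            # 0.5000
```

In the toy example a single spot pixel is mislabeled as skin, which lowers the IoU of both the skin and spot classes while pixel accuracy stays comparatively high. This is one reason segmentation studies usually report mean IoU alongside accuracy.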

Data availability

The datasets generated during the current study are available from the corresponding author upon reasonable request.

Funding

This work was supported by the National Science and Technology Council of the Republic of China under Grant NSTC 112-2222-E-032-001 and by Academia Sinica under Grant AS-TP-110-M07.

Author information

Authors and Affiliations

Research Center for Information Technology Innovation, Academia Sinica, Taipei, 10607, Taiwan
Khoa-Dang Tran & Yennun Huang
Department of Computer Science and Information Engineering, TamKang University, New Taipei, 251301, Taiwan
Trang-Thi Ho
Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 106, Taiwan
Nguyen Quoc Khanh Le
College of Management, Yuan Ze University, Taoyuan, 32003, Taiwan
Le Quoc Tuan
Faculty of Information Technology, Quy Nhon University, Quy Nhon, Vietnam
Van Lam Ho

Contributions

Conceptualization: Khoa-Dang Tran and Trang-Thi Ho; Methodology: Khoa-Dang Tran and Trang-Thi Ho; Formal analysis and investigation: Trang-Thi Ho, Khoa-Dang Tran and Yennun Huang; writing—original draft preparation: Khoa-Dang Tran and Trang-Thi Ho; writing—review and editing: Yennun Huang, Nguyen Quoc Khanh Le and Le Quoc Tuan; Supervision: Van Lam Ho. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Trang-Thi Ho.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Research involving human and animal rights

This research did not contain any studies involving animal or human participants, nor did it take place on any private or protected areas. No specific permissions were required for corresponding locations.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tran, KD., Ho, TT., Huang, Y. et al. MASPP and MWASP: multi-head self-attention based modules for UNet network in melon spot segmentation. Food Measure (2024). https://doi.org/10.1007/s11694-024-02466-1

Received: 24 August 2023
Accepted: 22 February 2024
Published: 29 March 2024
DOI: https://doi.org/10.1007/s11694-024-02466-1
