Abstract
Instance segmentation of foods is an important technology for ensuring the food-acquisition success rate of meal-assisting robots. However, foods exhibit strong intraclass variability, interclass similarity, and complex physical properties, which makes their recognition, localization, and contour acquisition challenging. To address these issues, this paper proposes a novel method for instance segmentation of foods. Specifically, in the backbone network, deformable convolution is introduced to enhance the ability of the YOLOv8 architecture to capture finer-grained spatial information, and efficient multiscale attention (EMA) based on cross-spatial learning is introduced to improve sensitivity and expressiveness to multiscale inputs. In the neck network, classical convolution and C2f modules are replaced by the lightweight GSConv convolution and the improved VoV-GSCSP aggregation module, respectively, to increase inference speed. We abbreviate the resulting model as DEG-YOLOv8n-seg. The proposed method is compared with the baseline model and several state-of-the-art (SOTA) segmentation models on our datasets. The results show that the DEG-YOLOv8n-seg model offers higher accuracy, faster speed, and stronger robustness: it achieves 84.6% Box_mAP@0.5 and 84.1% Mask_mAP@0.5 at 55.2 FPS and 11.1 GFLOPs. The importance of data augmentation and the effectiveness of introducing deformable convolution, EMA, and VoV-GSCSP are verified by ablation experiments. Finally, the DEG-YOLOv8n-seg model is applied to food instance segmentation experiments for meal-assisting robots, where it achieves better instance segmentation of foods. This work can promote the development of intelligent meal-assisting robotics and provides a reference for other computer vision tasks.
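The GSConv replacement in the neck can be sketched structurally: a standard convolution produces half of the output channels, a cheap depthwise convolution produces the other half from them, and a channel shuffle mixes the two groups. The following minimal NumPy sketch illustrates only that composition; the pointwise and per-channel operations stand in for the full k×k convolutions, and all function names, weights, and shapes are our own illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in). A pointwise convolution expressed
    # as a channel-mixing matrix multiply (stand-in for a full k x k conv).
    return np.einsum('oc,chw->ohw', w, x)

def depthwise(x, w):
    # x: (C, H, W), w: (C,). Per-channel scaling as a stand-in for a
    # depthwise k x k convolution (no cross-channel mixing).
    return x * w[:, None, None]

def channel_shuffle(x, groups=2):
    # Interleave channels across groups so dense and cheap features mix.
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def gsconv(x, w_std, w_dw):
    # Standard conv to half the output channels, depthwise conv on that
    # half, concatenate, then shuffle the two channel groups together.
    dense = conv1x1(x, w_std)          # (C_out/2, H, W)
    cheap = depthwise(dense, w_dw)     # (C_out/2, H, W)
    return channel_shuffle(np.concatenate([dense, cheap], axis=0), groups=2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))    # 8 input channels, 4 x 4 feature map
w_std = rng.standard_normal((8, 8))   # C_out = 16, so the dense half is 8
w_dw = rng.standard_normal(8)
y = gsconv(x, w_std, w_dw)
print(y.shape)  # (16, 4, 4)
```

The cost saving comes from the depthwise half: it mixes no channels, so roughly half the channel-mixing work of a classical convolution is avoided, while the shuffle restores information exchange between the two halves.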
Data availability
Data will be made available on request.
Acknowledgements
This work was supported by the National Key R&D Program of China under grant 2020YFC2007700 and the Fundamental Research Funds for the Central Universities of China under grant 3072022CF0703.
Funding
National Key R&D Program of China, 2020YFC2007700, Fundamental Research Funds for the Central Universities of China, 3072022CF0703.
Author information
Authors and Affiliations
Contributions
Yuhe Fan: analysis, experiments, drafting, and revision. Lixun Zhang: funding, methodology, review, and revision. Canxing Zheng: results, image collection, and datasets. Yunqin Zu: datasets and experiments. Keyi Wang: review and revision. Xingyuan Wang: analysis and theory. All authors agree to be accountable for all aspects of the work.
Corresponding author
Ethics declarations
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, Y., Zhang, L., Zheng, C. et al. Real-time and accurate model of instance segmentation of foods. J Real-Time Image Proc 21, 80 (2024). https://doi.org/10.1007/s11554-024-01459-z