Abstract
Food and ingredient recognition is a pivotal challenge in computer vision, with direct relevance to multimedia systems applications. To exploit the intricate relationships between foods and their constituent ingredients, this paper introduces Multi-Task Knowledge Graph Reasoning for Food and Ingredient Recognition (MTKGR). By integrating a multi-task convolutional neural network with knowledge graph reasoning, MTKGR advances ingredient recognition on the ETH Food-101 and Ingredient-101 datasets, achieving new state-of-the-art Micro-F1 and Macro-F1 scores with improvements of 2.23% and 0.83%, respectively. Specifically, the multi-task model performs joint food and ingredient recognition, while the knowledge graph captures associations between food and ingredient entities. Knowledge graph reasoning is then applied to reconcile errors and inconsistencies in the multi-task model’s predictions. Our proposed Precision and Logits Ranking technique identifies the food-ingredient label combination with maximum concordance with the knowledge graph. These results not only demonstrate MTKGR’s potential in food and ingredient recognition but also showcase the value of fusing deep learning with symbolic reasoning for enhanced visual understanding and intelligent analysis in multimedia systems.
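The reconciliation step described in the abstract can be illustrated with a minimal sketch: given food logits and multi-label ingredient logits from a multi-task model, rescore top candidate foods by how well their knowledge-graph ingredients agree with the predicted ingredient set. The toy knowledge graph, the Jaccard-style concordance score, and the function name `rerank` are all illustrative assumptions; the paper's actual knowledge graph construction and Precision and Logits Ranking procedure differ in detail.

```python
import numpy as np

# Hypothetical toy knowledge graph: food label -> set of ingredient labels.
# (Illustrative only; the paper builds its graph from Food-101/Ingredient-101.)
KG = {
    "pizza": {"cheese", "tomato", "dough"},
    "salad": {"lettuce", "tomato", "cucumber"},
}
INGREDIENTS = sorted({i for s in KG.values() for i in s})

def rerank(food_logits, ingr_logits, top_k=2, thresh=0.5):
    """Pick the food whose KG ingredients best agree with the predicted ingredients."""
    foods = sorted(KG)
    # Multi-label ingredient prediction via sigmoid thresholding.
    probs = 1.0 / (1.0 + np.exp(-np.asarray(ingr_logits, dtype=float)))
    predicted = {INGREDIENTS[i] for i, p in enumerate(probs) if p >= thresh}
    # Consider the top-k foods by logit, then rescore by KG concordance
    # (Jaccard agreement between KG ingredients and predicted ingredients).
    order = np.argsort(food_logits)[::-1][:top_k]
    def concordance(f):
        kg = KG[foods[f]]
        return len(kg & predicted) / len(kg | predicted)
    best = max(order, key=concordance)
    return foods[best], predicted
```

For example, if the ingredient head strongly predicts cheese, dough, and tomato, the reranker can overturn a slightly higher "salad" food logit in favor of "pizza", whose knowledge-graph ingredients fully match the predicted set.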
Data availability
The ETH Food-101 dataset is available at https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/; access to Vireo Food-172 can be requested at http://vireo.cs.cityu.edu.hk/VireoFood172/.
Funding
Not applicable.
Author information
Contributions
A wrote the main manuscript, AB prepared the experimental data, and BC reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval
Not applicable.
Additional information
Communicated by X. Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, Z., Li, X. & Li, Y. MTKGR: multi-task knowledge graph reasoning for food and ingredient recognition. Multimedia Systems 30, 149 (2024). https://doi.org/10.1007/s00530-024-01354-4