
MTKGR: multi-task knowledge graph reasoning for food and ingredient recognition

  • Regular Paper
  • Published in Multimedia Systems

Abstract

Food and ingredient recognition is a pivotal challenge in computer vision, particularly for multimedia systems applications. To exploit the intricate relationships between foods and their constituent ingredients, this paper introduces Multi-Task Knowledge Graph Reasoning for Food and Ingredient Recognition (MTKGR). By integrating a multi-task convolutional neural network with knowledge graph reasoning, MTKGR sets new state-of-the-art results for ingredient recognition on the ETH Food-101 and Ingredient-101 datasets, improving the Micro-F1 and Macro-F1 scores by 2.23% and 0.83%, respectively. Specifically, the multi-task model performs joint food and ingredient recognition, while the knowledge graph captures associations between food and ingredient entities. Knowledge graph reasoning is then applied to reconcile errors and inconsistencies in the multi-task model’s predictions. Our proposed Precision and Logits Ranking technique identifies the food-ingredient label combination with maximum concordance with the knowledge graph. These results demonstrate MTKGR’s potential for food and ingredient recognition and showcase the value of fusing deep learning with symbolic reasoning for enhanced visual understanding and intelligent analysis within multimedia systems.
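The reconciliation step described in the abstract can be sketched in miniature as follows. This is an illustrative toy, not the paper's implementation: the knowledge graph `KG`, the logit dictionaries, and the exact scoring rule (KG overlap first, food logit as tie-breaker) are our assumptions standing in for the Precision and Logits Ranking procedure.

```python
# Toy knowledge graph: each food entity maps to its associated ingredient entities.
KG = {
    "pizza": {"dough", "tomato", "cheese"},
    "salad": {"lettuce", "tomato", "cucumber"},
}

def reconcile(food_logits, ingr_logits, kg, top_k=2, threshold=0.0):
    """Pick the food whose KG ingredient set best agrees with the
    thresholded multi-label ingredient predictions, ranking candidate
    foods by their logits (a logits-ranking heuristic)."""
    # Ingredients the multi-task model considers present.
    predicted_ingrs = {i for i, s in ingr_logits.items() if s > threshold}
    # Only the top-k foods by logit score are considered as candidates.
    candidates = sorted(food_logits, key=food_logits.get, reverse=True)[:top_k]

    def score(food):
        overlap = len(kg[food] & predicted_ingrs)  # concordance with the KG
        return (overlap, food_logits[food])        # break ties by food logit

    best_food = max(candidates, key=score)
    # Keep only ingredients endorsed by both the model and the KG,
    # discarding predictions inconsistent with the chosen food.
    return best_food, predicted_ingrs & kg[best_food]

food_logits = {"pizza": 1.2, "salad": 1.5}
ingr_logits = {"dough": 2.0, "tomato": 1.1, "cheese": 0.8,
               "lettuce": -0.5, "cucumber": -1.0}
print(reconcile(food_logits, ingr_logits, KG))
# -> ('pizza', {'dough', 'tomato', 'cheese'})
```

Note how the KG overrides the raw food logits here: "salad" has the higher logit, but the predicted ingredients agree far better with "pizza", so the reconciled output is the consistent food-ingredient combination.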


Fig. 1
Fig. 2


Data availability

The ETH Food-101 dataset is available at https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/; the Vireo Food-172 dataset can be requested at http://vireo.cs.cityu.edu.hk/VireoFood172/.

Notes

  1. https://pytorch.org/docs/stable/generated/torch.nn.MultiLabelSoftMarginLoss.html.

  2. https://downshiftology.com/recipes/chicken-salad/.


Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

A wrote the main manuscript, AB prepared the experimental data, and BC reviewed the manuscript.

Corresponding author

Correspondence to Yun Li.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

Not applicable.

Additional information

Communicated by X. Li.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Feng, Z., Li, X. & Li, Y. MTKGR: multi-task knowledge graph reasoning for food and ingredient recognition. Multimedia Systems 30, 149 (2024). https://doi.org/10.1007/s00530-024-01354-4

