Abstract
Food and ingredient recognition is a pivotal challenge in computer vision, with direct relevance to multimedia systems applications. To exploit the intricate relationships between foods and their constituent ingredients, this paper introduces Multi-Task Knowledge Graph Reasoning for Food and Ingredient Recognition (MTKGR). By integrating a multi-task convolutional neural network with knowledge graph reasoning, MTKGR advances ingredient recognition on the ETH Food-101 and Ingredient-101 datasets, achieving new state-of-the-art Micro-F1 and Macro-F1 scores with improvements of 2.23% and 0.83%, respectively. Specifically, the multi-task model performs joint food and ingredient recognition, while the knowledge graph captures associations between food and ingredient entities. Knowledge graph reasoning is then applied to reconcile errors and inconsistencies in the multi-task model’s predictions. Our proposed Precision and Logits Ranking technique identifies the food-ingredient label combination with maximum concordance with the knowledge graph. These results not only demonstrate MTKGR’s potential in food and ingredient recognition but also showcase the value of fusing deep learning with symbolic reasoning for enhanced visual understanding and intelligent analysis in multimedia systems.
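The reconciliation step described in the abstract can be illustrated with a minimal sketch: given food logits and multi-label ingredient logits from a multi-task model, rescore top candidate foods by how well their knowledge-graph ingredients agree with the predicted ingredient set. The toy knowledge graph, the Jaccard-style concordance score, and the function name `rerank` are all illustrative assumptions; the paper's actual knowledge graph construction and Precision and Logits Ranking procedure differ in detail.

```python
import numpy as np

# Hypothetical toy knowledge graph: food label -> set of ingredient labels.
# (Illustrative only; the paper builds its graph from Food-101/Ingredient-101.)
KG = {
    "pizza": {"cheese", "tomato", "dough"},
    "salad": {"lettuce", "tomato", "cucumber"},
}
INGREDIENTS = sorted({i for s in KG.values() for i in s})

def rerank(food_logits, ingr_logits, top_k=2, thresh=0.5):
    """Pick the food whose KG ingredients best agree with the predicted ingredients."""
    foods = sorted(KG)
    # Multi-label ingredient prediction via sigmoid thresholding.
    probs = 1.0 / (1.0 + np.exp(-np.asarray(ingr_logits, dtype=float)))
    predicted = {INGREDIENTS[i] for i, p in enumerate(probs) if p >= thresh}
    # Consider the top-k foods by logit, then rescore by KG concordance
    # (Jaccard agreement between KG ingredients and predicted ingredients).
    order = np.argsort(food_logits)[::-1][:top_k]
    def concordance(f):
        kg = KG[foods[f]]
        return len(kg & predicted) / len(kg | predicted)
    best = max(order, key=concordance)
    return foods[best], predicted
```

For example, if the ingredient head strongly predicts cheese, dough, and tomato, the reranker can overturn a slightly higher "salad" food logit in favor of "pizza", whose knowledge-graph ingredients fully match the predicted set.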
Data availability
The ETH Food-101 dataset is available at https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/; access to Vireo Food-172 can be requested at http://vireo.cs.cityu.edu.hk/VireoFood172/.
Funding
Not applicable.
Author information
Contributions
A wrote the main manuscript, AB prepared the experimental data, and BC reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval
Not applicable.
Additional information
Communicated by X. Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, Z., Li, X. & Li, Y. MTKGR: multi-task knowledge graph reasoning for food and ingredient recognition. Multimedia Systems 30, 149 (2024). https://doi.org/10.1007/s00530-024-01354-4