Abstract
Trademark element recognition is a crucial task in applications such as trademark brand evaluation and trademark infringement identification. Although modeling technology has made significant progress in recent years, small objects, similar objects, and objects with high conditional probability remain difficult to recognize because of the limitations of convolutional kernels. Building on semantic-aware region search and label-dependency modeling, we propose a Transformer-based multi-input recognition framework for trademark elements (Mi-Tr), which learns the complex dependencies between visual features and labels through feature extraction with different convolutional networks and Transformer encoding. The proposed approach includes two visual feature-embedding modules that use modified VGG16 and ResNet101 networks as feature extractors to obtain feature information of trademark images at different dimensions. Simultaneously, the category labels are embedded and fed into the Transformer; exploiting the Transformer's order invariance, the model learns all types of dependencies between the features and labels. Additionally, we varied the number of Transformer layers and the number of heads in the multi-head attention to find parameters that better match the image features and label information. Experimental results on two datasets, METU and Logotypes of Different Companies, demonstrate that the classifier developed with our model performs significantly better in the multi-input classification of trademark image elements.
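The architecture described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the two tiny convolutional stacks below merely stand in for the paper's modified VGG16 and ResNet101 branches, and all names (`MiTrSketch`, `branch_a`, `branch_b`) and hyperparameters are illustrative assumptions. It shows the key idea: tokens from two visual branches are concatenated with learnable label embeddings, a Transformer encoder models dependencies among all of them (order invariance makes the label-token positions arbitrary), and per-label logits are read off the label tokens.

```python
import torch
import torch.nn as nn

class MiTrSketch(nn.Module):
    """Hedged sketch of the Mi-Tr idea: two CNN branches extract visual
    tokens, label embeddings are appended, and a Transformer encoder
    models dependencies between all features and labels."""

    def __init__(self, num_labels=8, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # Stand-ins for the two feature extractors (modified VGG16 / ResNet101
        # in the paper); each yields a 4x4 map of d_model-dim features.
        self.branch_a = nn.Sequential(
            nn.Conv2d(3, d_model, 3, stride=2, padding=1),
            nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        self.branch_b = nn.Sequential(
            nn.Conv2d(3, d_model, 5, stride=2, padding=2),
            nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        # One learnable embedding per trademark-element class.
        self.label_emb = nn.Embedding(num_labels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, 1)
        self.num_labels = num_labels

    def forward(self, img):
        b = img.size(0)
        # Flatten each branch's feature map into a sequence of tokens.
        fa = self.branch_a(img).flatten(2).transpose(1, 2)   # (b, 16, d)
        fb = self.branch_b(img).flatten(2).transpose(1, 2)   # (b, 16, d)
        labels = self.label_emb.weight.unsqueeze(0).expand(b, -1, -1)
        tokens = torch.cat([fa, fb, labels], dim=1)
        out = self.encoder(tokens)
        # Per-label logits from the label tokens (one logit per element class).
        return self.classifier(out[:, -self.num_labels:, :]).squeeze(-1)

model = MiTrSketch()
logits = model(torch.randn(2, 3, 64, 64))
print(tuple(logits.shape))  # (2, 8): one logit per label, per image
```

Training such a sketch for multi-label recognition would pair the logits with `nn.BCEWithLogitsLoss` against binary element-presence targets.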
Availability of data and materials
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Code Availability
Code will be made available on reasonable request.
Acknowledgements
We would like to thank Editage (www.editage.cn) for English language editing.
Funding
This work was supported by National Key Research and Development Program of China (No. 2021YFC3340402).
Author information
Authors and Affiliations
Contributions
Linqi Liu wrote the original draft and prepared all figures; Xiuhui Wang reviewed and edited. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, L., Wang, X. Multi-input trademark element recognition with transformer. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18678-y