Abstract
Trademark element recognition is a crucial task in applications such as trademark brand evaluation and trademark infringement identification. Although modeling technology has made significant progress in recent years, small objects, similar objects, and objects with high conditional probability remain difficult to recognize because of the limitations of convolutional kernels. Building on semantic-aware region search and label-dependency modeling, we propose a Transformer-based multi-input recognition framework for trademark elements (Mi-Tr), which learns the complex dependencies between visual features and labels through feature extraction with different convolutional networks and Transformer encoding. The proposed approach includes two visual feature-embedding modules that use modified VGG16 and ResNet101 networks as feature extractors to obtain feature information of trademark images at different dimensions. Simultaneously, the category labels are embedded and fed into the Transformer; exploiting the Transformer's order invariance, the model learns all types of dependencies between the features and labels. Additionally, we varied the number of Transformer layers and the number of heads in the multi-head attention to find parameters that better match the image features and label information. Experimental results on two datasets, METU and Logotypes of Different Companies, demonstrate that the classifier developed with our model performs significantly better in the multi-input classification of trademark image elements.
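The architecture described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the two tiny convolutional stacks below merely stand in for the paper's modified VGG16 and ResNet101 branches, and all names (`MiTrSketch`, `branch_a`, `branch_b`) and hyperparameters are illustrative assumptions. It shows the key idea: tokens from two visual branches are concatenated with learnable label embeddings, a Transformer encoder models dependencies among all of them (order invariance makes the label-token positions arbitrary), and per-label logits are read off the label tokens.

```python
import torch
import torch.nn as nn

class MiTrSketch(nn.Module):
    """Hedged sketch of the Mi-Tr idea: two CNN branches extract visual
    tokens, label embeddings are appended, and a Transformer encoder
    models dependencies between all features and labels."""

    def __init__(self, num_labels=8, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # Stand-ins for the two feature extractors (modified VGG16 / ResNet101
        # in the paper); each yields a 4x4 map of d_model-dim features.
        self.branch_a = nn.Sequential(
            nn.Conv2d(3, d_model, 3, stride=2, padding=1),
            nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        self.branch_b = nn.Sequential(
            nn.Conv2d(3, d_model, 5, stride=2, padding=2),
            nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        # One learnable embedding per trademark-element class.
        self.label_emb = nn.Embedding(num_labels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, 1)
        self.num_labels = num_labels

    def forward(self, img):
        b = img.size(0)
        # Flatten each branch's feature map into a sequence of tokens.
        fa = self.branch_a(img).flatten(2).transpose(1, 2)   # (b, 16, d)
        fb = self.branch_b(img).flatten(2).transpose(1, 2)   # (b, 16, d)
        labels = self.label_emb.weight.unsqueeze(0).expand(b, -1, -1)
        tokens = torch.cat([fa, fb, labels], dim=1)
        out = self.encoder(tokens)
        # Per-label logits from the label tokens (one logit per element class).
        return self.classifier(out[:, -self.num_labels:, :]).squeeze(-1)

model = MiTrSketch()
logits = model(torch.randn(2, 3, 64, 64))
print(tuple(logits.shape))  # (2, 8): one logit per label, per image
```

Training such a sketch for multi-label recognition would pair the logits with `nn.BCEWithLogitsLoss` against binary element-presence targets.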
Availability of data and materials
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Code Availability
Code will be made available on reasonable request.
Acknowledgements
We would like to thank Editage (www.editage.cn) for English language editing.
Funding
This work was supported by National Key Research and Development Program of China (No. 2021YFC3340402).
Author information
Authors and Affiliations
Contributions
Linqi Liu wrote the original draft and prepared all figures; Xiuhui Wang reviewed and edited. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, L., Wang, X. Multi-input trademark element recognition with transformer. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18678-y