
Multi-input trademark element recognition with transformer

Published in: Multimedia Tools and Applications

Abstract

Trademark element recognition is a crucial task in applications such as trademark brand evaluation and trademark infringement identification. Although modeling technology has made significant progress in recent years, small objects, similar objects, and objects with high conditional probability remain difficult to recognize because of the limitations of convolutional kernels. Building on semantic-aware region search and label dependency modeling, we propose a Transformer-based multi-input recognition framework for trademark elements (Mi-Tr), which learns the complex dependencies between visual features and labels through feature extraction with different convolutional networks followed by Transformer encoding. The proposed approach includes two visual feature-embedding modules that use modified VGG16 and ResNet101 networks as feature extractors to obtain feature information of trademark images in different dimensions. Simultaneously, the category labels are embedded and fed into the Transformer; by exploiting the order invariance of the Transformer, the model can better learn all types of dependencies between features and labels. Additionally, we varied the number of Transformer layers and the number of heads in the multi-head attention to find parameters that better match the image features and label information. Experimental results on two datasets, METU and Logotypes of Different Companies, demonstrate that the classifier developed with our model performs significantly better in the multi-input classification of trademark image elements.


Availability of data and materials

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Code Availability

Code will be made available on reasonable request.


Acknowledgements

We would like to thank Editage (www.editage.cn) for English language editing.

Funding

This work was supported by National Key Research and Development Program of China (No. 2021YFC3340402).

Author information

Authors and Affiliations

Authors

Contributions

Linqi Liu wrote the original draft and prepared all figures; Xiuhui Wang reviewed and edited. All authors reviewed the manuscript.

Corresponding author

Correspondence to Xiuhui Wang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, L., Wang, X. Multi-input trademark element recognition with transformer. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18678-y

