Abstract
We describe several custom image-descriptor modifications used in an image classification pipeline for images containing text elements. The problem under consideration is the classification of photographs of commercial facades by the type of services provided. Some of the proposed descriptor types are presented for the first time and demonstrate state-of-the-art performance on open datasets. Our study employs a special type of descriptor for image regions containing text, built from the traces left by the movement of trainable agents. These traces are generated by parameterized movement strategies, which we present and compare in this article.
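To make the trace idea concrete, the sketch below shows one way an agent-trace descriptor could work: an agent walks over a grayscale patch under a parameterized movement strategy, and its trace of visited intensities is pooled into a fixed-length histogram. This is a minimal illustrative sketch, not the authors' actual method; the specific strategies ("greedy" ink-following and a random walk) and the histogram pooling are assumptions made for the example.

```python
import numpy as np

def trace_descriptor(patch, start, steps=64, strategy="greedy", bins=8, seed=0):
    """Build a fixed-length descriptor from one agent's trace over a 2-D
    grayscale patch (values in [0, 1]). Starting at `start`, the agent moves
    to a 4-neighbour at each step, chosen by the movement strategy."""
    rng = np.random.default_rng(seed)
    h, w = patch.shape
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    y, x = start
    trace = [patch[y, x]]
    for _ in range(steps - 1):
        # Candidate positions that stay inside the patch.
        nbrs = [(y + dy, x + dx) for dy, dx in moves
                if 0 <= y + dy < h and 0 <= x + dx < w]
        if strategy == "greedy":
            # Follow the darkest neighbour, i.e. stay on ink-like pixels.
            y, x = min(nbrs, key=lambda p: patch[p])
        else:
            # "random": uniform random walk over valid neighbours.
            y, x = nbrs[rng.integers(len(nbrs))]
        trace.append(patch[y, x])
    # Pool the variable-length trajectory into a fixed-length intensity
    # histogram, so patches of any size yield comparable descriptors.
    hist, _ = np.histogram(trace, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()
```

In a full pipeline, descriptors from several agents (with different start points and strategies) would be concatenated and passed to a classifier alongside global image features.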
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Aleksei Samarin is an AI researcher with over 10 years of scientific and industrial experience. His main research interests are computer vision, pattern recognition, and image processing and analysis. He is the author of more than ten scientific publications.
Alexander Savelev is a master’s student at ITMO University with experience in deep learning, computer vision, data analysis, and natural language processing. Author of four scientific publications.
Aleksei Toropov is a data scientist in the Computer Vision Lab with scientific and industrial experience in computer vision, image processing, and deep learning. Author of two scientific publications.
Alina Dzestelova is a master’s student at ITMO University. Author of three scientific publications.
Valentin Malykh is an AI researcher and Candidate of Engineering Sciences with over 30 scientific publications. His main research areas are natural language understanding and deep learning.
Elena Mikhailova is the director of the Higher School of Digital Culture at ITMO University. She is an Associate Professor and Candidate of Physical and Mathematical Sciences with more than 30 scientific publications.
Alexandr Motyko is an Associate Professor at St. Petersburg Electrotechnical University. He is a Candidate of Engineering Sciences and the author of more than 20 scientific publications; his main research areas are image and video processing.
Cite this article
Samarin, A., Savelev, A., Toropov, A. et al. Predictors Based on Convolutional Neural Networks for the Movement Strategy of Trainable Agents for Building Customized Image Descriptors. Pattern Recognit. Image Anal. 33, 139–146 (2023). https://doi.org/10.1134/S105466182302013X