
Predictors Based on Convolutional Neural Networks for the Movement Strategy of Trainable Agents for Building Customized Image Descriptors

  • SELECTED CONFERENCE PAPERS
  • Published in Pattern Recognition and Image Analysis

Abstract

We describe several modifications of custom image descriptors that are used as part of a classification pipeline for images containing text elements. The problem under consideration is the classification of photographs of commercial facades by the type of services provided. Some of the proposed descriptor types are presented for the first time and demonstrate state-of-the-art performance on open datasets. In our study, we used a special type of descriptor for text-containing image regions that is built from the movement traces of trainable agents. These traces are generated by parameterized movement strategies, which are presented and compared in this article.
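The abstract does not specify how an agent's movement trace is turned into a descriptor, so the sketch below is only a minimal, hypothetical illustration of the general idea: an agent walks over a grayscale text region according to a parameterized strategy, and the sequence of visited intensities is used as a fixed-length descriptor. The function name, the greedy movement rule, and the parameters (n_steps, seed) are illustrative assumptions, not the authors' implementation.

import numpy as np

def trace_descriptor(region, n_steps=64, seed=0):
    """Hypothetical movement-trace descriptor for a grayscale text region.

    The agent starts at the region centre and, at every step, moves to the
    brightest of its 4-connected neighbours (a simple parameterized greedy
    strategy); the visited intensities form the descriptor. Illustrative
    sketch only, not the method described in the paper.
    """
    rng = np.random.default_rng(seed)
    h, w = region.shape
    y, x = h // 2, w // 2                      # start at the centre
    trace = []
    for _ in range(n_steps):
        trace.append(region[y, x])
        # Candidate moves: up, down, left, right (clipped to the region).
        moves = [(max(y - 1, 0), x), (min(y + 1, h - 1), x),
                 (y, max(x - 1, 0)), (y, min(x + 1, w - 1))]
        # Greedy choice of the brightest neighbour, random tie-breaking.
        values = np.array([region[py, px] for py, px in moves])
        best = np.flatnonzero(values == values.max())
        y, x = moves[rng.choice(best)]
    return np.asarray(trace, dtype=np.float32)  # fixed-length descriptor

# Usage on a synthetic 32x32 region:
if __name__ == "__main__":
    region = np.random.default_rng(1).random((32, 32))
    print(trace_descriptor(region).shape)      # (64,)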




Author information


Corresponding authors

Correspondence to A. Samarin, A. Savelev, A. Toropov, A. Dzestelova, V. Malykh, E. Mikhailova or A. Motyko.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Aleksei Samarin is an AI researcher with over 10 years of scientific and industrial experience. His main areas of interest are computer vision, pattern recognition, and image processing and analysis. Author of more than ten scientific publications.

Alexander Savelev is a master’s student at ITMO University with experience in deep learning, computer vision, data analysis, and natural language processing. Author of four scientific publications.

Aleksei Toropov is a data scientist in the Computer Vision Lab with scientific and industrial experience in computer vision, image processing, and deep learning. Author of two scientific publications.

Alina Dzestelova is a master’s student at ITMO University. Author of three scientific publications.

Valentin Malykh is an AI researcher and a Candidate of Engineering Sciences with over 30 scientific publications. His main areas of research are natural language understanding and deep learning.

Elena Mikhailova is the director of the Higher School of Digital Culture at ITMO University, an Associate Professor, and a Candidate of Physical and Mathematical Sciences with more than 30 scientific publications.

Alexandr Motyko is an Associate Professor at St. Petersburg Electrotechnical University, a Candidate of Engineering Sciences, and the author of more than 20 scientific publications. His main areas of research are image and video processing.


About this article


Cite this article

Samarin, A., Savelev, A., Toropov, A. et al. Predictors Based on Convolutional Neural Networks for the Movement Strategy of Trainable Agents for Building Customized Image Descriptors. Pattern Recognit. Image Anal. 33, 139–146 (2023). https://doi.org/10.1134/S105466182302013X

