Abstract
We describe several custom image-descriptor modifications used in an image classification pipeline for images containing text elements. The problem under consideration is the classification of photographs of commercial facades by the type of services provided. Some of the proposed descriptor types are presented for the first time and demonstrate state-of-the-art performance on open datasets. Our study employs a special type of descriptor for image regions containing text, built from the traces left by the movement of trainable agents. These traces are generated by parameterized movement strategies, which we present and compare in this article.
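To make the trace idea concrete, the sketch below shows one way an agent-trace descriptor could work: an agent walks over a grayscale patch under a parameterized movement strategy, and its trace of visited intensities is pooled into a fixed-length histogram. This is a minimal illustrative sketch, not the authors' actual method; the specific strategies ("greedy" ink-following and a random walk) and the histogram pooling are assumptions made for the example.

```python
import numpy as np

def trace_descriptor(patch, start, steps=64, strategy="greedy", bins=8, seed=0):
    """Build a fixed-length descriptor from one agent's trace over a 2-D
    grayscale patch (values in [0, 1]). Starting at `start`, the agent moves
    to a 4-neighbour at each step, chosen by the movement strategy."""
    rng = np.random.default_rng(seed)
    h, w = patch.shape
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    y, x = start
    trace = [patch[y, x]]
    for _ in range(steps - 1):
        # Candidate positions that stay inside the patch.
        nbrs = [(y + dy, x + dx) for dy, dx in moves
                if 0 <= y + dy < h and 0 <= x + dx < w]
        if strategy == "greedy":
            # Follow the darkest neighbour, i.e. stay on ink-like pixels.
            y, x = min(nbrs, key=lambda p: patch[p])
        else:
            # "random": uniform random walk over valid neighbours.
            y, x = nbrs[rng.integers(len(nbrs))]
        trace.append(patch[y, x])
    # Pool the variable-length trajectory into a fixed-length intensity
    # histogram, so patches of any size yield comparable descriptors.
    hist, _ = np.histogram(trace, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()
```

In a full pipeline, descriptors from several agents (with different start points and strategies) would be concatenated and passed to a classifier alongside global image features.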
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Aleksei Samarin is an AI researcher with over 10 years of scientific and industrial experience. His main research interests are computer vision, pattern recognition, and image processing and analysis. He is the author of more than ten scientific publications.
Alexander Savelev is a master’s student at ITMO University with experience in deep learning, computer vision, data analysis, and natural language processing. Author of four scientific publications.
Aleksei Toropov is a data scientist in the Computer Vision Lab with scientific and industrial experience in computer vision, image processing, and deep learning. Author of two scientific publications.
Alina Dzestelova is a master’s student at ITMO University. Author of three scientific publications.
Valentin Malykh is an AI researcher and Candidate of Engineering Sciences with over 30 scientific publications. His main research areas are natural language understanding and deep learning.
Elena Mikhailova is the director of the Higher School of Digital Culture at ITMO University. She is an Associate Professor and Candidate of Physical and Mathematical Sciences with more than 30 scientific publications.
Alexandr Motyko is an Associate Professor at St. Petersburg Electrotechnical University. He is a Candidate of Engineering Sciences and the author of more than 20 scientific publications; his main research areas are image and video processing.
Cite this article
Samarin, A., Savelev, A., Toropov, A. et al. Predictors Based on Convolutional Neural Networks for the Movement Strategy of Trainable Agents for Building Customized Image Descriptors. Pattern Recognit. Image Anal. 33, 139–146 (2023). https://doi.org/10.1134/S105466182302013X