ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition

Tom, George; Mathew, Minesh; Garcia-Bordils, Sergi; Karatzas, Dimosthenis; Jawahar, C.V.

doi:10.1007/978-3-031-41679-8_35

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14188))

Included in the following conference series:

International Conference on Document Analysis and Recognition

965 Accesses

Abstract

In this report, we present the final results of the ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition. The RoadText challenge is based on the RoadText-1K dataset and aims to assess and enhance current methods for scene text detection, recognition, and tracking in videos. The RoadText-1K dataset contains 1000 dash cam videos with annotations for text bounding boxes and transcriptions in every frame. The competition features an end-to-end task, requiring systems to accurately detect, track, and recognize text in dash cam videos. The paper presents a comprehensive review of the submitted methods along with a detailed analysis of the results obtained by the methods. The analysis provides valuable insights into the current capabilities and limitations of video text detection, tracking, and recognition systems for dashcam videos.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 119.00; Price excludes VAT (USA)

Softcover Book: USD 159.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://rrc.cvc.uab.es/?ch=25.

References

Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9365–9374 (2019)
Google Scholar
Bautista, D., Atienza, R.: Scene text recognition with permuted autoregressive sequence models. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. ECCV 2022. LNCS, vol. 13688, pp. 178–196. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19815-1_11
Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J. Image Video Process. 2008, 1–10 (2008)
Article Google Scholar
Cai, Z., Vasconcelos, N.: Cascade r-cnn: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1483–1498 (2019)
Article Google Scholar
Cao, J., Weng, X., Khirodkar, R., Pang, J., Kitani, K.: Observation-centric sort: rethinking sort for robust multi-object tracking. arXiv preprint arXiv:2203.14360 (2022)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Garcia-Bordils, S., et al.: Read while you drive - multilingual text tracking on the road. In: Uchida, S., Barney, E., Eglin, V. (eds.) Document Analysis Systems. DAS 2022. LNCS, vol. 13237, pp. 756–770. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-06555-2_51
JaidedAI: Easyocr. https://github.com/JaidedAI/EasyOCR
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: ICDAR (2013)
Google Scholar
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474–11481 (2020)
Google Scholar
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 919–931 (2022)
Article Google Scholar
Long, S., Qin, S., Panteleev, D., Bissacco, A., Fujii, Y., Raptis, M.: Towards end-to-end unified scene text detection and layout analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)
Google Scholar
Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: Mot16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016)
Nguyen, P.X., Wang, K., Belongie, S.J.: Video text detection and recognition: dataset and benchmark. In: WACV (2014)
Google Scholar
Reddy, S., Mathew, M., Gomez, L., Rusinol, M., Karatzas, D., Jawahar, C.: Roadtext-1k: text detection & recognition dataset for driving videos. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 11074–11080. IEEE (2020)
Google Scholar
Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016, Part II. LNCS, vol. 9914, pp. 17–35. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_2
Chapter Google Scholar
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2016)
Article Google Scholar
Tian, S., Yin, X., Su, Y., Hao, H.W.: A unified framework for tracking based text detection and recognition from web videos. In: TPAMI (2018)
Google Scholar
Wu, W., Shen, C., et al.: End-to-end video text spotting with transformer. arxiv (2022)
Google Scholar
Wu, W., et al.: Bovtext: a large-scale, multidimensional multilingual dataset for video text spotting. Organization (2021)
Google Scholar
Yu, F., et al.: Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2636–2645 (2020)
Google Scholar
Zhang, X., Su, Y., Tripathi, S., Tu, Z.: Text spotting transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9519–9528 (2022)
Google Scholar
Zhang, Y., et al.: Bytetrack: multi-object tracking by associating every detection box. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. ECCV 2022, Part XXII, LNCS, vol. 13682, pp. 1–21. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2_1

Download references

Acknowledgement

This work has been supported by IHub-Data at IIIT-Hyderabad, and MeitY, and grants PDC2021-121512-I00, and PID2020-116298GB-I00 funded by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR.

Author information

Authors and Affiliations

Center for Visual Information Technology (CVIT), IIIT Hyderabad, Hyderabad, India
George Tom, Minesh Mathew & C.V. Jawahar
Computer Vision Center (CVC), UAB, Barcelona, Spain
Sergi Garcia-Bordils & Dimosthenis Karatzas
AllRead Machine Learning Technologies, Barcelona, Spain
Sergi Garcia-Bordils

Authors

George Tom
View author publications
You can also search for this author in PubMed Google Scholar
Minesh Mathew
View author publications
You can also search for this author in PubMed Google Scholar
Sergi Garcia-Bordils
View author publications
You can also search for this author in PubMed Google Scholar
Dimosthenis Karatzas
View author publications
You can also search for this author in PubMed Google Scholar
C.V. Jawahar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to George Tom .

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MN, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tom, G., Mathew, M., Garcia-Bordils, S., Karatzas, D., Jawahar, C. (2023). ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_35

Download citation

DOI: https://doi.org/10.1007/978-3-031-41679-8_35
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41678-1
Online ISBN: 978-3-031-41679-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)

ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition