Abstract
This paper presents the final results of the ICDAR 2023 Competition on Born Digital Video Text Question Answering (BDVT-QA), which comprises two task tracks: 1) End-to-End Video Text Spotting and 2) Video Text Question Answering. BDVT-QA aims to spot text in born-digital videos and to answer questions about them. The competition introduces a new dataset of 1,000 video clips, fully annotated with manually designed question/answer pairs whose answers are based on the text captions that appear in the clips. A total of 23 final submissions were received. The top-3 results in each track are as follows: 1) T1.1: 57.53%, T1.2: 53.3%, T1.3: 52.35%; and 2) T2.1: 31.2%, T2.2: 28.84%, T2.3: 21.19%. We summarize the submitted methods and provide an in-depth analysis of them. In addition, this paper describes the dataset, the task definitions and the evaluation protocols. The dataset and the final ranking of submissions are publicly available on the challenge’s official website: https://tianchi.aliyun.com/specials/promotion/ICDAR_2023_Competition_on_Born_Digital_Video_Text_QA.
Z. Yang, X. Song and S. Song contributed equally.
Acknowledgments
The authors thank the Competition Chairs for their valuable input in organizing the competition and for their critical review of the competition report. This challenge is sponsored by Alibaba Group. This work is also supported by NSFC grants 62225603, 61672273 and 61832008.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Z. et al. (2023). ICDAR 2023 Competition on Born Digital Video Text Question Answering. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_30
DOI: https://doi.org/10.1007/978-3-031-41679-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41678-1
Online ISBN: 978-3-031-41679-8
eBook Packages: Computer Science, Computer Science (R0)