Text Extraction from Scrolling News Tickers

Pretkalnins, Ingus Janis; Sprogis, Arturs; Barzdins, Guntis

doi:10.1007/978-3-030-57672-1_11


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1243)

Included in the following conference series:

International Baltic Conference on Databases and Information Systems


Abstract

While much work exists on text and keyword extraction from videos, little can be found on the specific problem of extracting continuous text from scrolling tickers. In this work, a novel Tesseract OCR-based pipeline is proposed for locating scrolling tickers in videos and continuously extracting their text. The solution worked faster than real time and achieved a character accuracy of 97.3% on 45 min of manually transcribed 360p video from popular Latvian news shows.
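The abstract reports accuracy as "character accuracy" but does not spell out the formula here. A common definition in OCR evaluation is (n − e) / n, where n is the number of characters in the manual transcription and e is the Levenshtein (edit) distance between the transcription and the OCR output. The sketch below assumes that definition; it is an illustration, not the authors' evaluation code, and the function names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: (n - errors) / n, where n = len(reference) and
    errors = edit distance between the reference and the OCR hypothesis."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference transcription must be non-empty")
    return (n - levenshtein(reference, hypothesis)) / n


if __name__ == "__main__":
    # One deleted "i" plus one substituted "s" -> "z" gives 2 errors over 13 characters.
    print(character_accuracy("breaking news", "breakng newz"))
```

Under this definition, losing Latvian diacritics counts as substitutions: "Rīgā līst lietus" read as "Riga list lietus" has 3 errors over 16 characters, giving 0.8125. Note that the score can drop below zero if the OCR output contains many spurious characters, since e is not bounded by n.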



Acknowledgements

The authors would like to thank the reviewers for their thought-provoking comments.

The research was supported by ERDF project 1.1.1.1/18/A/045 at IMCS, University of Latvia.

Author information

Authors and Affiliations

Institute of Mathematics and Computer Science, University of Latvia, Raina blvd. 29, Riga, 1459, Latvia
Ingus Janis Pretkalnins & Arturs Sprogis
LETA, Marijas 2, Riga, 1050, Latvia
Guntis Barzdins


Corresponding author

Correspondence to Ingus Janis Pretkalnins.

Editor information

Editors and Affiliations

Department of Computer Systems, Tallinn University of Technology, Tallinn, Estonia
Tarmo Robal
Department of Software Science, Tallinn University of Technology, Tallinn, Estonia
Hele-Mai Haav
Department of Software Science, Tallinn University of Technology, Tallinn, Estonia
Jaan Penjam
Institute of Computer Science, University of Tartu, Tartu, Estonia
Raimundas Matulevičius


About this paper

Cite this paper

Pretkalnins, I.J., Sprogis, A., Barzdins, G. (2020). Text Extraction from Scrolling News Tickers. In: Robal, T., Haav, HM., Penjam, J., Matulevičius, R. (eds) Databases and Information Systems. DB&IS 2020. Communications in Computer and Information Science, vol 1243. Springer, Cham. https://doi.org/10.1007/978-3-030-57672-1_11


DOI: https://doi.org/10.1007/978-3-030-57672-1_11
Published: 12 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57671-4
Online ISBN: 978-3-030-57672-1
eBook Packages: Computer Science, Computer Science (R0)
