Multimedia Tools and Applications, Volume 63, Issue 2, pp 521–545

Scene text recognition and tracking to identify athletes in sport videos

  • Stefano Messelodi
  • Carla Maria Modena


We present an athlete identification module that forms part of a system for personalizing sport video broadcasts. The module localizes athletes in the scene, identifies them by reading the names or numbers printed on their uniforms, and labels the frames in which they are visible. Building upon a previously published algorithm, we extract text candidates from individual frames and read them with an optical character recognizer (OCR). The OCR-ed text is then compared with a known list of athletes’ names (or numbers) to provide a presence score for each athlete. Text regions are tracked in subsequent frames using a template matching technique. In this way, blurred or distorted text, normally unreadable by the OCR, is exploited to provide a denser labelling of the video sequences. Extensive experiments show that the proposed method is fast, robust and reliable, outperforming the results of other systems reported in the literature.
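The abstract leaves the details of the matching and tracking steps open. As a minimal sketch, under assumptions that are ours rather than the authors', each OCR-ed string can receive a presence score equal to 1 minus its Levenshtein distance to a roster entry, normalized by the longer string; a text region can then be re-located in the next frame by searching a small window around its previous position for the patch with the highest zero-mean normalized cross-correlation (NCC) against the stored template. The names `presence_scores` and `track`, the scoring formula and the `radius` parameter are hypothetical illustrations, not the published method.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def presence_scores(ocr_text: str, roster: list[str]) -> dict[str, float]:
    """Hypothetical score in [0, 1]: 1 - normalized edit distance to each roster entry."""
    text = ocr_text.strip().upper()
    scores = {}
    for name in roster:
        target = name.upper()
        longest = max(len(text), len(target)) or 1
        scores[name] = 1.0 - levenshtein(text, target) / longest
    return scores


def ncc(patch, template) -> float:
    """Zero-mean normalized cross-correlation of two equally sized grayscale patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches carry no evidence


def track(frame, template, prev_xy, radius=2):
    """Hypothetical tracker: exhaustive NCC search in a window around prev_xy (x, y)."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    px, py = prev_xy
    best_score, best_xy = -2.0, prev_xy  # NCC lies in [-1, 1]
    # Only top-left corners that keep the template fully inside the frame.
    for y in range(max(0, py - radius), min(fh - th, py + radius) + 1):
        for x in range(max(0, px - radius), min(fw - tw, px + radius) + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score
```

For example, `presence_scores("R0SSI", ["ROSSI", "BIANCHI", "23"])` gives "ROSSI" the highest score (0.8: one substitution over five characters), so a single O/0 confusion by the OCR does not prevent identification. NCC is unaffected by uniform brightness and contrast changes within a patch, which is one reason it suits text on moving uniforms; a practical tracker would probably also refresh the template and discard low-scoring matches.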


Keywords: Embedded text detection; Text tracking; Sport video analysis; Athlete identification; Text reading; Information extraction



This work has been supported by the European Union under the Strep Project FP7 215248: My eDirector 2012. The authors would like to thank Paul Chippendale for his careful reading of the manuscript.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. FBK-irst, Povo, Italy
