Frame selection for OCR from video stream of book flipping


Optical character recognition (OCR) on a video stream of flipping pages is challenging because pages are flipped at varying speeds, which makes it difficult to identify the frames that contain an open page image (OPI). Low resolution, blur, and shadows add further noise to the selection of suitable frames for OCR. In this paper, we focus on identifying a set of representative frames from a video stream of flipping pages without any special-purpose hardware, and then perform OCR on those frames. We thus propose an end-to-end solution for video streams of flipping pages. To select an OPI, we present an efficient algorithm that exploits edge cues observed during the flipping event. These cues, extracted from the region of interest (ROI) of each frame, determine whether a page is in the flipping state or the open state. The open-state classification is performed by an SVM classifier trained on the edge cue information. After a set of frames is selected for each OPI, one representative frame from that set is chosen for OCR. Experiments are performed on videos captured with a standard-resolution camera. The proposed method achieves 88.81 % accuracy on representative frame selection, whereas GIST (Oliva and Torralba, Int J Comput Vis 42(3):145–175 (2001)) achieves only 51.28 %. To the best of our knowledge, this is the first work in this area. After frame selection, we achieve 83.31 % character recognition accuracy and 78.11 % word recognition accuracy with a traditional OCR engine on our dataset of flipping books.
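The frame selection pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the edge cue, the ROI, or the rule for choosing the representative frame, so everything here is an assumption made for illustration. The function names (`edge_density`, `group_open_frames`, `pick_representatives`), the intensity-difference edge measure, the `min_len` filter, and the "highest quality score wins" rule are all hypothetical. The per-frame open or flipping labels are assumed to come from some classifier, such as the SVM mentioned above, which is not trained here.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def edge_density(roi: Sequence[Sequence[int]], threshold: int = 30) -> float:
    """Hypothetical edge cue: fraction of ROI pixels whose horizontal or
    vertical intensity jump to the next pixel exceeds `threshold`."""
    rows, cols = len(roi), len(roi[0])
    edges = 0
    for y in range(rows - 1):
        for x in range(cols - 1):
            gx = abs(roi[y][x + 1] - roi[y][x])  # jump to the right neighbour
            gy = abs(roi[y + 1][x] - roi[y][x])  # jump to the lower neighbour
            if max(gx, gy) > threshold:
                edges += 1
    return edges / ((rows - 1) * (cols - 1))


def group_open_frames(is_open: Sequence[bool], min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of consecutive
    frames labelled open. Runs shorter than `min_len` are dropped (assumption:
    very short runs are treated as classifier noise)."""
    runs = []
    index = 0
    for state, group in groupby(is_open):
        length = len(list(group))
        if state and length >= min_len:
            runs.append((index, index + length))
        index += length
    return runs


def pick_representatives(
    is_open: Sequence[bool], quality: Sequence[float], min_len: int = 2
) -> List[int]:
    """For each open-page run, return the index of the frame with the highest
    quality score (assumption: some per-frame sharpness or quality measure)."""
    return [
        max(range(start, end), key=lambda i: quality[i])
        for start, end in group_open_frames(is_open, min_len)
    ]


# Toy input: labels as a classifier might emit them, plus made-up quality scores.
labels = [False, True, True, True, False, False, True, False, True, True]
scores = [0.1, 0.4, 0.9, 0.5, 0.2, 0.3, 0.8, 0.1, 0.6, 0.7]
representatives = pick_representatives(labels, scores)
```

With the toy input, the isolated open frame at index 6 is discarded by the `min_len` filter, and one frame index is returned for each of the two remaining open-page runs. Each selected frame would then be passed to an OCR engine.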



  1. Breuel TM (2008) The OCRopus open source OCR system. In: Proceedings of DRR
  2. Bosamiya JH, Agrawal P, Roy PP, Balasubramanian R (2015) Script independent scene text segmentation using fast stroke width transform and GrabCut. In: 3rd IAPR Asian conference on pattern recognition (ACPR), pp 151–155
  3. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
  4. Chakraborty D, Roy PP, Pal U, Alvarez JM (2013) OCR from video stream of book flipping. In: Proceedings of the 2nd Asian conference on pattern recognition, pp 130–134
  5. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27
  6. Cunzhao S, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2961–2968
  7. Das D, Datong C, Hauptmann AG (2008) Improving multimedia retrieval with a video OCR, pp 68200B-68200B
  8. Fujinami K, Inagawa N (2009) Page-flipping detection and information presentation for implicit interaction with a book. Int J Multimedia Ubiquit Eng 93–112
  9. Hearn D, Baker MP (1994) Computer graphics. Addison-Wesley
  10. Iwamura M, Tsuji T, Horimatsu A, Kise K (2009) Real-time camera-based recognition of characters and pictograms. In: International conference on document analysis and recognition, pp 76–80
  11. Lee CW, Jung K, Kim HJ (2003) Automatic text detection and removal in video sequences. Pattern Recogn Lett 24:2607–2623
  12. Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12(4):256–268
  13. Micusik B, Wildenauer H, Kosecka J (2008) Detection and matching of rectilinear structures. In: Proceedings of computer vision and pattern recognition (CVPR), pp 1–7
  14. Mishra A, Karteek A, Jawahar CV (2012) Top-down and bottom-up cues for scene text recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2687–2694
  15. Nakashima T, Watanabe Y, Komuro T, Ishikawa M (2009) Book flipping scanning. In: Symposium on user interface software and technology, vol 22, pp 79–80
  16. Neumann L, Matas J (2012) Real-time scene text localization and recognition, pp 3538–3545
  17. Ngo CW, Chan CK (2005) Video text detection and segmentation for optical character recognition. Multimedia Systems 10(3):261–272
  18. Niblack W (1986) An introduction to digital image processing. Prentice Hall, pp 115–116
  19. Ojala T, Pietikäinen M, Mäenpää T (2001) A generalized Local Binary Pattern operator for multiresolution gray scale and rotation invariant texture classification. In: Proceedings of international conference on advances in pattern recognition, pp 399–408
  20. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
  21. Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of spatial envelope. Int J Comput Vis 42(3):145–175
  22. Free Online OCR service, convert scanned PDF and images to Word, Text
  23. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
  24. Roy S, Roy PP, Shivakumara P, Louloudis G, Tan CL, Pal U (2013) HMM-based multi oriented text recognition in natural scene image. In: 2nd IAPR Asian conference on pattern recognition, pp 288–292
  25. Sato T, Kanade T, Hughes EK, Smith MA (1998) Video OCR for digital news archive. In: IEEE international workshop on content-based access of image and video database, pp 52–60
  26. Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
  27. Shibayama H, Watanabe Y, Ishikawa M (2012) Reconstruction of 3D surface and restoration of flat document image from monocular image sequence. In: Asian conference on computer vision, pp 350–364
  28. Shivakumara P, Bhowmick S, Su B, Tan CL, Pal U (2011) A new gradient based character segmentation method for video text recognition. In: International conference on document analysis and recognition, pp 126–130
  29. Singh M, Kaur A (2015) An efficient hybrid scheme for key frame extraction and text localization in video. In: International conference on advances in computing, communications and informatics, pp 1250–1254
  30. Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of 9th international conference on document analysis and recognition, pp 629–633
  31. Su B, Lu S, Tan CL (2013) Robust document image binarization technique for degraded document images. IEEE Trans Image Processing 22(4):1408–1417
  32. Vajda S, Rothacker L, Fink GA (2011) A method for camera-based interactive whiteboard reading
  33. Vapnik V (1995) The nature of statistical learning theory. Springer Verlag
  34. Wu V, Manmatha R, Riseman EM (1997) Finding text in images. In: Proceedings of ACM international conference on digital libraries, pp 23–26
  35. Yao C, Bai X, Shi B, Liu W (2014) Strokelets: a learned multi-scale representation for scene text recognition. In: IEEE conference on computer vision and pattern recognition, pp 4042–4049
  36. Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605
  37. Yin XC, Yin X, Huang K, Hao HW (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
  38. Yin XC, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
  39. Zang J, Kasturi R (2008) Extraction of text objects in video documents: recent progress. In: Proceedings of 8th international association pattern recognition international workshop document analysis systems, pp 5–17
  40. Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. In: Proceedings of 3rd international conference document analysis and recognition, p 146


Corresponding author

Correspondence to Partha Pratim Roy.


Cite this article

Chakraborty, D., Roy, P.P., Saini, R. et al. Frame selection for OCR from video stream of book flipping. Multimed Tools Appl 77, 985–1008 (2018).



  • Video OCR
  • OCR of flipping book
  • Video document image