Abstract
Optical character recognition (OCR) on a video stream of flipping pages is challenging because pages are flipped at random speeds, which makes it difficult to identify the frames that contain an open page image (OPI). Low resolution, blur, and shadows add further noise to the selection of suitable frames for OCR. In this paper, we identify a set of representative frames from a video stream of flipping pages without using any dedicated hardware and then perform OCR on these frames, yielding an end-to-end solution for such video streams. To select an OPI, we present an efficient algorithm that exploits edge cues observed during a flipping event. These cues, extracted from the region of interest (ROI) of each frame, determine whether a page is in the flipping state or the open state. The open state is classified by an SVM classifier trained on the edge cue information. After a set of frames has been selected for each OPI, one representative frame from that set is chosen for OCR. Experiments are performed on videos captured with a standard-resolution camera. The proposed method achieves 88.81 % accuracy on representative frame selection, compared with only 51.28 % for GIST (Oliva and Torralba, Int J Comput Vis 42(3):145–175 (2001)). To the best of our knowledge, this is the first work in this area. After frame selection, we achieve 83.31 % character recognition accuracy and 78.11 % word recognition accuracy with a traditional OCR engine on our flipping-book dataset.
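The last selection step (grouping frames into one set per OPI and choosing a representative from each set) can be illustrated with a minimal sketch. The abstract does not state how the representative frame is chosen, so the code below assumes a hypothetical per-frame quality score (for example, a sharpness measure) and a hypothetical minimum run length `min_len` for discarding very short open-state runs. The open/flipping labels stand in for the output of the classifier; none of the function names come from the paper.

```python
from itertools import groupby


def group_open_runs(labels, min_len=3):
    """Return (start, end) pairs, end exclusive, for runs of consecutive open frames.

    labels: list of bools, True = frame classified as open page, False = flipping.
    min_len: assumed threshold; shorter open runs are treated as noise.
    """
    runs = []
    index = 0
    for is_open, group in groupby(labels):
        length = len(list(group))
        if is_open and length >= min_len:
            runs.append((index, index + length))
        index += length
    return runs


def pick_representatives(labels, scores, min_len=3):
    """For each open run, return the index of the frame with the highest score.

    scores: hypothetical per-frame quality values (higher is better).
    """
    representatives = []
    for start, end in group_open_runs(labels, min_len):
        best = max(range(start, end), key=lambda i: scores[i])
        representatives.append(best)
    return representatives


# Toy input: two stable open-page runs and one isolated open frame.
labels = [False, True, True, True, True, False, False, True, False, True, True, True]
scores = [0.1, 0.4, 0.9, 0.7, 0.5, 0.2, 0.1, 0.8, 0.2, 0.3, 0.6, 0.95]
print(pick_representatives(labels, scores))  # [2, 11]
```

In this toy run, the isolated open frame at index 7 is dropped because its run is shorter than `min_len`, and one frame index is returned for each of the two remaining runs.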
References
1. Breuel TM (2008) The OCRopus open source OCR system. In: Proceedings of DRR
2. Bosamiya JH, Agrawal P, Roy PP, Balasubramanian R (2015) Script independent scene text segmentation using fast stroke width transform and GrabCut. In: 3rd IAPR Asian conference on pattern recognition (ACPR), pp 151–155
3. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
4. Chakraborty D, Roy PP, Pal U, Alvarez JM (2013) OCR from video stream of book flipping. In: Proceedings of the 2nd Asian conference on pattern recognition, pp 130–134
5. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27
6. Cunzhao S, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2961–2968
7. Das D, Datong C, Hauptmann AG (2008) Improving multimedia retrieval with a video OCR, pp 68200B-68200B
8. Fujinami K, Inagawa N (2009) Page-flipping detection and information presentation for implicit interaction with a book. Int J Multimedia Ubiquit Eng 93–112
9. Hearn D, Baker MP (1994) Computer graphics. Addison-Wesley
10. Iwamura M, Tsuji T, Horimatsu A, Kise K (2009) Real-time camera-based recognition of characters and pictograms. In: International conference on document analysis and recognition, pp 76–80
11. Lee CW, Jung K, Kim HJ (2003) Automatic text detection and removal in video sequences. Pattern Recogn Lett 24:2607–2623
12. Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12(4):256–268
13. Micusik B, Wildenauer H, Kosecka J (2008) Detection and matching of rectilinear structures. In: Proceedings of computer vision and pattern recognition (CVPR), pp 1–7
14. Mishra A, Karteek A, Jawahar CV (2012) Top-down and bottom-up cues for scene text recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2687–2694
15. Nakashima T, Watanabe Y, Komuro T, Ishikawa M (2009) Book flipping scanning. In: Symposium on user interface software and technology, vol 22, pp 79–80
16. Neumann L, Matas J (2012) Real-time scene text localization and recognition, pp 3538–3545
17. Ngo CW, Chan CK (2005) Video text detection and segmentation for optical character recognition. Multimedia Systems 10(3):261–272
18. Niblack W (1986) An introduction to digital image processing. Prentice Hall, pp 115–116
19. Ojala T, Pietikäinen M, Mäenpää T (2001) A generalized Local Binary Pattern operator for multiresolution gray scale and rotation invariant texture classification. In: Proceedings of international conference on advances in pattern recognition, pp 399–408
20. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
21. Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of spatial envelope. Int J Comput Vis 42(3):145–175
22. onlineocr.net - Free Online OCR service, convert scanned PDF and images to Word, Text
23. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
24. Roy S, Roy PP, Shivakumara P, Louloudis G, Tan CL, Pal U (2013) HMM-based multi oriented text recognition in natural scene image. In: 2nd IAPR Asian conference on pattern recognition, pp 288–292
25. Sato T, Kanade T, Hughes EK, Smith MA (1998) Video OCR for digital news archive. In: IEEE international workshop on content-based access of image and video database, pp 52–60
26. Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
27. Shibayama H, Watanabe Y, Ishikawa M (2012) Reconstruction of 3D surface and restoration of flat document image from monocular image sequence. In: Asian conference on computer vision, pp 350–364
28. Shivakumara P, Bhowmick S, Su B, Tan CL, Pal U (2011) A new gradient based character segmentation method for video text recognition. In: International conference on document analysis and recognition, pp 126–130
29. Singh M, Kaur A (2015) An efficient hybrid scheme for key frame extraction and text localization in video. In: International conference on advances in computing, communications and informatics, pp 1250–1254
30. Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of 9th international conference on document analysis and recognition, pp 629–633
31. Su B, Lu S, Tan CL (2013) Robust document image binarization technique for degraded document images. IEEE Trans Image Process 22(4):1408–1417
32. Vajda S, Rothacker L, Fink GA (2011) A method for camera-based interactive whiteboard reading
33. Vapnik V (1995) The nature of statistical learning theory. Springer-Verlag
34. Wu V, Manmatha R, Riseman EM (1997) Finding text in images. In: Proceedings of ACM international conference on digital libraries, pp 23–26
35. Yao C, Bai X, Shi B, Liu W (2014) Strokelets: a learned multi-scale representation for scene text recognition. In: IEEE conference on computer vision and pattern recognition, pp 4042–4049
36. Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605
37. Yin XC, Yin X, Huang K, Hao HW (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
38. Yin XC, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
39. Zhang J, Kasturi R (2008) Extraction of text objects in video documents: recent progress. In: Proceedings of 8th international association pattern recognition international workshop document analysis systems, pp 5–17
40. Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. In: Proceedings of 3rd international conference document analysis and recognition, p 146
Cite this article
Chakraborty, D., Roy, P.P., Saini, R. et al. Frame selection for OCR from video stream of book flipping. Multimed Tools Appl 77, 985–1008 (2018). https://doi.org/10.1007/s11042-016-4292-3
Keywords
- Video OCR
- OCR of flipping book
- Video document image