Abstract
We developed a scene text recognition system with active vision capabilities, namely auto-focus, adaptive aperture control and auto-zoom. Our localization system can delimit text regions in images with complex backgrounds and is based on an attentional cascade, asymmetric AdaBoost, decision trees and Gaussian mixture models. We believe that text could become a valuable source of semantic information for robots, and we aim to raise interest in it within the robotics community. Moreover, thanks to the robot's pan-tilt-zoom camera and its active vision behaviors, the robot can use its affordances to overcome obstacles that would otherwise degrade the perceptual task. Detrimental conditions such as poor illumination, blur and low resolution are very hard to deal with once an image has been captured, but they can often be prevented. We evaluated the localization algorithm on a public dataset and on one of our own, with encouraging results. Furthermore, we present an experiment in active vision which suggests that active sensing should be considered early on when addressing complex perceptual problems in embodied agents.
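The localizer described above is built around an attentional cascade of boosted classifiers. The following is a minimal sketch, not the authors' code, showing only how such a cascade is evaluated on a candidate window: each stage sums the weighted votes of its weak learners and rejects the window as soon as the score falls below that stage's threshold, so most background windows are discarded cheaply by the early stages. The weak learners here are decision stumps (depth-one decision trees), an assumption chosen for brevity. Asymmetric AdaBoost would be used at training time to learn the stumps and their weights while penalizing missed text more heavily than false alarms; training is not shown. The feature vector, `Stump`, `Stage`, `run_cascade` and all numeric parameters are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Features = Sequence[float]  # hypothetical per-window feature vector


@dataclass
class Stump:
    """Weak learner: a depth-one decision tree on a single feature."""
    feature: int      # index into the feature vector
    threshold: float
    polarity: int     # +1: vote "text" if value > threshold; -1: if value <= threshold
    alpha: float      # vote weight assigned by boosting

    def vote(self, x: Features) -> float:
        fires = x[self.feature] > self.threshold
        if self.polarity < 0:
            fires = not fires
        return self.alpha if fires else -self.alpha


@dataclass
class Stage:
    """One cascade stage: a boosted ensemble with its own rejection threshold."""
    stumps: List[Stump]
    threshold: float  # set low enough that nearly all true text windows pass

    def score(self, x: Features) -> float:
        return sum(s.vote(x) for s in self.stumps)


def run_cascade(stages: List[Stage], x: Features) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated).

    A window is rejected at the first stage whose score falls below the
    stage threshold, so cheap early stages filter out most of the background.
    """
    for i, stage in enumerate(stages, start=1):
        if stage.score(x) < stage.threshold:
            return False, i
    return True, len(stages)


# Hypothetical two-stage cascade over features [edge_density, contrast, aspect_ratio].
STAGES = [
    Stage([Stump(0, 0.2, +1, 1.0)], threshold=0.0),
    Stage([Stump(1, 0.5, +1, 0.8), Stump(2, 3.0, -1, 0.6)], threshold=0.0),
]

if __name__ == "__main__":
    for window in ([0.1, 0.9, 1.0], [0.4, 0.7, 2.0], [0.4, 0.3, 5.0]):
        print(window, run_cascade(STAGES, window))
```

With these placeholder values, the first window is rejected after one stage, the second passes both stages, and the third is rejected by the second stage. In a real detector, the per-stage thresholds trade recall against the number of windows that reach the more expensive later stages.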
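Auto-focus is one of the active vision behaviors listed above. A common way to implement it is to search over lens positions for the one that maximizes an image sharpness (focus) measure. The sketch below is an assumption, not the paper's implementation: it uses a gradient-energy focus measure and a simple hill-climbing search, and it simulates the camera with a box blur whose radius grows with the distance from a hypothetical in-focus lens position, so it runs without hardware. `TRUE_FOCUS`, `SCENE`, `capture`, `sharpness` and `hill_climb_focus` are illustrative names; on a real pan-tilt-zoom camera, `capture` would instead command the focus motor and grab a frame.

```python
from typing import Callable, List

Image = List[List[float]]


def gradient_energy(img: Image) -> float:
    """Focus measure: sum of squared differences between neighbouring pixels.

    Defocus blur smooths edges, so this value peaks at the sharpest lens position.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < h:
                total += (img[y + 1][x] - img[y][x]) ** 2
    return total


def box_blur_row(row: List[float], r: int) -> List[float]:
    """Mean filter of radius r; the window is truncated at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        left, right = max(0, i - r), min(n, i + r + 1)
        out.append(sum(row[left:right]) / (right - left))
    return out


def box_blur(img: Image, r: int) -> Image:
    """Separable 2-D box blur: filter the rows, then the columns."""
    rows = [box_blur_row(row, r) for row in img]
    cols = [box_blur_row(list(col), r) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Simulated camera (assumption): blur radius grows with distance from TRUE_FOCUS.
TRUE_FOCUS = 7
SCENE = [[float(x % 2) for x in range(24)] for _ in range(4)]  # stroke-like stripes


def capture(pos: int) -> Image:
    return box_blur(SCENE, abs(pos - TRUE_FOCUS))


def sharpness(pos: int) -> float:
    return gradient_energy(capture(pos))


def hill_climb_focus(measure_at: Callable[[int], float], start: int, lo: int, hi: int) -> int:
    """Step the focus position toward higher sharpness until it stops improving."""
    pos, best = start, measure_at(start)
    for step in (1, -1):  # find a direction in which sharpness increases
        nxt = pos + step
        if lo <= nxt <= hi and measure_at(nxt) > best:
            break
    else:
        return pos  # neither neighbour is sharper: already at a local maximum
    while lo <= pos + step <= hi:
        value = measure_at(pos + step)
        if value <= best:
            break
        pos, best = pos + step, value
    return pos


if __name__ == "__main__":
    print(hill_climb_focus(sharpness, start=3, lo=0, hi=15))
```

Hill climbing is the simplest search and can stall on a local maximum when the focus measure is noisy; on real lenses, a coarse-to-fine or Fibonacci search over the focus range is a common alternative.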
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Álvarez Ruiz, J.A., Plöger, P., Kraetzschmar, G.K. (2013). Active Scene Text Recognition for a Domestic Service Robot. In: Chen, X., Stone, P., Sucar, L.E., van der Zant, T. (eds) RoboCup 2012: Robot Soccer World Cup XVI. RoboCup 2012. Lecture Notes in Computer Science, vol 7500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39250-4_23
DOI: https://doi.org/10.1007/978-3-642-39250-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39249-8
Online ISBN: 978-3-642-39250-4
eBook Packages: Computer Science, Computer Science (R0)