Active Scene Text Recognition for a Domestic Service Robot

Ruiz, José Antonio Álvarez; Plöger, Paul; Kraetzschmar, Gerhard K.

doi:10.1007/978-3-642-39250-4_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7500)

Included in the following conference series:

Robot Soccer World Cup

Abstract

We developed a scene text recognition system with active vision capabilities, namely auto-focus, adaptive aperture control and auto-zoom. Our localization system delimits text regions in images with complex backgrounds and is based on an attentional cascade, asymmetric AdaBoost, decision trees and Gaussian mixture models. We think that text could become a valuable source of semantic information for robots, and we aim to raise interest in it within the robotics community. Moreover, thanks to its pan-tilt-zoom camera and the active vision behaviors, the robot can use its affordances to overcome hindrances to the perceptual task. Detrimental conditions such as poor illumination, blur and low resolution are very hard to deal with once an image has been captured, but they can often be prevented at capture time. We evaluated the localization algorithm on a public dataset and on one of our own, with encouraging results. Furthermore, we present an experiment in active vision which suggests that active sensing in general should be considered early on when addressing complex perceptual problems in embodied agents.
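The abstract names an attentional cascade as one building block of the localization system. As a rough illustration of the general early-rejection idea only (in the spirit of boosted detector cascades, not the authors' actual features, classifiers or thresholds), a minimal sketch might look like the following. The window representation, the two toy scoring functions and the thresholds are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: one image window as a row of intensities in [0, 1].
Window = Sequence[float]
# One cascade stage: (scoring function, minimum score needed to pass).
Stage = Tuple[Callable[[Window], float], float]


def cascade_accepts(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Run the stages in order and stop at the first one that rejects.

    Returns (accepted, number_of_stages_evaluated).
    """
    for evaluated, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, evaluated  # early rejection: later stages are skipped
    return True, len(stages)


def value_range(w: Window) -> float:
    """Cheap contrast test: difference between brightest and darkest value."""
    return max(w) - min(w)


def transitions(w: Window) -> float:
    """Slightly costlier test: how often the signal crosses its own mean."""
    mean = sum(w) / len(w)
    return sum(1 for a, b in zip(w, w[1:]) if (a > mean) != (b > mean))


# Assumed ordering: cheapest test first, so most background is discarded early.
STAGES: List[Stage] = [(value_range, 0.3), (transitions, 2)]

flat = [0.5] * 8                                      # uniform background
edge = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]       # a single edge
textlike = [0.1, 0.9, 0.1, 0.9, 0.2, 0.8, 0.1, 0.9]   # stroke-like alternation

if __name__ == "__main__":
    for name, w in [("flat", flat), ("edge", edge), ("textlike", textlike)]:
        print(name, cascade_accepts(w, STAGES))
```

The point of the ordering is that most windows in a natural image are background and are discarded by the first, cheapest stage, so the more expensive tests run only on the few remaining candidates.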

Author information

Authors and Affiliations

Computer Science Department, University of Applied Sciences Bonn-Rhine-Sieg, Sankt Augustin, Germany
José Antonio Álvarez Ruiz, Paul Plöger & Gerhard K. Kraetzschmar

Editor information

Editors and Affiliations

Computer Science School, University of Science and Technology of China, 230027, Hefei, China
Xiaoping Chen
Department of Computer Science, The University of Texas at Austin, 78712-1757, Austin, TX, USA
Peter Stone
Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
Luis Enrique Sucar
Faculty of Mathematics and Natural Sciences, Institute for Artificial Intelligence and Cognitive Engineering, University of Groningen, 9747 AG, Groningen, The Netherlands
Tijn van der Zant

About this paper

Cite this paper

Ruiz, J.A.Á., Plöger, P., Kraetzschmar, G.K. (2013). Active Scene Text Recognition for a Domestic Service Robot. In: Chen, X., Stone, P., Sucar, L.E., van der Zant, T. (eds) RoboCup 2012: Robot Soccer World Cup XVI. RoboCup 2012. Lecture Notes in Computer Science, vol 7500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39250-4_23

DOI: https://doi.org/10.1007/978-3-642-39250-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39249-8
Online ISBN: 978-3-642-39250-4
eBook Packages: Computer Science, Computer Science (R0)
