Active Scene Text Recognition for a Domestic Service Robot

  • José Antonio Álvarez Ruiz
  • Paul Plöger
  • Gerhard K. Kraetzschmar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7500)


We developed a scene text recognition system with active vision capabilities, namely auto-focus, adaptive aperture control, and auto-zoom. Our localization system delimits text regions in images with complex backgrounds and is based on an attentional cascade, asymmetric AdaBoost, decision trees, and Gaussian mixture models. We think that text could become a valuable source of semantic information for robots, and we aim to raise interest in it within the robotics community. Moreover, thanks to its pan-tilt-zoom camera and the active vision behaviors, the robot can use its affordances to overcome hindrances to the perceptual task. Detrimental conditions such as poor illumination, blur, and low resolution are very hard to deal with once an image has been captured, but can often be prevented beforehand. We evaluated the localization algorithm on a public dataset and on one of our own, with encouraging results. Furthermore, we present an experiment in active vision, which leads us to conclude that active sensing in general should be considered early on when addressing complex perceptual problems in embodied agents.
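As a rough illustration of the attentional cascade idea mentioned above, the following minimal sketch passes a candidate image window through a sequence of increasingly selective stages and rejects it as soon as one stage fails, so that most background windows cost only one cheap evaluation. It is not the authors' implementation: the feature names, the stage functions, and the thresholds are hypothetical placeholders, whereas in a real system each stage would be a trained classifier (for example, a boosted ensemble).

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a scoring function with the minimum score needed to pass it.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def run_cascade(features: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, stages_evaluated) for one candidate window."""
    evaluated = 0
    for score, threshold in stages:
        evaluated += 1
        if score(features) < threshold:
            return False, evaluated  # early rejection: later stages are skipped
    return True, evaluated


# Hypothetical toy stages over features = [edge_density, contrast, stroke_regularity].
# These stand in for trained classifiers and are assumptions for illustration only.
TOY_STAGES: List[Stage] = [
    (lambda f: f[0], 0.2),                      # cheap test: enough edges?
    (lambda f: 0.5 * f[0] + 0.5 * f[1], 0.4),   # richer test: edges and contrast
    (lambda f: (f[0] + f[1] + f[2]) / 3, 0.5),  # most selective: all three cues
]

if __name__ == "__main__":
    print(run_cascade([0.05, 0.9, 0.9], TOY_STAGES))  # (False, 1): rejected by stage 1
    print(run_cascade([0.3, 0.1, 0.9], TOY_STAGES))   # (False, 2): rejected by stage 2
    print(run_cascade([0.8, 0.7, 0.6], TOY_STAGES))   # (True, 3): passes every stage
```

The second value in each result shows how many stages were evaluated, which is where a cascade saves computation: only windows that survive the cheap early tests reach the more selective later ones.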


Scene text recognition, active vision, domestic robot, pan-tilt, auto-zoom, auto-focus, adaptive aperture control



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • José Antonio Álvarez Ruiz (1)
  • Paul Plöger (1)
  • Gerhard K. Kraetzschmar (1)
  1. Computer Science Department, University of Applied Sciences Bonn-Rhine-Sieg, Sankt Augustin, Germany
