Journal of Intelligent & Robotic Systems, Volume 68, Issue 2, pp 185–208

Evaluation of Three Vision Based Object Perception Methods for a Mobile Robot

  • Arnau Ramisa
  • David Aldavert
  • Shrihari Vasudevan
  • Ricardo Toledo
  • Ramon Lopez de Mantaras


This paper addresses visual object perception applied to mobile robotics. The ability to perceive household objects in unstructured environments is a key capability for robots that must perform complex tasks in the home. The task is daunting, however: it requires handling the variability of image formation from a moving camera under tight time constraints. The paper highlights some of the issues that arise when applying three state-of-the-art object recognition and detection methods in a mobile robotics scenario, and proposes methods to deal with windowing/segmentation. This work thus evaluates the state of the art in object perception in an attempt to develop a lightweight solution for mobile robotics use and research in typical indoor settings.


Keywords

Mobile robots, Object recognition

Mathematics Subject Classifications (2010)

68T40 68T45 68T10 





Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Arnau Ramisa (1)
  • David Aldavert (2)
  • Shrihari Vasudevan (3)
  • Ricardo Toledo (2)
  • Ramon Lopez de Mantaras (4)

  1. IRI UPC-CSIC, Barcelona, Spain
  2. Computer Vision Center, Dept. Ciències de la Computació, Universitat Autònoma de Barcelona, Bellaterra, Spain
  3. Australian Center for Field Robotics, The University of Sydney, Sydney, Australia
  4. IIIA-CSIC, Bellaterra, Spain
