Integrating Data- and Model-Driven Analysis of RGB-D Images

  • Włodzimierz Kasprzak
  • Rafał Pietruch
  • Konrad Bojar
  • Artur Wilkowski
  • Tomasz Kornuta
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 323)


RGB-D sensors are increasingly used in vision-based robot perception. Reliable 3D object recognition requires the integration of image-driven and model-based analysis. Only then can the low-level, image-like representation be successfully transformed into a symbolic description with equivalent semantics, of the kind used by the ontology-level representation of an autonomous robot system. An RGB-D image analysis approach is proposed that consists of a data-driven hypothesis generation step and a generic model-based object recognition step. Initially, point clusters are created that are assumed to represent 3D object hypotheses. In parallel, 3D surface patches are estimated and 2D image textures and shapes are classified, yielding multi-modal image segmentation data. In the model-driven step, built-in knowledge about basic solids, shapes and textures is used to verify whether the point clusters form meaningful volume-like aggregates, and to create (or recognize) generic 3D object models.
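
The abstract does not say how the initial point clusters are formed. One common way to turn an RGB-D point cloud into object hypotheses is Euclidean clustering: points are joined into one cluster whenever a chain of neighbours within a fixed radius connects them. The sketch below assumes that technique (it is not confirmed as the authors' method). The function name `euclidean_clusters` and the parameters `radius` and `min_size` are hypothetical, and the brute-force neighbour search stands in for the k-d tree a real system would use.

```python
from collections import deque
from math import dist


def euclidean_clusters(points, radius, min_size=1):
    """Group 3D points into clusters connected by neighbour chains within `radius`.

    Hypothetical helper: brute-force O(n^2) neighbour search, for illustration only.
    Returns a list of clusters, each a sorted list of point indices; cluster order
    is arbitrary.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # All still-unassigned points within `radius` of point i join the cluster
            neighbours = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in neighbours:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        # Very small clusters are treated as noise rather than object hypotheses
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.02, 0.0, 0.0),
           (1.0, 1.0, 1.0), (1.01, 1.0, 1.0)]
    print(sorted(euclidean_clusters(pts, radius=0.015)))  # [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are farther apart than `radius`, but they still end up in the same cluster because point 1 links them. This chaining is what lets a single cluster cover an elongated object.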

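The abstract also mentions estimating 3D surface patches in parallel with clustering. A minimal way to describe a patch is a fitted plane with its unit normal and a residual that measures flatness. The sketch below assumes an ordinary least-squares fit of z = a*x + b*y + c, solved by Cramer's rule. This is a simplification chosen here, not the paper's estimator: it becomes ill-conditioned for patches seen edge-on (parallel to the z axis), where a total-least-squares (PCA) fit would be preferable. The name `fit_plane` and its return layout are hypothetical.

```python
from math import sqrt


def _det3(q):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
            - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
            + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) points.

    Hypothetical helper. Returns (unit_normal, (a, b, c), rms_residual) and
    raises ValueError for fewer than 3 points or a degenerate configuration.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations: m @ [a, b, c] = r
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    r = [sxz, syz, sz]
    d = _det3(m)
    if abs(d) < 1e-12:  # crude absolute threshold, e.g. collinear points
        raise ValueError("degenerate point set")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of m by r
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = r[i]
        coeffs.append(_det3(mk) / d)
    a, b, c = coeffs
    # The plane a*x + b*y - z + c = 0 has normal (-a, -b, 1), oriented towards +z
    norm = sqrt(a * a + b * b + 1.0)
    normal = (-a / norm, -b / norm, 1.0 / norm)
    rms = sqrt(sum((p[2] - (a * p[0] + b * p[1] + c)) ** 2 for p in points) / n)
    return normal, (a, b, c), rms
```

A small `rms` relative to the sensor noise would mark a cluster region as a planar patch. The normal is the quantity a later model-driven check can compare across patches.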

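In the model-driven step, built-in knowledge about basic solids is used to verify point clusters. The abstract does not describe how this knowledge is encoded. As a much simplified, hypothetical illustration, the rule below accepts a cluster as box-like if it has at least two planar patches, every pair of patch normals is either parallel or perpendicular within a tolerance, and at least one pair is perpendicular. The input is assumed to be the unit normals of the patches already assigned to one cluster. The names `angle_deg`, `verify_box` and `tol_deg` are illustrative assumptions, not the paper's model.

```python
from itertools import combinations
from math import acos, degrees


def angle_deg(n1, n2):
    """Unsigned angle in [0, 90] degrees between two unit normals.

    Taking abs() of the dot product ignores the sign ambiguity of normals.
    """
    dot = abs(sum(a * b for a, b in zip(n1, n2)))
    return degrees(acos(min(1.0, dot)))


def verify_box(normals, tol_deg=10.0):
    """Hypothetical box test on the patch normals of one point cluster."""
    if len(normals) < 2:
        return False
    has_perpendicular = False
    for n1, n2 in combinations(normals, 2):
        ang = angle_deg(n1, n2)
        if abs(ang - 90.0) <= tol_deg:
            has_perpendicular = True
        elif ang > tol_deg:
            return False  # oblique pair: inconsistent with a box
    return has_perpendicular
```

For example, three mutually orthogonal face normals pass the test, while two faces meeting at 45 degrees fail it. A fuller system would apply analogous rules for other solids, such as cylinders, and combine them with the texture and shape evidence.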



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Włodzimierz Kasprzak (1)
  • Rafał Pietruch (2)
  • Konrad Bojar (2)
  • Artur Wilkowski (2)
  • Tomasz Kornuta (1)

  1. Institute of Control and Computation Engineering, Warsaw University of Technology, Warsaw, Poland
  2. Industrial Research Institute for Automation and Measurements, Warsaw, Poland
