Visual Object Categorization Based on Hierarchical Shape Motifs Learned From Noisy Point Cloud Decompositions

  • Christian A. Mueller
  • Andreas Birk


Object shape is a key cue that contributes to the semantic understanding of objects. In this work we focus on the categorization of real-world object point clouds into particular shape types. The surface description and the representation of object shape structure have a significant influence on shape categorization accuracy when dealing with real-world scenes featuring noisy, partial and occluded object observations. An unsupervised hierarchical learning procedure is used here to symbolically describe surface characteristics on multiple semantic levels. Furthermore, a constellation model is proposed that hierarchically decomposes objects. The decompositions are described as constellations of symbols (shape motifs) in a gradual order, reflecting shape structure from local to global, i.e., from parts through groups of parts to entire objects. The combination of this multi-level description of surfaces and the hierarchical decomposition of shapes leads to a representation that allows shapes to be conceptualized. Discrimination between objects has been observed in experiments with seven categories featuring instances with sensor noise and occlusions as well as inter-category and intra-category similarities. The experiments include an evaluation of the proposed description and shape decomposition approach, and comparisons to Fast Point Feature Histograms, a Vocabulary Tree and a neural network-based Deep Learning method. Furthermore, experiments are conducted with alternative datasets to analyze the generalization capability of the proposed approach.


Object shape categorization · Shape reasoning · Shape motifs · Hierarchical shape representation · Shape decomposition





The research leading to the results presented here has received funding from the European Community’s Horizon 2020 Framework Programme (H2020-EU.3.2.) within the project (ref.: 635491) “Effective dexterous ROV operations in presence of communication latencies (DexROV)”.



Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Robotics Group, Computer Science & Electrical Engineering, Jacobs University Bremen gGmbH, Bremen, Germany
