Physical Primitive Decomposition

  • Zhijian Liu
  • William T. Freeman
  • Joshua B. Tenenbaum
  • Jiajun Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11216)


Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Developing such a distributed, physical, interpretable representation of objects will help intelligent agents better explore and interact with the world. In this paper, we study physical primitive decomposition: understanding an object through its components, each with physical and geometric attributes. As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object’s appearance and its behaviors in physical events. Our model performs well on block towers and tools in both synthetic and real scenarios; we also demonstrate that visual and physical observations often provide complementary signals. We further present ablation and behavioral studies to better understand our model and contrast it with human performance.



This work is supported by NSF #1231216, ONR MURI N00014-16-1-2007, Toyota Research Institute, and Facebook.

Supplementary material

Supplementary material 1: 474200_1_En_1_MOESM1_ESM.pdf (PDF, 120 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zhijian Liu (1)
  • William T. Freeman (1, 2)
  • Joshua B. Tenenbaum (1)
  • Jiajun Wu (1)

  1. Massachusetts Institute of Technology, Cambridge, USA
  2. Google Research, Cambridge, USA
