International Journal of Computer Vision, Volume 127, Issue 10, pp 1501–1526

Estimation of 3D Category-Specific Object Structure: Symmetry, Manhattan and/or Multiple Images

  • Yuan Gao
  • Alan L. Yuille


Many man-made objects have intrinsic symmetries and often Manhattan structure. Assuming an orthographic or weak perspective projection model, this paper addresses the estimation of 3D structure and camera projection using symmetry and/or Manhattan structure cues, for two cases: when the input is a single image, and when it is multiple images from the same category, e.g. multiple different cars seen from various viewpoints. More specifically, analysis of the single-image case shows that Manhattan structure alone is sufficient to recover the camera projection, after which the 3D structure can be reconstructed uniquely by exploiting symmetry. However, Manhattan structure can be hard to observe in a single image because of occlusion. Hence, we extend the analysis to the multiple-image case, which can also exploit symmetry but does not require Manhattan structure. We propose novel structure from motion methods for both rigid and non-rigid object deformations, which exploit symmetry and take multiple images from the same object category as input. We perform experiments on the Pascal3D+ dataset with 2D keypoints that are either human-labeled or localized by a convolutional neural network. The results show that our methods, which exploit symmetry, significantly outperform the baseline methods.
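To make the role of symmetry under the orthographic model more concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes the symmetry plane is x = 0 in object coordinates, and all function names, angles, and keypoint values are hypothetical. It checks a basic geometric fact: when mirror-symmetric 3D keypoints are projected orthographically, the 2D difference of every symmetric pair is parallel to the image of the symmetry-plane normal, which is the first column of the 2x3 projection matrix.

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def project(rotation, point):
    """Orthographic projection: apply only the first two rows of the rotation."""
    return [sum(rotation[r][k] * point[k] for k in range(3)) for r in range(2)]

def mirror(point):
    """Reflect a 3D point across the assumed symmetry plane x = 0."""
    return [-point[0], point[1], point[2]]

# Hypothetical camera orientation and keypoints on one side of the object.
rotation = rotation_zyx(0.4, -0.3, 0.2)
keypoints = [[0.8, 0.1, 1.2], [0.5, -0.7, 0.3], [1.1, 0.9, -0.4]]

# Image of the symmetry-plane normal (1, 0, 0): first column of the 2x3 camera.
axis_2d = [rotation[0][0], rotation[1][0]]

differences = []
for p in keypoints:
    a = project(rotation, p)
    b = project(rotation, mirror(p))
    differences.append([a[0] - b[0], a[1] - b[1]])

# The 2D cross product with axis_2d vanishes when the vectors are parallel.
cross_products = [d[0] * axis_2d[1] - d[1] * axis_2d[0] for d in differences]
print(cross_products)
```

In this toy setting each pair difference equals twice the keypoint's x-coordinate times `axis_2d`, so symmetric pairs constrain one column of the camera projection up to scale. How the paper combines such constraints with Manhattan cues or multiple images is not shown here.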


Symmetry, Manhattan, Single image, Symmetric rigid structure from motion, Symmetric non-rigid structure from motion



We would like to thank Ehsan Jahangiri, Cihang Xie, Weichao Qiu, Xuan Dong, and Siyuan Qiao for their feedback on the manuscript. This work was partially supported by ARO 62250-CS, ONR N00014-15-1-2356, and NSF award CCF-1317376.

Supplementary material

11263_2019_1195_MOESM1_ESM.pdf (269 kb)
Supplementary material 1 (pdf 268 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
  2. Tencent AI Lab, Shenzhen, China
  3. Departments of Computer Science and Cognitive Science, Johns Hopkins University, Baltimore, USA
  4. Department of Statistics, UCLA, Los Angeles, USA
