Lifting 2D Object Detections to 3D: A Geometric Approach in Multiple Views

  • Cosimo Rubino
  • Andrea Fusiello
  • Alessio Del Bue
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10484)


We present two new methods based on Interval Analysis and Computational Geometry for estimating the 3D occupancy and position of objects from image sequences. Given a calibrated set of images, the proposed methods first detect objects using off-the-shelf object detectors and then match the resulting bounding boxes across multiple views. The 2D semantic information provided by the bounding boxes is then used to efficiently recover the 3D position and occupancy of each object using only multi-view geometric constraints. We also incorporate additional constraints so that a solution can be obtained even when only a few images are available. Experiments on three realistic datasets demonstrate the applicability and potential of both approaches.
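The geometric constraint behind this idea can be illustrated with a minimal sketch. It assumes calibrated 3×4 projection matrices P and detections given as (u_min, v_min, u_max, v_max) in image coordinates consistent with P. Each side of a 2D bounding box, written as an image line l, back-projects to a 3D plane π = Pᵀl, so a 3D point is consistent with a detection only if it lies on the inner side of all four planes and in front of the camera. Because each such constraint is affine in the 3D point, its range over an axis-aligned 3D box can be computed exactly with interval arithmetic. That makes a generic branch-and-prune set inversion (SIVIA) possible, classifying 3D boxes as inside, outside, or undecided. SIVIA is a standard interval technique chosen here for illustration and is not necessarily the authors' formulation. The function names, the three-camera scene, and the tolerance `eps` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Box = List[Interval]                        # [(x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi)]
Plane = Tuple[float, float, float, float]   # a*x + b*y + c*z + d >= 0 means "inside"


def backproject_bbox(P: Sequence[Sequence[float]],
                     bbox: Tuple[float, float, float, float]) -> List[Plane]:
    """Turn a 2D box (u_min, v_min, u_max, v_max) into 3D half-spaces pi = P^T l.

    Assumption of this sketch: P is scaled so that w = P[2] . X is positive
    for points in front of the camera.
    """
    u_min, v_min, u_max, v_max = bbox
    lines = [(1.0, 0.0, -u_min), (-1.0, 0.0, u_max),   # u >= u_min, u <= u_max
             (0.0, 1.0, -v_min), (0.0, -1.0, v_max),   # v >= v_min, v <= v_max
             (0.0, 0.0, 1.0)]                          # depth w >= 0 (cheirality)
    return [tuple(sum(l[r] * P[r][c] for r in range(3)) for c in range(4))
            for l in lines]


def affine_range(plane: Plane, box: Box) -> Interval:
    """Exact range of a*x + b*y + c*z + d over an axis-aligned box."""
    lo = hi = plane[3]
    for coef, (b_lo, b_hi) in zip(plane[:3], box):
        t1, t2 = coef * b_lo, coef * b_hi
        lo += min(t1, t2)
        hi += max(t1, t2)
    return lo, hi


def point_consistent(X: Sequence[float], planes: List[Plane]) -> bool:
    """True if the 3D point lies inside every back-projected half-space."""
    x, y, z = X
    return all(a * x + b * y + c * z + d >= 0 for a, b, c, d in planes)


def sivia(planes: List[Plane], box: Box, eps: float) -> Tuple[List[Box], List[Box]]:
    """Branch-and-prune set inversion: returns (inner boxes, undecided boxes)."""
    inner, boundary = [], []
    stack = [box]
    while stack:
        b = stack.pop()
        ranges = [affine_range(p, b) for p in planes]
        if any(hi < 0 for _, hi in ranges):
            continue                      # a constraint fails everywhere in b: discard
        if all(lo >= 0 for lo, _ in ranges):
            inner.append(b)               # every point of b satisfies all constraints
            continue
        widths = [hi - lo for lo, hi in b]
        k = max(range(3), key=lambda i: widths[i])
        if widths[k] < eps:
            boundary.append(b)            # undecided and already small enough
            continue
        lo, hi = b[k]
        mid = 0.5 * (lo + hi)
        left, right = list(b), list(b)    # bisect along the widest axis
        left[k], right[k] = (lo, mid), (mid, hi)
        stack.extend([left, right])
    return inner, boundary


def hull(boxes: List[Box]) -> Box:
    """Axis-aligned bounding box of a non-empty list of boxes."""
    return [(min(b[i][0] for b in boxes), max(b[i][1] for b in boxes)) for i in range(3)]


def volume(boxes: List[Box]) -> float:
    total = 0.0
    for b in boxes:
        v = 1.0
        for lo, hi in b:
            v *= hi - lo
        total += v
    return total


# Hypothetical scene: three cameras at distance 5 looking at the cube [-1, 1]^3
# along +z, +x and +y; each 2D box is the exact projection of the cube.
P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 5]]
P2 = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 5]]
P3 = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 5]]
detection = (-0.25, -0.25, 0.25, 0.25)
planes = [pl for P in (P1, P2, P3) for pl in backproject_bbox(P, detection)]
inner, boundary = sivia(planes, [(-3.0, 3.0)] * 3, eps=0.2)
outer = inner + boundary
print(len(inner), len(boundary), hull(outer))
```

Inner boxes are guaranteed to lie inside every viewing cone, while boundary boxes are undecided at resolution `eps`. Their union is therefore an outer approximation of the region consistent with all detections, and it always contains the true object. Adding views intersects more cones and tightens this region.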


Object localisation · Object detection · Interval Analysis



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Cosimo Rubino (1)
  • Andrea Fusiello (2)
  • Alessio Del Bue (1)
  1. Visual Geometry and Modelling (VGM) Lab, Istituto Italiano di Tecnologia (IIT), Genova, Italy
  2. DPIA, Università di Udine, Udine, Italy
