Putting the User in the Loop for Image-Based Modeling

  • Adarsh Kowdle
  • Yao-Jen Chang
  • Andrew Gallagher
  • Dhruv Batra
  • Tsuhan Chen


We refer to the task of recovering the 3D structure of an object or a scene using 2D images as image-based modeling. In this paper, we formulate the task of recovering the 3D structure as a discrete optimization problem solved via energy minimization. In this standard framework of a Markov random field (MRF) defined over the image we present algorithms that allow the user to intuitively interact with the algorithm. We introduce an algorithm where the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. We develop end user applications using this algorithm that allow object of interest 3D modeling on a mobile device and 3D printing of the object of interest. We also propose an alternate active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given the reconstruction, an active learning algorithm uses intuitive cues to quantify the uncertainty of the algorithm and suggest regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and the pairwise energies that, when solved, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries the users for constraints, and users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces lacking strong textural or structural cues that algorithms typically require.


Image-based modeling Interactive 3D reconstruction  Active-learning Energy minimization 



The authors thank Anandram Sundar for the data annotation.


  1. Bagon, S. (2006). Matlab wrapper for graph cut. Accessed 7 March 2013.
  2. Bartoli, A. (2007). A random sampling strategy for piecewise planar scene segmentation. Cardiac and Vascular Institute of Ultrasound, 105(1), 42–59.Google Scholar
  3. Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmenting topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292.CrossRefGoogle Scholar
  4. Baumgart, B.G. (1974). Geometric modeling for computer vision. PhD thesis, Stanford University.Google Scholar
  5. Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.CrossRefGoogle Scholar
  6. Boykov, Y., Veksler, O., & Zabih, R. (2001). Efficient approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 20(12), 1222–1239.CrossRefGoogle Scholar
  7. Campbell, N., Vogiatzis, G., Hernndez, C., & Cipolla, R. (2007). Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In BMVC, Bristol.Google Scholar
  8. Campbell, N.D., Vogiatzis, G., Hernández, C., & Cipolla, R. (2008). Using multiple hypotheses to improve depth-maps for multi-view stereo. In ECCV.Google Scholar
  9. Chen, Z., Chou, H.L., & Chen, W.C. (2008). A performance controllable octree construction method. In ICPR.Google Scholar
  10. Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In ECCV.Google Scholar
  11. Comaniciu, D., & Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, 24(5), 603–619.CrossRefGoogle Scholar
  12. Criminisi, A., Reid, I.D., & Zisserman, A. (1999). Single view metrology. In ICCV.Google Scholar
  13. Debevec, P., Taylor, C., & Malik, J. (1996). Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH.Google Scholar
  14. Fang, Y. H., Chou, H. L., & Chen, Z. (2003). 3D Shape recovery of complex objects from multiple silhouette images. Pattern Recognition Letters, 24(9–10), 1279–1293.CrossRefzbMATHGoogle Scholar
  15. Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.CrossRefGoogle Scholar
  16. Forbes, K., Nicolls, F., de Jager, G., & Voigt, A. (2006). Shape-from-silhouette with two mirrors and an uncalibrated camera. In ECCV, (pp. 165–178).Google Scholar
  17. Furukawa, Y., & Ponce, J. (2009). Accurate, dense, and robust multi-view stereopsis. Pattern Analysis and Machine Intelligence, 32:1362–1376.Google Scholar
  18. Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2009). Reconstructing building interiors from images. In ICCV.Google Scholar
  19. Furukawa, Y., Curless, B., Seitz, S.M., & Szeliski, R. (2010). Towards internet-scale multi-view stereo. In CVPR.Google Scholar
  20. Gallup, D., Frahm, J., & Pollefeys, M. (2010). Piecewise planar and non-planar stereo for urban scene reconstruction. In CVPR.Google Scholar
  21. Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S.M. (2007). Multi-view stereo for community photo collections. In ICCV.Google Scholar
  22. Gosselin, P. H., & Cord, M. (2008). Active learning methods for interactive image retrieval. IEEE Transactions on Image Processing, 17(7), 1200–1211.CrossRefMathSciNetGoogle Scholar
  23. Hengel, A., Dick, A. R., ThormŁhlen, T., Ward, B., & Torr, P. H. S. (2007). Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics, 26(3), 86.CrossRefGoogle Scholar
  24. Hoiem, D., Efros, A., & Hebert, M. (2005). Automatic photo pop-up. In SIGGRAPH.Google Scholar
  25. Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. IJCV, 75(1)Google Scholar
  26. Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In CVPR, (pp. 762–769).Google Scholar
  27. Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In ICCV.Google Scholar
  28. Kohli, P., & Torr, P. H. S. (2008). Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, 112(1), 30–38.CrossRefGoogle Scholar
  29. Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. In IJCV.Google Scholar
  30. Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, 26(2), 147–159.CrossRefGoogle Scholar
  31. Kowdle, A., Batra, D., Chen, W., & Chen, T. (2010). iModel: Interactive co-segmentation for object of interest 3d modeling. In ECCVRMLE Workshop.Google Scholar
  32. Kowdle, A., Chang, Y., Batra, D., & Chen, T. (2011a). Scribble based interactive 3d reconstruction via scene cosegmentation. In ICIP.Google Scholar
  33. Kowdle, A., Chang, Y., Gallagher, A., & Chen, T. (2011b). Active learning for piecewise planar 3d reconstruction. In CVPR.Google Scholar
  34. Kowdle, A., Liu, H., Hsu, S., Lew, J., Puri, C., Batra, D., & Chen, T. (2012a). iModel: Object of interest 3d modeling via interactive co-segmentation on a mobile device. In Demo session at CVPR.Google Scholar
  35. Kowdle, A., Sinha, S., & Szeliski, R. (2012b). Multiple view object cosegmentation using appearance and stereo cues. In ECCV.Google Scholar
  36. Lafarge, F., Keriven, R., Brédif, M., & Hiep, V. (2010). Hybrid multi-view reconstruction by jump-diffusion. In CVPR.Google Scholar
  37. Lee, W., Woo, W., & Boyer, E. (2007). Identifying foreground from multiple images. In ACCV.Google Scholar
  38. McGuinness, K., & O’Connor, N.E. (2012). Toward automated evaluation of interactive segmentation. In Computer Vision and Image Understanding. 115(6) (pp. 868-884).Google Scholar
  39. Micusík, B., & Kosecká, J. (2010). Multi-view superpixel stereo in urban environments. International Journal of Computer Vision, 89(1), 106–119.CrossRefGoogle Scholar
  40. Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., et al. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232. Google Scholar
  41. Pollefeys, M., Nistr, D., Frahm, J., Akbarzadeh, A., Mordohai, P., Clipp, B., et al. (2008). Detailed real-time urban 3d reconstruction from video. International Journal of Computer Vision, 78(2–3), 143–167.Google Scholar
  42. Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3d: Learning 3d scene structure from a single still image. Pattern Analysis and Machine Intelligence, 31(5), 824–840.Google Scholar
  43. Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.Google Scholar
  44. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR.Google Scholar
  45. Sinha, S., Steedly, D., Szeliski, R., Agrawala, M., & Pollefeys, M. (2008). Interactive 3d architectural modeling from unordered photo collections. In SIGGRAPH Asia.Google Scholar
  46. Sinha, S., Steedly, D., & Szeliski, R. (2009). Piecewise planar stereo for image-based rendering. In ICCV.Google Scholar
  47. Sketchup. (2000). Google sketchup. Accessed 7 March 2013.
  48. Snavely, N., Seitz, S., & Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3d. In SIGGRAPH.Google Scholar
  49. Srivastava, S., Saxena, A., Theobalt, C., Thrun, S., & Ng, A.Y. (2009). i23 - Rapid interactive 3d reconstruction from a single image. In Vision, Modeling and Visualization.Google Scholar
  50. Sturm, P.F., & Maybank, S.J. (1999). A method for interactive 3d reconstruction of piecewise planar objects from single images. In BMVC.Google Scholar
  51. Szeliski, R. (1993). Rapid octree construction from image sequences. Computer Vision Graphics and Image Processing, 58(1), 23–32.CrossRefGoogle Scholar
  52. Tang, K., Kowdle, A., Batra, D., & Chen, T. (2009). iScribble. Accessed 7 March 2013.
  53. Vicente, S., Rother, C., & Kolmogorov, V. (2011). Object cosegmentation. In CVPR.Google Scholar
  54. Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In CVPR.Google Scholar
  55. Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically labeling video data using multi-class active learning. In ICCV.Google Scholar
  56. Zhou, X. S., & Huang, T. S. (2003). Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 8(6), 536–544.CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Adarsh Kowdle
    • 1
  • Yao-Jen Chang
    • 2
  • Andrew Gallagher
    • 1
  • Dhruv Batra
    • 3
  • Tsuhan Chen
    • 1
  1. 1.Cornell University IthacaUSA
  2. 2.Siemens Corporation, Corporate TechnologyPrincetonUSA
  3. 3.Virginia Tech BlacksburgUSA

Personalised recommendations