The Visual Computer, Volume 36, Issue 1, pp 211–226

Calipso: physics-based image and video editing through CAD model proxies

  • Nazim Haouchine
  • Frederick Roy
  • Hadrien Courtecuisse
  • Matthias Nießner
  • Stephane Cotin
Original Article


Abstract

We present Calipso, an interactive method for editing images and videos in a physically coherent manner. Our main idea is to realize physics-based manipulations by running a full physics simulation on proxy geometries given by non-rigidly aligned CAD models. These simulations allow us to apply new, unseen forces to move or deform selected objects, to change physical parameters such as mass or elasticity, or even to add entirely new objects that interact with the rest of the underlying scene. In our method, the user makes edits directly in 3D; these edits are processed by the simulation and then transferred to the target 2D content using shape-to-image correspondences in a photo-realistic rendering process. To align the CAD models, we introduce an efficient CAD-to-image alignment procedure that jointly minimizes rigid and non-rigid alignment energies while preserving the high-level structure of the input shape. Moreover, the user can choose to exploit image flow to estimate scene motion, producing physical behavior coherent with the ambient dynamics. We demonstrate physics-based editing on a wide range of examples, producing a variety of physical behaviors while preserving geometric and visual consistency.
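The core idea above, transferring the result of a 3D simulation on a proxy back to the 2D image through shape-to-image correspondences, can be sketched as follows. This is a minimal, hypothetical illustration (the intrinsics, vertex positions, and displacements are toy values, not data from the paper): proxy vertices are projected with a pinhole camera before and after a physics step, and the induced 2D offsets form the flow used to warp image content.

```python
# Sketch: transferring simulated 3D proxy displacements to 2D image offsets
# via a pinhole projection. All numerical values are hypothetical placeholders.
import numpy as np

def project(points, K):
    """Project Nx3 camera-space points to Nx2 pixel coordinates."""
    p = (K @ points.T).T          # apply camera intrinsics
    return p[:, :2] / p[:, 2:3]   # perspective divide

# Hypothetical intrinsics: 800 px focal length, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Two proxy vertices 2 m in front of the camera, before and after a
# physics step that lifts them 5 cm along +y.
rest = np.array([[0.0, 0.0, 2.0],
                 [0.1, 0.0, 2.0]])
deformed = rest + np.array([[0.0, 0.05, 0.0],
                            [0.0, 0.05, 0.0]])

# 2D flow induced by the simulation, used to warp the image content.
flow = project(deformed, K) - project(rest, K)
print(flow)  # each vertex moves 20 px along +y (0.05 * 800 / 2)
```

In the full method, such per-vertex offsets would be densified across the object silhouette and combined with photo-realistic rendering; this sketch only shows the projection step that links 3D edits to 2D motion.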


Keywords: Video and image manipulations · Interactive editing · Physics-based modeling · Scene dynamics


Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1 (mp4 36903 KB)

Supplementary material 2 (mp4 15940 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Inria, Strasbourg, France
  2. University of Strasbourg, Strasbourg, France
  3. AVR/ICube, CNRS, Strasbourg, France
  4. Stanford University, Stanford, USA
  5. Technical University of Munich, Munich, Germany
