Occlusion and Collision Aware Smartphone AR Using Time-of-Flight Camera

  • Yuan Tian
  • Yuxin Ma
  • Shuxue Quan
  • Yi Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11845)


The development of Visual Inertial Odometry (VIO) systems such as ARKit and ARCore has brought smartphone Augmented Reality (AR) to the mainstream. However, interactions between virtual objects and real objects are still limited due to the lack of 3D sensing capability. Recently, smartphone makers have been touting Time-of-Flight (ToF) cameras on their phones. ToF cameras use infrared light to determine depth information in a photo. By understanding the 3D structure of the scene, more AR capabilities can be enabled. In this paper, we propose practical methods to process ToF depth maps in real time and to enable occlusion handling and collision detection for AR applications simultaneously. Our experimental results show real-time performance and good visual quality for both occlusion rendering and collision detection.


Augmented Reality, Time-of-flight, Visual occlusion, Collision detection, Depth up-sampling



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. OPPO US Research Center, Palo Alto, USA
