A semantic-based approach to digital content placement for immersive environments

  • Original article
  • Published in The Visual Computer

Abstract

This paper presents a semantic-based interactive system that enables virtual content placement using natural language. We propose a novel computational framework composed of three components: 3D reconstruction, 3D segmentation, and 3D annotation. Built on this framework, the system automatically constructs a semantic representation of the environment from raw point cloud data. Users can then assign virtual content to a specific physical location by referring to its semantic label. Compared with traditional projection mapping, which often involves tedious manual adjustment, the proposed system enables intuitive and efficient manipulation of virtual content in immersive environments through speech input. Technical evaluation and user study results show that the system provides users with accurate semantic information for effective virtual content placement at room scale.



References

  1. Xiao, R., Harrison, C., and Hudson, S. E.: WorldKit: rapid and easy creation of ad-hoc interactive applications on everyday surfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 879–888, (2013)

  2. Jones, B., Sodhi, R., Murdock, M., Mehra, R., Benko, H., Wilson, A., Ofek, E., MacIntyre, B., Raghuvanshi, N., and Shapira, L.: RoomAlive: magical experiences enabled by scalable, adaptive projector-camera units. In: Proceedings of the 27th annual ACM symposium on User interface software and technology, pp. 637–644, (2014)

  3. Fender, A., Herholz, P., Alexa, M., and Müller, J.: OptiSpace: automated placement of interactive 3D projection mapping content. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–11, (2018)

  4. Fender, A., and Müller, J.: SpaceState: Ad-Hoc definition and recognition of hierarchical room states for smart environments. In: Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces, pp. 303–314, (2019)

  5. Weimer, D., Ganapathy, S.K.: A synthetic visual environment with hand gesturing and voice input. ACM SIGCHI Bull. 20, 235–240 (1989)

  6. Marsh, E., Wauchope, K., and Gurney, J.: Human-machine dialogue for multi-modal decision support systems. In: Proceedings of the AAAI Spring Symposium on Multi-Media Multi-Modal Systems. Citeseer, (1994)

  7. Liu, J.: Semantic mapping: a semantics-based approach to virtual content placement for immersive environments. In: 2021 17th International Conference on Intelligent Environments (IE). IEEE, pp. 1–8, (2021)

  8. Raskar, R., Welch, G., and Fuchs, H.: Spatially augmented reality. In: Proceedings of the International Workshop on Augmented Reality: Placing Artificial Objects in Real Scenes: Placing Artificial Objects in Real Scenes, pp. 63–72, (1999)

  9. Raskar, R., Welch, G., Low, K.-L., and Bandyopadhyay, D.: Shader lamps: animating real objects with image-based illumination. In: Eurographics Workshop on Rendering Techniques, pp. 89–102, (2001)

  10. Raskar, R., Van Baar, J., Beardsley, P., Willwacher, T., Rao, S., and Forlines, C.: iLamps: geometrically aware and self-configuring projectors. In: ACM SIGGRAPH 2006 Courses, pp. 7–es, (2006)

  11. Wilson, A. D.: Depth-sensing video cameras for 3D tangible tabletop interaction. In: Second Annual IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TABLETOP’07), pp. 201–204, (2007)

  12. Rekimoto, J., and Saitoh, M.: Augmented surfaces: a spatially continuous work space for hybrid computing environments. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 378–385, (1999)

  13. Harrison, C., Benko, H., and Wilson, A. D.: OmniTouch: wearable multitouch interaction everywhere. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 441–450, (2011)

  14. Lucente, M., Zwart, G.-J., and George, A. D.: Visualization space: a testbed for deviceless multimodal user interface. In: Intelligent Environments Symposium, vol. 98, (1998)

  15. Kaiser, E., Olwal, A., McGee, D., Benko, H., Corradini, A., Li, X., Cohen, P., and Feiner, S.: Mutual disambiguation of 3d multimodal interaction in augmented and virtual reality. In: Proceedings of the 5th International Conference on Multimodal interfaces, pp. 12–19, (2003)

  16. Cohen, P. R.: The role of natural language in a multimodal interface. In: Proceedings of the 5th Annual ACM Symposium on User Interface Software and Technology, pp. 143–149, (1992)

  17. Billinghurst, M.: Put that where? Voice and gesture at the graphics interface. ACM SIGGRAPH Comput. Graph. 32(4), 60–63 (1998)

  18. Bell, B., Feiner, S., and Höllerer, T.: View management for virtual and augmented reality. In: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, pp. 101–110, (2001)

  19. Fender, A., Lindlbauer, D., Herholz, P., Alexa, M., and Müller, J.: Heatspace: automatic placement of displays by empirical analysis of user behavior. In: Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, pp. 611–621, (2017)

  20. Bell, B., Feiner, S., and Höllerer, T.: View management for virtual and augmented reality. In: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, pp. 101–110, (2001)

  21. Tatzgern, M., Orso, V., Kalkofen, D., Jacucci, G., Gamberini, L., Schmalstieg, D.: Adaptive information density for augmented reality displays. In: IEEE Virtual Reality (VR) 2016, 83–92 (2016)

  22. Lindlbauer, D., Feit, A. M., and Hilliges, O.: Context-aware online adaptation of mixed reality interfaces. In: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, pp. 147–160, (2019)

  23. Ahuja, K., Pareddy, S., Xiao, R., Goel, M., and Harrison, C.: Lightanchors: appropriating point lights for spatially-anchored augmented reality interfaces. In: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, pp. 189–196, (2019)

  24. Grasset, R., Langlotz, T., Kalkofen, D., Tatzgern, M., Schmalstieg, D.: Image-driven view management for augmented reality browsers. In: IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2012, 177–186 (2012)

  25. Gal, R., Shapira, L., Ofek, E., Kohli, P.: FLARE: fast layout for augmented reality applications. In: IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2014, pp. 207–212 (2014)

  26. Nuernberger, B., Ofek, E., Benko, H., and Wilson, A. D.: Snaptoreality: aligning augmented reality to the real world. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 1233–1244, (2016)

  27. Du, R., Turner, E., Dzitsiuk, M., Prasso, L., Duarte, I., Dourgarian, J., Afonso, J., Pascoal, J., Gladstone, J., Cruces, N. et al.: DepthLab: real-time 3D interaction with depth maps for mobile augmented reality. In: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, pp. 829–843, (2020)

  28. Winograd, T.: Understanding natural language. Cogn. Psychol. 3(1), 1–191 (1972)

  29. Bolt, R. A.: “Put-that-there”: voice and gesture at the graphics interface. In: Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, pp. 262–270, (1980)

  30. Long, J., Shelhamer, E., and Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, (2015)

  31. Dai, J., He, K., and Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3150–3158, (2016)

  32. Li, Y., Qi, H., Dai, J., Ji, X., and Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2359–2367, (2017)

  33. Barreira, J., Bessa, M., Barbosa, L., Magalhães, L.: A context-aware method for authentically simulating outdoors shadows for mobile augmented reality. IEEE Trans. Vis. Comput. Graph. 24(3), 1223–1231 (2017)

  34. Chen, L., Tang, W., John, N. W., Wan, T. R., and Zhang, J. J.: Context-aware mixed reality: a learning-based framework for semantic-level interaction. In: Computer Graphics Forum, vol. 39, pp. 484–496, (2020)

  35. Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., and Davison, A.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 559–568, (2011)

  36. Sappa, A. D., and Devy, M.: Fast range image segmentation by an edge detection strategy. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 292–299, (2001)

  37. Jagannathan, A., Miller, E.L.: Three-dimensional surface mesh segmentation using curvedness-based region growing approach. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2195–2204 (2007)

  38. Schnabel, R., Wahl, R., and Klein, R.: Efficient RANSAC for point-cloud shape detection. In: Computer Graphics Forum, vol. 26, pp. 214–226, (2007)

  39. Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., and Su, H.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  40. Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., and Funkhouser, T.: Semantic scene completion from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1746–1754, (2017)

  41. Silberman, N., Hoiem, D., Kohli, P., and Fergus, R.: Indoor segmentation and support inference from RGB-D images. In: European Conference on Computer Vision, pp. 746–760, (2012)

  42. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., and Zhang, Y.: Matterport3D: learning from RGB-D data in indoor environments. arXiv preprint arXiv:1709.06158 (2017)

  43. Qi, C. R., Su, H., Mo, K., and Guibas, L. J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)

  44. Qi, C. R., Yi, L., Su, H., and Guibas, L. J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)

  45. Fayolle, P.-A., Pasko, A.: Segmentation of discrete point clouds using an extensible set of templates. Visual Comput. 29(5), 449–465 (2013)

  46. Mattausch, O., Panozzo, D., Mura, C., Sorkine-Hornung, O., and Pajarola, R.: Object detection and classification from large-scale cluttered indoor scans. In: Computer Graphics Forum, vol. 33, no. 2. Wiley Online Library, pp. 11–21 (2014)

  47. Sun, Y., Miao, Y., Chen, J., Pajarola, R.: PGCNet: patch graph convolutional network for point cloud segmentation of indoor scenes. Visual Comput. 36(10), 2407–2418 (2020)

  48. Jones, B., Sodhi, R., Murdock, M., Mehra, R., Benko, H., Wilson, A., Ofek, E., MacIntyre, B., Raghuvanshi, N., and Shapira, L.: RoomAlive: magical experiences enabled by scalable, adaptive projector-camera units. In: Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, pp. 637–644 (2014)

  49. Besl, P. J., and McKay, N. D.: Method for registration of 3-D shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611, pp. 586–606 (1992)

  50. Armeni, I., Sax, S., Zamir, A. R., and Savarese, S.: Joint 2D-3D-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105 (2017)

  51. Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis, I., Fischer, M., and Savarese, S.: 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1534–1543 (2016)

  52. Bashir, U., Abbas, M., Ali, J.M.: The G2 and C2 rational quadratic trigonometric Bézier curve with two shape parameters with applications. Appl. Math. Comput. 219(20), 183–197 (2013)

  53. Maqsood, S., Abbas, M., Miura, K.T., Majeed, A., Iqbal, A.: Geometric modeling and applications of generalized blended trigonometric Bézier curves with shape parameters. Adv. Diff. Equ. 2020(1), 1–18 (2020)

  54. BiBi, S., Abbas, M., Misro, M.Y., Hu, G.: A novel approach of hybrid trigonometric Bézier curve to the modeling of symmetric revolutionary curves and symmetric rotation surfaces. IEEE Access 7, 779–792 (2019)

  55. Liao, C.-W., Huang, J.S.: Stroke segmentation by Bernstein–Bézier curve fitting. Pattern Recogn. 23(5), 475–484 (1990)

  56. Rusu, R.B., Marton, Z.C., Blodow, N., Dolha, M., Beetz, M.: Towards 3d point cloud based object maps for household environments. Robot. Autonomous Syst. 56(11), 927–941 (2008)

  57. Berkmann, J., Caelli, T.: Computation of surface geometry and segmentation using covariance techniques. IEEE Trans. Pattern Anal. Mach. Intell. 16(11), 1114–1116 (1994)

  58. Pauly, M., Gross, M., Kobbelt, L.P.: Efficient simplification of point-sampled surfaces. In: IEEE Visualization, VIS 2002, 163–170 (2002)

  59. Katz, S., Tal, A., and Basri, R.: Direct visibility of point sets. In: ACM SIGGRAPH 2007 Papers, pp. 24–es (2007)

  60. Chen, Q., Zhuo, Z., and Wang, W.: BERT for joint intent classification and slot filling. arXiv preprint arXiv:1902.10909 (2019)

  61. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  62. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  63. Li, C., Li, L., and Qi, J.: A self-attentive model with gate mechanism for spoken language understanding. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3824–3833 (2018)

  64. Zhou, J., and Xu, W.: End-to-end learning of semantic role labeling using recurrent neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol 1: Long Papers), pp. 1127–1137 (2015)

  65. Pennington, J., Socher, R., and Manning, C. D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  66. Zhou, Q.-Y., Park, J., and Koltun, V.: Open3D: a modern library for 3D data processing. arXiv:1801.09847 (2018)

  67. Scharstein, D., and Szeliski, R.: High-accuracy stereo depth maps using structured light. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1. IEEE, pp. I–I (2003)

  68. Besl, P. J., and McKay, N. D.: Method for registration of 3-d shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. International Society for Optics and Photonics, pp. 586–606 (1992)

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jingyang Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Data availability

Datasets for this research are openly available at locations cited in the reference section [50, 51].

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, J., Li, Y. & Goel, M. A semantic-based approach to digital content placement for immersive environments. Vis Comput 39, 5989–6003 (2023). https://doi.org/10.1007/s00371-022-02707-8
