Skip to main content


In the context of optimization and cycles reduction for product design in industry, digital collaborative tools have a major impact, allowing an early stage integration of multidisciplinary challenges and oftentimes the search of global optimum rather than domain specific improvements. This paper presents a methodology for improving participants’ implication and performance during collaborative design sessions through virtual reality (VR) tools, thanks to intention detection through body language interpretation. A prototype of the methodology is being implemented based on an existing VR aided design tool called DragonFly developed by Airbus. In what follows we will first discuss the choice of the different biological inputs for our purpose, and how to merge these multimodal inputs a meaningful way. Thus, we obtain a rich representation of the body language expression, suitable to recognize the actions wanted by the user and their related parameters. We will then show that this solution has been designed for fast training thanks to a majority of unsupervised training and existing pre-trained models, and for fast evolution thanks to the modularity of the architecture.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others


  1. 1.

    The frequencies of gestures’ features and speeches’ ones are uncorrelated and thus may be completely different (see Shared Representation.).


  1. Gaschler, A., Jentzsch, S., Giuliani, M., Huth, K., de Ruiter, J., Knoll, A.: Social behavior recognition using body posture and head pose for human-robot interaction. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2128–2133. IEEE, Vilamoura-Algarve, Portugal (2012)

    Google Scholar 

  2. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: 28th International Conference on Machine Learning (ICML), pp. 689–696 (2011)

    Google Scholar 

  3. Khan, S., Tunçer, B.: Gesture and speech elicitation for 3D CAD modeling in conceptual design. Autom. Constr. 106, 102847 (2019)

    Article  Google Scholar 

  4. Vuletic, T., Duffy, A., Hay, L., McTeague, C., Campbell, G., Grealy, M.: Systematic literature review of hand gestures used in human computer interaction interfaces. Int. J. Hum. Comput. Stud. 129, 74–94 (2019)

    Article  Google Scholar 

  5. Laviola, J.J.R., Zeleznik, R.C.: Flex and pinch: a case study of whole hand input design for virtual environment interaction. In: Second IASTED International Conference on Computer Graphics and Imaging, Innsbruck, Austria, pp. 221–225 (1999)

    Google Scholar 

  6. Ghojogh, B., Karray, F., Crowley, M.: Hidden markov model: tutorial. engrXiv (2019)

    Google Scholar 

  7. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: 18th International Conference on Machine Learning (ICML), pp. 282–289 (2001)

    Google Scholar 

  8. Bevilacqua, F., Zamborlin, B., Sypniewski, A., Schnell, N., Guédy, F., Rasamimanana, N.: Continuous real-time gesture following and recognition. In: Kopp, S., Wachsmuth, I. (eds.) Gesture in Embodied Communication and Human-Computer Interaction, pp. 73–84. Springer, Berlin Heidelberg, Berlin, Heidelberg (2010)

    Chapter  Google Scholar 

  9. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)

    Article  Google Scholar 

  10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  11. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. arXiv:1409.1259. (2014)

  12. Gradient flow in recurrent nets: the difficulty of learning long term dependencies. In: A Field Guide to Dynamical Recurrent Networks. IEEE (2001)

    Google Scholar 

  13. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

    Google Scholar 

  14. Kramer, M.A.: Nonlinear principal component analysis using auto associative neural networks. AIChE J. 37(2), 233–243 (1991)

    Article  Google Scholar 

  15. Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12868–12878. IEEE, Nashville, TN, USA (2021)

    Google Scholar 

  16. Li, W., et al.: UNIMO: towards unified-modal understanding and generation via cross-modal contrastive learning. In: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (2021)

    Google Scholar 

  17. Radford, A., et al.: Learning transferable visual models from natural language supervision. arXiv:2103.00020. (2021)

  18. Liu, A.H., Jin, S., Lai, C.I.J., Rouditchenko, A., Oliva, A., Glass, J.: Cross-modal discrete representation learning. arXiv:2106.05438. (2021)

  19. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. (2019)

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Romain Guillaume .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Guillaume, R., Pailhès, J., Gruhier, E., Laville, X., Baudin, Y., Lou, R. (2023). Intent Detection for Virtual Reality Architectural Design. In: Noël, F., Nyffenegger, F., Rivest, L., Bouras, A. (eds) Product Lifecycle Management. PLM in Transition Times: The Place of Humans and Transformative Technologies. PLM 2022. IFIP Advances in Information and Communication Technology, vol 667. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25181-8

  • Online ISBN: 978-3-031-25182-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics