Advanced Scene Sensing for Virtual Teleconference

Part of the Communications in Computer and Information Science book series (CCIS, volume 1527)


Over the last year, the need for video conferencing has risen significantly due to the ongoing global pandemic. The goal of this project is to improve a user experience that is currently limited to voice and a plain 2D image by adding a third spatial dimension, creating a more immersive setting. The Azure Kinect Developer Kit (DK) combines multiple cameras, namely an RGB camera and a depth camera. The depth camera works on the time-of-flight (ToF) principle: it casts modulated near-infrared (near-IR) illumination onto the scene and infers distance from the phase shift of the reflected light. The setup uses multiple synchronized Azure Kinect devices, placed at different positions in space, to obtain a dynamic (non-static) 3D capture of a person. The Unity engine, together with the Azure Kinect SDK, processes the data gathered by all devices. First, a spatial depth map is created by merging the overlapping outputs of the individual devices. Second, RGB pixels are mapped onto the depth points to texture the resulting 3D model. Because a continuous capture of raw data has to be exported to a server, body tracking and image processing algorithms are applied. Finally, the processed data can be exported and used in AR, VR, or any other 3D-capable interface. This 3D projection aims to enhance the sensory experience of video conferences by conveying non-verbal communication alongside conventional speech.
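The geometric core of the pipeline outlined above (depth from ToF phase, depth pixels back-projected to 3D points, per-device point clouds merged in a shared frame, and RGB colors sampled for each point) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Unity/Azure Kinect SDK implementation: every function name is hypothetical, cameras follow an ideal pinhole model without lens distortion, and each device's extrinsic pose (`R`, `t`) is assumed to be known from a prior multi-device calibration.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def phase_to_depth(phase_rad, f_mod_hz):
    """Continuous-wave ToF: distance from the phase shift of the returned light.

    Light travels to the surface and back, so d = c * phase / (4 * pi * f_mod).
    The result is only unambiguous within c / (2 * f_mod); phase unwrapping
    across several modulation frequencies is not modeled here.
    """
    return C * np.asarray(phase_rad, dtype=float) / (4.0 * np.pi * f_mod_hz)


def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project an H x W depth image (metres) to camera-space 3D points.

    Assumes an ideal pinhole camera (no distortion). Pixels with depth 0,
    i.e. no valid measurement, are dropped. Returns an (N, 3) array.
    """
    v, u = np.indices(depth_m.shape)  # row (v) and column (u) index per pixel
    z = depth_m.astype(float)
    valid = z > 0
    x = (u[valid] - cx) * z[valid] / fx
    y = (v[valid] - cy) * z[valid] / fy
    return np.stack([x, y, z[valid]], axis=1)


def to_world(points, R, t):
    """Map camera-space points into the shared world frame: p_w = R p + t."""
    return points @ R.T + t


def sample_colors(points_cam, rgb, fx, fy, cx, cy):
    """Project color-camera-space points into an H x W x 3 image and sample RGB.

    Returns (colors, mask); mask is False for points behind the camera or
    outside the image, and their colors are left as zeros.
    """
    h, w, _ = rgb.shape
    colors = np.zeros((len(points_cam), 3), dtype=rgb.dtype)
    mask = np.zeros(len(points_cam), dtype=bool)
    in_front = points_cam[:, 2] > 0
    p = points_cam[in_front]
    u = np.round(fx * p[:, 0] / p[:, 2] + cx).astype(int)
    v = np.round(fy * p[:, 1] / p[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]  # original indices of usable points
    colors[idx] = rgb[v[inside], u[inside]]
    mask[idx] = True
    return colors, mask


if __name__ == "__main__":
    # Two hypothetical devices seeing a flat surface 1.5 m away;
    # the second device is offset 1 m along x in the world frame.
    depth = np.full((4, 4), 1.5)
    K = dict(fx=2.0, fy=2.0, cx=1.5, cy=1.5)
    poses = [(np.eye(3), np.zeros(3)), (np.eye(3), np.array([1.0, 0.0, 0.0]))]
    clouds = [to_world(depth_to_points(depth, **K), R, t) for R, t in poses]
    merged = np.concatenate(clouds, axis=0)  # naive merge, no deduplication
    print(merged.shape)  # (32, 3)
```

In a real multi-Kinect setup, the SDK's calibration-aware transformation functions would take the place of these pinhole helpers, and each device's depth-to-color extrinsics would be applied before sampling colors, since the depth and RGB cameras are physically offset from each other.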


The research described in this paper was financially supported by the DiT4LL ERASMUS+ Innovation Project (2020-1-CZ01-KA226-VET-094346), the MonEd project (Modern Trends and New Technologies of Online Education in ICT Study Programs in European Educational Space, KEGA 015STU-4/2021), and the Excellent creative team project VirTel.

Author information

Corresponding author

Correspondence to Ivan Minárik.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Minárik, I., Vančo, M., Rozinaj, G. (2022). Advanced Scene Sensing for Virtual Teleconference. In: Rozinaj, G., Vargic, R. (eds) Systems, Signals and Image Processing. IWSSIP 2021. Communications in Computer and Information Science, vol 1527. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96877-9

  • Online ISBN: 978-3-030-96878-6

  • eBook Packages: Computer Science, Computer Science (R0)
