Abstract
Over the last year, the need for video conferencing has risen significantly due to the ongoing global pandemic. The goal of this project is to improve the user experience beyond plain voice and a flat 2D image by adding a third spatial dimension, creating a more immersive setting. The Azure Kinect Developer Kit (DK) combines multiple cameras, namely an RGB camera and a depth camera. The depth camera is based on the time-of-flight (ToF) principle, casting modulated near-infrared illumination onto the scene. The setup uses multiple Azure Kinect devices, synchronized and offset in space, to obtain a non-static 3D capture of a person. The Unity engine together with the Azure Kinect SDK is used to process the data gathered by all devices. First, a spatial depth map is created by combining the overlaid outputs of the individual devices. Second, RGB pixels are mapped onto the depth points to provide the final texture of the 3D model. Since a continuous capture of raw data has to be exported to a server, body-tracking and image-processing algorithms are employed. Finally, the processed data can be exported and used in AR, VR, or any other 3D-capable interface. This 3D projection aims to enhance the sensory experience by conveying non-verbal communication along with classical speech in video conferences.
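The paper page carries no code, but the multi-device synchronization step just described can be made concrete. Below is a minimal sketch using the Azure Kinect Sensor SDK C API with two daisy-chained devices; the device indices and the 160 µs subordinate depth delay are illustrative assumptions, not the authors' published configuration.

```c
#include <k4a/k4a.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    k4a_device_t master = NULL, subordinate = NULL;

    /* Open two devices; the index order is an assumption. */
    if (k4a_device_open(0, &master) != K4A_RESULT_SUCCEEDED ||
        k4a_device_open(1, &subordinate) != K4A_RESULT_SUCCEEDED) {
        fprintf(stderr, "failed to open both devices\n");
        return 1;
    }

    k4a_device_configuration_t config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
    config.color_format             = K4A_IMAGE_FORMAT_COLOR_BGRA32;
    config.color_resolution         = K4A_COLOR_RESOLUTION_720P;
    config.depth_mode               = K4A_DEPTH_MODE_NFOV_UNBINNED;
    config.camera_fps               = K4A_FRAMES_PER_SECOND_30;
    config.synchronized_images_only = true;

    /* The subordinate starts first and waits for the master's sync
     * pulse; its depth capture is delayed so the two modulated
     * near-IR emitters do not interfere (160 us is illustrative). */
    config.wired_sync_mode = K4A_WIRED_SYNC_MODE_SUBORDINATE;
    config.subordinate_delay_off_master_usec = 160;
    k4a_device_start_cameras(subordinate, &config);

    config.wired_sync_mode = K4A_WIRED_SYNC_MODE_MASTER;
    config.subordinate_delay_off_master_usec = 0;
    k4a_device_start_cameras(master, &config);

    /* ... synchronized capture loop would run here ... */

    k4a_device_stop_cameras(subordinate);
    k4a_device_stop_cameras(master);
    k4a_device_close(subordinate);
    k4a_device_close(master);
    return 0;
}
```

Starting the subordinate first and the master last matches the SDK's recommended order; shifting each unit's depth capture slightly in time keeps one camera's illumination out of the other's exposure window.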
Keywords
- 3D sensing
- Volumetric alignment
- Point cloud
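The "Point cloud" keyword corresponds to the abstract's second stage, where RGB pixels are mapped onto depth points. The sketch below outlines that step with the Sensor SDK's transformation API; it is an assumption-laden illustration, not the authors' Unity-side implementation. The helper name `colorize_depth` and the ownership conventions are invented, and the images are presumed to come from an already-running capture loop.

```c
#include <k4a/k4a.h>
#include <stdint.h>

/* Hypothetical helper: returns the RGB image reprojected into the
 * depth camera's geometry; `device`, `config`, `depth_image` and
 * `color_image` are assumed to come from a running capture loop. */
k4a_image_t colorize_depth(k4a_device_t device,
                           k4a_device_configuration_t config,
                           k4a_image_t depth_image,
                           k4a_image_t color_image)
{
    /* In real use the calibration and transformation would be
     * created once, not per frame. */
    k4a_calibration_t calibration;
    k4a_device_get_calibration(device, config.depth_mode,
                               config.color_resolution, &calibration);
    k4a_transformation_t t = k4a_transformation_create(&calibration);

    int w = k4a_image_get_width_pixels(depth_image);
    int h = k4a_image_get_height_pixels(depth_image);

    /* Map RGB pixels onto depth pixels: each depth point gets a
     * matching BGRA texture value. */
    k4a_image_t color_in_depth = NULL;
    k4a_image_create(K4A_IMAGE_FORMAT_COLOR_BGRA32, w, h,
                     w * 4 * (int)sizeof(uint8_t), &color_in_depth);
    k4a_transformation_color_image_to_depth_camera(
        t, depth_image, color_image, color_in_depth);

    /* Turn the depth map into XYZ points (int16 millimetre triplets);
     * points[i] plus color_in_depth[i] is one textured 3D point. */
    k4a_image_t points = NULL;
    k4a_image_create(K4A_IMAGE_FORMAT_CUSTOM, w, h,
                     w * 3 * (int)sizeof(int16_t), &points);
    k4a_transformation_depth_image_to_point_cloud(
        t, depth_image, K4A_CALIBRATION_TYPE_DEPTH, points);

    /* A real pipeline would keep `points`; released here for brevity. */
    k4a_image_release(points);
    k4a_transformation_destroy(t);
    return color_in_depth;
}
```

In a multi-device rig, each device's cloud would additionally be transformed by a per-device extrinsic pose before the overlaid clouds are merged, which is the volumetric-alignment step named in the keywords.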
Acknowledgement
Research described in this paper was financially supported by the 2020-1-CZ01-KA226-VET-094346 DiT4LL ERASMUS+ Innovation Project, by MonEd - Modern Trends and New Technologies of Online Education in ICT Study Programs in the European Educational Space (KEGA 015STU-4/2021), and by the Excellent Creative Team project VirTel.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Minárik, I., Vančo, M., Rozinaj, G. (2022). Advanced Scene Sensing for Virtual Teleconference. In: Rozinaj, G., Vargic, R. (eds) Systems, Signals and Image Processing. IWSSIP 2021. Communications in Computer and Information Science, vol 1527. Springer, Cham. https://doi.org/10.1007/978-3-030-96878-6_18
DOI: https://doi.org/10.1007/978-3-030-96878-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96877-9
Online ISBN: 978-3-030-96878-6