Abstract
We introduce a novel framework for 3D scene reconstruction with simultaneous object annotation that combines a pre-trained 2D convolutional neural network (CNN), incremental data streaming, and remote exploration in a virtual reality setup. It enables versatile integration of any 2D box-detection or segmentation network. We integrate new approaches to (i) asynchronously perform dense 3D reconstruction and object annotation at interactive frame rates, (ii) efficiently optimize CNN results in terms of object prediction and spatial accuracy, and (iii) generate computationally efficient colliders in large triangulated 3D reconstructions at run time for 3D scene interaction. Our method is novel in combining CNNs that have long and varying inference times with live 3D reconstruction from RGB-D camera input. We further propose a lightweight data structure that stores the 3D reconstruction data and object annotations to enable fast incremental data transmission for real-time exploration by a remote client, which has not been presented before. Our framework achieves update rates of 22 fps (SSD MobileNet) and 19 fps (Mask R-CNN) for indoor environments of up to 800 m³. We also evaluated the accuracy of 3D object detection. Our work provides a versatile foundation for semantic scene understanding of large streamed 3D reconstructions while remaining independent of the CNN's processing time. Source code is available for non-commercial use.
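The decoupling the abstract describes, reconstruction running at interactive rates while a CNN with long and varying inference time annotates in the background, can be sketched as a single-slot "latest keyframe" mailbox: a slow annotator simply skips stale frames instead of stalling the reconstruction loop. The class and method names below are illustrative assumptions, not the paper's actual implementation.

```python
import threading


class LatestFrameSlot:
    """Single-slot mailbox between the reconstruction and annotation threads.

    The reconstruction loop overwrites any unprocessed frame, so the CNN
    worker always sees the newest keyframe and never blocks the producer.
    (Hypothetical sketch; not the authors' data structure.)
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        """Called by the reconstruction loop; overwrites any stale frame."""
        with self._lock:
            self._frame = frame

    def take(self):
        """Called by the CNN worker; returns the newest frame, or None."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


# Usage: the producer may run many iterations per annotator poll.
slot = LatestFrameSlot()
slot.put("keyframe-0")
slot.put("keyframe-1")      # keyframe-0 is dropped as stale
latest = slot.take()        # annotator only ever processes keyframe-1
```

With this scheme the reconstruction frame rate is bounded by mapping cost alone, which is one plausible way to realize the abstract's claim of being "independent of the CNN's processing time".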
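The incremental transmission to a remote client can likewise be sketched as a per-block versioning scheme: the server keeps a version counter per reconstruction block and ships only the blocks the client has not yet acknowledged, compressed before transmission. The function name, the dictionary layout, and the use of zlib (DEFLATE) are assumptions for illustration, not the paper's wire format.

```python
import zlib


def incremental_update(server_blocks, client_versions):
    """Compute the compressed delta for one client sync.

    server_blocks:   {block_id: (version, payload_bytes)} held by the server.
    client_versions: {block_id: version} last acknowledged by the client.
    Returns {block_id: compressed_payload} containing only changed blocks.
    (Illustrative sketch; not the authors' transmission format.)
    """
    delta = {}
    for block_id, (version, payload) in server_blocks.items():
        if client_versions.get(block_id, -1) < version:
            delta[block_id] = zlib.compress(payload)  # DEFLATE-style compression
    return delta


# Usage: the client already has block 1 at version 0, so only block 2 is sent.
server = {1: (0, b"mesh-chunk-a"), 2: (3, b"mesh-chunk-b")}
client = {1: 0}
delta = incremental_update(server, client)
```

Sending per-block deltas keeps update messages small even as the reconstructed volume grows, which is the property a lightweight streaming structure for large indoor scenes needs.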
Acknowledgements
This work was solely supported by Vienna University of Technology.
Author information
Benjamin Höller is a postgraduate student with the Interactive Media System Group at the Institute of Visual Computing and Human-Centered Technology at Vienna University of Technology, where he received his M.Sc. degree with distinction in 2019. His research interests lie at the intersection of virtual and augmented reality, and 3D computer vision, with a strong focus on machine learning.
Annette Mossel is a post-doctoral researcher at the Institute of Visual Computing and Human-Centered Technology at Vienna University of Technology, and a scientific entrepreneur. She received her Ph.D. degree in 2014 from Vienna University of Technology. During her studies, she worked as a visiting researcher at the Fraunhofer Institute for Computer Graphics and the MIT Media Lab. She has 12 years of experience in mixed reality with strong expertise in vision-based self-localization, dense 3D mapping, and 3D human-computer interaction (HCI). She has authored or co-authored more than 20 scientific publications and has participated in and led multiple nationally funded scientific projects on wide-area indoor localization, multi-user VR, and dense 3D surface reconstruction.
Hannes Kaufmann is full professor of virtual and augmented reality at the Institute of Visual Computing & Human-Centered Technology at TU Wien. He has conducted research into virtual reality, tracking, mobile augmented reality, training spatial abilities in AR/VR, tangible interaction, medical VR/AR applications, real-time ray tracing, redirected walking, geometry, and educational mathematics software. His habilitation (2010) was on "applications of mixed reality" with a major focus on educational mixed reality applications. He has acted on behalf of the European Commission as a project reviewer, participated in EU projects in FP5, FP7, and Horizon 2020, managed over 30 research projects, and published more than 100 scientific papers.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Höller, B., Mossel, A. & Kaufmann, H. Automatic object annotation in streamed and remotely explored large 3D reconstructions. Comp. Visual Media 7, 71–86 (2021). https://doi.org/10.1007/s41095-020-0194-4