Large-scale, real-time 3D scene reconstruction on a mobile device
Google’s Project Tango has made integrated depth sensing and onboard visual-inertial odometry available to mobile devices such as phones and tablets. In this work, we explore the problem of large-scale, real-time 3D reconstruction on mobile devices of this type. Solving this problem is a necessary prerequisite for many indoor applications, including navigation, augmented reality and building scanning. The main challenges include dealing with noisy and low-frequency depth data and managing limited computational and memory resources. State-of-the-art approaches in large-scale dense reconstruction require large amounts of memory and high-performance GPU computing. Other existing 3D reconstruction approaches on mobile devices either build only a sparse reconstruction, offload their computation to other devices, or require long post-processing to extract the geometric mesh. In contrast, we can reconstruct and render a global mesh on the fly, using only the mobile device’s CPU, in very large (300 m\(^2\)) scenes, at a resolution of 2–3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on areas of the scene which are currently observed, as well as to leverage parallelization techniques for multi-core processing. Furthermore, we describe an on-device post-processing method for fusing datasets from multiple, independent trials, in order to improve the quality and coverage of the reconstruction. We discuss how the particularities of the devices impact our algorithm and implementation decisions. Finally, we provide both qualitative and quantitative results on publicly available RGB-D datasets, and on datasets collected in real time from two devices.
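The idea of spatial volumes indexed by a hash map can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Chunk`, `ChunkMap`, `locate` and `integrate`, the chunk size, and the truncation distance are all hypothetical. The update rule is the weighted running average commonly used in truncated signed distance fusion, assumed here because the abstract does not state the weighting scheme. Mesh extraction is omitted.

```python
import math
from dataclasses import dataclass, field

VOXEL_SIZE = 0.03   # metres; assumed, within the 2-3 cm range quoted above
CHUNK_DIM = 16      # voxels per chunk edge; hypothetical value
TRUNCATION = 0.1    # metres; hypothetical truncation distance


@dataclass
class Chunk:
    """One spatial volume: a dense block of TSDF values and fusion weights."""
    sdf: list = field(default_factory=lambda: [TRUNCATION] * CHUNK_DIM ** 3)
    weight: list = field(default_factory=lambda: [0.0] * CHUNK_DIM ** 3)


class ChunkMap:
    """Hash map from integer chunk coordinates to lazily allocated chunks."""

    def __init__(self):
        self.chunks = {}  # (cx, cy, cz) -> Chunk

    @staticmethod
    def locate(point):
        """Return (chunk key, flat voxel index) for a world point in metres."""
        voxel = [math.floor(c / VOXEL_SIZE) for c in point]
        # Floor division and modulo keep negative coordinates consistent.
        key = tuple(v // CHUNK_DIM for v in voxel)
        lx, ly, lz = (v % CHUNK_DIM for v in voxel)
        return key, (lz * CHUNK_DIM + ly) * CHUNK_DIM + lx

    def integrate(self, point, signed_distance, obs_weight=1.0):
        """Fuse one signed-distance observation into the voxel at `point`."""
        key, idx = self.locate(point)
        # Memory is allocated only for chunks that are actually observed.
        chunk = self.chunks.setdefault(key, Chunk())
        d = max(-TRUNCATION, min(TRUNCATION, signed_distance))
        w = chunk.weight[idx]
        chunk.sdf[idx] = (chunk.sdf[idx] * w + d * obs_weight) / (w + obs_weight)
        chunk.weight[idx] = w + obs_weight
        return key
```

Because each chunk is stored under its own key, unobserved space costs no memory, and separate chunks could in principle be updated by separate worker threads.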
Keywords: 3D reconstruction, Mobile technology, SLAM, Computer vision, Mapping, Pose estimation
This work was done with the support of Google's Advanced Technologies and Projects division (ATAP) for Project Tango. The authors thank Johnny Lee, Joel Hesch, Esha Nerurkar, Simon Lynen, Ryan Hickman and other ATAP members for their close collaboration and support on this project.