EMVS: Event-Based Multi-View Stereo—3D Reconstruction with an Event Camera in Real-Time

  • Henri Rebecq
  • Guillermo Gallego
  • Elias Mueggler
  • Davide Scaramuzza

Abstract

Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and latency on the order of microseconds. However, because their output is a sequence of asynchronous events rather than actual intensity images, traditional vision algorithms cannot be applied, and a paradigm shift is needed. We introduce the problem of event-based multi-view stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which estimate dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with a known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (1) its ability to respond to scene edges, which naturally provide semi-dense geometric information without any pre-processing, and (2) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm produces accurate, semi-dense depth maps without requiring any explicit data association or intensity estimation. We successfully validate our method on both synthetic and real data. The method is computationally very efficient and runs in real time on a CPU.
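As a concrete illustration of the algorithm sketched in this abstract, the listing below shows a minimal NumPy version of the space-sweep voting step that underlies EMVS: each event is back-projected along its viewing ray and votes in a depth-discretized volume (a Disparity Space Image) anchored at a reference viewpoint. This is a simplified sketch, not the authors' implementation; the function name build_dsi, the pose_at callback, and the fronto-parallel plane sampling are illustrative assumptions.

    import numpy as np

    def build_dsi(events, pose_at, K, depths, im_size):
        """Accumulate event back-projections into a Disparity Space Image (DSI).

        events  : iterable of (x, y, t) events
        pose_at : callable t -> 4x4 pose of the event camera at time t,
                  expressed in the reference frame of the DSI
        K       : 3x3 camera intrinsic matrix
        depths  : 1D array of candidate depth planes of the reference view
        im_size : (H, W) resolution of the reference view
        """
        H, W = im_size
        dsi = np.zeros((len(depths), H, W), dtype=np.float32)
        K_inv = np.linalg.inv(K)
        for x, y, t in events:
            T = pose_at(t)                                   # pose when the event fired
            d = T[:3, :3] @ (K_inv @ np.array([x, y, 1.0]))  # ray direction, reference frame
            o = T[:3, 3]                                     # camera center, reference frame
            for k, z in enumerate(depths):
                if abs(d[2]) < 1e-9:
                    continue                                 # ray parallel to the depth plane
                lam = (z - o[2]) / d[2]                      # intersect ray with plane Z = z
                if lam <= 0:
                    continue                                 # intersection behind the camera
                u, v, _ = K @ ((o + lam * d) / z)            # project onto the reference view
                ui, vi = int(round(u)), int(round(v))
                if 0 <= ui < W and 0 <= vi < H:
                    dsi[k, vi, ui] += 1.0                    # one vote per ray-plane crossing
        return dsi

Rays from different viewpoints that pass through the same scene edge intersect at its true depth, so edges appear as local maxima of the vote count along the depth axis; thresholding those maxima yields the semi-dense depth map without ever computing event-to-event correspondences.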

Keywords

Multi-view stereo · Event cameras · Event-based vision · 3D reconstruction

Notes

Acknowledgements

This research was funded by the DARPA FLA Program, the National Center of Competence in Research (NCCR) Robotics through the Swiss National Science Foundation and the SNSF-ERC Starting Grant.

Supplementary material

Supplementary material 1 (MP4, 23,797 KB)

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Robotics and Perception Group, Department of Informatics, University of Zurich, Zurich, Switzerland
  2. Robotics and Perception Group, Department of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
