RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced Passenger Safety

Wallner, Marco; Steininger, Daniel; Widhalm, Verena; Schörghuber, Matthias; Beleznai, Csaba

doi:10.1007/978-3-030-68787-8_47


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12667)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Automated monitoring and analysis of passenger movement in safety-critical parts of transport infrastructures represent a relevant visual surveillance task. Recent breakthroughs in visual representation learning and spatial sensing opened up new possibilities for detecting and tracking humans and objects within a 3D spatial context. This paper proposes a flexible analysis scheme and a thorough evaluation of various processing pipelines to detect and track humans on a ground plane, calibrated automatically via stereo depth and pedestrian detection. We consider multiple combinations within a set of RGB- and depth-based detection and tracking modalities. We exploit the modular concepts of Meshroom [2] and demonstrate its use as a generic vision processing pipeline and scalable evaluation framework. Furthermore, we introduce a novel open RGB-D railway platform dataset with annotations to support research activities in automated RGB-D surveillance. We present quantitative results for multiple object detection and tracking for various algorithmic combinations on our dataset. Results indicate that the combined use of depth-based spatial information and learned representations yields substantially enhanced detection and tracking accuracies. As demonstrated, these enhancements are especially pronounced in adverse situations when occlusions and objects not captured by learned representations are present.
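The quantitative tracking results mentioned above are typically summarized with CLEAR MOT-style scores, as defined in the Bernardin et al. entry of the reference list. The following is a minimal sketch of the accuracy score (MOTA), assuming per-frame error counts are already available; the abstract does not state which metrics the chapter reports, and the `FrameCounts` container, the `mota` helper, and the example counts are hypothetical illustrations, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class FrameCounts:
    """Per-frame error counts (hypothetical container, not from the paper)."""
    ground_truth: int      # annotated objects in the frame
    misses: int            # ground-truth objects with no matched hypothesis
    false_positives: int   # hypotheses with no matched ground-truth object
    id_switches: int       # matched objects whose track identity changed


def mota(frames):
    """CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / ground truth."""
    total_gt = sum(f.ground_truth for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.misses + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


# Made-up counts for illustration only.
example = [
    FrameCounts(ground_truth=10, misses=1, false_positives=0, id_switches=0),
    FrameCounts(ground_truth=10, misses=0, false_positives=2, id_switches=1),
]
score = mota(example)
```

Because the error counts are summed over all frames before dividing, frames with many annotated objects weigh more than sparse ones, and the score can become negative when errors outnumber ground-truth objects.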

Notes

1.
https://github.com/raileye3d/raileye3d_dataset

References

Scalable open-source web annotation tool. https://scalabel.ai. Accessed 01 Oct 2020
AliceVision: Meshroom: A 3D reconstruction software (2018). https://github.com/alicevision/meshroom
Bagautdinov, T., Fleuret, F., Fua, P.: Probability occupancy maps for occluded depth images. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2829–2837 (2015)
Beleznai, C., Steininger, D., Broneder, E.: Human detection in crowded situations by combining stereo depth and deeply-learned models. In: Lu, H. (ed.) ISAIR 2018. SCI, vol. 810, pp. 485–495. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-04946-1_47
Bernardin, K., Elbs, A., Stiefelhagen, R.: Multiple object tracking performance metrics and evaluation in a smart room environment. In: Sixth IEEE International Workshop on Visual Surveillance, in Conjunction with ECCV, vol. 90, p. 91 (2006)
Bertozzi, M., Binelli, E., Broggi, A., Del Rose, M.: Stereo vision-based approaches for pedestrian detection. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005) - Workshops, CVPR 2005, vol. 03, p. 16. IEEE Computer Society, Washington, DC, USA (2005)
Brasó, G., Leal-Taixé, L.: Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6247–6257 (2020)
Braun, M., Krebs, S., Flohr, F., Gavrila, D.M.: EuroCity persons: a novel benchmark for person detection in traffic scenes. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2897684
Combs, T.S., Sandt, L.S., Clamann, M.P., McDonald, N.C.: Automated vehicles and pedestrian safety: exploring the promise and limits of pedestrian detection. Am. J. Prev. Med. (2019). https://doi.org/10.1016/j.amepre.2018.06.024
Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv:2003.09003 [cs], March 2020. http://arxiv.org/abs/2003.09003
Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral channel features. In: Proceedings of the BMVC, pp. 91.1–91.11 (2009)
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, June 2008
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: The KITTI Vision Benchmark Suite (2013)
Hasan, I., Liao, S., Li, J., Akram, S.U., Shao, L.: Pedestrian detection: the elephant in the room. arXiv preprint arXiv:2003.08799 (2020)
Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. PAMI (2008). https://doi.org/10.1109/TPAMI.2007.1166
Humenberger, M., Engelke, T., Kubinger, W.: A Census-based stereo vision algorithm using modified Semi-Global Matching and plane fitting to improve matching quality. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, CVPRW 2010 (2010). https://doi.org/10.1109/CVPRW.2010.5543769
Jafari, O.H., Mitzel, D., Leibe, B.: Real-time RGB-D based people detection and tracking for mobile robots and head-worn cameras. In: Proceedings - IEEE International Conference on Robotics and Automation (2014). https://doi.org/10.1109/ICRA.2014.6907688
Leal-Taixé, L., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S.: Tracking the trackers: an analysis of the state of the art in multiple object tracking, April 2017. http://arxiv.org/abs/1704.02781
Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.: Everybody needs somebody: modeling social and grouping behavior on a linear programming multiple people tracker. In: Proceedings of the IEEE International Conference on Computer Vision (2011)
Li, Y., Huang, C., Nevatia, R.: Learning to associate: hybridboosted multi-target tracker for crowded scene. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009 (2009). https://doi.org/10.1109/CVPRW.2009.5206735
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, W., Hasan, I., Liao, S.: Center and scale prediction: a box-free approach for pedestrian and face detection (2020)
Milan, A., Leal-Taixe, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking, pp. 1–12 (2016). http://arxiv.org/abs/1603.00831
Munaro, M., Basso, F., Menegatti, E.: Tracking people within groups with RGB-D data. In: IEEE International Conference on Intelligent Robots and Systems (2012)
Muñoz-Salinas, R., Aguirre, E., García-Silvente, M., Ayesh, A., Góngora, M.: Multi-agent system for people detection and tracking using stereo vision in mobile robots. Robotica (2009). https://doi.org/10.1017/S0263574708005092
Ophoff, T., Beeck, K.V., Goedeme, T.: Improving real-time pedestrian detectors with RGB+depth fusion. In: Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance (2019)
Pirsiavash, H., Ramanan, D., Fowlkes, C.C.: Globally-optimal greedy algorithms for tracking a variable number of objects. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
Spinello, L., Arras, K.O.: People detection in RGB-D data. In: IEEE International Conference on Intelligent Robots and Systems (2011)
Voigtlaender, P., et al.: MOTS: multi-object tracking and segmentation. arXiv:1902.03604 [cs] (2019)
Wang, Q., Gao, J., Lin, W., Li, X.: NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2020)
Wang, Z., Zheng, L., Liu, Y., Wang, S.: Towards real-time multi-object tracking, September 2019. http://arxiv.org/abs/1909.12605
WiderPed: Wider pedestrian 2019 dataset (2019). https://competitions.codalab.org/competitions/20132
Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: NIPS (2014)
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016). https://doi.org/10.1109/CVPR.2016.141
Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection (2017)
Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: A simple baseline for multi-object tracking, April 2020. http://arxiv.org/abs/2004.01888
Zhou, C., Yuan, J.: Bi-box regression for pedestrian detection and occlusion estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 138–154. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_9
Zhou, K., Paiement, A., Mirmehdi, M.: Detecting humans in RGB-D data with CNNs. In: Proceedings of the 15th IAPR International Conference on Machine Vision Applications, MVA 2017 (2017). https://doi.org/10.23919/MVA.2017.7986862
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_28

Acknowledgement

The authors would like to thank both the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, and the Austrian Research Promotion Agency (FFG) for co-financing the RAIL: EYE3D research project (FFG No. 871520) within the framework of the National Research Development Programme “Mobility of the Future”. In addition, we would like to thank our industry partner EYYES GmbH, Martin Prießnitz with the Federal Austrian Railways (ÖBB) for enabling the recordings, and Marlene Glawischnig and Vanessa Klugsberger for support in annotation.

Author information

Authors and Affiliations

AIT Austrian Institute of Technology GmbH, Vienna, Austria
Marco Wallner, Daniel Steininger, Verena Widhalm, Matthias Schörghuber & Csaba Beleznai

Corresponding author

Correspondence to Marco Wallner.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Alberto Del Bimbo
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Rita Cucchiara
Department of Computer Science, Boston University, Boston, MA, USA
Stan Sclaroff
Dipartimento di Matematica e Informatica, University of Catania, Catania, Italy
Giovanni Maria Farinella
Cloud & AI, JD.COM, Beijing, China
Tao Mei
Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Marco Bertini
Computational Sciences Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Puebla, Mexico
Hugo Jair Escalante
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Roberto Vezzani

Cite this paper

Wallner, M., Steininger, D., Widhalm, V., Schörghuber, M., Beleznai, C. (2021). RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced Passenger Safety. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12667. Springer, Cham. https://doi.org/10.1007/978-3-030-68787-8_47

DOI: https://doi.org/10.1007/978-3-030-68787-8_47
Published: 21 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68786-1
Online ISBN: 978-3-030-68787-8
eBook Packages: Computer Science, Computer Science (R0)
