Abstract
Automated monitoring and analysis of passenger movement in safety-critical parts of transport infrastructure is a relevant visual surveillance task. Recent breakthroughs in visual representation learning and spatial sensing have opened up new possibilities for detecting and tracking humans and objects within a 3D spatial context. This paper proposes a flexible analysis scheme and presents a thorough evaluation of various processing pipelines that detect and track humans on a ground plane, which is calibrated automatically via stereo depth and pedestrian detection. We consider multiple combinations within a set of RGB- and depth-based detection and tracking modalities. We exploit the modular concepts of Meshroom [2] and demonstrate its use as a generic vision processing pipeline and scalable evaluation framework. Furthermore, we introduce a novel open RGB-D railway platform dataset with annotations to support research activities in automated RGB-D surveillance. We present quantitative results for multiple object detection and tracking for various algorithmic combinations on our dataset. The results indicate that combining depth-based spatial information with learned representations yields substantially higher detection and tracking accuracy. These improvements are especially pronounced in adverse situations involving occlusions or objects not captured by learned representations.
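The abstract does not detail how the ground plane is calibrated or how detections are placed on it. Purely as an illustration, the sketch below assumes that three non-collinear 3D foot points (for example, obtained from stereo depth at detected pedestrians) are available, derives a plane from them, and projects a 3D detection onto that plane. The function names and the three-point fit are assumptions made for this example, not the authors' method; a real system would more likely use a robust fit over many points.

```python
import math

def plane_from_points(p0, p1, p2):
    """Return (n, d) with unit normal n and offset d such that n.x + d = 0.

    Hypothetical helper: fits a plane exactly through three 3D points.
    """
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    # Cross product u x v gives a vector perpendicular to the plane.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("points are collinear; no unique plane")
    n = [c / norm for c in n]
    d = -sum(n[i] * p0[i] for i in range(3))
    return n, d

def height_above_plane(point, n, d):
    """Signed distance of a 3D point from the plane (n, d)."""
    return sum(n[i] * point[i] for i in range(3)) + d

def project_to_plane(point, n, d):
    """Orthogonal projection of a 3D point onto the plane (n, d)."""
    h = height_above_plane(point, n, d)
    return [point[i] - h * n[i] for i in range(3)]

# Example: a flat floor at z = 0 and a head point 1.7 units above it.
normal, offset = plane_from_points((0, 0, 0), (1, 0, 0), (0, 1, 0))
head = (2.0, 3.0, 1.7)
foot = project_to_plane(head, normal, offset)
```

Tracking on the ground plane then operates on the projected positions, while the signed height can serve as a plausibility check for human-sized objects.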
References
Scalable open-source web annotation tool. https://scalabel.ai. Accessed 01 Oct 2020
AliceVision: Meshroom: A 3D reconstruction software (2018). https://github.com/alicevision/meshroom
Bagautdinov, T., Fleuret, F., Fua, P.: Probability occupancy maps for occluded depth images. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2829–2837 (2015)
Beleznai, C., Steininger, D., Broneder, E.: Human detection in crowded situations by combining stereo depth and deeply-learned models. In: Lu, H. (ed.) ISAIR 2018. SCI, vol. 810, pp. 485–495. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-04946-1_47
Bernardin, K., Elbs, A., Stiefelhagen, R.: Multiple object tracking performance metrics and evaluation in a smart room environment. In: Sixth IEEE International Workshop on Visual Surveillance, in Conjunction with ECCV, vol. 90, p. 91 (2006)
Bertozzi, M., Binelli, E., Broggi, A., Del Rose, M.: Stereo vision-based approaches for pedestrian detection. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005) - Workshops, CVPR 2005, vol. 03, p. 16. IEEE Computer Society, Washington, DC, USA (2005)
Brasó, G., Leal-Taixé, L.: Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6247–6257 (2020)
Braun, M., Krebs, S., Flohr, F., Gavrila, D.M.: EuroCity persons: a novel benchmark for person detection in traffic scenes. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2897684
Combs, T.S., Sandt, L.S., Clamann, M.P., McDonald, N.C.: Automated vehicles and pedestrian safety: exploring the promise and limits of pedestrian detection. Am. J. Prev. Med. (2019). https://doi.org/10.1016/j.amepre.2018.06.024
Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv:2003.09003 [cs], March 2020. http://arxiv.org/abs/2003.09003
Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral channel features. In: Proceedings of the BMVC, pp. 91.1–91.11 (2009)
Felzenszwalb, P., Mcallester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, June 2008
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: The KITTI Vision Benchmark Suite (2013)
Hasan, I., Liao, S., Li, J., Akram, S.U., Shao, L.: Pedestrian detection: the elephant in the room. arXiv preprint arXiv:2003.08799 (2020)
Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. PAMI (2008). https://doi.org/10.1109/TPAMI.2007.1166
Humenberger, M., Engelke, T., Kubinger, W.: A Census-based stereo vision algorithm using modified Semi-Global Matching and plane fitting to improve matching quality. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, CVPRW 2010 (2010). https://doi.org/10.1109/CVPRW.2010.5543769
Jafari, O.H., Mitzel, D., Leibe, B.: Real-time RGB-D based people detection and tracking for mobile robots and head-worn cameras. In: Proceedings - IEEE International Conference on Robotics and Automation (2014). https://doi.org/10.1109/ICRA.2014.6907688
Leal-Taixé, L., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S.: Tracking the trackers: an analysis of the state of the art in multiple object tracking, April 2017. http://arxiv.org/abs/1704.02781
Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.: Everybody needs somebody: modeling social and grouping behavior on a linear programming multiple people tracker. In: Proceedings of the IEEE International Conference on Computer Vision (2011)
Li, Y., Huang, C., Nevatia, R.: Learning to associate: hybridboosted multi-target tracker for crowded scene. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009 (2009). https://doi.org/10.1109/CVPRW.2009.5206735
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, W., Hasan, I., Liao, S.: Center and scale prediction: a box-free approach for pedestrian and face detection (2020)
Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking, pp. 1–12 (2016). http://arxiv.org/abs/1603.00831
Munaro, M., Basso, F., Menegatti, E.: Tracking people within groups with RGB-D data. In: IEEE International Conference on Intelligent Robots and Systems (2012)
Muñoz-Salinas, R., Aguirre, E., García-Silvente, M., Ayesh, A., Góngora, M.: Multi-agent system for people detection and tracking using stereo vision in mobile robots. Robotica (2009). https://doi.org/10.1017/S0263574708005092
Ophoff, T., Van Beeck, K., Goedemé, T.: Improving real-time pedestrian detectors with RGB+depth fusion. In: Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance (2019)
Pirsiavash, H., Ramanan, D., Fowlkes, C.C.: Globally-optimal greedy algorithms for tracking a variable number of objects. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
Spinello, L., Arras, K.O.: People detection in RGB-D data. In: IEEE International Conference on Intelligent Robots and Systems (2011)
Voigtlaender, P., et al.: MOTS: multi-object tracking and segmentation. arXiv:1902.03604 [cs] (2019)
Wang, Q., Gao, J., Lin, W., Li, X.: NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2020)
Wang, Z., Zheng, L., Liu, Y., Wang, S.: Towards real-time multi-object tracking, September 2019. http://arxiv.org/abs/1909.12605
WiderPed: Wider pedestrian 2019 dataset (2019). https://competitions.codalab.org/competitions/20132
Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: NIPS (2014)
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016). https://doi.org/10.1109/CVPR.2016.141
Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection (2017)
Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: A simple baseline for multi-object tracking, April 2020. http://arxiv.org/abs/2004.01888
Zhou, C., Yuan, J.: Bi-box regression for pedestrian detection and occlusion estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 138–154. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_9
Zhou, K., Paiement, A., Mirmehdi, M.: Detecting humans in RGB-D data with CNNs. In: Proceedings of the 15th IAPR International Conference on Machine Vision Applications, MVA 2017 (2017). https://doi.org/10.23919/MVA.2017.7986862
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_28
Acknowledgement
The authors would like to thank both the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, and the Austrian Research Promotion Agency (FFG) for co-financing the RAIL:EYE3D research project (FFG No. 871520) within the framework of the National Research Development Programme “Mobility of the Future”. In addition, we would like to thank our industry partner EYYES GmbH, Martin Prießnitz of the Austrian Federal Railways (ÖBB) for enabling the recordings, and Marlene Glawischnig and Vanessa Klugsberger for their support in annotation.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wallner, M., Steininger, D., Widhalm, V., Schörghuber, M., Beleznai, C. (2021). RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced Passenger Safety. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12667. Springer, Cham. https://doi.org/10.1007/978-3-030-68787-8_47
DOI: https://doi.org/10.1007/978-3-030-68787-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68786-1
Online ISBN: 978-3-030-68787-8
eBook Packages: Computer Science, Computer Science (R0)