Abstract
Properly training LSTMs requires a long time and a large amount of data. To improve the training of these models, this paper proposes a novel residual and recurrent neural network, Resnet-LSTM (RLSTM), for spatio-temporal pedestrian action recognition from image sequences. The model includes a novel layer, called MapGrad, whose goal is to improve the stationarity of the feature map sequences processed by the ConvLSTM. The paper demonstrates the effectiveness of the proposed model and of the MapGrad layer in the spatio-temporal classification of pedestrian actions through an ablation study and a comparison with state-of-the-art methods. Overall, RLSTM achieves an accuracy of 88% and an average precision of 94% on the JAAD dataset, a widely used benchmark in the field. Finally, the paper empirically analyzes the effect of increasing the input sequence length on standing action recognition, showing that the proposed method yields a recall of 93%.
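The abstract reports accuracy, average precision and recall without defining them. The sketch below shows the standard binary definitions, assuming one label per pedestrian sequence (1 for the class of interest, 0 otherwise) and one confidence score per sequence. The paper's exact evaluation protocol (which class is positive, the decision threshold, whether AP is interpolated, how tied scores are handled) is not given here, so those choices, the function names and the toy data are all hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of sequences whose predicted label equals the ground truth."""
    if not y_true:
        return 0.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FN) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: sum over ranks of (R_n - R_{n-1}) * P_n.

    Sequences are ranked by decreasing score. Python's sort is stable, so
    tied scores keep their input order (an assumed simplification; some
    libraries instead merge tied scores into a single threshold).
    """
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            # Recall rises by 1/n_pos at each positive; weight it by precision@rank.
            ap += (tp / rank) / n_pos
    return ap


# Toy labels and scores for illustration only (not JAAD data).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")           # 0.833
print(f"recall   = {recall(y_true, y_pred):.3f}")             # 1.000
print(f"AP       = {average_precision(y_true, scores):.3f}")  # 0.917
```

Accuracy and recall depend on a hard decision threshold, whereas AP summarizes the whole ranking of scores. This is why the abstract can report an AP (94%) that is higher than the accuracy (88%) for the same model.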
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gazzeh, S., Lo Presti, L., Douik, A., La Cascia, M. (2023). RLSTM: A Novel Residual and Recurrent Network for Pedestrian Action Classification. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14185. Springer, Cham. https://doi.org/10.1007/978-3-031-44240-7_6
DOI: https://doi.org/10.1007/978-3-031-44240-7_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44239-1
Online ISBN: 978-3-031-44240-7
eBook Packages: Computer Science, Computer Science (R0)