Abstract
This study aims at the automatic understanding of pedestrians’ car-hailing intention in traffic scenes. Traffic scenes are highly complex, and pedestrians are randomly distributed in space. Moreover, different pedestrians express car-hailing intention through different behaviors, which makes it difficult for autonomous taxis to accurately understand pedestrian intention in complex scenes. To solve these problems, this paper proposes a novel, interpretable intention recognition algorithm. First, we employ OpenPose to obtain skeleton data and the facial region. The facial region is then fed into a facial attention network, which extracts facial attention features and infers whether the pedestrian is paying attention to the ego-vehicle. In parallel, the skeleton data are input into a random forest classifier and a graph convolutional network (GCN) to extract explicit and implicit pose features, respectively. Finally, an interpretable fusion rule is proposed to combine the facial and pose features; the fused result accurately and stably infers pedestrian intention and identifies pedestrians with car-hailing intentions. To evaluate the proposed method, we collected road videos using experimental cars to build suitable datasets and established corresponding evaluation benchmarks. The experimental results demonstrate that the proposed algorithm achieves high accuracy and robustness.
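The abstract's fusion step can be illustrated with a minimal sketch. The function name, threshold value, and AND-style combination below are assumptions for illustration, not the paper's actual rule: the idea shown is that an interpretable fusion requires both the facial-attention cue and the pose evidence (explicit random-forest score and implicit GCN score) to agree before a pedestrian is flagged as hailing.

```python
# Hypothetical sketch of an interpretable fusion rule in the spirit of the
# paper. All names, thresholds, and the averaging/AND logic are assumptions.

def fuse_intention(face_attention: float,
                   explicit_pose: float,
                   implicit_pose: float,
                   threshold: float = 0.5) -> bool:
    """Combine facial-attention and pose evidence into a binary decision.

    face_attention : score that the pedestrian is looking at the ego-vehicle
    explicit_pose  : random-forest score on hand-crafted skeleton features
    implicit_pose  : GCN score on the raw skeleton sequence
    """
    # Average the two pose branches into a single pose score.
    pose_score = 0.5 * (explicit_pose + implicit_pose)
    # AND-style rule: both the facial cue and the pose cue must exceed the
    # threshold, which keeps the decision easy to interpret and audit.
    return face_attention > threshold and pose_score > threshold
```

Because the rule is a conjunction of two thresholded scores, any positive decision can be traced back to both a facial cue and a pose cue, which is one way such a fusion stays interpretable.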
Acknowledgement
This work was supported by the National Natural Science Foundation of China (52172382, 61976039), the China Fundamental Research Funds for the Central Universities (DUT20GJ207), and the Science and Technology Innovation Fund of Dalian (2021JJ12GX015).
Wang, Z., Lian, J., Li, L. et al. Understanding Pedestrians’ Car-Hailing Intention in Traffic Scenes. Int. J. Automot. Technol. 23, 1023–1034 (2022). https://doi.org/10.1007/s12239-022-0089-8