Abstract
Distracted driver action is the main cause of road traffic crashes, which threatens the security of human life and public property. Based on the observation that cues (like the hand holding the cigarette) reveal what the driver is doing, a driver action recognition model is proposed, which is called deformable and dilated Faster R-CNN (DD-RCNN). Our approach utilizes the detection of motion-specific objects to classify driver actions exhibiting great intra-class differences and inter-class similarity. Firstly, deformable and dilated residual block are designed to extract features of action-specific RoIs that are small in size and irregular in shape (such as cigarettes and cell phones). Attention modules are embedded in the modified ResNet to reweight features in channel and spatial dimensions. Then, the region proposal optimization network (RPON) is presented to reduce the number of RoIs entering R-CNN and improves model efficiency. Lastly, the RoI pooling module is replaced with the deformable one, and the simplified R-CNN without regression layer is trained as the final classifier. Experiments show that DD-RCNN demonstrates state-of-the-art results on Kaggle-driving dataset and self-built dataset.
Similar content being viewed by others
References
Yanbin Y, Lijuan Z, Mengjun L et al. (2016) Early warning of traffic accident in Shanghai based on large data set mining[C]. In: 2016 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS). IEEE, 2016, pp 18–21
Chiang HH, Chen YL, Wu BF et al (2014) Embedded driver-assistance system using multiple sensors for safe overtaking maneuver[J]. IEEE Syst J 8(3):681–698
Ba Y, Zhang W, Wang Q et al (2017) Crash prediction with behavioural and physiological features for advanced vehicle collision avoidance system[J]. Transport Res C Emerg Technol 74:22–33
Martinez CM, Heucke M, Wang FY et al (2018) Driving style recognition for intelligent vehicle control and advanced driver assistance: a survey[J]. IEEE Trans Intell Transp Syst 19(3):666–676
Xing Y, Lv C, Wang H, Cao D, Velenis E, Wang F (2019) Driver activity recognition for intelligent vehicles: a deep learning approach. IEEE Trans Veh Technol 68(6):5379–5390
Hu Y, Lu M, Lu X (2018) Driving behaviour recognition from still images by using multi-stream fusion cnn. Mach Vis Appl. https://doi.org/10.1007/s00138-018-0994-z
Zhao CH, Zhang BL, He J, Lian J (2012) Recognition of driving postures by contourlet transform and random forests. IET Intell Transp Syst 6(2):161–168
Zhao C, Zhang B, Lian J, He J, Lin T, Zhang X (2011) Classification of driving postures by support vector machines. In: 2011 sixth international conference on image and graphics, pp 926–930
Zhao C, Gao Y, He J et al (2012) Recognition of driving postures by multiwavelet transform and multilayer perceptron classifier[J]. Eng Appl Artif Intell 25(8):1677–1686
Zhao CH, Zhang BL, Zhang XZ et al (2013) Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers[J]. Neural Comput & Applic 22(1):175–184
Yan C, Coenen F, Zhang B (2016) Driving posture recognition by convolutional neural networks[J]. IET Comput Vis 10(2):103–114
Hoang Ngan Le T, Zheng Y, Zhu C et al (2016) Multiple scale faster-rcnn approach to driver’s cell-phone usage and hands on steering wheel detection[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 46–53
Koesdwiady A, Soua R, Karray F et al (2016) Recent trends in driver safety monitoring systems: state of the art and challenges[J]. IEEE Trans Veh Technol 66(6):4550–4563
Yan C, Coenen F, Zhang B (2014) Driving posture recognition by joint application of motion history image and pyramid histogram of oriented gradients[J]. Int J Veh Technol 2014:1–11
Hu Y, Lu M Q, Lu X (2018) Spatial-temporal fusion convolutional neural network for simulated driving behavior recognition[C]. In: 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV). IEEE, 2018, pp 1271–1277
Dai J, Qi H, Xiong Y et al (2017) Deformable convolutional networks[J]. CoRR, abs/1703.06211, 1(2):3
Ren S, He K, Girshick R et al (2015) Faster r-cnn: Towards real-time object detection with region proposal networks[C]. Adv Neural Inf Process Syst 39:91–99
Yu F, Koltun V, Funkhouser T (2017) Dilated residual networks[C]. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, 2017, pp 636–644
Girshick R (2015) Fast R-CNN[J]. Computer Science
Dai K J, R-FCN Y L (2016) Object detection via region-based fully convolutional networks. arXiv preprint[J]. arXiv preprint arXiv:1605.06409
Delaitre V, Sivic J, Laptev I (2011) Learning person-object interactions for action recognition in still images[C]. In: Advances in neural information processing systems, pp 1503–1511
Cortes C, Vapnik V (1995) Support-vector networks[J]. Mach Learn 20(3):273–297
Guo G, Lai A (2014) A survey on still image based human action recognition[J]. Pattern Recogn 47(10):3343–3361
Sharma G, Jurie F, Schmid C (2012). Discriminative spatial saliency for image classification[C]. In: 2012 IEEE conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp 3506–3513
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Neural Inf Process Syst 25:84–90
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556
Fu J, Zheng H, Mei T (2017) Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4438–4446
Gong Y, Wang L, Guo R et al (2014) Multi-scale orderless pooling of deep convolutional activation features[C]//European conference on computer vision. Springer, Cham, pp 392–407
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Redmon J, Divvala S, Girshick R et al (2016) You only look once: Unified, real-time object detection[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Zhang Y, Zhou D, Chen S et al (2016) Single-image crowd counting via multi-column convolutional neural network[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 589–597
Hu Y, Chang H, Nian F et al (2016) Dense crowd counting from still images with convolutional neural networks[J]. J Vis Commun Image Represent 38:530–539
Zhang C, Li H, Wang X et al (2015) Cross-scene crowd counting via deep convolutional neural networks[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 833–841
Gkioxari G, Girshick R, Malik J (2015) Actions and attributes from wholes and parts[C]. In: Proceedings of the IEEE international conference on computer vision, pp 2470–2478
Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with r* cnn[C]. In: Proceedings of the IEEE international conference on computer vision, pp 1080–1088
Khan FS, Xu J, Van De Weijer J et al (2015) Recognizing actions through action-specific person detection[J]. IEEE Trans Image Process 24(11):4422–4432
Qi T, Xu Y, Quan Y et al (2017) Image-based action recognition using hint-enhanced deep neural networks[J]. Neurocomputing 267:475–488
Ragab A, Craye C, Kamel MS et al (2014) A visual-based driver distraction recognition and detection using random forest[C]//international conference image analysis and recognition. Springer, Cham, pp 256–265
Hu J, Xu L, He X et al (2017) Abnormal driving detection based on normalised driving behaviour [J]. IEEE Trans Veh Technol 66(8):6645–6652
Koesdwiady A, Bedawi SM, Ou C et al (2017) End-to-end deep learning for driver distraction recognition[C]//international conference image analysis and recognition. Springer, Cham, pp 11–18
He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection[C]. In: International Conference on computer vision & Pattern Recognition (CVPR’05), vol 1. IEEE Computer Society, 2005, pp 886–893
Lowe DG (2004) Distinctive image features from scale-invariant keypoints[J]. Int J Comput Vis 60(2):91–110
LeCun Y, Boser B, Denker JS et al (1989) Backpropagation applied to handwritten zip code recognition[J]. Neural Comput 1(4):541–551
Lin T Y, Dollár P, Girshick R et al (2017) Feature pyramid networks for object detection[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Woo S, Park J, Lee J Y et al Cbam: convolutional block attention module[C]. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
Selvaraju R R, Cogswell M, Das A et al (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization[C]. In: Proceedings of the IEEE international conference on computer vision, pp 618–626
Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 761–769
Acknowledgements
The authors would like to thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions. This work was supported by the National Natural Science Foundation of China (No.61871123), Key Research and Development Program in Jiangsu Province (No.BE2016739) and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Lu, M., Hu, Y. & Lu, X. Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals. Appl Intell 50, 1100–1111 (2020). https://doi.org/10.1007/s10489-019-01603-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-019-01603-4