Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals

Lu, Mingqi; Hu, Yaocong; Lu, Xiaobo

doi:10.1007/s10489-019-01603-4

Published: 17 December 2019

Applied Intelligence, Volume 50, pages 1100–1111 (2020)

Abstract

Distracted driving is the main cause of road traffic crashes and threatens both human life and public property. Based on the observation that local cues (such as a hand holding a cigarette) reveal what the driver is doing, we propose a driver action recognition model called deformable and dilated Faster R-CNN (DD-RCNN). The approach detects action-specific objects in order to classify driver actions that exhibit large intra-class variation and high inter-class similarity. First, deformable and dilated residual blocks are designed to extract features of action-specific RoIs that are small and irregularly shaped (such as cigarettes and cell phones), and attention modules embedded in the modified ResNet reweight features along the channel and spatial dimensions. Next, a region proposal optimization network (RPON) reduces the number of RoIs passed to the R-CNN stage, improving model efficiency. Finally, the RoI pooling module is replaced with deformable RoI pooling, and a simplified R-CNN head without the bounding-box regression layer is trained as the final classifier. Experiments show that DD-RCNN achieves state-of-the-art results on the Kaggle driving dataset and on a self-built dataset.
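The first component combines two known ideas: deformable convolution (Dai et al., 2017), where each kernel tap samples at a learned fractional offset, and dilated convolution (Yu et al., 2017), where kernel taps are spaced apart to enlarge the receptive field. The sketch below is a minimal illustration of that combination, not the authors' implementation: it computes one output value of a single-channel 3×3 convolution in which tap n samples the feature map at p0 + d·p_n + Δp_n (d is the dilation rate, Δp_n a fractional offset read by bilinear interpolation). In a real network the offsets would be predicted by an extra convolution layer and learned end to end; here they are passed in by hand, and the function names `bilinear` and `deform_dilated_conv_at` are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly sample a 2-D feature map (list of lists) at real-valued (y, x).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Per-axis weight is max(0, 1 - |distance|) to the integer neighbour.
                wy = max(0.0, 1.0 - abs(y - yi))
                wx = max(0.0, 1.0 - abs(x - xi))
                value += wy * wx * feature[yi][xi]
    return value


def deform_dilated_conv_at(feature, weight, center, offsets=None, dilation=1):
    """One output value of a 3x3 deformable, dilated convolution (hypothetical helper).

    feature  : H x W list of lists (a single channel)
    weight   : 3 x 3 kernel
    center   : integer output location p0 = (y, x)
    offsets  : nine (dy, dx) pairs in row-major tap order; None means all zeros,
               which reduces to an ordinary dilated convolution
    dilation : spacing between taps (1 gives the standard 3x3 grid)
    """
    if offsets is None:
        offsets = [(0.0, 0.0)] * 9
    y0, x0 = center
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            # Regular grid position scaled by the dilation rate, shifted by the offset.
            sy = y0 + dilation * i + dy
            sx = x0 + dilation * j + dx
            out += weight[i + 1][j + 1] * bilinear(feature, sy, sx)
            k += 1
    return out


if __name__ == "__main__":
    fmap = [[r * 5 + c for c in range(5)] for r in range(5)]
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    shifted = [(0.0, 0.0)] * 9
    shifted[4] = (0.5, 0.0)  # move only the centre tap half a pixel down
    print(deform_dilated_conv_at(fmap, identity, (2, 2)))           # plain sample
    print(deform_dilated_conv_at(fmap, identity, (2, 2), shifted))  # interpolated sample
```

With all offsets zero the function reduces to an ordinary dilated convolution, and with dilation 1 as well it is a standard 3×3 convolution. A dilation of 2 spreads the nine taps over a 5×5 footprint (effective size 3 + 2·(d − 1)) without adding weights, which is why dilation helps the backbone keep spatial resolution while still seeing context around small objects such as a phone or cigarette.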

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions. This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program in Jiangsu Province (No. BE2016739), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information

Authors and Affiliations

School of Automation, Southeast University, Nanjing, 210096, China
Mingqi Lu, Yaocong Hu & Xiaobo Lu
Key Laboratory of Measurement and Control of CSE, Ministry of Education, Southeast University, Nanjing, 210096, China
Mingqi Lu, Yaocong Hu & Xiaobo Lu

Corresponding author

Correspondence to Xiaobo Lu.

About this article

Cite this article

Lu, M., Hu, Y. & Lu, X. Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals. Appl Intell 50, 1100–1111 (2020). https://doi.org/10.1007/s10489-019-01603-4

Issue Date: April 2020
