Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals

Applied Intelligence

Abstract

Distracted driving is the main cause of road traffic crashes, threatening human life and public property. Based on the observation that cues (such as a hand holding a cigarette) reveal what the driver is doing, we propose a driver action recognition model called deformable and dilated Faster R-CNN (DD-RCNN). Our approach detects motion-specific objects to classify driver actions that exhibit large intra-class differences and high inter-class similarity. First, deformable and dilated residual blocks are designed to extract features of action-specific RoIs that are small in size and irregular in shape (such as cigarettes and cell phones). Attention modules are embedded in the modified ResNet to reweight features along the channel and spatial dimensions. Then, a region proposal optimization network (RPON) is introduced to reduce the number of RoIs entering R-CNN and improve model efficiency. Finally, the RoI pooling module is replaced with a deformable one, and a simplified R-CNN without the regression layer is trained as the final classifier. Experiments show that DD-RCNN achieves state-of-the-art results on the Kaggle driving dataset and a self-built dataset.
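The abstract does not give the layout of the dilated residual blocks, so the following is only a minimal one-dimensional sketch of the general idea behind dilation: spacing the kernel taps apart widens the span of input a filter covers without adding weights. The function names `effective_kernel_size` and `dilated_conv1d` are hypothetical and chosen for illustration; the paper's blocks operate on 2D feature maps inside a modified ResNet and are not reproduced here.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Stride-1, unpadded 1D cross-correlation with dilated kernel taps."""
    span = effective_kernel_size(len(kernel), dilation)
    out_len = len(signal) - span + 1
    if out_len <= 0:
        # The input is shorter than the span covered by the kernel.
        return []
    return [
        sum(w * signal[start + j * dilation] for j, w in enumerate(kernel))
        for start in range(out_len)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # the same three weights are used in both calls

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output spans 3 inputs
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output spans 5 inputs

print(effective_kernel_size(3, 1), dense)
print(effective_kernel_size(3, 2), dilated)
```

With a dilation of 2, the three-weight kernel compares samples four positions apart instead of two, which is the sense in which dilation enlarges the receptive field at a fixed parameter count.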

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions. This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program in Jiangsu Province (No. BE2016739), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information

Corresponding author

Correspondence to Xiaobo Lu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, M., Hu, Y. & Lu, X. Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals. Appl Intell 50, 1100–1111 (2020). https://doi.org/10.1007/s10489-019-01603-4

  • DOI: https://doi.org/10.1007/s10489-019-01603-4
