
Hybrid dilated multilayer faster RCNN for object detection

Original article, published in The Visual Computer.

Abstract

The Faster region-based convolutional neural network (Faster RCNN) architecture was proposed as an efficient object detection method in which a CNN is used to extract image features. However, CNNs require a large number of learning parameters, and an excessive number of pooling layers leads to a loss of information on small objects, which degrades detection performance. In this study, we propose a hybrid dilated multilayer Faster RCNN (HDMF-RCNN) model to address this problem. The key contributions of this work are as follows: (1) We substituted a hybrid dilated CNN (HDC) for the VGG16 network used in the original Faster RCNN architecture to extract features and ensure portability, and we used a LeakyReLU activation function to improve the mapping of negative input information so that objects are detected rapidly and accurately. (2) We used a multilayer feature spatial pyramid to convert single-scale features into multi-scale features, and higher-resolution information was obtained through a deconvolutional network to achieve more accurate object detection. (3) We conducted experiments to verify the performance of the proposed HDMF-RCNN model on the Microsoft COCO data set. The results indicated that the accuracy of HDMF-RCNN was 8.12% greater than that of the traditional Faster RCNN model, while the training loss and training time were lower by 44.64% and 39.46% on average, respectively. Overall, the results verify that HDMF-RCNN can significantly improve the efficiency of existing object detection methods, and that HDC, as an independent feature extraction network, can be adapted to different network frameworks.
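The hybrid dilated convolution (HDC) design used in the backbone follows a well-known rule: stacking layers with mixed dilation rates such as (1, 2, 5) yields a dense, hole-free receptive field, whereas repeating a single rate produces the "gridding" artifact that drops small-object detail. The following pure-Python sketch (illustrative only, not code from the paper, assuming 1-D convolutions with kernel size 3) checks this property directly:

```python
from itertools import product

def covered_offsets(dilations, kernel=3):
    """Offsets, relative to an output position, of the input samples that a
    stack of 1-D dilated convolutions can see (its receptive field)."""
    half = kernel // 2
    taps = range(-half, half + 1)
    # Each layer contributes one tap; total offset is the sum of k * dilation.
    return {sum(k * d for k, d in zip(ks, dilations))
            for ks in product(taps, repeat=len(dilations))}

def has_gridding(dilations, kernel=3):
    """True if the receptive field has holes (the 'gridding' artifact)."""
    offs = covered_offsets(dilations, kernel)
    lo, hi = min(offs), max(offs)
    return any(o not in offs for o in range(lo, hi + 1))

# HDC-style mixed rates cover every offset in their span: no gridding.
print(has_gridding([1, 2, 5]))   # False
# A uniform rate of 2 only ever samples even offsets: gridding.
print(has_gridding([2, 2, 2]))   # True
```

This is why an HDC backbone can enlarge the receptive field without the pooling-induced information loss described above: resolution is preserved while context grows.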

Data availability

The code developed for this paper is involved in pending patent applications. It will be made available to other researchers and teams once the related projects are completed.

Acknowledgements

This work was supported in part by the Natural Science Basic Research Program of Shaanxi (Grant Nos. 2021JQ-572 and 2021JQ-574), in part by the National Natural Science Foundation of China (Grant Nos. 51804250, 51905416, and 51804249), in part by the Xi'an Science and Technology Program (Grant No. 2022JH-RGZN-0041), in part by the Qin Chuangyuan "Scientists + Engineers" Team Construction Program in Shaanxi Province (Grant No. 2022KXJ-38), and in part by the Scientific Research Plan Projects of the Shaanxi Education Department (Grant No. 20JK0758).

Author information

Correspondence to Fangfang Xin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xin, F., Zhang, H. & Pan, H. Hybrid dilated multilayer faster RCNN for object detection. Vis Comput 40, 393–406 (2024). https://doi.org/10.1007/s00371-023-02789-y
