
Hybrid dilated multilayer faster RCNN for object detection

Original article, published in The Visual Computer.

Abstract

The Faster region-based convolutional neural network (Faster RCNN) architecture was proposed as an efficient object detection method in which a CNN is used to extract image features. However, CNNs require a large number of learning parameters, and an excessive number of pooling layers leads to a loss of information on small objects, which degrades detection performance. In this study, we propose a hybrid dilated multilayer Faster RCNN (HDMF-RCNN) model to address this problem. The key contributions of this work are as follows: (1) We substituted a hybrid dilated CNN (HDC) for the VGG16 network used in the original Faster RCNN architecture to extract features and ensure portability, and we used a LeakyReLU activation function to improve the mapping of negative input information so that objects are detected rapidly and accurately. (2) We used a multilayer feature spatial pyramid to convert single-scale features into multi-scale features, and higher-resolution information was obtained through a deconvolutional network to achieve more accurate object detection. (3) We conducted experiments to verify the performance of the proposed HDMF-RCNN model on the Microsoft COCO data set. The results indicated that the accuracy of HDMF-RCNN was 8.12% greater than that of the traditional Faster RCNN model, while the training loss and training time were lower by 44.64% and 39.46% on average, respectively. Overall, the results verify that HDMF-RCNN can significantly improve the efficiency of existing object detection methods, and that HDC, as an independent feature extraction network, can be adapted to different network frameworks.
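The hybrid dilated convolution (HDC) design used in the backbone follows a well-known rule: stacking layers with mixed dilation rates such as (1, 2, 5) yields a dense, hole-free receptive field, whereas repeating a single rate produces the "gridding" artifact that drops small-object detail. The following pure-Python sketch (illustrative only, not code from the paper, assuming 1-D convolutions with kernel size 3) checks this property directly:

```python
from itertools import product

def covered_offsets(dilations, kernel=3):
    """Offsets, relative to an output position, of the input samples that a
    stack of 1-D dilated convolutions can see (its receptive field)."""
    half = kernel // 2
    taps = range(-half, half + 1)
    # Each layer contributes one tap; total offset is the sum of k * dilation.
    return {sum(k * d for k, d in zip(ks, dilations))
            for ks in product(taps, repeat=len(dilations))}

def has_gridding(dilations, kernel=3):
    """True if the receptive field has holes (the 'gridding' artifact)."""
    offs = covered_offsets(dilations, kernel)
    lo, hi = min(offs), max(offs)
    return any(o not in offs for o in range(lo, hi + 1))

# HDC-style mixed rates cover every offset in their span: no gridding.
print(has_gridding([1, 2, 5]))   # False
# A uniform rate of 2 only ever samples even offsets: gridding.
print(has_gridding([2, 2, 2]))   # True
```

This is why an HDC backbone can enlarge the receptive field without the pooling-induced information loss described above: resolution is preserved while context grows.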

Data availability

The code developed for this paper is involved in pending patent applications. It will be made available to other researchers and teams once the related projects are completed.

Acknowledgements

This work was supported in part by the Natural Science Basic Research Program of Shaanxi (Grant Nos. 2021JQ-572 and 2021JQ-574), in part by the National Natural Science Foundation of China (Grant Nos. 51804250, 51905416, and 51804249), in part by the Xi'an Science and Technology Program (Grant No. 2022JH-RGZN-0041), in part by the Qin Chuangyuan "Scientists + Engineers" Team Construction Program in Shaanxi Province (Grant No. 2022KXJ-38), and in part by the Scientific Research Plan Projects of the Shaanxi Education Department (Grant No. 20JK0758).

Author information

Correspondence to Fangfang Xin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xin, F., Zhang, H. & Pan, H. Hybrid dilated multilayer faster RCNN for object detection. Vis Comput 40, 393–406 (2024). https://doi.org/10.1007/s00371-023-02789-y
