Abstract
With the continued study of convolutional neural networks in computer vision, improving the performance of network architectures has become a central research focus. Recent work has shown that multi-scale feature concatenation, shortcut connections, and grouped convolution can effectively train deeper networks and improve their accuracy. In this paper, we present a novel feature transformation strategy, fragmented multi-scale feature fusion, and propose an efficient modularized image classification network, IX-ResNet, based on this strategy. IX-ResNet stacks many large isomorphic modules in the residual-network style, where each large module is composed of many small heterogeneous modules. We verify the performance of IX-ResNet on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets; the results indicate that the fragmented multi-scale feature fusion strategy further improves accuracy compared with the original grouped-convolution network ResNeXt, at the same or even lower parameter count.
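The abstract's core idea, splitting features into fragments, transforming each at a different scale, and fusing the results inside a residual block, can be illustrated with a minimal NumPy sketch. This is not the paper's exact operator: the fragment count, the per-fragment kernel sizes, and the mean-filter stand-in for a learned convolution are all illustrative assumptions.

```python
import numpy as np

def fragmented_multiscale_fusion(x, num_fragments=4):
    """Illustrative sketch of fragmented multi-scale feature fusion:
    split the channel axis into fragments, filter each fragment at a
    different spatial scale, concatenate, and add a shortcut branch."""
    # x: feature map of shape (channels, height, width)
    c, h, w = x.shape
    assert c % num_fragments == 0, "channels must split evenly into fragments"
    fragments = np.split(x, num_fragments, axis=0)
    outputs = []
    for i, frag in enumerate(fragments):
        k = 2 * i + 1  # hypothetical per-fragment scales: 1, 3, 5, 7
        # stand-in for a learned k x k convolution: a k x k mean filter
        pad = k // 2
        padded = np.pad(frag, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
        out = np.zeros_like(frag)
        for dy in range(k):
            for dx in range(k):
                out += padded[:, dy:dy + h, dx:dx + w]
        outputs.append(out / (k * k))
    # concatenation restores the original channel count, so the fused
    # features can be added to the shortcut as in a residual block
    fused = np.concatenate(outputs, axis=0)
    return x + fused

x = np.random.rand(8, 16, 16).astype(np.float32)
y = fragmented_multiscale_fusion(x)
print(y.shape)  # (8, 16, 16)
```

Because each fragment sees a different receptive field while the output shape matches the input, blocks of this form can be stacked repeatedly, which is the modular-stacking property the abstract attributes to IX-ResNet.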
Acknowledgements
This research was supported by the Shaanxi Province Technical Innovation Foundation (grant No. 2020CGXNG-012).
Cite this article
Xue, T., Hong, Y. IX-ResNet: fragmented multi-scale feature fusion for image classification. Multimed Tools Appl 80, 27855–27865 (2021). https://doi.org/10.1007/s11042-021-10893-1