
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12366)

Abstract

Despite their strong modeling capacity, Convolutional Neural Networks (CNNs) are often sensitive to scale variation. Among existing remedies, multi-scale feature fusion across different layers or filters has attracted great attention, while the finer-grained kernel space is overlooked. We address this gap by exploiting multi-scale features at a finer granularity. The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes a spectrum of dilation rates and allocates them to the individual convolutional kernels of each filter within a single convolutional layer. Specifically, the dilation rates vary cyclically along the input- and output-channel axes of the filters, aggregating features over a wide range of scales in a compact form. PSConv can serve as a drop-in replacement for vanilla convolution in many prevailing CNN backbones, enabling better representation learning without introducing additional parameters or computational cost. Comprehensive experiments on the ImageNet and MS COCO benchmarks validate the superior performance of PSConv. Code and models are available at https://github.com/d-li14/PSConv.
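The cyclic allocation of dilation rates described above can be sketched as a small PyTorch module. This is a hypothetical simplification, not the authors' implementation (see the linked repository for that): it cycles the rate only along the output-channel axis, whereas PSConv additionally cycles along the input-channel axis. Channels sharing a rate are grouped into one dilated convolution, so the total parameter count matches a plain convolution of the same shape.

```python
import torch
import torch.nn as nn


class PolyScaleConv2d(nn.Module):
    """Sketch: split the output channels cyclically across a set of
    dilation rates, so a single layer aggregates several receptive-field
    sizes at the parameter/FLOP cost of one vanilla convolution."""

    def __init__(self, in_ch, out_ch, kernel_size=3, dilations=(1, 2, 4)):
        super().__init__()
        # Cyclic allocation: output channel i gets dilations[i % len(dilations)];
        # counts[j] is how many output channels receive dilations[j].
        counts = [len(range(j, out_ch, len(dilations))) for j in range(len(dilations))]
        # Padding d * (k // 2) preserves spatial size for odd kernels.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, c, kernel_size, dilation=d,
                      padding=d * (kernel_size // 2), bias=False)
            for d, c in zip(dilations, counts)
        )

    def forward(self, x):
        # Concatenating the per-rate branches yields out_ch channels; the
        # summed weights equal those of a plain Conv2d(in_ch, out_ch, k).
        return torch.cat([b(x) for b in self.branches], dim=1)


conv = PolyScaleConv2d(8, 16)
out = conv(torch.randn(1, 8, 32, 32))
```

Because padding scales with the dilation rate, every branch keeps the input resolution, making the module a drop-in stand-in for `nn.Conv2d(8, 16, 3, padding=1)` in this toy setting.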

Keywords

Convolutional kernel · Multi-scale feature fusion · Dilated convolution · Categorization and detection

Supplementary material

Supplementary material 1: 504479_1_En_37_MOESM1_ESM.pdf (11.5 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  2. Intel Labs China, Beijing, China
