
Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks

  • Domen Tabernik
  • Matej Kristan
  • Aleš Leonardis
Part of the following topical collections:
  1. Special Issue on Efficient Visual Recognition

Abstract

Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be manually set to accommodate a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors and result in non-compact networks with a large number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAU). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus reducing the need for hand-crafted modifications. DAUs provide a seamless substitution of convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscapes) and blind image de-blurring (GOPRO). Results show that DAUs efficiently allocate parameters, resulting in up to 4× more compact networks in terms of the number of parameters at similar or better performance.
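To make the idea concrete, below is a minimal NumPy sketch of how a single DAU-based filter can be assembled: each displaced aggregation unit is a narrow Gaussian blob with a learnable weight and a learnable (sub-pixel) displacement from the kernel centre, and the filter is their weighted sum. This is an illustrative approximation rather than the authors' released implementation (the acknowledgements mention a TensorFlow port); the unit count, the blob width sigma, the helper name dau_kernel and all numeric values are assumptions made for the example, and in the actual method the displacements and weights are learned by back-propagation.

```python
import numpy as np
from scipy.ndimage import convolve

def dau_kernel(weights, mu_x, mu_y, kernel_size=9, sigma=0.5):
    """Assemble one convolution kernel from displaced aggregation units.

    Each unit k is a Gaussian blob of fixed width `sigma`, placed at a
    (possibly sub-pixel) displacement (mu_x[k], mu_y[k]) from the kernel
    centre and scaled by weights[k].  All values here are illustrative.
    """
    r = kernel_size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1].astype(np.float64)
    kernel = np.zeros((kernel_size, kernel_size))
    for w, ux, uy in zip(weights, mu_x, mu_y):
        kernel += w * np.exp(-((xs - ux) ** 2 + (ys - uy) ** 2) / (2 * sigma ** 2))
    # Scale so each unit integrates to roughly its weight w.
    return kernel / (2 * np.pi * sigma ** 2)

# Four units: in the actual method these displacements (and hence the
# effective receptive field) are learned together with the weights.
weights = np.array([0.5, -0.3, 0.8, -0.2])
mu_x = np.array([-2.7, 1.4, 0.0, 3.1])   # horizontal offsets
mu_y = np.array([0.9, -2.2, 0.0, 2.6])   # vertical offsets

image = np.random.rand(64, 64)
response = convolve(image, dau_kernel(weights, mu_x, mu_y), mode='nearest')
print(response.shape)  # (64, 64)
```

The sketch shows why such filters can be parameter-efficient: spreading a handful of units over larger displacements lets a single filter cover a wide receptive field with only a few weights and offsets, instead of a full dense grid of kernel values.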

Keywords

Compact ConvNets · Efficient ConvNets · Displacement units · Adjustable receptive fields

Notes

Acknowledgements

The authors would like to thank Hector Basevi for his valuable comments and suggestions on improving the paper. This work was supported in part by the following research projects and programs: Projects GOSTOP C3330-16-529000, DIVID J2-9433 and ViAMaRo L2-6765, Program P2-0214 financed by the Slovenian Research Agency (ARRS), and the MURI Project financed by MoD/Dstl and EPSRC through Grant EP/N019415/1. We thank Vitjan Zavrtanik for his contribution in porting the DAUs to the TensorFlow framework.

References

  1. Amidror, I. (2013). Mastering the discrete Fourier transform in one, two or several dimensions. Berlin: Springer.
  2. Bruna, J., & Mallat, S. (2013). Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1872–1886. https://doi.org/10.1109/TPAMI.2012.230.
  3. Chang, J., Gu, J., Wang, L., Meng, G., Xiang, S., & Pan, C. (2018). Structure-aware convolutional neural networks. In Proceedings of the neural information processing systems (pp. 1–10).
  4. Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016a). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. In Pattern analysis and machine intelligence (pp. 1–14). arXiv:1606.00915.
  5. Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016b). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848. https://doi.org/10.1109/TPAMI.2017.2699184.
  6. Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587.
  7. Chen, L. C., Zhu, Y., Papandreou, G., & Schroff, F. (2018). Encoder–decoder with atrous separable convolution for semantic image segmentation. In European conference on computer vision.
  8. Cheng, M. M., Zhang, Z., Lin, W. Y., & Torr, P. (2014). BING: Binarized normed gradients for objectness estimation at 300fps. In Computer vision and pattern recognition (pp. 3286–3293). IEEE.  https://doi.org/10.1109/CVPR.2014.414.
  9. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Computer vision and pattern recognition.  https://doi.org/10.1109/CVPR.2016.350.
  10. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. (2017). Deformable convolutional networks. In International conference on computer vision.
  11. Eigen, D., Rolfe, J., Fergus, R., & Lecun, Y. (2014). Understanding deep architectures using a recursive convolutional network (pp. 1–9). arXiv:1312.1847v2.
  12. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2011). The Pascal visual object classes challenge 2011 (VOC2011) results. Retrieved December 17, 2019, from http://host.robots.ox.ac.uk/pascal/VOC/voc2011/index.html.
  13. Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (vol. 9, pp. 249–256).
  14. Hariharan, B., Arbeláez, P., Bourdev, L., Maji, S., & Malik, J. (2011). Semantic contours from inverse detectors. In International conference on computer vision.
  15. He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In European conference on computer vision (pp. 346–361).
  16. He, K., Zhang, X., Ren, S., & Sun, J. (2016a). Deep residual learning for image recognition. In Computer vision and pattern recognition (pp. 770–778).
  17. He, K., Zhang, X., Ren, S., & Sun, J. (2016b). Identity mappings in deep residual networks. In European conference on computer vision (vol. 9908, pp. 630–645). LNCS. https://doi.org/10.1007/978-3-319-46493-0_38.
  18. Holschneider, M., Kronland-Martinet, R., Morlet, J., & Tchamitchian, P. (1990). A real-time algorithm for signal analysis with the help of the wavelet transform. In J. M. Combes, A. Grossmann, & P. Tchamitchian (Eds.), Wavelets (pp. 286–297). Berlin: Springer.
  19. Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size (pp. 1–13). arXiv:1602.07360.
  20. Jacobsen, J. H., van Gemert, J., Lou, Z., & Smeulders, A. W. M. (2016). Structured receptive fields in CNNs. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2610–2619).  https://doi.org/10.1109/CVPR.2016.286.
  21. Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014). Speeding up convolutional neural networks with low rank expansions. In British machine vision conference (p. 7).  https://doi.org/10.5244/C.28.88.
  22. Jeon, Y., & Kim, J. (2017). Active convolution: Learning the shape of convolution for image classification. In Computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2017.200.
  23. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In International conference on computer vision (pp. 2961–2969). arXiv:1703.06870.
  24. Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. In International conference on learning representations (pp. 1–13). arXiv:1412.6980v5.
  25. Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Tech report, University of Toronto (pp. 1–60).
  26. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
  27. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2323. https://doi.org/10.1109/5.726791.
  28. Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440). https://doi.org/10.1109/CVPR.2015.7298965.
  29. Luan, S., Zhang, B., Chen, C., Cao, X., Ye, Q., Han, J., & Liu, J. (2017). Gabor convolutional networks. In British machine vision conference (pp. 1–12). arXiv:1705.01450.
  30. Luo, P., Wang, G., Lin, L., & Wang, X. (2017). Deep dual learning for semantic image segmentation. In International conference on computer vision (pp. 2718–2726). https://doi.org/10.1109/ICCV.2017.296.
  31. Luo, W., Li, Y., Urtasun, R., & Richard, Z. (2016). Understanding the effective receptive field in deep convolutional neural networks. In NIPS. arXiv:1701.04128.
  32. Nah, S., Kim, T. H., & Lee, K. M. (2017). Deep multi-scale convolutional neural network for dynamic scene deblurring. In Computer vision and pattern recognition (pp. 3883–3891).  https://doi.org/10.1109/CVPR.2017.35.
  33. Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Computer vision and pattern recognition.
  34. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention—MICCAI 2015 (pp. 234–241). https://doi.org/10.1007/978-3-319-24574-4_28.
  35. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252. https://doi.org/10.1007/s11263-015-0816-y.
  36. Shelhamer, E., Long, J., & Darrell, T. (2016). Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651. https://doi.org/10.1109/TPAMI.2016.2572683.
  37. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (pp. 1–14). arXiv:1409.1556v6.
  38. Tabernik, D., Kristan, M., & Leonardis, A. (2018). Spatially-adaptive filter units for deep neural networks. In Computer vision and pattern recognition (pp. 9388–9396). arXiv:1711.11473.
  39. Tabernik, D., Kristan, M., Wyatt, J. L., & Leonardis, A. (2016). Towards deep compositional networks. In International conference on pattern recognition. arXiv:1609.03795.
  40. Tao, X., Gao, H., Wang, Y., Shen, X., Wang, J., & Jia, J. (2018). Scale-recurrent network for deep image deblurring. In Computer vision and pattern recognition (pp. 8174–8182).  https://doi.org/10.1109/CVPR.2018.00853.
  41. Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Conference on computer vision and pattern recognition.  https://doi.org/10.1109/CVPR.2017.634.
  42. Yu, F., Koltun, V., & Funkhouser, T. (2017). Dilated residual networks. In Computer vision and pattern recognition. arXiv:1705.09914.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
  2. School of Computer Science, University of Birmingham, Birmingham, UK
