While convolutional neural networks (CNNs) are widely used, the methods proposed so far have always used homogeneous kernels in the convolution operation. In this paper, we propose a new type of convolution operation using heterogeneous kernels. The proposed Heterogeneous Kernel-Based Convolution (HetConv) reduces the computation (FLOPs) and the number of parameters compared to the standard convolution operation, while maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on standard CNN architectures such as VGG, ResNet, Faster R-CNN, MobileNet, and SSD. We observe that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve a 1.5\(\times \) to 8\(\times \) FLOPs-based improvement in speed while maintaining (and sometimes improving) accuracy. We also compare our proposed convolution with group/depthwise convolution and show that it achieves greater FLOPs reduction with significantly higher accuracy. Moreover, we demonstrate the efficacy of HetConv-based CNNs by showing that they also generalize to object detection and are not constrained to image classification. Finally, we empirically show that HetConv is more robust to over-fitting than standard convolution.
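The abstract states that HetConv lowers FLOPs and parameters relative to standard convolution. The following sketch makes that concrete under an assumption not spelled out in this excerpt. Following the conference version of HetConv, each filter is assumed to mix M/P kernels of size K×K with M − M/P pointwise (1×1) kernels, where the "part" P is a design hyperparameter. The function names are illustrative. FLOPs are counted as multiply-accumulates, and biases, padding, and stride are ignored.

```python
def standard_conv_flops(d_out, m, n, k):
    """Multiply-accumulates of a standard k x k convolution layer.

    d_out: side length of the square output feature map
    m: number of input channels, n: number of filters (output channels)
    """
    return d_out * d_out * m * n * k * k


def hetconv_flops(d_out, m, n, k, p):
    """Multiply-accumulates of a HetConv layer with part p (assumed design).

    In every filter, m // p kernels are k x k and the remaining
    m - m // p kernels are 1 x 1.
    """
    if m % p != 0:
        raise ValueError("m must be divisible by p")
    k_by_k = d_out * d_out * (m // p) * n * k * k     # heterogeneous k x k part
    one_by_one = d_out * d_out * (m - m // p) * n     # pointwise part
    return k_by_k + one_by_one


def hetconv_ratio(k, p):
    """Closed-form cost ratio HetConv / standard: 1/p + (1 - 1/p) / k^2."""
    return 1 / p + (1 - 1 / p) / (k * k)


if __name__ == "__main__":
    # Illustrative layer: 56 x 56 output, 256 -> 256 channels, 3 x 3 kernels, P = 4.
    std = standard_conv_flops(56, 256, 256, 3)
    het = hetconv_flops(56, 256, 256, 3, 4)
    print(f"standard: {std:,} MACs, HetConv: {het:,} MACs")
    print(f"speedup: {std / het:.2f}x, closed form: {1 / hetconv_ratio(3, 4):.2f}x")
```

For this illustrative layer, the speedup is 3.00×. Setting P = 1 recovers standard convolution, and larger P pushes the ratio toward 1/K². Every term carries the factor d_out², so the same ratio also applies to the weight count when biases are ignored.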
The size of the set of transformations.
One parallel step is converted into multiple sequential steps, which reduces parallelizability. Later layers must wait for the previous stage to finish executing, because all computations across layers have to be done sequentially.
Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.
About this article
Cite this article
Singh, P., Verma, V.K., Rai, P. et al. HetConv: Beyond Homogeneous Convolution Kernels for Deep CNNs. Int J Comput Vis 128, 2068–2088 (2020). https://doi.org/10.1007/s11263-019-01264-3
- Efficient convolutional neural networks
- Heterogeneous convolution
- FLOPs compression
- Model compression
- Efficient visual recognition