
GhostNets on Heterogeneous Devices via Cheap Operations

Published in: International Journal of Computer Vision

Abstract

Deploying convolutional neural networks (CNNs) on mobile devices is difficult due to limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices, including CPUs and GPUs, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of cheap linear transformations to generate many ghost feature maps that fully reveal the information underlying the intrinsic features. The proposed C-Ghost module can serve as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, from which the lightweight C-GhostNet is easily established. We further consider efficient networks for GPU devices. To avoid introducing too many GPU-inefficient operations (e.g., depth-wise convolution) within a building stage, we propose to exploit stage-wise feature redundancy to formulate the GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts: the first part is processed by the original block with fewer output channels to generate intrinsic features, and the other part is generated by cheap operations that exploit stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and G-Ghost stage. C-GhostNet and G-GhostNet achieve an optimal trade-off between accuracy and latency on CPUs and GPUs, respectively. MindSpore code is available at https://gitee.com/mindspore/models/pulls/1809, and PyTorch code is available at https://github.com/huawei-noah/CV-Backbones.
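
To make the two designs concrete, here is a minimal PyTorch sketch (PyTorch being the framework of the released code). `CGhostModule` mirrors the structure described in the abstract: an ordinary convolution for intrinsic maps followed by a cheap depth-wise convolution for ghost maps. `GGhostStage` is an illustrative simplification that omits the intermediate-feature aggregation of the released version. All class names, argument names, and default hyper-parameters here are assumptions for illustration, not the exact API of the huawei-noah/CV-Backbones repository.

```python
import math

import torch
import torch.nn as nn


class CGhostModule(nn.Module):
    """C-Ghost module sketch: an ordinary convolution produces a small set
    of intrinsic feature maps, then a cheap depth-wise convolution derives
    the remaining "ghost" maps from them."""

    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1):
        super().__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)      # intrinsic maps
        new_channels = init_channels * (ratio - 1)  # ghost maps

        # Primary convolution: generates the intrinsic feature maps.
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride,
                      kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True),
        )
        # Cheap operation: depth-wise convolution (groups == input channels).
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1,
                      dw_size // 2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary_conv(x)
        ghost = self.cheap_operation(intrinsic)
        out = torch.cat([intrinsic, ghost], dim=1)
        return out[:, :self.oup]  # trim in case init_channels * ratio > oup


class GGhostStage(nn.Module):
    """G-Ghost stage sketch (simplified): only part of the stage's output
    channels travel through the full block sequence; the rest are
    synthesized from the first block's output by a cheap 1x1 convolution."""

    def __init__(self, block, in_channels, out_channels, num_blocks, ratio=0.5):
        super().__init__()
        cheap_channels = int(out_channels * ratio)
        complex_channels = out_channels - cheap_channels

        # `block` is any user-supplied constructor block(in_ch, out_ch),
        # e.g. a narrowed residual block that preserves spatial size.
        self.first = block(in_channels, complex_channels)
        self.rest = nn.Sequential(
            *[block(complex_channels, complex_channels)
              for _ in range(num_blocks - 1)]
        )
        # Cheap branch exploiting stage-wise redundancy.
        self.cheap = nn.Sequential(
            nn.Conv2d(complex_channels, cheap_channels, 1, bias=False),
            nn.BatchNorm2d(cheap_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.first(x)         # shared intrinsic input
        intrinsic = self.rest(y)  # expensive path with fewer channels
        ghost = self.cheap(y)     # cheap path
        return torch.cat([intrinsic, ghost], dim=1)
```

As a quick sanity check, `CGhostModule(16, 32)(torch.randn(1, 16, 32, 32))` returns a tensor of shape `(1, 32, 32, 32)` while spending an ordinary convolution on only half of the 32 output channels; the other half comes from the cheap depth-wise branch.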



Acknowledgements

This work was supported by NSFC (62072449, 61872241, 61632003) and the Macao FDCT Grant (0018/2019/AKP). Chang Xu was supported by the Australian Research Council under Project DP210101859 and the University of Sydney SOAR Prize. This project was also partially supported by CANN (https://www.hiascend.com/software/cann).

Author information


Corresponding author

Correspondence to Yunhe Wang.

Additional information

Communicated by Jifeng Dai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Han, K., Wang, Y., Xu, C. et al. GhostNets on Heterogeneous Devices via Cheap Operations. Int J Comput Vis 130, 1050–1069 (2022). https://doi.org/10.1007/s11263-022-01575-y

