
An efficient and low power deep learning framework for image recognition on mobile devices


Image classification on mobile devices can provide convenient and secure services for users of social software. Traditional classification relies mainly on manual tagging by the user, while automatic classification still falls short in accuracy. With the development of convolutional neural networks (CNNs), the design of lightweight neural networks has become a hot topic. However, state-of-the-art studies tend to sacrifice classification accuracy to make the network lightweight, which greatly reduces usability. In this paper, a new neural network framework, named MobVi, is proposed to improve the precision of lightweight neural networks through solution space division. MobVi consists of two stages: image solution space division and class judgment. The former uses a deep-learning-based clustering method to determine which small solution space an image belongs to, while the latter uses a lightweight neural network customized for that solution space to judge the class. To reduce the number of model parameters and the amount of computation, we design a customized CNN module. Finally, we propose an energy prediction model to assess whether the model can be successfully deployed on mobile devices. A series of experiments shows that MobVi performs better than most existing models for mobile devices. Our model achieves 83.5% accuracy on the CIFAR-10 data set with only 2.0 M parameters.
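The two-stage idea described in the abstract (first assign an image to a small solution space, then let a classifier specialized for that space decide the class) can be illustrated with a minimal Python sketch. This is not the authors' implementation: nearest-centroid assignment stands in for the deep-learning-based clustering, toy functions stand in for the customized lightweight CNNs, and every name here (`nearest_subspace`, `classify`, `centroids`, `experts`) is hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], str]


def nearest_subspace(feature: Vector, centroids: List[Vector]) -> int:
    """Stage 1 (stand-in): return the index of the closest centroid.

    The paper uses a deep-learning-based clustering method; plain
    nearest-centroid assignment is assumed here only for illustration.
    """
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda i: sq_dist(feature, centroids[i]))


def classify(feature: Vector,
             centroids: List[Vector],
             experts: Dict[int, Classifier]) -> str:
    """Stage 2: dispatch to the classifier specialized for the chosen subspace."""
    subspace = nearest_subspace(feature, centroids)
    return experts[subspace](feature)


# Toy setup: two solution spaces, each with its own tiny "expert".
# In a real system each expert would be a small CNN trained only on
# the classes that fall into its solution space.
centroids = [(0.0, 0.0), (10.0, 10.0)]
experts = {
    0: lambda f: "cat" if f[0] < f[1] else "dog",
    1: lambda f: "truck" if f[0] < f[1] else "ship",
}

print(classify((1.0, 2.0), centroids, experts))  # routed to subspace 0
print(classify((9.0, 8.0), centroids, experts))  # routed to subspace 1
```

The point of the split is that each expert only has to separate the few classes in its own solution space, which is what would allow it to stay small; the routing step adds one extra decision before classification.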





This work was supported in part by the International Cooperation Project of Shaanxi Province (No. 2020KW-004), the China Postdoctoral Science Foundation (No. 2017M613187), the Key Research and Development Project of Shaanxi Province (No. 2018SF-369), and the Shaanxi Science and Technology Innovation Team Support Project (No. 2018TD-026).

Author information



Corresponding author

Correspondence to Tianzhang Xing.


About this article


Cite this article

Liu, G., Dai, X., Liu, X. et al. An efficient and low power deep learning framework for image recognition on mobile devices. CCF Trans. Pervasive Comp. Interact. (2021).



  • Mobile devices
  • Convolutional neural network
  • Image classification
  • Solution space division