Quantized Densely Connected U-Nets for Efficient Landmark Localization

  • Zhiqiang Tang
  • Xi Peng
  • Shijie Geng
  • Lingfei Wu
  • Shaoting Zhang
  • Dimitris Metaxas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

In this paper, we propose quantized densely connected U-Nets for efficient visual landmark localization. The idea is to globally reuse features of the same semantic meaning across the stacked U-Nets. This dense connectivity largely improves the information flow, yielding improved localization accuracy. However, a vanilla dense design suffers from critical efficiency issues in both training and testing. To address this, we first propose order-K dense connectivity to trim off long-distance shortcuts; we then use a memory-efficient implementation to significantly improve training efficiency, and investigate an iterative refinement that can cut the model size in half. Finally, to reduce memory consumption and high-precision operations in both training and testing, we further quantize the weights, inputs, and gradients of our localization network to low bit-width numbers. We validate our approach on two tasks: human pose estimation and face alignment. The results show that our approach achieves state-of-the-art localization accuracy while using ~70% fewer parameters, ~98% less model size, and ~32× less training memory than other benchmark localizers.
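The quantization step described above can be illustrated with a minimal sketch. The functions below are a hypothetical illustration, not the paper's implementation: they assume a DoReFa-style k-bit uniform quantizer of the kind the abstract alludes to when it mentions quantizing weights to low bit-width numbers. The function names `quantize_uniform` and `quantize_weights` are invented for this example.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniformly quantize values in [0, 1] to 2^bits - 1 levels.
    This is the basic k-bit quantizer used in DoReFa-style training."""
    levels = (1 << bits) - 1
    return np.round(x * levels) / levels

def quantize_weights(w, bits):
    """Map real-valued weights to k-bit values in [-1, 1]:
    squash with tanh, normalize into [0, 1], quantize uniformly,
    then rescale back to [-1, 1]."""
    t = np.tanh(w)
    t = t / (2 * np.max(np.abs(t))) + 0.5   # now in [0, 1]
    return 2 * quantize_uniform(t, bits) - 1
```

In low-bit training schemes of this family, a straight-through estimator typically passes gradients through the (non-differentiable) rounding step, and the same uniform quantizer is reused for activations and gradients at possibly different bit-widths.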

Notes

Acknowledgment

This work is partly supported by the Air Force Office of Scientific Research (AFOSR) under the Dynamic Data-Driven Application Systems Program, NSF 1763523, 1747778, 1733843 and 1703883 Awards.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zhiqiang Tang (1)
  • Xi Peng (2)
  • Shijie Geng (1)
  • Lingfei Wu (3)
  • Shaoting Zhang (4)
  • Dimitris Metaxas (1)
  1. Rutgers University, New Brunswick, USA
  2. Binghamton University, Binghamton, USA
  3. IBM T. J. Watson, Yorktown Heights, USA
  4. SenseTime, Beijing, China