Smoothness-based consistency learning for macaque pose estimation

  • Original Paper
Signal, Image and Video Processing

Abstract

Macaques are a rare substitute for human subjects and play an important role in the study of human psychology and spiritual science. Accurate estimation of macaque pose information is key to these studies, but macaque pose estimation remains hindered by the scarcity of labeled images. To address this problem, this work introduces a novel semi-supervised approach called smoothness-based spatio-temporal consistency learning (SSTCL) and a dual network structure (DNS) to leverage large amounts of unlabeled real images. Specifically, SSTCL introduces the smoothness assumption to help the model generalize from the labeled training images to the unlabeled images, and the spatio-temporal consistency is designed to leverage both spatial and temporal consistencies to pick the most reliable pseudo-labels. Moreover, the DNS gives the model the ability to self-correct, which can prevent the degeneration caused by noisy pseudo-labels in semi-supervised learning. Ablation experiments demonstrate the effectiveness of the DNS for pseudo-label quality assurance. We evaluate the proposed method on the public OpenMonkeyPose dataset. The results show that the proposed method achieves competitive performance while using fewer labeled images, and the final accuracy surpasses the strong baseline HRNet-w48 by 2.1 AP.
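The abstract does not give the selection rule itself, so the following is only a minimal sketch of how pseudo-labels might be filtered by spatial and temporal consistency. Everything in it is an assumption made for illustration: the function name `select_pseudo_labels`, the use of a second augmented view for the spatial check, the frame-to-frame displacement for the temporal check, and both pixel tolerances. It is not the authors' SSTCL implementation and it does not model the dual network structure.

```python
import numpy as np


def select_pseudo_labels(pred, pred_aug, spatial_tol=5.0, temporal_tol=10.0):
    """Return a boolean mask of frames whose predictions look reliable.

    pred, pred_aug: arrays of shape (T, K, 2) holding K keypoints (x, y in
    pixels) for T consecutive unlabeled frames. `pred_aug` is assumed to come
    from an augmented view and to be mapped back to the original coordinates.
    The tolerances are hypothetical values, not taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    pred_aug = np.asarray(pred_aug, dtype=float)

    # Spatial consistency: mean keypoint distance between the two views.
    spatial_err = np.linalg.norm(pred - pred_aug, axis=-1).mean(axis=-1)

    # Temporal consistency (smoothness): mean keypoint displacement between
    # neighbouring frames; each frame takes the larger of its two neighbours.
    step = np.linalg.norm(np.diff(pred, axis=0), axis=-1).mean(axis=-1)
    temporal_err = np.zeros(len(pred))
    temporal_err[:-1] = np.maximum(temporal_err[:-1], step)
    temporal_err[1:] = np.maximum(temporal_err[1:], step)

    return (spatial_err <= spatial_tol) & (temporal_err <= temporal_tol)


# Four frames, two keypoints: frame 0 disagrees across views, frame 3 jumps.
pred = np.array([
    [[0.0, 0.0], [10.0, 10.0]],
    [[1.0, 0.0], [11.0, 10.0]],
    [[2.0, 0.0], [12.0, 10.0]],
    [[50.0, 0.0], [60.0, 10.0]],
])
pred_aug = pred.copy()
pred_aug[0, :, 0] += 20.0
mask = select_pseudo_labels(pred, pred_aug)
```

In this toy input only frame 1 passes both checks: frame 0 fails the spatial check, and frames 2 and 3 sit on either side of the sudden jump, so they fail the temporal check. Frames kept by such a mask would then be used as pseudo-labels for further training.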

Data Availability

The public OpenMonkeyStudio dataset used in this article is available at https://github.com/OpenMonkeyStudio/OMS_Data

Funding

This research was funded by the Natural Science Foundation of Heilongjiang Province of China (F201310).

Author information

Contributions

Xue Ping and Deng Shixiong prepared the manuscript text. Xue Ping prepared Table 2 and Figs. 3 and 4 through ablation experiments. Deng Shixiong collected the dataset, participated in the experiments, and prepared Figs. 1 and 2, Table 1 and Algorithm 1.

Corresponding author

Correspondence to Ping Xue.

Ethics declarations

Competing Interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethical Approval

This work did not require ethical approval under the research governance guidelines operating at the time of the research.

About this article

Cite this article

Xue, P., Deng, S. Smoothness-based consistency learning for macaque pose estimation. SIViP 17, 4327–4335 (2023). https://doi.org/10.1007/s11760-023-02665-1

  • DOI: https://doi.org/10.1007/s11760-023-02665-1
