Abstract
Macaques are a rare substitute and play an important role in study of human psychology and spiritual science. Accurate estimation of macaque pose information is key to these studies, macaque pose estimation remains to be hindered by the scarcity of labeled images. To address this problem, this work introduces a novel semi-supervised approach called smoothness-based spatio-temporal consistency learning (SSTCL) and a dual network structure (DNS) to leverage the amounts of unlabeled real images. Specifically, the SSTCL introduces the smoothness assumption to help the model generalize from the labeled training images to the unlabeled images, and the spatio-temporal consistency is designed to leverage both spatial and temporal consistencies to pick the most reliable pseudo-labels. Moreover, a dual network structure (DNS) is proposed to empower the model the ability of self-correction, which can prevent the degeneration caused by the noisy pseudo-labels in semi-supervised learning. In ablation experiments, the effectiveness of DNS for pseudo-label quality assurance is demonstrated. We evaluate the proposed method on the public OpenMonkeyPose dataset, the results show that the proposed method can achieve competitive performance while using less labeled images, and the final accuracy surpasses the strong baseline HRNet-w48 of 2.1 AP.
Similar content being viewed by others
Data Availability
In this article, public dataset OpenMonkeyStudio is available at https://github.com/OpenMonkeyStudio/OMS_Data
References
Mathis, A., Mamidanna, P., Cury, K.M., Abe, T., Murthy, V.N., Mathis, M.W., Bethge, M.: Deeplabcut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21(9), 1281–1289 (2018)
Pereira, T.D., Aldarondo, D.E., Willmore, L., Kislin, M., Wang, S.S.-H., Murthy, M., Shaevitz, J.W.: Fast animal pose estimation using deep neural networks. Nat. Methods 16(1), 117–125 (2019)
Graving, J.M., Chae, D., Naik, H., Li, L., Koger, B., Costelloe, B.R., Couzin, I.D.: Deepposekit, a software toolkit for fast and robust animal pose estimation using deep learning. Elife 8, 47994 (2019)
Negrete, S.B., Labuguen, R., Matsumoto, J., Go, Y., Inoue, K.-i., Shibata, T.: Multiple monkey pose estimation using openpose. bioRxiv (2021)
Pereira, T.D., Tabris, N., Li, J., Ravindranath, S., Papadoyannis, E.S., Wang, Z.Y., Turner, D.M., McKenzie-Smith, G., Kocher, S.D., Falkner, A.L., et al.: Sleap: multi-animal pose tracking. BioRxiv (2020)
Mathis, A., Biasi, T., Schneider, S., Yuksekgonul, M., Rogers, B., Bethge, M., Mathis, M.W.: Pretraining boosts out-of-domain robustness for pose estimation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1859–1868 (2021)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: New benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693 (2014)
Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242 (2016)
Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv preprint arXiv:1703.01780 (2017)
Abuduweili, A., Li, X., Shi, H., Xu, C.-Z., Dou, D.: Adaptive consistency regularization for semi-supervised transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6923–6932 (2021)
Mu, J., Qiu, W., Hager, G.D., Yuille, A.L.: Learning from synthetic animals. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12386–12395 (2020)
Cao, J., Tang, H., Fang, H.-S., Shen, X., Lu, C., Tai, Y.-W.: Cross-domain adaptation for animal pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9498–9507 (2019)
Xie, R., Wang, C., Zeng, W., Wang, Y.: Humble teacher and eager student: dual network learning for semi-supervised 2D human pose estimation. arXiv preprint arXiv:2011.12498 (2020)
DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017)
Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 466–481 (2018)
Zhang, Z., Tang, J., Wu, G.: Simple and lightweight human pose estimation. arXiv preprint arXiv:1911.10346 (2019)
Li, W., Wang, Z., Yin, B., Peng, Q., Du, Y., Xiao, T., Yu, G., Lu, H., Wei, Y., Sun, J.: Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv:1901.00148 (2019)
Cai, Y., Wang, Z., Luo, Z., Yin, B., Du, A., Wang, H., Zhang, X., Zhou, X., Zhou, E., Sun, J.: Learning delicate local representations for multi-person pose estimation. In: European Conference on Computer Vision, pp. 455–472. Springer (2020)
Zhang, F., Zhu, X., Dai, H., Ye, M., Zhu, C.: Distribution-aware coordinate representation for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7093–7102 (2020)
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5693–5703 (2019)
Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., Schiele, B.: Deepercut: a deeper, stronger, and faster multi-person pose estimation model. In: European Conference on Computer Vision, pp. 34–50. Springer (2016)
Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: Higherhrnet: scale-aware representation learning for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5386–5395 (2020)
Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P.V., Schiele, B.: Deepcut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4929–4937 (2016)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., Sheikh, Y.: Openpose: realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2019)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 91–99 (2015)
Munea, T.L., Jembre, Y.Z., Weldegebriel, H.T., Chen, L., Huang, C., Yang, C.: The progress of human pose estimation: a survey and taxonomy of models applied in 2D human pose estimation. IEEE Access 8, 133330–133348 (2020)
Badger, M., Wang, Y., Modh, A., Perkes, A., Kolotouros, N., Pfrommer, B.G., Schmidt, M.F., Daniilidis, K.: 3D bird reconstruction: a dataset, model, and shape recovery from a single view. arXiv preprint arXiv:2008.06133 (2020)
Zhou, F., Jiang, Z., Liu, Z., Chen, F., Chen, L., Tong, L., Yang, Z., Wang, H., Fei, M., Li, L., et al.: Structured context enhancement network for mouse pose estimation. arXiv preprint arXiv:2012.00630 (2020)
Cao, Z., Simon, T., Wei, S.-E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017)
Bala, P.C., Eisenreich, B.R., Yoo, S.B.M., Hayden, B.Y., Park, H.S., Zimmermann, J.: Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun. 11(1), 1–12 (2020)
Lee, D.-H., et al.: Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML, vol. 3, p. 896 (2013)
Radosavovic, I., Dollár, P., Girshick, R., Gkioxari, G., He, K.: Data distillation: Towards omni-supervised learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4119–4128 (2018)
Xie, Q., Luong, M.-T., Hovy, E., Le, Q.V.: Self-training with noisy student improves imagenet classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10687–10698 (2020)
Van Engelen, J.E., Hoos, H.H.: A survey on semi-supervised learning. Mach. Learn. 109(2), 373–440 (2020)
Cho, E., Kim, D.: Accurate human pose estimation by aggregating multiple pose hypotheses using modified kernel density approximation. IEEE Signal Process. Lett. 22(4), 445–449 (2014)
Xu, X., Zou, Q., Lin, X., Huang, Y., Tian, Y.: Integral knowledge distillation for multi-person pose estimation. IEEE Signal Process. Lett. 27, 436–440 (2020)
Luo, Z., Wang, Z., Huang, Y., Wang, L., Tan, T., Zhou, E.: Rethinking the heatmap regression for bottom-up human pose estimation. In: CVPR (2021)
Geng, Z., Sun, K., Xiao, B., Zhang, Z., Wang, J.: Bottom-up human pose estimation via disentangled keypoint regression. In: CVPR (2021)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Funding
This research was funded by the Natural Science Foundation of Heilongjiang Province of China (F201310).
Author information
Authors and Affiliations
Contributions
Xue Ping and Deng Shixiong prepared the manuscript text. Xue Ping prepared Table 2 and Figs. 3 and 4 through ablation experiment. Deng Shixiong collected the dataset and participated in the experiment and prepared Figs. 1 and 2, Table 1 and Algorithm 1.
Corresponding author
Ethics declarations
Competing Interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethical Approval
This work did not require ethical approval under the research governance guidelines operating at the time of the research.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xue, P., Deng, S. Smoothness-based consistency learning for macaque pose estimation. SIViP 17, 4327–4335 (2023). https://doi.org/10.1007/s11760-023-02665-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11760-023-02665-1