
Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

  • Conference paper

Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13695)

Abstract

We present a semi-supervised learning approach to the temporal action segmentation task. The goal of the task is to temporally detect and segment actions in long, untrimmed procedural videos, where only a small set of videos are densely labelled and a large collection of videos are unlabelled. To this end, we propose two novel loss functions for the unlabelled data: an action affinity loss and an action continuity loss. The action affinity loss guides learning on the unlabelled samples by imposing the action priors induced from the labelled set. The action continuity loss enforces the temporal continuity of actions, which also provides frame-wise classification supervision. In addition, we propose an Adaptive Boundary Smoothing (ABS) approach to build coarser action boundaries for more robust and reliable learning. The proposed loss functions and ABS were evaluated on three benchmarks. Results show that they significantly improve action segmentation performance with a low amount (5% and 10%) of labelled data and achieve results comparable to full supervision with 50% labelled data. Furthermore, ABS boosts performance when integrated into fully-supervised learning.
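The two unlabelled-data losses can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of KL divergence for the affinity term, the squared frame-to-frame difference for the continuity term, and the `prior` vector estimated from the labelled set are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def action_affinity_loss(logits, prior, eps=1e-8):
    """Push the video-level action distribution of an unlabelled video
    toward an action prior estimated from the labelled set.

    logits: (T, C) frame-wise class scores for one unlabelled video
    prior:  (C,) action-frequency distribution from labelled videos
    Returns KL(prior || predicted video-level distribution).
    """
    probs = softmax(logits, axis=1)      # (T, C) frame-wise probabilities
    video_dist = probs.mean(axis=0)      # predicted action frequencies
    return float(np.sum(prior * np.log((prior + eps) / (video_dist + eps))))

def action_continuity_loss(logits):
    """Penalise abrupt frame-to-frame changes in the predictions,
    encouraging temporally piecewise-constant action labels."""
    log_probs = np.log(softmax(logits, axis=1))
    diff = np.diff(log_probs, axis=0)    # (T-1, C) temporal differences
    return float(np.mean(diff ** 2))
```

Under this reading, a constant prediction whose pooled distribution matches the labelled-set prior incurs zero loss from both terms, while noisy, rapidly switching frame predictions are penalised by the continuity term.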



Acknowledgements

This research is supported by the National Research Foundation, Singapore under its NRF Fellowship for AI (NRF-NRFFAI1-2019-0001). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

Author information

Corresponding author

Correspondence to Angela Yao.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ding, G., Yao, A. (2022). Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13695. Springer, Cham. https://doi.org/10.1007/978-3-031-19833-5_2

  • DOI: https://doi.org/10.1007/978-3-031-19833-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19832-8

  • Online ISBN: 978-3-031-19833-5

  • eBook Packages: Computer Science, Computer Science (R0)
