Unsupervised open-world human action recognition

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

Open-world recognition (OWR) is an important field of research that strives to develop machine learning models capable of identifying and learning new classes as they appear. Concurrently, human action recognition (HAR) has received increasing attention from the research community. We approach open-world HAR in the unsupervised setting, in which class labels are available for the initial classes but not for new ones. Hence, we propose a clustering method that labels unknown classes automatically for incremental learning (IL). Our framework consists of four phases: an initial learning phase for initializing the models, an open-set recognition phase for identifying unknown classes, an automatic clustering phase for estimating the number of clusters and generating labels, and an IL phase for incorporating new knowledge. The framework was evaluated at each phase separately in eleven experimental settings of the UCF-101 dataset. We also present sensitivity studies of the main parameters and a visual analysis of misclassified videos, which reveals interesting visual similarities between overlapping classes. The experiments show promising results in all phases of open-world HAR even without labels for new classes, a setting that closely resembles real-world problems.
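
To make the automatic clustering phase more concrete, the following is a minimal sketch of how the number of new classes might be estimated and pseudo-labels generated for samples rejected as unknown. It is not the authors' implementation: the abstract does not name the clustering algorithm or the selection criterion, so plain k-means and the silhouette coefficient [88] are used here purely as stand-ins. The function names (`kmeans`, `silhouette`, `pseudo_label`), the parameters `n_known` and `k_max`, and the synthetic features are all hypothetical.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def kmeans(x, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation
    (an assumption for this sketch, chosen for reproducibility)."""
    centers = [x[0]]
    for _ in range(1, k):
        d = pairwise_dist(x, np.array(centers)).min(axis=1)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = pairwise_dist(x, centers).argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def silhouette(x, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    d = pairwise_dist(x, x)
    ids = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean intra-cluster distance
        b = min(d[i, labels == c].mean() for c in ids if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return scores.mean()


def pseudo_label(unknown_feats, n_known, k_max=6):
    """Pick the k in [2, k_max] with the best silhouette and return
    (k, labels), with labels offset so they do not clash with known classes."""
    best_k, best_s, best_labels = None, -1.0, None
    for k in range(2, k_max + 1):
        labels = kmeans(unknown_feats, k)
        s = silhouette(unknown_feats, labels)
        if s > best_s:
            best_k, best_s, best_labels = k, s, labels
    return best_k, best_labels + n_known


# Hypothetical usage: synthetic 2-D "features" for rejected samples.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(c, 0.5, size=(20, 2))
                   for c in ([0, 0], [10, 0], [0, 10])])
k, new_labels = pseudo_label(feats, n_known=5)
print(k, sorted(set(new_labels.tolist())))
```

The generated labels could then be passed to an incremental learner as if they were ground truth. On well-separated synthetic blobs like these the sketch should recover the true number of groups; real video embeddings are far noisier, so the estimate there would depend heavily on the feature extractor and on the criterion chosen.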

Data availability

The UCF-101 dataset [86] that supports the findings of this study is available from the University of Central Florida (UCF) website: https://www.crcv.ucf.edu/data/UCF101.php.

Notes

  1. https://github.com/matheusgutoski/unsupervised-openworld-video-classification.

References

  1. Bendale A, Boult T (2015) Towards open world recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 1893–1902

  2. Willes J, Harrison J, Harakeh A, Finn C, Pavone M, Waslander S (2022) Bayesian embeddings for few-shot open world recognition. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2022.3201541

  3. Mundt M, Hong Y, Pliushch I, Ramesh V (2023) A wholistic view of continual learning with deep neural networks: forgotten lessons and the bridge to active and open world learning. Neural Netw 160:306–336

  4. Joseph K, Khan S, Khan FS, Balasubramanian VN (2021) Towards open world object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 5830–5840

  5. Jafarzadeh M, Dhamija AR, Cruz S, Li C, Ahmad T, Boult TE (2020) Open-world learning without labels. arXiv preprint arXiv:2011.12906

  6. Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the 30th IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 4724–4733

  7. Xie S, Sun C, Huang J, Tu Z, Murphy K (2018) Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 305–321

  8. Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 6450–6459

  9. Gutoski M, Lazzaretti AE, Lopes HS (2021) Deep metric learning for open-set human action recognition in videos. Neural Comput Appl 33:1207–1220

  10. Gutoski M, Lazzaretti AE, Lopes HS (2021) Incremental human action recognition with dual memory. Image Vis Comput 116:1–15

  11. Rudd EM, Jain LP, Scheirer WJ, Boult TE (2018) The extreme value machine. IEEE Trans Pattern Anal Mach Intell 40(3):762–768

  12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 770–778

  13. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems, vol 1. Curran Associates, Red Hook, pp 1097–1105

  14. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 1–9

  15. Wu CY, Zaheer M, Hu H, Manmatha R, Smola AJ, Krähenbühl P (2018) Compressed video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 6026–6035

  16. Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 2625–2634

  17. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the advances in neural information processing systems. MIT Press, Cambridge, pp 568–576

  18. Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2016) Temporal segment networks: Towards good practices for deep action recognition. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 20–36

  19. Zhu Y, Lan Z, Newsam S, Hauptmann A (2018) Hidden two-stream convolutional networks for action recognition. In: Proceedings of the Asian conference on computer vision. Springer, Heidelberg, pp 363–378

  20. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV). IEEE Press, Piscataway, pp 4489–4497

  21. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 7794–7803

  22. Wang Y, Zhou W, Zhang Q, Zhu X, Li H (2018) Low-latency human action recognition with weighted multi-region convolutional neural network. arXiv preprint arXiv:1805.02877

  23. Ng JYH, Choi J, Neumann J, Davis LS (2018) Actionflownet: learning motion representation for action recognition. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV). IEEE Press, Piscataway, pp 1616–1624

  24. Wang L, Li W, Li W, van Gool L (2018) Appearance-and-relation networks for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 1430–1439

  25. Hara K, Kataoka H, Satoh Y (2018) Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 6546–6555

  26. Chen Y, Kalantidis Y, Li J, Yan S, Feng J (2018) Multi-fiber networks for video recognition. In: Proceedings of the European conference on computer vision (ECCV). Springer, Switzerland, pp 352–367

  27. Gao M, Cai W, Liu R (2021) AGTH-Net: attention-based graph convolution-guided third-order hourglass network for sports video classification. J Healthc Eng 2021:1–10

  28. Jing L, Parag T, Wu Z, Tian Y, Wang H (2021) Videossl: semi-supervised learning for video classification. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. IEEE Press, Piscataway, pp 1110–1119

  29. Cao K, Ji J, Cao Z, Chang CY, Niebles JC (2020) Few-shot video classification via temporal alignment. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 10618–10627

  30. Fu H, Maraghi VO, Faez K (2022) Class-incremental learning on video-based action recognition by distillation of various knowledge. Comput Intell Neurosci 2022:4879942

  31. Busto PP, Iqbal A, Gall J (2020) Open set domain adaptation for image and action recognition. IEEE Trans Pattern Anal Mach Intell 42(2):1–15

  32. Roitberg A, Al-Halah Z, Stiefelhagen R (2018) Informed democracy: voting-based novelty detection for action recognition. In: Proceedings of the British machine vision conference. BMVA, Durham, pp 1–14

  33. Roitberg A, Ma C, Haurilet M, Stiefelhagen R (2020) Open set driver activity recognition. In: 2020 IEEE intelligent vehicles symposium (IV). IEEE Press, Piscataway, pp 1048–1053

  34. Yang Y, Hou C, Lang Y, Guan D, Huang D, Xu J (2019) Open-set human activity recognition based on micro-Doppler signatures. Pattern Recogn 85:60–69

  35. Al-Obaydy WNI, Suandi SA (2020) Automatic pose normalization for open-set single-sample face recognition in video surveillance. Multimed Tools Appl 79(3):2897–2915

  36. Chen Z, Luo Y, Baktashmotlagh M (2021) Conditional extreme value theory for open set video domain adaptation. In: ACM multimedia Asia. Association for Computing Machinery, New York, pp 1–8

  37. Wang Y, Song X, Wang Y, Xu P, Hu R, Chai H (2021) Dual metric discriminator for open set video domain adaptation. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE Press, Piscataway, pp 8198–8202

  38. Bao W, Yu Q, Kong Y (2021) Evidential deep learning for open set action recognition. In: Proceedings of the IEEE/CVF international conference on computer vision. IEEE Press, Piscataway, pp 13349–13358

  39. French RM (1999) Catastrophic forgetting in connectionist networks. Trends Cogn Sci 3(4):128–135

  40. Masana M, Liu X, Twardowski B, Menta M, Bagdanov AD, van de Weijer J (2020) Class-incremental learning: survey and performance evaluation. arXiv preprint arXiv:2010.15277

  41. Delange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Trans Pattern Anal Mach Intell 1–26

  42. Pfülb B, Gepperth A (2019) A comprehensive, application-oriented study of catastrophic forgetting in DNNs. In: Proceedings of the international conference on learning representations. OpenReview.net, Amherst, pp 1–14

  43. Chaudhry A, Dokania PK, Ajanthan T, Torr PH (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 532–547

  44. Rebuffi SA, Kolesnikov A, Sperl G, Lampert CH (2017) iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 2001–2010

  45. Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 233–248

  46. Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, Fu Y (2019) Large scale incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 374–382

  47. Belouadah E, Popescu A (2019) Il2m: class incremental learning with dual memory. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV). IEEE Press, Piscataway, pp 583–592

  48. Hou S, Pan X, Loy CC, Wang Z, Lin D (2019) Learning a unified classifier incrementally via rebalancing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 831–839

  49. Kim Y, Kim E (2021) Clustering-guided incremental learning of tasks. In: International conference on information networking (ICOIN). IEEE Press, Piscataway, pp 417–421

  50. Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A et al (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci 114(13):3521–3526

  51. Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. In: Proceedings of the international conference on machine learning. PMLR, Sydney, pp 3987–3995

  52. Aljundi R, Babiloni F, Elhoseiny M, Rohrbach M, Tuytelaars T (2018) Memory aware synapses: learning what (not) to forget. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 139–154

  53. Li Z, Hoiem D (2017) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947

  54. Michieli U, Zanuttigh P (2021) Knowledge distillation for incremental learning in semantic segmentation. Comput Vis Image Underst 205:1–16

  55. Mallya A, Davis D, Lazebnik S (2018) Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 67–82

  56. Masana M, Tuytelaars T, van de Weijer J (2020) Ternary feature masks: continual learning without any forgetting. arXiv preprint arXiv:2001.08714

  57. Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671

  58. Schwarz J, Czarnecki W, Luketina J, Grabska-Barwinska A, Teh YW, Pascanu R, Hadsell R (2018) Progress & compress: a scalable framework for continual learning. In: Proceedings of the international conference on machine learning. PMLR, Stockholm, pp 4528–4537

  59. Aljundi R, Chakravarty P, Tuytelaars T (2017) Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 3366–3375

  60. Sokar G, Mocanu DC, Pechenizkiy M (2021) Spacenet: make free space for continual learning. Neurocomputing 439:1–11

  61. Ma J, Tao X, Ma J, Hong X, Gong Y (2021) Class incremental learning for video action classification. In: IEEE international conference on image processing (ICIP). IEEE Press, Piscataway, pp 504–508

  62. Wong SF, Kim TK, Cipolla R (2007) Learning motion categories using both semantic and structural information. In: Proceedings of the 2007 IEEE conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 1–6

  63. Blank M, Gorelick L, Shechtman E, Irani M, Basri R (2005) Actions as space-time shapes. In: IEEE international conference on computer vision, vol 1. IEEE Press, Piscataway, pp 1395–1402

  64. Reddy KK, Liu J, Shah M (2009) Incremental action recognition using feature-tree. In: Proceedings of the 12th IEEE international conference on computer vision. IEEE Press, Piscataway, pp 1010–1017

  65. Weinland D, Boyer E, Ronfard R (2007) Action recognition from arbitrary views using 3d exemplars. In: Proceedings of the 2007 IEEE international conference on computer vision. IEEE Press, Piscataway, pp 1–7

  66. Tang C, Li W, Wang P, Wang L (2018) Online human action recognition based on incremental learning of weighted covariance descriptors. Inf Sci 467:219–237

  67. Wu X, Jia Y, Liang W (2010) Incremental discriminant-analysis of canonical correlations for action recognition. Pattern Recogn 43(12):4190–4197

  68. Lu Y, Boukharouba K, Boonært J, Fleury A, Lecœuche S (2014) Application of an incremental SVM algorithm for on-line human recognition from video surveillance using texture and color features. Neurocomputing 126:132–140

  69. Minhas R, Mohammed AA, Wu QMJ (2012) Incremental learning in human action recognition based on snippets. IEEE Trans Circuits Syst Video Technol 22(11):1529–1541

  70. De Rosa R, Cesa-Bianchi N, Gori I, Cuzzolin F (2014) Online action recognition via nonparametric incremental learning. In: Proceedings of the British machine vision conference. BMVA Press, Guildford, pp 1–15

  71. Boult TE, Cruz S, Dhamija AR, Gunther M, Henrydoss J, Scheirer WJ (2019) Learning and the unknown: surveying steps toward open world recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 9801–9807

  72. Li X, Wu A, Zheng WS (2018) Adversarial open-world person re-identification. In: Proceedings of the European conference on computer vision (ECCV). Springer, Switzerland, pp 280–296

  73. Matta A, Pinto JR, Cardoso JS (2021) Mixture-based open world face recognition. In: World conference on information systems and technologies. Springer, Switzerland, pp 653–662

  74. Leng Q, Ye M, Tian Q (2020) A survey of open-world person re-identification. IEEE Trans Circuits Syst Video Technol 30(4):1092–1108

  75. Mancini M, Karaoguz H, Ricci E, Jensfelt P, Caputo B (2019) Knowledge is never enough: towards web aided deep open world recognition. In: IEEE international conference on robotics and automation (ICRA). IEEE Press, Piscataway, pp 9537–9543

  76. Cen J, Yun P, Cai J, Wang MY, Liu M (2021) Deep metric learning for open world semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV). IEEE Press, Piscataway, pp 15333–15342

  77. Irfan B, Ortiz MG, Lyubova N, Belpaeme T (2021) Multi-modal open world user identification. ACM Trans Hum Robot Interact (THRI) 11(1):1–50

  78. Mancini M, Naeem MF, Xian Y, Akata Z (2021) Open world compositional zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 5222–5230

  79. Zhong Z, Zhu L, Luo Z, Li S, Yang Y, Sebe N (2021) Openmix: reviving known knowledge for discovering novel visual categories in an open world. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 9457–9465

  80. Liu Z, Miao Z, Zhan X, Wang J, Gong B, Yu SX (2019) Large-scale long-tailed recognition in an open world. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 2537–2546

  81. Jafarzadeh M, Ahmad T, Dhamija AR, Li C, Cruz S, Boult TE (2021) Automatic open-world reliability assessment. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. IEEE Press, Piscataway, pp 1984–1993

  82. Shu Y, Shi Y, Wang Y, Zou Y, Yuan Q, Tian Y (2018) ODN: opening the deep network for open-set action recognition. In: Proceedings of the IEEE international conference on multimedia and expo (ICME). IEEE Press, Piscataway, pp 1–6

  83. Shu Y, Shi Y, Wang Y, Huang T, Tian Y (2020) P-odn: prototype-based open deep network for open set recognition. Sci Rep 10:1–13

  84. Hoffer E, Ailon N (2015) Deep metric learning using triplet network. In: Proceedings of the international workshop on similarity-based pattern recognition. Springer, Heidelberg, pp 84–92

  85. Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244

  86. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402

  87. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th international conference on artificial intelligence and statistics. Microtome Publishing, Brookline, pp 249–256

  88. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

  89. Youden WJ (1950) Index for rating diagnostic tests. Cancer 3(1):32–35

  90. Strehl A, Ghosh J (2002) Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  91. Min E, Guo X, Liu Q, Zhang G, Cui J, Long J (2018) A survey of clustering with deep learning: from the perspective of network architecture. IEEE Access 6:39501–39514

  92. Sarfraz S, Sharma V, Stiefelhagen R (2019) Efficient parameter-free clustering using first neighbor relations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 8934–8943

  93. Pelleg D, Moore AW et al (2000) X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, vol 1. PMLR, San Francisco, pp 727–734

  94. Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, USA, pp 1027–1035

  95. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605

Acknowledgements

Author M. Gutoski would like to thank CNPq for scholarship number 141983/2018-3. Author H. S. Lopes would like to thank CNPq for research grant 311785/2019-0 and Fundação Araucária for grant PRONEX 042/2018. Author A. E. Lazzaretti would like to thank CNPq for research grant 306569/2022-1. All authors would like to thank NVIDIA Corp. for the donation of the Titan-Xp GPUs used in the experiments.

Author information

Corresponding author

Correspondence to André Eugenio Lazzaretti.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to declare to the best of their knowledge.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gutoski, M., Lazzaretti, A.E. & Lopes, H.S. Unsupervised open-world human action recognition. Pattern Anal Applic 26, 1753–1770 (2023). https://doi.org/10.1007/s10044-023-01202-7

  • DOI: https://doi.org/10.1007/s10044-023-01202-7
