Abstract
Open-world recognition (OWR) is an important field of research that strives to develop machine learning models capable of identifying and learning new classes as they appear. Concurrently, human action recognition (HAR) has received increasing attention from the research community. We approach Open-World HAR in the unsupervised setting. In unsupervised OWR, class labels are available for the initial classes but not for new ones. Hence, we propose a clustering method to label unknown classes automatically for incremental learning (IL). Our framework consists of an Initial Learning phase for initializing the models, an Open-Set Recognition phase for identifying unknown classes, an Automatic Clustering phase for estimating the number of clusters and generating labels, and an IL phase for incorporating new knowledge. The proposed framework is evaluated at each phase separately in eleven experimental settings of the UCF-101 dataset. We also present sensitivity studies of the main parameters and a visual analysis of misclassified videos, which reveals visual similarities between overlapping classes. The experiments show promising results in all phases of Open-World HAR, even without labels for the new classes, a setting that closely resembles real-world problems.
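The Automatic Clustering phase described above (estimating the number of clusters, then generating labels for the unknown classes) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or selection criterion is used, so the choices here are assumptions made only for illustration: k-means with farthest-point initialization, and the mean silhouette score to choose the number of clusters. All function names and parameters (`cluster_unknowns`, `num_known_classes`, `k_min`, `k_max`) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[c])  # keep the old center if a cluster empties
        if updated == centers:
            break
        centers = updated
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def cluster_unknowns(features, num_known_classes, k_min=2, k_max=5):
    """Pick k by mean silhouette and return (k, pseudo-labels).
    Pseudo-labels start at num_known_classes so they never collide
    with the labels of the initial classes. Needs at least 3 samples."""
    points = [tuple(f) for f in features]
    best_k, best_score, best_labels = None, -2.0, None
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, [num_known_classes + lab for lab in best_labels]
```

In a full pipeline, `features` would be embeddings of the samples rejected as unknown by the open-set recognition phase, and the returned pseudo-labels would feed the IL phase as if they were ground-truth labels for the new classes.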
Data availability
The UCF-101 dataset [86] that supports the findings of this study is available from the University of Central Florida (UCF) website: https://www.crcv.ucf.edu/data/UCF101.php.
References
Bendale A, Boult T (2015) Towards open world recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 1893–1902
Willes J, Harrison J, Harakeh A, Finn C, Pavone M, Waslander S (2022) Bayesian embeddings for few-shot open world recognition. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2022.3201541
Mundt M, Hong Y, Pliushch I, Ramesh V (2023) A wholistic view of continual learning with deep neural networks: forgotten lessons and the bridge to active and open world learning. Neural Netw 160:306–336
Joseph K, Khan S, Khan FS, Balasubramanian VN (2021) Towards open world object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 5830–5840
Jafarzadeh M, Dhamija AR, Cruz S, Li C, Ahmad T, Boult TE (2020) Open-world learning without labels. arXiv preprint arXiv:2011.12906
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the 30th IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 4724–4733
Xie S, Sun C, Huang J, Tu Z, Murphy K (2018) Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 305–321
Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 6450–6459
Gutoski M, Lazzaretti AE, Lopes HS (2021) Deep metric learning for open-set human action recognition in videos. Neural Comput Appl 33:1207–1220
Gutoski M, Lazzaretti AE, Lopes HS (2021) Incremental human action recognition with dual memory. Image Vis Comput 116:1–15
Rudd EM, Jain LP, Scheirer WJ, Boult TE (2018) The extreme value machine. IEEE Trans Pattern Anal Mach Intell 40(3):762–768
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 770–778
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems, vol 1. Curran Associates, Red Hook, pp 1097–1105
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 1–9
Wu CY, Zaheer M, Hu H, Manmatha R, Smola AJ, Krähenbühl P (2018) Compressed video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 6026–6035
Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 2625–2634
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the advances in neural information processing systems. MIT Press, Cambridge, pp 568–576
Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2016) Temporal segment networks: Towards good practices for deep action recognition. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 20–36
Zhu Y, Lan Z, Newsam S, Hauptmann A (2018) Hidden two-stream convolutional networks for action recognition. In: Proceedings of the Asian conference on computer vision. Springer, Heidelberg, pp 363–378
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV). IEEE Press, Piscataway, pp 4489–4497
Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 7794–7803
Wang Y, Zhou W, Zhang Q, Zhu X, Li H (2018) Low-latency human action recognition with weighted multi-region convolutional neural network. arXiv preprint arXiv:1805.02877
Ng JYH, Choi J, Neumann J, Davis LS (2018) Actionflownet: learning motion representation for action recognition. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV). IEEE Press, Piscataway, pp 1616–1624
Wang L, Li W, Li W, Van Gool L (2018) Appearance-and-relation networks for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 1430–1439
Hara K, Kataoka H, Satoh Y (2018) Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 6546–6555
Chen Y, Kalantidis Y, Li J, Yan S, Feng J (2018) Multi-fiber networks for video recognition. In: Proceedings of the European conference on computer vision (ECCV). Springer, Switzerland, pp 352–367
Gao M, Cai W, Liu R (2021) AGTH-Net: attention-based graph convolution-guided third-order hourglass network for sports video classification. J Healthc Eng 2021:1–10
Jing L, Parag T, Wu Z, Tian Y, Wang H (2021) Videossl: semi-supervised learning for video classification. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. IEEE Press, Piscataway, pp 1110–1119
Cao K, Ji J, Cao Z, Chang CY, Niebles JC (2020) Few-shot video classification via temporal alignment. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 10618–10627
Fu H, Maraghi VO, Faez K (2022) Class-incremental learning on video-based action recognition by distillation of various knowledge. Comput Intell Neurosci 2022:4879942
Busto PP, Iqbal A, Gall J (2020) Open set domain adaptation for image and action recognition. IEEE Trans Pattern Anal Mach Intell 42(2):1–15
Roitberg A, Al-Halah Z, Stiefelhagen R (2018) Informed democracy: voting-based novelty detection for action recognition. In: Proceedings of the British machine vision conference. BMVA, Durham, pp 1–14
Roitberg A, Ma C, Haurilet M, Stiefelhagen R (2020) Open set driver activity recognition. In: 2020 IEEE intelligent vehicles symposium (IV). IEEE Press, Piscataway, pp 1048–1053
Yang Y, Hou C, Lang Y, Guan D, Huang D, Xu J (2019) Open-set human activity recognition based on micro-Doppler signatures. Pattern Recogn 85:60–69
Al-Obaydy WNI, Suandi SA (2020) Automatic pose normalization for open-set single-sample face recognition in video surveillance. Multimed Tools Appl 79(3):2897–2915
Chen Z, Luo Y, Baktashmotlagh M (2021) Conditional extreme value theory for open set video domain adaptation. In: ACM multimedia Asia. Association for Computing Machinery, New York, pp 1–8
Wang Y, Song X, Wang Y, Xu P, Hu R, Chai H (2021) Dual metric discriminator for open set video domain adaptation. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE Press, Piscataway, pp 8198–8202
Bao W, Yu Q, Kong Y (2021) Evidential deep learning for open set action recognition. In: Proceedings of the IEEE/CVF international conference on computer vision. IEEE Press, Piscataway, pp 13349–13358
French RM (1999) Catastrophic forgetting in connectionist networks. Trends Cogn Sci 3(4):128–135
Masana M, Liu X, Twardowski B, Menta M, Bagdanov AD, van de Weijer J (2020) Class-incremental learning: survey and performance evaluation. arXiv preprint arXiv:2010.15277
Delange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Trans Pattern Anal Mach Intell 1–26
Pfülb B, Gepperth A (2019) A comprehensive, application-oriented study of catastrophic forgetting in DNNs. In: Proceedings of the international conference on learning representations. OpenReview.net, Amherst, pp 1–14
Chaudhry A, Dokania PK, Ajanthan T, Torr PH (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 532–547
Rebuffi SA, Kolesnikov A, Sperl G, Lampert CH (2017) iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 2001–2010
Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 233–248
Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, Fu Y (2019) Large scale incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 374–382
Belouadah E, Popescu A (2019) Il2m: class incremental learning with dual memory. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV). IEEE Press, Piscataway, pp 583–592
Hou S, Pan X, Loy CC, Wang Z, Lin D (2019) Learning a unified classifier incrementally via rebalancing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 831–839
Kim Y, Kim E (2021) Clustering-guided incremental learning of tasks. In: International conference on information networking (ICOIN). IEEE Press, Piscataway, pp 417–421
Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A et al (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci 114(13):3521–3526
Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. In: Proceedings of the international conference on machine learning. PMLR, Sydney, pp 3987–3995
Aljundi R, Babiloni F, Elhoseiny M, Rohrbach M, Tuytelaars T (2018) Memory aware synapses: learning what (not) to forget. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 139–154
Li Z, Hoiem D (2017) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947
Michieli U, Zanuttigh P (2021) Knowledge distillation for incremental learning in semantic segmentation. Comput Vis Image Underst 205:1–16
Mallya A, Davis D, Lazebnik S (2018) Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Proceedings of the European conference on computer vision (ECCV). Springer, Heidelberg, pp 67–82
Masana M, Tuytelaars T, van de Weijer J (2020) Ternary feature masks: continual learning without any forgetting. arXiv preprint arXiv:2001.08714
Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671
Schwarz J, Czarnecki W, Luketina J, Grabska-Barwinska A, Teh YW, Pascanu R, Hadsell R (2018) Progress & compress: a scalable framework for continual learning. In: Proceedings of the international conference on machine learning. PMLR, Stockholm, pp 4528–4537
Aljundi R, Chakravarty P, Tuytelaars T (2017) Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 3366–3375
Sokar G, Mocanu DC, Pechenizkiy M (2021) Spacenet: make free space for continual learning. Neurocomputing 439:1–11
Ma J, Tao X, Ma J, Hong X, Gong Y (2021) Class incremental learning for video action classification. In: IEEE international conference on image processing (ICIP). IEEE Press, Piscataway, pp 504–508
Wong SF, Kim TK, Cipolla R (2007) Learning motion categories using both semantic and structural information. In: Proceedings of the 2007 IEEE conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 1–6
Blank M, Gorelick L, Shechtman E, Irani M, Basri R (2005) Actions as space-time shapes. In: IEEE international conference on computer vision, vol 1. IEEE Press, Piscataway, pp 1395–1402
Reddy KK, Liu J, Shah M (2009) Incremental action recognition using feature-tree. In: Proceedings of the 12th IEEE international conference on computer vision. IEEE Press, Piscataway, pp 1010–1017
Weinland D, Boyer E, Ronfard R (2007) Action recognition from arbitrary views using 3d exemplars. In: Proceedings of the 2007 IEEE international conference on computer vision. IEEE Press, Piscataway, pp 1–7
Tang C, Li W, Wang P, Wang L (2018) Online human action recognition based on incremental learning of weighted covariance descriptors. Inf Sci 467:219–237
Wu X, Jia Y, Liang W (2010) Incremental discriminant-analysis of canonical correlations for action recognition. Pattern Recogn 43(12):4190–4197
Lu Y, Boukharouba K, Boonært J, Fleury A, Lecœuche S (2014) Application of an incremental SVM algorithm for on-line human recognition from video surveillance using texture and color features. Neurocomputing 126:132–140
Minhas R, Mohammed AA, Wu QMJ (2012) Incremental learning in human action recognition based on snippets. IEEE Trans Circuits Syst Video Technol 22(11):1529–1541
De Rosa R, Cesa-Bianchi N, Gori I, Cuzzolin F (2014) Online action recognition via nonparametric incremental learning. In: Proceedings of the British machine vision conference. BMVA Press, Guildford, pp 1–15
Boult TE, Cruz S, Dhamija AR, Gunther M, Henrydoss J, Scheirer WJ (2019) Learning and the unknown: surveying steps toward open world recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 9801–9807
Li X, Wu A, Zheng WS (2018) Adversarial open-world person re-identification. In: Proceedings of the European conference on computer vision (ECCV). Springer, Switzerland, pp 280–296
Matta A, Pinto JR, Cardoso JS (2021) Mixture-based open world face recognition. In: World conference on information systems and technologies. Springer, Switzerland, pp 653–662
Leng Q, Ye M, Tian Q (2020) A survey of open-world person re-identification. IEEE Trans Circuits Syst Video Technol 30(4):1092–1108
Mancini M, Karaoguz H, Ricci E, Jensfelt P, Caputo B (2019) Knowledge is never enough: towards web aided deep open world recognition. In: IEEE international conference on robotics and automation (ICRA). IEEE Press, Piscataway, pp 9537–9543
Cen J, Yun P, Cai J, Wang MY, Liu M (2021) Deep metric learning for open world semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV). IEEE Press, Piscataway, pp 15333–15342
Irfan B, Ortiz MG, Lyubova N, Belpaeme T (2021) Multi-modal open world user identification. ACM Trans Hum Robot Interact (THRI) 11(1):1–50
Mancini M, Naeem MF, Xian Y, Akata Z (2021) Open world compositional zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 5222–5230
Zhong Z, Zhu L, Luo Z, Li S, Yang Y, Sebe N (2021) Openmix: reviving known knowledge for discovering novel visual categories in an open world. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE Press, Piscataway, pp 9457–9465
Liu Z, Miao Z, Zhan X, Wang J, Gong B, Yu SX (2019) Large-scale long-tailed recognition in an open world. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 2537–2546
Jafarzadeh M, Ahmad T, Dhamija AR, Li C, Cruz S, Boult TE (2021) Automatic open-world reliability assessment. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. IEEE Press, Piscataway, pp 1984–1993
Shu Y, Shi Y, Wang Y, Zou Y, Yuan Q, Tian Y (2018) ODN: opening the deep network for open-set action recognition. In: Proceedings of the IEEE international conference on multimedia and expo (ICME). IEEE Press, Piscataway, pp 1–6
Shu Y, Shi Y, Wang Y, Huang T, Tian Y (2020) P-odn: prototype-based open deep network for open set recognition. Sci Rep 10:1–13
Hoffer E, Ailon N (2015) Deep metric learning using triplet network. In: Proceedings of the international workshop on similarity-based pattern recognition. Springer, Heidelberg, pp 84–92
Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th international conference on artificial intelligence and statistics. Microtome Publishing, Brookline, pp 249–256
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
Youden WJ (1950) Index for rating diagnostic tests. Cancer 3(1):32–35
Strehl A, Ghosh J (2002) Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Min E, Guo X, Liu Q, Zhang G, Cui J, Long J (2018) A survey of clustering with deep learning: from the perspective of network architecture. IEEE Access 6:39501–39514
Sarfraz S, Sharma V, Stiefelhagen R (2019) Efficient parameter-free clustering using first neighbor relations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE Press, Piscataway, pp 8934–8943
Pelleg D, Moore AW et al (2000) X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, vol 1. PMLR, San Francisco, pp 727–734
Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, USA, pp 1027–1035
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605
Acknowledgements
Author M. Gutoski would like to thank CNPq for the scholarship number 141983/2018-3. Author H. S. Lopes would like to thank CNPq for the research grant 311785/2019-0, and Fundação Araucária for grant PRONEX 042/2018. Author A. E. Lazzaretti would like to thank CNPq for the research grant 306569/2022-1. All authors would like to thank NVIDIA Corp. for the donation of the Titan-Xp GPUs used in the experiments.
Ethics declarations
Conflict of interest
The authors have no conflict of interest to declare to the best of their knowledge.
About this article
Cite this article
Gutoski, M., Lazzaretti, A.E. & Lopes, H.S. Unsupervised open-world human action recognition. Pattern Anal Applic 26, 1753–1770 (2023). https://doi.org/10.1007/s10044-023-01202-7
DOI: https://doi.org/10.1007/s10044-023-01202-7