
Clustering Driven Deep Autoencoder for Video Anomaly Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12360)

Abstract

Because of the ambiguous definition of anomaly and the complexity of real data, video anomaly detection is one of the most challenging problems in intelligent video surveillance. Since abnormal events usually differ from normal events in appearance and/or motion, we address this problem by designing a novel convolutional autoencoder architecture that captures informative spatial and temporal representations separately. The spatial part reconstructs the last individual frame (LIF), while the temporal part takes consecutive frames as input and produces their RGB difference as output, simulating the generation of optical flow. Abnormal events, which are irregular in appearance or motion, therefore lead to large reconstruction errors. In addition, we design a deep k-means cluster that forces the appearance and motion encoders to extract common factors of variation within the dataset. Experiments on several publicly available datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
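To make the pipeline described above more concrete, here is a minimal, hypothetical sketch in plain Python (no deep learning framework, no trained networks) of three pieces the abstract mentions: the RGB-difference target for the temporal branch, a k-means penalty on encoder embeddings, and a frame-level anomaly score built from the appearance and motion reconstruction errors. All function names, the soft-assignment temperature `alpha`, the weights `w_app` and `w_mot`, and the per-video min-max normalization are illustrative assumptions, not the authors' implementation; the abstract does not give the exact losses. In the real method the predictions would come from the trained spatial and temporal autoencoders.

```python
import math
from typing import List, Sequence

# Hypothetical toy representation: a frame is H x W x C nested lists of floats.
Frame = List[List[List[float]]]


def flatten(x) -> List[float]:
    """Recursively flatten nested lists of numbers into one flat list."""
    if isinstance(x, (int, float)):
        return [float(x)]
    return [v for item in x for v in flatten(item)]


def mse(pred, target) -> float:
    """Mean squared error over all elements (same shape, non-empty assumed)."""
    p, t = flatten(pred), flatten(target)
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def rgb_differences(clip: Sequence[Frame]) -> List[Frame]:
    """Temporal-branch target: frame[t+1] - frame[t] for each consecutive
    pair, a cheap stand-in for optical flow."""
    return [
        [[[b - a for a, b in zip(pa, pb)] for pa, pb in zip(ra, rb)]
         for ra, rb in zip(fa, fb)]
        for fa, fb in zip(clip[:-1], clip[1:])
    ]


def soft_kmeans_loss(embeddings: Sequence[Sequence[float]],
                     centers: Sequence[Sequence[float]],
                     alpha: float = 10.0) -> float:
    """Assumed k-means penalty: squared distances from each embedding to the
    cluster centers, weighted by a softmax over -alpha * distance and averaged
    over embeddings. A large alpha approaches hard k-means assignment."""
    total = 0.0
    for z in embeddings:
        d = [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in centers]
        m = min(d)
        w = [math.exp(-alpha * (dk - m)) for dk in d]  # shifted for stability
        s = sum(w)
        total += sum(wk / s * dk for wk, dk in zip(w, d))
    return total / len(embeddings)


def anomaly_score(lif_pred, lif_true, diff_pred, diff_true,
                  w_app: float = 1.0, w_mot: float = 1.0) -> float:
    """Hypothetical frame score: weighted sum of the appearance (LIF) and
    motion (RGB-difference) reconstruction errors."""
    return w_app * mse(lif_pred, lif_true) + w_mot * mse(diff_pred, diff_true)


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one video's frame scores to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    # Three 1 x 2 RGB frames; the first pixel's red channel changes once.
    clip = [
        [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]],
        [[[0.5, 0.0, 0.0], [1.0, 1.0, 1.0]]],
        [[[0.5, 0.0, 0.0], [1.0, 1.0, 1.0]]],
    ]
    diffs = rgb_differences(clip)
    print(diffs[0])  # [[[0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]]
    print(min_max_normalize([1.0, 3.0, 2.0]))  # [0.0, 1.0, 0.5]
```

The soft assignment keeps the penalty differentiable, which is why it is used here in place of hard nearest-center assignment; normalizing scores per video is one common way to compare frames within a clip before thresholding, and is again an assumption rather than a detail stated in the abstract.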

Keywords

Video anomaly detection, Spatio-temporal dissociation, Deep k-means cluster

Notes

Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (2042020KF0016 and CCNU20TS028). It was also supported by the Wuhan University-Huawei Company Project.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Wuhan University, Wuhan, China
  2. Central China Normal University, Wuhan, China
  3. State University of New York at Buffalo, Buffalo, USA
