Surveillance video online prediction using multilayer ELM with object principal trajectory

  • Haiyang Yu
  • Jian Wang
  • Xiaoying Sun
Original Paper


Designing a valid machine learning model for online prediction of surveillance video is a challenging problem. To address it, a multilayer extreme learning machine (ELM) with object principal trajectory is proposed. In this scheme, both temporal and spatial characteristics are taken into consideration in order to support dynamic semantic representation between adjacent frames. After the coordinate distance is calculated by the K-means algorithm, the object regions can be separated at the pixel level. Then, the moving trend of each object is determined according to the principal trajectory of the area of interest. Finally, the multilayer ELM is adopted to quantify the new shape characteristics. This deep neural network helps generate a new frame sequence that is close to the real one. The proposed method not only recognizes multiple objects with different movement directions, but also effectively identifies subtle semantic features. The whole forecasting process avoids the trial and error caused by user intervention, which makes the model suitable for online environments. Numerical experiments are conducted on two different kinds of surveillance video datasets. The results show that the proposed algorithm performs better than other state-of-the-art methods.
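
As a rough illustration of the ELM building block mentioned above, the following minimal sketch trains a basic single-hidden-layer ELM: the hidden weights are drawn at random and left fixed, and only the output weights are solved in closed form by regularized least squares. This is not the authors' multilayer model or their video pipeline. The function names `elm_fit` and `elm_predict`, the tanh activation, the hidden size, the ridge term `reg`, and the toy regression data are all assumptions made for this example.

```python
import numpy as np

def elm_fit(X, T, n_hidden=64, reg=1e-3, seed=0):
    """Fit a basic single-hidden-layer ELM (illustrative, hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Output weights from ridge-regularized least squares:
    # beta = (H^T H + reg * I)^{-1} H^T T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy data (an assumption for this sketch, unrelated to the video datasets).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
T = np.sin(X[:, :1]) + 0.5 * X[:, 1:]

W, b, beta = elm_fit(X, T)
pred = elm_predict(X, W, b, beta)
mse = float(np.mean((pred - T) ** 2))
print("training MSE:", mse)
```

Because the only trained parameters come from a single linear solve, there is no iterative tuning loop, which is consistent with the abstract's point about avoiding trial and error in an online setting.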


Video sequence prediction; Multilayer ELM; Multiple objects; Principal trajectory



The work was supported by the National Key Research Project of China under Grant No. 2016YFB1001304, the National Natural Science Foundation of China under Grant 61572229, the JLUSTIRT High-level Innovation Team, and the Fundamental Research Funds for Central Universities under Grant No. 2017TD-19. The authors gratefully acknowledge financial support from the Research Centre for Intelligent Signal Identification and Equipment, Jilin Province.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. College of Communication Engineering, Jilin University, Changchun, China
  2. College of Computer Science and Technology, Jilin University, Changchun, China
  3. Department of Intelligent Vehicle, China Automotive Engineering Research Institute (CAERI), Chongqing, China
