Predicting humans future motion trajectories in video streams using generative adversarial network

  • 1158T: Role of Computer Vision in Smart Cities: Applications and Research Challenges
Multimedia Tools and Applications

Abstract

Understanding human motion in social environments is important for many smart-city domains, e.g., smart transportation, navigation of service robots, navigation of autonomous cars, and surveillance systems. Examining past trajectories or environmental factors alone is not enough to address this problem. We propose a novel methodology that predicts the future motion trajectories of humans from the past attitude of individuals, crowd attitude, and environmental context. Researchers have proposed various techniques based on different feature extraction and feature fusion schemes to predict future motion trajectories. They used traditional machine learning algorithms such as SVM, social forces, probabilistic models, and LSTM to analyze the heuristic motion trajectories, but they did not consider other environmental factors that can affect human motion, e.g., the relative positions of other humans and of objects present in the environment. We address this by employing Long Short-Term Memory (LSTM) units to analyze motion histories, convolutional neural networks to capture environmental factors, e.g., human-human interaction, human-object interaction, and the relative positioning of 80 different objects including pedestrians, and generative adversarial networks (GANs) to predict possible future motion paths. Our proposed method achieved 70% lower Average Displacement Error (ADE) and 41% lower Final Displacement Error (FDE) in comparison to other state-of-the-art techniques.
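The two metrics reported above can be illustrated with a minimal sketch. It assumes the definitions commonly used in trajectory prediction: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the final step. The function name `displacement_errors` and the sample coordinates are hypothetical; the evaluation protocol actually used in the paper (units, prediction horizon, averaging over pedestrians) is not shown in this excerpt.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(
    predicted: Sequence[Point], actual: Sequence[Point]
) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against ground truth.

    Hypothetical helper for illustration only, not the authors' code.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at each time step.
    distances = [
        math.hypot(p[0] - a[0], p[1] - a[1]) for p, a in zip(predicted, actual)
    ]
    ade = sum(distances) / len(distances)  # average over all time steps
    fde = distances[-1]                    # error at the final time step only
    return ade, fde


# Made-up trajectories: the prediction drifts away from the true path.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
actual = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, actual)
print(f"ADE={ade:.2f} FDE={fde:.2f}")
```

Under these definitions, a lower ADE means the predicted path stays closer to the true path on average, while a lower FDE means the predicted end point is closer to where the person actually ended up.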


Acknowledgments

Financial support for this study was provided by a grant from the National Center For Artificial Intelligence at the University of Engineering and Technology, Lahore, Pakistan. The authors wish to thank Al-Khawarizimi Institute of Computer Science, UET Lahore, for providing the research platform and technical support.

Author information

Corresponding author

Correspondence to Muhammad Ahmed Hassan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hassan, M.A., Khan, M.U.G., Iqbal, R. et al. Predicting humans future motion trajectories in video streams using generative adversarial network. Multimed Tools Appl 83, 15289–15311 (2024). https://doi.org/10.1007/s11042-021-11457-z

  • DOI: https://doi.org/10.1007/s11042-021-11457-z
