Predicting humans future motion trajectories in video streams using generative adversarial network

Hassan, Muhammad Ahmed; Khan, Muhammad Usman Ghani; Iqbal, Razi; Riaz, Omer; Bashir, Ali Kashif; Tariq, Usman

doi:10.1007/s11042-021-11457-z

1158T: Role of Computer Vision in Smart Cities: Applications and Research Challenges
Published: 13 September 2021

Volume 83, pages 15289–15311, (2024)

Multimedia Tools and Applications

Muhammad Ahmed Hassan¹,²,
Muhammad Usman Ghani Khan¹,²,
Razi Iqbal³ (ORCID: orcid.org/0000-0003-0513-3665),
Omer Riaz⁴,
Ali Kashif Bashir⁵ &
Usman Tariq⁶

Abstract

Understanding human motion in social environments is important for various domains of a smart city, e.g., smart transportation, automatic navigation of service robots, efficient navigation of autonomous cars, and surveillance systems. Examining past trajectories or environmental factors alone is not enough to address this problem. We propose a novel methodology to predict the future motion trajectories of humans based on the past attitude of individuals, crowd attitude, and environmental context. Many researchers have proposed techniques based on different feature extraction and feature fusion methods to predict future motion trajectories. They used traditional machine learning algorithms such as SVM, social forces, probabilistic models, and LSTM to analyze the heuristic motion trajectories, but they did not consider other environmental factors that can affect human motion trajectories, e.g., the relative positions of other humans and of objects present in the environment. We address this by employing Long Short-Term Memory (LSTM) units to analyze motion histories, convolutional neural networks to capture environmental factors (e.g., human-human interaction, human-object interaction, and the relative positioning of 80 different objects, including pedestrians), and generative adversarial networks (GANs) to predict possible future motion paths. Our proposed method achieved 70% lower Average Displacement Error (ADE) and 41% lower Final Displacement Error (FDE) in comparison to other state-of-the-art techniques.
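
The abstract reports results as ADE and FDE but does not define them. As a minimal sketch, assuming the definitions commonly used in trajectory prediction (ADE is the mean Euclidean distance between predicted and true positions over all predicted time steps; FDE is that distance at the final step only), the two metrics and a "percent lower" comparison could be computed as follows. The function names and the sample coordinates are hypothetical and made up for illustration; they are not taken from the paper, whose exact evaluation protocol is not shown here.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as a list of (x, y) points."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and true position at each time step
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)  # average over all predicted steps
    fde = distances[-1]                    # error at the final predicted step
    return ade, fde


def relative_reduction(baseline, ours):
    """Fraction by which `ours` is lower than `baseline` (0.75 means 75% lower)."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - ours) / baseline


# Illustrative, made-up trajectories (not data from the paper)
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(f"ADE={ade:.2f}, FDE={fde:.2f}")
```

Under this reading, a claim such as "70% lower ADE" corresponds to `relative_reduction(baseline_ade, proposed_ade)` being about 0.70 for the baseline being compared against.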

Acknowledgments

Financial support for this study was provided by a grant from the National Center For Artificial Intelligence at University of Engineering and Technology, Lahore, Pakistan. The authors wish to thank Al-Khawarizmi Institute of Computer Science, UET Lahore for providing a research platform and technical support.

Author information

Authors and Affiliations

Department of Computer Science, UET, Lahore, Pakistan
Muhammad Ahmed Hassan & Muhammad Usman Ghani Khan
National Centre of Artificial Intelligence, KICS, UET, Lahore, Pakistan
Muhammad Ahmed Hassan & Muhammad Usman Ghani Khan
Al-Khawarizmi Institute of Computer Science, UET, Lahore, Pakistan
Razi Iqbal
Islamia University, Bahawalpur, Pakistan
Omer Riaz
Manchester Metropolitan University, Manchester, UK
Ali Kashif Bashir
Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
Usman Tariq

Corresponding author

Correspondence to Muhammad Ahmed Hassan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Financial support for this study was provided by a grant from the National Center For Artificial Intelligence at University of Engineering and Technology, Lahore, Pakistan.

About this article

Cite this article

Hassan, M.A., Khan, M.U.G., Iqbal, R. et al. Predicting humans future motion trajectories in video streams using generative adversarial network. Multimed Tools Appl 83, 15289–15311 (2024). https://doi.org/10.1007/s11042-021-11457-z

Received: 27 February 2020
Revised: 18 January 2021
Accepted: 19 August 2021
Published: 13 September 2021
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-021-11457-z
