Soccer Video Event Detection Based on Deep Learning

Yu, Junqing; Lei, Aiping; Hu, Yangliu

doi:10.1007/978-3-030-05716-9_31


Conference paper
First Online: 11 December 2018


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Abstract

Automatically identifying the most interesting content in a long video remains a challenging task. Event detection is an important aspect of soccer game analysis. In this paper, we propose a model that detects events in long soccer games in a single pass through the video. By combining event detection with replay detection, we generate story clips that capture more complete temporal context and thus better meet audiences' needs. We also introduce a soccer dataset of 222 broadcast soccer videos totaling 170 hours. The dataset provides three types of annotation: (1) shot annotations (type and boundary), (2) event annotations (11 event labels), and (3) story annotations (15 story labels). Finally, we report the performance of the proposed model on soccer event detection and story analysis.
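The idea of combining event detection with replay detection to form story clips can be illustrated with a minimal sketch. The abstract does not state the actual merging rule, so this example assumes a simple hypothetical heuristic: a detected event is extended to the end of the first replay that begins no more than `max_gap` seconds after the event ends. The names `Segment`, `build_story_clip`, and `build_story_clips`, as well as the 30-second default for `max_gap`, are illustrative assumptions and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical time interval (in seconds) within a broadcast video."""
    start: float
    end: float
    label: str = ""


def build_story_clip(event: Segment, replays: List[Segment],
                     max_gap: float = 30.0) -> Segment:
    """Extend one detected event with the first replay that follows it.

    Assumed heuristic (not from the paper): a replay belongs to the event's
    story if it starts at or after the event ends and no more than
    `max_gap` seconds later.
    """
    following = [r for r in replays
                 if event.end <= r.start <= event.end + max_gap]
    if not following:
        # No nearby replay: the story clip is just the event itself.
        return event
    first_replay = min(following, key=lambda r: r.start)
    # The story clip runs from the event start to the end of that replay.
    return Segment(event.start, first_replay.end, event.label)


def build_story_clips(events: List[Segment], replays: List[Segment],
                      max_gap: float = 30.0) -> List[Segment]:
    """Build one story clip per detected event, in temporal order."""
    ordered = sorted(events, key=lambda e: e.start)
    return [build_story_clip(e, replays, max_gap) for e in ordered]


if __name__ == "__main__":
    events = [Segment(600.0, 605.0, "foul"), Segment(120.0, 128.0, "goal")]
    replays = [Segment(135.0, 160.0, "replay"), Segment(900.0, 910.0, "replay")]
    for clip in build_story_clips(events, replays):
        print(clip)
    # prints Segment(start=120.0, end=160.0, label='goal')
    # then   Segment(start=600.0, end=605.0, label='foul')
```

Here a goal detected at 120 to 128 s, followed by a replay at 135 to 160 s, becomes a story clip spanning 120 to 160 s, while the foul has no replay within the gap and is returned unchanged. A learned model would likely replace the fixed gap threshold; the sketch only shows how replay segments can supply the extra temporal context.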



Acknowledgments

We gratefully acknowledge financial support from the National Natural Science Foundation of China (Grant Nos. 61572211, 61173114, and 61202300).

Author information

Authors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Junqing Yu, Aiping Lei & Yangliu Hu
Center of Network and Computation, Huazhong University of Science and Technology, Wuhan, 430074, China
Junqing Yu


Corresponding author

Correspondence to Junqing Yu.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis


About this paper

Cite this paper

Yu, J., Lei, A., Hu, Y. (2019). Soccer Video Event Detection Based on Deep Learning. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_31


DOI: https://doi.org/10.1007/978-3-030-05716-9_31
Published: 11 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)
