Abstract
Automatically identifying the most interesting content in a long video remains a challenging task, and event detection is an important aspect of soccer game research. In this paper, we propose a model that detects events in long soccer games in a single pass through the video. By combining event detection with replay detection, we generate story clips that carry more complete temporal context and so better meet audiences’ needs. We also introduce a soccer game dataset of 222 broadcast soccer videos, totaling 170 hours of video. The dataset provides three annotation types: (1) shot annotations (type and boundary), (2) event annotations (with 11 event labels), and (3) story annotations (with 15 story labels). Finally, we report the performance of the proposed model on soccer event and story analysis.
Acknowledgments
We gratefully acknowledge the financial support of the National Natural Science Foundation of China (Nos. 61572211, 61173114, 61202300).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, J., Lei, A., Hu, Y. (2019). Soccer Video Event Detection Based on Deep Learning. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_31
DOI: https://doi.org/10.1007/978-3-030-05716-9_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)