Soccer Video Event Detection Based on Deep Learning

  • Junqing Yu
  • Aiping Lei
  • Yangliu Hu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)

Abstract

Automatically identifying the most interesting content in a long video remains a challenging task. Event detection is an important aspect of soccer game research. In this paper, we propose a model that detects events in long soccer games with a single pass through the video. By combining event detection with replay detection, we generate story clips that contain more complete temporal context and better meet audiences’ needs. We also introduce a soccer game dataset of 222 broadcast soccer videos, totaling 170 hours of video. The dataset covers three annotation types: (1) shot annotations (type and boundary), (2) event annotations (with 11 event labels), and (3) story annotations (with 15 story labels). Finally, we report the performance of the proposed model on soccer event detection and story analysis.
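The abstract does not say how detected events and detected replays are combined into story clips. As a rough illustration only, the sketch below assumes both are available as time intervals in seconds and attaches to each event the first unused replay that starts within a hypothetical `max_gap` window after the event ends, so the story clip spans from the event start to the replay end. `Segment`, `build_story_clips`, and `max_gap` are illustrative names, not the authors' code, and the pairing rule is an assumption; mapping clips to the dataset's story labels is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A labeled time interval in seconds (hypothetical representation)."""
    start: float
    end: float
    label: str = ""


def build_story_clips(events: List[Segment], replays: List[Segment],
                      max_gap: float = 30.0) -> List[Segment]:
    """Extend each event with the first unused replay that begins within
    `max_gap` seconds after the event ends.

    ASSUMPTION: this greedy pairing rule is illustrative only and is not
    the method described in the paper.
    """
    replays_sorted = sorted(replays, key=lambda r: r.start)
    used = set()  # indices of replays already attached to an earlier event
    clips: List[Segment] = []
    for ev in sorted(events, key=lambda e: e.start):
        match: Optional[int] = None
        for i, rp in enumerate(replays_sorted):
            if i in used:
                continue
            # A replay qualifies if it starts after the event ends,
            # but no later than max_gap seconds afterwards.
            if ev.end <= rp.start <= ev.end + max_gap:
                match = i
                break
        if match is None:
            # No replay found: the story clip is just the event itself.
            clips.append(Segment(ev.start, ev.end, ev.label))
        else:
            used.add(match)
            clips.append(Segment(ev.start, replays_sorted[match].end, ev.label))
    return clips


if __name__ == "__main__":
    events = [Segment(100, 110, "goal"), Segment(500, 505, "foul")]
    replays = [Segment(120, 150), Segment(900, 920)]
    for clip in build_story_clips(events, replays):
        print(clip)
```

In this toy input, the goal is followed by a replay 10 seconds later and is extended to end at 150 s, while the foul has no nearby replay and is kept as-is. A greedy, one-replay-per-event rule keeps the sketch simple; a real system would likely need to handle multiple replays per event and replays that precede the next event.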

Keywords

Soccer video · Event detection · Deep learning · Video analysis

Notes

Acknowledgments

We gratefully acknowledge financial support from the National Natural Science Foundation of China (No. 61572211, 61173114, 61202300).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. Center of Network and Computation, Huazhong University of Science and Technology, Wuhan, China