Soccer Video Event Detection Based on Deep Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Abstract

Automatically identifying the most interesting content in a long video remains a challenging task. Event detection is an important aspect of soccer game research. In this paper, we propose a model that is able to detect events in long soccer games with a single pass through the video. Combined with replay detection, we generate story clips, which contain more complete temporal context, meeting audiences’ needs. We also introduce a soccer game dataset that contains 222 broadcast soccer videos, totaling 170 video hours. The dataset covers three annotation types: (1) shot annotations (type and boundary), (2) event annotations (with 11 event labels), and (3) story annotations (with 15 story labels). Finally, we report the performance of the proposed model for soccer events and story analysis.

Acknowledgments

We gratefully acknowledge the granted financial support from the National Natural Science Foundation of China (No. 61572211, 61173114, 61202300).

Author information

Corresponding author

Correspondence to Junqing Yu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yu, J., Lei, A., Hu, Y. (2019). Soccer Video Event Detection Based on Deep Learning. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_31

  • DOI: https://doi.org/10.1007/978-3-030-05716-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05715-2

  • Online ISBN: 978-3-030-05716-9

  • eBook Packages: Computer Science, Computer Science (R0)