SSET: a dataset for shot segmentation, event detection, player tracking in soccer videos

Multimedia Tools and Applications

Abstract

Soccer video analysis is a major focus of sports video research because the sport attracts attention around the world. However, the lack of soccer datasets hinders progress in this field. In this paper, we construct a soccer dataset named Soccer Dataset for Shot, Event, and Tracking (SSET), which supports research on shot segmentation, soccer event detection, and player tracking. So far, we have collected 350 soccer videos covering a variety of soccer games, for a total of 282 hours. The dataset consists of three parts: (1) Shot, including five shot types and two shot transition types; (2) Event/Story, consisting of 11 fine-grained event types and 15 coarse-grained story types, where the story types extend the event types with 4 extra types; (3) Player bounding boxes, giving the coordinates, width and length of each bounding box. In addition, we develop an annotation tool called Sports Video Dataset Markup (SVDM) for sports video data annotation, in the hope that more people will join our work. We conduct event detection and player tracking experiments on our dataset, and the results show that existing methods are not fully suited to soccer video analysis tasks. Our dataset is available at http://media.hust.edu.cn/dataset.htm.
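To make the bounding-box part of the annotation concrete, the following is a minimal sketch of how one player box might be represented and compared against a tracker's output. The class name `PlayerBox`, its field names, the top-left pixel convention, and the use of intersection-over-union are all assumptions made for illustration; the abstract does not specify the dataset's file format or its evaluation metric. The last lines only restate the label counts given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerBox:
    """One player bounding box in one frame (hypothetical field names)."""
    frame: int
    x: float       # assumed: left edge, in pixels
    y: float       # assumed: top edge, in pixels
    width: float
    height: float  # the abstract calls this extent the box "length"

    def corners(self):
        """Return (x1, y1, x2, y2) for the box."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def iou(a: PlayerBox, b: PlayerBox) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    ax1, ay1, ax2, ay2 = a.corners()
    bx1, by1, bx2, by2 = b.corners()
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


# Label counts as stated in the abstract: the story types extend
# the 11 event types with 4 extra types.
NUM_EVENT_TYPES = 11
NUM_EXTRA_STORY_TYPES = 4
NUM_STORY_TYPES = NUM_EVENT_TYPES + NUM_EXTRA_STORY_TYPES
```

A tracker's predicted box for a frame could then be scored with `iou(ground_truth, prediction)`, where a value near 1 indicates close agreement.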


Author information

Correspondence to Junqing Yu or Yi-Ping Phoebe Chen.

Cite this article

Feng, N., Song, Z., Yu, J. et al. SSET: a dataset for shot segmentation, event detection, player tracking in soccer videos. Multimed Tools Appl 79, 28971–28992 (2020). https://doi.org/10.1007/s11042-020-09414-3
