SSET: a dataset for shot segmentation, event detection, player tracking in soccer videos

Feng, Na; Song, Zikai; Yu, Junqing; Chen, Yi-Ping Phoebe; Zhao, Yizhu; He, Yunfeng; Guan, Tao

doi:10.1007/s11042-020-09414-3

Published: 07 August 2020

Volume 79, pages 28971–28992, (2020)

Multimedia Tools and Applications

Na Feng¹, Zikai Song¹, Junqing Yu¹,², Yi-Ping Phoebe Chen³, Yizhu Zhao¹, Yunfeng He¹ & Tao Guan¹

Abstract

Soccer video analysis is a focus of sports video research because soccer attracts widespread attention around the world. However, the lack of soccer datasets hinders the development of this field. In this paper, we construct a soccer dataset named Soccer Dataset for Shot, Event, and Tracking (SSET), which meets the research needs of shot segmentation, soccer event detection, and player tracking. So far, we have collected 350 soccer videos, covering a variety of soccer games, for a total of 282 hours. The dataset consists of three parts: (1) Shot, including five shot types and two shot transition types; (2) Event/Story, consisting of 11 fine-grained event types and 15 coarse-grained story types, where the story types extend the event types with 4 extra types; (3) Bounding box of players, giving the coordinates, width, and length of each bounding box. In addition, we develop an annotation tool called Sports Video Dataset Markup (SVDM) for sports video data annotation, and we hope that more people will join our work. We conduct event detection and player tracking experiments on our dataset, and the results show that existing methods are not fully suited to soccer video analysis tasks. Our dataset is available at http://media.hust.edu.cn/dataset.htm.
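
The three-part structure described in the abstract can be sketched in Python as follows. This is only an illustrative sketch: the abstract does not give the SSET file format, so the class name `PlayerBox`, its field names, the top-left-corner convention, and the pixel units are all assumptions, not the dataset's actual schema. The `iou` helper shows intersection-over-union, a common overlap measure in tracking evaluation; the abstract does not say which metric the authors use.

```python
from dataclasses import dataclass

# Label counts stated in the abstract. Everything else here is a hypothetical layout.
NUM_SHOT_TYPES = 5
NUM_SHOT_TRANSITION_TYPES = 2
NUM_EVENT_TYPES = 11
NUM_EXTRA_STORY_TYPES = 4
# Story types extend the event types with 4 extra types: 11 + 4 = 15.
NUM_STORY_TYPES = NUM_EVENT_TYPES + NUM_EXTRA_STORY_TYPES


@dataclass(frozen=True)
class PlayerBox:
    """One player bounding box (assumed: top-left corner plus size, in pixels)."""
    frame: int
    x: float
    y: float
    width: float
    height: float  # the abstract calls this dimension the box "length"

    def corners(self):
        """Return (x1, y1, x2, y2) for the box."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def iou(a: PlayerBox, b: PlayerBox) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    ax1, ay1, ax2, ay2 = a.corners()
    bx1, by1, bx2, by2 = b.corners()
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0
```

A tracker's predicted box for a frame could then be compared with the annotated box via `iou(predicted, annotated)`.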

Author information

Authors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Na Feng, Zikai Song, Junqing Yu, Yizhu Zhao, Yunfeng He & Tao Guan
Center of Network and Computation, Huazhong University of Science and Technology, Wuhan, 430074, China
Junqing Yu
Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, 3086, Australia
Yi-Ping Phoebe Chen

Corresponding authors

Correspondence to Junqing Yu or Yi-Ping Phoebe Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Feng, N., Song, Z., Yu, J. et al. SSET: a dataset for shot segmentation, event detection, player tracking in soccer videos. Multimed Tools Appl 79, 28971–28992 (2020). https://doi.org/10.1007/s11042-020-09414-3

Received: 11 June 2019
Revised: 19 June 2020
Accepted: 21 July 2020
Published: 07 August 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11042-020-09414-3
