Discovering recurring activity in temporal networks

Data Mining and Knowledge Discovery

Abstract

Recent advances in data-acquisition technologies have equipped team coaches and sports analysts with the capability of collecting and analyzing detailed data of team activity in the field. It is now possible to monitor a sports event and record information regarding the position of the players in the field, passing the ball, coordinated moves, and so on. In this paper we propose a new method to analyze such team activity data. Our goal is to segment the overall activity stream into a sequence of potentially recurrent modes, which reflect different strategies adopted by a team, and thus, help to analyze and understand team tactics. We model team activity data as a temporal network, that is, a sequence of time-stamped edges that capture interactions between players. We then formulate the problem of identifying a small number of team modes and segmenting the overall timespan so that each segment can be mapped to one of the team modes; hence the set of modes summarizes the overall team activity. We prove that the resulting optimization problem is \(\mathrm {NP}\)-hard, and we discuss its properties. We then present a number of different algorithms for solving the problem, including an approximation algorithm that is practical only for one mode, as well as heuristic methods based on iterative and greedy approaches. We benchmark the performance of our algorithms on real and synthetic datasets. Of all methods, the iterative algorithm provides the best combination of performance and running time. We demonstrate practical examples of the insights provided by our algorithms when mining real sports-activity data. In addition, we show the applicability of our algorithms on other types of data, such as social networks.

Notes

  1. https://doi.org/10.5281/zenodo.290629

  2. https://doi.org/10.5281/zenodo.160509

  3. This hashtag refers to the first semi-final of the 2014 World Cup held in Brazil. Germany beat home-team Brazil by 7–1.

Author information

Corresponding author

Correspondence to Orestis Kostakis.

Additional information

The work was carried out while the author was at Aalto University, Espoo, Finland.

Appendix: Proof of NP-hardness

To prove NP-hardness, we use the following auxiliary problem.

Problem 4

(Satisfy) Assume that we are given q formulas over \(\ell \) variables \(\left\{ v_i\right\} \), each of the form \(\lnot z = x \wedge y\), where x, y, and z are boolean variables or their negations. Decide whether these formulas can be simultaneously satisfied with \(v_1\) set to true.

Proposition 10

Satisfy is NP-complete.

Proof

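We prove hardness by a reduction from 3SAT; membership in NP is immediate, since a given truth assignment can be checked against all q formulas in polynomial time. Assume an instance of 3SAT with n variables and m clauses.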

For each clause i with two literals, \(x \vee y\), add the formula \(\lnot c_i = \lnot x \wedge \lnot y\), where \(c_i\) is a new variable.

For each clause i with three literals, \(x \vee y \vee z\), add two formulas, \(h_i = \lnot x \wedge \lnot y\) and \(\lnot c_i = h_i \wedge \lnot z\), where \(h_i\) and \(c_i\) are new variables.

If the ith clause consists of a single literal x, let \(c_i\) refer to x.

Finally, add \(m - 1\) variables \(v_1, \ldots , v_{m - 1}\), and formulas \(v_i = v_{i + 1} \wedge c_i\), for \(i = 1, \ldots , m - 2\), and \(v_{m - 1} = c_{m - 1} \wedge c_m\).

It follows that the ith clause is satisfied if and only if \(c_i\) is true, and all \(c_i\) are true if and only if \(v_1\) is true. Hence the 3SAT instance is satisfiable if and only if the formulas can be simultaneously satisfied with \(v_1\) set to true. \(\square \)
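The reduction in this proof can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the clause encoding (lists of nonzero integers, DIMACS style), the literal representation as `(name, polarity)` pairs, and the function names `reduce_3sat_to_satisfy` and `satisfiable_with_v1` are all assumptions made for this example, and the brute-force checker is only meant for tiny instances. The sketch assumes at least two clauses, as the chain of \(v_i\) formulas does.

```python
from itertools import product


def neg(lit):
    """Negate a literal given as a (name, polarity) pair."""
    name, positive = lit
    return (name, not positive)


def reduce_3sat_to_satisfy(clauses):
    """Map 3SAT clauses to formulas (lhs, a, b), each meaning lhs = a AND b.

    clauses: list of clauses, each a list of 1 to 3 nonzero ints
    (hypothetical encoding: 2 means x2, -2 means NOT x2).
    """
    m = len(clauses)
    assert m >= 2, "the chain v_1, ..., v_{m-1} needs at least two clauses"
    formulas = []
    c = []  # c[i - 1] is the literal that stands for clause i
    for i, clause in enumerate(clauses, start=1):
        lits = [(f"x{abs(l)}", l > 0) for l in clause]
        ci = (f"c{i}", True)
        if len(lits) == 1:
            ci = lits[0]  # a single-literal clause is its own c_i
        elif len(lits) == 2:
            x, y = lits
            formulas.append((neg(ci), neg(x), neg(y)))
        else:
            x, y, z = lits
            hi = (f"h{i}", True)
            formulas.append((hi, neg(x), neg(y)))
            formulas.append((neg(ci), hi, neg(z)))
        c.append(ci)
    # Chain: v_i = v_{i+1} AND c_i for i = 1..m-2, then v_{m-1} = c_{m-1} AND c_m.
    for i in range(1, m - 1):
        formulas.append(((f"v{i}", True), (f"v{i + 1}", True), c[i - 1]))
    formulas.append(((f"v{m - 1}", True), c[m - 2], c[m - 1]))
    return formulas


def satisfiable_with_v1(formulas):
    """Brute-force Satisfy: try every assignment that sets v1 to true."""
    names = sorted({lit[0] for f in formulas for lit in f})
    for bits in product([False, True], repeat=len(names)):
        val = dict(zip(names, bits))
        if not val["v1"]:
            continue

        def ev(lit):
            return val[lit[0]] == lit[1]

        if all(ev(lhs) == (ev(a) and ev(b)) for lhs, a, b in formulas):
            return True
    return False
```

For instance, `satisfiable_with_v1(reduce_3sat_to_satisfy([[1, 2, 3], [-1], [-2]]))` searches the assignments of the seven resulting variables and agrees with the satisfiability of the original clauses.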

Proposition 11

(k, 1)-segmentation is NP-hard.

Proof

We will prove the hardness by reduction from Satisfy. Assume that we are given an instance of Satisfy with q formulas and \(\ell \) variables.

We begin by specifying the vertices. The total number of vertices is \(1 + 3 + 2\ell + r\), where \(r = (20q + 12\ell + 2)(3 + 2\ell )\).

The first vertex is \(\alpha \), and every edge will be adjacent to \(\alpha \). The next three vertices are \(t_1\), \(t_2\), and \(t_3\). Our construction will make sure that \((\alpha , t_i) \in E(G)\).

The next \(2\ell \) vertices correspond to the variables and their negations; we denote them by \(v_i\) and \(u_i\), for \(i = 1, \ldots , \ell \). We denote by X the set of possible edges between \(\alpha \) and these vertices, and define \(X' = X {\setminus } \left\{ (\alpha , u_1), (\alpha , v_1)\right\} \).

Finally, the last r vertices are auxiliary vertices that will allow us to force segmentation borders. We will denote the set of possible edges between these vertices and \(\alpha \) by B.

Our interaction network consists of 3 parts, each of which in turn consists of sections. All these sections and parts are concatenated consecutively in time.

The first part, say \(P_1\), consists of \(2\ell \) sections, each containing 5 time points. The first \(\ell \) sections are defined as

$$\begin{aligned} \begin{array}{rllllll} (\alpha , v_i): &{} 1 &{} 1 &{} &{} &{} \\ (\alpha , u_i): &{} &{} &{} &{} 1 &{} 1 \\ (\alpha , t_1): &{} 1 &{} 1 &{} &{} 1 &{} 1 &{}\\ (\alpha , t_2): &{} 1 &{} 1 &{} &{} 1 &{} 1 &{}\\ (\alpha , t_3): &{} &{} 1 &{} 1 &{} 1 &{} &{}\\ \text {for every } e \in B: &{} 1 &{} &{} 1 &{} &{} 1 &{}\\ \end{array} \end{aligned}$$

The last \(\ell \) sections are copies of the first \(\ell \) sections, except that they also contain the remaining edges from X at the 1st, 3rd, and 5th time points.

The second part, say \(P_2\), consists of 2q sections, each containing 7 time points. Let \(c_i = (\lnot z = x \wedge y)\) be the ith formula. Using the same letters to denote the corresponding vertices, and taking negations into account, we define the ith section, where \(i = 1, \ldots , q\), as

$$\begin{aligned} \begin{array}{rllllllll} (\alpha , x): &{} 1 &{} 1 &{} &{} &{} 1 \\ (\alpha , y): &{} 1 &{} &{} &{} 1 &{} 1 \\ (\alpha , z): &{} &{} 1 &{} 1 &{} 1 &{} &{} &{} 1\\ (\alpha , t_1): &{} 1 &{} 1 &{} &{} 1 &{} 1 &{} 1 &{} 1\\ (\alpha , t_2), (\alpha , t_3): &{} 1 &{} 1 &{} 1 &{} 1 &{} 1 &{} 1 &{} 1\\ \text {for every } e \in B: &{} 1 &{} &{} 1 &{} &{} 1 &{} 1 &{} 1 \\ \end{array} \end{aligned}$$

The \((q + i)\)th section is a copy of the ith section, except that it also contains the remaining edges from X at the 1st, 3rd, and 5th–7th time points.

The last part, say \(P_3\), consists of \(10q + 6\ell + 2\) sections, each consisting of a single time point. Each section contains B, \((\alpha , t_i)\), and \((\alpha , u_1)\). Moreover, every even section contains the edges in \(X'\).
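The three parts described above can be assembled programmatically, which makes it easy to inspect small instances. The following Python sketch is illustrative only and rests on several assumptions not made in the text: every edge is named by its endpoint other than \(\alpha \) (for example `"v2"` stands for \((\alpha , v_2)\)), the whole set B is collapsed into the single token `"B"`, each formula is passed as a triple `(x, y, z)` of three distinct vertex names already chosen by the caller (the mapping of literals to \(v_i\) or \(u_i\) is not decided here), and the function name `build_network` is hypothetical.

```python
def build_network(num_vars, formulas):
    """Return the network as a list of edge sets, one set per time point."""
    ell, q = num_vars, len(formulas)
    X = [f"{s}{i}" for s in "vu" for i in range(1, ell + 1)]
    X_prime = [e for e in X if e not in ("v1", "u1")]

    def section(rows, width):
        # rows maps an edge to the 1-based time points at which it occurs.
        return [{e for e, ts in rows.items() if t in ts}
                for t in range(1, width + 1)]

    network = []

    # Part P1: ell sections of 5 time points, followed by ell copies that
    # also carry the remaining edges of X at time points 1, 3, and 5.
    for is_copy in (False, True):
        for i in range(1, ell + 1):
            rows = {f"v{i}": {1, 2}, f"u{i}": {4, 5},
                    "t1": {1, 2, 4, 5}, "t2": {1, 2, 4, 5},
                    "t3": {2, 3, 4}, "B": {1, 3, 5}}
            if is_copy:
                for e in X:
                    rows.setdefault(e, {1, 3, 5})
            network += section(rows, 5)

    # Part P2: q sections of 7 time points, followed by q copies that
    # also carry the remaining edges of X at time points 1, 3, 5, 6, 7.
    for is_copy in (False, True):
        for x, y, z in formulas:
            rows = {x: {1, 2, 5}, y: {1, 4, 5}, z: {2, 3, 4, 7},
                    "t1": {1, 2, 4, 5, 6, 7},
                    "t2": set(range(1, 8)), "t3": set(range(1, 8)),
                    "B": {1, 3, 5, 6, 7}}
            if is_copy:
                for e in X:
                    rows.setdefault(e, {1, 3, 5, 6, 7})
            network += section(rows, 7)

    # Part P3: single-point sections; even-numbered ones also contain X'.
    for j in range(1, 10 * q + 6 * ell + 2 + 1):
        point = {"B", "u1", "t1", "t2", "t3"}
        if j % 2 == 0:
            point.update(X_prime)
        network.append(point)

    return network
```

With `num_vars = 3` and one formula, for example `build_network(3, [("v1", "v2", "u3")])`, the token `"B"` appears at \(6\ell + 10q + (10q + 6\ell + 2) = 20q + 12\ell + 2\) time points, the same number as the segment budget k set in the proof.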

We set \(k = 20q + 12\ell + 2\). We claim that Satisfy is true if and only if the optimal segmentation has a score of

$$\begin{aligned} \sigma = \frac{\left| P_1\right| }{2} \left( 3(2\ell - 2) + 2\right) + \frac{\left| P_2\right| }{2} \left( 5(2\ell - 3) + 12\right) + \frac{\left| P_3\right| }{2} (2\ell - 2). \end{aligned}$$

We will prove this in several steps.

Step (i): Every \(e \in B\) is contained in every segment exactly once. First, note that such a segmentation is possible, since B occurs at k different time points. The optimal cost of any such segmentation is bounded by r / 2, that is, the number of possible remaining edges, \(3 + 2\ell \), times half the number of segments, k / 2. Note also that all edges in B occur at exactly the same time points. Thus there is an optimal solution in which the edges of B are either all present in the core or all absent from it. Assume that there is a segment that disagrees with the core on B. Then the cost is at least r, which exceeds the bound r / 2. Consequently, every segment must contain every \(e \in B\). Since B occurs at only k different time points, each of the k segments can contain only one instance of each \(e \in B\).

Step (ii): It follows immediately that the borders of the sections are included in the borders of the optimal segmentation. Moreover, each section in \(P_1\) is divided into 3 segments, each section in \(P_2\) is divided into 5 segments, and each section in \(P_3\) corresponds to exactly 1 segment.
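As a quick consistency check on Step (ii), using only the section counts given in the construction (\(2\ell \) sections in \(P_1\), 2q in \(P_2\), and \(10q + 6\ell + 2\) in \(P_3\)), the segment counts add up exactly to the budget k:

$$\begin{aligned} 3 \cdot 2\ell + 5 \cdot 2q + 1 \cdot (10q + 6\ell + 2) = 20q + 12\ell + 2 = k. \end{aligned}$$

The same sum counts the time points at which B occurs: 3 per section of \(P_1\), 5 per section of \(P_2\), and 1 per section of \(P_3\).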

Step (iii): \((\alpha , t_i) \in E(G)\), \((\alpha , u_1) \in E(G)\), and \((\alpha , v_1) \notin E(G)\). This follows immediately from the fact that each section in \(P_3\) corresponds to one segment, and \({\left| P_3\right| } > k / 2\), that is, \(P_3\) contains the majority of the segments.

Step (iv): The cost of the ith and \((i + 1)\)th sections in \(P_1\) is at least \(3(2\ell - 2) + 2\). This bound is reached if and only if G contains either \(u_i\) or \(v_i\), but not both. First note that the middle segment in both sections contains the 3rd time point. This means that the remaining edges in X occur exactly 3 times in 6 segments. Thus, they induce a cost of \(3(2\ell - 2)\). A brute-force enumeration now implies that the involved edges induce a cost of at least 1, and this is possible if and only if G contains either \(u_i\) or \(v_i\), but not both.

Step (v): The cost of the ith and \((i + 1)\)th sections in \(P_2\) is at least \(5(2\ell - 3) + 12\). This bound is reached if and only if \((\alpha , z) \notin E(G) \Leftrightarrow (\alpha , x) \in E(G) \) and \((\alpha , y) \in E(G)\). First note that the 2nd segment in both sections contains the 3rd time point, and the 4th and 5th segments each consist of exactly one time point. This implies that the remaining edges in X occur exactly 5 times in 10 segments. Thus, they induce a cost of \(5(2\ell - 3)\). A brute-force enumeration now implies that the involved edges induce a cost of at least 6, and this is possible if and only if \((\alpha , z) \notin E(G) \Leftrightarrow (\alpha , x) \in E(G) \) and \((\alpha , y) \in E(G)\).

Step (vi): The combined cost of an odd and an even section in \(P_3\) is equal to \(2\ell - 2\). This follows from the fact that the edges in \(X'\) occur exactly once in these two sections.

Step (vii): Steps (iv)–(vi) imply that \(\sigma \) is a lower bound for the optimal score. This bound is reached if and only if the lower bound of each section is reached. This can happen if and only if each formula in Satisfy can be satisfied. \(\square \)
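The quantities used in this reduction are easy to tabulate for a concrete instance. The Python sketch below is a hedged illustration, not part of the original proof: it reads \({\left| P_j\right| }\) as the number of sections in part \(P_j\) (an interpretation suggested by the per-pair costs in Steps (iv)–(vi), not stated explicitly in the text), and the function name `reduction_parameters` and the returned dictionary keys are hypothetical.

```python
def reduction_parameters(ell, q):
    """Sizes and target score of the reduction for ell variables and q formulas."""
    # Number of sections per part, as given in the construction.
    sections = {"P1": 2 * ell, "P2": 2 * q, "P3": 10 * q + 6 * ell + 2}
    # Segments per section, as stated in Step (ii).
    segments_per_section = {"P1": 3, "P2": 5, "P3": 1}

    k = 20 * q + 12 * ell + 2            # number of segments
    r = k * (3 + 2 * ell)                # number of auxiliary vertices
    num_vertices = 1 + 3 + 2 * ell + r   # alpha, t_1..t_3, v_i/u_i, auxiliary

    total_segments = sum(sections[p] * segments_per_section[p] for p in sections)

    # Cost of a pair of sections in each part, from Steps (iv)-(vi).
    pair_cost = {"P1": 3 * (2 * ell - 2) + 2,
                 "P2": 5 * (2 * ell - 3) + 12,
                 "P3": 2 * ell - 2}
    # Assumption: |P_j| is the number of sections, so |P_j| / 2 counts pairs.
    sigma = sum(sections[p] // 2 * pair_cost[p] for p in sections)

    return {"k": k, "r": r, "num_vertices": num_vertices,
            "total_segments": total_segments, "sigma": sigma}
```

Under these assumptions, `reduction_parameters(3, 1)` gives k = 58 segments, r = 522 auxiliary vertices, 532 vertices in total, and a target score of 129; `total_segments` equals k for every choice of `ell` and `q`.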

About this article

Cite this article

Kostakis, O., Tatti, N. & Gionis, A. Discovering recurring activity in temporal networks. Data Min Knowl Disc 31, 1840–1871 (2017). https://doi.org/10.1007/s10618-017-0515-0

  • DOI: https://doi.org/10.1007/s10618-017-0515-0
