Data-driven detection of counterpressing in professional football

Bauer, Pascal; Anzer, Gabriel

doi:10.1007/s10618-021-00763-7

A supervised machine learning task based on synchronized positional and event data with expert-based feature extraction

Open access
Published: 08 July 2021

Volume 35, pages 2009–2049, (2021)

Data Mining and Knowledge Discovery

Abstract

Detecting counterpressing is an important task for any professional match-analyst in football (soccer), but it is currently done exclusively manually by observing video footage. The purpose of this paper is not only to automatically identify this strategy, but also to derive metrics that support coaches with the analysis of transition situations. Additionally, we want to infer objective influence factors for its success and assess the validity of peer-created rules of thumb established by practitioners. Based on a combination of positional and event data, we detect counterpressing situations as a supervised machine learning task. Together with professional match-analysis experts, we discussed and consolidated a consistent definition, extracted 134 features and manually labeled more than 20,000 defensive transition situations from 97 professional football matches. The extreme gradient boosting model, with an area under the curve of \(87.4\%\) on the labeled test data, enabled us to judge how quickly teams can win the ball back with counterpressing strategies and how many shots they create or allow immediately afterwards, and to determine what the most important success drivers are. We applied this automatic detection to all matches from six full seasons of the German Bundesliga and quantified the defensive and offensive consequences of applying counterpressing for each team. Automating the task saves analysts a tremendous amount of time, standardizes the otherwise subjective task, and allows trends to be identified within larger data-sets. We present an effective way to integrate the detection and the lessons learned from this investigation into common match-analysis processes.

1 Introduction

Acquiring accurate and high-frequency positional and event data is common in most of the world’s top professional football (soccer) leagues. Manually annotated event data provides information only about the one player carrying the ball at the time of a game-relevant action, whereas so-called positional data can capture highly accurate positions of all 22 players up to 25 times a second.

Every professional football team spends a substantial amount of time analyzing and monitoring strategies such as counterpressing, a complex team strategy for transition situations, of their own and opposing teams. Navarro and Javier (2018) defines counterpressing simply as “[..] pressure after losing the ball”. Related to this, Pep Guardiola made the ’five second rule’ for counterpressing famous.^{Footnote 1} Another coach to experience tremendous success in recent seasons is Liverpool FC manager Jürgen Klopp. He is generally accepted as the originator of the term ’Gegenpressing’, which is well-known in both its German version and English translation.^{Footnote 2} It is apparent to football experts that Klopp’s counterpressing concept is closely related to Guardiola’s strategy of regaining the ball within the first five seconds.

There are significant differences in teams’ defensive and offensive tactical line-ups (Bialkowski et al. 2014, 2015; Andrienko 2019; Shaw and Mark 2019). The transition phase describes the period following a win or loss of possession in which the team transitions between its offensive and defensive tactical line-ups and vice versa. When a team is in possession for at least a certain amount of time, we can assume that generally its tactical formation is optimized for offensive play and, consequently, sub-optimal in terms of defending its own goal (Andrienko 2019; Shaw and Mark 2019). Therefore, the first seconds after losing the ball are critical from a defensive perspective. Several studies showed that transition phases are a substantial factor for a team’s overall performance: As early as 1968, Reep and Benjamin (1968) demonstrated, in the first known football analytics study, that \(30\%\) of all regained possessions lead to shots on goal and \(25\%\) of all goals came from regained possessions in the attacking quarter. Grant et al. (1999) confirmed these findings for the 1998 World Cup. Both outcomes align with Jürgen Klopp’s statement that regaining the ball immediately after losing it, potentially through successful counterpressing, "...is the best playmaker".^{Footnote 3} Klopp hereby claims that counterpressing can also be seen as an offensive strategy. Recent studies show that regaining the ball in open play is more likely to lead to a goal than a safe build-up from a team’s own half (Vogelbein et al. 2014; Hobbs et al. 2018). Based on tracking data from the English Premier League, Hobbs et al. (2018) detected possession regains close to the opponent’s goal (potential counterpressing situations), highlighting their relevance once more. Even though many coaches and clubs influenced the development of this sophisticated strategy, neither an objective proof of its efficiency nor an analysis of its usage in top leagues is presented in the literature.

Hughes and Ian (2015) point out that team sports performance analysis tends to be operationalized on the basis of notation systems, described as a replicable and consistent method of recording sport performance. Recent literature described a framework in which coaches’ decisions are supported by several performance analysis reports from games, teams and players (Travassos et al. 2013) and pointed out that team tactics in football refer to both a priori decisions made before the match and real-time adaptations during the game (Rein and Daniel 2016). Accordingly, team tactics are described as a complex process resulting from a network of inter-dependent parameters (Kempe et al. 2014). These processes are conducted in a time-critical set-up, especially in the world’s top leagues and competitions, where teams need to face different opponents several times a week. Although many clubs have extended their match-analysis departments considerably in recent years, the limited amount of time and resources during matches forces teams to seek ways to automate processes and gain insights faster in order to obtain a competitive edge.

These recent developments, the availability of accurate performance data and the need for a quick, detailed tactical analysis, signify a huge potential for the application of sophisticated machine learning techniques to football data and require an efficient collaboration of computer-science and domain experts (Herold et al. 2019; Goes et al. 2020; Rein and Daniel 2016). Many recent scientific investigations aimed to establish new key performance indicators (KPIs), i.e. metrics quantifying certain aspects of the game: pass evaluation metrics were examined (Steiner et al. 2019; Goes et al. 2019), metrics to quantify controlled space were defined (Kim 2004; Fernandez and Bornn 2018; Brefeld et al. 2019) and several studies evaluated shot metrics (Lucey et al. 2014; Rathke 2017; Fairchild et al. 2018; Anzer and Bauer 2021)^{Footnote 4} and goal scoring opportunities through possession values (Link et al. 2016; Spearman 2018; Fernandez and Bornn 2018; Decroos et al. 2020). Additionally, there are many approaches for measuring the defensive behavior of teams (Santos et al. 2018; Andrienko 2019; Goes et al. 2019), and even approaches aiming to quantify pressing (Bojinov and Luke 2016; Andrienko 2017; Robberechts 2019). Although pressing and counterpressing are closely related, they are two different phenomena. An interesting conference proceeding describes how specific counterpressing situations can be derived from detected general pressing scenes.^{Footnote 5} Several approaches also showed that analyzing these KPIs or even aggregating simple statistics (e.g. the pass completion rate) over one or several seasons provides a helpful indication to practitioners (Power et al. 2018; Pappalardo et al. 2019). The primary goal of all these approaches is to derive new insights by processing vast amounts of information. Decroos et al. (2018) presented a first approach to detect interesting match-phases based on event data. To the best of our knowledge, no peer-reviewed study has focused on automating parts of the performance analyst’s everyday work by detecting complex tactical patterns based on positional and event data. However, a noteworthy approach aiming to detect counterattacks was presented at an established football analytics conference.^{Footnote 6}

With this practical need for process optimization in mind, the central aim of this study is to detect counterpressing situations without human support and provide several ad-hoc reports for match analysts in near real-time. The outcome is optimized to fulfill their practical requirements and fit seamlessly into their tool-ecosystem. Additionally, the automated detection allows us to analyze large amounts of data that would exceed manual processing capacities. Consequently, our approach enables us to perform impartial long-term analysis of the German Bundesliga’s latest seasons, investigating the following research questions:

Can we differentiate between varying regaining strategies and determine reasons for a short defensive reaction time (definition in Sect. 2.1.2), i.e. to which extent is a fast ball regain actually caused by counterpressing (RQ1)?
Can we set objective benchmarks to quantify counterpressing strategies (amount and effectiveness) on a match- and season-level and point out their correlation with a team’s overall success (RQ2)?
Do the established rules of thumb agree with the data (e.g. counterpressing is more effective close to the sideline) (RQ3)?
To what extent do teams’ counterpressing strategies differ in the German Bundesliga (RQ4)?

Altogether, answering these research questions helps us to define the baseline for a qualitative discussion with experts, and thus allows them to formulate requirements for the practical application (PA) set-up.

The remainder of this paper is structured as follows: Sect. 2 provides a detailed description of the used data, the underlying rules and definitions, the labelling process and the extracted features. The outcomes in Sect. 3 are split into three parts: First in Sect. 3.1, we describe a statistical evaluation of the detection models. Section 3.2 presents a subject-specific evaluation by interpreting our results on six seasons of German Bundesliga. Lastly, in Sect. 3.3, we demonstrate how this approach can be operationalized in the performance analysis process. This application is based on two matches of the German national teams.^{Footnote 7}\(^{,}\)^{Footnote 8} All parts of this study were developed in close cooperation with the professional match-analysts and coaches (see Acknowledgements).

2 Methods

2.1 Data and definitions

2.1.1 Data collection

The present study uses positional and event data collected in more than six seasons (4118 matches) of the German Bundesliga and \(2{\text {nd}}\) Bundesliga, as well as the above-mentioned two matches of the German national teams. Positional data is captured by optical tracking systems^{Footnote 9} and event data consists of manual annotations based on a dedicated official match data catalogue,^{Footnote 10} defining around 30 events with more than 100 attributes. The event data can be seen as a log of the ball-relevant actions (e.g. passes, shots, tackles or fouls); however, it does not cover complex team-tactical behaviors such as counterpressing.

Since the two data sources are collected independently of each other, they need to be synchronized before they can be processed together. Even though several steps of quality management by independent institutions are conducted on the manually collected event data, the timestamp assigned to a given event can differ significantly from the one in the positional data. The synchronization of positional and event data is conducted by dedicated rules (per event) that extract the exact timestamp and the exact location on the pitch from tracking data given a manually tagged event. For example, when synchronizing a pass, the sudden increase in the distance between the passing player and the ball, captured by the optical tracking, can be used to align both location and timestamp of the pass. The positional data is collected at a frequency of 25 Hz and includes the longitudinal, latitudinal, and, in the case of the ball, also the altitudinal positions of the players, ball and referees relative to the pitch markings.

The information about which team is currently in possession of the ball (hereafter referred to as ball possession) and whether the game is running or currently stopped (hereafter referred to as ball status) is crucial for our study. Both values are collected live in the stadium for every frame of the match by a skilled operator focused exclusively on this task.

2.1.2 Definitions

Since there are conflicting definitions of ball possession in the literature (Kempe et al. 2014), we decided to adopt published definitions with expert feedback. The above-mentioned operators, dedicated to acquiring information about ball possession and status, are briefed to mark ball possession for one team from the moment a player of that team touches the ball with ball control until the ball is out of play or an opposing player touches the ball with ball control. Ball control is defined in this context as the ability to conduct a deliberate action with the ball. Whenever a pass is played between two players of one team, the ball possession belongs to that team as long as no opposing player intercepts that pass or wins the ball in an individual duel. Following the definition from Link and Hoernig (2017), we also compute ball possession on a player level (individual ball possession). In the case of an interception, the ball possession change is detected exactly at the time of the first ball touch of the intercepting player. We use the term defensive reaction time, i.e. the time it takes to regain ball possession after losing it, as defined in Vogelbein et al. (2014). All situations where either the ball is beyond the pitch markings or the play is stopped by the referee (e.g. because of a foul) are labeled as out-of-play. Hence, if the ball goes out of bounds there must typically be a change in ball possession. Situations in which the touch of the player carrying the ball outside the markings is not declared as a ball possession due to missing control (e.g. a deflected shot), or in which the individual possession model disagrees with the team possession flag, are excluded. For all further investigations, only the effective playing time (also referred to as net playing time), defined as all situations in which the game is running, is considered. Shots, for example, always represent the end of a ball possession phase by definition. Ball possession phases that end with the half-time or final whistle or a referee ball are excluded from our analysis.
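
As an illustration of the defensive reaction time defined above, the following minimal sketch derives it from a per-frame team possession flag. It assumes a single uninterrupted in-play sequence sampled at 25 Hz with exactly two team identifiers; the function name and the input layout are hypothetical and are not taken from the implementation used in this study.

```python
from typing import List

FPS = 25  # tracking frequency stated in Sect. 2.1.1


def defensive_reaction_times(possession: List[str], team: str) -> List[float]:
    """Seconds between each ball loss of `team` and its next regain.

    `possession` holds one team identifier per frame. Assumption: the
    sequence contains in-play frames only, so stoppages need no handling.
    """
    times = []
    loss_frame = None
    for i in range(1, len(possession)):
        prev, cur = possession[i - 1], possession[i]
        if prev == team and cur != team:
            # The team had the ball in the previous frame and lost it now.
            loss_frame = i
        elif prev != team and cur == team and loss_frame is not None:
            # The team regained the ball after an earlier recorded loss.
            times.append((i - loss_frame) / FPS)
            loss_frame = None
    return times


# Team A holds the ball for 2 s, loses it for 3 s, regains it, loses it again.
flags = ["A"] * 50 + ["B"] * 75 + ["A"] * 25 + ["B"] * 10
reaction_times_a = defensive_reaction_times(flags, "A")
```

A loss that is never followed by a regain within the sequence produces no entry, which mirrors the exclusion of possession phases that end with a stoppage.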

In addition to these general rules, we developed the following transition-related definitions in consultation with match-analysis experts: A defensive transition phase is defined as the time-window in which a team has lost ball possession but is not yet in its ideal defensive formation. Within these defensive transition phases,

a team conducts counterpressing if at least one player exerts (spatial and/or temporal) pressure on the ball carrier, or on the opponents close to the ball.

Note that many different definitions of pressing exist: StatsBomb^{Footnote 11} defines pressing as a defensive player being within a five-yard radius of the ball-carrying opponent.^{Footnote 12} Similarly, a more granular and non-binary definition, aggregating the pressure of several defensive players, is presented by Andrienko (2017). Based on these pressing definitions, counterpressing could be defined as situations where pressing is exerted immediately after a ball possession change (Navarro and Javier 2018). Both of these rule-based definitions are used as baseline models for our investigation.

However, according to the match-analysts involved in this project, being close to the player in ball possession is not the only way to exert pressure. Attacking or blocking the easiest pass options could, for instance, also be seen as applying pressure.

To quantify the success of counterpressing, we consider it successful if the ball is regained within five seconds. Shots and goals, scored or conceded, are accredited to the previous counterpressing phase if they occur within the following 20 seconds.
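
The two thresholds can be made concrete with a small sketch. It assumes that both windows are measured from the moment of the initial ball loss (as in the scene review described in Sect. 2.3.2) and that the five-second limit is inclusive; the `Transition` record and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

REGAIN_WINDOW_S = 5.0        # regain within five seconds counts as successful
ATTRIBUTION_WINDOW_S = 20.0  # shots and goals are credited within 20 seconds


@dataclass
class Transition:
    loss_time: float                 # match time of the ball loss, in seconds
    regain_time: Optional[float]     # match time of the regain, None if none
    shot_times: List[float] = field(default_factory=list)  # shots by either team


def is_successful(t: Transition) -> bool:
    """True if the ball was regained no later than five seconds after the loss."""
    return t.regain_time is not None and t.regain_time - t.loss_time <= REGAIN_WINDOW_S


def attributed_shots(t: Transition) -> List[float]:
    """Shot times that fall inside the 20-second window after the ball loss."""
    return [s for s in t.shot_times if 0.0 <= s - t.loss_time <= ATTRIBUTION_WINDOW_S]


example = Transition(loss_time=100.0, regain_time=104.2, shot_times=[110.0, 125.0])
```

In this example the regain after 4.2 s counts as successful, and only the shot at 110.0 s is credited to the counterpressing phase.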

From here on, the game is split into ball possession phases, which can start and end either with an in-play ball possession change or a stoppage such as a set-piece. Note that the set-up of the in-play ball possession change, such as the defensive transition, might not be the only factor influencing the defensive reaction time: a short reaction time can also result from short, uncontrolled ball possessions or risky passes of the opponent. Any ball possession phases that either start with a set-piece or end with a stoppage in play are not considered further. Fig. 1 shows a heatmap displaying the occurrences of transition situations related to the pitch. It indicates that most turnovers happen in the opposing half, especially near the sidelines. Ball possession changes due to a ball going out of bounds are added to the area touching the sideline. The high proportion of turnovers in the opponent’s six-yard box is easily identified. This is likely because both saved shots and shots missing the goal are counted as a change in possession as soon as the goalkeeper receives the ball.

2.2 Supervised machine learning set-up

2.2.1 Hand-crafted labeling of defensive transition situations

Since the rule-based approaches to detecting counterpressing that we investigated led to insufficient accuracy (see Sect. 3.1), we conducted a manual tagging procedure with trained student-analysts. It was their task to label situations with a detectable counterpressing strategy. In total, out of 11,108 relevant defensive turnovers, 3,196 situations were labeled as counterpressing. The labeling was conducted for the first eleven Bundesliga matchdays of the 2018/2019 season from the perspective of the home team. The percentage of transitions labeled as counterpressing differs significantly between teams. Borussia Mönchengladbach presented the highest value (\(40.07\%\)), whereas only \(21.80\%\) of Hannover 96’s transitions were labeled as counterpressing. The aggregated outcome of the labeling process per team of the German Bundesliga is displayed in Table 7 in Appendix A.

To quantify the inter-labeler reliability, 20 matches were labeled by three different students. To compute the pairwise accuracy for each defensive turnover, we checked whether both students had identified counterpressing in the following two seconds. This yielded a pairwise accuracy of \(82.01\%\), i.e. in \(82.01\%\) of defensive turnovers both students agreed on the nature of the actions following a turnover.
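
A pairwise accuracy of this kind can be computed as sketched below. How the three labeler pairs were aggregated is not stated in the text, so pooling all pairs and all turnovers into one ratio is an assumption, as is the 0/1 label encoding.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_agreement(labels_by_rater: Sequence[List[int]]) -> float:
    """Share of (rater pair, turnover) combinations with identical labels.

    Each inner list holds one 0/1 label per turnover; all lists are assumed
    to refer to the same turnovers in the same order.
    """
    agree = 0
    total = 0
    for a, b in combinations(labels_by_rater, 2):
        for x, y in zip(a, b):
            agree += int(x == y)
            total += 1
    return agree / total


# Three hypothetical labelers rating the same four turnovers.
ratings = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
agreement = pairwise_agreement(ratings)
```

Raw agreement does not correct for chance agreement, which is worth keeping in mind when roughly 29% of turnovers carry the positive label.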

As additional information, the experts tried to detect the exact start and end-frame of the respective transition situation. The average duration of all transition phases is 9.34 s: counterpressing phases took on average 9.89 s, whereas all non-counterpressing turnovers took on average 9.11 s.

2.2.2 Expert-based feature extraction

We defined a list of 134 features that aim to characterize the transition. The features describe the location of a ball possession change, several relevant factors describing both teams’ exact positioning at the time of turnover and their movements in the first two seconds immediately after the ball loss. A time-window longer than two seconds was problematic, because it would cut off too many situations where the ball possession changed within that time.

A team’s decision to conduct counterpressing is heavily influenced by the situation of the ball possession itself. To take this into consideration, all features are also calculated at the moment of the ball possession change. According to football experts, turnovers without the chance to counterpress are often characterized by immediate clearances or aerial duels. Therefore, we included the ball position, the ball height, and the individual ball possession time (Link and Hoernig 2017), which describes the time a player of the ball-possessing team was in direct control of the ball. The involved football experts suggested that counterpressing is often characterized by achieving local compactness close to the ball. We aimed to cover this with several metrics measuring the regaining team’s positioning around the ball. For instance, we use the team’s covered area and global and local stretch indices (Bourbousson and Carole Sève 2010; Santos et al. 2018) as features in our model. A team primarily aiming to defend its own goal after losing possession usually does so by moving at high speed towards its own goal, whereas counterpressing often requires only the players close to the ball to attack their opponents at high speed towards the ball carrier. This is addressed by calculating several speed values and considering each team’s average position, the so-called team-center (Bourbousson and Carole Sève 2010; Andrienko 2017). In contrast to a more conservative transition strategy, counterpressing’s primary objective is not to place many players in a compact unit behind the ball quickly, but rather to defend more aggressively up the pitch. Therefore, we calculate both the number of players in front of and behind the ball, as well as their respective compactness. Although the pressing definition from Andrienko (2017) was not sufficient as a stand-alone rule-based counterpressing detection criterion (see Sect. 3.1), it is incorporated in various features of our model.
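
To make the compactness features more tangible, the sketch below computes a team-center and stretch indices for a single frame. It uses a common formulation (mean player distance to the team-center) and assumes that the local variant restricts the calculation to the k players closest to the ball; the exact variants listed in Table 1 may differ, and all function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in any consistent length unit, e.g. cm


def team_center(positions: List[Point]) -> Point:
    """Average position of the given players."""
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)


def stretch_index(positions: List[Point]) -> float:
    """Mean distance of the players to their own team-center."""
    cx, cy = team_center(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)


def local_stretch_index(positions: List[Point], ball: Point, k: int = 5) -> float:
    """Stretch index restricted to the k players closest to the ball (assumption)."""
    nearest = sorted(
        positions, key=lambda p: math.hypot(p[0] - ball[0], p[1] - ball[1])
    )[:k]
    return stretch_index(nearest)


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
global_stretch = stretch_index(square)
local_stretch = local_stretch_index(square, ball=(0.0, 0.0), k=2)
```

A smaller value means a more compact group of players, so the local index around the ball is a direct proxy for the local compactness that the experts describe.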

All features were discussed, consolidated and steadily improved within workshops and based on several steps of evaluation of the detection. A detailed list and description of the features is presented in Table 1, a video describing some of the features can be accessed here.

Table 1 The extracted features that are used for counterpressing detection. Features used in both dimensions of pitch coordinates (horizontally and vertically) and for different time points after the initial ball possession change are listed only once

2.3 Model training

2.3.1 Detection of counterpressing as a supervised machine learning task

We trained several classification algorithms based on the 11,108 labeled defensive turnover situations from 97 matches fulfilling our inclusion criteria (see Fig. 1).

We split the labeled data-set (\(75\%\) training data, \(25\%\) test data) by taking randomly \(25\%\) of all transitions out of every match to avoid over-representing teams, scores, or results.
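
A per-match split of this kind could look as follows. This is a sketch under stated assumptions: each transition is a dictionary with a hypothetical `match_id` key, and the number of test samples per match is obtained by rounding, which the text does not specify.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_per_match(
    transitions: List[Dict], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Dict], List[Dict]]:
    """Randomly assign `test_fraction` of every match's transitions to the test set."""
    rng = random.Random(seed)
    by_match = defaultdict(list)
    for t in transitions:
        by_match[t["match_id"]].append(t)

    train, test = [], []
    for match_id in sorted(by_match):  # sorted for reproducibility
        items = list(by_match[match_id])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Two hypothetical matches with eight transitions each.
data = [{"match_id": m, "idx": i} for m in ("m1", "m2") for i in range(8)]
train_set, test_set = split_per_match(data)
```

Because the sampling happens inside each match, every match contributes to both sets in the same proportion, which is what prevents single teams or results from dominating the test data.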

Table 2 Hyperparameter-selection of the XGBoost models

We used the above defined features (Sect. 2.2.2 or Table 1) and evaluated the best performing models on our set of test data.

Among different base models, we applied extreme gradient boosting (hereafter referred to as XGBoost), a scalable tree boosting system introduced by Chen and Carlos (2016), which has outperformed traditional machine learning algorithms in numerous applications (Li et al. 2019; Liu et al. 2020; Zhang et al. 2020). For our investigation, we want to point out three major advantages of XGBoost: (a) To make use of our wide set of features while limiting the risk of overfitting, an additional regularization term is added to the loss function. (b) XGBoost is a scalable machine learning model, which can be extended seamlessly as more data or more features become available. (c) No normalization of the features is required.
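
For reference, the regularized objective referred to in (a) can be written as follows, using the notation of the original XGBoost formulation; the symbols are introduced here for illustration and are not used elsewhere in this paper:

\[
\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2
\]

Here \(l\) is a differentiable loss between the prediction \(\hat{y}_i\) and the label \(y_i\), \(f_k\) is the \(k\)-th tree, \(T\) is its number of leaves, \(w\) is the vector of its leaf weights, and \(\gamma\) and \(\lambda\) control the strength of the penalty on tree size and leaf weights.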

Before training our model, a set of hyperparameters (shown in Table 2) has to be defined. As presented in Wang (2019), we applied Bayesian tree-structured Parzen Estimator hyperparameter optimization approaches to obtain the highest possible accuracy and avoid overfitting. By using tree-structured Parzen Estimators (Bergstra 2011) as the surrogate model in Bayesian optimization (Dewnacker et al. 2016), we reduce the running time of hyperparameter tuning and achieve better scores on the testing set. Note that the hyperparameter nrounds was set to a maximum of 400 iterations.

To further guarantee the stability of our model and avoid overfitting, we applied five-fold cross-validation on the training data.
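
The fold construction behind a five-fold cross-validation can be sketched in a few lines. The shuffling and the round-robin assignment of samples to folds are assumptions, since the text does not describe how the folds were formed.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, validation indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation


splits = list(k_fold_indices(23, k=5, seed=1))
```

Every sample appears in exactly one validation fold, so the averaged validation score uses all training data without ever scoring a model on samples it was fitted on.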

As described in Sect. 2.1.2, we also implemented two rule-based baseline models that serve as a benchmark for our detection. A naive approach defines counterpressing as follows: whenever one or more players are within a five-yard radius around the ball carrier during the first individual ball possession phase following a turnover, it is classified as a counterpress (hereafter referred to as naive rule-based approach).^{Footnote 13} The second approach (hereafter referred to as Andrienko-approach) classifies a turnover as counterpressing whenever the first player in ball possession receives pressure exceeding a certain threshold according to the pressure definition in Andrienko (2017), whereby the final threshold of 0.74 was obtained by maximizing the \(F_1\)-score on the training set.
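
Both baselines can be sketched compactly. The sketch assumes positions in metres, a per-frame list of defender positions, and one precomputed pressure value per turnover for the threshold search; the pressure model of Andrienko (2017) itself is not reproduced, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
FIVE_YARDS_M = 5 * 0.9144  # five yards expressed in metres


def naive_counterpress(frames: Sequence[Tuple[Point, List[Point]]]) -> bool:
    """True if any defender is within five yards of the carrier in any frame.

    `frames` covers the first individual ball possession phase after the
    turnover; each entry is (carrier position, defender positions).
    """
    return any(
        math.hypot(d[0] - c[0], d[1] - c[1]) <= FIVE_YARDS_M
        for c, defenders in frames
        for d in defenders
    )


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    pressures: Sequence[float], y_true: Sequence[int], candidates: Sequence[float]
) -> float:
    """Candidate threshold with the highest F1-score (pressure > threshold => 1)."""
    return max(
        candidates,
        key=lambda th: f1_score(y_true, [int(p > th) for p in pressures]),
    )


chosen = best_threshold([0.1, 0.4, 0.8, 0.9], [0, 0, 1, 1], [0.2, 0.5, 0.85])
```

Selecting the threshold on the training set only, as done for the value of 0.74, keeps the test set untouched for the comparison with the learned models.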

2.3.2 Effectiveness for counterpressing and fast possession regains

To define a success metric for a transition phase, the low-scoring nature of football requires us to examine the subsequent actions at a finer granularity, rather than just checking whether they are followed by a goal. For both cases, successful and unsuccessful ball recoveries through counterpressing, we extracted shots taken, expected goals^{Footnote 14} and actual goals following a transition phase. To investigate this issue, two definitions had to be made: which ball recovery latency of a counterpressing strategy should be considered successful, and for how long a defensive or an offensive action should be accredited to the previous ball recovery (strategy). As a starting point for a potential threshold for successful counterpressing, a first indicator is given by Pep Guardiola’s five second rule. We queried relevant video scenes with possession regains after 3, 4, 5, 6, 7 and 8 seconds and discussed them with a group of professional match-analysts. The same procedure was conducted to investigate the follow-up goal-scoring opportunities. Here, scenes with shots 10, 15, 17, 20 and 25 seconds after the initial ball loss were discussed. Through this procedure, we finally agreed on the definitions described in Sect. 2.1.2.

3 Results

3.1 Statistical evaluation

3.1.1 Detection of counterpressing

With the above described supervised machine learning set-up we are able to detect counterpressing situations with sufficient accuracy for practical applications (see also Sect. 3.3). Table 3 shows a statistical evaluation of the different models, of which XGBoost performed best. Per team and per match, we detect around 20 to 30 counterpressing situations, out of around 90 to 200 transition situations.

Table 3 Statistical evaluation of the counterpressing outcome

With the highest overall area under the curve (AUC), the above presented optimization (Table 3, row 1) is best suited for the long-term analysis of several seasons with the goal of identifying trends and underpinning practitioner rules (RQ1-RQ4, Sect. 3.2). The XGBoost model with a class balancer (Table 3, row 4) has a higher recall of \(80 \%\) while still having an acceptable false positive rate. Thus, it can be applied for specific performance analysis of either the own match or several matches of the next opponent (PA, Sect. 3.3), where match-analysts spend a lot of time analyzing video footage either way. The optimal hyperparameters used for both models can be found in Table 2. The two baseline approaches exhibit a very low overall accuracy. The naive rule-based approach (Table 3, row 5) classifies \(72.41 \%\) of all turnovers as counterpressing, which leads to a high recall but also a large number of false positives. For the Andrienko-approach, selecting the threshold by optimizing the \(F_1\)-score led to a more realistic percentage of predicted counterpressing situations in the test set (\(25.65\%\)), but it is nevertheless significantly outperformed by either machine learning approach.
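
The metrics discussed here can be reproduced from labels and model outputs with a few lines of code. The sketch below computes the AUC through its rank interpretation (the probability that a random positive receives a higher score than a random negative) and derives recall and false positive rate from binary predictions; it is a plain illustration, not the evaluation code used for Table 3.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The pairwise loop is quadratic and
    only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, false positive rate) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
recall, fpr = recall_and_fpr([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

The AUC is independent of any decision threshold, whereas recall and false positive rate depend on it, which is why the two XGBoost variants can share a similar AUC yet suit different use cases.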

Another advantage of the XGBoost approach is that the individual influences of our rich feature set can be quantified and interpreted to some degree by analyzing the respective SHAP-values.^{Footnote 15} The name derives both from Lloyd Shapley, who introduced the underlying values in the context of cooperative game theory (Roth and Thomson 1988), and from Lundberg and Su (2017), who used the concept to interpret the features of machine learning models. In comparison to traditional feature importance methods (e.g. gain or the Saabas method), SHAP-values present a consistent and locally accurate way to identify the individualized feature contribution to machine learning models. This method has been effectively used in different applications (Antipov et al. 2020; Meng et al. 2020; Ibrahim et al. 2020; Anzer and Bauer 2021).

Figure 2 displays the most influential features according to the SHAP values in two different representations. In the left panel, each dot represents the contribution of the feature to the model, whereas the color-coding describes the value of that feature. Both the absolute individual ball possession time (IndividualBallPossession (abs)) and the speed of the regaining team two seconds after the change in ball possession (SpeedRegainingTeamPlayer (2s)) have a very strong and linear impact on the predictions. Besides the fact that both features have the highest overall influence on the prediction (widest dispersion of the dots in the left part of Fig. 2), the interpretation of the SHAP-values can be expressed as follows: the higher the absolute amount of individual ball possession time within the first three seconds and the higher the speed of the regaining team two seconds after the turnover, the more likely a defensive turnover is classified as counterpressing. The number of opposing players behind the ball after one second (PlayersOpposingTeamBehindBall (1s)) influences the prediction in a different way: A high number of players behind the ball increases the chances for a classification of counterpressing, but the relation is clearly non-linear. To get a better idea of the influence, we look at the right part of Fig. 2. The value per feature is now displayed on the x-axis, whereas the model influence is shown on the y-axis. If fewer than four players are behind the ball, this feature on average decreases the chance of a counterpressing classification. The number of four defenders, almost half of the team, seems to be a decisive threshold. If four or more players are behind the ball, this feature has a positive contribution to the prediction. This not only aligns with the expectation of the practitioners, but also led to a very valuable discussion among the professional analysts. A more complex relation is shown by the local stretch index of the five players of the opposing team closest to the ball two seconds after the ball possession change (OpposingTeamlocal5 stretch (2s)). After a steep increase starting at 600 cm, the influence of that feature reaches its maximum at roughly 1,000 cm (see right plot in Fig. 2) but decreases afterwards. This indicates that, up to that point, a higher stretch index (lower compactness) of the opposing team after two seconds increases the chances for counterpressing.

Excluding features with little to no influence according to the SHAP-values improved neither the \(F_1\)-score nor the AUC of our prediction on the test data-set.

3.1.2 Effects of counterpressing

Table 4 shows the outcome of counterpressing in terms of shots and goals scored or conceded within 20 seconds. If a team is successful, i.e. wins back the ball within five seconds, its chance of scoring increases tremendously; if unsuccessful, it is far likelier to concede a goal.

Table 4 For all 4118 considered matches of the German Bundesliga and \(2{\text {nd}}\) Bundesliga, this table shows the probability of shots and goals per team following counterpressing situations


It is no surprise that the chances to either shoot or score are significantly higher when counterpressing was applied successfully, since it implies the crucial attacking advantage of having gained possession of the ball.

While using goals scored versus conceded would theoretically be ideal to measure success, the low-scoring nature of football often prevents us from doing so. Therefore, we use shots to compare teams and coaches. Nevertheless, according to the experts, looking at both the shot and the goal (or even expected goal) balance is a very valuable key performance indicator for counterpressing.
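A shot balance of this kind can be sketched as follows, assuming it is the number of shots created within 20 seconds after successful counterpressing minus the number of shots conceded within 20 seconds after failed attempts. The record layout and field names are hypothetical and serve only to illustrate the bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class Counterpress:
    regained_within_5s: bool       # success criterion used in this paper
    shot_for_within_20s: bool      # own shot within 20 s of the turnover
    shot_against_within_20s: bool  # opponent shot within 20 s of the turnover


def shot_balance(situations):
    """Shots created after successful attempts minus shots conceded after failed ones."""
    created = sum(s.shot_for_within_20s for s in situations if s.regained_within_5s)
    conceded = sum(
        s.shot_against_within_20s for s in situations if not s.regained_within_5s
    )
    return created - conceded


# Hypothetical toy match: one shot created, two shots conceded.
match = [
    Counterpress(True, True, False),
    Counterpress(True, False, False),
    Counterpress(False, False, True),
    Counterpress(False, False, True),
    Counterpress(False, False, False),
]
balance = shot_balance(match)
```

The same function yields a goal balance if the two shot flags are replaced by goal flags.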

3.2 Subject-specific evaluation of six seasons of German Bundesliga data

In the following section, we use our quantitative results as a baseline for a qualitative, subject-specific evaluation and interpretation with the involved experts.

3.2.1 Lessons learned about defensive transitions (RQ1)

A common procedure when analyzing a team’s transition strategy is to look at the easily acquirable defensive reaction time (Vogelbein et al. 2014). This, however, has the drawback that it cannot distinguish between situations with intentional counterpressing behavior and noise. Note that in non-trivial defensive turnover situations a team can typically choose between falling back and counterpressing. For the purpose of this study, however, the distinction between fallback and other non-counterpressing situations was removed for the sake of simplicity. An additional analysis of the expert-based labeling (see Appendix A) showed that around \(62.5\%\) of all ball losses fulfilling the inclusion criteria cannot be assigned to either defensive strategy (counterpressing or fallback). The sheer number of these situations, which have a very short defensive reaction time (on average 7.83 s) and show no detectable defensive tactical choice, significantly influences the defensive reaction time when it is computed over all transitions. Further analysis (see Appendix A) shows that specifically turnovers with very short individual ball possession times fall into this category. Their exclusion is a crucial step towards a better understanding of transition situations.
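The effect of unassigned situations on an aggregated defensive reaction time can be sketched with a few toy values. All numbers and label names below are hypothetical; the sketch only shows how the average changes once it is restricted to situations labeled as counterpressing, and how the five-second success criterion would be applied.

```python
# (defensive reaction time in seconds, expert label); hypothetical toy values
transitions = [
    (2.0, "unassigned"),
    (3.0, "unassigned"),
    (4.0, "counterpressing"),
    (12.0, "fallback"),
    (6.0, "counterpressing"),
]


def mean_reaction_time(rows, label=None):
    """Average reaction time over all rows, or only over rows with the given label."""
    times = [t for t, lab in rows if label is None or lab == label]
    return sum(times) / len(times)


def success_rate(rows, label="counterpressing", threshold_s=5.0):
    """Share of labeled situations in which the ball is regained within the threshold."""
    times = [t for t, lab in rows if lab == label]
    return sum(t <= threshold_s for t in times) / len(times)


all_mean = mean_reaction_time(transitions)
cp_mean = mean_reaction_time(transitions, "counterpressing")
cp_success = success_rate(transitions)
```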

In general, counterpressing is not always advantageous and needs to be executed well. Although the expectation of some practitioners (invigorated by Jürgen Klopp’s statement "counterpressing is the best playmaker") may be different, it is still intuitive that the team in possession of the ball has a higher chance of performing a successful offensive action (e.g. a shot or goal; see Table 4, row 1). However, this is an average over all teams independent of their skill level and, as we will see later, some teams/coaches applied counterpressing so successfully that they ended up with a positive shot balance, creating more shots after counterpressing than they conceded. In order to properly assess the risk versus reward nature of counterpressing, one would ideally compare it to its strategic counterpart, falling back. But even then, one would need to carefully address potentially confounding variables describing the original situations since, as the feature importance discussion suggests, the situation typically dictates the strategic response. This goes beyond the scope of this study, but could be the basis for interesting future work. Additionally, since the non-counterpressing situations comprise a myriad of different circumstances, they do not serve as a reasonable baseline for the effectiveness of counterpressing.

3.2.2 Define and statistically underpin objective Benchmarks (RQ2)

Based on the definitions and trained prediction models explained above, several quotients and ratios were discussed with the experts. Aggregated on a season level, we analyzed their correlation with a team’s final ranking. Our detection provides several different metrics with a significant correlation to success. These performance indicators can be calculated per match, per match phase (e.g. one half) or even per turnover, which allows practitioners to objectively compare their team’s performance with pre-defined benchmarks. The ratio of successful counterpressing situations to the total number of transitions predicts a team’s final ranking best, with a negative Pearson correlation (\(r=-0.44\)). Another metric correlating with the final ranking, and one that was very valuable to the experts due to its direct interpretability, was the shot and goal balance (\(r=-0.36\) shots, \(r=-0.42\) goals), which describes whether more shots or goals are produced after successful counterpressing than are conceded after failed attempts. However, these metrics should be used carefully since they are based on a small sample size and could be confounded with the overall offensive or defensive qualities of a team. Losing the ball increases the probability of the opponent conducting an offensive action. We therefore present an effective way to monitor the outcome of counterpressing strategies, in the form of several performance indicators that enable coaches to objectively benchmark a team’s defensive transition behavior.
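The sign of these correlations follows from the fact that rank 1 denotes the best team, so a metric that is higher for better teams correlates negatively with the final ranking. A minimal sketch of the Pearson coefficient on hypothetical season-level values (not data from this study) illustrates this:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ratios of successful counterpressing situations to all transitions,
# listed together with the final table rank of each team (1 = champion).
success_ratio = [0.12, 0.10, 0.11, 0.08, 0.07, 0.06]
final_rank = [1, 2, 3, 4, 5, 6]
r = pearson_r(success_ratio, final_rank)  # negative: higher ratio, better rank
```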

3.2.3 Approve established rules of thumb (RQ3)

A widespread rule of thumb is that counterpressing is ideal after ball losses close to the sideline or close to the corners in the opponent’s half. Fig. 3 presents two heatmaps that underpin this statement. Secondly, we want to examine whether a numerical superiority of players close to the ball increases the chance of a successful counterpress, as assumed by many experts. For that, we examine the 109,852 detected counterpressing situations satisfying the inclusion criteria. Whenever the team out of possession has a numerical superiority within a 10 m radius around the ball at the time of the turnover, they regain ball possession within 5 seconds \(36.2\%\) of the time, compared to only \(30.2\%\) when the other team has more players in that area. This indicates that the rule of thumb has some truth to it, but that numerical superiority is far from the only factor deciding whether a counterpress will be successful.
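The numerical-superiority check can be sketched as a simple count of players within the 10 m radius at the time of the turnover. The function name, the coordinate convention (meters) and the handling of players exactly on the boundary are assumptions made for this example.

```python
import math


def has_numerical_superiority(ball, pressing_team, opponents, radius_m=10.0):
    """True if the team out of possession has more players than the opponent
    within radius_m of the ball (positions in meters)."""
    own = sum(math.dist(p, ball) <= radius_m for p in pressing_team)
    opp = sum(math.dist(p, ball) <= radius_m for p in opponents)
    return own > opp


# Hypothetical turnover: two pressing players and one opponent near the ball.
ball = (0.0, 0.0)
pressing_team = [(3.0, 4.0), (5.0, 5.0), (20.0, 0.0)]
opponents = [(1.0, 1.0), (15.0, 15.0)]
superiority = has_numerical_superiority(ball, pressing_team, opponents)
```

Grouping all detected counterpressing situations by this flag and comparing the share of regains within five seconds in each group yields the kind of comparison reported above.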

3.2.4 Compare Team’s and Coach’s Counterpressing in German Bundesliga (RQ4)

Further investigations highlight the extent to which teams use completely different defensive transition strategies. We investigated the coaches whom the experts expected to show pronounced counterpressing behavior. The respective ranking among all coaches finishing a full season can be found in Appendix C.

Table 5 Comparison of counterpressing-related performance indicators in Bundesliga. The colored arrows display the rankings within the respective groups compared to the average (teams by ranking; all teams by seasons; teams and coaches). For the columns presenting the shot and goal balance we used colored dots for the teams and coaches, showing whether the total outcome per match is positive, neutral or negative


Table 5 also gives a first indication of which further aspects could be considered. Teams finishing in the top 5 of a season perform above average in almost all defined metrics. No consistent tendency over the considered seasons is detectable: games do not get more intense in terms of total in-play transitions per match, nor are there significant changes in teams’ average counterpressing behavior. FC Bayern München has the best goal balance after counterpressing, which might be heavily influenced by their offensive efficiency. Jürgen Klopp and Ralf Rangnick performed extraordinarily well in terms of ending up with more created than received shots after attempting counterpressing. Given that Jürgen Klopp finished only seventh in one of his two considered seasons, this should be seen as an outstanding performance. Over the course of a whole season, only nine coaches achieved a non-negative shot balance within 20 s after their own counterpressing; the average final ranking of the respective teams was four. Note that teams playing at home tend to conduct counterpressing slightly more often than away teams. Considering only home teams, our model classifies \(27.24\%\) of all included defensive transitions as counterpressing, which is roughly in line with the labeled training data, in which only home teams were labeled (\(28.77\%\)).

Figure 4 shows a shortlist of teams’ counterpressing outcomes. Note that, since on average more shots/goals are conceded than created when counterpressing, all four axes are scaled differently. For both sub-figures, teams on the upper left, including, for example, 1. FC Nürnberg^{Footnote 16}, Hannover 96^{Footnote 17} and SV Darmstadt 98^{Footnote 18} in the left figure, perform worse. Teams on the bottom right tend to generate more shots/goals and allow fewer while counterpressing. Teams with high values in the top right quadrant, like Borussia Mönchengladbach^{Footnote 19}, FC Augsburg^{Footnote 20} or TSG Hoffenheim^{Footnote 21}, seem to employ a risky defensive transition strategy, both creating and conceding many shots after their own counterpressing.

As a general recap of the Bundesliga analysis, we would like to point out that teams use significantly different transition strategies (RQ4). The experts’ expectations of which coaches use counterpressing more often and/or more efficiently were underpinned by the results.

3.3 Proof of concept

The central objective of this study is to automate the detection of counterpressing situations. This helps match-analysts in their daily processes by saving them time, but also by providing objective and comparable benchmarks. We first describe the general set-up and then a prototypical application to two exemplary matches of German national teams.^{Footnote 22}\(^{,}\)^{Footnote 23} Based on the results described above, we are now able to provide match-analysts with two files fully automatically and in virtually real-time. First, they receive a list of all detected counterpressing situations. To integrate this efficiently into their ecosystem, the list is produced in different file formats, which can be imported into their video-analysis tool of choice (e.g. Hudl Sportscode,^{Footnote 24} Stats Edge Viewer^{Footnote 25}). Such tools essentially help to handle tags or labels in combination with the video footage. Figure 5 shows, for an exemplary tool (here Hudl’s Sportscode), how this eliminates the usual process of match-analysts labeling the videos manually. Usually, match-analysts use these tools to tag important situations live during the match, but also in detail post-match for opponent analysis. Once a match or parts of it are tagged, the tool allows the analyst to output the tags either as a video-playlist or as an xml-file containing the category and the time-frame of each tag. Depending on the coaches’ needs, the outcome is presented either as a video-playlist or as a quantitative report giving an aggregated overview, which is also typically produced manually. An automatically generated counterpressing-playlist for the U21 match based on our prediction can be viewed here.
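A tag file of this kind, containing the category and the time-frame of each detected situation, could be generated along the following lines. The element names (`tags`, `tag`, `category`, `start`, `end`) are purely illustrative assumptions and do not reproduce the actual Sportscode import schema, which would have to be taken from the vendor's documentation.

```python
import xml.etree.ElementTree as ET


def build_tag_xml(situations, category="Counterpressing"):
    """Serialize (start_s, end_s) time-frames into a simple, hypothetical tag XML."""
    root = ET.Element("tags")
    for i, (start_s, end_s) in enumerate(situations, start=1):
        tag = ET.SubElement(root, "tag", id=str(i))
        ET.SubElement(tag, "category").text = category
        ET.SubElement(tag, "start").text = f"{start_s:.2f}"
        ET.SubElement(tag, "end").text = f"{end_s:.2f}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical detections: match time in seconds for two situations.
xml_text = build_tag_xml([(12.4, 17.4), (305.0, 310.0)])
parsed = ET.fromstring(xml_text)
```

Keeping the export as a thin serialization step means the same list of detections can be written to any other format a given video-analysis tool expects.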

Second, we automatically provide coaches and analysts with a counterpressing match-report with visualizations after the respective match, as well as reports for entire seasons. For the U21 match, an excerpt of the automated match report is presented in Fig. 6. Only two shots occurred within 20 seconds after either team counterpressed. However, these two shots after unsuccessful counterpressing attempts by the German team led to two goals, which were decisive for the match outcome (Germany vs. Belgium, final score 2–3). Since situations leading to goals are analyzed at a highly detailed level by coaches and analysts in any case, they had arrived at the same conclusion. Nonetheless, this report helps coaching staffs to evaluate whether the German team had a general bias in their defensive transition behavior or whether the two goals happened as a consequence of extraordinary opposing actions or defensive errors from the German team. In this case, poor counterpressing behavior played a significant part in the origin of both goals. Another finding was that the many successful counterpressing situations (\(1.23 \%\) above Bundesliga average) did not lead to a single shot.

The practical implementation of this study was prototyped using two recent matches of the German national teams. After each match, both the Sportscode-xml file and the automated match-report sheets were produced and shared with the experts for their post-match-analysis process. Additionally, to validate the report, the matches were analyzed manually, which provided a ground truth to compare our results with. The overall results are shown in Table 6.

Table 6 Statistical expert-evaluation of the counterpressing outcome. For the defensive transition column, the number in brackets displays the number of scenes excluded by our criteria (see Fig. 1)


For the first game, 153 relevant defensive transition situations of the German national team were queried by the rules defined above. Out of these, 25 scenes were detected as counterpressing, of which 6 were manually excluded from the final counterpressing playlist. For the second match, 33 of the 164 defensive transitions were detected as counterpressing, and analysts ruled out 5 manually. The manually excluded scenes were discussed with all experts, and it turned out that different definitions and interpretations led to different labels. In this case study, ten of the eleven manually excluded situations consisted of only one player exerting pressure. Although this fulfills our definition, some of the experts would classify situations with only one player defending actively towards the ball as fallback (definition in Appendix A). All scenes that were additionally labeled by the respective match-analysts (five in total) contained a clearance shortly after the initial ball loss and were thus not clearly related to the counterpressing strategy.
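The counts above can be turned into a rough precision and recall estimate, assuming that the analysts' final playlist is treated as ground truth and that the five additionally labeled scenes are the only missed ones across both matches. This reading is an assumption for illustration; Table 6 may report the evaluation differently.

```python
# Counts taken from the two prototype matches described in the text.
detected = {"match_1": 25, "match_2": 33}
rejected_by_analysts = {"match_1": 6, "match_2": 5}
added_by_analysts = 5  # assumed to be the total over both matches

n_detected = sum(detected.values())                            # 58
confirmed = n_detected - sum(rejected_by_analysts.values())    # 47

precision = confirmed / n_detected                   # 47 / 58, about 0.81
recall = confirmed / (confirmed + added_by_analysts)  # 47 / 52, about 0.90

# Per-match share of detections that the analysts kept in the playlist.
kept_share = {
    m: (detected[m] - rejected_by_analysts[m]) / detected[m] for m in detected
}
```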

Both the Sportscode-xml and the automated match report turned out to be valuable for the coaching staff. Because the inaccuracies are interpretable, the experts trust the outcome for further applications. They deemed the results sufficient for practical use and noted that the approach saved vast amounts of resources in the pre- and post-match-analysis. The automated match-report (see Fig. 6) provides an objective comparison with a flexible benchmark (here the Bundesliga average) and therefore a new way to approach complex tactical strategies. The shot and goal balance are very intuitive and allow direct monitoring of the efficiency of the conducted strategies.

4 Discussion

This paper shows that complex tactical strategies, such as counterpressing, can be detected automatically based on synchronized positional and event data. Comparing teams’ counterpressing strategies objectively and over longer periods of time creates insights that could not have been achieved with traditional methods.

The interdisciplinary cooperation turned out to be a very beneficial factor for this study. In our opinion, such a combination of competencies is necessary to obtain relevant results. Machine learning techniques are required to detect complex strategies from spatio-temporal data, but tactical football expertise is equally indispensable to determine definitions, extract features, and evaluate and interpret the resulting outcomes. A key lesson we learned through this study is that both the definitions of complex strategies and their interpretation vary between football experts; this became apparent during an intensive process of expert-supported evaluations. One of our most meaningful key performance indicators is the shot balance after counterpressing. Here, shots are used as a proxy for a successful attack. This is a common procedure in football analytics; however, the approach could be extended by using expected goals (e.g. Anzer and Bauer (2021)) or expected possession values (e.g. Spearman (2018)).

Since no comparable approach for detecting counterpressing exists in the literature, we implemented two naive baseline approaches to benchmark our model against (see Table 3). The approach based on Andrienko (2017) was originally designed to quantify pure pressure rather than counterpressing; we built this rule-based approach on Fernandez and Bornn (2018), who defined counterpressing as immediate pressure after losing the ball. Hence, it may not be the ideal approach, but due to the complete lack of alternatives in the literature, we use it as a benchmark model. Even though our model outperforms the different rule-based baseline models (see Table 3) and the prediction accuracy is sufficient for practical application (see section 3.3), the basic limitation to achieving further accuracy is the inter-labeler reliability of \(82.01\%\). After discussing the definitions, no further steps of consolidation between the labelers were conducted, but we would highly recommend such a step, including strict monitoring of the inter-labeler reliability, for similar investigations. However, data labeling is a time-consuming process which cannot be conducted for each occurring philosophy and definition. Furthermore, methodologies that reduce labeling efforts, such as weak supervision, should be implemented on top of general detections like the one presented here to adjust definitions to specific needs and to improve both the accuracy and the degree of individualization (Ratner et al. 2016, 2017). With an even larger amount of labeled data, one could consider using continuous features, or even the raw positional data of all players, instead of features at discrete time points. The application of labeling-support methods could lead to more individualized and more consistent labels and thus to a better prediction.
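Monitoring agreement between labelers can be sketched with two standard measures: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. How the reliability figure in this study was computed is not restated here, so the sketch is a generic illustration on hypothetical labels (1 = counterpressing, 0 = other).

```python
def percent_agreement(a, b):
    """Share of items on which two labelers assigned the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two labelers."""
    n = len(a)
    observed = percent_agreement(a, b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:  # both labelers used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels of ten transitions by two analysts.
labeler_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
labeler_b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
agreement = percent_agreement(labeler_a, labeler_b)
kappa = cohens_kappa(labeler_a, labeler_b)
```

With imbalanced classes, as in transition data where most situations are not counterpressing, kappa is noticeably lower than raw agreement, which is why tracking both can be informative.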

With an even more accurate model using one of the approaches described above, an improved and team-individual success prediction model for counterpressing could significantly support reflection on teams’ decision-making processes. Also, the adaptation of the counterpressing detection itself to team-specific definitions offers huge potential for further investigations.

Vogelbein et al. (2014) evaluated 306 matches of the 2010/2011 Bundesliga season and showed that the time it takes to regain the ball also depends on the score at the time. They pointed out that teams with a lead tend to regain the ball more slowly than teams that are trailing, and that teams finishing their season in the top third of the table regain ball possession significantly faster than the other teams, especially in drawn and losing match states. We found that the defensive reaction time, which serves as a baseline for our success definition, typically includes many noisy situations in which no clear strategy is observable. Further lessons learned regarding defensive transitions are described in section 3.2.1 and extended in Appendix A. The high influence of individual ball possession times on the predictions (see Fig. 2) can be attributed to uncontrolled situations without the possibility for either defensive strategy (counterpressing, fallback). This also reveals a limitation of our work: some of the model’s most important features focus on the situation itself and less on the strategy conducted thereafter. A possible explanation is that, besides filtering out noisy situations, the model found that the situation most often dictates the defensive response.

Nevertheless, a tendency of the opponent to play counterattacks or especially risky passes could lead to many fast ball recoveries independent of the defensive transition strategy. This issue could be addressed either by combining this approach with an equivalent detection of offensive transition strategies, as shown for example in Hobbs et al. (2018), or by including more features of the opposing team or even the raw data of all 22 players and the ball. Practitioners are highly interested not only in the question of in which situations counterpressing induces a high chance of a possession regain, but also, given a situation where counterpressing is conducted, in how likely it is to take or concede a shot when conducting that strategy. Regaining the ball fast might not be the only objective of counterpressing. Consequently, future investigations should also consider quantifying alternative success definitions, e.g. slowing down the opposing attack or forcing back-passes.

Another missing piece that should be investigated further is an accurate selection of fallback situations. Comparing their risk-reward structure to that of counterpressing situations could lead to crucial insights by objectively evaluating a team’s decision to counterpress versus falling back. Since different teams may have their own club-specific definitions, our experimental set-up could be applied to arbitrary counterpressing definitions or even other tactical patterns, as long as sufficient labeled data are gathered. An interesting follow-up study could investigate how many labeled matches would be necessary to achieve sufficient accuracy depending on the definition. In our case we found 100 labeled matches to be sufficient, but we also stress that a high inter-labeler reliability is necessary.

Note that counterpressing is only one example of a complex tactical pattern that is of interest to match-analysts but not covered in typical event-level data. The needs of match-analysis departments, combined with the growing availability and accuracy of positional and event data, present a huge potential for task-automating approaches.

5 Conclusions

Based on expert-evaluated definitions and hand-crafted labels, we are able to detect counterpressing strategies automatically with sufficient accuracy in a supervised machine learning set-up. By producing both an understandable match-report and tagging-files suitable for conventional video-analysis software, the integration of the process into a match-analyst’s daily business saves a significant amount of time. The outcome helps to analyze a team’s own performance and provides helpful information about the next opponent’s defensive transition behavior (PA).

We can differentiate between intended counterpressing strategies and the many uncontrolled transition situations with short defensive reaction times. This provides not only a better understanding of transitions but also several more granular performance indicators describing defensive transitions (RQ1). From these performance indicators, consolidated by statistical analysis and expert opinions, we derived interpretable and intuitive metrics (RQ2), such as the goal or shot balance, which quantify the efficiency of the counterpressing strategy; these metrics were not used before but, according to the experts, seem to have huge potential. Two rules of thumb supported by our data are that counterpressing is more likely to succeed closer to the sidelines and that a numerical superiority close to the ball increases the chance of winning it back (RQ3). Through analyzing different facets over several seasons we are also able to quantify trends over a long period of time: teams within the German Bundesliga follow appreciably different transition strategies (RQ4). Furthermore, successful teams, measured against their final ranking, tend to use the counterpressing strategy more efficiently, giving credence to the notion of declaring it an offensive strategy (RQ2).

Notes

“[..] after losing the ball, the team has five seconds to retrieve the ball, or, if unsuccessful, tactically foul their opponent and fallback”, Pep Guardiola; https://www.theblizzard.co.uk/article/peps-four-golden-rules, accessed 06/20/2020.
https://www.sueddeutsche.de/sport/premier-league-bei-klopps-liverpoolern-klemmt-das-gaspedal-1.2695408-2, accessed 06/20/2020.
https://tactical-times.com/the-history-and-evolution-of-jurgen-klopp/, accessed 06/20/2020.
Often referred to as expected goals (xG) values.
Will Gürpinar-Morgen (2018). "How StatsBomb data helps measure counterpressing", Statsbomb Innovation in Football Conference 2018; https://statsbomb.com/2018/05/how-statsbomb-data-helps-measure-counter-pressing/, accessed 11/11/2020.
Karun Singh (2020). "Learning to watch football: self-supervised representations for tracking data" In OptaPro Analytics Forum, London; https://www.youtube.com/watch?v=H1iho17lnoI, accessed 11/11/2020.
Germany against Northern Ireland; 19th of November 2019, Commerzbankarena Frankfurt.
Germany U21 against Belgium U21; 17th of November 2019, Schwarzwald-Stadion Freiburg.
https://chyronhego.com/products/sports-tracking/tracab-optical-tracking/, accessed 06/20/2020.
https://www.sportec-solutions.de/en/index.html, accessed 06/20/2020.
Statsbomb is a football event data provider based in the UK, https://statsbomb.com/, accessed 12/17/2020.
StatsBomb event data, including the pressing tag, can be accessed for many professional leagues at https://statsbomb.com/data/.
This approach has been suggested at the Statsbomb Innovation in Football Conference by Will Gürpinar-Morgen (https://statsbomb.com/2018/05/how-statsbomb-data-helps-measure-counter-pressing/), accessed 11/11/2020.
The “Expected Goal” (xG) value of a shot denotes the a priori probability of a shot being converted to a goal. Hence its value lies in [0, 1]. The probability is estimated using both tracking and event data and applying a machine learning model that was trained on more than 100,000 shots. Details regarding the xG-model used can be found in Anzer and Bauer (2021).
The abbreviation SHAP stands for SHapley Additive exPlanation.
Highest in the left figure.
Green circle with the inscription 96.
Blue circle around a white lily.
Black and white hatched diamond logo roughly in the center of both plots.
Fifth highest in the left figure.
Second from the right in the right figure.
Germany against Northern Ireland; 19th of November 2019, Commerzbankarena Frankfurt.
Germany U21 against Belgium U21; 17th of November 2019, Schwarzwald-Stadion Freiburg.
https://www.hudl.com/products/sportscode, accessed 06/20/2020.
https://www.statsperform.com/team-performance/football/stats-edge/, accessed 06/20/2020.

References

Andrienko G et al (2017) Visual analysis of pressure in football. Data Mining Knowl Discov 31(6):1793–1839. https://doi.org/10.1007/s10618-017-0513-2
Andrienko G et al (2019) Constructing Spaces and Times for Tactical Analysis in Football. IEEE Trans Vis Comput Graph. https://doi.org/10.1109/tvcg.2019.2952129
Antipov EA, Pokryshevskaya EB (2020) Interpretable machine learning for demand modeling with high-dimensional data using Gradient Boosting Machines and Shapley values. J Revenue Pricing Manag 19(5):355–364. https://doi.org/10.1057/s41272-020-00236-4
Anzer G, Bauer P (2021) “A Goal Scoring Probability Model based on Synchronized Positional and Event Data”. Frontiers in Sports and Active Living (in print), Using Artificial Intelligence to Enhance Sport Performance, pp. 1-18. https://doi.org/10.3389/fspor.2021.624475.
Bergstra J et al (2011) “Algorithms for hyper-parameter optimization”. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011, pp. 1–9
Bialkowski A et al (2014) “Large-Scale Analysis of Soccer Matches Using Spatiotemporal Tracking Data”. In: Proceedings - IEEE International Conference on Data Mining, ICDM 2015-January, pp. 725–730. issn:15504786. https://doi.org/10.1109/ICDM.2014.133
Bialkowski A et al (2015) “Identifying team style in soccer using formations learned from spatiotemporal tracking data”. In: IEEE International Conference on Data Mining Workshops, ICDMW 2015.January, pp. 9-14. issn: 23759259. https://doi.org/10.1109/ICDMW.2014.167.
Bojinov I, Bornn L (2016) “The Pressing Game: Optimal Defensive Disruption in Soccer”. In: MIT Sloan Sports Analytics Conference, pp. 1–8
Bourbousson J, Sève C, McGarry T (2010) “Space-time coordination dynamics in basketball: Part 2. the interaction between the two teams”. In: Journal of Sports Sciences 28.3, pp. 349–358. issn: 02640414. https://doi.org/10.1080/02640410903503640
Brefeld U, Lasek J, Mair S (2019) Probabilistic movement models and zones of control. Mach Learn 108(1):127–147. https://doi.org/10.1007/s10994-018-5725-1
Chen T, Guestrin C (2016) “XGBoost: A scalable tree boosting system”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 13-17-Augu, pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Decroos T, Van Haaren J, Davis J (2018) “Automatic discovery of tactics in spatio-temporal soccer match data”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. isbn: 9781450355520. https://doi.org/10.1145/3219819.3219832. url: https://people.cs.kuleuven.be/~jesse.davis/decroos-kdd18.pdf
Decroos T et al (2020) “VAEP: An Objective Approach to Valuing On-the-Ball Actions in Soccer (Extended Abstract)”. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), pp. 4696–4700. issn: 10450823. https://doi.org/10.24963/ijcai.2020/648.
Dewnacker Ian, Michael McCourt, Scott Clark (2016) “Bayesian Optimization for Machine Learning. A Practical Guidebook”. arXiv:1612.04858
Fairchild A, Pelechrinis K, Kokkodis M (2018) “Spatial analysis of shots in MLS: A model for expected goals and fractal dimensionality”. In: Journal of Sports Analytics 4.3, pp. 165–174. issn: 2215020X. https://doi.org/10.3233/jsa-170207.
Navarro Férnandez, J (2018) Analysis of Styles of Play in Soccer and Their Effectiveness. isbn: 9788413060576
Fernandez J, Bornn L (2018) “Wide Open Spaces : A statistical technique for measuring space creation in professional soccer”. In: MIT Sloan Sports Analytics Conference, pp. 1–19
Goes F, Kempe M, Koen L (2019). “Predicting match outcome in professional Dutch football using tactical performance metrics computed from position tracking data”. In: MathSport International Conference June, pp. 4-5
Goes F et al (2020) Unlocking the potential of big data to support tactical performance analysis in professional soccer: a systematic review. Euro J Sport Sci. https://doi.org/10.1080/17461391.2020.1747552
Grant AG et al (1999) Analysis of the goals scored in the 1998 World Cup. J Sports Sci 17(10):826–827
Herold M et al (2019) Machine learning in men’s professional football: current applications and future directions for improving attacking play. Int J Sports Sci Coaching. https://doi.org/10.1177/1747954119879350
Hobbs J et al (2018) “Quantifying the Value of Transitions in Soccer via Spatiotemporal Trajectory Clustering”. In: MIT Sloan Sports Analytics Conference, pp. 1–11
Hughes M, Franks IM (2015) Essentials of performance analysis in sport. Vol. 53. 04, pp. 53-1831. https://doi.org/10.5860/choice.193440. url: https://books.google.de/books?hl=de&lr=&id=KorCDwAAQBAJ&oi=fnd&pg=PT14&dq=the+essentials+of+performance+analysis&ots=ZhJd6413Fq&sig=FMnUcL21bpzACnWTzNhxESEgUY4#v=onepage&q=theessentialsofperformanceanalysis&f=false
Ibrahim L et al (2020) Explainable prediction of acute myocardial infarction using machine learning and shapley values. IEEE Access. https://doi.org/10.1109/access.2020.3040166
Kempe M et al (2014) Possession vs. direct play: evaluating tactical behavior in elite soccer. Int J Sports Sci 4(6A):35–41. https://doi.org/10.5923/s.sports.201401.05
Kim S (2004) Voronoi Analysis of a Soccer Game. Nonlinear Anal Model Control 9(3):233–240. https://doi.org/10.15388/na.2004.9.3.15154
Li TR et al (2019). Sentiment-based prediction of alternative cryptocurrency price fluctuations using gradient boosting tree model. https://doi.org/10.3389/fphy.2019.00098
Link D, Lang S, Seidenschwarz P (2016) Real time quantification of dangerousity in football using spatiotemporal tracking data. PLoS ONE. https://doi.org/10.1371/journal.pone.0168768
Link D, Hoernig M (2017) Individual ball possession in soccer. PLoS ONE 12(7):1–15. https://doi.org/10.1371/journal.pone.0179953
Liu K, Chen W, Lin H (2020) XG-PseU: an eXtreme Gradient Boosting based method for identifying pseudouridine sites. Mole Genet Genom 295(1):13–21. https://doi.org/10.1007/s00438-019-01600-9
Lucey P et al (2014) “‘Quality vs Quantity’: Improved Shot Prediction in Soccer using Strategic Features from Spatiotemporal Data”. In: MIT Sloan Sports Analytics Conference, pp. 1–9. url: http://www.sloansportsconference.com/?p=15790
Lundberg SM, Lee SI (2017) “Consistent feature attribution for tree ensembles”. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1–9
Meng Y et al (2020) What Makes an Online Review More Helpful: An Interpretation Framework Using XGBoost and SHAP Values. J Theor Appl Electron Commerce Res 16(3):466–490. https://doi.org/10.3390/jtaer16030029
Pappalardo L et al (2019) A public data set of spatio-temporal match events in soccer competitions. Scientific Data 6(1):236. https://doi.org/10.1038/s41597-019-0247-7
Power P et al (2018) “Mythbusting Set-Pieces in Soccer”. In: MIT Sloan Sports Analytics Conference, pp. 1–12
Rathke A (2017) “An examination of expected goals and shot efficiency in soccer”. In: Journal of Human Sport and Exercise 12.Proc2. issn: 1988-5202. https://doi.org/10.14198/jhse.2017.12.proc2.05. url: http://www.redalyc.org/articulo.oa?id=301052437005
Ratner Alexander J et al (2017) “Learning to compose domain-specific transformations for data augmentation”. In: Advances in Neural Information Processing Systems Nips, pp. 3237–3247. issn: 10495258
Ratner A et al (2016) “Data programming: Creating large training sets, quickly”. In: Advances in Neural Information Processing Systems Nips, pp. 3574–3582. issn: 10495258
Reep C, Benjamin B (1968) Skill and Chance in Association Football. J Royal Stat Soc Series A (General). https://doi.org/10.2307/2343726
Rein R, Memmert D (2016) “Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science”. In: SpringerPlus 5.1. issn: 21931801. https://doi.org/10.1186/s40064-016-3108-2.
Robberechts P (2019) “Valuing the Art of Pressing”. In: StatsBomb Innovation in Football Conference 2019, p. 11. url: http://statsbomb.com/wp-content/uploads/2019/10/Pieter-Robberechts-Valuing-the-Art-of-Pressing.pdf
Roth Alvin E, Thomson W (1988) The Shapley Value: Essays in Honor of Lloyd S. Shapley. isbn: 052136177X. https://doi.org/10.2307/2554979.
Santos A. Benito et al (2018) “Data-driven visual performance analysis in soccer: An exploratory prototype”. In: Frontiers in Psychology 9. issn: 16641078. https://doi.org/10.3389/fpsyg.2018.02416.
Shaw L, Glickman M (2019) “Dynamic analysis of team strategy in professional football”. In: Barça sports analytics summit. Retrieved from https://static.capabiliaserver.com/frontend/clients/barca/wp_prod/wp-content/uploads/2020/01/56ce723e-barca-conferencepaper-laurie-shaw.pdf
Spearman W (2018) “Beyond Expected Goals”. In: MIT Sloan Sports Analytics Conference, pp. 1–17
Steiner S et al (2019) Outplaying opponents-a differential perspective on passes using position data. German J Exerc Sport Res. https://doi.org/10.1007/s12662-019-00579-0
Travassos B et al (2013) Performance analysis in team sports: advances from an ecological dynamics approach. Int J Perform Anal Sport 13(1):83–95. https://doi.org/10.1080/24748668.2013.11868633
Vogelbein M, Nopp S, Hökelmann A (2014) “Defensive transition in soccer - are prompt possession regains a measure of success? A quantitative analysis of German Fußball-Bundesliga 2010/2011”. In: Journal of Sports Sciences 32.11, pp. 1076–1083. issn: 1466447X. https://doi.org/10.1080/02640414.2013.879671. url: http://dx.doi.org/10.1080/02640414.2013.879671
Wang Y (2019) “A Xgboost Risk Model Via Feature Selection and Bayesian Hyper -Parameter Optimization”. arXiv:1901.08433
Zhang W et al (2020) “Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization”. Geoscience Frontiers 12(1):469–477. https://doi.org/10.1016/j.gsf.2020.03.007

Acknowledgements

This work would not have been possible without the perspective of professional match-analysts from world-class teams, who helped us to define relevant features and spent much time evaluating (intermediate) results. We would cordially like to thank Dr. Stephan Nopp and Christofer Clemens (head match-analysts of the German men's national team), Jannis Scheibe (head match-analyst of the German U21 men's national team) as well as Sebastian Geißler (former match-analyst of Borussia Mönchengladbach). Additionally, the authors would like to thank Dr. Hendrik Weber and Deutsche Fußball Liga (DFL) / Sportec Solutions AG for providing the positional and event data.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Sport Psychology and Research Methods, Institute of Sports Science, University of Tübingen, Tübingen, Germany
Pascal Bauer & Gabriel Anzer
DFB-Akademie, Deutscher Fußball-Bund e.V. (DFB), Frankfurt, Germany
Pascal Bauer
Sportec Solutions AG, subsidiary of the Deutsche Fußball Liga (DFL), Munich, Germany
Gabriel Anzer

Corresponding author

Correspondence to Pascal Bauer.

Ethics declarations

Ethical Approval

Because all participating players were informed, all tracking is compliant with the General Data Protection Regulation (GDPR; https://gdpr-info.eu/, accessed 07/20/20). Ethics approval for a wider research program using the respective data has been authorized by the ethics committee of the Faculty of Economics and Social Sciences at the University of Tübingen. The data are the property of the DFL e.V. / DFB e.V. and cannot be shared publicly. However, interested researchers can request samples of the data under non-disclosure agreement constraints at the respective institutions. Given the description of the respective tracking vendors and systems, peers working in the football industry can reproduce the results using any kind of professional football data.

Additional information

Responsible editor: Albrecht Zimmermann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 21678 KB)

Supplementary material 2 (mp4 180618 KB)

Supplementary material 3 (mov 124481 KB)

Appendices

Appendix

Appendix A gives additional information on the labeling conducted for this study, including a further tactical explanation of defensive transitions. Appendices B, C, and D show the outcome of the defined counterpressing performance indicator per team, per coach, and per season on the full Bundesliga data-set.

For all columns in Appendices B, C and D, T stands for turnovers, M for matches, CP for counterpressing, S for shots and G for goals. A \(+\) indicates successful or positive offensive actions and a \(-\) the opposite, while \(+/-\) denotes the offensive/defensive balance as described above. The first table, in Appendix B, compares all teams playing in the Bundesliga between the 2013/2014 and 2019/2020 (until matchday 26) seasons. The teams are ordered by their average final ranking; the number of matches considered is shown in brackets. A second table, in Appendix C, presents all coaches active in the Bundesliga between the 2013/2014 and 2018/2019 seasons, whereas a third table shows all teams per season, ordered first by their final ranking and second by the column CP+/T. Only full seasons with all 34 matches under one and the same coach were considered. The ordering is based on the feature that correlates most strongly with a team's final ranking: the number of successful counterpressing situations per transition. Appendix D compares teams on a season level and is sorted by the respective final ranking, which is shown in brackets.
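The two-level ordering described for the season-level table can be sketched in a few lines. This is only an illustration under stated assumptions: the class `TeamSeason`, its field names, the three example rows and all numbers are hypothetical, and sorting CP+/T in descending order within the same final ranking is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """One table row (hypothetical layout, not the authors' data model)."""
    team: str
    final_rank: int   # final ranking in the respective season
    turnovers: int    # column T
    cp_success: int   # column CP+ (successful counterpressing situations)

    @property
    def cp_plus_per_turnover(self) -> float:
        # Column CP+/T: successful counterpressing situations per turnover.
        return self.cp_success / self.turnovers


# Made-up example rows; the same rank can occur once per season.
rows = [
    TeamSeason("Team A", final_rank=2, turnovers=4000, cp_success=520),
    TeamSeason("Team B", final_rank=1, turnovers=4200, cp_success=630),
    TeamSeason("Team C", final_rank=2, turnovers=3900, cp_success=585),
]

# Sort by final ranking first, then by CP+/T (assumed: highest value first).
ordered = sorted(rows, key=lambda r: (r.final_rank, -r.cp_plus_per_turnover))

for row in ordered:
    print(row.team, row.final_rank, round(row.cp_plus_per_turnover, 3))
```

Negating the ratio in the sort key keeps the ranking ascending while placing the higher CP+/T value first among rows that share a final ranking.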

Appendix A

As discussed in Sect. 3.2, the actual labeling was simplified for our analysis. During the expert labeling process (see Sect. 2.2.1), the experts labeled not only counterpressing but also fallback. This strategy is regarded as the alternative to counterpressing and was defined as a defensive transition phase in which all players intend either to remain passive or to move back into their defensive line-up without exerting pressure on the ball. In this context, the terms forward- and backward-defending are often used interchangeably with counterpressing and fallback, respectively. Figure 7 shows exemplary situations for both strategies and lists some typical observations for each strategy.

Turnovers in which neither of the two strategies can be identified are labeled as undefined or uncontrolled transition situations. As pointed out in this study, this usually occurs due to short or very uncontrolled ball possession phases (e.g. headers after a corner situation). These situations typically exhibit a very low (relative) individual ball possession time in the three seconds after the turnover, which is why this feature has a high impact on the resulting XGBoost model (see Fig. 2). In total, 20,928 defensive transition situations from the first eleven matchdays of the 2018/2019 Bundesliga season were labeled based on the definitions formulated above (and in Sect. 2.1.2). The task was to label situations with a detectable strategy as either counterpressing or fallback and not to label any defensive transitions in which no strategy was noticeable. Through this procedure, the labelers (students with a background in football tactics) excluded on average \(62.50\%\) of all turnover situations by implicitly labeling them as undefined. In total, out of 11,108 relevant defensive turnovers (after applying the inclusion criteria), 3,196 situations were labeled as counterpressing, 970 as fallback, and 6,942 were explicitly dropped as uncontrolled. Table 7 shows the outcome of the full labeling.
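The label counts reported above can be cross-checked with simple arithmetic. The following sketch uses only the numbers stated in this appendix; the variable names are illustrative and not taken from the study.

```python
# Label counts for the relevant defensive turnovers, as reported in Appendix A.
label_counts = {
    "counterpressing": 3196,
    "fallback": 970,
    "undefined": 6942,
}

# Number of relevant turnovers after applying the inclusion criteria.
total_relevant = sum(label_counts.values())

# Share of each label among the relevant turnovers, in percent.
label_shares = {
    label: round(100 * count / total_relevant, 2)
    for label, count in label_counts.items()
}

print(total_relevant)
for label, share in label_shares.items():
    print(f"{label}: {share:.2f}%")
```

The three counts add up to the 11,108 relevant turnovers, and the undefined share evaluates to 62.50%, which matches the exclusion rate quoted in the text.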

Table 7 Outcome of the manual expert labeling. The first number per entry shows the number of scenes that fulfill the criteria defined in Sect. 2.1.2 and are used for the model building. The total number of tagged scenes is displayed in brackets

Further evaluation of the labeling outcome showed that more than \(95\%\) of all situations identified as fallback start with a goalkeeper catching the ball, which can also be queried with a purely rule-based approach on the event data. For this reason, and because our focus was on investigating counterpressing situations, we decided to drop the distinction between fallbacks and undefined ball possession changes in all further investigations. Note that fallback situations last on average 18.30 s, whereas undefined situations were the shortest with 7.83 s on average, which again highlights the tremendous influence on the defensive reaction time (counterpressing: 9.89 s; all turnovers average: 9.34 s).
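The reported average durations can be related to one another through a count-weighted mean. The sketch below combines the per-label durations with the label counts from this appendix; it assumes that all averages refer to the same 11,108 relevant turnovers, which the text does not state explicitly.

```python
# (count, mean duration in seconds) per label, as reported in Appendix A.
# Assumption: all means are computed over the same 11,108 relevant turnovers.
labels = {
    "counterpressing": (3196, 9.89),
    "fallback": (970, 18.30),
    "undefined": (6942, 7.83),
}

total_count = sum(count for count, _ in labels.values())

# Count-weighted mean of the per-label average durations.
weighted_sum = sum(count * mean for count, mean in labels.values())
overall_mean = weighted_sum / total_count

print(f"{overall_mean:.2f} s over {total_count} turnovers")
```

Under this assumption the weighted mean comes out at 9.34 s, consistent with the overall turnover average given in the text.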

Appendix B

Table 8 Counterpressing per team for Bundesliga seasons 2013/2014 to 2019/2020

Appendix C

Table 9 Counterpressing per coach for Bundesliga seasons 2013/2014 to 2019/2020. Only full seasons with a total of 34 matches per coach are taken into consideration

Appendix D

Table 10 Counterpressing of Bundesliga teams on a season level (1/2). The number in brackets shows the respective position in the final ranking

Table 11 Counterpressing of Bundesliga teams on a season level (2/2). The number in brackets shows the respective position in the final ranking

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bauer, P., Anzer, G. Data-driven detection of counterpressing in professional football. Data Min Knowl Disc 35, 2009–2049 (2021). https://doi.org/10.1007/s10618-021-00763-7

Received: 17 August 2020
Accepted: 04 May 2021
Published: 08 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10618-021-00763-7
