In this section we describe the approaches followed for estimating each of the components described in Equations 1, 2, 3 and 4. In general, we use function approximation methods to learn models for these components from spatiotemporal data. Specifically, we want to approximate some function \(f^{*}\) that maps a set of features x to an outcome y, such that \(y=f^{*}(x)\). To do this, we define a mapping \(y=f(x;\theta )\) and learn the values of a set of parameters \(\theta\) that result in an approximation to \(f^{*}\).
Customized convolutional neural network architectures are used for estimating full prediction surfaces for the pass-related components: pass success probability, the expected possession value of passes, and the field-wide pass selection probability. Standard shallow neural networks are used to estimate ball drive probability, the expected possession value of ball drives and shots, and the action selection probability components. This section describes the selection of the features x, the observed values y, and the model parameters \(\theta\) for each component.
Estimating pass impact at every location on the field
One of the most significant challenges when modeling passes in soccer is that, in practice, passes can go anywhere on the field. Previous attempts at quantifying pass success probability and the expected value of passes, in both soccer and basketball, assume that the passing options available to a player are limited to the locations of their teammates at the time of the pass (Power et al. 2017; Cervone et al. 2016; Hubáček et al. 2018). However, in order to accurately estimate the impact of passes in soccer (a key element for estimating the future pathways of a possession), we need to make sense of the spatial and contextual information that influences the selection, accuracy, and potential risk and reward of passing to any other location on the field. We propose using fully convolutional neural network architectures designed to exploit spatiotemporal information at different scales, and we extend and adapt this approach to the three related passing models we need to learn: pass success probability, pass selection probability, and pass expected value. While these three problems call for different design considerations, we structure the proposed architectures in three main conceptual blocks: a feature extraction block, a surface prediction block, and a loss computation block. The proposed models for these three problems also share the following design principles: a layered structure of input data, the use of fully convolutional neural networks for extracting local and global features, and the learning of a surface mapping from single-pixel correspondence. We first detail the common aspects of these architectures and then present the specific approach for each of the mentioned problems.
Layers of low-level and field-wide input data To successfully estimate a full prediction surface, we need to make sense of the information at every single pixel. Let the set of locations L, presented in Sect. 3.3, be a discrete matrix of locations on a soccer field of width w and height h. From L we can construct a layered representation of the game state \(Y(T_t)\), consisting of a set of slices of location-wise data of size \(w\times h\). By doing this, we define a series of layers derived from the data snapshot \(T_t\) that represent both spatial and contextual low-level information for each problem. This layered structure provides a flexible approach for including any kind of information that is available or extractable from the spatiotemporal data and considered relevant for the specific problem being addressed.
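For concreteness, the following minimal sketch (ours, not part of the original implementation) builds such a layered representation in Python/NumPy for a \(104\times 68\) grid. The five layers shown are illustrative placeholders; the actual per-model feature lists are given in "Appendix 1".

```python
import numpy as np

W, H = 104, 68  # coarsened grid of field locations L (Sect. 3.3)

def game_state_layers(att_xy, att_v, def_xy, goal_xy=(W - 1, H // 2)):
    """Builds an (H, W, C) layered game-state representation Y(T_t).
    att_xy/def_xy: iterables of (x, y) grid coordinates for the attacking
    and defending players; att_v: matching (vx, vy) velocity vectors."""
    layers = np.zeros((H, W, 5), dtype=np.float32)
    cell = lambda x, y: (min(int(y), H - 1), min(int(x), W - 1))
    for (x, y), (vx, vy) in zip(att_xy, att_v):
        r, c = cell(x, y)
        layers[r, c, 0] = 1.0               # attacking-player locations
        layers[r, c, 2] = np.hypot(vx, vy)  # ...and their speeds
    for x, y in def_xy:
        r, c = cell(x, y)
        layers[r, c, 1] = 1.0               # defending-player locations
    # Dense, field-wide layers: distance and angle to the goal from every
    # single cell, so the network sees information at every pixel.
    yy, xx = np.mgrid[0:H, 0:W]
    layers[..., 3] = np.hypot(xx - goal_xy[0], yy - goal_xy[1])
    layers[..., 4] = np.arctan2(goal_xy[1] - yy, goal_xy[0] - xx)
    return layers
```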
Feature extractor block The feature extractor block is fundamentally composed of fully convolutional neural networks in all three cases, based on the SoccerMap architecture (Fernández and Bornn 2020). Fully convolutional neural networks let us combine layers at different resolutions, capturing relevant information at both local and global levels and producing location-wise predictions that are spatially aware. Following this approach, we can produce a full prediction surface directly, instead of a single prediction at the event's destination. The parameters to be learned vary according to the definition of the input surfaces and the target outcome; however, the neural network architecture itself remains the same across all the modeled problems. This allows us to quickly adapt the architecture to specific problems while keeping the learning principles intact. A detailed description of the SoccerMap architecture is presented in "Appendix 4".
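The full architecture is described in "Appendix 4"; as a rough two-scale illustration of the principle (multi-resolution fully convolutional processing with a per-pixel output), a sketch in PyTorch might look as follows. Channel counts, kernel sizes, and the number of scales here are our simplifications, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniSoccerMap(nn.Module):
    """Two-scale fully convolutional sketch in the spirit of SoccerMap:
    a full-resolution branch captures local patterns, a half-resolution
    branch captures wider context, and a 1x1 convolution yields one
    logit per field location."""
    def __init__(self, in_channels=13):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, padding=2), nn.ReLU())
        self.coarse = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):              # x: (B, 13, 68, 104)
        local = self.local(x)
        coarse = F.interpolate(self.coarse(local), scale_factor=2,
                               mode="bilinear", align_corners=False)
        return self.head(local + coarse)  # (B, 1, 68, 104) logits
```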
Learning from single-pixel correspondence Usually, approaches that use fully convolutional neural networks have ground-truth data for the full output surface. In more challenging cases, only a single classification label is available, and a weakly supervised learning approach is carried out to learn this mapping (Pathak et al. 2015). However, for soccer events, ground-truth information is available at only a single pixel: for example, the destination location of a successful pass. This makes our problem highly challenging, given that there is only one single-location correspondence between the input data and the ground-truth, while we aim to estimate a full probability surface. Despite this extreme set-up, we show that we can successfully learn full probability surfaces for all the pass-related models. We do so by selecting, during training, a single pixel from the predicted output matrix according to the known destination location of the observed pass, and back-propagating the loss at the single-pixel level.
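In code, this amounts to indexing the predicted surface at the observed destination and back-propagating only through that entry; a sketch under the conventions of the `MiniSoccerMap` example above:

```python
import torch
import torch.nn.functional as F

def single_pixel_bce(logits, dest_xy, outcome):
    """Negative log loss (Eq. 5) evaluated at one pixel per example.
    logits: (B, 1, H, W) surface; dest_xy: (B, 2) long tensor of (x, y)
    grid destinations; outcome: (B,) tensor in {0, 1}."""
    b = torch.arange(logits.size(0))
    picked = logits[b, 0, dest_xy[:, 1], dest_xy[:, 0]]  # one logit each
    # Sigmoid + log loss in one numerically stable call; gradients flow
    # back through the selected pixels only.
    return F.binary_cross_entropy_with_logits(picked, outcome.float())
```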
In the following sections, we describe the design characteristics of the feature extraction, surface prediction, and loss computation blocks for the three pass-related problems: pass success probability, pass selection probability, and expected value from passes. By combining these models' outputs, we obtain a single action-value estimation (EPV) for passing actions, expressed by \(\mathbb {E}[G | A=\rho , T_t]\). The detailed list of features used for each model is described in "Appendix 1".
Pass success probability
From any given game situation where a player controls the ball, we want to estimate the success probability of a pass attempted towards any of the other potential destination locations, expressed by \(\mathbb {P}(O_{\rho }=1 | A=\rho , D_t, T_t)\). Figure 4 presents the designed architecture for this problem. The input data at time t consists of 13 layers of spatiotemporal information obtained from the tracking data snapshot \(T_t\), comprising mainly the locations and velocities of both teams' players and the distances and angles between them and the goal. The feature extraction block is composed strictly of the SoccerMap architecture, where representative features are learned. This block's output consists of a \(104\times 68\times 1\) matrix of predictions, one for each possible destination location in the coarsened field representation. In the surface prediction block, a sigmoid activation function \(\sigma\) is applied to each predicted value to produce a matrix of pass probability estimations in the continuous [0,1] range, where \(\sigma (x) = \frac{e^x}{e^x+1}\). Finally, at the loss computation block, we select the probability output at the known destination location of the observed pass and compute the negative log loss, defined in Equation 5, between the predicted (\(\hat{y}\)) and observed (y) pass outcome.
$$\begin{aligned} \mathcal {L}(\hat{y},y) = - (y \cdot \log (\hat{y}) + (1-y) \cdot \log (1-\hat{y})) \end{aligned} \tag{5}$$
Note that we learn all the network parameters \(\theta\) needed to produce a full surface prediction by back-propagating the loss between the predicted value at a single location and the observed pass outcome at that same location. We show in Sect. 6.6 that this learning set-up is sufficient to obtain remarkable results.
Expected possession value from passes
Once we have a pass success probability model, we are halfway to obtaining an estimation of \(\mathbb {E}[G|A=\rho , D_t, T_t]\), as expressed in Equation 3. The remaining two components, \(\mathbb {E}[G|A=\rho , O_{\rho }=1, D_t, T_t]\) and \(\mathbb {E}[G|A=\rho , O_{\rho }=0, D_t,T_t]\), correspond to the expected value of successful and unsuccessful passes, respectively. We learn a model for each expression separately, using an equivalent architecture for both cases; the main difference is that one model is trained exclusively on successful passes and the other exclusively on missed passes, so that full surface predictions are obtained for both cases.
The input data matrix consists of 16 different layers with location, velocity, distance, and angular information equivalent to that selected for the pass success probability model. Additionally, we append a series of layers corresponding to contextual features related to outplayed players and dynamic pressure lines. Finally, we add a layer with the pass success probability surface, considering that it can provide valuable information for estimating the expected value of passes. This surface is calculated using a pre-trained instance of the pass success probability model presented in Sect. 5.1.1.
The input data is fed to a SoccerMap feature extraction block to obtain a single prediction surface. In this case, the expected value of G should reside within the \([-1,1]\) range, as described in Sect. 3.2. To achieve this, in the surface prediction block we apply a sigmoid activation function to the SoccerMap predicted surface, obtaining an output within [0, 1], and then apply a linear transformation so that the final prediction surface consists of values in the \([-1,1]\) range. Notably, our modeling approach does not assume that a successful pass must necessarily produce a positive reward or that a missed pass must produce a negative reward.
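A minimal sketch of this two-step head (the composition is simply a rescaled sigmoid; the function name is ours):

```python
import torch

def bounded_value_head(logits):
    """Surface prediction block for value models: sigmoid maps logits
    into [0, 1]; the linear transform u -> 2u - 1 rescales to [-1, 1]."""
    return 2.0 * torch.sigmoid(logits) - 1.0
```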
The loss computation block computes the mean squared error between the predicted values and the reward assigned to each pass, defined in Equation 6. The model design is independent of the reward choice for passes. In this work, we choose a long-term reward associated with the observed outcome of the possession, detailed in Sect. 6.2.
$$\begin{aligned} \text {MSE}(\hat{y},y) = \frac{1}{N} \sum _{i=1}^{N}(y_i-\hat{y}_i)^2 \end{aligned} \tag{6}$$
Pass selection probability
So far, we have models for estimating both the success probability and the expected value surfaces of successful and missed passes. In order to produce a single-valued estimation of the expected value of the possession given that a pass is selected, we model the pass selection probability \(\mathbb {P}(A=\rho , D_t | T_t)\) as defined in Equation 1. The values of a pass selection probability surface must necessarily add up to 1, and the surface serves as a weighting matrix for obtaining the single estimate.
Both the input and feature extraction blocks of this architecture are equivalent to those designed for the pass success probability model (see Sect. 5.1.1). However, in the surface prediction block we use the softmax activation function presented in Equation 7 instead of a sigmoid activation function. We then extract the predicted value at the observed pass destination location and compute the log loss between that predicted value and 1, since only observed passes are used. With the different models presented in Sect. 5.1, we can now provide a single estimate of the expected value given that a pass action is selected, \(\mathbb {E}[G|A=\rho , T_t]\).
$$\begin{aligned} \mathrm {softmax}(v)_i = \frac{e^{v_i}}{\sum _{j=1}^{K}e^{v_j}} \quad \text {for } i=1,\ldots , K \end{aligned} \tag{7}$$
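A sketch of this surface prediction and loss computation, assuming the same logit-surface conventions as the earlier examples (the softmax of Equation 7 is taken over all \(K = w \times h\) field locations):

```python
import torch
import torch.nn.functional as F

def pass_selection_loss(logits, dest_xy):
    """Spatial softmax over every field cell so the selection surface
    sums to 1, then the negative log-likelihood of the observed pass
    destination. logits: (B, 1, H, W); dest_xy: (B, 2) as (x, y)."""
    b, _, h, w = logits.shape
    log_p = F.log_softmax(logits.view(b, -1), dim=1).view(b, h, w)
    idx = torch.arange(b)
    return -log_p[idx, dest_xy[:, 1], dest_xy[:, 0]].mean()
```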
Estimating ball drive probability
We now focus on the components needed for estimating the expected value of ball drive actions. Within this work's scope, a ball drive refers to actions where a player keeps control of the ball. For this implementation, ball drives lasting more than 1 second are split into a set of consecutive individual ball drives of 1-second duration. While keeping the ball, the player might sustain the possession or lose the ball (because of bad control, an opponent's interception, or driving the ball out of the field, among other reasons). The probability of keeping control of the ball under these conditions is modeled by the expression \(\mathbb {P}(O_{\delta }=1 | A=\delta , T_t)\).
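As an illustration of the 1-second split, the sketch below cuts a long drive into consecutive windows; how a final remainder shorter than one second is handled is not specified in this work, so keeping it as a shorter drive is our assumption:

```python
def split_drive(start_t, end_t, window=1.0):
    """Splits a ball drive spanning [start_t, end_t] seconds into
    consecutive 1-second sub-drives."""
    cuts, t = [], start_t
    while t + window < end_t:
        cuts.append((t, t + window))
        t += window
    cuts.append((t, end_t))  # remainder kept as a shorter drive (assumed)
    return cuts

# split_drive(0.0, 2.5) -> [(0.0, 1.0), (1.0, 2.0), (2.0, 2.5)]
```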
We use a standard shallow neural network architecture to learn a model for this probability, consisting of two fully connected layers, each followed by a ReLU activation layer, and a single-neuron output followed by a sigmoid activation function. We provide a state representation for observed ball drive actions that is composed of a set of spatial and contextual features, detailed in "Appendix 1". Among the spatial features, the level of pressure the player in possession of the ball receives from opponent players is considered a critical piece of information for estimating whether the possession is maintained or lost. We model pressure through two additional features: the opponent team's density at the player's location and the overall team pitch control at that same location. Another factor considered to influence the ball drive probability is the player's location relative to the game context at the moment of the action. We include two features to provide this contextual information: the closest opponent vertical pressure line and the closest possession-team vertical pressure line to the player. These two variables are expected to serve as a proxy for the opponent's pressing behavior and the player's relative risk of losing the ball. By adding features related to spatial pressure, we gain better insight into how pressed the player is within that context, and thus better information for estimating the probability of keeping the ball. We train this model by optimizing the loss between the estimated probability and observed ball drive actions, which are labeled as successful or missed depending on whether the ball carrier's team keeps possession of the ball after the ball drive is attempted.
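A sketch of this shallow architecture in PyTorch (hidden-layer widths are illustrative; the input features are those listed in "Appendix 1"):

```python
import torch.nn as nn

class BallDriveNet(nn.Module):
    """Two fully connected layers with ReLU activations and a
    single-neuron sigmoid output, modeling P(O_delta=1 | A=delta, T_t)."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        # Returns the probability of keeping possession of the ball.
        return self.net(x).squeeze(-1)
```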
Estimating ball drive expectation
Finally, once we have an estimate of the ball drive probability, we still need an estimate of the expected value of ball drives in order to model the expression \(\mathbb {E}[G|A=\delta ,T_t]\), presented in Equation 4. While using a different architecture for feature extraction, we model both \(\mathbb {E}[G|A=\delta ,O_\delta =1,T_t]\) and \(\mathbb {E}[G|A=\delta ,O_\delta =0,T_t]\), following an approach analogous to that used in Sect. 5.1.2.
Conceptually, by keeping the ball, players might choose to continue a progressive run or dribble to gain a better spatial advantage. However, they might also wait until a teammate moves and opens up a passing line of lower risk or higher quality. By learning a model for the expression \(\mathbb {E}[G| A=\delta , T_t]\), we aim to capture the impact of these possible situations, all encapsulated within the ball drive event, on the expected possession value. We use the same input data set and feature extractor architecture used in Sect. 5.2, with the addition of the ball drive probability estimation for each example. Similarly to the surface prediction block for the expected value of passes (see Sect. 5.1.2), we apply a sigmoid activation function to obtain a prediction in the [0, 1] range, and then apply a linear transformation to produce a prediction value in the \([-1,1]\) range. The loss computation block computes the mean squared error between the observed reward value assigned to the action and the model output.
Expected goals model
Once we have models for the expected values of passes and ball drives, we only need to model the expected value of shots to obtain a full state-value estimation for the action set A. We want to model the expectation of scoring a goal at time t given that a shot is attempted, defined as \(\mathbb {E}[G|A=\varsigma , T_t]\). This expression is typically referred to as expected goals (xG) and is arguably one of the most popular metrics in soccer analytics (Eggels 2016). For estimating this expected goals model, we include spatial and contextual features derived from the locations of the 22 players and the ball, to account for the nuances of shooting situations.
Intuitively, we can identify several spatial factors that influence the likelihood of scoring from shots, such as the level of defensive pressure imposed on the ball carrier, the interceptability of the shot by close opponents, or the goalkeeper's location. Specifically, we add the number of opponents closer than 3 meters to the ball carrier to quantify the level of immediate pressure on the player. Additionally, we account for the interceptability of the shot (blockage count) by calculating the number of opponent players inside the triangle formed by the ball carrier's location and the two goal posts. We include three additional features derived from the goalkeeper's location, which can be considered an important factor influencing the scoring probability, given the goalkeeper's considerable advantage of being the only player allowed to stop the ball with the hands. In addition to this spatial information, we add a contextual feature consisting of a boolean flag indicating whether the shot is taken with the foot or the head, the latter being considered more difficult. Finally, we add a prior estimate of expected goals as an input feature, produced by the baseline expected goals model described in "Appendix 5". The full set of features is detailed in "Appendix 1".
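For instance, the blockage-count feature reduces to a point-in-triangle test; a sketch assuming metric coordinates on a \(105\times 68\) m pitch with a 7.32 m goal centred at y = 34 (these dimensions are our assumption):

```python
def blockage_count(ball_xy, def_xy,
                   posts=((105.0, 30.34), (105.0, 37.66))):
    """Counts opponents inside the triangle formed by the ball carrier's
    location and the two goal posts."""
    def side(p, a, b):  # sign of the cross product: side of segment a-b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    a, b, c = ball_xy, posts[0], posts[1]
    count = 0
    for p in def_xy:
        s1, s2, s3 = side(p, a, b), side(p, b, c), side(p, c, a)
        # Inside (or on an edge) if all cross products share a sign.
        if (s1 >= 0 and s2 >= 0 and s3 >= 0) or \
           (s1 <= 0 and s2 <= 0 and s3 <= 0):
            count += 1
    return count
```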
Having this feature set, we use a standard neural network architecture with the same characteristics as the one used for estimating the ball drive probability, explained in Sect. 5.2, and we optimize the mean squared error between the predicted outcome and the observed reward for shot actions. The long-term reward chosen for this work is detailed in Sect. 6.2.
Action selection probability
Finally, to obtain a single-valued estimation of EPV, we weight the expected value of each possible action by the respective probability of taking that action in a given state, as expressed in Equation 1. Specifically, we estimate the action selection probability \(\mathbb {P}(A | T_t)\), where A is the discrete set of actions described in Sect. 3.2. We construct a feature set composed of both spatial and contextual features. Spatial features such as the ball location and the distance and angle to the goal provide information about the ball carrier's relative location at a given time instant. Additionally, we add spatial information related to the possession team's pitch control and the degree of spatial influence of the opponent team near the ball. The location of both teams' dynamic lines relative to the ball provides the contextual information of the state representation. We also include the baseline estimation of expected goals at that given time, which is expected to influence the action selection decision, especially regarding shot selection. The full set of features is described in "Appendix 1". We use a shallow neural network architecture analogous to those described in Sects. 5.2 and 5.3. The final layer of the feature extractor part of the network has size 3, to which a softmax activation function is applied to obtain the probabilities of each action. We model the observed outcome as a one-hot encoded vector of size 3, indicating the action type observed in the data, and optimize the categorical cross-entropy between this vector and the predicted probabilities, which is equivalent to the log loss.
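A sketch of this action selection network, following the shallow architecture described above (the hidden width is illustrative; PyTorch's `CrossEntropyLoss` fuses the softmax and the log loss, which is equivalent to the categorical cross-entropy against the one-hot vector):

```python
import torch.nn as nn

class ActionSelectionNet(nn.Module):
    """Shallow network for P(A | T_t) over the three action types
    (pass, ball drive, shot)."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))  # final layer of size 3

    def forward(self, x):
        return self.body(x)  # logits; softmax is applied inside the loss

# Training step (observed_action: integer index of the observed action):
# loss = nn.CrossEntropyLoss()(model(features), observed_action)
```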