A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory

Complex interactions between two opposing agents frequently occur in machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender should attempt to block the shot, play a crucial role in the outcome of the game. However, there are currently no effective data-driven and/or theory-based approaches to analyzing such situations. To address this issue, we propose a novel framework based on game theory, in which the expected payoffs are estimated with machine learning (ML) models, and additional features for the ML models are extracted with a theory-based shot block model. Conventionally, success or failure (1 or 0) is used as the payoff, but a successful shot (a goal) is extremely rare in football. Therefore, we propose the Expected Probability of Shot On Target (xSOT) metric to evaluate players' actions even when a shot does not result in a goal; this allows effective differentiation and comparison between shots and even enables counterfactual analysis of shot situations. In our experiments, we validated the framework by comparing it with baseline and ablated models. Furthermore, we observed a high correlation between xSOT and existing metrics, which suggests that xSOT provides valuable insights. Lastly, as an illustration, we studied optimal strategies in the World Cup 2022 and analyzed a shot situation in EURO 2020.


Introduction
Understanding the interaction between agents, involving the dynamic exchange, communication, and coordination among them, is a fundamental issue in our social activities. Such understanding improves decision-making and teamwork and provides valuable insights into both the interactions and the agents involved; it is significant in various fields, including artificial intelligence (Tuyls et al., 2021), robotics (S. Liu et al., 2021), game theory (Palacios-Huerta, 2003), and social sciences (Axelrod & Hamilton, 1981). Studying these topics requires investigating the interaction between agents, that is, between two entities or individuals in a specific context.
In team sports, the interaction between two agents refers to the strategies, movements, and decisions made by two opposing players or teams. It includes factors such as positioning, communication, and cooperation between the players to outwit, outmaneuver, or counter each other's actions. This interaction greatly influences the flow of the game, the outcome of specific plays, and even the final result.
Football has been one of the most influential team sports (Li, 2020; Yeung, Sit, & Fujii, 2023; Zhang et al., 2022), and the outcome of a game is greatly influenced by the critical event of taking a shot. However, despite its significance, the existing literature lacks effective data-driven and theory-based methods for comprehensively understanding and analyzing the interaction strategy between the shooter and the defender.
To address these issues, we propose a novel approach for gaining deeper insights into the interaction strategy between the shooter and the closest defender, as well as for evaluating the shooter's decisions in each specific situation. The method employs game theory, which has conventionally been adopted for interaction strategy analysis, to determine the best interaction strategy for both opposing players. Nevertheless, since goals are rare in football, it is hard to determine the values of players' actions (i.e., the payoff of each strategy in the game). Therefore, we employ machine learning models to estimate the values of players' actions. In addition, we propose a novel theory-based shot block model to extract more informative features for the machine learning models. Figure 1 depicts the concept of the proposed approach.
For a defender, blocking shots from the shooter might seem intuitive. However, there might be more effective strategies, for example, not blocking the shot. Recently, a professional football coach pointed out that not attempting to block long-range shots might be a smart choice (footnotes 1 and 2).
There are multiple benefits to not blocking long shots. First, it tempts the offensive team to shoot from lower-value locations (i.e., locations from which they are unlikely to score) rather than seeking a better opportunity to attack the goal. Second, it gives the goalkeeper a clearer line of sight to control and anticipate the ball. Last, a block is likely to result in a second-ball situation in areas that cannot be predicted. However, as shown in this study, not blocking might not be the optimal strategy that achieves Nash equilibrium (Nash, 1951).
Fig. 1 Refined concept of the proposed approach. The assessment of strategies in this study uses game theory as its foundation. Machine learning models are employed to estimate the payoffs and values associated with players' actions. Additionally, a theory-based model is used to extract further informative features, enhancing the analysis process.
To summarize, this research aims to analyze two-agent interactions in football shot-taking. The contributions of this research are as follows: 1) a novel framework and metrics that can analyze attacker and defender strategies, identify the optimal decisions, and evaluate the value of player actions; 2) an effective approach that integrates machine learning models, a theory-based model, and game theory to analyze opposing agents' interactions in complex situations, typically in sports; 3) using openly accessible data, a verification of the proposed framework and metrics through comparison with baseline models, ablated models, and existing metrics. Moreover, we include examples of strategy analysis with the World Cup 2022 and an in-depth shot-taking situation analysis with EURO 2020.

Related work
In this section, we first review the existing literature on strategy and decision analysis and then discuss the evaluation of players' actions and decisions.

Strategy and Decision Analysis in Football
In the domain of reinforcement learning, simulated football environments have been extensively used to study football-playing strategies. They can be broadly categorized into two types: environments for human football and environments for robots. The first type comprises environments developed based on real-world football, including the Gameplay Football simulator (footnote 3), the older version of the DeepMind MuJoCo Multi-Agent Soccer Environment (S. Liu et al., 2019), and Google Research Football (Kurach et al., 2020). The second type comprises environments specifically designed for developing football-playing strategies for robots and humanoids, such as the RoboCup Soccer Simulator (Kitano, Asada, Kuniyoshi, Noda, & Osawa, 1997; Kitano et al., 1998) and the DeepMind MuJoCo Multi-Agent Soccer Environment (Haarnoja et al., 2023). Nonetheless, the strategies developed within these simulated environments have not been verified in real-world football scenarios.
Several efforts have been made to bridge the gap between simulated and real-world football environments, for instance, comparing strategies in simulated and real-world football environments via social network and correlation analysis (Scott, Fujii, & Onishi, 2021), transferring a strategy developed in a simulated robot environment to real-world robots zero-shot (Haarnoja et al., 2023), and using real-world data to develop strategies in a simulated environment (Fujii et al., 2023). Nevertheless, these studies have not focused on the interaction strategies of both opposing agents.
Conversely, with real-world data, reinforcement learning formulations based on Markov Decision Processes (MDPs) have been used to identify actions that maximize rewards during a possession (Rahimian, Van Haaren, Abzhanova, & Toka, 2022; Rahimian, Van Haaren, & Toka, 2023). The rewards are based on expected goals (xG) (Eggels, van Elk, & Pechenizkiy, 2016; Macdonald, 2012). Subsequent work expanded the action space to include shooting and movement options and considered different pitch locations, enabling a more detailed analysis of optimal actions (Van Roy, Robberechts, Yang, De Raedt, & Davis, 2021; Van Roy, Yang, De Raedt, & Davis, 2021). However, reinforcement learning focuses on learning agents that maximize rewards (mainly goals), which usually makes it difficult to analyze decision-making precisely. In contrast, game theory explicitly models the strategies of both opposing agents. Hence, game theory is applied in this study.
In the domain of game theory, penalty kicks have been the primary focus of interaction strategies.Both goalkeepers' and shooters' optimized strategies have been analyzed using statistical methods and game theory (Palacios-Huerta, 2003).Building upon previous work, the inclusion of a clustering method to differentiate between player roles has allowed for a more in-depth and player role-driven analysis of strategies with game theory (Tuyls et al., 2021).
However, it is imperative to note that penalty kicks are rare events in a match and are independent of other outfield players or previous game states.Meanwhile, shots are significantly more frequent, hold equal importance, and involve more complex decision-making processes.Consequently, analyzing shots provides deeper insights into the game dynamics.

Player Action Evaluation
Goals have conventionally been employed, whether as rewards for reinforcement learning or for evaluating player and team performance.However, a notable drawback of using goals is their rarity, resulting in a scenario where the value or reward associated with them is often zero.This scarcity poses challenges for reinforcement learning algorithms because it becomes difficult to learn from and generalize such infrequent events.Similarly, when evaluating players, the limited occurrence of goals may hinder the accurate assessment of players' contributions.To address this limitation, researchers have employed machine learning and theory-based approaches.The approaches aim to model the expected probability of success of specific actions, using them as the value of player action either directly or indirectly.
Beyond evaluating players by goals, some metrics evaluate the expected probability of an assist, Expected Assist (xA) (footnote 4), and the expected probability that a possession will lead to an attack, the Possession Utilization score (poss-util) (Simpson, Beal, Locke, & Norman, 2022), together with its advanced version (Yeung, Sit, & Fujii, 2023).
In the domain of theory-based models, important elements are often decomposed based on domain knowledge, and each element is then modeled using theories of statistics and physics. One such model is Expected Threat (xT) (footnote 5), which quantifies the opportunities created by a player. xT breaks down the threat into the probabilities of movement, shooting, scoring, and transitions between zones, estimating these probabilities from historical statistics.
The Dangerousity (DA) metric (Link, Lang, & Seidenschwarz, 2016) estimates the probability of a player scoring a goal while in possession of the ball.It considers factors such as zone, control, pressure, and density (chance), and models each of these elements using theories of statistics and physics.Based on the DA, the Off-Ball Scoring Opportunity (OBSO) (Spearman, 2018;Spearman, Basye, Dick, Hotovy, & Pop, 2017) models the probability of an off-ball player scoring.Furthermore, researchers have integrated both machine learning and theory-based approaches.The C-OBSO (Teranishi, Tsutsui, Takeda, & Fujii, 2023) proposed a modified score model to consider the defenders' locations.
Nonetheless, most metrics have focused on the success or failure of actions such as shots, passes, or crosses. However, an action can have more than two outcomes; for example, a shot can be on target, off target, or blocked.
Therefore, in this study, we not only considered various factors that influence the outcome of a shot, but we also decomposed the shot outcome and utilized machine learning models to predict each outcome of a shot.This approach can provide us with a deeper understanding of the game and enable more complex analyses.
Furthermore, we utilized an improved theory-based shot block model to estimate the probability of a shot being blocked for the shot block outcome, considering both the shooter and defender features.Subsequently, this shot block probability was incorporated as a feature in the machine learning model for the shot block.Our findings indicated that this approach outperformed directly fitting defender features into the machine learning model.Further details are mentioned in Section 3.3.

Methods
This section explains how the interaction between the shooter and the closest defender can be formulated as a game and how the corresponding payoffs are modeled. The framework starts with a feature set derived from event and freeze frame data. Subsequently, a theory-based shot block model and deep neural network (DNN) models are combined to estimate the value of players' actions, specifically the probabilities of action outcomes. Finally, the estimated values of players' actions are used to analyze their decision-making and to optimize interaction strategies with game theory. Figure 2 depicts the details of the proposed framework.

Define Interaction as a Static Game with Game Theory
The first step in modeling the interaction between the shooter and the closest defender as a static game is to define the strategy set S_i of each agent i and the corresponding payoffs. The strategy sets for the shooter and the closest defender are defined as follows:
$$ S_{shooter} = \{\text{Shoot}, \text{Pass}\}, \quad S_{defender} = \{\text{Blocking}, \text{Not Blocking}\}. \tag{1} $$
The shooter of the attacking team has two options: either to shoot from their current location or to pass the ball to another player in the attacking team, allowing that player to shoot from their location. The closest defender also has two choices: attempting to block the shooter's shot or applying Liverpool's strategy, which involves not blocking the shot and potentially gaining certain benefits (as mentioned in Section 1). Furthermore, the payoffs for each combination of strategies depend on the current state of the match. The ultimate goal of every player is to win the match. Traditionally, the probability of scoring goals has been used as the payoff or reward. However, scoring is a rare event that involves randomness, and expecting players to score with every shot is unrealistic. Therefore, we focus on the minimum requirement for a successful shot, namely being on target. We summarize the outcome event space of taking a shot as follows:
$$ \text{Shot Outcome} \in \{\text{Shot On Target}, \text{Shot Off Target}, \text{Shot Block}\}. \tag{2} $$
The detailed explanation can be found in Section A.
Fig. 2 Flow chart of the proposed framework. In terms of game theory payoffs, if the closest defender chooses not to block, xSOT and xOSOT are calculated without incorporating the closest defender's features. For xOSOT, xSOT_a is calculated in the same way, with the shooter replaced by the other attacker a. The left value indicates the payoff for the shooter and the right value the payoff for the closest defender.
For the shooter, we define the payoff for shooting as the Expected Probability of Shot On Target (xSOT), representing the likelihood that the shot is on target. Conversely, the payoff for passing is defined as the Expected Probability of Off-Ball Player Shot On Target (xOSOT), indicating the probability that another player on the attacking team receives the ball and takes a shot on target. As for the closest defender, their payoff is the negative of the shooter's payoff. When the closest defender chooses not to block, xSOT and xOSOT are calculated without considering the closest defender. The payoffs for the shooter and the defender are summarized in Table 1.
Moreover, finding the optimal interaction strategy for both the shooter and the closest defender is equivalent to identifying the Nash equilibrium, which is defined as follows (Nash, 1951; Palacios-Huerta, 2003; Tadelis, 2013; Tuyls et al., 2021). Let $s^* = (s_i^*, s_{-i}^*)$ be a strategy profile containing one strategy $s_i \in S_i$ for each agent, where $s_{-i}$ denotes the strategy of the agent other than agent $i$ and $i \in \{\text{attacker}, \text{defender}\}$. Let $u_i(s_i, s_{-i}^*)$ be the payoff for agent $i$. The strategy profile $s^*$ is a Nash equilibrium if and only if
$$ u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*) \quad \text{for all } s_i \in S_i \text{ and all } i. $$
Lastly, the following assumptions are made for the game:
• Rational decision maker: Each agent makes rational decisions by choosing the best strategy available to them (Tadelis, 2013).
• Complete information: All agents have complete knowledge of the game, and this knowledge is common among all participants (Tadelis, 2013).
• Static one-stage game: Whether the game is static or dynamic is discussed in Section B. For the current analysis, we assume a static one-stage game because players' velocities and other detailed data required to model and analyze their future movements are unavailable.
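To make the game concrete, the following minimal sketch builds the 2×2 zero-sum payoff matrix described above (the shooter receives xSOT for shooting or xOSOT for passing, and the closest defender receives the negative) and checks the Nash equilibrium inequality for every pure strategy profile. The four input probabilities, the function names, and the numbers in the example are hypothetical; in the framework they would come from the xSOT and xOSOT models with and without the closest defender's features.

```python
from itertools import product

SHOOTER_STRATEGIES = ("Shoot", "Pass")
DEFENDER_STRATEGIES = ("Blocking", "Not Blocking")


def payoff_matrix(xsot_block, xsot_no_block, xosot_block, xosot_no_block):
    """Map each strategy profile to (shooter payoff, defender payoff)."""
    shooter_payoff = {
        ("Shoot", "Blocking"): xsot_block,
        ("Shoot", "Not Blocking"): xsot_no_block,  # xSOT without defender features
        ("Pass", "Blocking"): xosot_block,
        ("Pass", "Not Blocking"): xosot_no_block,  # xOSOT without defender features
    }
    # Zero-sum: the closest defender's payoff is the negative of the shooter's.
    return {profile: (u, -u) for profile, u in shooter_payoff.items()}


def pure_nash_equilibria(matrix):
    """Return profiles where no agent gains by deviating unilaterally."""
    equilibria = []
    for s, d in product(SHOOTER_STRATEGIES, DEFENDER_STRATEGIES):
        u_shooter, u_defender = matrix[(s, d)]
        shooter_ok = all(u_shooter >= matrix[(alt, d)][0] for alt in SHOOTER_STRATEGIES)
        defender_ok = all(u_defender >= matrix[(s, alt)][1] for alt in DEFENDER_STRATEGIES)
        if shooter_ok and defender_ok:
            equilibria.append((s, d))
    return equilibria


# Illustrative values only (not results from this study).
game = payoff_matrix(xsot_block=0.30, xsot_no_block=0.40,
                     xosot_block=0.25, xosot_no_block=0.28)
print(pure_nash_equilibria(game))  # [('Shoot', 'Blocking')]
```

An empty result means the 2×2 game has no pure-strategy equilibrium; a mixed-strategy equilibrium would then have to be computed, which this sketch does not attempt.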

Calculate xSOT with Machine Learning Models
When modeling xSOT, we consider all possible outcomes of a shot: shot on target (S_on), shot off target (S_off), and shot block (S_block) (detailed explanation in Section A). Since the set {S_on, S_off, S_block} is taken as the sample space of shot outcomes, the three outcomes are mutually exclusive and exhaustive, their probabilities sum to one, and xSOT can be represented as follows:
$$ xSOT = P(S_{on}) = 1 - P(S_{off}) - P(S_{block}) \approx 1 - \hat{P}(y_{off} = 1 \mid x_{off}) - \hat{P}(y_{block} = 1 \mid x_{block}), $$
where P(S_off) and P(S_block) are each estimated with a Deep Neural Network (DNN; also known as MLP, multilayer perceptron) classifier, DNN_off and DNN_block respectively, trained with cross-entropy loss (CEL). The hyperparameters of the DNNs and their optimized values are listed in Section C. Here, x and y are the input feature vector and the target variable of each DNN model, respectively. Both x_off and x_block include the following basic shooter features, where the first three features come from the event data and follow the StatsBomb definitions (footnote 6):
• player role: The role of the player, for instance, center forward, center back, or goalkeeper. StatsBomb names this feature position.
• location x: Football pitch coordinate x of the shooter, representing the length dimension of the pitch and ranging from 0 to 120.
• location y: Football pitch coordinate y of the shooter, representing the width dimension of the pitch and ranging from 0 to 80.
• Dist2Goal: Distance from the shooter to the middle of the goal line, calculated with Equation 11.
• Ang2Goal: Absolute angle from the shooter to the middle of the goal line, calculated with Equation 11.
For x_block, in addition to the previously mentioned features, we incorporate the locations and positions (roles) of the off-ball players using StatsBomb freeze frame 360 data (footnote 7); this allows us to create the following additional feature:
• Theory-based shot block feature: The shot block probability estimated by a theory-based shot block model (explained in Section 3.3) that uses the StatsBomb freeze frame 360 data. The freeze frame 360 data include the position (role) and the x, y location of the other players on the pitch. However, since the data were collected from video frames, players who were not in the frame are not included.
The target variables y off and y block will take the value 1 when the outcome is shot off target and shot block, respectively.For all other outcomes, the target variables will have a value of 0.
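The target construction and the combination xSOT = 1 − P(S_off) − P(S_block) can be sketched as follows. The outcome label strings are hypothetical stand-ins for the grouped shot outcomes (the mapping from raw StatsBomb outcomes is described in Section A and not reproduced here). Because P(S_off) and P(S_block) come from two separately trained binary classifiers, their sum can exceed 1, so this sketch clips xSOT to [0, 1]; that clipping is an assumption, not part of the framework's definition.

```python
OUTCOMES = ("on_target", "off_target", "blocked")  # hypothetical label strings


def make_targets(outcomes):
    """Binary targets: y_off = 1 for off-target shots, y_block = 1 for blocked shots."""
    for o in outcomes:
        if o not in OUTCOMES:
            raise ValueError(f"unknown shot outcome: {o!r}")
    y_off = [1 if o == "off_target" else 0 for o in outcomes]
    y_block = [1 if o == "blocked" else 0 for o in outcomes]
    return y_off, y_block


def xsot(p_off, p_block):
    """xSOT = 1 - P(S_off) - P(S_block), clipped to [0, 1] (assumption of this sketch)."""
    return min(1.0, max(0.0, 1.0 - p_off - p_block))


y_off, y_block = make_targets(["on_target", "blocked", "off_target"])
print(y_off, y_block)            # [0, 0, 1] [0, 1, 0]
print(round(xsot(0.3, 0.2), 6))  # 0.5
```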
We assess the performance of the DNN_block and DNN_off models by comparing them with baseline models that use the same feature set. These baselines include common statistical models, namely historical percentages derived from the dataset and ElasticNet (Zou & Hastie, 2005). Additionally, we consider the tree-boosting models XGBoost (Chen & Guestrin, 2016) and CatBoost (Prokhorenkova, Gusev, Vorobev, Dorogush, & Gulin, 2018), which have been commonly employed in previous studies to model the expected probability of a goal (Eggels et al., 2016; Macdonald, 2012) as well as scoring and conceding patterns (Decroos et al., 2019; Toda et al., 2022; Umemoto et al., 2022), among others.

Create Additional Feature with Theory-Based Shot Block Model
Our theory-based shot block model builds on the shot block component of C-OBSO (Teranishi, Tsutsui, Takeda, & Fujii, 2023), in which the probability of a single defender blocking a shot is modeled using a normal distribution probability density function (PDF), and the shot block probability is calculated by summing over a discrete set of angles from the shooter to the goal line, bounded by the goal posts. We made several improvements to the C-OBSO approach. First, we excluded the goalkeeper, since a saved shot still counts as being on target. Second, we treat the angle to the goal as continuous rather than discrete, which allows the PDF to be evaluated more precisely.
Moreover, we defined a more realistic event space: in addition to assuming that the defenders' block probabilities are independent, we require that at most one defender blocks the shot, because once one defender has blocked it, the other defenders cannot block it subsequently. Furthermore, we replaced the normal PDF with a truncated normal PDF, which restricts the defender's reachable region to a bounded range rather than extending it infinitely.
Finally, to provide a robust and rigorous foundation for our methodology, we explain the specific shot-taking situation shown in Figure 4 and give a detailed, statistically grounded derivation of the theory-based shot block model as follows:
Step 1: Filtering players. The filtering process begins by excluding the goalkeeper, players on the same team as the shooter, and defenders located outside the feasible block zone, the triangle bounded by the shooter's location and the two points where the penalty area lines intersect the goal line (the blue lines in Figure 5). The defenders that remain after this filtering form the set D; they are labeled defenders 1, 2, ..., |D| and sorted in ascending order of their distance from the shooter.
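Step 1 can be sketched as a point-in-triangle test followed by a sort on distance. This is a minimal illustration assuming coordinates already rescaled to a 105 × 68 m pitch; the `Player` record is hypothetical, and the default triangle corners (105, 13.84) and (105, 54.16) are where the penalty-area side lines meet the goal line under standard Laws of the Game dimensions, which may differ slightly from StatsBomb's rescaled markings.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    x: float          # metres along the pitch length
    y: float          # metres along the pitch width
    teammate: bool    # True if on the shooter's team
    goalkeeper: bool


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign tells which side of o->a point b lies on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def filter_defenders(shooter, players, left=(105.0, 13.84), right=(105.0, 54.16)):
    """Keep opponents (not the goalkeeper) inside the feasible block zone,
    sorted by distance from the shooter (defender 1 is the closest)."""
    kept = [p for p in players
            if not p.teammate and not p.goalkeeper
            and in_triangle((p.x, p.y), shooter, left, right)]
    return sorted(kept, key=lambda p: math.hypot(p.x - shooter[0], p.y - shooter[1]))


shooter = (85.0, 34.0)
players = [Player(95.0, 34.0, False, False), Player(90.0, 34.0, False, False),
           Player(92.0, 34.0, True, False), Player(104.0, 34.0, False, True),
           Player(95.0, 5.0, False, False)]
D = filter_defenders(shooter, players)
print([(p.x, p.y) for p in D])  # [(90.0, 34.0), (95.0, 34.0)]
```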
Step 2: Consider the angle to the goal. By applying the law of total probability, the shot block probability can be conditioned on the shot angle θ that the shooter takes. We assume that shots travel in straight lines and that each degree within the feasible angle corresponds to a specific shot angle. The feasible shot angle is the angle formed by the straight line from the shooter to the left goal post (the left boundary of the red area in Figure 5) and the straight line from the shooter to the right goal post (the right boundary of the red area in Figure 5). Its total size n, in degrees, can be calculated using the law of cosines.
The shot block probability can be represented by the following equation for a continuous shot angle θ ∈ [0, n]:
$$ \tilde{P}(S_{block}) = \int_{0}^{n} P(S_{block} \mid \theta) \, P(\theta) \, d\theta = \frac{c_3}{n} \int_{0}^{n} P(S_{block} \mid \theta) \, d\theta, $$
where P(θ) represents the probability that the shooter selects shot angle θ. The notation $\tilde{P}$ distinguishes the estimate from the theory-based shot block model from the estimate $\hat{P}$ of the DNN model; both are estimated probabilities of a shot block.
To simplify the analysis, we assume that P(θ) follows a continuous uniform distribution on [0, n], which yields the second equality. P(S_block | θ) denotes the probability that the shot is blocked by a defender in set D given that shot angle θ is selected, and c_3 is a constant.
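The feasible shot angle n from Step 2 follows from the law of cosines applied to the triangle formed by the shooter and the two goal posts: with b and c the distances from the shooter to the posts and a the goal width, cos n = (b² + c² − a²) / (2bc). The sketch below assumes a 7.32 m goal centered at y = 34 on the rescaled 105 × 68 m pitch; the post coordinates and function name are illustrative.

```python
import math

GOAL_X = 105.0
LEFT_POST = (GOAL_X, 34.0 - 3.66)   # assumed 7.32 m goal centred at y = 34
RIGHT_POST = (GOAL_X, 34.0 + 3.66)


def feasible_shot_angle(shooter, left_post=LEFT_POST, right_post=RIGHT_POST):
    """Angle n (degrees) subtended by the goal at the shooter, via the law of cosines."""
    b = math.dist(shooter, left_post)     # shooter to left post
    c = math.dist(shooter, right_post)    # shooter to right post
    a = math.dist(left_post, right_post)  # goal width
    if b == 0 or c == 0:
        raise ValueError("shooter is standing on a goal post")
    cos_n = (b * b + c * c - a * a) / (2 * b * c)
    cos_n = max(-1.0, min(1.0, cos_n))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_n))


print(feasible_shot_angle((94.0, 34.0)))  # shooter on the penalty spot
```

For a central shooter this agrees with the simpler 2·arctan(3.66 / distance), but the law of cosines also handles off-center positions without special cases.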
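Step 3: Consider each defender. After conditioning on the shot angle, we incorporate each defender in set D. Note that at most one defender can block the shot: defender d (e.g., defender 13 in Figure 4) can block the shot only if none of the defenders closer to the shooter (defenders 1, ..., d − 1) has already blocked it. Denoting by B_d the event that defender d blocks a shot taken at angle θ, we have
$$ P(S_{block} \mid \theta) = \sum_{d=1}^{|D|} P\Big(B_d \cap \bigcap_{k=1}^{d-1} \overline{B_k} \,\Big|\, \theta\Big). $$
If |D| = 0, indicating that there are no defenders in set D, P(S_block | θ) becomes 0, and consequently the overall shot block probability is also 0. Furthermore, we assume that the defenders' probabilities of blocking the shot are independent. Under this assumption, each term of the above equation can be decomposed as
$$ P\Big(B_d \cap \bigcap_{k=1}^{d-1} \overline{B_k} \,\Big|\, \theta\Big) = P(B_d \mid \theta) \prod_{k=1}^{d-1} \big(1 - P(B_k \mid \theta)\big). $$
Step 4: Model each defender. We model each defender's expected probability of blocking the shot using a truncated normal probability density function (PDF); here, the PDF is treated as a shaping function without statistical meaning. A truncated normal PDF is preferred because, unlike the normal PDF, it has no tails extending to infinity; this keeps the range of a defender's reach bounded and avoids unrealistic assumptions. For a variable x with location μ and scale σ, truncated to [l, u], the truncated normal PDF is
$$ f(x; \mu, \sigma, l, u) = \frac{\phi\left(\frac{x-\mu}{\sigma}\right)}{\sigma \left[\Phi\left(\frac{u-\mu}{\sigma}\right) - \Phi\left(\frac{l-\mu}{\sigma}\right)\right]}, \quad \phi(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}, \quad \Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right] $$
for l ≤ x ≤ u (and 0 otherwise), where erf(x) is the error function, which is approximated numerically.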
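Steps 3 and 4 can be illustrated with two small functions: a truncated normal PDF written with `math.erf`, and the "at most one blocker" combination P(S_block | θ) = Σ_d P(B_d | θ) ∏_{k<d} (1 − P(B_k | θ)) for defenders sorted by distance. This is a sketch, not the framework's implementation: how the PDF's argument and scale are parameterized by c_1, c_2, c_4, and a is not reproduced here, so the per-defender probabilities P(B_d | θ) are simply passed in as numbers.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_pdf(x, mu, sigma, lower, upper):
    """Truncated normal PDF on [lower, upper]; zero outside the support."""
    if x < lower or x > upper:
        return 0.0
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    mass = std_normal_cdf((upper - mu) / sigma) - std_normal_cdf((lower - mu) / sigma)
    return phi / (sigma * mass)


def block_given_angle(per_defender_probs):
    """P(S_block | theta) for defenders sorted by distance, at most one blocker.

    Defender d contributes P(B_d) * prod_{k<d} (1 - P(B_k)) under independence,
    so the sum equals 1 - prod_d (1 - P(B_d)).
    """
    total, none_before = 0.0, 1.0
    for p in per_defender_probs:
        total += none_before * p
        none_before *= 1.0 - p
    return total


print(block_given_angle([]))          # 0.0 when |D| = 0
print(block_given_angle([0.5, 0.5]))  # 0.75
```

The running product `none_before` makes the sequential reading of Step 3 explicit (defender d only gets a chance if every closer defender missed), while the docstring notes the equivalent closed form.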
Step 5: Model calculation and optimization. For computational efficiency, the trapezoidal rule is employed to approximate $\tilde{P}$(S_block) when |D| > 0, and the probability is set to 0 otherwise. For the shot-taking situation in Figure 4, the shot block probability at each shot angle is shown in Figure 6. Several common optimization methods were compared for fitting the parameters and constant terms c_1, c_2, c_3, c_4, and a. These included the derivative-free methods Powell (Powell, 1964) and Nelder-Mead (Nelder & Mead, 1965), as well as the gradient-based methods CG (Hestenes, Stiefel, et al., 1952), L-BFGS-B (D.C. Liu & Nocedal, 1989), and SLSQP (Kraft, 1988).
The results of the optimization process, including the comparison between optimization methods, are reported in Section 4.2.2. Powell was selected because it performed best, and the optimized parameter values are listed in Supplementary Table 14.
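Step 5 can be illustrated end to end with the sketch below, assuming NumPy and SciPy: P(S_block | θ) is evaluated on a grid over [0, n], integrated with a hand-written trapezoidal rule, scaled by c_3 / n, and the parameters are fitted by minimizing binary cross-entropy with `scipy.optimize.minimize(method="Powell")`. The bell-shaped per-defender curve with parameters `width` and `height` is a hypothetical stand-in for the truncated normal parameterization, and the six shots are synthetic toy data, not StatsBomb shots.

```python
import numpy as np
from scipy.optimize import minimize


def block_curve(theta, defender_angles, width, height):
    """Hypothetical P(S_block | theta): each defender blocks with a bell-shaped
    probability centred on their angular position; at most one blocker under
    independence gives 1 - prod(1 - p_d)."""
    p_none = np.ones_like(theta)
    for phi in defender_angles:
        p_d = np.clip(height * np.exp(-0.5 * ((theta - phi) / width) ** 2), 0.0, 1.0)
        p_none = p_none * (1.0 - p_d)
    return 1.0 - p_none


def shot_block_probability(params, n_deg, defender_angles, num=181):
    """c3 / n times the trapezoidal integral of P(S_block | theta) over [0, n]."""
    width, height, c3 = params
    if not defender_angles:  # |D| = 0 -> probability 0
        return 0.0
    theta = np.linspace(0.0, n_deg, num)
    curve = block_curve(theta, defender_angles, abs(width) + 1e-6, height)
    integral = np.sum((curve[1:] + curve[:-1]) * np.diff(theta)) / 2.0  # trapezoidal rule
    return float(c3 * integral / n_deg)


def mean_cel(params, shots, labels, eps=1e-7):
    """Binary cross-entropy between predicted block probabilities and labels."""
    p = np.array([shot_block_probability(params, n, d) for n, d in shots])
    p = np.clip(p, eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Synthetic toy data: (feasible angle n in degrees, defender angles within [0, n]).
shots = [(30.0, [15.0]), (30.0, [14.0, 20.0]), (25.0, []),
         (20.0, [2.0]), (35.0, [17.0]), (28.0, [27.5])]
labels = [1, 1, 0, 0, 1, 0]

x0 = np.array([3.0, 0.5, 1.0])  # initial width, height, c3
result = minimize(mean_cel, x0, args=(shots, labels), method="Powell")
print(mean_cel(x0, shots, labels), result.fun)
```

Swapping `method="Powell"` for `"Nelder-Mead"`, `"CG"`, `"L-BFGS-B"`, or `"SLSQP"` reproduces the kind of comparison described above; the gradient-based methods then rely on finite-difference gradients, since no analytic gradient is supplied.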

Calculate xOSOT
The xOSOT is calculated by identifying the off-ball attacker with the highest expected probability of a shot on target. Because the shooter must first pass the ball to that attacker, the probability that the off-ball attacker can control the ball is also taken into account. This approach modifies the concept of OBSO introduced by Spearman (2018), as further explained in Section D. However, here we consider only the off-ball attackers a ∈ A at their current locations and take the one with the highest expected probability of a shot on target, rather than considering all locations on the pitch. The equation for xOSOT is as follows:
$$ xOSOT = \max_{a \in A} P(S_{on} \mid Control_a) \, P(Control_a) = \max_{a \in A} xSOT_a \cdot PPCF_a, $$
where P(S_on | Control_a) denotes the probability of a shot on target from the location of off-ball attacker a, given that attacker a has controlled the ball, and P(Control_a) represents the probability that the ball will be controlled by off-ball attacker a.
Furthermore, xSOT_a represents the xSOT calculated for off-ball attacker a, and PPCF_a denotes the theory-based Potential Pitch Control Field (PPCF) model (Spearman, 2018), which is calculated from time 0 to T, where T is the travel time of the ball from the shooter to off-ball attacker a. This contrasts with Spearman (2018), where T → ∞. A finite travel time T is more suitable because, even if off-ball attacker a gains control of the ball after time T, they are unlikely to be able to shoot from their current location.
The PPCF is further explained in Section D, providing more details on how it is computed.
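Once xSOT_a and the control probability PPCF_a are available for each off-ball attacker, xOSOT reduces to a maximization of their product. The sketch below assumes both quantities are precomputed (the PPCF integration itself is not reproduced); the `OffBallAttacker` record, the constant ball speed of 15 m/s used to illustrate the travel time T, and the convention of returning 0 when no attacker is available are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class OffBallAttacker:
    name: str
    x: float
    y: float
    xsot: float  # xSOT_a: xSOT evaluated at the attacker's current location
    ppcf: float  # P(Control_a): control probability integrated up to time T


def ball_travel_time(shooter, attacker, ball_speed=15.0):
    """Hypothetical T: straight-line pass at a constant ball speed in m/s."""
    return math.hypot(attacker.x - shooter[0], attacker.y - shooter[1]) / ball_speed


def xosot(attackers):
    """max over a of xSOT_a * PPCF_a; returns (value, best attacker name)."""
    if not attackers:
        return 0.0, None  # assumption: no passing option gives zero payoff
    best = max(attackers, key=lambda a: a.xsot * a.ppcf)
    return best.xsot * best.ppcf, best.name


team = [OffBallAttacker("A", 95.0, 30.0, xsot=0.40, ppcf=0.50),
        OffBallAttacker("B", 92.0, 40.0, xsot=0.30, ppcf=0.90)]
print(xosot(team))
```

Here attacker A has the higher xSOT, but B is chosen because a pass to B is far more likely to be controlled, which is exactly the trade-off the product in xOSOT captures.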

Experiments and Results
This section aims to verify the xSOT and xOSOT metrics, determine the optimized strategy for the interaction between the shooter and closest defender, and showcase the analysis of each shot-taking situation using xSOT and xOSOT.The code for this study is accessible on GitHub through the following link: https://github.com/calvinyeungck/Football-1-vs-1-Shot-Taking-Situations-Analysis.

Dataset and Preprocessing
Dataset: The dataset used in this study consists of the event and freeze frame data from the World Cup 2022 and EURO 2020 tournaments, obtained from StatsBomb's free data (footnote 7), available at https://statsbomb.com/what-we-do/hub/free-data/.
The Euro 2020 dataset comprised 51 matches, while the World Cup 2022 dataset included 64 matches.In total, there were 2,575 shot-taking events recorded, with 1,043 shots off target, 850 shots on target, and 682 shots blocked.Additionally, the xG (expected goals) data was sourced from https://footystats.org/international/world-cup/xg, while the number of goals data was obtained from https://www.mykhel.com/football/fifa-world-cup-2022-team-stats-l4/.
Preprocessing: Given the limited amount of data, we split the dataset into a training set and a test set using the train_test_split() function from the Python package sklearn. The ratio was set to 80/20, and the split was stratified by the grouped shot outcome (see Section A for details). For training the DNN and baseline models, we used the training set with 5-fold cross-validation, implemented with the StratifiedKFold() function from the sklearn package.
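The split described above can be sketched with the two sklearn functions it names. The feature matrix and the outcome labels below are synthetic stand-ins (100 shots with grouped outcomes 0, 1, 2), and the `random_state` and `shuffle=True` settings are assumptions of this sketch rather than settings reported for the study.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


def split_shots(X, outcomes, seed=0):
    """80/20 stratified train/test split, then 5 stratified folds on the train set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, outcomes, test_size=0.2, stratify=outcomes, random_state=seed)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    folds = list(skf.split(X_train, y_train))  # (train_idx, val_idx) pairs
    return X_train, X_test, y_train, y_test, folds


# Synthetic stand-in: 100 shots with grouped outcomes 0/1/2 (not the real data).
X = np.arange(300, dtype=float).reshape(100, 3)
y = np.array([0] * 40 + [1] * 35 + [2] * 25)
X_train, X_test, y_train, y_test, folds = split_shots(X, y)
print(len(X_train), len(X_test), np.bincount(y_test))
```

Stratifying both the hold-out split and the folds keeps the relatively rare outcome classes represented in every subset, which matters when the dataset is small.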
Furthermore, note that StatsBomb uses a pitch coordinate system with x ranging from 0 to 120 and y ranging from 0 to 80, whereas a professional football pitch typically measures 105 meters in length and 68 meters in width. Therefore, we scaled the x and y coordinates to these dimensions. Additionally, we computed the distance to the goal (Dist2Goal) and the angle to the goal (Ang2Goal) features when computing xSOT. With the middle of the goal line at (105, 34) in the scaled coordinates, the equations for Dist2Goal and Ang2Goal are as follows:
$$ Dist2Goal = \sqrt{(105 - x)^2 + (34 - y)^2}, \quad Ang2Goal = \left| \arctan\left(\frac{34 - y}{105 - x}\right) \right|. \tag{11} $$
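A minimal sketch of the coordinate scaling and the two goal features follows. It assumes a linear rescaling of StatsBomb's 120 × 80 units to a 105 × 68 m pitch with the goal-line midpoint at (105, 34), and it measures Ang2Goal in degrees from the pitch's length axis; the unit and reference axis are assumptions of this sketch.

```python
import math

SB_LENGTH, SB_WIDTH = 120.0, 80.0        # StatsBomb pitch units
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # metres
GOAL_CENTRE = (PITCH_LENGTH, PITCH_WIDTH / 2.0)


def to_metres(x_sb, y_sb):
    """Linearly rescale StatsBomb coordinates to a 105 x 68 m pitch."""
    return x_sb * PITCH_LENGTH / SB_LENGTH, y_sb * PITCH_WIDTH / SB_WIDTH


def dist_and_angle_to_goal(x, y):
    """Dist2Goal (m) and Ang2Goal (degrees, absolute, from the pitch's length axis)."""
    dx = GOAL_CENTRE[0] - x
    dy = GOAL_CENTRE[1] - y
    return math.hypot(dx, dy), abs(math.degrees(math.atan2(dy, dx)))


x, y = to_metres(108.0, 40.0)  # a central shot in StatsBomb units
print(x, y, dist_and_angle_to_goal(x, y))
```

Using `atan2` instead of a plain arctangent avoids a division by zero for a shooter standing on the goal line and gives the same value whenever the shooter is in front of the goal.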

Models and Framework Validation
Here, we validate the effectiveness of using DNNs to model the probability of a shot off target (Section 4.2.1) and of a shot block (Section 4.2.2). Additionally, we identify the best optimization method for the theory-based shot block model (Section 4.2.2) and demonstrate the necessity of the theory-based shot block model in the framework (Section 4.2.3).

Shot Off Probability Model Validation
Beginning with the DNN_off model, we assess its performance by comparing it against baseline models (historical percentage, ElasticNet, XGBoost, and CatBoost) that use the same feature set as the proposed model. These baseline models have been commonly used to model football event data in previous studies (see Section 3.2). The evaluation is based on the binary cross-entropy loss (CEL), a commonly used scoring rule for probability estimation of two-class events; a lower CEL indicates better performance.
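For reference, the evaluation metric and the simplest baseline can be written in a few lines. This sketch computes the mean binary cross-entropy directly (it should agree with `sklearn.metrics.log_loss` for binary labels, up to the clipping constant) and implements the historical-percentage baseline as the training-set base rate; the toy labels are illustrative only.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (CEL); lower is better."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("need equally long, non-empty inputs")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def historical_percentage(y_train):
    """Baseline: predict the training-set base rate for every shot."""
    return sum(y_train) / len(y_train)


y_train = [1, 0, 0, 1, 0]
y_test = [0, 1, 0]
rate = historical_percentage(y_train)  # 0.4
print(binary_cross_entropy(y_test, [rate] * len(y_test)))
```

Any model worth keeping must beat this constant predictor's CEL, which is why the historical percentage is included among the baselines.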
Table 2 compares the DNN model with other models in estimating the probability of a shot off target. Our model, DNN_off, outperformed all baseline models, achieving the lowest average CEL of 0.6696. However, DNN_off did not have an overwhelming advantage over the baseline models, so more informative features could be engineered in future work.

Shot Block Probability Model Validation
Subsequently, for the theory-based shot block model, we compare the performance of different optimization methods: Powell, Nelder-Mead, CG, L-BFGS-B, and SLSQP. These methods are commonly used for function optimization (see Section 3.3). As before, the evaluation is based on the binary CEL, where a lower value indicates better performance.
Table 3 presents a comparison of the performance of various optimization methods for the theory-based shot block model.Among the optimization methods considered, the Powell method (Powell, 1964) achieved the lowest CEL of 0.9220.Overall, all five optimization methods exhibited similar performance, indicating that the choice of optimization method had a minimal impact on the performance of the theory-based shot block model.
For the DNN_block models, we likewise compare performance against the historical percentage, ElasticNet, XGBoost, and CatBoost baselines, using the same feature set as the proposed model and the same CEL evaluation as in the shot off model validation.
Table 4 summarizes the performance of the models in estimating the probability of a shot being blocked. Our model, DNN_block, outperformed all baseline models and achieved the lowest average CEL of 0.4876. This result indicates that DNN_block provides effective inference of the shot block probability and performs better than the baseline models.

Necessity of the Theory-Based Shot Block Model
Additionally, we assess the necessity of the theory-based shot block model by comparing the performance of DNN_block when fitted with different feature sets. Specifically, we consider the features proposed in the methodology (details in Section 3.2), an ablated version with only basic shooter features (details in Section 3.2), advanced shooter features (details in Section 4.2.1), and the raw roles and x-y coordinates of the non-shooter players (unprocessed player features).
Furthermore, we verify the importance of combining the theory-based shot block model with the DNN model rather than using either independently, by comparing their performance in combination and in isolation. The evaluation again uses the binary CEL, where lower values are better.
Table 4 demonstrates the necessity of the theory-based shot block model through the comparison of feature sets. The proposed DNN_block with the proposed features, which include the shot block probability predicted by the theory-based shot block model, achieved the best average CEL of 0.49. This provides evidence for the necessity of the theory-based shot block model in the framework. Finally, we validated the importance of combining the theory-based shot block model and the DNN. From Table 3, the theory-based shot block model alone achieved an average CEL of 0.92, and DNN_block alone achieved an average CEL of 0.55. However, when the theory-based model was combined with the DNN through the proposed features (Table 4), the average CEL improved substantially to 0.49. This comparison highlights the need to integrate both models to improve the estimation of the shot block probability.
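The stacking idea evaluated in Table 4, feeding the theory-based model's predicted block probability into DNN_block as an input feature, can be sketched in a few lines. Here a plain logistic regression stands in for DNN_block, and the data are synthetic, generated so that the label depends on a hypothetical theory-based probability; the printed numbers are not the paper's results.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Full-batch gradient descent on mean CEL (a stand-in for DNN_block)."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def mean_cel(w, X, y):
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


rng = random.Random(42)
basic, theory_prob, labels = [], [], []
for _ in range(200):
    basic.append(rng.uniform(-1.0, 1.0))  # hypothetical weak "basic" feature
    p = rng.uniform(0.05, 0.95)           # hypothetical theory-based P(block)
    theory_prob.append(p)
    labels.append(1 if rng.random() < p else 0)

X_basic = [[b] for b in basic]
# Stacking: append the theory-based probability (on the logit scale) as a feature.
X_stacked = [[b, logit(p)] for b, p in zip(basic, theory_prob)]

cel_basic = mean_cel(fit_logistic(X_basic, labels), X_basic, labels)
cel_stacked = mean_cel(fit_logistic(X_stacked, labels), X_stacked, labels)
print(f"CEL without theory feature: {cel_basic:.3f}")
print(f"CEL with theory feature:    {cel_stacked:.3f}")
```

The logit transform is one convenient choice for feeding a probability into a model; a DNN could also take the raw probability.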
In summary, our analysis provides evidence of the effectiveness of DNN_off and DNN_block in estimating the probabilities of a shot going off target and of a shot being blocked, respectively. Additionally, we validated the necessity of the theory-based shot block model and demonstrated the importance of combining it with DNN_block to improve performance.

Predicted Probability Validation
After verifying the models and framework, we validated the predicted shot off and shot block probabilities on the test set. The DNN_off and DNN_block models were trained using an inverse-class-weighted CEL. The trained model parameters have been open-sourced and are used for all subsequent analyses.
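One common way to build an inverse-class-weighted CEL is sketched below. The weighting convention w_c = N / (K · N_c) (the "balanced" scheme, where K is the number of classes) is an assumption for illustration; the text does not state the exact formula used for DNN_off and DNN_block.

```python
import math
from collections import Counter


def inverse_class_weights(labels):
    """w_c = N / (K * N_c): rarer classes receive larger weights.

    ASSUMPTION: this 'balanced' convention is one common choice, not
    necessarily the exact weighting used in the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cel(y_true, y_prob, weights, eps=1e-12):
    """Weighted mean binary CEL, normalised by the total weight."""
    num = den = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[y]
        num += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        den += w
    return num / den


# Hypothetical imbalanced labels: blocked shots (1) are the minority class.
y = [0, 0, 0, 0, 0, 0, 1, 1]
w = inverse_class_weights(y)  # {0: 8/12, 1: 8/4}
probs = [0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.6, 0.3]
print(w, round(weighted_cel(y, probs, w), 4))
```

With these weights, each class contributes equally to the total loss, so the model is not rewarded for simply predicting the majority class.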
The predicted probabilities were then converted to binary predictions using a threshold of 0.5. The resulting confusion matrices for shot off and shot block are shown in Tables 5 and 6, respectively. On average, the true class received the highest predicted probability, indicating that the predictions of the DNN models carry useful information and are consistent with the observed outcomes.
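The thresholding and the "true class receives the highest probability on average" check can be sketched as follows. Treating p >= 0.5 as the positive class is an assumption about tie handling, and the labels and probabilities are hypothetical.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """2x2 counts: rows = true class (0, 1), columns = predicted class (0, 1).

    ASSUMPTION: p >= threshold is predicted as class 1.
    """
    cm = [[0, 0], [0, 0]]
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        cm[y][pred] += 1
    return cm


def mean_prob_of_true_class(y_true, y_prob):
    """For each true class c, the average probability the model assigns to c."""
    out = {}
    for c in (0, 1):
        ps = [p if c == 1 else 1.0 - p for y, p in zip(y_true, y_prob) if y == c]
        out[c] = sum(ps) / len(ps)
    return out


# Hypothetical test-set labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.7, 0.2, 0.4, 0.6, 0.1, 0.8]
cm = confusion_matrix(y_true, y_prob)
m = mean_prob_of_true_class(y_true, y_prob)
print(cm, m)
```

If every value in `m` exceeds 0.5, the true class is, on average, the more probable one.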

xSOT and xOSOT Verification
Furthermore, to validate the proposed metrics, we calculated the total xSOT (expected Shot On Target), xOSOT (expected Offense Shot On Target), and an additional metric, max_prob = max(xSOT, xOSOT), which represents the maximum probability of a shot on target that a team could produce in a shot-taking situation. These metrics were calculated for each team in the World Cup 2022 and averaged across matches (the final results are presented in Supplementary Table 15). Because there is no ground truth for the value of player actions or for the probability of a shot being on target, we used the Pearson correlation to assess how the proposed metrics relate to existing metrics and statistics. This analysis shows which metrics align with each other and provide consistent insights.
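The team-level aggregation (per-match totals, then averaged across matches) can be sketched as below. The decompositions used here are assumptions for illustration: xSOT = (1 - P(block)) · (1 - P(off)) for the shooter, xOSOT = P(control) · (1 - P(block)) · (1 - P(off)) for an off-ball attacker, and a situation's xOSOT taken as its best off-ball option. All probabilities are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def x_sot(p_off, p_block):
    # ASSUMPTION: on target = not blocked and not off target.
    return (1.0 - p_block) * (1.0 - p_off)


def x_osot(p_control, p_off, p_block):
    # ASSUMPTION: the off-ball attacker must first control the pass.
    return p_control * x_sot(p_off, p_block)


def team_averages(situations):
    """situations: iterable of (team, match_id, shooter, off_ball), where
    shooter = (p_off, p_block) and off_ball = [(p_control, p_off, p_block), ...].
    Returns, per team, the average over matches of the per-match totals."""
    keys = ("xSOT", "xOSOT", "max_prob")
    per_match = defaultdict(lambda: dict.fromkeys(keys, 0.0))
    for team, match_id, shooter, off_ball in situations:
        sot = x_sot(*shooter)
        # ASSUMPTION: a situation's xOSOT is its best off-ball option (0 if none).
        osot = max((x_osot(*a) for a in off_ball), default=0.0)
        totals = per_match[(team, match_id)]
        totals["xSOT"] += sot
        totals["xOSOT"] += osot
        totals["max_prob"] += max(sot, osot)
    by_team = defaultdict(list)
    for (team, _), totals in per_match.items():
        by_team[team].append(totals)
    return {team: {k: mean(t[k] for t in matches) for k in keys}
            for team, matches in by_team.items()}


# Hypothetical shot-taking situations for one team across two matches.
situations = [
    ("A", 1, (0.4, 0.5), [(0.9, 0.3, 0.2)]),
    ("A", 1, (0.5, 0.0), []),
    ("A", 2, (0.2, 0.5), [(0.5, 0.4, 0.0)]),
]
print(team_averages(situations))
```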
Table 7 shows that xSOT had a higher correlation with average goals (0.58) than xG did (0.46). This suggests that xSOT better approximates a team's final goal-scoring performance.
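For completeness, the Pearson correlation used for Table 7 can be computed directly from per-team averages. The team values below are hypothetical and are not the World Cup 2022 figures; in Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-team averages (one entry per team).
avg_goals = [2.0, 1.2, 0.8, 1.5, 0.5]
avg_xsot = [1.9, 1.0, 0.9, 1.3, 0.6]
avg_xg = [1.4, 1.5, 0.7, 1.1, 0.9]
print(f"r(xSOT, goals) = {pearson(avg_xsot, avg_goals):.2f}")
print(f"r(xG, goals)   = {pearson(avg_xg, avg_goals):.2f}")
```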
Additionally, the proposed metrics xSOT, xOSOT, and max_prob were highly correlated with xG (0.88, 0.93, and 0.95, respectively). This suggests that these metrics capture the attacking abilities of teams and individual players in a manner consistent with xG, which reflects expected goal-scoring capability. Thus, the proposed metrics can provide valuable insights for evaluating the value of players' actions and the attacking prowess of teams while remaining aligned with the established xG metric.

Optimal Strategy in World Cup 2022
After verifying the proposed models, metrics, and framework, we used them to identify the optimal strategies for the shooter and the closest defender in a shot-taking situation. Using all available data, we excluded situations in which the set of filtered defenders D was empty (|D| = 0), i.e., no defender was considered in the baseline model, leaving 1468 shot-taking situations for analysis. This restriction ensures that the defender has a genuine choice: a defender in a blocking position can either block the shot or move out of the way (not block), whereas a defender who is not initially in a blocking position would find it difficult to block the shot at all.
To determine the optimal strategy, we calculated the expected payoff of each strategy profile and summarized the results in Table 8. In Table 8, the underline indicates the highest payoff for the attacker given the defender's strategy in the column, and the overline indicates the highest payoff for the defender given the attacker's strategy in the row; for details on calculating the Nash equilibrium, see Tadelis (2013). According to the Nash equilibrium (Nash, 1951), the optimal strategy for the shooter was to pass the ball, while the optimal strategy for the closest defender was to block the shot. Neither agent could obtain a higher expected payoff by unilaterally deviating from this strategy profile. Because a pure-strategy equilibrium exists in this case, mixed strategies were not considered. Moreover, with more data, the same analysis could be performed per team or even per player role, as in Tuyls et al. (2021). It is also worth noting that, when the closest defender chose to block the shot, the payoff difference between shooting and passing was large (approximately 0.15). This suggests that, in expectation, there was an off-ball attacker with a higher chance of shooting on target. Therefore, passing was the more favorable option for the shooter, as it maximized the expected reward and could increase the team's chances of scoring.
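The underline/overline procedure for finding pure-strategy Nash equilibria amounts to checking, for every cell of the payoff table, that each player is playing a best response to the other. The sketch below does this for a general bimatrix game. The payoffs are hypothetical (not the values in Table 8), and treating the game as zero-sum, so that the defender's payoff is the negative of the attacker's, is an assumption for illustration.

```python
from itertools import product


def pure_nash_equilibria(att_payoff, def_payoff):
    """Return all pure-strategy Nash equilibria of a two-player bimatrix game.

    att_payoff[i][j], def_payoff[i][j]: payoffs when the attacker plays row i
    and the defender plays column j.
    """
    rows, cols = len(att_payoff), len(att_payoff[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        # Attacker cannot gain by switching rows (defender fixed at column j).
        att_best = all(att_payoff[i][j] >= att_payoff[k][j] for k in range(rows))
        # Defender cannot gain by switching columns (attacker fixed at row i).
        def_best = all(def_payoff[i][j] >= def_payoff[i][m] for m in range(cols))
        if att_best and def_best:
            equilibria.append((i, j))
    return equilibria


ATT = ["shoot", "pass"]
DEF = ["block", "not block"]
# Hypothetical expected probabilities of a shot on target (attacker payoff).
att = [[0.10, 0.30],
       [0.24, 0.27]]
# ASSUMPTION: zero-sum, so the defender's payoff is the negative.
dfd = [[-v for v in row] for row in att]

for i, j in pure_nash_equilibria(att, dfd):
    print(ATT[i], "/", DEF[j])  # prints "pass / block"
```

If the function returns an empty list (as in matching pennies), no pure-strategy equilibrium exists and mixed strategies would have to be considered.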

EURO 2020 Shot-Taking Situations Analysis with xSOT and xOSOT
As noted above, in expectation there was an off-ball attacker (blue dots in Figure 7) with a higher chance of shooting on target in shot-taking situations. Using xSOT and xOSOT, we can determine whether the shooter should take the shot or pass to an off-ball attacker to shoot (a counterfactual), and identify the optimal recipient of the pass. Additionally, because xSOT and xOSOT are constructed from the probabilities of shot off, shot block, and ball control, these component probabilities can be estimated for each attacker involved in the situation. They help explain why an off-ball attacker has a higher expected probability of shooting on target, giving players and analysts insight into the positioning and other factors that contribute to it.
Figure 7 illustrates a shot-taking situation from the EURO 2020 match between Italy and Wales, and Table 9 lists the values of the proposed metrics for each attacker in the freeze frame. In this scenario, Attacker 9 (attackers are identified by jersey number) had the highest probability of shooting on target (0.27) and the lowest probability of shooting off target (0.32), as Attacker 9 was closer to the goal line. Attacker 20 had the second-highest probability of shooting on target (0.23) and the lowest probability of the shot being blocked (0.03), because Attacker 20 faced fewer defenders, although positioned farther from the goal line. Furthermore, Attacker 14 had the highest probability of controlling the ball (0.99), as no defenders were nearby. Therefore, passing to Attacker 14 would be the optimal choice for retaining possession of the ball.
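The counterfactual decision described above can be sketched as a comparison between the shooter's xSOT and each off-ball attacker's xOSOT, together with a separate "safest pass" choice by control probability. The decomposition xOSOT = P(control) · (1 - P(block)) · (1 - P(off)) is an assumption, and the jersey numbers and probabilities are hypothetical, not the values in Table 9.

```python
def best_options(shooter_xsot, off_ball):
    """off_ball maps a jersey number to a dict with p_control, p_off, p_block.

    ASSUMPTION: xOSOT = p_control * (1 - p_block) * (1 - p_off).
    Returns (action for the best shot-on-target chance, safest pass target, xOSOT map).
    """
    xosot = {
        j: a["p_control"] * (1.0 - a["p_block"]) * (1.0 - a["p_off"])
        for j, a in off_ball.items()
    }
    best_j = max(xosot, key=xosot.get)
    action = ("pass", best_j) if xosot[best_j] > shooter_xsot else ("shoot", None)
    # Possession-oriented choice: the attacker most likely to control the pass.
    safest_j = max(off_ball, key=lambda j: off_ball[j]["p_control"])
    return action, safest_j, xosot


# Hypothetical freeze frame: shooter xSOT and three off-ball attackers.
off_ball = {
    7: {"p_control": 0.60, "p_off": 0.30, "p_block": 0.20},
    10: {"p_control": 0.50, "p_off": 0.50, "p_block": 0.05},
    11: {"p_control": 0.99, "p_off": 0.80, "p_block": 0.30},
}
action, safest, xosot = best_options(0.10, off_ball)
print(action, safest)
```

The two criteria can disagree, as in the EURO 2020 example, where the best shot-on-target option and the safest pass target are different players.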
By analyzing these metrics, we can gain insight into the shooting, blocking, and ball-control probabilities of each attacker, which can guide decision-making in shot-taking situations and enhance a team's overall performance.

Conclusion
In summary, this research aims to provide an effective, data-driven method for comprehensively analyzing the interaction strategy between the shooter and the defender. To this end, we proposed a novel framework that integrates machine learning, a theory-based approach, and game theory. We validated the models DNN_off and DNN_block for estimating event outcomes and the metrics xSOT and xOSOT for valuing players' actions, and provided examples analyzing team strategies and shot-taking situations with open-access data. We expect this framework to help teams gain a more in-depth understanding of shot-taking situations; in particular, in difficult or controversial situations, xSOT can support an objective analysis, ultimately enhancing team performance.
Because xSOT is estimated from data pooled across all players, it does not account for individual skill levels, which would affect these probabilities in practice. In future work, team- or player-specific xSOT could be estimated by incorporating player-skill-related features into the DNN models, as demonstrated in Yeung, Bunker, and Fujii (2023). Additionally, due to the lack of velocity and other detailed data, we assumed the interaction to be a static one-stage game. If velocity and other detailed data become available, a multi-stage game incorporating the expected movement of the players could be defined. In conclusion, with more data on players, shot-taking situations, and football matches, a more comprehensive version of this framework could be developed. Nevertheless, we expect that this framework will inspire the analysis of complex interaction situations, particularly in sports.

this did not imply that the outcomes were sequentially independent, and therefore, we had to assume sequential independence (Static game).

where,
• $D$ represents the current game state.
• $G_r$ represents a goal scored from location $r$.
• $C_r$ represents the event that the passing team controls the ball at location $r$.
• $T_r$ represents the event that the next event happens at location $r$.
where $f_j(t, \vec{r}, T \mid s)$ denotes the probability that player $j$ will reach location $\vec{r}$ and control the ball within time $T$; $\lambda_j$ and $s$ are optimizable parameters; and $\tau_{\exp}(t, \vec{r})$ is the expected interception time, calculated from the player's initial location, acceleration, and maximum speed. In this study, the PPCF is obtained by integrating Supplementary Equation 12 from time 0 to time $T$, where $T$ is the travel time of the ball from the shooter's location to the off-ball attacker.

Fig. 3 Flow chart of the theory-based shot block model. Each step is explained in detail in Section 3.3.

Fig. 4 Shot-taking situation example image. The image includes all players who appear in the freeze frame from the match Spain vs. Italy, EURO 2020.

Fig. 5 Feasible block zone and feasible shot angles of the theory-based shot block model. The shot-taking situation is the one in Figure 4. Defenders inside the feasible block zone bounded by the blue lines are considered. The probability of a block is calculated based on the shot angles inside the red regions. The line from the left goalpost to the shooter is defined as shot angle 0, and the line from the right goalpost to the shooter as shot angle n.

Fig. 6 Probability of shot block for each feasible shot angle. The shot-taking situation is the one in Figure 4.

Fig. 7 Shot-taking situation example image. The image includes all players who appear in the freeze frame from the match Italy vs. Wales, EURO 2020.
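The potential pitch control field (PPCF) model is represented by the second term, $P(C_r \mid T_r, D)$, in the aforementioned equation. The PPCF is given by:

$$\frac{d\,\mathrm{PPCF}_j}{dT}(t, \vec{r}, T \mid s, \lambda_j) = \left(1 - \sum_k \mathrm{PPCF}_k(t, \vec{r}, T \mid s, \lambda_k)\right) f_j(t, \vec{r}, T \mid s)\, \lambda_j,$$

$$f_j(t, \vec{r}, T \mid s) = \left[1 + e^{-\frac{\pi}{\sqrt{3}\, s}\left(T - \tau_{\exp}(t, \vec{r})\right)}\right]^{-1}.$$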
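The PPCF differential equation can be integrated numerically from 0 to T. The sketch below uses forward Euler steps. Two simplifications are assumptions, not the paper's implementation: the expected interception time is approximated as a fixed reaction time plus a straight-line run at maximum speed (ignoring acceleration), and the parameter values (s = 0.45, λ = 4.3, reaction time 0.7 s, maximum speed 5 m/s) are illustrative only.

```python
import math


def intercept_prob(T, tau_exp, s):
    """f_j: logistic probability that the player has reached the target by time T."""
    return 1.0 / (1.0 + math.exp(-math.pi / (math.sqrt(3.0) * s) * (T - tau_exp)))


def expected_intercept_time(player_xy, target_xy, reaction=0.7, vmax=5.0):
    """ASSUMPTION: tau_exp = reaction time + straight-line run at vmax
    (ignores acceleration, unlike the model described in the text)."""
    return reaction + math.dist(player_xy, target_xy) / vmax


def ppcf(players, target_xy, T, s=0.45, dt=0.01):
    """Forward-Euler integration of dPPCF_j/dT = (1 - sum_k PPCF_k) * f_j * lambda_j
    from 0 to T. players: list of (xy, lambda_j). Returns PPCF_j per player."""
    taus = [expected_intercept_time(xy, target_xy) for xy, _ in players]
    probs = [0.0] * len(players)
    n_steps = max(1, math.ceil(T / dt))
    h = T / n_steps
    for i in range(n_steps):
        t = i * h
        remaining = 1.0 - sum(probs)  # probability that nobody controls the ball yet
        rates = [remaining * intercept_prob(t, tau, s) * lam
                 for tau, (_, lam) in zip(taus, players)]
        probs = [p + r * h for p, r in zip(probs, rates)]
    return probs


# Hypothetical frame: an off-ball attacker and a defender racing to (30, 0).
players = [((28.0, 0.0), 4.3),   # attacker, 2 m away
           ((35.0, 5.0), 4.3)]   # defender, about 7.1 m away
result = ppcf(players, (30.0, 0.0), T=2.0)
print([round(p, 3) for p in result])
```

Summing PPCF over the attacking players would give the attacking team's probability of controlling the ball at the target location.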

Table 1
Game theory payoff table.

Table 2
The performance of shot off probability prediction models with machine learning.

Table 3
Theory-based shot block model optimization methods performance.
The table shows the average (Avg) CEL and standard deviation (Std) over the validation sets of the 5 cross-validation splits. Rows are sorted by Avg CEL in ascending order (lower is better).

Table 4
The performance of shot block probability prediction models with machine learning and different feature sets.

Table 5
Shot off prediction test set confusion matrix.

Table 7
Correlation between the proposed metrics and the existing metrics.

Table 8
Payoff table for the attacker and closest defender.

Table 9
Shot-taking situation example statistics.

Table 12
Contingency table for the shot outcome. The columns Shot Off Model and Shot Block Model indicate the best hyperparameters for the respective model.

Table 14
Theory-based shot block model optimized parameter.

Table 15
World Cup 2022 team statistics.