
Strategy Optimization in Sports via Markov Decision Problems

  • Conference paper
Modeling, Simulation and Optimization of Complex Processes HPSC 2018

Abstract

In this paper, we investigate a sport strategic question related to the Olympics final in beach volleyball: Should the German team play most aggressively to win many direct points or should it rather play safer to avoid unforced errors? This paper introduces the foundations of our new two-scale approach for the optimization of sport strategic decisions. We present possible answers to the benchmark question above based on a direct approach and the presented two-scale method. A comparison shows the benefits of the new paradigm.


Notes

  1.

    In other words, an attack style is a mixed partial g-policy consisting only of decision rules that belong to some attack type.

  2.

    Note that it would not be sufficient to assign a probability distribution over a set of actions to an s-policy, since any hit in a sequence of actions could fail with some probability, resulting in a state in which those actions are not feasible. An attack plan, in contrast, uniquely determines a sequence of decision rules and thus specifies, for every possible failure state, an action to cope with it. For example, after a failed attempt to set the ball properly, the most risky smash available might be much less aggressive than the safest smash available after an excellent set; this possibility could not be covered by classifying actions alone.

  3.

    The attacking player must be in front of the blocking player. An attack-hit from the last row of the court (\(P11-P14\) or \(Q11-Q14\)) cannot be blocked.


Acknowledgements

We thank our student assistants Ronan Richter and Fabian Buck for their support in the extensive video analysis. Moreover, we thank the anonymous referees for valuable suggestions that greatly improved the presentation of this paper.

Author information

Correspondence to Susanne Hoffmeister.


Appendices

Appendix 1

In this section, we give the details of the proofs of Theorems 1 and 2. One important observation is the following plausible lemma, which holds for both special cases of s-MDPs: it is never a disadvantage for us to have more points or for the opponent to have fewer points.

Lemma 1

The optimal expected reward-to-go \(v^*(x,y,k,\ell )\) satisfies \(v^*(x,y,k, \ell ) \le v^* (x+1,y,k, \ell )\) and \(v^*(x,y,k, \ell ) \ge v^* (x,y+1,k, \ell )\).

Proof

We prove this by comparing the possible realizations of the game. First of all, the outcome of future rallies does not depend on the score. Each winning scenario starting from state \((x,y,k,\ell )\) corresponds to a winning scenario with identical transitions starting from state \((x+1,y,k,\ell )\) that is one stage shorter and has at least the same probability. Thus, the total winning probability starting from \((x+1,y,k,\ell )\) is no smaller than the one starting from \((x,y,k,\ell )\). Moreover, each losing scenario starting from state \((x,y,k,\ell )\) corresponds to a losing scenario with identical transitions starting from state \((x,y+1,k,\ell )\) that is one stage shorter and has at least the same probability. Thus, the total losing probability starting from \((x,y+1,k,\ell )\) is no smaller than the one starting from \((x,y,k,\ell )\). The claim expresses exactly this in terms of the optimal reward-to-go in the respective states.

In the previous lemma we compared the winning probabilities in states with identical service components. We now explain why the winning probability increases when we win the next point.

Lemma 2

The optimal expected reward-to-go satisfies \(v^*(x+1,y,P,1) \ge v^*(x,y+1,Q,1)\).

Proof

Team \(P\), in order to win starting at state \((x,y+1,Q,1)\), first has to reach a score of \(x+1\) at some point in time. Thus, the main observation, denoted by (\(*\)), is that all winning scenarios starting from state \((x,y+1,Q,1)\) pass through exactly one of the states \((x+1,y+z,P,1)\), \(z = 1, \dots , 21-y\). Let W be the event that \(P\) wins, let E be the event that state \((x,y+1,Q,1)\) is passed, and for \(z = 1, \dots , 21-y\) let \(E_z\) be the event that state \((x+1,y+z,P,1)\) is passed. Then we compute:

$$\begin{aligned} v^*(x,y+1,Q,1)&= \text {Prob}(W|E)\nonumber \\&= \sum _{z=1}^{21-y} \text {Prob}(E_z|E) \text {Prob}(W|E_z) \quad \text {Markov-Property and}~*\\&= \sum _{z=1}^{21-y} \text {Prob}(E_z|E) v^*(x+1,y+z,P,1)\nonumber \\&\le \sum _{z=1}^{21-y} \text {Prob}(E_z|E) v^*(x+1,y,P,1) \quad \text {Lemma~1 and induction}\\&\le v^*(x+1,y,P,1) \qquad \text {by}~*. \end{aligned}$$

Thus, an optimal policy is myopic:

Corollary 1

The policy that always maximizes the probability to win the next point is optimal for the s-MDP for beach volleyball.   \(\square \)
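Corollary 1 makes the optimal s-policy cheap to evaluate: the winning probability follows from a plain backward recursion over the score. The sketch below illustrates this in Python for a heavily simplified one-set model; the rally-winning probabilities, the set length of 21 with a win-by-2 rule, and the cap at 30 points are our own illustrative assumptions, not values from the paper.

```python
from functools import lru_cache

# Assumed, illustrative probabilities that team P wins a rally,
# indexed by the serving team; NOT estimates from the paper's data.
P_WIN = {"P": 0.55, "Q": 0.48}
CAP = 30  # artificial cap so the win-by-2 recursion terminates

@lru_cache(maxsize=None)
def v(x, y, k):
    """Probability that team P wins the set from score (x, y) with team k serving."""
    if (x >= 21 and x - y >= 2) or x == CAP:
        return 1.0
    if (y >= 21 and y - x >= 2) or y == CAP:
        return 0.0
    p = P_WIN[k]
    # Rally-point scoring: the rally winner scores and serves next.
    return p * v(x + 1, y, "P") + (1 - p) * v(x, y + 1, "Q")

# Monotonicity in the spirit of Lemma 1: more own points never hurt,
# more opponent points never help.
assert v(10, 10, "P") <= v(11, 10, "P")
assert v(10, 10, "P") >= v(10, 11, "P")
# And in the spirit of Lemma 2: winning the next point helps.
assert v(11, 10, "P") >= v(10, 11, "Q")
```

The asserted inequalities mirror the statements of Lemmas 1 and 2 in this toy model.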

Appendix 2

This appendix defines the details of the infinite-horizon, stationary g-MDP for a beach volleyball rally that was sketched in Sect. 6.

Let \(P\) and \(Q\) be the teams participating in the game. \(P_1\) and \(P_2\) are the players of \(P\); \(Q_1\) and \(Q_2\) are the players of \(Q\). Team \(P\) is the team for which we want to choose an optimal playing strategy, whereas team \(Q\) is the uncontrolled opposing team. That means that, as in the s-MDP, team \(P\) is the decision-making team, and the behaviour of team \(Q\) is part of the system disturbance in the transition probabilities. We have decision epochs \(T = \{1, 2, 3, \ldots \}\), where \(t \in T\) is the total number of ball contacts, excluding blocking contacts, in the rally so far.

A state in the g-MDP is a tuple that contains the players’ positions, the ball’s position, a counter of the number of contacts, the information which player last contacted the ball, a Boolean variable that indicates the hardness of the last hit, and the designated blocking player of the defending team for the next attack. A general formulation for a state is

$$\begin{aligned} (\text {\textit{pos}}(P_1), \text {\textit{pos}}(P_2),\text {\textit{pos}}(Q_1), \text {\textit{pos}}(Q_2),\text {\textit{pos}}(\text {\textit{ball}}), \text {\textit{counter}},\text {\textit{lastContact}},\textit{hard},blocker ). \end{aligned}$$

The function \(\text {\textit{pos}}(\cdot )\) returns the position of a player or the ball. A position on the court is defined on basis of the grid presented in Fig. 9.

Fig. 9 Court grid

The components \(\text {\textit{counter}}\) and \(\text {\textit{lastContact}}\) are needed to implement the three-hits rule and the double-contact rule, respectively. The state variable \(\text {\textit{counter}}\) can take values from the set \(\{-1, 0,1,2,3\}\). The value \(-1\) marks a service state; this makes it possible to forbid a blocking action on services. The counter stays at \(-1\) if the ball crosses the net after a serve, which makes it possible to distinguish a reception from a defence action. Consequently, if the counter is 0, the ball crossed the net via an attack-hit performed in a field attack. The information which player last contacted the ball is needed to implement the double-contact fault in the model. The state variable \(\text {\textit{lastContact}}\) takes values in \(\{P_1, P_2, Q_1, Q_2, \emptyset \}\). If the ball has just crossed the net or the state is a serving state, the symbol \(\emptyset \) indicates that both players are allowed to execute the next hit. The Boolean state variable \(\textit{hard}\) indicates the power of the last hit: if \(\textit{hard}= 1\), the ball has a high speed when reaching the field; otherwise, the ball has normal speed. Finally, the state variable \(blocker \) takes values in \(\{P_1, P_2, Q_1, Q_2\}\) and indicates the designated blocking player of the currently defending team. It is necessary to save it in the state, since the decision who blocks is made once at the beginning of the opponent's attack and followed for more than one time step. Besides these generic states, the g-MDP contains the absorbing states \(\text {\textit{point}}\) and \(\text {\textit{fault}}\), both denoted from the perspective of team \(P\). The resulting g-MDP has around one billion different states. For example, \((P02, P33, Q12, Q13, P02, -1, \emptyset , 0,-)\) is a typical serving state for team \(P\).
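The state tuple above can be written down directly as a record type. The following Python sketch is our own illustration of the state space; the field names and the encoding of \(\emptyset \) as `None` are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hedged sketch of the g-MDP state tuple described above.
# Grid positions such as "P02" follow the paper's notation (Fig. 9).
@dataclass(frozen=True)
class GState:
    pos_P1: str                   # grid field of player P1, e.g. "P02"
    pos_P2: str
    pos_Q1: str
    pos_Q2: str
    pos_ball: str
    counter: int                  # -1 = service; 0..3 = contacts on this side
    last_contact: Optional[str]   # "P1", "P2", "Q1", "Q2" or None (= empty set)
    hard: bool                    # True if the last hit was a hard (high-speed) hit
    blocker: Optional[str]        # designated blocking player, if any

# The example serving state for team P given in the text:
serve = GState("P02", "P33", "Q12", "Q13", "P02", -1, None, False, None)
assert serve.counter == -1 and serve.last_contact is None
```

The frozen dataclass makes states hashable, which is convenient when tabulating transition probabilities over such a large state space.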

Of course, some states occur more often in practice than others. Depending on the current state, different actions are available to each player. An individual action of a player \(\rho \) consists of a hit \(h\) and a move \(\mu \). We distinguish between one-field and two-field movements; the direction (f:= forward, fr:= forward-right, ...) of the movement also matters. A blocking action belongs to the group of movements, since ball possession is not required to perform a block; it can only be performed if the player is in a field at the net. All possible moves for team \(P\) are listed in Table 13. The moves of the players of team \(Q\) are defined analogously.

Table 13 Move specification for \(\rho \) belonging to team \(P\)

Depending on the position of a player and on the position of the ball relative to the player, each player has a set of available hits. Sometimes this set consists solely of the hit no hit. A hit \(h^{\text {\textit{tech}}}_{\text {\textit{field}}}\) is defined by a hitting technique \(\text {\textit{tech}}\) and a target field \(\text {\textit{field}}\). Depending on the hit's degree of complexity, there are different requirements for the hit to be allowed in the model. The function \(\text {\textit{neighbour}}(\text {\textit{field}})\) returns the set of all neighbouring fields of \(\text {\textit{field}}\) according to the grid presented in Fig. 9, including the field itself. All hitting techniques with their possible target fields and requirements are listed in Table 14. The hitting techniques for a player of team \(Q\) are defined analogously.

There are rules in the model that restrict the possible combinations of a hit with a move to a player action, as well as rules that restrict the possible combinations of two player actions to a team action. These restrictions reflect practical considerations. Three rules govern the combination of a hit with a movement to a player action. First: if a player makes a real hit, i.e., a hit that is not no hit, only a one-field movement is allowed, for timing reasons. Second: if a player makes a hit that is performed with a jump, such as a jump serve, only a one-field movement in forward direction (i.e., towards the net) can follow. Third: if the hit requires a movement before executing the hit, no additional movement afterwards is allowed; this is, e.g., the case for a reception that takes place in a neighbouring field of the hitting player. We incorporate one restriction on the combination of player actions: if two player actions are combined to a team action, only one player may make a real hit. Team actions that themselves, or whose player actions, do not follow these rules are not available in the model for either team. Further conceivable restrictions could easily be implemented in the model whenever they depend only on the current state.
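The three hit-move rules and the team-action rule above are simple, state-free predicates. The following Python sketch encodes them; the dictionary encoding of hits and moves is our own assumption, not the authors' implementation.

```python
# Hedged sketch of the action-combination rules described above.
def player_action_allowed(hit, move):
    """hit: dict with keys 'real', 'jump', 'needs_pre_move';
       move: dict with keys 'fields' (0, 1 or 2) and 'direction'."""
    if not hit["real"]:
        return True               # 'no hit' combines with any move
    if move["fields"] > 1:
        return False              # rule 1: real hit => at most a one-field move
    if hit["jump"] and move["fields"] == 1 and move["direction"] != "f":
        return False              # rule 2: jump hit => only a forward move
    if hit["needs_pre_move"] and move["fields"] > 0:
        return False              # rule 3: pre-move hit => no move afterwards
    return True

def team_action_allowed(hit1, hit2):
    # Only one player of a team may make a real hit in a team action.
    return not (hit1["real"] and hit2["real"])

smash = {"real": True, "jump": True, "needs_pre_move": False}
stay = {"fields": 0, "direction": None}
run2 = {"fields": 2, "direction": "f"}
assert player_action_allowed(smash, stay)
assert not player_action_allowed(smash, run2)
```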

Table 14 Hit specification for player \(\rho \) of team \(P\) and ball \(\text {\textit{ball}}\); requires always \(\rho \ne \text {\textit{lastContact}}\) except the action no hit (\(^\star \) if \(\text {\textit{pos}}(\text {\textit{ball}}) \ne \text {\textit{pos}}(\rho )\) then no movement afterwards allowed)

Transition probabilities determine the evolution of the system if a certain action is chosen in a certain state. Assume that we know, for each player \(\rho \) and each hit \(h^{\text {\textit{tech}}}_{\textit {target }}\), the probability

$$\begin{aligned} p_{\text {\textit{succ}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) := \mathbb {P}\bigl (\text {\textit{pos}}^{t+1} (\text {\textit{ball}}) = \textit {target }\ | \ \text {\textit{pos}}^t(\rho ), h^{\text {\textit{tech}}}_{\textit {target }}\bigr ), \end{aligned}$$

i.e., the probability that the specified target field \(\textit {target }\) is met from \(\rho \)'s position at time t. In the notation used above, the terms \(\text {\textit{pos}}(\rho )\) and \(h^{\text {\textit{tech}}}_{\textit {target }}\) show the dependence on the position of the hitting player and the hit he uses. The probability is time-independent; the t on the right-hand side of the last equation only indicates that \(\text {\textit{pos}}^{t}(\rho )\) is the position of \(\rho \) at time t, while \(\text {\textit{pos}}^{t+1}(\text {\textit{ball}})\) is the position of the ball in the subsequent state. Similarly, assume that we know the probability of an execution fault

$$\begin{aligned} p_{\text {\textit{fault}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) \end{aligned}$$

for player \(\rho \) using hit \(h^{\text {\textit{tech}}}_{\textit {target }}\) from position \(\text {\textit{pos}}(\rho )\). An execution fault comprises hits where the ball is not hit correctly, so that the referee terminates the rally. For serves and attack-hits, an execution fault also includes hitting the ball into the net.

Furthermore, assume that we know the blocking skills of each player. The parameter \(p_{\textit{block}, \rho }\) denotes the probability that player \(\rho \) touches the ball when performing the block \(b\) against an adequate attack-hit from the opponent's side of the court. The probability \(p_{\textit{block}, \rho }\) is independent of the skills of the attacking player. There are three possible outcomes of that block. The block can be so strong that it is impossible for the opponent team to get the returned ball, and the blocking team wins the rally; this probability is denoted by \(p_{\textit{block},\rho ,point }\). The block can also result in a fault, with probability \(p_{\textit{block},\rho ,fault }\); this happens if the ball is blocked into the net and cannot be regained, or if the blocking player touches the net, which is an execution fault. None of the above happens with probability \(p_{\textit{block},\rho ,ok }:= p_{\textit{block}, \rho }- p_{\textit{block},\rho ,point }- p_{\textit{block},\rho ,fault }\). This is called an "ok"-block, and the ball lands in a random field on the opponent's or the blocking team's own court side. We define \(p_{\textit{no block}, \rho }:=1-p_{\textit{block}, \rho }\) as the probability that the blocking player fails to get his hands on the ball; in this case, the landing field of the ball is not affected by the block. In total, the blocking probabilities are

$$\begin{aligned} p_{\textit{no block}, \rho }+ \underbrace{p_{\textit{block},\rho ,point }+ p_{\textit{block},\rho ,fault }+ p_{\textit{block},\rho ,ok }}_{p_{\textit{block}, \rho }}= 1. \end{aligned}$$
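The decomposition above can be sampled directly from the four probabilities. In this Python sketch the numeric skill values are illustrative stand-ins of our own, not estimates from the paper's data.

```python
import random

# Hedged sketch: sample one block outcome from the decomposition above.
def sample_block(rng, p_point=0.10, p_fault=0.05, p_block=0.35):
    u = rng.random()
    if u < p_point:
        return "point"       # block wins the rally outright
    if u < p_point + p_fault:
        return "fault"       # net touch, or unrecoverable block into the net
    if u < p_block:
        return "ok"          # ball deflected to a random field
    return "no_block"        # landing field unaffected by the block

rng = random.Random(0)
outcomes = [sample_block(rng) for _ in range(10000)]
# The four outcomes partition the probability mass.
assert set(outcomes) <= {"point", "fault", "ok", "no_block"}
```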

From all these input probabilities, we generate all transition probabilities in the g-MDP. We explain how the next state evolves from the current state and the played team actions: the next position of a player depends only on his current position and the movement he makes; an allowed movement is always successful. The crucial component is the next position of the ball. Here, the individual skills of the hitting player enter the model. Assume first that no player of the opposing team is blocking. Then, with probability \(p_{\text {\textit{succ}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) \), the ball's next position will be the desired target field, and with probability \(p_{\text {\textit{fault}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) \) the hitting player makes an execution fault. The remaining probability

$$\begin{aligned} 1-p_{\text {\textit{succ}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) -p_{\text {\textit{fault}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) =: p_{\text {\textit{dev}},\rho }\left( \text {\textit{pos}}(\rho ),\ h^{\text {\textit{tech}}}_{\textit {target }} \right) \end{aligned}$$

will be the probability that the ball lands in a neighbouring field of the target field. We assume each neighbouring field is equally probable.
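The unblocked hit transition can thus be sketched as a three-way split: with probability \(p_{\text {\textit{succ}}}\) the ball reaches the target, with \(p_{\text {\textit{fault}}}\) the rally ends, and with the remaining mass \(p_{\text {\textit{dev}}}\) the ball deviates uniformly to a neighbouring field. A minimal Python sketch, where the probabilities and the neighbour list are assumed inputs rather than values from the paper's skill tables:

```python
import random

# Hedged sketch of the unblocked hit transition described above.
def sample_landing(rng, target, neighbours, p_succ=0.7, p_fault=0.1):
    u = rng.random()
    if u < p_succ:
        return target                                  # target field is met
    if u < p_succ + p_fault:
        return "FAULT"                                 # execution fault ends the rally
    return neighbours[rng.randrange(len(neighbours))]  # uniform deviation

rng = random.Random(1)
field = sample_landing(rng, "Q12", ["Q11", "Q13", "Q22"])
assert field in {"Q12", "FAULT", "Q11", "Q13", "Q22"}
```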

If the hit is an attacking hit to the opponent's court side, then the ball may be blocked. The blocking action must be made from an adequate position (see Footnote 3) so that the block can have an impact. If all preconditions are fulfilled, we first evaluate whether the hit is successful. A hit is successful if no execution fault occurs and the ball crosses the net and reaches the target field or one of its neighbours with the respective probabilities. Given a successful attack, we evaluate in the next step the result of the block. If the blocking player does not touch the ball, then the next position of the ball is not affected by the block. Otherwise, the outcome of the block is evaluated according to the blocking skill of that player and may be a point, a fault, or a new position of the ball. A successful attack therefore need not automatically yield a point for the attacking team, since the defending team may perform a successful defence action in the next time step. Finally, in case of an execution fault, or if the ball is not hit by any player, the next state is \(\text {\textit{point}}\) or \(\text {\textit{fault}}\), denoted from the perspective of team \(P\).

Appendix 3

Table 15 Direct estimation of s-MDP probabilities for team Q


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Hoffmeister, S., Rambau, J. (2021). Strategy Optimization in Sports via Markov Decision Problems. In: Bock, H.G., Jäger, W., Kostina, E., Phu, H.X. (eds) Modeling, Simulation and Optimization of Complex Processes HPSC 2018. Springer, Cham. https://doi.org/10.1007/978-3-030-55240-4_14
