A copula-based multivariate hidden Markov model for modelling momentum in football

We investigate the potential occurrence of change points—commonly referred to as “momentum shifts”—in the dynamics of football matches. For that purpose, we model minute-by-minute in-game statistics of Bundesliga matches using hidden Markov models (HMMs). To allow for within-state dependence of the variables, we formulate multivariate state-dependent distributions using copulas. For the Bundesliga data considered, we find that the fitted HMMs comprise states which can be interpreted as a team showing different levels of control over a match. Our modelling framework enables inference related to causes of momentum shifts and team tactics, which is of much interest to managers, bookmakers, and sports fans.


Introduction
Sports commentators and fans frequently use vocabulary such as "momentum", "momentum shift", or related terms to refer to change points in the dynamics of a match.
Usage of such terms is typically associated with situations during a match where an event -such as a shot hitting the woodwork in a football match -seems to change the dynamics of the match, e.g. in a sense that a team which prior to the event had been pinned back in its own half suddenly seems to dominate the match.A prominent example is the 2005 Champions League final between Milan and Liverpool, within which Liverpool was trailing by three goals after the first half, but fought back after half time and eventually won by penalty shootout.
Despite the widespread belief in momentum shifts in sports, it is not always clear to what extent perceived shifts in the momentum are genuine.From the literature on the "hot hand" -i.e. research on serial correlation in human performances -it is well known that most people do not have a good intuition of randomness, and in particular tend to overinterpret streaks of success and failure, respectively (see, e.g., Thaler and Sunstein, 2009;Kahneman and Egan, 2011).It is thus to be expected that many perceived momentum shifts are in fact cognitive illusions in the sense that the observed shift in a competition's dynamics is driven by chance only.
Momentum shifts have been investigated in qualitative psychological studies, e.g. by interviewing athletes, who reported momentum shifts during matches (see, e.g., Richardson et al., 1988;Jones and Harwood, 2008).Fuelled by the rapidly growing amount of freely available sports data, quantitative studies have investigated the drivers of ball possession in football (Lago-Peñas and Dellal, 2010), the detection of main playing styles and tactics (Diquigiovanni and Scarpa, 2018;Gonçalves et al., 2017) and the effects of momentum on risk-taking (Lehman and Hahn, 2013).In some of the existing studies, e.g. in Lehman and Hahn (2013), momentum is not investigated in a purely data-driven way, but rather pre-defined as winning several matches in a row.
In this contribution, we analyse potential momentum shifts within football matches.
Specifically, we investigate the potential occurrence of momentum shifts by analysing minute-by-minute bivariate summary statistics from the German Bundesliga using hidden Markov models (HMMs).The corresponding data is described in Chapter 2. Within the HMMs, we consider copulas to allow for within-state dependence of the variables considered.The corresponding methodology is presented in Chapter 3. Our results, which are presented in Chapter 4, suggest states which can be tied to different levels of control in a match.In addition, we investigate the causes of momentum shifts, e.g. the current score of the match.This type of insight could be of great interest to managers, bookmakers and sports fans.

Data
We analyse minute-by-minute in-game statistics of Bundesliga matches, taken from www.whoscored.com, to investigate to what extent momentum shifts in a football match are genuine, and what kind of events lead to a shift.Since the quality and tactics differ between the teams, we do not pool data from multiple teams, but consider data from a single team.Throughout this paper, we consider data from Borussia Dortmund.In the Supplementary Material, we present the same analysis for Hannover 96.
As proxy measures for the current momentum within a football match, we consider the number of shots on goal and the number of ball touches, with both variables sampled on a minute-by-minute basis.For match m, m = 1, . . ., 34, this results in a bivariate time series {y mt } t=1,2,...,Tm , with y mt = (y mt1 , y mt2 ) the pair of variables observed at time t (out of T m minutes played) during the match.
Due to injury times being added to the regular match length of 90 minutes, the lengths of the time series considered range from 91 to 100 minutes.The final data set then comprises n = 3, 214 bivariate observations from m = 34 matches of the season 2017/18.In addition, since the underlying dynamics of a match, from Borussia Dortmund's perspective, potentially depend on characteristics of the opponent (such as the strength of the squad) as well as events in the match (such as goals), the following four covariates are considered: • the market value of the opponent team (taken from www.transfermarkt.com); • the goal difference in the current score; • a dummy variable indicating whether the match is played at home or away; • the current minute of the match.
The first covariate considered is a (crude) proxy for the quality of teams and does not vary for a team in the given period of time.The difference in the current score is calculated from Borussia Dortmunds point of view, i.e. positive values refer to a lead of Dortmund whereas negative values represent that Dortmund is trailing.The dummy indicating whether the match is played at home is included since several studies provided evidence for a home field advantage, because of (e.g.) crowd effects and psychological advantage when playing at home (see, e.g., Pollard, 2008).Finally, to account for the potential state of exhaustion of players, the minute of the match is also included.The variables considered are summarised in Table 1.

Modelling momentum
Figure 1 underlines that there are periods in the match where Borussia Dortmund's number of ball touches and the number of shots on goal are fairly low (e.g.around minute 75-90), as well as periods with relatively many ball touches and shots on goal (e.g.around minute 15-30).HMMs hence constitute a natural modelling approach for the minute-by-minute bivariate time series data, as they accommodate the idea of a match progressing through different phases, with potentially changing momentum.
The states can be interpreted as the underlying momentum, i.e. as potentially different levels of control of the team considered.In the most simple model formulation with two states, the states could, for example, be interpreted as either the team considered or the opponent having a high level of control (i.e.dominating the match).In this chapter, the basic HMM model formulation will be introduced (Section 3.1) and extended to allow for within-state dependence using copulas (Section 3.2).The latter is desirable since the potential within-state dependence may lead to a more comprehensive interpretation q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 1 2 of the states regarding the underlying momentum.Finally, for the model formulation presented in Section 3.2, covariates will be included (Section 3.3).

A baseline model
HMMs involve two components: an unobserved Markov chain with N possible states, and an observed state-dependent process, whose observations are assumed to be generated by one of N distributions as selected by the Markov chain.For the data considered in this paper, the observations and the state process are denoted by y mt and {s mt } t=1,2,...,Tm , respectively.Switches between the state are modelled by the transition probability matrix (t.p.m.) Γ = (γ ij ), where γ ij = Pr(s mt = j|s m,t−1 = i), i, j = 1, . . ., N .Figure 2 shows the model structure as directed graph.For the model formulation of an HMM to be completed, the number of states N and the class(es) of state-dependent distribution(s) have to be selected.While choosing state-dependent distribution(s) is straightforward for univariate time series, it is generally not straightforward to define a multivariate distribution to allow for within-state dependence of the variables considered, unless a multivariate normal distribution can be assumed.Hence, for the vector of observations y mt , in the baseline model formulation we assume that the joint probability is obtained by the product of the marginal distributions, with K = 2 here.This assumption (also known as contemporaneous conditional independence) is often used in practice (see, e.g., Wall and Li, 2009;DeRuiter et al., 2017;Punzo et al., 2018;van Beest et al., 2019).In Eq. ( 1) f denotes a p.m.f.since we deal with discrete data, but in principle f could also denote a density without any further changes in the baseline model formulation.The contemporaneous conditional independence assumption will be modified in the next subsection.
Since both the number of shots on goal and the number of ball touches are count data, the Poisson distribution would be a standard choice for either of the two variables.
In general, the normalising constant Z(λ, ν) does not reduce to such a simple closedform expression.Asymptotic results are however available (Gillispie and Green, 2015).
To formulate the likelihood for the baseline model, the i−th diagonal element of the N × N diagonal matrix P(y mt ) consists of the joint probability of the observations y mt1 and y mt2 given state i, i.e. f (y Since the Conway-Maxwell-Poisson distribution contains an infinite sum in the normalising constant, the evaluation of the p.m.f. is not straightforward.Here, the R package COMPoissonReg was used for this purpose (Sellers et al., 2018).Since stationarity cannot reasonably be assumed in our setting, we estimate the initial distribution δ = Pr(s m1 = 1), . . ., Pr(s m1 = N ) , regarding the parameters of δ as N − 1 additional parameters to be estimated.With these quantities defined, the likelihood for one match m is given by: L = δP(y m1 )ΓP(y m2 ) . . .ΓP(y mTm )1 with column vector 1 = (1, . . ., 1) ∈ R N (see Zucchini et al., 2016).Calculation of this matrix product expression amounts to the application of the forward algorithm, which is a powerful recursive technique for efficiently calculating the likelihood of an HMM at computational cost O(T N 2 ) only (Zucchini et al., 2016).To obtain the likelihood for the full data set, we assume independence between the individual matches such that the likelihood is given by the product of likelihoods for the individual matches: (2) The model formulation presented here could be extended to account for momentum carry-over effects across matches, but this is not investigated in the present work since there is usually a time difference of 5-7 days between matches.The model parameters are estimated by numerical maximum likelihood estimation using the function nlm() in R (R Core Team, 2017).To avoid local maxima, we carefully selected starting values Figure 2: Dependence structure of the HMM considered: each pair of observations y mt is assumed to be generated by one of N (bivariate) distributions according to the state process s mt .
for the numerical maximisation by drawing random numbers from uniform distributions several times and choosing the model with the best likelihood.In addition, to speed up computation time, we implemented the forward algorithm in C++ using the Rpackage Rcpp (Eddelbuettel, 2013).For a model with N = 2 states, it takes less than a minute to numerically maximise the likelihood on a usual desktop computer.In the Supplementary Material of this article, we provide data and code for all models presented.

Modelling within-state dependence using copulas
In the baseline model formulation, we assume contemporaneous conditional independence, i.e. that there is no within-state correlation between the two variables considered.
However, when modelling momentum in football, it is of interest to explicitly model any within-state dependence to draw a comprehensive picture of the dynamics of a match.For example, high ball possession can be linked to both an attacking phase with lots of shots on goal, but also much less goal-oriented tactics, where the main aim is simply to control the match by keeping the ball, without much pressure on goal.The between-variable correlation would likely be very different in those two scenarios.By estimating the within-state correlation between the two variables, we are better able to distinguish between such fairly subtle differences in a team's style of play.
To modify the contemporaneous conditional independence assumption, a multivariate distribution needs to be assumed to specify the dependence structure between the variables considered within states.Here, we allow for within-state correlation of our variables y mt by formulating a bivariate distribution as state-dependent distribution using a copula.A copula is a multivariate probability distribution with uniform margins.
As introduced by Sklar (1959), the idea of a copula is to split a multivariate distribution into its univariate margins and the dependence structure, where the latter depends on the copula considered.Within the class of HMMs, copulas have previously been used by Härdle et al. (2015) where C(., .) is a bivariate copula.When deriving the corresponding p.m.f., differences are needed rather than derivatives, since the marginals are discrete (see, e.g., Nikoloulopoulos, 2013).Thus, the bivariate p.m.f. of y mt given state s mt is given by The copula C(., .)needs to be selected from the large number of possible copula functions available in the literature.Here, we focus on copulas that can model positive and negative dependence.Archimedean copulas (see, e.g., Nelsen, 2006, p. 116 for an overview) are convenient for this modelling purpose.We consider three different families of copulas, comparing their fit to the data in Chapter 4: first, the Frank-copula, which for two marginals u 1 and u 2 defined as second, the Clayton-copula, , and third, the Ali-Mikhail-Haq (AMH) copula, , where for each copula considered the dependence parameter is denoted by θ.With these quantities defined, the diagonal matrix P(y mt ) in the HMM likelihood (see Eq. 2) changes slightly.The i-th diagonal entry is now equal to f (y mt |s mt = i) as defined in Eq. ( 3) instead of the product of the marginals.The corresponding likelihood is then again numerically maximised using the function nlm() in R.

A model including covariates
In the previous subsections, the transition probabilities γ ij were assumed to be constant over time.To account for possible events which may lead to state-switching, and hence to possible momentum shifts, we modify this assumption by explicitly allowing the transition probabilities γ ij to depend on covariates at time t.This is done by linking p using the multinomial logit link: Since the transition probabilities depend on covariates, the t.p.m.Γ t is not constant across time anymore, i.e. the Markov chain is non-homogeneous.However, the structure of the HMM likelihood as stated in Eq. ( 2) is unaffected, such that the likelihood can still be maximised numerically.

Results
In this chapter, the different models presented in Chapter 3 are fitted to data on the matches of Borussia Dortmund in the 2017/18 Bundesliga season.To further illustrate the methodology, in particular for lower-ranked teams, in the Supplementary Material we provide the results also for Hannover 96.

Baseline model
For the baseline model, we make use of the contemporaneous conditional independence assumption, cf.Eq. ( 1 ).Following this approach, the means of the number of shots on goal are 0.138 and 0.175 for states 1 and 2, respectively.For the ball touches, the means are 4.080 (state 1) and 10.104 (state 2), respectively.Thus, state 2 can be interpreted as the team considered, Borussia Dortmund, being more dominant, i.e. having a higher level of control over the match, than when being in state 1.The t.p.m. is estimated as Γ = 0.867 0.133 0.280 0.720 , and the initial distribution as δ = (0.258, 0.742).According to the t.p.m. of the fitted model, there is some persistence in both states.Although this is the most simple model formulation considered here, the fitted model comprises interpretable states which refer to different levels of control over the match.The model can thus be regarded as a simple baseline model for capturing momentum shifts.We will now gradually increase its complexity to more fully capture the in-game dynamics.

Copula-based HMM with N = 2
To capture possible within-state correlation of the variables, a multivariate distribution needs to be considered.For Poisson marginals, the bivariate Poisson as proposed by Karlis and Ntzoufras (2003) would be a possible candidate.However, as discussed in Section 3.1, this approach would have two limitations, namely the inability to capture overdispersion (and underdispersion), and the restriction to positive between-variable correlation.Instead we use more flexible CMP distributions for the marginals, stitching them together using a copula as described in Section 3.2.
First, we investigate the consequences of relaxing the contemporaneous conditional independence assumption.To this end, Figure 3

Choosing the number of states
For the choice of the number of states, it is anything but clear how many states a given team may exhibit in a football match.To choose an appropriate number of states, and also a copula, we first consult the AIC and the BIC for the copula-based HMMs using different numbers of states and the three copulas considered above.The corresponding results are displayed in Table 2. Starting with the choice of the copula, the Clayton copula is preferred by both AIC and BIC.Hence, from now on, we use the Clayton copula.Choosing the number of states is not as conclusive: according to the AIC, the five-state model is preferred, whereas the BIC selects three states.As it is well-known that the AIC tends to select too many states in a HMM (see Pohle et al., 2017), a choice of N = 3 seems more appropriate based on these formal criteria.To make an informed choice based also on interpretability of the resulting model states, in Figure 4 we further inspect the fitted models with three and four states, respectively, by means of their estimated state-dependent distributions.Figure 4 illustrates that the general patterns of the state-dependent distributions from the three-state model are also included in the four-state model, whereas the state-dependent distribution of state 2 in the four-state model seems to refer to an underlying level of control which is not included in the three-state model.However, at closer inspection of the distributional shapes in the four-state model, there is a substantial overlap between the state-dependent distributions of state 2 and state 3, respectively.Hence, given that the BIC points to the three-state model, and since we do not see meaningful additional information in a potential fourth state, from now on we focus exclusively on three-state models.
Copula-based HMM with N = 3 For the Clayton-copula HMM with three states, Table 3 displays the estimated parameters of the marginal distributions as well as the dependence parameter of the copula.

A model including covariates
The models presented so far already provide interesting insights into the dynamics of football matches, since the state-dependent distributions can be tied to different levels of control of the team considered.To gain further insights, we incorporate covariates to investigate potential drivers of momentum shifts.According to the AIC, the model including all covariates considered is preferred over the model without covariates (∆AIC = 51); we do not conduct variable selection as we regard this analysis step as explanatory (rather than an attempt to find the best model).
For ease of interpretation, we suggest to visualise the estimated transition proba-  et al., 2009).To illustrate these two approaches, we present (i) the transition probabilities as functions of the covariate minute, and (ii) the stationary distributions with respect to the score difference.In Table 5 in the Supplementary Material, the estimated and their 95% CIs are displayed.
For (i), as displayed in Figure 5, the values of the score difference and the market value of the opponent are set to 0 and 200, respectively, corresponding to situations where the score is even and the opponent's strength is about average.In addition, we focus on home matches only, since the corresponding dummy variable in the linear predictor does not affect the overall pattern regarding the direction of the effect.
The confidence intervals (indicated by the dashed lines) are obtained based on Monte Carlo simulation from the approximate multivariate normal distribution of the estimator.According to the estimated effects, switching from state 1 (low control and counter attacks) and state 2 (balanced state) to state 3 (high-control state), respectively, becomes more likely at the end of matches.In addition, staying in state 3 also becomes more likely at the end of matches.
The stationary distributions for the score difference are shown in Table 4.The values of the minute and the market value of the opponent are fixed at 80 and 200, respectively, corresponding to situations in the final stage of a match with the opponent's strength being about average.The stationary distributions indicate that there is a high probability for Borussia Dortmund to be in state 3 (high-control state) either if they have a clear lead or if they are trailing.In contrast, if they hold only a slender lead, then the probability of being in state 1 (low control and counter attacks) is highest.
To further investigate typical patterns of momentum shifts according to the state process {s mt }, we calculate the most likely trajectory of the states for each match m.
Specifically, for a given match m, we seek (s * m1 , . . ., s * mTm ) = argmax s m1 ,...,s mTm Pr(s m1 , . . ., s mTm |y m1 , . . ., y mTm ), i.e. the most likely state sequence, given the observations.Maximising this probability is equivalent to finding the optimal of N Tm possible state sequences.This can be achieved at computational cost O(T m N 2 ) using the Viterbi algorithm (Zucchini et al., 2016).
Figure 6 displays the decoded sequences for the match Borussia Dortmund against Schalke 04 which was already shown in Figure 1.We see confirmed that Borussia Dortmund started the match in the balanced state with occasional switches to the low control state with counter attacks.According to the decoded state sequence, Borussia Dortmund is predominantly staying in the low control state with counter attacks after the half time, with occasional changing level of control around minute 75, where they switched to the high-control state.However, at the end of the match, they stayed in the low control state with counter attacks and conceded two more goals.
Table 4: Stationary distributions when fixing the score difference at certain levels.
Probabilities were calculated for each value of the score difference, with the market value of the opponent and the minute of the match fixed at 200 and 80, respectively, corresponding to situations in the final stage of a match against an opponent team of average quality.5 in the Supplementary Material displays the coefficients of the multinomial logistic regression underlying this figure.
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 1 2

Discussion
There is wide interest in the dynamics of football matches, and specifically in potential momentum shifts, in particular by fans and the media.From a managerial perspective, it is important to understand the causes of such shifts, and hence also how to potentially exert an influence on the match outcome.With data sets on in-game summary statistics becoming freely available, we now have the opportunity to statistically investigate the corresponding processes.To that end, here we provide a modelling framework -copulabased multivariate HMMs -which naturally accommodates potential changes in the dynamics of a match by relating the observed in-game match statistics to latent states.
A key strength of the proposed approach is that we not only partition a given match into different phases but also allow for the investigation into drivers of how a match unfolds dynamically over time.
In our proof-of-concept case study, we tested the feasibility of our approach by analysing minute-by-minute data on matches of one particular team, namely Borussia Dortmund.The underlying states of the fitted model correspond to match phases where Borussia Dortmund exhibits a low level of control with counter attacks, to phases where the match is balanced, and to those with high level of control, respectively.In addition, the estimated effects of the covariates shed some light on what kind of events may lead to switches between those states.Specifically, we found that Borussia Dortmund has the highest probability of being in the high-control state when having a clear lead or when trailing.
Although the states of the fitted models are tied to different levels of control, it remains unclear whether these are clearly attributed to shifts in the underlying momentum.Specifically, some of the reported effects may arise due to tactical considerations rather than momentum shifts.For example, for one-goal leads, being in the low control and counter attacks state may be a tactical consideration rather than a shift in the underlying momentum.The data considered here does not allow us to disentangle these two possible causes, rendering a definitive conclusion whether the switches between the states are momentum shifts or tactical considerations impossible.However, with the states and effects of the covariates considered (cf. Figure 5 and Table 4) being easy to interpret, they still provide interesting insights to dynamics of football matches.In addition, using copula-based HMMs as presented in this contribution may be helpful for bookmakers to obtain more precise estimations of betting odds.For instance, when modelling the time until the next goal during a football match, bookmakers could take into account the latent dynamics of a match as modelled here.
A clear limitation of the approach as presented here is that we focus on the in-game dynamics of only one of the two teams involved in a match, when in fact it is clear that the dynamics of a match result from the combination of both teams' actions.It seems conceptually desirable to extend the approach to allow for the joint modelling of both teams' in-game statistics.This could be achieved using a bivariate Markov chain to represent both teams' underlying states, resulting in N 2 combinations of states (see, e.g., Sherlock et al. 2013).To further improve the realism of these models, it would be beneficial to also include tracking data, e.g. by considering the distances run per minute as covariate information.
The modelling framework used in the present contribution, i.e. copula-based HMMs for modelling football minute-by-minute data, can easily be transferred to other sports for further investigations and possible characteristics of momentum shifts.These sports include, e.g., basketball, where the variables to be modelled comprise, for example, the number of points/shots, the number of rebounds and the number of blocks/steals.More general, sports with two individuals or teams competing against each other and multiple variables measured on a fine-grained scale are best suitable for analysing momentum shifts using the modelling framework provided here.

Supplementary Material
Coefficients in the model for Borussia Dortmund  -0.443; 0.235] [2.483; 9.995] [-1.876; -0.760] [1.231; 7.670] [-0.225; 0.780] [0.905; 3.391] Additional analysis of Hannover 96 data Compared to the state-dependent distributions of Borussia Dortmund (see Figure 3), Hannover 96 has less number of ball touches and shots on goal, which is intuitively plausible.For all copulas considered, state 1 refers to a high level of control, whereas state 2 can be interpreted as a low level of control.
To select a model for Hannover 96, we again compare the AIC and BIC values for different number of states and copulas, which are shown in Table 6.For the model selected by the BIC, i.e. the AMH-copula-based HMM with two states, the transition probabilities as functions of the covariate minute are shown in Figure 8.As chosen above for Borussia Dortmund, the values for the score difference and the market value of the opponent are fixed at 0 and 200, respectively.According to the estimated effects, staying in state 1 (high level of control) becomes less likely at the end of such matches, whereas staying in state 2 (low level of control) becomes more likely.The stationary distributions for given values of the score difference are shown in Table 7.The values of the minute and the market value of the opponent are again fixed at 80 and 200, respectively.We see that the probability for being in state 1 (high-control state) increases if Hannover is trailing.If the score is even or if they are leading, it is more likely that they are in state 2 (low control state) than in state 1, which again is intuitively plausible.,888 19,101 18,911 19,123 18,920 19,132 5 states 18,891 19,789 18,899 19,197 18,886 19,184 Table 7: Stationary distributions when fixing the score difference at certain levels.Probabilities were calculated for each value of the score difference, with the market value of the opponent and the minute of the match fixed at 200 and 80, respectively, corresponding to situations in the final stage of a match against an opponent team of average quality.

Figure 1 :
Figure 1: Bivariate time series of the number of shots on goal (top) and the ball touches (bottom) of Borussia Dortmund for one example match from the data set (Borussia Dortmund vs. FC Schalke 04).
), initially focusing on the case of N = 2 states.The corresponding parameter estimates associated with the number of shots on goal are λshots = (0.125, 0.149), νshots = (0.206, 0.001), while for the number of ball touches, they are λtouches = (0.971, 2.381), νtouches = (0.102, 0.390).It is not straightforward here to compute the means of the fitted distributions due to the infinite sum in the normalising constant.MacDonald and Bhamani (2018) discuss several approaches and suggest to calculate the mean by 1 Z(λ,ν) d k=0 kλ k /(k!) ν using a very large d (say d = 100 displays the estimated state-dependent distributions of two-state copula-based HMM formulations, using the Frank, Clayton and AMH copula, respectively.While visually there is no clear difference between the different copula functions considered, the application of the Clayton copula led to the highest likelihood of the fitted model.Compared to the baseline model, the copulabased model shows a clear improvement in the fit (∆AIC = 48; ∆BIC = 35).The fitted state-dependent distributions can again be interpreted as Borussia Dortmund exhibiting different levels of control, with state 1 corresponding to situations where the game is balanced, whereas state 2 refers to a high level of control.As for the baseline model, there is a fairly high persistence in the states, with the diagonal elements of the t.p.m. estimated as γ11 = 0.852 and γ22 = 0.706.

Figure 3 :
Figure 3: Fitted state-dependent distributions for the baseline two-state HMM for Borussia Dortmund.From left to right: Frank-, Clayton-and AMH-copula, respectively.

Figure 4 :
Figure 4: State-dependent distributions for the three-state (top row) and four-state (bottom row) Clayton-copula HMM, respectively.

Figure 5 :
Figure 5: Transition probabilities as functions of the covariate minute.The dashed lines indicate confidence intervals (obtained based on Monte Carlo simulation).The values of the score difference and the market value of the opponent are set to 0 and 200, respectively.Table 5 in the Supplementary Material displays the coefficients of the multinomial logistic regression underlying this figure.

Figure 6 :
Figure 6: Decoded most likely state sequence of the match Borussia Dortmund against Schalke 04 according to the three-state Clayton-copula HMM including covariates.The vertical dashed lines denote goals scored by Borussia Dortmund (yellow lines) and Schalke 04 (blue lines).

For
the analysis of Hannover 96, we use the same copula-based HMM model formulation as above for Borussia Dortmund.The state-dependent distributions for the fitted baseline model are shown in Figure 7.As for Borussia Dortmund, the choice of the copula function considered does not seem to change the shape of the distribution remarkably.

Figure 7 :
Figure 7: Fitted state-dependent distributions for the baseline two-state HMM for Hannover 96.From top to bottom: Frank-, Clayton-and AMH-copula, respectively.

Figure 8 :
Figure 8: Transition probabilities as functions of the covariate minute.

Table 1 :
Descriptive statistics of the variables analysed, 'shots' and 'ball touches', as well as the covariates 'market value' and 'score difference'.
One example bivariate time series from the data set, corresponding to the in-game statistics observed for Borussia Dortmund in the match against FC Schalke 04 played in November 2017 is shown in Figure1.In the media, this match was said to have a momentum shift, since Borussia Dortmund was in a 4:0 lead at half time, but Schalke 04 scored four goals in the second half such that the match resulted in a draw.
Lanchantin et al. (2011)005)ence in financial data, and byBrunel and Pieczynski (2005)andLanchantin et al. (2011)for image analysis.For our modelling approach, we again consider the Conway-Maxwell-Poisson both for the number of shots on goal and the number of ball touches as marginal distribution.With F 1 (y mt1 |s mt ) and F 2 (y mt2 |s mt ) denoting the (state-dependent) c.d.f. of the marginals, the bivariate state-dependent distribution is given by

Table 2 :
AIC and BIC for copula-based HMMs with different numbers of states.
bilities as functions of covariates, and present the theoretical stationary distributions of the Markov state process when fixing the covariate values at certain levels.The theoretical stationary distributions indicate how state occupancy, i.e. how much time is spent in a state, varies across different values of the covariate considered (Patterson

Table 5 :
Estimates of the coefficients determining the state transition probabilites as functions of covariates, in the final three-state Clayton copula HMM for the Borussia Dortmund data; 95% confidence intervals in brackets.

Table 6 :
AIC and BIC for copula-based HMMs with different numbers of states (Hannover 96).