1 Introduction

The development of mathematical models of sports faces many obstacles. Assessing the potential impact of unobservable variables and establishing the right relations among the observable ones are the main sources of difficulty in this task. Even so, the study of team sports data has become increasingly popular in recent years. Several models have been proposed for the estimation of the parameters (characteristics) that may lead to successful results for a team, ranging from machine learning methods to predict outcomes Štrumbelj and Vračar (2012); Asif and McHale (2016); Baboota and Kaur (2019) and fuzzy set representations Hassanniakalager et al. (2020) to statistical models Dyte and Clarke (2000); Goddard (2005); Boshnakov et al. (2017) and Bayesian models Baio and Blangiardo (2010); Constantinou et al. (2012); Wetzels et al. (2016); Santos-Fernandez et al. (2019).

One of the main issues in the study of sports is to disentangle the relative relevance of the possible determinants of outcomes. While ability and luck constitute, at least for both the press and the fan base, the main explanatory factors of the degree of success in competitions, the motivation of players is usually invoked only to explain epic outcomes or catastrophic failures. One possible reason for the neglect of motivation is that, unlike ability and luck, it is hard to assess. In this paper we define a particular notion of effort in Rugby games as a proxy for motivation and develop a Bayesian model of the final scores of teams in the English Rugby Premiership 2020/2021. These outcomes will be explained by several variables, among which we distinguish the ability of the teams and the effort exerted by them. We also include as explanatory variables other possible sources of psychological stimuli, so as to capture a pure motivation to win, separated from those other factors.

One of the main advantages of using Bayesian techniques to model sports is that beliefs or expert information can be incorporated as priors to obtain posterior distributions of the parameters of interest. These posteriors are easily updated when new data become available, and the approach deals more effectively with small data sets. In our case, we propose a Bayesian hierarchical model to explain score differences in a rugby match, i.e., the difference between the points of the home team and those of the away team. The main parameters of the model are the ability of teams, the effort exerted by them and the advantage (or disadvantage) of home teams.

There are many papers that use Bayesian methods to model the score of a rugby game. Stefani (2009) finds that past performance is a better predictor of the score difference than of the score total, and suggests that teams should focus their strategy on score differences (to win or draw) rather than on the score total. Pledger and Morton (2011) use Bayesian methods to model the 2004 Super Rugby competition and explore how home advantage impacts the outcomes. Finally, Fry et al. (2021) propose a Variance Gamma model where analytical results are obtained for match outcomes, total scores and the awarding of bonus points. The main difference between these works and ours is that their primary goal is to predict outcomes, while ours is to explain them.

The ability of a team can be conceived as its “raw material”. The skills of its players, the expertise of its coaches and its human resources in general (medical staff, managers, etc.) constitute the team’s basic assets. Their value can vary during a season due to injuries, temporary loss of skilled players called to play for the national team, players leaving the team, etc. In this model we assume that the capabilities of teams do not change much from a season to the next. Accordingly, the ability of a team at the start of the season is assumed to be at a bounded distance from the performance in the previous season.

Luck in games and sports has been widely studied, from philosophical perspectives Simon (2007); Morris (2015) to statistical ones Denrell and Liu (2012); Pluchino et al. (2018). Mauboussin (2012) characterizes games that are high in luck as those that are highly unpredictable, in which great advantages cannot be achieved through repetition and in which the ‘reversion to the mean’ effect in performance is strong. Elias et al. (2012) and Gilbert and Wells (2019) define several types of luck, which we introduce later. In this model, following the latter authors, we consider that luck is present when the unexplained part differs markedly from the mean value of the noise distribution. That is, a series of unobserved variables has a large impact on the outcome.

Effort, in turn, can be conceived as the cost of performing at the same level over time and staying steadily engaged in a given task Herlambang et al. (2021). Different measures of effort can be defined. In the context of decision making, effort can be the total number of elementary information processing operations involved Payne et al. (1995) or the use of cognitive resources required to complete a task Russo and Dosher (1983); Johnson and Payne (1985). Another measure of effort (or lack of it) can be defined in terms of the extent of anchoring in self-reported rating scales, that is, the tendency to select categories in close proximity to the rating category used for the immediately preceding item Lyu and Bolt (2022). From a sports science perspective, the effort exerted by a team can be seen as the sum of all the players' loads, which Quarrie et al. (2017) define as ‘the total stressors and demands applied to the players’. These loads can include the physical motions of the players, the preparation for future matches, the food intakes, the intensity of interpersonal relationships, etc. Our formal definition of effort is intended as a proxy for the amount of some of these loads. Although a commonly accepted definition of effort is lacking, according to Massin (2017) effort is understood as the force exerted in order to reach a goal. Our definition of effort is indeed intended to capture this notion, in the understanding that one of the main goals (and the hardest to achieve) in a rugby game is to score tries. We consider that deviating from the goal of scoring tries, that is, exerting forces in order to reach a different goal, implies a reduction of the effort.

In order to define a rough measure of the effort exerted by a rugby team, we follow the lead of Lenten and Winchester (2015), Butler et al. (2020) and Fioravanti et al. (2021). These works analyze the effort exerted by rugby teams under the idiosyncratic incentives induced in this game. Besides gaining points for winning or drawing a game, teams may earn “bonus” points depending on the number of tries they score in a game. Accordingly, any appropriate effort measure should also be defined taking into account the number of tries. In our model the effort is measured as the ratio between the number of tries scored and the sum of tries and attempted scoring kicks.Footnote 1 In other words, following the same ideas of the literature discussed above, we intend to measure the effort of a team using observable variables such as tries and kicks. Attempts to increase the score with tries rather than with kicks can be seen, following the definition of Massin (2017), as indicating that the team is exerting more effort. Our idea is to emphasize the identification of effort with the result of an offensive spirit, according to which a team maximizes this effort by seeking to get more tries, no matter what the final score is. But if we simply identify effort with the number of tries we run into a problem, since the difference in scores (the dependent variable in our model) is highly correlated with the difference in tries. Considering instead the proportion of tries avoids that problem. Despite this, our formulation is not uncontroversial. It is easy to conceive a situation in which a team that scores more tries and kicks gets a lower value of our effort index than a team that scores just one try with only one kick attempt.
Again, our effort index intends to capture that tries involve an attacking, positive mindset, while penalties are a defensive, risk averse route to winning.Footnote 2 We understand that this proxy of effort has its limitations, since it tends to disregard the defensive skills of the teams. Still, we can justify this choice by our goal of remaining close to the literature while keeping the model simple with minimal information requirements. It is worth mentioning that asking teams to concentrate their efforts on scoring more tries is very intuitive albeit somewhat detrimental to the sport, as the uncertainty in results becomes highly reduced Scarf et al. (2019). Similar counter-intuitive ideas have also been discussed for soccer in Fry et al. (2021).

Several studies have detected the relevance of home advantage, i.e. the benefit over the away team of being the home team. Schwartz and Barsky (1977) suggested that crowds exert an invigorating motivational influence, encouraging the home side to perform well. Still, a full explanation of this phenomenon requires taking into account the familiarity of the home team with the field, the travel fatigue of the away team and the social pressure exerted by the local fans on the referees, among other factors. Many other researchers have investigated this advantage from different points of view, such as the physiological Neave and Wolfson (2003), the psychological Agnew and Carron (1994); Legaz-Arrese et al. (2013) and the economic Carmichael and Thomas (2005); Boudreaux et al. (2017); Ponzo and Scoppa (2018), even exploring the possibility that referees may be favorably biased towards home teams Downward and Jones (2007); Page et al. (2010). Home advantage in Rugby Union and Rugby League has been studied and confirmed by Kerr and van Schaik (1995), Jones (2007), Page and Page (2010), García et al. (2013) and even during the Covid-19 pandemic by Fioravanti et al. (2021). In our model, home advantage is explored depending on whether the public is allowed to attend the game and on which day it is played.Footnote 3 An extra parameter intends to capture the influence of factors other than public attendance inducing home advantage.

The plan of the paper is as follows. In Sect. 2 we present the data of the English Premiership Rugby Championship, played in 2020/2021. Section 3 presents a Bayesian hierarchical model of the variables that explain the difference of scores in that championship. Section 4 runs a statistical descriptive analysis of the matches of the Premiership Championship in the light of the variables defined in the Bayesian model. Section 5 presents the results of estimating our model with the data of the Rugby Union competition. Section 6 considers the outliers found in the previous section, treating them as the result of luck in those games. We assess the aspects that justify considering them as instances of luck. Finally, Sect. 7 concludes and discusses the opportunities for further research.

2 Data

The Premiership Rugby Championship is the top English professional Rugby Union competition. The 2020/2021 edition was played by 12 teams. The league season comprises 22 rounds of matches, with each club playing each other home and away. The top 4 teams qualify for the playoffs. Four points are awarded to the winning team, two to each team in case of a draw, and zero points to the losing team. However, a bonus point is given to the losing team if the score difference is less than eight points. Teams also receive a bonus point if they score four or more tries. In a game, a try is worth five points, a conversion two points and both penalty and drop kicks are worth three points each.Footnote 4 During this season, if a game was canceled due to Covid-19, two points were awarded to the team responsible, and four to the other, while the match result was deemed to be 0–0. The 2020/2021 season was won by the Harlequins, who claimed their second title after finishing fourth in the league.
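The scoring rules just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, penalty tries and canceled games are not handled, and the try bonus is assumed to apply regardless of the result.

```python
def match_score(tries, conversions, penalties, drops):
    """Points in a game: try 5, conversion 2, penalty kick 3, drop kick 3."""
    return 5 * tries + 2 * conversions + 3 * (penalties + drops)


def league_points(own_score, opp_score, own_tries):
    """League points earned by one team in one played game."""
    if own_score > opp_score:
        points = 4
    elif own_score == opp_score:
        points = 2
    else:
        # Losing bonus point when the margin is less than eight points.
        points = 1 if opp_score - own_score < 8 else 0
    # Try bonus point for four or more tries (assumed independent of the result).
    if own_tries >= 4:
        points += 1
    return points
```

For instance, a team losing 20 to 25 with two tries would collect one league point under these rules, while a team winning with four tries would collect five.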

The total score, number of tries, converted tries, converted penalties, attempted penalties, converted drops, attempted drops and attendance at each of the 122 games of the 2020/21 Premiership season have been taken from the corresponding Wikipedia entry.Footnote 5 We generate the priors of our Bayesian model based on the final ranking from the 2019/20 season as follows. An attack ranking and a defense ranking are built using the number of points scored and received by each team: the team with the most tries scored and the fewest points received is ranked first in both rankings. These rankings are then normalized, and their corresponding means are computed.

3 Model

We base our model on previous work by Kharratzadeh (2017), which models the difference in scores for the English Premier League in soccer. The score difference in game g is denoted by \(y_g\) and is assumed to follow a Student's t distribution,

$$\begin{aligned} y_g\sim t_\nu (a_{diff}(g)+eff_{diff}(g)+ha(g),\sigma _y), \end{aligned}$$

where \(a_{diff}(g)\) is the difference in the ability of the teams, \(eff_{diff}(g)\) is the difference in the effort exerted and ha(g) the home advantage at game g. We give the scale \(\sigma _y\) a N(0.5, 1) prior. In turn, we assign a prior Gamma(9, 0.5) to the distribution of degrees of freedom \(\nu \).Footnote 6

We model the difference in abilities as follows:

$$\begin{aligned} a_{diff}(g)=a_{hw(g),ht(g)}-a_{aw(g),at(g)} \end{aligned}$$

where \(a_{hw(g),ht(g)}\) is the ability of the home team in the week where the game g is played (analogously for the away team). We assume that the ability may vary during the season. More precisely, we assume that the ability at a period t is the ability at \(t-1\) plus a term representing the factors that may affect the ability at t:

$$\begin{aligned} a_{hw(g),ht(g)}=a_{hw(g)-1,ht(g)}+\sigma \cdot \eta _{hw(g),ht(g)}\, \text{ for } \, hw\ge 2 \end{aligned}$$

where \(\sigma \) and \(\eta \) have weakly informative priors N(0, 0.1) and N(0, 0.5), respectively. The model is analogous for the away team. The abilities for the first week depend on the previous performance of the teams, again assuming that ability has some sort of “inertia”:
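As an illustration of the week-to-week recursion above, the following Python sketch builds one ability path from draws of the priors. It is not the estimated model: the function name and the starting ability of 0.3 are hypothetical, the second argument of each normal prior is assumed to be a standard deviation, and \(\sigma \) is assumed to be constrained to non-negative values.

```python
import random


def ability_path(a_first, sigma, etas):
    """Abilities for consecutive weeks under a_w = a_{w-1} + sigma * eta_w."""
    path = [a_first]
    for eta in etas:
        path.append(path[-1] + sigma * eta)
    return path


rng = random.Random(0)
sigma = abs(rng.gauss(0.0, 0.1))  # assumption: sigma is non-negative
etas = [rng.gauss(0.0, 0.5) for _ in range(21)]  # innovations for weeks 2..22
path = ability_path(0.3, sigma, etas)  # 0.3 is a made-up first-week ability
```

With a small \(\sigma \), consecutive abilities stay close to each other, which is the sense in which the ability of a team is allowed to drift during the season without jumping.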

$$\begin{aligned} a_{1,ht(g)}=\beta _{prev}\cdot prevperf (ht(g))+\eta _{1,ht(g)} \end{aligned}$$

where \(\beta _{prev}\) is given the weakly informative prior N(0.5, 1) and prevperf(j) is the previous performance of team j. This value is obtained as follows: we build two rankings of tries scored and received during the last season, where a team is at the top of both rankings if it has scored the most and received the least tries in the last season. Then, the two rankings are normalized and averaged.
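The construction of prevperf(j) can be sketched as follows. The text does not specify the normalization, so this sketch assumes that each ranking is rescaled linearly to [0, 1] with the best team at 1, and that there are no ties; the function names and the three-team data in the usage note are made up for illustration.

```python
def normalized_rank(values, higher_is_better):
    """Map each team to a score in [0, 1]; the best-ranked team gets 1."""
    # Sort from worst to best so that the position grows with quality.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {team: pos / (n - 1) for pos, team in enumerate(ordered)}


def prev_perf(tries_scored, tries_conceded):
    """Average of the normalized attack and defense rankings."""
    attack = normalized_rank(tries_scored, higher_is_better=True)
    defense = normalized_rank(tries_conceded, higher_is_better=False)
    return {team: (attack[team] + defense[team]) / 2 for team in tries_scored}
```

With the hypothetical inputs `{"A": 60, "B": 45, "C": 30}` for tries scored and `{"A": 25, "B": 40, "C": 50}` for tries conceded, team A tops both rankings and obtains the highest previous performance, while team C obtains the lowest.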

The variable that captures the relative motivations of the teams in the game is the difference in efforts:

$$\begin{aligned} eff_{diff}(g)=\beta _{effort}\cdot (effH(g)-effA(g)) \end{aligned}$$

where \(\beta _{effort}\) has a N(0.5, 1) prior and the effort of the home team is given an observational approximation (analogously for the away team):

$$\begin{aligned} effH(g)=\dfrac{\text{number of home tries in game } g}{\text{number of home tries in game } g + \text{attempted home scoring kicks in game } g}. \end{aligned}$$

Notice that the variable attempted home scoring kicks in game g has, in turn, three components: the number of conversions allowed after scoring tries, the number of penalty kicks attempted and the number of drop kicks attempted by the home team.

Our intention with this definition is to capture the idea that scoring tries demands more effort than other means of scoring points, and motivated teams try to maximize this value.
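A minimal Python sketch of the effort ratio defined above may help fix ideas. The function names are hypothetical, and returning None when a team has neither tries nor kick attempts is an assumption, since the text does not say how such games are treated.

```python
def effort(tries, conversions_attempted, penalties_attempted, drops_attempted):
    """Tries divided by tries plus attempted scoring kicks."""
    kicks = conversions_attempted + penalties_attempted + drops_attempted
    total = tries + kicks
    if total == 0:
        return None  # assumption: the ratio is left undefined in this case
    return tries / total


def effort_difference(beta_effort, eff_home, eff_away):
    """eff_diff(g) = beta_effort * (effH(g) - effA(g))."""
    return beta_effort * (eff_home - eff_away)
```

A team with three tries, three conversion attempts and no other kicks gets `effort(3, 3, 0, 0) = 0.5`. A team with four tries, four conversion attempts and two penalty attempts gets 0.4, which is lower than the 0.5 of a team with a single try and a single conversion attempt, so the index rewards the proportion of tries rather than their number.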

Finally, to capture home advantage, we consider both the attendance effect and non-attendance effects (such as the weather, long trips to play the game, etc.):

$$\begin{aligned} ha(g)=\beta _{home}+\beta _{atten}\cdot atten(g), \end{aligned}$$

where \(\beta _{home}\) and \(\beta _{atten}\) have N(0.5, 1) priors and atten(g) is 0 if no fans were allowed and 1 otherwise. A graphical representation of the model is depicted in Fig. 1.
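Putting the pieces together, the location of the Student's t distribution and a single simulated score difference can be sketched as below. This is a forward simulation under stated assumptions, not the Stan code of the Appendix: the function names and the parameter values in the usage lines are hypothetical, and the draw uses the standard representation of a t variate as a normal divided by the square root of a scaled chi-square.

```python
import math
import random


def location(a_home, a_away, eff_home, eff_away, atten,
             beta_effort, beta_home, beta_atten):
    """a_diff(g) + eff_diff(g) + ha(g) for one game."""
    a_diff = a_home - a_away
    eff_diff = beta_effort * (eff_home - eff_away)
    ha = beta_home + beta_atten * atten  # atten is 0 or 1
    return a_diff + eff_diff + ha


def draw_score_difference(mu, sigma_y, nu, rng):
    """One draw from a location-scale Student t with nu degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return mu + sigma_y * z / math.sqrt(chi2 / nu)


rng = random.Random(1)
mu = location(1.0, 0.5, 0.5, 0.4, 1, 2.0, 0.3, 0.1)  # made-up values
y_sim = draw_score_difference(mu, 10.0, 18.0, rng)
```

Setting `atten` to 0 removes the attendance term, so that the home advantage reduces to \(\beta _{home}\).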

To ensure robustness in our results we work with four different models. Model I does not include attendance as a variable of home advantage. Model II includes the attendance variable, while Model III incorporates a day variable day(g), whose coefficient \(\beta _{day}\) has a N(0.5, 1) prior and which takes the value 1 if the game was played on Saturday or Sunday and 0 otherwise. This day variable makes it possible to find out whether playing on a day on which almost all the fans can attend the game benefits either the home or the away team. Finally, Model IV includes a variation of prevperf(j), where instead of the tries, we use total points scored and received by each team. There is no crucial difference between the four models. The reason for including different specifications is to evaluate whether the coefficients corresponding to the variables of interest, common to the four models, are sensitive to the inclusion of other variables.

Fig. 1 Graph corresponding to Model II

4 Descriptive statistics

Tables 1 and 2 show the descriptive statistics for home and away teams, respectively, where Score, T, C, P, D, AC, AP, AD and eff indicate, respectively, total points, tries scored, conversions scored, penalty kicks scored, drop kicks scored, attempted conversions, attempted penalty kicks, attempted drop kicks and effort exerted.

Table 1 Home team
Table 2 Away team

Figure 2 shows that most of the score differences are around zero (even though there are very few draws in the championship). This indicates that games usually end with a small difference. Figures 3a and b depict the effort histograms. We can see a great number of cases of \(effort=\frac{1}{2}\). This is because teams almost always have the chance to go for a conversion after scoring a try, except when a penalty try is awarded, and in many games the teams did not attempt to kick a penalty (maybe because they had no kickable penalties available).

Fig. 2 Score difference histogram

Fig. 3 Effort histograms

5 Results

We estimate the model with the R package rstan Team (2020). We use 4 cores, each running 2500 iterations and 1500 warm-up iterations. The Stan code of the estimated model can be found in the Appendix.

The results obtained appear to hold a degree of robustness:

  • The Rhat (or Gelman-Rubin) statistic measures the discrepancies between the chains generated in simulations of Bayesian models. The further its value is from 1, the worse the convergence. In all our results Rhat is very close to 1.

  • \(n_{eff}\) is an estimate of the effective sample size for each parameter. A large value indicates a low degree of error in the expected value of the parameter. We can see that, indeed, this is the case for all the parameters of interest.
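To make the first diagnostic concrete, the classic Gelman-Rubin statistic can be computed from a set of chains as in the following Python sketch. This is the textbook version for equally long chains and is only an approximation of what Stan reports, which is a refined variant; the function name is hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equally long chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    between = n * variance(chain_means)          # between-chain variance B
    within = mean(variance(c) for c in chains)   # within-chain variance W
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5
```

When the chains explore the same region, the between-chain term is small and the statistic is close to 1; chains centered at different values inflate it well above 1.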

Tables 3 and 4 show the results of Models I and II, respectively. The difference between them is the presence of \(\beta _{atten}\) in the latter. The posteriors of Model II, and the corresponding histograms, shown in Figs. 4 and 5, indicate that \(\beta _{prev}\) and \(\beta _{effort}\) are the parameters whose posterior distributions lie above zero. On the other hand, the comparison between Model I and Model II shows that \(\beta _{home}\) has a wider credible interval, which includes zero, once \(\beta _{atten}\) is included.

Table 3 Posterior summary statistics, model I
Table 4 Posterior summary statistics, model II

Table 5 shows the difference of estimations when adding the \(\beta _{day}\) parameter. The results seem to be stable, in the sense that the estimated values of \(\beta _{prev}\) and \(\beta _{effort}\) remain in similar intervals. We also show the result of changing the mean of the prior of \(\nu \) to 1.

Finally, Table 6 presents the results of Model IV. The ranking score is here based on the points (not the tries) of the previous season. We find in this case that the \(\beta _{effort}\) parameter is similar to that found in the other models, while \(\beta _{prev}\) has a lower mean.

Table 5 Posterior summary statistics, model III
Table 6 Posterior summary statistics, model IV
Fig. 4 Posteriors for model II
Fig. 5 Posteriors for model II (Cont.)
Fig. 6 Bivariate relations between parameters

Besides the histograms corresponding to the values of the parameters, Figs. 6, 7, 8 and 9 show further results of our model. Figure 6 illustrates the lack of relation between \(\beta _{prev}\) and \(\beta _{home}\) and \(\beta _{effort}\) (for instance, the slope of the line that best fits in (a) is 0.008703 with p-value 0.27 and \(R^2 = 0.008653\)). This strongly suggests that effort captures an effect that differs from both the ability of the team and the potential support (or antagonism) received in the field.

Figure 7 shows one of the 10,000 simulated histograms of score differences. It can be compared to the original histogram corresponding to the actual championship. This is further detailed in Fig. 8, where the histograms of the distributions of means and standard deviations obtained in the replications are depicted with gray bars, while the blue ones correspond to the values of the statistics computed from the observed data.

Notice that \(\sigma \) and \(\eta \) appear in the model only in a product and since both have Gaussian priors, it could be thought that they cannot be distinguished by likelihood. This would mean that only their product is identifiable but not the individual parameters. To check this we generate the trace-plots and histograms of these parameters. They are shown in Figs. 9 and 10, indicating that this concern can be discarded since \(\sigma \) and \(\eta \) are identifiable.

Fioravanti et al. (2021) indicate that score differences in favor of the home team could vary under different prior distributions. To explore this potential variability, we consider other prior distributions of \(\beta _{home}\), with means 2, 4 and 6. In all these cases, the results are similar to those obtained with mean 0.5: the posteriors converge to values close to 0.3.

Fig. 7 Simulated score difference histogram
Fig. 8 Histogram of replicated score differences
Fig. 9 Trace and histogram of nu
Fig. 10 Trace and histogram of sigma

6 Luck in games

We assume that luck plays a significant role in games. According to the definition of Tango et al. (2007), luck in a game can be understood as the difference between the actual performance observed and the ability of a team. In our case, we could identify it with the difference between the performance and both the effort and ability of a team.

For Tango and coauthors, the variability of luck is \(\frac{p (1-p)}{g} \), where \( p = 0.5 \) and \( g = 22 \) is the number of games.Footnote 7 Then \(Var(Luck) = 0.0114 \). On the other hand, performance variability is the deviation in the number of games won by the teams in the league, yielding 0.0379. Finally, effort variability can be captured by the variance of the effort ratio, 0.0156. Then:

$$\begin{aligned} Var(Performance)= \;& {} Var(Luck) + Var(Effort) + Var(Ability) \\ 0.0379= \;& {} 0.0114 + 0.0156 + Var(Ability) \end{aligned}$$

then, we find that:

$$\begin{aligned} Var(Ability) = 0.0109 \end{aligned}$$

We can thus see that the variabilities in effort, ability and luck have roughly the same weight in the composition of the variability of performance.
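The arithmetic of this decomposition can be reproduced with a short Python check. The values 0.0379 and 0.0156 are those reported above; the variable names are only illustrative.

```python
p, g = 0.5, 22                      # win probability and number of games
var_luck = p * (1 - p) / g          # binomial variance of a win proportion
var_performance = 0.0379            # reported performance variability
var_effort = 0.0156                 # reported variance of the effort ratio

# Ability is obtained as the remainder of the additive decomposition.
var_ability = var_performance - var_luck - var_effort

# Share of each component in the variability of performance.
shares = {
    "luck": var_luck / var_performance,
    "effort": var_effort / var_performance,
    "ability": var_ability / var_performance,
}
```

Under these figures the shares of luck, effort and ability are approximately 30%, 41% and 29%, respectively.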

An alternative definition of luck is that it arises when the residual in the regression of \(y_g\) on the explanatory variables defined in Sect. 3 differs markedly from the mean value in the distribution of noises. That is, the presence of luck is revealed by a large impact of unobserved variables Mauboussin (2012). Elias et al. (2012) and Gilbert and Wells (2019) consider four types of luck. The first one arises from physical randomization, by the use of dice, cards, etc. (I). The second kind of luck is due to simultaneous decision making (II). The third one is due to human performance fluctuating unpredictably (III), while the last one arises from matchmaking (IV).Footnote 8

The underlying theoretical framework is very rich. In the sequel we give a simple example of the kinds of analysis that may be ultimately possible. To start, consider the types of luck that may have affected the outcomes:

  • Round 5 - London Wasps 34 versus 5 Exeter Chiefs: contrary to what happened in the previous week, the Wasps regained their captain and 3 players from the English national team, while Exeter missed 8 of its players who, after playing for the national team, were either injured or were forced to rest (type IV). Also, the Chiefs conceded 15 penalties (a high number for this level of competition) and were shown a yellow card, letting the Wasps score a try during the sin bin time (types II and III).Footnote 9

  • Round 15 - Worcester Warriors 14 versus 62 Northampton Saints: the Warriors changed 10 players from the previous week’s game, and had three injured players (the fullback and two tighthead props), while the Saints recovered two players from the national team. A victory would close the gap for the Saints on the top four (type II and IV). A Warriors player was shown a red card at the 49th minute, allowing 6 tries after that (type III).

  • Round 20 - Exeter Chiefs 74 versus 3 Newcastle Falcons: the Chiefs, already qualified for the semifinals, needed a win to gain the home advantage during the playoffs. It was also the first game in more than 5 months that fans were allowed to attend (type IV). Besides that, it was a record performance by Exeter, since it achieved the largest score difference in a top competition (type III). For the Falcons it was the longest trip of the tournament (type IV), and they were shown a yellow card at minute 26, allowing two tries during the sin bin time (type II and III).

7 Discussion

In our analysis we found that while the results of rugby matches in the English Premiership can be explained by the ability of the teams, another highly significant variable is the motivation of players, reflected by the effort exerted by them. Luck, instead, seems to have had an impact only in games where the residuals are larger.

We have followed here Lenten and Winchester (2015), Butler et al. (2020) and Fioravanti et al. (2021), assuming that the number of tries is an important component of any measure of effort. While the results obtained in our Bayesian analysis are sound, there exist many other ways of defining a proxy of motivation in Rugby. We can argue that a large number of tackles reveals that a team has exerted a lot of effort, indicating that it is highly motivated. But this also indicates that the team has not exerted a large effort aimed at keeping the ball. One could also, with the help of GPS, track the physical effort of the players, and identify it with their motivation. In any case, selecting a certain measure of effort as a proxy of motivation is a rather arbitrary choice. But this is also true for any proposed proxies for ability or luck.

Future lines of research involve exploring and comparing the impact of our proxy for motivation in other tournaments, and even postulating alternative definitions of ‘effort’ in different sports. Another topic that is worth studying is the evolution of our measure of effort over time. The results of such an investigation could be useful to assess how the incentives to the players may have changed, affecting the motivation for scoring tries rather than scoring kicks.