Bounding the final rank during a round robin tournament with integer programming

This article is mainly motivated by the urge to answer two kinds of questions regarding the Bundesliga, which is Germany’s primary football (soccer) division having the highest average stadium attendance worldwide: “At any point in the season, what is the lowest final rank a certain team can achieve?” and “At any point in the season, what is the highest final rank a certain team can achieve?”. Although we focus on the Bundesliga in particular, the integer programming formulations we introduce to answer these questions can easily be adapted to a variety of other league systems and tournaments.


Introduction
The Bundesliga consists of 18 teams and applies a system of promotion and relegation with the 2. Bundesliga. It is a double round-robin tournament in which each club plays two matches against every other club, one at home and one away. Nowadays a victory is worth three points, a loss none, and a draw yields one point for each team. The team at the top of the league table at the end of the season is German Champion. Currently, the top six to seven clubs in the table qualify for profitable international competitions organized by the Union of European Football Associations (UEFA). The two teams at the bottom of the table are relegated to the 2. Bundesliga, while the top two teams in the 2. Bundesliga are promoted. The 16th-placed team (third-last) and the third-placed team in the 2. Bundesliga play a two-leg play-off match. The winner of this match participates in the Bundesliga in the next season, while the loser has to play in the 2. Bundesliga. If teams are level on points, tie-breakers are applied (Liga 2019).
The Elimination Problem in modern European football, e.g., deciding whether a given team can still win the Bundesliga at some point in the season, was shown to be NP-complete independently by Bernholt et al. (1999) and Kern and Paulusma (2001). These results were further generalized to various other score allocation rules in Kern and Paulusma (2004). Schwartz (1966) was one of the first to develop necessary and sufficient conditions for a team to be eliminated in baseball using network flow techniques and algorithms. A first application of linear programming to the same problem was published by Robinson (1991). The computational complexity of determining the best and worst possible final rank of a team, among other related questions, is discussed in Schlotter and Cechlárová (2018) and Gusfield and Martel (2002). In Raack et al. (2014), a general ranking integer programming model is developed that calculates the number of points needed to finish i-th in any sport, including football, motor sports, and ice hockey. The authors also give a good overview of publications dealing with sports problems. The paper of Kendall et al. (2010) deals with the Brazilian national football championship and the calculation of the minimum number of points any team has to win to be sure of qualifying for the play-offs. It also presents a method to compute the number of points each team has to win to have any chance of qualifying. Again, a good overview of previous work is given.

Motivation
Der Postillon is a German website featuring satirical articles reporting on international, national, and local news in newspaper and TV format (Der Postillon 2019). In October 2015, it had more than 14 million visitors. It also comes in an English version titled The Postillon (The Postillon 2019). On 29 January 2018, Der Postillon announced that the Munich-based German sports club FC Bayern München, one of the most successful European clubs, had celebrated non-relegation at an early stage of the Bundesliga season. It was reported that euphoric FC Bayern fans were singing in the town center of Munich until the early morning (Nießen 2018). A very similar version of this article was published on 3 February 2014 (Nießen 2014). At the time of publication, matchday 20 (in 2018) and matchday 19 (in 2014) were over. In February 2018 we became aware of these two articles. On the one hand, we were astonished that avoidance of relegation had been achieved at such an early stage of the season. On the other hand, we were not able to verify the statement with pencil and paper. Hence, we decided to set up a general integer programming formulation (Schrijver 1986) to obtain a mathematical proof for this kind of statement.

Mathematical model for round-robin tournaments
Consider the directed multigraph $G = (T, S, \mu)$, where $T$ represents a set of $n$ teams playing a round-robin tournament, $S \subseteq T \times T$ denotes the matches that have not been played yet, and $\mu : S \to \mathbb{N}$ states their multiplicity, i.e., $\mu(i, j) \in \mathbb{N}$ is the number of remaining home games of team $i$ against team $j$. Further, let $S^m_h := \sum_{(m,j) \in S} \mu(m, j)$ and $S^m_a := \sum_{(i,m) \in S} \mu(i, m)$ denote the number of remaining home and away games of team $m \in T$, respectively. Each game $(i, j) \in S$ ends with exactly one outcome $x \in X$, where $p^x_h$ and $p^x_a$ are the points that are awarded to the home and the away team in case of outcome $x$, respectively. In theory, there may exist infinitely many outcomes, but we only consider the case that $|X| = o \in \mathbb{Z}_{\geq 0}$ is finite in this article. Additionally, w.l.o.g. we assume that no points are deducted, i.e., that $p^x_h, p^x_a \in \mathbb{Z}_{\geq 0}$, and that there exists an outcome for each team such that no points are awarded. Finally, $M_h$ and $M_a$ denote the maximum number of points that the home and the away team can win in a single game, and $p_m \in \mathbb{Z}_{\geq 0}$ denotes the number of points that team $m \in T$ has already collected during the tournament.
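As a minimal sketch of the data behind this model, the multiplicities and the derived counts of remaining home and away games could be held in plain Python as shown below. The three-team toy instance, the team names, and all helper names are hypothetical and only serve as an illustration; the point values are those of the Bundesliga.

```python
from collections import Counter

# Points per outcome as (home points, away points); Bundesliga rule.
OUTCOMES = {"home_win": (3, 0), "draw": (1, 1), "away_win": (0, 3)}

# Hypothetical remaining games: (home team, away team) -> multiplicity.
remaining = Counter({("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1, ("A", "C"): 1})

# Hypothetical points collected so far.
current_points = {"A": 4, "B": 2, "C": 1}

# Maximum number of points the home / away team can win in a single game.
M_H = max(home for home, _ in OUTCOMES.values())
M_A = max(away for _, away in OUTCOMES.values())


def remaining_home_games(team, games):
    """Number of remaining home games of `team`, counted with multiplicity."""
    return sum(mult for (home, _), mult in games.items() if home == team)


def remaining_away_games(team, games):
    """Number of remaining away games of `team`, counted with multiplicity."""
    return sum(mult for (_, away), mult in games.items() if away == team)


def max_final_points(team, points, games):
    """Upper bound on the final score: the team wins every remaining game."""
    return (points[team]
            + M_H * remaining_home_games(team, games)
            + M_A * remaining_away_games(team, games))
```

The last helper is the kind of bound that appears on the right-hand sides of the ranking constraints introduced later in this section.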
The described tournament model is directly applicable to the top level of European football league systems including the Bundesliga, La Primera División (Spain), the Premier League (England), and the Serie A (Italy). They are all double round-robin tournaments, i.e., we have a simple directed graph with the multiplicity of each arc being at most one, and the set of outcomes is given by $X = \{\text{home win}, \text{draw}, \text{away win}\}$ with $(p^x_h, p^x_a) \in \{(3, 0), (1, 1), (0, 3)\}$, respectively.
Next, we model all possible courses of the ongoing tournament together with an implied ranking for a fixed team $t \in T$. For each $(i, j) \in S$ and each outcome $x \in X$ we introduce a binary variable indicating whether the corresponding outcome is realized or not. Equations (1) guarantee that each game ends with exactly one outcome, i.e., the binary variables of each game sum to one. Additionally, for each team $m \in T$ participating in the tournament we introduce an integer variable $p^*_m$ denoting its final number of points at the end of the tournament. The Minkowski sum, $A \oplus B$, of two sets $A$ and $B$ is defined as the set of pairwise sums of points from $A$ and $B$. In other words, $A \oplus B = \{a + b : a \in A,\ b \in B\}$. In the case of the Bundesliga, the Minkowski sum (2) can be computed explicitly, see (3). Equations (4) ensure the correct final score of team $m \in T$ depending on the results of its remaining games: the final score equals the points already collected plus the points awarded in the remaining home and away games.
Finally, we aim at modelling the ranking of team $t \in T$. Team $t$ is ranked higher in the final table than team $m \in T \setminus \{t\}$ if it has collected more points. In case both teams are level on points, we assume that we can arbitrarily choose their order. Therefore, for each team $m \in T \setminus \{t\}$ we introduce a binary variable $c_m$ indicating whether $t$ is ranked higher than team $m$ in the final table, i.e., $c_m = 1$, or whether it has a lower rank, i.e., $c_m = 0$. Two classes of constraints, (5a) and (5b), ensure a correct ranking of team $t$. In case that $p^*_t - p^*_m > 0$ we have $c_m = 1$ due to (5a), and if $p^*_t - p^*_m < 0$ we have $c_m = 0$ due to (5b). The bounds used on the right-hand sides are tight and describe the biggest possible point differences: $t$ always wins the maximum number of points and $m$ receives no more points in their remaining matches (5a), and vice versa (5b). If $p^*_t = p^*_m$, both values for $c_m$ are feasible, i.e., we can arbitrarily choose the order of $t$ and $m$.
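The constraints described in words above can be written out as follows. This is a reconstruction from the surrounding definitions and not a verbatim copy of the original displays: the name $y^x_{ij}$ for the binary outcome variables is an assumption, and every remaining game is taken to have multiplicity one, as in the Bundesliga.

```latex
\begin{align}
\sum_{x \in X} y^x_{ij} &= 1
  && \forall (i,j) \in S \tag{1} \\
p^*_m &= p_m
  + \sum_{(m,j) \in S} \sum_{x \in X} p^x_h \, y^x_{mj}
  + \sum_{(i,m) \in S} \sum_{x \in X} p^x_a \, y^x_{im}
  && \forall m \in T \tag{4} \\
p^*_t - p^*_m &\le \bigl(p_t + M_h S^t_h + M_a S^t_a - p_m\bigr) \, c_m
  && \forall m \in T \setminus \{t\} \tag{5a} \\
p^*_m - p^*_t &\le \bigl(p_m + M_h S^m_h + M_a S^m_a - p_t\bigr) \, (1 - c_m)
  && \forall m \in T \setminus \{t\} \tag{5b}
\end{align}
```

With $c_m = 0$, constraint (5a) forces $p^*_t \le p^*_m$, so a strict lead of $t$ requires $c_m = 1$; with $c_m = 1$, constraint (5b) forces $p^*_m \le p^*_t$, so a strict lead of $m$ requires $c_m = 0$. For the Bundesliga point system, the Minkowski sum of $k \ge 1$ copies of $\{0, 1, 3\}$ equals $\{0, \dots, 3k\} \setminus \{3k - 1\}$, so a team with $k$ remaining games can reach every total between $p_m$ and $p_m + 3k$ except $p_m + 3k - 1$.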
Using the variables and constraints introduced above, we are able to formulate integer linear programs answering the following two questions:
1. What is the lowest rank team $t \in T$ can reach in the final table?
2. What is the highest rank team $t \in T$ can reach in the final table?
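To make the two questions concrete, the following sketch answers them on a tiny instance by plain enumeration of all remaining results. It is not the integer programming approach of this article and only scales to a handful of games; the four-team instance and the function name `rank_bounds` are hypothetical. Ties are resolved in favor of the team for the best case and against it for the worst case, matching the assumption that the order of teams level on points can be chosen arbitrarily.

```python
from itertools import product

# Bundesliga outcomes as (home points, away points).
OUTCOMES = [(3, 0), (1, 1), (0, 3)]


def rank_bounds(team, points, remaining):
    """Return (best_rank, worst_rank) of `team` over all possible courses.

    points:    dict mapping each team to its points collected so far
    remaining: list of (home, away) pairs, one entry per remaining game
    """
    n = len(points)
    best, worst = n, 1
    for course in product(OUTCOMES, repeat=len(remaining)):
        final = dict(points)
        for (home, away), (pts_home, pts_away) in zip(remaining, course):
            final[home] += pts_home
            final[away] += pts_away
        others = [m for m in final if m != team]
        strictly_better = sum(final[m] > final[team] for m in others)
        strictly_worse = sum(final[m] < final[team] for m in others)
        best = min(best, 1 + strictly_better)   # ties broken in favor of team
        worst = max(worst, n - strictly_worse)  # ties broken against team
    return best, worst


# Hypothetical toy instance with three remaining games.
points = {"A": 6, "B": 4, "C": 3, "D": 1}
remaining = [("A", "B"), ("C", "D"), ("D", "A")]
```

On this toy instance, `rank_bounds("A", points, remaining)` evaluates to `(1, 3)`: team A can still finish first, but it cannot drop to last place. The enumeration grows as 3 to the power of the number of remaining games, which is why an integer programming formulation is needed for real league instances.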
The model answering the first question, which we call the Lower Bound for Final Rank Model (LBFR), maximizes objective (6a), $-\sum_{m \in T \setminus \{t\}} c_m$, subject to the constraints introduced above except (5b), while the model answering the second question, which we call the Upper Bound for Final Rank Model (UBFR), uses objective (6b) together with the constraints introduced above except (5a). Due to the absence of constraints (5b), $c_m = 1$ for any $m \in T \setminus \{t\}$ is always feasible for model LBFR even though the final number of points of team $t$ may be smaller than or equal to the number of points of team $m$. But since we maximize the expression $-\sum_{m \in T \setminus \{t\}} c_m$, i.e., we minimize the number of teams that end up with strictly fewer points than $t$, $c_m = 1$ iff $p^*_t > p^*_m$ holds in each optimal solution. Hence, using (6a) as objective, LBFR yields the worst-case final position of team $t$. An analogous argument shows that UBFR with objective (6b) yields the best-case final position of team $t$ even though constraints (5a) are not included.
Using the idea from Robinson (1991), the model can also easily be adapted to other league systems such as the "Big Four" leagues MLB, NBA, NFL, and NHL in the United States and Canada. Here, we additionally have to account for cross-divisional matches. If we want to determine the lowest possible rank of a team $t \in T$ in its division, we simply assume that $t$ does not gain any points in its remaining games against teams outside of its division and that all other teams $m \in T \setminus \{t\}$ score the maximum possible number of points in their remaining cross-divisional games. To determine the best possible rank, we assume the opposite. In the case of the four mentioned tournaments, it is important to note that our model only determines the best possible rank of a team within its division and not whether it is eliminated from the tournament. For example, in the MLB, besides the six division winners, the two teams with the best statistics within each of the two leagues, which consist of three divisions each, additionally qualify for the postseason play-offs. Hence, even a team finishing third in its division may not be eliminated.
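Returning to the two models LBFR and UBFR, they might be written compactly as follows. This is a sketch consistent with the description above and not a verbatim copy of the original displays; in particular, the exact form of objective (6b) is an assumption derived from the analogous argument.

```latex
\begin{align}
\text{(LBFR)} \qquad \max \quad & -\sum_{m \in T \setminus \{t\}} c_m \tag{6a} \\
\text{s.t.} \quad & \text{(1), (4), (5a)}, \qquad
  c_m \in \{0, 1\} \quad \forall m \in T \setminus \{t\} \notag \\[1ex]
\text{(UBFR)} \qquad \max \quad & \sum_{m \in T \setminus \{t\}} c_m \tag{6b} \\
\text{s.t.} \quad & \text{(1), (4), (5b)}, \qquad
  c_m \in \{0, 1\} \quad \forall m \in T \setminus \{t\} \notag
\end{align}
```

Under this reading, an optimal solution of either model ranks team $t$ at position $n - \sum_{m \in T \setminus \{t\}} c_m$: in LBFR only teams with strictly fewer points than $t$ are counted, which gives the worst case, while in UBFR every team with at most as many points as $t$ is counted, which gives the best case.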

Computational results
We solved 14,688 instances of the integer programming models defined above originating from the Bundesliga standings during the seasons 2003/2004 to 2017/2018. That is, we solved the models LBFR and UBFR for each participating team and for each matchday of these seasons. We omitted matchdays with obvious solutions, such as matchdays one to eight. To translate the data into .lp-format files we used the software ZIMPL (Koch 2004). As MIP solver we used SCIP 5.0.1 (Achterberg 2009; Gleixner et al. 2019) with standard settings. All instances were solved on a standard Linux PC with an Intel® Core™ i5-3570 CPU @ 3.40 GHz and 16 GB RAM.
The average solution time over all instances was 0.024 s, while the longest solution time that we observed was 11.84 s; for the respective instance, it took 909 branch-and-bound nodes, i.e., subproblems with an increasing number of fixed binary variables, to prove optimality. Surprisingly, this "hard" instance is just problem LBFR for FC Bayern on matchday 21 in 2014. Only 0.89% of the problems were not solved in the root node, i.e., 99.11% of the models could be solved just by using preprocessing techniques, heuristics, and the examination of the linear relaxation of the original MIP, without entering the branch-and-cut phase of the MIP solver.

Problems emerging from the motivational chapter
Now we are ready to get back to our motivating problem from Sect. 2. Is it true that FC Bayern managed to avoid relegation already on matchday 20 in 2018? To answer this question, we set up model LBFR such that $t$ = "Bayern_Muenchen", $(p_m)_{m=1}^{18}$ as listed in the Pts-column of Table 1 and $S$ = {{"Eintracht_Frankfurt", "Bor_Moenchengladbach"}, {"Bayern_Muenchen", "TSG_Hoffenheim"}, … , {"VfL_Wolfsburg", "FC_Koeln"}, {"FC_Schalke_04", "Hannover_96"}} according to the remaining schedule, see Olympia-Verlag GmbH (2019b). It turned out that after matchday 20 was over, it was still possible to construct a course of the league such that FC Bayern would be second to last with 50 points, while 1. FC Köln would become German Champion with 53 points. To double-check our model, we entered its results into an online standings calculator (Olympia-Verlag GmbH 2019a), see Fig. 1. Analogously, we found that in 2014 FC Bayern did not manage to avoid relegation until matchday 20.

Further illustrative problems
As already mentioned, we solved the models LBFR and UBFR for each team t and for every matchday of the Bundesliga season 2017/2018. Figure 2 contains four charts that visualize the results obtained by the models LBFR and UBFR during the season for the exemplarily chosen teams FC Bayern München, FC Schalke 04, Bayer 04 Leverkusen and Hamburger SV. In the first round of the season, we configured the optimization models such that the final position is understood as the final position after the first round of the double round-robin tournament. This is the reason why the lines for the worst- and best-case final position in Fig. 2 get closer and closer and finally meet at the vertical dashed line at matchday 17 (end of the first round). On matchday 18, the corridor between the two lines abruptly widens, because with respect to the end of the tournament "everything" is possible again. In the chart for FC Bayern München, the triangle symbolizing the fact that non-relegation was certain only on matchday 21 is encircled. By the way, 14 points from seven games marked the worst start to the season in seven years. For the first time in the club's history, FC Bayern München lost a two-goal lead twice in a row (this cannot be seen from the graphic). However, FC Bayern München was top of the table for the rest of the season as of matchday 10 and became German Champion with 72 points already on matchday 29, as the worst- and best-case lines meet at position one on that matchday. Bayer 04 Leverkusen was the only team in the Bundesliga to score at least one goal in all 17 first-round matches. The team worked its way up in the first-round matches and showed continuity in the second round. Hamburger SV, the only constant member of the Bundesliga since its establishment, came second to last and was finally relegated to the 2. Bundesliga. Due to a win of VfL Wolfsburg against 1. FC Köln, relegation became inevitable on the very last matchday, as the chart shows.
Fig. 2 The left y-axes depict the current rank, the best-case and the worst-case final rank according to the respective matchday (x-axes). The right y-axes show the number of points

Remark 1
The models and data of all instances examined in the article can be downloaded from https://cloud.zib.de/s/bundesliga_data/download.