Managerial Behavior in Fund Tournaments—The Impact of TrueSkill

Measuring mutual fund managers' skills with Microsoft's TrueSkill algorithm, we find that highly skilled managers behave self-confidently, resulting in higher risk-taking in the second half of the year compared to less skilled managers. By introducing the TrueSkill algorithm, which is widely used in the e-sports community, to this branch of the literature, we can replicate previous findings and theories suggesting overconfidence for mid-year winners.


Introduction
Fund managers compete for investors' money by signaling their ability to generate risk-adjusted returns (or alpha) to the market. Using Microsoft's TrueSkill to estimate each manager's skill, we study the impact on the portfolio's risk level. We find that highly skilled managers take systematically more risk within a year's tournament compared to less skilled managers. These results are robust across different market phases, different years with pronounced risk-shifting incentives, and different empirical approaches.
Our work contributes to the existing literature by introducing Microsoft's TrueSkill algorithm as a new measure, thus treating the tournament among fund managers as a "game". Building upon Bayesian network theory, TrueSkill identifies and tracks the skills of managers in a competitive setting, in which the belief about a manager's skill is estimated on the basis of the manager's past performance relative to all other active managers. Despite broad evidence for the long-term underperformance of active managers against a benchmark (Fama, 1965), individual managers seem to outperform the market in the short term, resulting in higher fund inflows and compensation (Sirri and Tufano, 1998; Kempf and Ruenzi, 2008b), hence promoting a competitive environment among fund managers.
Second, we extend the empirical work in the area of fund tournaments, first introduced by Brown et al. (1996). They analyze the behavior of mutual fund managers within one year and detect a risk-seeking investment style for mid-term losers. Replicating their analysis, our results indicate that winners increase their risk, suggesting a different trend of individual behavior in tournaments in recent years. We then follow Kempf et al. (2009) and highlight risk-shifting differences between years driven by compensation incentives (winners are rewarded for their outperformance) and years driven by employment risk (losers face high chances of having their funds closed due to underperformance). We extend this area of research by detecting investment patterns based on the individual skill level of the managers and by highlighting the correlation between skill and risk-seeking.
The remainder is structured as follows: In Section 2, we introduce the fund tournaments' setup and Microsoft's TrueSkill, Section 3 contains the empirical analysis, while the final section concludes.

The Economics of Tournaments
Research in the field of managerial tournaments is considered a subset of agency-theoretic contracting theory, which deals with the disparity between principals' and agents' interests and risk aversions. Bolton et al. (2005) summarize the basic assumptions and implications for multiple scenarios in different areas of economics.
The underlying premise of our analysis is to view the market for portfolio management services as a multi-period decision-making process. This implies that investors decide in a cyclical pattern which fund to invest in. One significant aspect of this investment process is the established compensation structure within the fund industry: fund managers are often compensated based on their funds' assets under management, which implies large incentives to generate high fund inflows. Empirical evidence for the positive correlation, in a multi-period context, between a fund's past performance and new fund inflows has been provided, for example, by Sirri and Tufano (1998).
This correlation leads to the plain risk adjustment hypothesis in the literature, stating that losing managers increase their risk at mid-term in order to catch up on the leading managers within their peer group (cf. Brown et al. (1996)):

σ_{2L} / σ_{1L} > σ_{2W} / σ_{1W},

where σ_{pL} indicates the risk level of a loser's portfolio in period p ∈ {1, 2} of a two-period annual tournament and σ_{pW} the risk level of a winner's portfolio, respectively.
Multiple researchers have followed this hypothesis and analyzed various aspects and implications, such as different time periods, competition within fund families, and the impact of the selected fund segment, among others. Important ideas and results can be found in the works of Chevalier and Ellison (1997), Busse (2001), Deli (2002), Kempf and Ruenzi (2008a), Kempf and Ruenzi (2008b), and Bär et al. (2010). Despite these findings, there are still contrary opinions about the existence of such tournament behavior between managers and, especially, about the exact behavioral patterns of winners and losers, respectively.

The Impact of Prior Performance through TrueSkill
New fund inflows are positively correlated with the standings of the individual fund at the end of the tournament, i.e. the end of the year. Most investors tend to trust in the past performance of a fund and expect returns at or around the benchmark level once the fund claims a top position within a certain year. In empirical research, this behavior has been modeled, for example, by Berk and Green (2004), whose model includes two key aspects: First, the performance of fund managers is not persistent, i.e. fund managers do not outperform a passive benchmark continuously. Second, investors behave as Bayesians: they update their beliefs about the strength of an individual manager based on past, observed returns and prior beliefs. This leads to the concept of conditional probabilities, also known as Bayesian probability, where a probability is interpreted as a reasonable expectation based on prior beliefs and knowledge.
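As a minimal illustration of this Bayesian updating, the following sketch (our own toy example, not the Berk and Green model itself) shows how a Gaussian prior belief about a manager's alpha is combined with observed monthly excess returns; all function names and numbers are hypothetical:

```python
import math

def update_belief(mu0, sigma0, observations, noise_sd):
    """Conjugate Gaussian update of the belief about a manager's alpha.

    Prior: alpha ~ N(mu0, sigma0^2); each observation is alpha plus
    N(0, noise_sd^2) noise. Returns the posterior mean and std deviation.
    """
    tau0 = 1.0 / sigma0 ** 2          # prior precision
    tau = 1.0 / noise_sd ** 2         # precision of one observation
    n = len(observations)
    post_var = 1.0 / (tau0 + n * tau)
    post_mu = post_var * (tau0 * mu0 + tau * sum(observations))
    return post_mu, math.sqrt(post_var)

# A sceptical prior of zero alpha, updated with six monthly excess returns:
mu, sd = update_belief(0.0, 1.0, [0.8, 1.2, 0.9, 1.1, 1.0, 0.7], noise_sd=2.0)
```

With each additional observation the posterior variance shrinks, so the belief about the manager's strength becomes more precise, exactly the mechanism TrueSkill exploits below.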
The TrueSkill algorithm was developed by a team from Microsoft Research in 2005 and has been used for matchmaking in various online games ever since. The purpose of this ranking system is to detect and track the skill of individual players despite them playing in teams, to derive public rankings, and to implement a matchmaking system that allows players of the same skill to play against each other. The general idea behind TrueSkill is to update the presumption about a player's skill based on the observed outcome of a given game. This technique is called Bayesian inference, as explained, for example, by Box and Tiao (2011). TrueSkill characterizes the belief about a manager's skill as a Gaussian, uniquely described by its mean µ and standard deviation σ (cf. Microsoft Research (2005)). The parameter µ can be interpreted as the believed average skill of the manager, whilst σ describes the uncertainty about that skill level. The more games a participant plays, the smaller his σ becomes and, therefore, the more precise the knowledge about his skill; furthermore, his average skill level µ is updated based on the match outcomes.
One of the most important advantages of TrueSkill is its adaptivity to any underlying setup of ranked match outcomes. It only needs a clear ranking for each match, whether teams are compared with each other or individuals. We give a brief overview of the underlying process of TrueSkill in order to derive a basic understanding of its functionality. However, we do not explain every mathematical step and its technical realization within the algorithm but refer to the paper of Herbrich et al. (2006).
Let k managers with a total of n funds {1, ..., n} compete in a match. Each fund is uniquely assigned to a single manager, resulting in k disjoint subsets A_j ⊂ {1, ..., n}. For each match, the outcome r := (r_1, ..., r_k) ∈ {1, ..., k}^k indicates the match-specific rank r_j of each manager j in ascending order; i.e. r_j = 1 is the winner, and possible draws are given as r_i = r_j. Making use of Bayes' rule, the conditional probability of the skills s given the observed ranking is

P(s | r, A) = P(r | s, A) p(s) / P(r | A),

which TrueSkill approximates in order to obtain the updated skill beliefs.
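For intuition, the update step can be sketched for the simplest case of two managers and no draw, using the closed-form Gaussian approximation from Herbrich et al. (2006); the function and parameter names are ours, and the default prior N(25, (25/3)²) is the standard TrueSkill setup:

```python
import math

def phi(x):      # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):      # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def trueskill_1v1(winner, loser, beta=25 / 6):
    """One TrueSkill update for a two-manager match without a draw.

    Each rating is a (mu, sigma) tuple; beta scales the performance noise.
    Returns the updated (mu, sigma) beliefs for winner and loser.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = phi(t) / Phi(t)               # mean correction for the observed win
    w = v * (v + t)                   # variance correction, 0 < w < 1
    new_w = (mu_w + sig_w ** 2 / c * v,
             sig_w * math.sqrt(1 - sig_w ** 2 / c ** 2 * w))
    new_l = (mu_l - sig_l ** 2 / c * v,
             sig_l * math.sqrt(1 - sig_l ** 2 / c ** 2 * w))
    return new_w, new_l

# Two managers starting from the default prior N(25, (25/3)^2):
(wm, ws), (lm, ls) = trueskill_1v1((25.0, 25 / 3), (25.0, 25 / 3))
```

The winner's µ rises, the loser's falls, and both σ shrink, since the match outcome carries information about both skills.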

Identifying Skill Based Tendencies in Risk-Shifting
In a first step, we calculate the six-month rolling information ratio as a performance measure of each fund. We use these ratios to create a monthly rating of the funds that is fed forward to the TrueSkill algorithm. At this stage, funds with less than one year of tracking record prior to the start of the tournament year are also included, due to the initial calculation of skill levels.

Figure 1: Factor graph (cf. Herbrich et al. (2006)) for the resulting joint distribution p(s, f | r, A) of three managers with a total of four funds, with manager 1 winning whilst manager 2 and manager 3 draw (k = 3, ranking r := (1, 2, 2)). The black boxes represent the factor functions used to calculate the local variables, visualized by the light grey circles. The grey arrows indicate the initial calculation of the skill level for all three managers, followed by the 'inner iteration circle' used to approximate the new skill level of all managers; after that, the black arrows indicate the updates of the skill beliefs for each individual fund.

Second, the funds included in the annual tournaments compete against each other on a monthly basis, whereby their skill level, and therefore the skill level of each manager, is calculated by TrueSkill based on the performance rankings. To compare the skill levels of different fund managers, we use only each manager's expected average skill level µ once the skill development is calculated. To overcome biases for new managers who have not reached their intentional skill level yet, we only consider managers, and therefore funds, with at least one year of tracking record.
This leads to at least 18 matches between all managers and their funds before they are categorized at the end of a tournament's interim period for the first time.
To analyze the skill-dependent risk-shifting, we use conditional transition matrices for the best 20% (high skill), the next 60% (medium skill), and the last 20% (low skill) of each year's managers. We follow the work of Ammann and Verhofen (2009) and adapt this transition approach, commonly known from credit default analyses. The transitions are based on the historical volatility of each manager's portfolio, whereby each manager is assigned to a risk tercile e_ip ∈ {1, 2, 3}, with e_i1 characterizing the risk tercile of manager i in the interim period and e_i2 the risk tercile in the second half of the year's tournament. Here, 1 indicates the highest risk tercile and 3 the lowest, respectively. Migration events of the same kind are then aggregated in a 3x3 matrix of migration frequencies whose generic element

c_jk = Σ_i 1{e_i1 = j, e_i2 = k}

is the number of migration events from j to k, where 1{...} denotes the indicator function.
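The construction of the migration-frequency matrix can be sketched as follows; the tercile assignments in the example are hypothetical:

```python
def migration_matrix(terciles_h1, terciles_h2):
    """Count tercile migrations between the two tournament halves.

    terciles_h1[i] and terciles_h2[i] are the risk terciles (1, 2 or 3)
    of manager i in the first and second half of the year; returns
    c[j][k], the number of managers moving from tercile j+1 to k+1.
    """
    c = [[0, 0, 0] for _ in range(3)]
    for e1, e2 in zip(terciles_h1, terciles_h2):
        c[e1 - 1][e2 - 1] += 1        # indicator 1{e_i1 = j, e_i2 = k}
    return c

c = migration_matrix([1, 1, 2, 3, 3, 2], [1, 2, 2, 3, 1, 2])
```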
Furthermore, we assume that the observations e_i2 are realizations of random variables with conditional probability distribution P(e_i2 = k | e_i1 = j) = p_jk, the probability that the risk level of a manager's portfolio changes from the j-th to the k-th tercile. We therefore use the migration rates as observed,

p̂_jk = c_jk / n_j with n_j = Σ_{k=1}^{3} c_jk.

To identify any differences between the differently skilled managers, we use a chi-squared test to check for pairwise homogeneity of the transition matrices,

χ²_j = Σ_s Σ_{k=1}^{3} n_j^(s) (p̂_jk^(s) − p̂+_jk)² / p̂+_jk.

The test statistic is asymptotically χ²-distributed with two degrees of freedom. The variable p̂+_jk denotes the estimated probability based on the aggregated data of the two transition matrices, and s is the index of the respective sample, e.g. high- and medium-skilled managers.
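The homogeneity test for a single row j of two migration matrices can be sketched as follows; with two samples and three terciles, the survival function of a χ² variable with two degrees of freedom reduces to exp(−χ²/2). The counts in the example are hypothetical:

```python
import math

def row_homogeneity_test(row_a, row_b):
    """Chi-squared homogeneity test for one row of two 3x3 migration
    matrices given as raw counts. Pools both samples to estimate the
    aggregated rates p+_jk, then compares each sample's migration
    rates against the pooled rates."""
    n_a, n_b = sum(row_a), sum(row_b)
    pooled = [(a + b) / (n_a + n_b) for a, b in zip(row_a, row_b)]
    chi2 = 0.0
    for counts, n in ((row_a, n_a), (row_b, n_b)):
        for c_k, p_plus in zip(counts, pooled):
            chi2 += n * (c_k / n - p_plus) ** 2 / p_plus
    p_value = math.exp(-chi2 / 2)     # survival function of chi2, 2 df
    return chi2, p_value

chi2, p = row_homogeneity_test([30, 10, 10], [10, 10, 30])
```

Two strongly opposed migration patterns, as in the example, yield a large statistic and a p-value far below conventional significance levels.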
By the nature of this approach, our analyses put emphasis on the whole dynamics of the risk-shifting tendencies of differently skilled fund managers. Transition matrices as employed in this study are, inter alia, widely used in the literature on credit risk (cf. Höse et al. (2009) for details) and in previous studies focusing on prior performance and risk-taking of mutual fund managers (Ammann and Verhofen, 2007, 2009).

Empirical Results
Our empirical analysis builds on the two databases Morningstar and Bloomberg, following Brown et al. (1996). Furthermore, we tackle the fact that various funds are team-managed and that multiple managers handle more than one fund by using a string-matching algorithm to identify funds managed by the same managers. We exclude all team-managed funds and match the remaining funds unambiguously to a single manager. This results in 559 individual managers who hold at least one fund on their own within the given time period.
We include in each year's tournament all funds which have at least one year of tracking record and do not miss any data point in the given period. Also, we use two periods of six months to analyze the risk-shifting, which makes June the end of the interim period. Managers above the average at that point are classified as winners and those below as losers. Managers with two or more funds fulfilling these requirements are considered to hold an equally weighted portfolio of their funds, to reduce the impact of proactive risk-shifting across multiple funds. To calculate benchmark-related performance measures, we use data on the MSCI North America for the same period. An overview of the annual tournaments and the average performance of the participating managers against the benchmark is given in Figure 2.
There are several options to measure the risk level of mutual funds. Examples are the return standard deviation, the tracking-error standard deviation (the standard deviation of the excess returns of the fund over a benchmark), or the systematic risk a fund takes, which is commonly estimated via a market model. However, the latter two are rather uncommon in mutual fund tournament studies. We follow previous studies and measure risk by the annualized standard deviation of the monthly fund returns (Brown et al., 1996; Kempf and Ruenzi, 2008b).
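This risk measure reduces to the sample standard deviation of monthly returns scaled by √12; a minimal sketch with hypothetical returns:

```python
import math

def annualized_volatility(monthly_returns):
    """Annualized standard deviation of monthly fund returns."""
    n = len(monthly_returns)
    mean = sum(monthly_returns) / n
    var = sum((r - mean) ** 2 for r in monthly_returns) / (n - 1)
    return math.sqrt(var) * math.sqrt(12)   # scale monthly vol to annual

vol = annualized_volatility([0.01, -0.02, 0.03, 0.00, 0.02, -0.01])
```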

Measuring Performance with TrueSkill
We start our empirical analysis by demonstrating TrueSkill's capability to take prior performance into account. Figure 3 shows the development of the Pearson correlation coefficients between the TrueSkill-based rankings of all participating funds within the tournaments of five, two, and one years and their information ratio rankings. The left panel shows the correlation with TrueSkill levels being calculated for four years prior to 2015, the middle one with one year prior, and the right one with the TrueSkill calculation just starting in 2015. Hence, Figure 3 underlines the time dependence of TrueSkill and its adaptation of prior performance while establishing skill levels. Since investors' decisions are often based on behavioral aspects such as prior performance or the performance of fund family members (e.g. Sirri and Tufano (1998), Nanda et al. (2004)), TrueSkill is an adequate skill measure due to its capability of incorporating these aspects.

Table 1 shows the aggregated risk-shifting tendencies for the whole sample period. It is structured into four panels, the first one showing the unconditional transition rates. Panel D shows significant differences to the unconditional case at the 5% and 1% level for winners and losers, respectively. Indeed, the tendencies to increase risk levels are much lower for managers with less skill than for those with high skill.

Skill Driven Risk-Shifting
The first observable pattern is the difference in general risk-seeking between winners and losers. The subsamples for high- and medium-skilled managers are also closely related to the unconditional one; the χ²-test values show no significant differences here. In contrast, the subsample of low-skilled managers differs from the unconditional sample at the 5% level for winners and even at the 1% level for losers. This indicates more conservative behavior for the minority of less-skilled managers, who seem to secure their wins if possible and cut their losses during bad tournaments.
In the next step, we take a closer look at years of extreme risk-shifting. Therefore, we aggregate the five years with the highest risk increases by losers, 1992, 1995, 2006, 2014, and 2017, and the five years with the highest risk decreases by losers, 1993, 2000, 2001, 2004, and 2016; the latter are the years in which losers have extremely low risk adjustment ratios at mid-term.

Robustness Tests
Underlying performance measure The most important parameter within the TrueSkill setup seems to be the choice of the underlying performance measure used to calculate the monthly rankings, which are the starting point of all further skill calculations. We test for the impact of different performance measures by repeating our analysis with the monthly active returns of all participating managers. We calculate the Spearman rank-order correlation coefficients, including a two-sided p-value for a hypothesis test with the null hypothesis of no correlation between the data series, for three different measures. Table 6 in Appendix B outlines the strong and significant correlation between the Sharpe ratio, information ratio, and active return rankings. We conclude that the choice of the underlying performance measure does not affect our initial results significantly.
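Assuming no tied values, the Spearman coefficient is simply the Pearson correlation of the two rank vectors, as the following sketch illustrates; the input series are hypothetical:

```python
def spearman(x, y):
    """Spearman rank correlation, assuming no tied values: the Pearson
    correlation of the two rank vectors."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Two performance measures that rank the four funds identically:
rho = spearman([1.2, 0.5, 2.0, 0.9], [0.10, 0.02, 0.30, 0.08])
```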

Skill thresholds
The results could be driven by the choice of quantiles that classify managers into their skill level. In the main specification, we classified the top 20% as highly skilled and the bottom 20% as low-skilled which leaves a 20-60-20 split. Other reasonable splits, e.g. 10-80-10, lead to the same conclusions as we show in untabulated results.
Risk adjustment ratio approach Our next robustness test deals with the general tournament behavior regardless of the individual skill of each manager. Therefore, we replicate the contingency table approach introduced by Brown et al. (1996) based on the risk adjustment ratios. The results presented in Table 4 in Appendix A are in line with our results of skill-driven investments, indicating a different trend of individual behavior in tournaments in recent years. Winners have higher RARs in most of the years, which is in contrast to earlier findings of Brown et al. (1996). Still, this demonstrates that our findings are in line with previous methodologies.
Hyperparameter of the prior distribution In our empirical analysis, we set the initial prior distribution of the fund managers' skills as described in Section 2.2 as f(s) := Π_{i=1}^{n} N(µ_i, σ_i²) with µ_i = 25 and σ_i = µ_i/3 ≈ 8.33. Note that the average skill level µ_i is not of much interest in absolute terms, since all managers are assumed to start with the same initial skill. Since we do not define a unit in which to measure skill other than the Gaussian's parameters µ_i and σ_i, the relative belief about two fund managers given by their skill distributions is of higher relevance. In these terms, it does not make much difference whether we start with a level of 10, 100, or the standard level of 25 as proposed by Herbrich et al. (2006), which originates from TrueSkill's early comparability with the ELO ranking.
To underline the low impact of the initial priors on our results, we vary the relation between µ_i and σ_i, i.e. σ_i ∈ {µ_i/2, µ_i/4}. The results are qualitatively similar to our base case σ_i = µ_i/3, see Tables 7 and 8 in Appendix B.
The negligible impact of the priors is in line with the theoretical expectation: as the sample size n → ∞, the difference between two posteriors based on different Gaussian priors tends towards zero. The same holds for larger prior variances σ_i², as outlined, for example, by Ley et al. (2017).
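This convergence can be illustrated numerically: under two different Gaussian priors, the posterior means of a Gaussian location parameter approach each other as the number of observations grows. The data in this toy example are hypothetical:

```python
def posterior_mean(mu0, sigma0, observations, noise_sd=1.0):
    """Posterior mean of a Gaussian mean under a N(mu0, sigma0^2) prior
    and i.i.d. N(mean, noise_sd^2) observations."""
    tau0, tau = 1 / sigma0 ** 2, 1 / noise_sd ** 2
    n = len(observations)
    return (tau0 * mu0 + tau * sum(observations)) / (tau0 + n * tau)

# Same data, two priors of different width (sigma = mu/2 vs mu/4):
data_small = [26.0] * 5
gap_small_n = abs(posterior_mean(25, 25 / 2, data_small)
                  - posterior_mean(25, 25 / 4, data_small))
data_large = [26.0] * 500
gap_large_n = abs(posterior_mean(25, 25 / 2, data_large)
                  - posterior_mean(25, 25 / 4, data_large))
```

The gap between the two posterior means shrinks as the sample grows, mirroring why the choice of σ_i barely affects our results after many monthly matches.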
Different benchmark indexes Within our analysis, we use a risk-adjusted approach to determine the rankings of each manager used for the TrueSkill algorithm. In fact, our measure of choice is the information ratio as a market model adjustment measure where the benchmark is the MSCI North America. Given the different setup of mutual funds and their long-term purposes, e.g. equity-only, long-only, multi-asset, and so on, our chosen benchmark might not be appropriate for every mutual fund in the universe.
Nevertheless, we restrict our fund sample to growth-oriented US equity mutual funds as earlier researchers before (Brown et al., 1996;Taylor, 2003;Kempf and Ruenzi, 2008b).
The categorization is based on the widely accepted classification by Morningstar, which leads to a quite homogeneous sample. We qualify this putative sample restriction by similar arguments used in earlier research.
However, Morningstar specifies two benchmark indexes for each of its categories. The primary index for all three categories used in this study is the S&P 500, which correlates almost perfectly with the MSCI North America. The secondary benchmark index differs for each category. We repeat our analysis benchmarking each fund against its secondary benchmark index and report the results in Table 9. Overall, the conditional transition matrices differ more strongly from the unconditional transition matrix than in our baseline case. In line with our previous findings, we find a tendency that winning managers increase their risk more than losers and that managers classified as low-skilled adjust their risk less than managers classified as high-skilled.
Regression approach On the basis of the conditional transition matrix approach, our results suggest that the risk-shifting tendencies are significantly different for low- and high-skilled fund managers and, beyond that, that high-skilled managers tend to increase their risk levels to a greater extent than low-skilled managers. We acknowledge that conclusions like these have to be interpreted with caution due to unobservable covariates that might influence the results. To mitigate the effect of omitted variables and provide further empirical evidence for our conclusions, we formulate the following regression model:

∆σ_{i,t} = β_1 Rank_{i,t} + β_2 (Rank_{i,t} × D^H_{i,t}) + β_3 (Rank_{i,t} × D^L_{i,t}) + β_4 D^H_{i,t} + β_5 D^L_{i,t} + γ_t + δ_{c(i)} + ε_{i,t},

where the dependent variable, ∆σ_{i,t} = σ_{i,t}^{Second Half} − σ_{i,t}^{First Half}, is the change in the standard deviation of fund i's returns from the first to the second half of year t. Rank_{i,t} denotes the rank of the fund manager with respect to all other managers, scaled to the interval [0, 1] (1 being best). High respectively low manager skill is denoted by D*_{i,t} with * ∈ {H, L}. In a further specification, we replace Rank_{i,t} with dummy variables indicating that a fund manager ranked in the top 20% respectively bottom 20% of all active managers, analogous to the main analysis. For all specifications, we include time (γ_t) and fund-company (δ_{c(i)}) fixed effects. The latter control, for example, for all time-invariant characteristics attributable to a manager's company that may influence the results.
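The role of the fixed effects can be illustrated with a one-way within (demeaning) estimator; this is a simplified sketch with a single regressor and hypothetical data, not our actual specification:

```python
from collections import defaultdict

def within_ols(y, x, groups):
    """OLS slope after demeaning y and x within each group, i.e. a
    one-way fixed-effects (within) estimator for a single regressor."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, xi, g in zip(y, x, groups):
        s = sums[g]
        s[0] += yi; s[1] += xi; s[2] += 1
    num = den = 0.0
    for yi, xi, g in zip(y, x, groups):
        sy, sx, n = sums[g]
        yd, xd = yi - sy / n, xi - sx / n   # subtract group means
        num += xd * yd
        den += xd * xd
    return num / den

# Two fund companies with different baselines but the same slope of 2:
y = [1.0, 3.0, 10.0, 12.0]
x = [0.0, 1.0, 0.0, 1.0]
beta = within_ols(y, x, ["A", "A", "B", "B"])
```

Pooled OLS would be distorted by the level difference between the two companies; demeaning within each group recovers the common slope, which is exactly what the fund-company fixed effects achieve in our regression.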
We present the results of four specifications in Table 3, two each using either the information ratio or active returns to estimate the managers' skill levels via TrueSkill.
All specifications indicate that high-skilled fund managers significantly increase their risk after performing well in the first half of the year. In contrast, we find the opposite signs for all coefficients associated with the risk-shifting of less-skilled fund managers. The explained variation in risk-shifting amounts to roughly 75%, which is a common value in fund tournament studies. Overall, the results support our conclusions drawn from the conditional transition matrix approach and provide further insights into the channels that drive the results.

Notes: This table presents results of a regression of fund managers' performance in the first half of the year on their risk-shifting in the second half, conditional on their estimated skill levels (high, low). Rank denotes the rank of the fund manager scaled to the interval [0, 1] (1 being best), and D indicates dummy variables for high or low skill, or for ranking among the top 20% best or worst managers. Robust standard errors are clustered by year. ***, **, and * indicate significance at the 1%, 5%, and 10% level (two-tailed tests), respectively.
Comparison to ELO Last, we compare TrueSkill with another popular skill measure, the ELO rating, best known from the world of chess. The ELO ranking system is used in competitive chess as well as in various unofficial rankings, e.g. online gaming or football tournaments. It is much simpler in its calculations and therefore not capable of handling teams playing against each other. Figure 4 shows the skill development of three randomly chosen managers over the whole sample period as measured by TrueSkill and ELO. Both ratings are based on monthly matches between all managers participating in a given year's tournament.
The skill levels are normalized to make them comparable, since the absolute levels differ between both systems. Because a manager is only included in a year's tournament if there is more than one year of tracking record, the managers seem to start at different levels; in fact, they all started with the same initial setup. The ELO ratings vary rapidly at high frequency, whilst the TrueSkill ratings adjust much more slowly and only react to unexpected outcomes.
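For reference, the ELO update after a decisive match follows the standard chess formula; the K-factor of 32 is a common convention, not necessarily the one used in our comparison:

```python
def elo_update(r_winner, r_loser, k=32):
    """One ELO update after a decisive match: the winner gains what the
    loser gives up, scaled by how surprising the outcome was."""
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

# Two equally rated managers; the winner takes half the maximum gain:
new_w, new_l = elo_update(1500, 1500)
```

Unlike TrueSkill, ELO keeps no uncertainty parameter, which explains the rapid, high-frequency variation visible in Figure 4.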

Conclusion
Our results highlight the self-confident behavior of skilled managers, who hold or increase their portfolio risk in almost every situation compared to those with less skill.
Applying the TrueSkill algorithm to model investors' beliefs about the individual skill levels of fund managers, we present a way to capture the positive correlation between prior performance and new investment decisions.
The impact of good performance in recent years seems to lead to over-confident behavior of mid-year winners.

A General Tournament Behavior
To detect general tournament behavior, we follow Brown et al. (1996) by using their contingency table approach to determine the risk adjustments during the second half of the tournament: the performance of every manager i is given as the information ratio IR_i against the MSCI North America in the first six months of the year, identifying mid-term winners and losers as those above or below the median information ratio, respectively.
All managers hold an equally weighted portfolio of their funds j ∈ {1, ..., n} included in the tournament, with a portfolio return of r_ik^port in month k.
In order to calculate the information ratio for each manager, given as IR = active premium / tracking error, we determine the cumulative portfolio return as

RTN_i = Π_{k=1}^{6} (1 + r_ik^port) − 1,

which finally leads to the information ratio of manager i at the end of June,

IR_i = (RTN_i − RTN_b) / TE_i,

where RTN_b is the cumulative return of the benchmark returns r_bk of months k ∈ {1, ..., 6} and TE_i is the standard deviation of the monthly active returns r_ik^port − r_bk. The risk adjustment ratio (RAR) of manager i for the given tournament year with interim assessment date in June is

RAR_i = sqrt( Σ_{k=7}^{12} (r_ik^port − r̄_i(12−6))² / 5 ) / sqrt( Σ_{k=1}^{6} (r_ik^port − r̄_i)² / 5 ),

with r̄_i(12−6) and r̄_i representing the mean portfolio return of fund manager i after and before the assessment date, respectively. This variable measures the risk adjustment of a given portfolio within the two periods of the year's tournament by comparing the portfolio's volatility in both periods. We then rank the RAR in a similar way as the IR and classify high RAR as those above the median and low RAR as those below the median for the first and second period, respectively.

Notes: The ranks are calculated over a time period from 1992 to 2017. Each manager is classified as being a winner (loser) if his performance, measured by the active return, lies above (below) the median at the end of the interim period. χ²-values test the H0-hypothesis of the conditional transitions being equal to the unconditional ones. ***, **, and * indicate significance at the 1%, 5%, and 10% level (two-tailed tests), respectively.

Notes: This table shows the risk-shifting tendencies from the first-half tercile to the second-half tercile of the year for the full aggregated data set for a different choice of hyperparameter for the prior distribution of the fund managers' skills, e.g. σ = 6.25. Panel A represents the whole sample, whilst Panels B to D show the transitions for different skill levels.
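The interim measures defined above can be sketched from monthly returns as follows; the divisor of 5 in the RAR volatilities follows the sample-variance convention of Brown et al. (1996), and the return series are hypothetical:

```python
import math

def cumulative_return(returns):
    prod = 1.0
    for r in returns:
        prod *= 1 + r
    return prod - 1

def information_ratio(fund, benchmark):
    """Interim information ratio: cumulative active return over the
    standard deviation of the monthly active returns (tracking error)."""
    active = [f - b for f, b in zip(fund, benchmark)]
    mean = sum(active) / len(active)
    te = math.sqrt(sum((a - mean) ** 2 for a in active) / (len(active) - 1))
    return (cumulative_return(fund) - cumulative_return(benchmark)) / te

def risk_adjustment_ratio(returns):
    """RAR: volatility of months 7-12 over volatility of months 1-6."""
    def vol(part):
        m = sum(part) / len(part)
        return math.sqrt(sum((r - m) ** 2 for r in part) / (len(part) - 1))
    return vol(returns[6:]) / vol(returns[:6])

ir = information_ratio([0.02, 0.01, 0.03, 0.00, 0.02, 0.01],
                       [0.01, 0.00, 0.01, 0.01, 0.01, 0.00])
rar = risk_adjustment_ratio([0.01, -0.01, 0.02, 0.00, 0.01, -0.02,
                             0.04, -0.05, 0.06, -0.03, 0.05, -0.04])
```

A RAR above one, as in the example, flags a manager who raised portfolio volatility after the June assessment date.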