Performance and Market Maturity in Mutual Funds: Is Real Estate Different?

Despite the lack of convincing evidence that active investment fund managers add value, the number of actively-managed US mutual funds has increased substantially over the last 25 years. While non-sector diversified mutual funds have received much attention, sector funds, except real estate mutual funds (REMFs), have not. In this paper, we provide new and more robust evidence on the performance of active REMFs compared to all actively managed mutual funds. We use the Carhart four-factor model with an additional liquidity factor as a risk-adjusted benchmark. We use wild bootstrap methods to deal with small samples, non-normality and heteroscedasticity, and we control for the false discovery of significant results. For portfolios of fund types, we find evidence of both significant outperformance and underperformance, net of fees, during 1992-2016. We consider non-overlapping five-year and three-year periods and find very limited evidence of persistent outperformance. For individual funds, we find that, for both sector and diversified funds, net of fees, only 0.79% are skilled. We find persistence in skills for only two individual fund managers of diversified funds. We investigate the effects of the outsourcing of management and of team versus individual management. Outsourcing has no effect on performance of non-RE sector funds but, for cap-based funds and style-based funds, it has a negative effect. There is some evidence that this may also be true for REMFs. Team management has no effect for any types of funds. Overall, we conclude that REMFs are generally no different from other sector funds.


Introduction
The ability of active investment fund managers to add value has long been the subject of academic research. Although this research produces limited evidence of superior risk-adjusted performance from active management, the number of actively-managed US mutual funds has increased substantially. Between 1992 and 2016, the number of active non-sector diversified mutual funds grew from 1330 to 4179, while mutual funds specialising in a particular industrial sector grew at a similar rate, from 169 to 540, but represented only 11 percent of mutual funds. 1 The argument for these actively-managed funds remains that of Grossman and Stiglitz (1980): active managers are able to develop superior skills in stock selection and the timing of purchases and sales.
Despite the growth of actively-managed sector mutual funds, there are no significant studies of such funds, with the exception of real estate mutual funds (REMFs). This paper fills the gap. The previous focus could be justified, in part, by the long history and the relatively large number (173) of real estate funds, and by claimed information inefficiencies in the underlying real estate investment market. Such studies compare fund performance to that of a risk-adjusted benchmark and examine the regression constant, alpha, as the measure of differential performance. However, overall, these real estate studies provide limited evidence of added value. Although a couple of studies (Gallo et al. 2000; Kallberg et al. 2000) identify some outperformance for REMFs, other studies do not (O'Neal and Page 2000; Lin and Yung 2004; Rodriguez 2007; Chiang et al. 2008; Hartzell et al. 2010; Chou and Hardin 2014). Moreover, no study compares REMFs with other sector funds or with the much larger number of non-sector diversified mutual funds. Nor does any consider the effects of outsourcing or of team versus individual management.
It might be argued that, as they invest predominantly in REITs, the mutual fund managers are dealing with securitized investments so there is no fundamental difference from any other sector and no reason to expect added value in this sector. A more general argument is that of market maturity, whereby a new or rapidly growing mutual fund sector may generate initial superior performance but that this is soon lost as the market matures and becomes more informed and competitive.
In this paper, we provide new and robust evidence on the performance of actively-managed REMFs. We place this performance in the context of all types of U.S. domestic mutual funds that actively manage their portfolio holdings. We are interested both in groups of funds of the same type and in individual funds. We also consider the persistence of performance for the fund types and for individual funds. Finally, we consider the effects on performance of outsourcing of fund management, and of whether funds are team- or individually-managed.
For our analysis, we use the CRSP Mutual Fund database, which is free of survivor and incubation biases, and we test a variety of benchmarks. In order to deal with manager skills, as opposed to luck, we use the wild bootstrap method, which deals with heteroscedasticity and the small sample sizes used to estimate the performance statistics. And, as we have multiple tests in our analysis of the performance of thousands of funds, we control for false discovery.
The rest of the paper is organised as follows. Section "Literature Review" provides a literature review, Section "Methodology" sets out the methodology and "Data" discusses the data. Then, Section "Empirical Results" presents the empirical results and "Conclusion" provides a conclusion.

Literature Review
Although there has been extensive research on actively-managed, diversified, equity mutual funds, there has been no general consideration of specialised sector funds, which are typically excluded from analyses of mutual funds because of their distinctively different investment strategies. The exception is REMFs, which have received some attention but there is no research to suggest whether REMFs are similar to, or different from, other sector funds or, indeed, non-sector diversified funds.
The literature on diversified funds provides 'little convincing evidence' of mutual fund outperformance (Kallberg et al. 2000, 387). We do not review this literature here 2 but two papers are of relevance to the consideration of sector funds. The first is by Chen et al. (2004) who find that small mutual funds, on average, outperform large mutual funds. They attribute this to the interaction of liquidity and organisational diseconomies. In a small organisation, it is easier to convince decision-makers of the value of soft information, that is, information that can be directly verified only by the person who produces it. In the context of mutual funds, such information is most likely to be research or investment ideas related to companies located near a fund headquarters. It is also possible that, while a small fund can invest in the best opportunities, a large fund may be forced into some poorer investments that erode overall performance. This might suggest the possibility of superior performance in general for sector funds, which are, on average, smaller than diversified funds, and in particular for the real estate sector, where softer and more local knowledge is likely to be important. The second paper is by Kacperczyk et al. (2005) who find that industrial concentration, on average, improves mutual fund performance. They argue that the reason may be that managers believe some industries will outperform or that they possess superior information to enable them to select under-priced stocks in specific industries. Sector funds are unable to move from a concentration in several industries to a concentration in others, which might affect their performance. Thus, actively-managed REMFs merit consideration and comparison to other actively-managed sector funds and to other types of actively-managed funds.
Despite the potential importance of sector funds, only real estate has received any significant attention, so there has been no direct comparison of REMFs with other sector mutual funds nor with the large diversified mutual fund market. REMFs emerged in the late 1980s and became the largest of the mutual fund sectors by the end of 2016. Most studies of REMFs, in line with the literature for diversified mutual funds, find no evidence of outperformance, a finding which is robust to different methods. While a couple of studies (Gallo et al. 2000; Kallberg et al. 2000) do identify outperformance for REMFs, other studies (O'Neal and Page 2000; Lin and Yung 2004; Rodriguez 2007; Chiang et al. 2008; Hartzell et al. 2010; Chou and Hardin 2014), using different time periods and benchmarks, find little or no evidence of superior performance among REMFs. Gallo et al. (2000) find that 24 REMFs during 1991-7, on average, outperformed both the Wilshire RES index (REITs) and a three-factor model that includes the Wilshire index together with a stock index and a bond index. However, they attribute this outperformance not to stock selection but to fund manager decisions to overweight outperforming real estate types. Kallberg et al. (2000, 387) consider 44 REMFs during 1986-98 and claim that 'the average and median alphas (net of expenses) are positive'. However, the result is sensitive to the choice of benchmark. For single factor models, the result is significant for the S&P 500 and two of the four RE benchmarks; for multi-factor models, it is significant for a four-factor model (the Fama-French three-factor model plus excess returns against a bond index) but, when an RE index is added, alpha is significant only when the RE index includes RE Operating Companies (REOCs) as well as REITs. They attribute the result to superior information among RE fund managers because of the costs of acquisition in the RE market.
In contrast, O'Neal and Page (2000), in a study of 50 funds during 1996-8, find no evidence of superior REMF performance against a benchmark comprising indices of REITs, a small stock index, a general stock index and a global stock index. Similarly, Lin and Yung (2004) find that 83 REMFs during 1993-2001 did not outperform the market, on average and net of fees, regardless of the benchmark used: they tried a simple CAPM model, the Fama-French three-factor model (Fama and French 1993) and the Carhart four-factor model (Carhart 1997), all with either a stock index or the NAREIT index. Nor does Rodriguez (2007) find any evidence of outperformance in a study of 35 REMFs during 1999-2004, although his benchmark is a series of RE sub-indices. Chiang et al. (2008), for a study of 55 REMFs during 1982-2003, conclude that, on average, REMFs are not capable of outperformance. Their initial results, using the CAPM and a Fama-French three-factor benchmark, indicate superior performance, but they argue that only outperformance relative to the NAREIT benchmark should be considered and that produces a result of no superior performance. They conclude (Chiang et al. 2008, 60) that the result is 'consistent with an equilibrium in which competition drives away abnormal returns'. This is an argument to which we will return in our empirical results. Hartzell et al. (2010) examine 132 funds' returns, before and net of expenses, during 1994-2005, using various benchmarks and only find evidence of outperformance with respect to real estate index benchmarks. They use three benchmarks derived from portfolios of REITs. The first benchmark uses the Fama and French (1993) and Carhart (1997) factors, but with the factor returns constructed from REITs rather than common stocks; the second consists of the returns of portfolios sorted by property type; and the third combines the first two.
In addition, as REMFs sometimes invest in non-REIT real estate companies, an index of homebuilders' stock returns and two different REOC indices are included. The analyses show that a value-weighted portfolio of REMFs fails to outperform any of these benchmarks net of fees. And, although benchmark choice has little effect on the aggregate portfolio, 'the performance of individual mutual funds can be much more sensitive to the benchmark choice' (Hartzell et al. 2010, 124). Finally, Chou and Hardin (2014) use a sample of 160 funds during 1994-2006 and, against benchmarks of CAPM, Carhart (1997) and Carhart with four real estate industry indices, find that REMFs do not outperform.
In contrast to the attention given to the REMFs, there has been very limited research on other sectors, which we redress. Khorana and Nelling (1997) consider 147 funds in seven sectors, but excluding real estate, during 1976-92. Against the S&P 500, there was no outperformance but, when sector specific benchmarks were used, there was. The only other work of significance that we can find on sector funds is two working papers (Tiwari and Vijh 2001, 2004). The first of these considers persistence in performance among 607 actively managed mutual funds in six sectors during 1990-2000. They suggest that the arguments in favour of sector funds are framed in terms of some sectors being 'characterized by a greater degree of information asymmetry between insiders and outsiders' and point specifically to the real estate and technology sectors (Tiwari and Vijh 2001, 3). They find no persistence in performance. In the second paper, they use the Carhart model plus a sector index as the benchmark and find that sector funds neither out- nor underperform but that diversified funds underperform, although the difference is not economically significant.
From the above review, it is clear that choice of benchmark is an important aspect of the modelling of performance. While earlier studies used stock, bond and real estate indices, there seems to have emerged a broad consensus on the use of factor models, such as the Fama and French (1993) three-factor model, the Carhart (1997) four-factor model (which is the Fama-French three-factor model plus a momentum factor) and, more recently, on the Fama and French (2015) five-factor model. 3 In the first two cases, these have been used with sector indices and also with factors constructed from real estate data rather than common stock data. Most studies test a variety of benchmarks, but there is no universal answer to the optimal benchmark specification.
3 The three original factors in Fama and French (1993) are: the excess return rate of the value-weighted aggregate market portfolio of stocks traded at the NYSE, Amex, and NASDAQ (MKT); the size risk factor (SMB); and the value growth risk factor (HML). The Fama and French (2015) five-factor model is constructed by adding two additional factors to their three-factor model, motivated by the dividend discount valuation model and anomalies unexplained by the three-factor model. Their fourth factor measures operating profitability risk and is the difference between the return rates of diversified stock portfolios of companies with robust profitability and those with weak profitability (RMW). Their fifth factor is the difference between returns on diversified portfolios of stocks of conservative (low) and aggressive (high) investment firms (CMA). The Carhart (1997) momentum factor is the difference between the average return rates of diversified stock portfolios of companies sorted by the previous 12-month return rates.
The importance of liquidity risk in the pricing of real estate securities has been addressed by Soyeh and Wiley (2019) and Hoesli et al. (2017). None of the above benchmarks explicitly considers liquidity. However, DiBartolomeo et al. (2020), in their analysis of the liquidity risk of REITs, use a liquidity measure proposed by Pástor and Stambaugh (2003). This risk factor captures liquidity related to temporary price fluctuations induced by order flow, and represents the market-wide systematic measure for liquidity fluctuations. Dong et al. (2019) also follow this approach and find that this liquidity factor plays an important pricing role in the cross-section of mutual funds. Accordingly, we include this factor.
We are interested not only in the existence of out- (and under-) performance but also in its persistence. This has been examined using several methods. Grinblatt and Titman (1992) compare two five-year periods by regressing fund alphas in one period on those for the other. They test the slope coefficient and fail to reject the hypothesis of positive persistence. Gruber (1996) constructs decile portfolios based on one and three-year performance, calculates the rank correlation coefficients of prior and subsequent performance, and finds significant results. In contrast, Carhart (1997) constructs decile portfolios of equity mutual funds and concludes that persistence in performance is found mostly among poorly performing funds; and Kallberg et al. (2000), using the same approach, with quintile portfolios, find little evidence of persistence in returns in REMFs over six-month or 12-month periods. 4 Nor do Tiwari and Vijh (2001), also using quintile portfolios, find any persistence. Finally, for real estate, Lin and Yung (2004) estimate an autocorrelation model for the residuals from benchmark models of individual fund performance and find persistence in the short term (up to eight months) for both over- and underperforming funds. We consider persistence and are specifically interested in whether the results are consistent with a hypothesis of market maturity and the competing away of superior performance.
None of the studies considered above addresses the need to distinguish between skills and luck. It is possible that fund managers with significantly positive risk-adjusted returns may not be genuinely skilled and may only achieve outperformance through luck. Most studies rely on a t-test of the null hypothesis that the outperformance, as measured by the constant, alpha, in the regression models outlined above, is zero. This is subject to a Type I error, termed false discovery or family-wise error, which is problematic when multiple hypotheses are tested simultaneously for all fund managers. Further, the test used typically assumes a normal distribution for a fund's return history, which is a poor approximation in practice.
While none of the real estate studies has considered false discovery, recent work by Kosowski et al. (2006), Fama and French (2010), and Barras et al. (2010) has addressed the issue. We explain the technical aspects in the next section and here address only the results. Both Kosowski et al. (2006) and Fama and French (2010) find that few active growth fund managers possess genuine skills to produce outperformance when expenses are taken into account, but neither considers sector funds. Barras et al. (2010) find a downward trend in the proportion of skilled funds and an upward trend in the proportion of unskilled funds, with a significant proportion of skilled managers before 1996 but almost none after 1996. They suggest that the decreasing average alpha may be the result of increased competition but also that the flow of funds to successful managers may compete away any surplus alpha. Overall, these results for outperformance suggest merit in applying a variant of these approaches to sector funds, specifically to REMFs.
Two other issues in the literature merit attention: the impact of outsourcing of fund management, and whether the funds are individually- or team-managed. Outsourcing is a common practice: according to Chen et al. (2013, 530), roughly 41% of fund families at least partially delegate the management process to an unaffiliated adviser and, in terms of total net assets, outsourced funds represent 26% of funds in a typical fund family. This may be because of cost efficiencies and capacity constraints. The mutual fund company, that is the fund family of an outsourced fund, monitors investment performance in terms of return and risk-taking behaviour. Chen et al. (2013, 532) note that the 'outsourced funds tend to be younger (8.0 years to 11.4 years)' and suggest that fund families are more likely to close outsourced funds because of poor performance or excessive risk-taking behaviour (p.545). They conclude that, while outsourcing produces roughly the same market beta, it leads to underperformance of at least 50 bps a year. They explain this as an agency problem. Chuprinin et al. (2015) consider subcontracting among international mutual funds and find that in-house managed funds outperformed outsourced funds by 85 basis points per year. They attribute this (p.2275) to 'preferential treatment of in-house funds via the preferential allocation of IPOs, trading opportunities, and cross trades'. Bliss et al. (2008) point to a growth in team management of equity mutual funds, from 30% in 1993 to 56% in 2003. They suggest (p.110) that this could be to 'avoid falling victim to "stars" that leave' or because 'groups make better decisions in the areas of selecting and managing a stock portfolio'. However, using the Fama and MacBeth (1973) method and the Carhart model, they find 'no statistically or economically significant differences between individually managed and team-managed mutual funds' (p.115). Massa et al. (2010) focus on named versus anonymous fund managers but also consider individual versus team management. They find no significant difference in either case, using the CAPM and Carhart benchmarks. 5 Finally, Patel and Sarkissian (2017), who argue that their data on managers is more reliable, use the Carhart model, both without and with the Pástor and Stambaugh (2003) liquidity factor, and conclude that team-managed funds have higher risk-adjusted annual returns by 30-40 bps. They also find a non-linear relationship between team size and fund performance with three-member teams being best. However, none of these papers on outsourcing or the size of the management team considers sector funds.
In this study, we improve on previous REMF analyses in seven main ways. First, we compare REMFs to the universe of actively-managed equity mutual funds, comprising funds in 11 sectors and seven diversified fund types. Second, we use bootstrap approaches to deal with non-normality and heteroscedasticity. Third, unlike all previous real estate and sector studies, we address the issue of false discovery to distinguish between skilled and lucky managers. Fourth, after testing a range of factor models for mutual funds, we choose a model with an additional liquidity factor. Fifth, unlike some previous studies, 6 we use the CRSP Mutual Fund database, which is free of survivor bias. We also control for incubation bias (Evans 2010), which may also lead to errors in assessment of performance and was not considered by existing sector fund and REMF studies. Sixth, we consider persistence in a number of ways, both at fund type and individual fund levels. Finally, we examine the effect of outsourcing and of individual versus team management on mutual fund performance for both diversified funds and sector funds, using a more disaggregated approach than other studies.
The next section details technical issues of the methods.

Methodology
Introduction
A skilled fund manager is able to generate return rates that at least compensate investors for the risk taken. An unskilled manager might generate such return rates too, but only occasionally and as the result of luck. This brings two complications for the assessment of fund managers. First, return rates must be measured against what the market sees as fair compensation. Second, the usually short history of fund data and the time-varying volatility of return rates require careful statistical analysis. Our empirical examination of the performance of fund managers builds on the regression
$$ r_{i,t} = \alpha_i + x_t \beta_i + \varepsilon_{i,t}. \tag{1} $$
The return rate $r_{i,t}$ in month $t$ is in excess of the risk-free rate. The index $i$ relates either to an individual fund or a portfolio of funds with a weighted return rate. The fair compensation $x_t \beta_i$ is based on a linear asset pricing model with $K$ risk factors, all traded, collated in the row vector $x_t$. The factor loadings are collated in the column vector $\beta_i$. The innovations $\varepsilon_{i,t}$ can be heteroscedastic, might follow a non-symmetric distribution, and are likely to be correlated contemporaneously. We write Eq. 1 compactly as
$$ r_i = \iota_i \alpha_i + X_i \beta_i + \varepsilon_i = Z_i \theta_i + \varepsilon_i, \tag{2} $$
with $Z_i = [\iota_i, X_i]$ and $\theta_i = (\alpha_i, \beta_i')'$. The column vector $r_i$ stacks the $T_i$ return rates, $\iota_i$ is a column vector of ones, and $X_i$ stacks the factor vectors $x_t$ that correspond to $i$'s return rates. We estimate this linear equation with OLS. The estimator for Jensen's alpha is the average of the return rate in excess of the estimated fair compensation; see Appendix A.1. We expect $\alpha_i = 0$ if $i$ is managed passively and $\alpha_i > 0$ if managed actively by managers with skill.
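As a minimal sketch of how the compact regression can be estimated, the following Python code computes the OLS estimates of alpha and the factor loadings from the normal equations. The function names (`solve`, `jensen_alpha`), the input layout and the simulated two-factor data are assumptions made for illustration; they are not taken from the paper.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def jensen_alpha(excess_returns, factors):
    """OLS estimate of (alpha, beta) in r = alpha + x beta + eps."""
    z = [[1.0] + list(x) for x in factors]  # prepend the constant
    k = len(z[0])
    ztz = [[sum(row[a] * row[b] for row in z) for b in range(k)] for a in range(k)]
    ztr = [sum(row[a] * r for row, r in zip(z, excess_returns)) for a in range(k)]
    theta = solve(ztz, ztr)
    return theta[0], theta[1:]


# Hypothetical data: two factors, 60 months, true alpha of 0.2% per month, no noise.
factors = [[0.01 * ((t * 7) % 11 - 5), 0.01 * ((t * 3) % 5 - 2)] for t in range(60)]
returns = [0.002 + 0.9 * f1 - 0.3 * f2 for f1, f2 in factors]
alpha, beta = jensen_alpha(returns, factors)

# Alpha as the average return in excess of the estimated fair compensation.
mean_r = sum(returns) / len(returns)
mean_x = [sum(f[j] for f in factors) / len(factors) for j in range(2)]
alpha_from_means = mean_r - sum(b * m for b, m in zip(beta, mean_x))
```

Because the simulated returns contain no noise, the estimated alpha and loadings coincide with the values used to generate the data, and `alpha_from_means` reproduces the same alpha.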

Tests of Asset Pricing Models
It is essential that we use the correct asset pricing model to estimate the fair compensation. As passive funds do not try to beat the market, the correct model, represented by the traded risk factors included in $x_t$, will lead to alphas that are zero. Because individual passive funds can close or merge, we construct portfolios of funds for the tests. This ensures that each portfolio has $T$ monthly return rate observations. The monthly return rate of a portfolio is computed as the value-weighted average return rate of the funds included.
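The portfolio construction can be illustrated with a short Python sketch. It assumes that each month's weights are the funds' TNA at the start of the month, which the text does not specify; the function name and the three-month example data are hypothetical.

```python
def value_weighted_returns(months):
    """Value-weighted portfolio return for each month.

    months: list with one dict per month, {fund_id: (return, lagged_tna)}.
    Funds may enter or leave, so the set of keys can change over time.
    """
    series = []
    for funds in months:
        total_tna = sum(tna for _, tna in funds.values())
        series.append(sum(r * tna for r, tna in funds.values()) / total_tna)
    return series


# Hypothetical data: fund C enters in month 2 and fund A closes before month 3.
months = [
    {"A": (0.010, 100.0), "B": (0.030, 300.0)},
    {"A": (0.020, 110.0), "B": (-0.010, 290.0), "C": (0.000, 100.0)},
    {"B": (0.040, 280.0), "C": (0.010, 120.0)},
]
portfolio = value_weighted_returns(months)
```

The portfolio series has one observation per month even though no single fund exists in all three months, which is the property the tests rely on.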
We then estimate $\theta_i$ separately for each portfolio and test whether the alphas for all portfolios are jointly zero. 7 We use the robust covariance estimator of Kiefer and Vogelsang (2002) in the joint test statistic for $\alpha_i = 0$ for all $i$, to account for heteroscedasticity and autocorrelation, as it improves inference in finite samples. For comparison, we conduct the same test separately for portfolios of active funds. This provides results for 18 portfolios over the full sample period. In addition, we estimate (2) over rolling windows of five years to examine the behaviour of the alpha estimates over time.

Tests of Fund Performance
We estimate (2) separately for each fund $i$ to assess average performance. As the number of observations $T_i$ varies across funds and need not equal $T$, it is no longer possible to estimate a covariance matrix and to conduct a joint test on alphas. Instead, we will test whether alphas are zero for funds individually, but we will take account of conducting the tests simultaneously.
As the fund return rate series do not always overlap and because stock return rates show heavy tails and can be heteroscedastic, we rely on the bootstrap to improve the finite sample inference. In particular, we use the fixed-design wild bootstrap to estimate the distribution of the individual test statistics under the null. There is evidence that the wild bootstrap performs well if the data are generated by a dynamic process characterized by heteroscedasticity of unknown form (Gonçalves and Kilian 2004).
First, we use the actual data $(r_i, X_i)$ and estimate (2) under the restriction of a zero constant ($\alpha_i = 0$) for each of the $I$ funds. 8 We keep the estimated vector of factor loadings $\hat{\beta}_i$ and the vector of re-centred residuals $\tilde{\varepsilon}_i$. We generate bootstrap notional return rate replications ($b = 1, \ldots, B$)
$$ r_i^b = X_i \hat{\beta}_i + \Upsilon_i^b \tilde{\varepsilon}_i, \tag{3} $$
which impose that the manager of fund $i$ has no skill. The bootstrap variation comes from the diagonal of the $(T_i \times T_i)$ matrix $\Upsilon_i^b$, which consists of draws from the Rademacher distribution. 9 New realizations from this distribution are drawn for each replication $b$. We conduct $B$ bootstrap replications, which leads to the set
$$ \mathcal{B}_i = \{ r_i^1, \ldots, r_i^B \}. \tag{4} $$
Second, we fit the regression (2) for each of the replications in $\mathcal{B}_i$ and compute the asymptotically pivotal t-statistic
$$ t_i^b = \hat{\alpha}_i^b / \hat{\sigma}_i^b \tag{5} $$
with the heteroscedasticity-consistent estimator for the standard error 10
$$ \hat{\sigma}_i^b = \sqrt{ e_i (Z_i' Z_i)^{-1} Z_i' \hat{\Omega}_i^b Z_i (Z_i' Z_i)^{-1} e_i' }. \tag{6} $$
The covariance matrix $\hat{\Omega}_i^b$ in Eq. 6 has the squared residuals, each divided by $(1 - z_{i,hh})^2$, on its diagonal and zeros elsewhere. $z_{i,hh}$ is the $h$'th diagonal element of the hat matrix $Z_i (Z_i' Z_i)^{-1} Z_i'$ and $\tilde{\varepsilon}_{i,t}$ is the re-centred residual from the restricted regression. Finally, we use the estimated distribution of test statistics under the null and compute the bootstrap p-value $\hat{p}_i$ as the proportion of the $B$ bootstrap statistics $t_i^b$ that are more extreme than $\hat{t}_i$, where $\hat{t}_i$ is the test statistic for $\alpha_i$ when we fit (2) to the actual data. 11
If we test $\alpha_i = 0$ at significance level $\gamma = 0.05$ and the hypothesis is true, we will make a false discovery with a probability of 5%. If we test the same (true) hypothesis for two funds, each with an individual test at $\gamma = 0.05$, the probability of at least one false discovery will be larger than 5%; if the two tests are independent, it is $1 - 0.95^2 = 9.75\%$. Control of the family-wise error rate is one approach to deal with this problem. However, as investors want to learn about funds that are worth further investigation, this strict approach will not be attractive; see Appendix A.2.
7 Stacking the portfolio regressions leads to the same estimators as separate regressions; see Appendix A.1.
8 For details on the implementation, see Flachaire (2005) and Davidson et al. (2007, 2008).
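The bootstrap steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the p-value uses a two-sided comparison of absolute t-statistics, the standard error is an HC3-type estimator computed from the residuals of each unrestricted fit, and all function names and the simulated single-factor data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(z, r):
    """OLS of r on z; returns coefficients, residuals and (Z'Z)^{-1}."""
    k = len(z[0])
    ztz = [[sum(row[a] * row[b] for row in z) for b in range(k)] for a in range(k)]
    cols = [solve(ztz, [float(i == j) for i in range(k)]) for j in range(k)]
    ztz_inv = [[cols[j][i] for j in range(k)] for i in range(k)]
    ztr = [sum(row[a] * y for row, y in zip(z, r)) for a in range(k)]
    theta = [sum(ztz_inv[a][b] * ztr[b] for b in range(k)) for a in range(k)]
    resid = [y - sum(c * v for c, v in zip(theta, row)) for row, y in zip(z, r)]
    return theta, resid, ztz_inv


def alpha_t_stat(x, r):
    """t-statistic of the constant with an HC3-type standard error."""
    z = [[1.0] + list(row) for row in x]
    theta, resid, ztz_inv = fit(z, r)
    k = len(theta)
    var = 0.0
    for row, e in zip(z, resid):
        h = sum(row[i] * ztz_inv[i][j] * row[j] for i in range(k) for j in range(k))
        w = sum(ztz_inv[0][j] * row[j] for j in range(k))  # weight of this month in alpha-hat
        var += (w * e / (1.0 - h)) ** 2
    return theta[0] / math.sqrt(var)


def wild_bootstrap_p_value(x, r, n_boot=199, seed=0):
    """Fixed-design wild bootstrap p-value for the null of a zero alpha."""
    rng = random.Random(seed)
    t_actual = alpha_t_stat(x, r)
    beta, resid, _ = fit([list(row) for row in x], r)  # restricted: no constant
    mean_resid = sum(resid) / len(resid)
    centred = [e - mean_resid for e in resid]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in x]
    exceed = 0
    for _ in range(n_boot):
        # Rademacher draws flip the sign of each re-centred residual
        r_boot = [f + rng.choice((-1.0, 1.0)) * e for f, e in zip(fitted, centred)]
        if abs(alpha_t_stat(x, r_boot)) >= abs(t_actual):
            exceed += 1
    return t_actual, exceed / n_boot


# Hypothetical fund with a large true alpha (2% per month) and one factor.
data_rng = random.Random(1)
x = [[data_rng.gauss(0.005, 0.04)] for _ in range(60)]
r = [0.02 + 1.0 * row[0] + data_rng.gauss(0.0, 0.01) for row in x]
t_hat, p_hat = wild_bootstrap_p_value(x, r)
```

The factor matrix stays fixed across replications, so only the residual signs vary, which is what makes this a fixed-design scheme.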
We rely instead on the false discovery rate to account for the simultaneous testing of the performance of thousands of funds. According to Storey (2002), the positive false discovery rate is
$$ \mathrm{pFDR}(\gamma) = \frac{\pi_0 \gamma}{P(\gamma)}. \tag{8} $$
$\pi_0$ is the proportion of funds in the data that have managers with no skill, $\gamma$ is the significance level used in the individual tests of manager skill, and $P(\gamma)$ is the probability of rejections. Therefore, Eq. 8 relates the expected proportion $\pi_0 \gamma$ of false discoveries to the expected proportion of all discoveries. Appendix A.3 motivates and provides details on the estimation of $\pi_0$. We estimate the denominator of Eq. 8 with
$$ \hat{P}(\gamma) = \frac{1}{I} \sum_{i=1}^{I} \mathbf{1}(\hat{p}_i \le \gamma). \tag{9} $$
We use the q-value introduced by Storey (2002, Algorithm 2) to make statements on the skills of individual fund managers while taking into account that we consider $I$ managers simultaneously. First, we sort the p-values and set $q(\hat{p}_{(I)}) = \widehat{\mathrm{pFDR}}(\hat{p}_{(I)})$, where $\hat{p}_{(I)}$ is the largest of the $I$ p-values and pFDR is given in Eq. 8. This gives the minimum pFDR we can achieve if we reject for all $\hat{p}_i \le \hat{p}_{(I)}$. For the next largest p-value, the pFDR is computed as
$$ q(\hat{p}_{(I-1)}) = \min\left\{ \widehat{\mathrm{pFDR}}(\hat{p}_{(I-1)}),\; q(\hat{p}_{(I)}) \right\} \tag{10} $$
and so forth. This procedure ensures that the q-values follow the same order as the p-values. If we find a rate of false discoveries acceptable, for instance $q^* = 10\%$, then we will call all those funds discoveries for which $q_i \le q^*$.
9 Possible outcomes of the distribution are $\upsilon \in \{-1, 1\}$ with $P(\upsilon = -1) = P(\upsilon = 1) = 0.5$.
10 The $(1 \times T_i)$ vector $e_i$ has a one as first element and zeros elsewhere.
11 The indicator function $\mathbf{1}(\cdot)$ becomes one if the argument is true and zero otherwise. We estimate the standard error $\hat{\sigma}_i$ for $\hat{t}_i$ with Eq. 6, but use $\hat{\varepsilon}_i$ instead of the re-centred residuals.
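The step-down construction of the q-values can be sketched as below. The estimator of $\pi_0$ used here, the share of p-values above a threshold `lam` rescaled by `1 - lam`, is a common choice from Storey (2002) and is an assumption of this sketch; the paper's own estimator is described in its Appendix A.3. The function name and the eight example p-values are hypothetical.

```python
def q_values(p_values, lam=0.5):
    """Return (pi0, q) where q[i] is the q-value of p_values[i]."""
    n = len(p_values)
    # Assumed estimator: p-values above lam are treated as coming from the null.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (n * (1.0 - lam)))
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running = 1.0
    # Start at the largest p-value and work down, taking running minima.
    for rank in range(n - 1, -1, -1):
        i = order[rank]
        share_rejected = (rank + 1) / n  # proportion of p-values <= this one
        pfdr = pi0 * p_values[i] / share_rejected
        running = min(running, pfdr)
        q[i] = running
    return pi0, q


# Hypothetical p-values for eight funds.
p = [0.001, 0.002, 0.04, 0.3, 0.6, 0.8, 0.9, 0.5]
pi0, q = q_values(p)
discoveries = [i for i, qi in enumerate(q) if qi <= 0.10]  # q* = 10%
```

In this example the estimated share of unskilled managers is 0.75 and the first three funds are discoveries at $q^* = 10\%$; a fund-by-fund test at the 5% level would flag the same three funds here, but the q-values additionally bound the expected share of false discoveries among them.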

Performance and Fund Characteristics
Fund characteristics might have an impact on the returns above the fair compensation that a fund generates. For instance, it might matter for performance whether or not a fund is managed by an entity that is affiliated with the fund management family, the sponsor. Other characteristics such as age and TNA can also affect performance.
We use cross-sectional regressions for this analysis and proceed in three stages. First, we separate the funds that exist each month into four groups based on investment focus (cap, style, real estate, other sector). We then split each of these groups further into funds that are managed by an entity affiliated with the fund management family and into funds that are managed externally. For each of these eight groups, we sort the funds into quintiles based on their TNA and compute equally-weighted return rates. This leads to 8 × 5 = 40 return rate series for these portfolios. We also generate a second grouping, where we split each of the investment focus groups into groups of funds that are managed by a team and those that are managed by a named manager. 12 We follow the cross-sectional regression approach of Chen et al. (2013), which is similar, but not identical, to the approach introduced by Fama and MacBeth (1973). Using five years of returns (60 months), we first estimate (2) separately for each of the portfolios, where we use the factors $x_t$ from the asset pricing model that passed the test. The resulting factor loadings $\hat{\beta}_p$ are then used as regressors in a cross-sectional regression for the following month (month 61). In particular, we regress fund return rates for this month, $r_{i,t}$, on a constant and the $\hat{\beta}_{p(i,t)}$ of the group to which fund $i$ belongs. This regression leads to estimates of the risk premiums $(\hat{\gamma}_{0,t}, \hat{\gamma}_{1,t})$. The same regression is fitted for the next eleven months. After that, the factor loadings are updated to provide new estimates of $\hat{\beta}_p$ for the next twelve months.
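The sorting step for one group and one month can be sketched as follows. The function name, the input layout and the rule for splitting funds into five bins when the count is not a multiple of five are assumptions made for illustration.

```python
def quintile_portfolio_returns(funds):
    """Equally-weighted return of each TNA quintile for one group and month.

    funds: list of (tna, return) pairs; at least five funds are required.
    Returns a list of five returns, from the smallest to the largest TNA quintile.
    """
    n = len(funds)
    if n < 5:
        raise ValueError("need at least five funds to form quintiles")
    ranked = sorted(funds, key=lambda f: f[0])
    out = []
    for q in range(5):
        members = ranked[q * n // 5:(q + 1) * n // 5]
        out.append(sum(ret for _, ret in members) / len(members))
    return out


# Hypothetical group of ten funds with TNA 1..10 and returns 1%..10%.
group = [(float(i), 0.01 * i) for i in range(1, 11)]
quintiles = quintile_portfolio_returns(group)
```

Applying the function to each of the eight groups in every month would give the 40 portfolio return series described above.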
Given the risk premiums, we compute the fund return rate adjusted for the fair compensation (Eq. 11). Stacking all return rates for a given month gives the second cross-sectional regression model (Eq. 12), where the matrix $Z_t$ contains the constant term, characteristics of the fund and control variables. As the number of funds varies over time, the number of rows in $Z_t$ changes accordingly. However, the dimension $K$ of $\hat{\phi}_t$ stays the same. Finally, we regress $\hat{\phi}_t$ on a constant (Eq. 13), so that each element of $\hat{\delta}$ is simply the time-average of the corresponding elements in $\hat{\phi}_t$. We estimate the standard errors with a robust covariance estimator to account for heteroscedasticity and correlation.
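The last two steps, a cross-sectional regression each month followed by time-averaging of the coefficients, can be sketched as follows. This is a simplified illustration under stated assumptions: the adjusted return rates are taken as given, the matrix of characteristics is reduced to a constant and a single regressor (for example, a hypothetical outsourcing dummy), and robust standard errors are omitted. The function names are illustrative and do not come from the paper.

```python
from statistics import mean


def cross_section_ols(y, x):
    """OLS of y on a constant and one regressor x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def time_average_coefficients(months):
    """Average the monthly cross-sectional estimates over time.

    months: list of (adjusted_returns, characteristic) pairs, one per month.
    The number of funds may differ across months; the number of
    coefficients (here two) stays the same.
    """
    phis = [cross_section_ols(y, x) for y, x in months]
    return tuple(mean(column) for column in zip(*phis))
```

With a dummy regressor, the monthly slope equals the difference in mean adjusted returns between the two groups of funds, so the time-averaged slope can be read as the average monthly performance gap.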

Data
The data for the sector and non-sector funds come from the survivor-bias-free U.S. Mutual Fund Database of the Center for Research in Security Prices (CRSP), covering January 1992 to December 2016. The data provide comprehensive coverage of monthly return rates, total net assets (TNA), operating expenses and fund management companies.
The CRSP objective code in the data provides a relatively consistent classification for the U.S. domestic sector and diversified funds. In CRSP, there are 13 U.S. domestic sectors: healthcare, consumer goods, consumer services, commodities, financial services, gold, industrial, materials, real estate, natural resources, technology, telecommunication, and utilities. We exclude the commodities sector from our analysis owing to missing data during 1992-1996, and we exclude the gold sector as the CRSP classification is not consistent during 1992-2016. For REMFs, we use the classification provided by the CRSP Style Code. 13 As for all funds, we do not impose an additional filter for fund size but require that a fund has been in existence for a minimum of five years. The portfolio holdings of REMFs are shown in Fig. 1. 14 Approximately 80% is invested in REITs, although this has ranged from 75% to 90%, and other RE investment is around 10%, although it has ranged from 0% to 20%.
For the diversified funds, there are seven types of style and cap-based funds: growth & income, growth, hedged, income, mid cap, micro cap, and small cap. We omit large cap as the series starts after 1992. Thus, we examine funds from the 18 fund segments (11 sector and seven diversified) given in Table 1.
Mutual funds tend to offer different shareclasses 15 to investors, even though the returns come from the same portfolio. The data report net return rates for each fund shareclass separately. For each fund and month, we compute the weighted net fund return rate by averaging over the net return rates of a fund's different shareclasses using, as weights, the ratios of shareclass net assets to the fund's total net assets (TNA). The resulting net return rate is what the average investor receives when investing in the fund. Shareclass aggregation prevents newly-created shareclasses of a fund from causing duplication of return data that come, effectively, from the same portfolio. 16 Monthly gross return rates are not reported in the data. To calculate these, we use the expense ratio, which is the ratio of a fund's operating expenses to its TNA. We compute the monthly gross return rate for each fund by adding one-twelfth of its annual expense ratio to its monthly net return rate. If a fund's expense ratio is missing in any year, we follow Fama and French (2010) and fill in the missing values with the average expense ratios of active funds with similar assets under management (AUM).
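The shareclass aggregation and the gross return calculation described above amount to a weighted average and a simple addition. The sketch below illustrates both with hypothetical function names and made-up inputs; it assumes returns and expense ratios are expressed as decimals.

```python
def fund_net_return(shareclasses):
    """TNA-weighted net return of a fund for one month.

    shareclasses: list of (net_return, shareclass_net_assets) pairs.
    """
    total_assets = sum(assets for _, assets in shareclasses)
    return sum(ret * assets for ret, assets in shareclasses) / total_assets


def monthly_gross_return(net_return, annual_expense_ratio):
    """Gross monthly return: net return plus one-twelfth of the annual expense ratio."""
    return net_return + annual_expense_ratio / 12.0
```

For example, a fund with two shareclasses returning 1.0% and 0.8% on net assets of 300 and 100 has a weighted net return of 0.95%; with an annual expense ratio of 1.2%, its gross monthly return is 1.05%.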
As our interest lies in actively managed funds, we need to be able to identify them. This is no problem from June 2008 onward as they are identified in the database but, before that date, CRSP does not identify a fund as active or passive. We follow the procedure suggested by Gil-Bazo and Ruiz-Verdú (2009) to identify active and passive funds before June 2008. For details, see Appendix A.5.
Although the CRSP mutual fund database is free from survivor bias, incubation bias is another concern raised in the literature (Evans 2010), which may cause overestimation of fund performance. Fund management companies commonly provide seed money to newly-launched funds to develop a longer return history. Incubation bias occurs when such funds are opened to the public and their pre-release return history is included in the fund database if it appears to be attractive. We use the approach in Evans (2010) to minimize incubation bias by excluding returns from the period before a fund received a ticker 17 from NASDAQ. To reduce the regression estimation error, we focus on active funds with more than 60 observations, as is standard in the literature. This gives us 5589 active funds (635 sector, 4954 non-sector funds) in an unbalanced panel from January 1992 to December 2016.
Table 2 presents average return rates in excess of the risk-free rate for the different active fund types over the sample period from 1992 to 2016. For each sector and month, the return rate of a fund type is computed by weighting the return rates of the individual funds active in that month by their AUM. Excess return rates are then computed by subtracting the one-month T-bill rate for the respective month. An investor could set up such sector portfolios only at high cost, because of shareclass fees and rebalancing costs. Table 2 shows average monthly gross and net returns for sector and diversified funds and for Dow Jones sector indices and the general Dow Jones index. The sector portfolios generally have higher return rates than diversified portfolios at both gross and net level. When compared to the average return rate of the sector-appropriate Dow Jones index, only six out of 11 of the notional sector active fund portfolios produced larger average net returns. These sectors are healthcare, consumer goods, consumer services, industrial, materials and telecommunication. The real estate sector funds equal the Dow Jones sector index. For diversified funds, the picture is worse, with only one fund type, mid cap, having net returns in excess of the general Dow Jones index.
Compared to their passive counterparts, nine out of 11 active fund sectors produce higher returns. 18 In contrast, only two out of seven active diversified fund types outperform their passive counterparts, and one equals it. Given these results and that active funds are likely to take on extra risk, there is little initial evidence that many of the active fund managers have real talent to outperform the market on a risk-adjusted basis.
17 A ticker is an abbreviation used to uniquely identify publicly traded shares of a particular stock on a stock market.
18 The passive funds may provide gross return rates lower than the Dow Jones index because of the different indexes they track.
Notes to Table 2: The one-month T-bill rate is used as the risk-free rate. The notional fund portfolio return rates are averages of the return rates of all funds that existed in a given month in the respective fund type. The value-weighted (VW) portfolio return rates weight the individual rates with the fund's assets under management. The average return rates are reported separately for actively and for passively managed funds, and for return rates before (gross) and after (net) the deduction of manager expenses. The 'DJ Index' column gives the average excess return rates over the Dow Jones (DJ) stock index for the respective sector or general market.
In the next section, we assess different potential benchmarks, and then report our results using our preferred benchmark, the Carhart four-factor model with the liquidity factor of Pástor and Stambaugh (2003). The time series observations for the risk factors are from French's and Pástor's webpages. We do not provide summary statistics here.

The Benchmark Model
We tested a variety of potential benchmark models for the sector and diversified funds using value-weighted portfolios of both passive and active funds. If a pricing model explains the expected returns of an asset, the intercept in the time series regression of the asset's excess returns on the model's factors would be indistinguishable from zero (Fama and French 2018). We ran the regressions on panels of portfolio types and a variety of benchmark models. 19 The coefficient t-statistic estimates were adjusted using heteroscedasticity and autocorrelation robust standard errors (Kiefer and Vogelsang 2002). Table 3 shows the results for the joint tests that alpha is zero for passive funds, and for active funds. We also consider, separately, sector funds, cap-based funds and style-based funds as we use these categories to consider the impact on performance of outsourcing and team management. If the benchmark is appropriate for assessing risk-adjusted performance, and if some active fund portfolios deliver outperformance, we would expect the test to be significant for active funds and insignificant for passive funds. We seek a benchmark that meets these criteria for all funds and for the three fund categories.
For the active portfolios, all benchmark models have non-zero alphas for all funds and for the three separate categories. For the passive portfolios, only for the Carhart model, with an added liquidity factor, do we fail to reject the null hypothesis of zero alphas for all funds and for the three separate categories. Accordingly, we adopt it as the benchmark. 20
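The alpha tested here is the intercept of a time-series regression of a portfolio's excess returns on the benchmark factors. The following sketch estimates that intercept by ordinary least squares using the normal equations. It is a minimal illustration only: the function names are hypothetical, the factor data must be supplied by the user, and neither the joint tests nor the Kiefer and Vogelsang (2002) standard errors are implemented.

```python
def ols_coefficients(y, X):
    """OLS coefficients of y on the columns of X (X must include the constant).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def estimate_alpha(excess_returns, factors):
    """Intercept (alpha) and loadings from regressing excess returns on factors.

    factors: one tuple per month, e.g. (MKT, SMB, HML, MOM, LIQ).
    """
    X = [[1.0] + list(f) for f in factors]
    coefs = ols_coefficients(excess_returns, X)
    return coefs[0], coefs[1:]
```

Under the criterion described above, a suitable benchmark should leave the estimated alphas of passive portfolios statistically indistinguishable from zero.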

Performance of the Fund Industry
We conducted the performance analysis for value-weighted portfolios of all types of active funds. Table 4 shows the results for the gross and net monthly alpha, risk factor loadings, and their associated t-statistics with Kiefer and Vogelsang (2002) standard errors (in parentheses), from the regressions of the Carhart plus liquidity model for all 18 fund types.
Overall, the benchmark model is a better fit for the diversified fund portfolios than for sector fund portfolios, as seen in the adjusted R-squared ranging from 86% to 98% for diversified funds and 46% to 88% for sector funds.
The Wald joint tests, that all gross or all net alphas equal zero, show significant rejection of the null for both gross and net, implying that at least one fund type can produce superior performance even after fees are deducted. Some fund types clearly outperformed: healthcare, consumer services, industrials and telecom were able to beat the market, net of fees. No sector fund types had significantly negative net alphas. In contrast, for the diversified funds, while five out of seven produced significantly positive gross alphas, none did so for net returns, and alpha was significantly negative for growth funds.
Most of the market betas are close to unity. The highest are tech (1.32) and telecom (1.19), and the lowest are hedged (0.33) and utilities (0.66). All others are in the range 0.72 to 1.01. Real estate, at 0.71, is the second lowest for sector portfolios and the third lowest overall. In general, the market betas for sector funds are slightly higher than those for diversified funds, with an average of 0.92 against 0.85. The coefficients (the risk factor loadings) on the size factor (SMB) among sector fund portfolios are positive except for consumer goods, financial and utilities, with the last two being significantly negative. For the diversified fund portfolios, four of the SMB coefficients are positive and three negative, all significantly so, suggesting, as would be expected, a wider range of investment strategies. Real estate funds, at 0.34, have the second highest SMB loading of the sector portfolios, but mid, micro and small cap portfolios all have higher SMB loadings. Overall, this confirms a size tilt towards smaller companies in these portfolios and is consistent with the findings in most REMF studies that REMFs are mostly small-cap (Chiang et al. 2008).
19 We also tested the benchmarks using the approach of Chen et al. (2013), which is based on the method of Fama and MacBeth (1973), and drew the same conclusion on the preferred benchmark.
20 We also undertook the subsequent analyses using the other benchmarks which had failed some of the tests. The main inferences on the performance of different types of funds remain qualitatively robust when these are used.
Notes to Table 4: The gross returns are computed as the net returns plus 1/12th of the fund's annual expense ratio. The t-statistic estimates are adjusted using heteroscedasticity and autocorrelation robust (Kiefer and Vogelsang 2002) standard errors. The risk factor loading estimates are presented only once for both net returns and gross returns, since they are the same up to 2 decimal places.
For the value risk factor (HML), eight out of 11 sector fund portfolios have positive loadings (of which six are significantly so), and three are negative (all significant). For the diversified funds, three out of seven are positive (two significant), three are negative (all three significant) and one is zero. For sector portfolios, the range is -0.81 to 0.66, with real estate, at 0.56, the second highest overall; for diversified portfolios, the range is narrower, -0.26 to 0.25.
For the momentum factor (MOM), three sector fund portfolios have positive loadings (two significant) and eight are negative (six significant). For diversified fund portfolios, four are positive (three significant) and three are negative (two significant). The loadings are generally low: for diversified fund portfolios, all are within the range -0.03 to 0.07, while those for sector funds have a wider range, from -0.11 to 0.12, perhaps reflecting different sector conditions. For real estate, it is -0.06, which is the joint second most negative.
Finally, for the liquidity risk factor (LIQ), five sector fund portfolios have positive loadings (two significant), three are negative (two significant) and three are zero; for the diversified fund portfolios, four are positive (two significant) and three are zero. The loadings are very low: for sector funds, the range is -0.05 to 0.09; and for diversified funds, it is trivial, 0.00 to 0.01. Real estate is -0.03 while healthcare, at -0.05, is the only other negative and significant loading. This lends some weight to the findings by Subrahmanyam (2007) and DiBartolomeo et al. (2020) that the REIT and non-REIT markets are affected differently by liquidity shocks.
When the correlations between alpha and the factors are considered, there are no significant results. The only significant correlation between factors is that of the market beta with the SMB factor (0.37). This confirms that alpha is measuring something other than that measured by the factor loadings.
When the factor loadings of REMFs are correlated with those of other fund types, the correlations are strongest with industrials (0.89), materials (0.88), financial services (0.86), resources (0.84) and consumer goods (0.78). No other correlations are above 0.75. Three are below 0.50: telecom (0.45), healthcare (0.42) and technology (0.34). Those with diversified funds are all in the range 0.50-0.75.
Overall, these results indicate a broad difference in the investment strategies between sector and diversified funds but also important differences within these groups. While REMFs have the expected characteristics, they do not stand out as particularly different from most other sector funds.
Next, we consider the estimated alphas for each fund type in non-overlapping five-year and three-year periods to establish if out- and underperformance in one period persists to the next period. We have 18 fund types and five or eight periods, so there are, respectively, 18 × 4 = 72 or 18 × 7 = 126 possible opportunities for persistence in returns. The results are shown in Tables 5 and 6.
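The counting of persistence opportunities can be made explicit: with a given number of periods, each fund type has one opportunity per pair of consecutive periods. The short sketch below reproduces that count and flags consecutive periods of significant outperformance. The function names and the boolean input format are illustrative assumptions, not part of the paper.

```python
def persistence_opportunities(n_types, n_periods):
    """Number of consecutive-period pairs across all fund types."""
    return n_types * (n_periods - 1)


def persistent_pairs(significant):
    """Indices t at which significant outperformance in period t is
    followed by significant outperformance in period t + 1.

    significant: list of booleans, one per period, in time order.
    """
    return [t for t in range(len(significant) - 1)
            if significant[t] and significant[t + 1]]
```

Applied to 18 fund types, five periods give 72 opportunities and eight periods give 126, matching the figures in the text.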
For the five-year periods, we identify only one case of persistence: for micro cap, significant gross out-performance in 1992-6 is followed by significant net outperformance in 1997-2001. For the three-year periods, we also find only one: again for micro cap, with significant gross out-performance in 1998-2000 followed by significant net out-performance in 2001-3. Thus, once expenses are taken into account, there are no examples of persistence in performance.
In the final analysis of this section, we consider the estimated alphas for each fund type in rolling windows of five years. 21 We do so as we expect that different fund types will be affected differently by different aspects of economic fundamentals and the business cycle, and that sector funds are prone to sector-specific factors. It is also possible that the extent of market competition and the number of skilled managers in the different fund types may vary during different periods (Barras et al. 2010). Our interest is in the time patterns of excess returns as much as in the statistical significance. The results are shown in Fig. 2.
There are two striking features from this analysis: first, the estimated alphas for the diversified funds are close to zero and are much less variable than for the sector funds; and, second, the sector fund alphas, generally, vary between positive and then negative for lengthy periods, which differ by sector. This is what might be expected as diversified funds are more resistant to shocks and are able to move investments according to actual or expected changes in the investment environment. In contrast, sector funds must retain their sector composition and so are exposed to sector specific factors that persist. Overall, these results are consistent with a view that outperformance is soon competed away and that there are systematic factors that affect all funds of a particular type.

Performance of Individual Funds
We now report the results of the analysis of individual funds, and we focus on the alphas. Recall that we need to calculate q-values for each fund to control for false discoveries in multiple testing. For each fund, we estimate the one-sided p-values from Eq. 7 based on t-statistics estimated from the wild bootstrap. After using Eq. 8 22 to calculate the q-value of each active fund, we apply 20%, 10% and 5% significance levels to test for significantly positive alphas with false positives controlled for. A fund is considered skilled and free from false discoveries once it has a q-value less than the chosen significance level. We aggregate the number of skilled funds by their fund types, and present the percentage of funds with q-values less than 20%, 10%, and 5% in columns three to five in Table 7. To compare with the findings of conventional t-tests before we control for false discoveries, we also present the number of funds with bootstrapped p-values (one-sided) less than 5% in the last column in Table 7.
Overall, only 5.80% of funds display skills at the gross returns level at 20%, and 1.84% at 5%, reducing to 0.79% in both cases when fees are taken into account. There is almost no difference between sector and diversified funds as a whole. There are outperforming funds at the gross level in all fund types but outperformance is concentrated. For active sector funds, at the gross returns level, healthcare and technology are the sectors where individual funds are most likely to outperform. Of the 74 healthcare funds, 9.46% have q-values less than 20%, as do 8.76% of the 137 technology funds. These percentages reduce to 1.35% and 2.19%, respectively, at a 5% significance level. For net returns, the percentages are 1.35% and 0.73%, respectively, at 20%, and the same at 5%.
For diversified funds, the best performing types are micro cap, hedged and mid cap, with 24.00%, 13.64% and 11.37% showing skills at the 20% significance level for gross returns, reducing to 8.08%, 6.06% and 3.73% at 5%. When fees are taken into account, these figures reduce to 4.00%, 2.53% and 2.35%, respectively, at both 20% and 5%.
Table notes: The gross returns are computed as the net returns plus 1/12th of the fund's annual expense ratio. The t-statistic estimates are adjusted using heteroskedasticity and autocorrelation robust (Kiefer and Vogelsang 2002) standard errors.
Table 7 All Active Funds with Bootstrapped q-value ≤ 20%, 10%, 5%: Presents the total number of funds for fund type (Total) and associated percentage of funds with their alpha t-statistic q-values ≤ 20%, 10%, and 5% (q 20%, q 10%, q 5%), for each fund type. The last column shows the percentage of funds with their bootstrapped p-values less than 5% significance level (p 5%) for each type.
Without control for false discoveries, we find that the number of skilled funds derived from the individual t-tests is inflated by falsely treating lucky funds as skilled. The more funds included in the fund category, the more severe is false discovery. The number of funds classed as skilled (net of expenses) falls from 232 to 44 once the false discovery rate of 20% is applied.
Turning now specifically to the real estate sector, once we have controlled for false discoveries and expenses have been deducted, only one of the 173 funds can be regarded as skilled at both 20% and 5% significance. However, in total, there are only five sector funds with skills, and no more than one from each sector. Thus, there is no evidence of real estate being any different from other sector funds.
We now consider the estimated alphas for individual funds, in non-overlapping five-year and three-year periods, to establish if outperformance in one period persists to the next period. First, we consider persistence within each fund type, that is, whether there are any outperforming funds of each type in successive time periods. However, it is possible that the outperforming funds of a particular type are different in successive periods and that fund flows can compete away the alpha for individual funds (Barras et al. 2010). So, we then consider whether specific funds outperform in successive periods. For each sub-period, we only include funds with more than 60 observations and repeat the procedures to derive their q-values.
The results are shown in Tables 8, 9 and 10. Table 8 presents the results for persistence between five-year periods for any funds within a type. For gross performance at a q-value of 20%, some funds in growth and small cap appear in all periods, some funds in growth & income and mid cap appear until the final five-year period and some funds in hedged, micro cap and income appear in two consecutive periods. However, when fees are taken into account, only small cap (1992-6 to 1997-2001), micro cap (1997-2001 to 2002-6) and hedged (2007-11 to 2012-16) show persistent performance. In all three cases, only one fund is involved. There is no evidence of persistence among any sector funds at either gross or net levels.
Table 9 shows the results of the same analysis for three-year periods. For gross returns, there are 28 separate instances of outperformance in consecutive periods. Growth & income accounts for seven of these and appears in every one of the eight three-year periods; mid cap has four consecutive instances (five consecutive periods of outperformance); growth has a sequence of three instances and a separate sequence of two; hedged has three instances; small cap has two separate sequences of two; tech has two instances; and income, real estate and resources have one each. For net outperformance, there are three sequences of three for growth & income and mid cap; and one each for growth and real estate. Again, the evidence of persistence among sector funds is less than among diversified funds.
We now turn to persistence of performance of individual funds rather than of funds within a fund type. The results are shown in Table 10 and are striking. At a q-value of 20%, for net out-performance, there is persistence only between 1992-4 and 1995-7, and only for two individual funds, one growth & income fund and one growth fund.
Overall, these analyses in the last two sections suggest:
- There is evidence of gross outperformance of portfolios of most fund types for the study period as a whole but only for four types of sector funds when fees are considered.
- The outperformance is concentrated in periods of three or five years and there is no persistence beyond the initial period of out-performance.
- The outperformance is heavily concentrated in a small number of funds: fewer than six percent of funds outperform at a gross level and fewer than one percent (44 funds out of 5589) at a net level.
- Within some types of diversified funds, there is some limited evidence of persistence of significant positive performance of that type of fund but not of individual funds.
- Diversified funds differ from sector funds in that they exhibit some limited persistence of significant positive performance within fund types but we identified only two such funds with persistent out-performance in successive periods.
- There is evidence of sector funds having a period of positive performance followed by a period of negative performance.
- Persistence, where it exists, seems either to be competed away quickly or to be linked to systematic factors affecting a fund type and not to the specific skills of individual managers.
- REMFs are no different from other sector funds.
Table 9 All Active Funds with Bootstrapped q-value ≤ 20%, 10%, 5% during Three-year Subperiods: Presents the total number of funds for fund type (Total) and associated percentage (%) for funds with their alpha t-statistic q-values ≤ 20%, 10%, and 5% for each fund type (q 20%, q 10%, q 5%).

Robustness Checks on Fund Performance
We implemented a series of sensitivity tests to examine whether our conclusions on active sector and diversified funds' performance are robust to the funds' sample selection, benchmark choices, inter-fund dependency, and other multiple testing approaches. 23 The sample selection test consists of changing the required number of observations to 36, 48 and 72. We find the results remain qualitatively similar, and only marginally different from the findings for the sample with the requirement of 60 observations. To examine the sensitivity of performance to specifications of the benchmark, we employed all other factor models which passed the tests summarised in Table 3. The results remain robust. Finally, the method for utilizing false discovery rates in this study is from Storey (2002, 2004). It assumes the p-values are independent or weakly dependent. 24 Romano et al. (2008) propose a procedure to control false discoveries under weak assumptions that incorporate information about the dependence structure of the test statistics. Since cross-fund correlation among return rates is less of a problem for mutual funds compared to hedge funds, we do not consider the control for cross-fund dependence among all active funds. However, we examine the time-series autocorrelation within each fund's returns by using bootstrapped p-values generated from the stationary block bootstrap in Politis and Romano (1994). We compare the bootstrapped results for block lengths of 2, ..., 10. Overall, the inferences on results remain robust.
We also implement the Bonferroni method to control for false discoveries among active funds, since it is the most familiar multiple testing approach. Using this approach, we reject the null if the p-value ≤ α/N, where N is the number of funds. The disadvantage of the Bonferroni method is that it is stringent, resulting in loss of power. Nonetheless, after applying the Bonferroni bound, we find the number of skilled funds falls sharply, from 232 to 2 at net alpha level, and from 419 to 9 at gross alpha level (α = 5%). However, the result on which fund types are skilled remains robust.
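The Bonferroni rule is simple enough to state in code. The sketch below uses a hypothetical function name and made-up p-values; it only illustrates the threshold α/N and is not the paper's implementation.

```python
def bonferroni_discoveries(p_values, alpha=0.05):
    """Indices of tests rejected under the Bonferroni bound p <= alpha / N."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]
```

With N = 5589 funds and α = 5%, the threshold α/N is roughly 0.000009, which illustrates why so few funds survive the bound.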

The Impact of Outsourcing and Team Management on Fund Performance
The results presented in this section use the same basic method as Chen et al. (2013), as set out in the Methodology section, but there are differences in our research questions and in the approach. We do not consider bond funds, but we do consider sector funds and, in particular, we consider REMFs separately; we divide diversified funds into cap-based funds and style funds; and we also consider team- versus individually-managed funds.
Table 11 shows the time series averages of the cross-section estimates of the coefficients estimated using Eq. 12 for all funds and, separately, for REMFs, other non-RE sector funds, style-based funds and cap-based funds. 25 The left panel shows the results of the analysis using eight portfolios of funds, sorted by type (four) and whether outsourced or in-house; and the right panel shows the results by type and whether the fund is team- or individually-managed. 26 27
For the sorting by in-house/outsourced, the coefficient on Outsource is statistically insignificant for non-RE sector funds. In contrast, it is negative and statistically significant for all funds together and for the other fund types. Thus, overall, funds with outsourced fund management underperformed funds with in-house management by 2.2 basis points per month, or -0.26% annually (2.2 × 12 = 26.4 basis points). For fund types, the annual figures are -0.11% for style-based funds, -0.70% for cap-based funds and -1.1% for REMFs. For the sorting by team/individually-managed, the figures are similar, respectively, -0.29% overall, -0.20% for style-based funds, -0.59% for cap-based funds and -0.70% for REMFs. 28
The results suggest that managers of outsourced non-RE sector funds have the same skills as in-house managers. However, caution is required in drawing inferences from the relatively small REMF sample. 29 Possible explanations for the overall result include a classic principal-agent problem between the fund family and the outsourced managers, and possible preferential treatment of the in-house managed funds. Despite the difference in outcomes, the proportion of outsourcing among sector and diversified funds is similar at around 20%.
Table 10 Persistence of All Active Funds: A fund that has its q-value ≤ 20% (or 10%, 5%) in two consecutive non-overlapping five-year/three-year periods is reckoned a persistent winner. We present the total number of funds (Total) and associated percentage for funds with their gross and net alpha t-statistic q-values ≤ 20% (or 10%, 5%) for each fund type (q 20%, q 10%, q 5%), respectively.
Whether a fund is team- or individually-managed is also a factor to consider when attributing performance. Funds managed by a team may perform better than those managed by individual managers, as a result of being relatively free of constraints of resources and networks. However, a team-management structure is typically considered to be less efficient in terms of the coordination of personnel and organisation. We find no significant differences between team- and individually-managed funds either when we sort by outsourcing or by team management. This is true overall and for all types of funds. This confirms the results of both Bliss et al. (2008) and Massa et al. (2010), who find no significant difference between team-managed and individually-managed funds. In contrast, Patel and Sarkissian (2017), who argue that their data are of better quality than those of other studies, find that team-managed funds out-perform by 30-40 basis points, with maximum out-performance from teams of three.
Notes to Table 11: [...] is the logarithm of one plus the TNA of the family that the fund belongs to, excluding its own TNA. Expense is the annual expense ratio over the fund's assets under management. Age is the number of years since the inception of the fund. Flow is the percentage of new fund flow into the fund over the previous 12 months. CumRet is the cumulative gross return over the previous 12 months. The sample period is from January 1992 to December 2016 (240 months). The coefficient estimates are the time-series averages of cross-section estimates from each month. The N range shows the number of funds included in the cross-section regression. The t-statistics are adjusted using robust Newey-West standard errors from lag 3. R 2 is the time-series average of monthly R 2 estimates. The first panel shows the results based on sorting by fund type (real estate sector, non-RE sector, cap-based, and style-based) and whether outsourced. The second panel shows the results based on sorting by fund type and whether team-managed.
We turn now to the control variables included in the analysis. 30 Fund size is consistently significantly negative for all funds and for each fund type. Larger funds perform worse but the effect is smallest for REMFs. Our sample includes sector funds and excludes bond funds, and we have 40 rather than 20 portfolios; nevertheless, our coefficient estimates are broadly similar to those of Chen et al. (2013), although the significance of their result depends on the benchmark used. More generally in the literature, there is no conclusive finding on the effect of fund size on performance. Large funds may have advantages over smaller ones owing to economies of scale from allocating costs over a larger asset base but, on the other hand, they may also face potential diseconomies of scale. Chen et al. (2004) find that fund returns decrease with the lagged fund size. They suggest that liquidity may be an important reason why size erodes performance.
We also include the number of funds in the family of which a fund is a member and the value of other funds in the family. The former is significant and positive for cap-based funds, whether the sorting is by outsourcing or by team management, but otherwise is insignificant. Chen et al. (2013) find the number of funds to be insignificant for all benchmarks. The difference may be because they do not undertake their analysis by fund type or because of their different choice of fund types and different time period. Thus, our finer sorting produces a new result for cap-based funds.
The latter, family fund size, is significant and positive for all funds, for non-RE sector funds and for style-based funds, in both sortings. Overall, a fund which belongs to a large fund family does better. Chen et al. (2013) find this variable to be significant and positive. Their coefficient values (for a range of benchmarks) are larger than ours, except that our non-RE sector coefficient is about twice the size, suggesting another difference for sector funds, which are not in their analysis.
We find the expense ratio to be insignificant at 5% for all fund types in both sortings. Chen et al. (2013) also find it to be highly insignificant. The impact of expenses on fund performance is typically considered to be negative, since they are regarded as the price paid by investors to fund managers. Some studies find evidence supporting a negative relationship between expenses and performance before or net of expenses (Carhart 1997; Gil-Bazo and Ruiz-verdú 2009). But Chen et al. (2004) find no statistically significant relationship between expenses and performance.
30 In the reported version, we do not include fund turnover, loads and 12b fees as controls because there are missing data. When turnover and loads are included, the minimum/maximum sample (depending on the year) overall falls from 1415/2821 to 666/1931; and when 12b fees are included to 344/1415. The REMFs sample falls from 24/105 to 11/70 and then to 8/51. When turnover and loads are included, outsourced becomes insignificant for REMFs; number of family funds becomes insignificant for cap-based funds; and age becomes insignificant for style-based funds and all funds. When 12b fees are added, outsourced becomes only marginally significant, at around 10%, for all funds for both panels; and flow becomes significant and negative for non-RE sector funds. We attribute these changes to sample size and composition, although the key results remain largely unaffected.
The coefficient for age is positive but only significant at 10% and only for all funds and non-RE sector funds, when sorted by outsourced/in-house, and additionally for style-based funds when sorted by team/individually-managed. 31 Chen et al. (2013) find age to be highly insignificant. More generally, the literature is inconclusive on the relationship between fund age and performance. A younger fund may be more at risk of failure owing to lack of experience but may also be more likely to outperform by taking larger risks. Ferreira et al. (2013) find no significant relation between age and performance for funds invested inside the U.S. but a negative relationship between age and fund performance for funds invested outside of the U.S., with younger funds performing better. However, in contrast, Chuprinin et al. (2015) find a significantly positive relationship between international fund performance and age.
The next variable is fund flow. According to the 'smart money' hypothesis proposed by Gruber (1996), fund flow is positively related to future performance because investors can detect and reward the skilled managers by investing in them. His empirical evidence supports this hypothesis. Ferreira et al. (2013) find that the smart money effect is more evident in the global market, suggesting that investors are better at detecting skilled managers outside of the U.S. We find no significant evidence supporting this hypothesis for either sorting or for sector or diversified funds. In contrast, Chen et al. (2013) find significantly negative results for all five of their benchmarks; however, their coefficients are zero to three decimal places and, as the variable is measured as a percentage, are not economically significant.
The final variable is the cumulative return during the previous 12 months. We find no significant results, in contrast to Chen et al. (2013) who find a significantly positive result for all five benchmarks.
We also undertook the analysis using fixed effects for adviser company and fund family instead of fund family number and fund family size. As there were around 780 advisers and a similar number of fund families, sample size restrictions meant that we could not undertake the analysis separately for REMFs and non-RE sector funds. Table 12 presents the results for all funds. This analysis confirms our overall results for outsourcing, with an annual underperformance of just under -0.5%. It also confirms the result for firm size and the marginal result for age.
The fund monthly gross benchmark-adjusted returns are calculated as the fund's gross returns minus the product of the observed risk factors and associated loadings from the benchmark model (Carhart with a liquidity factor). Outsource is a dummy variable that equals one if the fund is outsourced. Team is a dummy variable that equals one if the fund is team-managed. log(Size) is the logarithm of the TNA of the fund. Expense is the annual expense ratio over the fund's assets under management. Age is the number of years since the inception of the fund. Flow is the percentage of new fund flow into the fund over the previous 12 months. CumRet is the cumulative gross return over the previous 12 months. The sample period is from January 1992 to December 2016 (240 months). The first panel shows the results based on sorting by fund type and whether outsourced. The second panel shows the results based on sorting by fund type and whether team-managed.
31 For fund size, number of funds in the family, value of the other funds in the family and age, we tested for concave or convex relationships by adding the squared variable. In both sortings, there was an effect only for age, and only for all funds and for style-based funds. In the former case, the linear term was significant at 5% and the squared term was significant at 10% and, in the latter case, both linear terms were significant only at 10%, and the squared terms at 10% and 12%. In these cases, the overall effect was positive and increasing until year four; thereafter, the effect decreased and became negative during year seven.
Our key results are:
- Outsourcing has no effect on the performance of non-RE sector funds.
- In contrast, it has a negative effect on the performance of all other fund categories. However, caution is required for the REMFs result as the sample size is small and, when additional control variables are added and the sample size is further reduced, it becomes insignificant.
- There are no significant differences between team- and individually-managed funds for any fund type.
The clear difference between sector and non-sector funds for outsourcing is a new result in the literature and justifies our approach of considering fund type. A possible explanation lies in the sizes of the funds and their fund families. Table 13 shows that, on average, non-RE sector funds are less than half the size of style-based funds and similar in size to cap-based funds and REMFs. However, their fund families have around 50% more funds than the other categories, and the TNA of their fund families is between three and six times larger. Sector funds are also half as likely to be team-managed (13% compared to roughly 26%), and less so (5%) if they are outsourced. They also have much larger percentage fund flows. It is likely that highly specialized skills are required for managing sector funds and only very large fund families would have the capacity to develop these skills in-house. This probably explains why those fund families that manage their sector funds in-house are over twice the size of those which outsource. Nonetheless, even the fund families that outsource the management of sector funds are significantly larger than other fund families, suggesting a strong market position.
Companies which undertake the outsourced management also require specialist skills and have to focus on small market segments, providing specialist services to large organisations which have a strong market position and which are more likely to have significant oversight capacity. Thus, companies that manage the outsourced funds need to perform well, so the skills differences between in-house and outsourced managers should be less, and the principal-agent problems may be reduced.

Conclusion
In this study, we have examined the performance of REMFs within the context of all fund types, both sector and diversified, to see whether their fund managers are skilled. To do so, we used as a benchmark the Carhart four-factor model with an added liquidity factor, and examined the performance, both gross and net, of the active funds by type and individually. We have added to the literature by separating skills from luck through use of a wild bootstrap, and by controlling for false discoveries, where we have demonstrated that the conventional approach produces more favourable results. We have also examined the time-dependency of performance and persistence during successive sub-periods. Finally, we considered the impact on performance of outsourcing and team management and established a new result in the literature.
Our key results are as follows.
- The Carhart plus liquidity model captures the cross-sectional return variations of the fund types.
- In the actively-managed mutual fund industry, seven out of 11 sectors, and five out of eight diversified fund types, over the period 1992-2016, displayed skills rather than luck in achieving out-performance. This reduces to four out of 11 and none out of eight, respectively, after consideration of fees.
- There is evidence of significant lack of skills, rather than bad luck, in the growth & income fund type for net returns.
- The out-performance of fund types is concentrated in periods of five or three years and there is no persistence beyond the initial period of out-performance.
- At the individual fund level, we found little evidence for the existence of skills, after deduction of fees. Across all fund types, only 5.8% of funds (324) display skills gross of fees and only 0.8% (44 funds out of 5589) net of fees. The result is the same for sector and diversified funds. Only five sector funds demonstrate out-performance and only one of these is a real estate fund.
- When we examined individual fund performance in five-year and three-year periods, it was clear that, generally, significant skills did not persist. Of the sector funds, only real estate showed any persistence of net out-performance. Within several types of diversified funds, there is evidence of persistence of gross and net out-performance of that type of fund but not of individual funds. We identified only two individual diversified funds with out-performance in successive periods.
- The analysis using rolling windows showed relative stability of the performance of portfolios of diversified funds but that portfolios of sector funds tended to have lengthy periods of poor or good performance. Thus, persistence would appear to be linked to systematic factors affecting a fund type and not to the specific skills of individual managers. It also seems likely that competition, with market maturity, drives out abnormal returns in all fund types.
- Outsourcing has no effect on the performance of sector funds but a negative impact on cap-based and style-based funds. We attribute this to a smaller principal-agent problem in sector funds and to a lesser differentiation in skills levels. There is some evidence that REMFs are different from other sector funds but this result relies on a small sample and is sensitive to the control variables included.
- In contrast, whether a fund is team- or individually-managed makes no difference to performance for any type of fund.
- Overall, there is little to suggest that REMFs are different from other sector funds.
These findings are consistent with other findings in the literature that the evidence for outperformance is weak, both for the mutual fund industry as a whole and, specifically, for the real estate sector. So, to answer the question in our title: real estate funds are no better than other sector funds. Nor do diversified funds generally fare better. Few individual funds have demonstrated the skills to produce outperformance, and some even display a sufficient lack of skills to generate significant underperformance.
The results raise some general issues about the structure of the investment market. From an institutional economics perspective, the current structure should reflect the minimisation of transaction costs and access to specialist information and skills by fund managers. There is no strong evidence of the latter, so why do these funds exist? It is perhaps easier to construct an argument for sector funds.
From the behavioral finance perspective, the proliferation of sector funds could be regarded as the product of marketing strategies used by fund management companies to exploit investors' heterogeneity, such as sector preferences and risk appetites (Massa 2003).
From the perspective of an active investor with a multi-sector portfolio, sector mutual funds allow sector positions, either under- or over-weight, to be taken quickly. As our results show that, unlike other fund types, non-RE sector funds do not suffer when outsourcing is used, investors need not factor outsourcing into their choice of fund. So, if the multi-sector managers believe they have forecasting skills, but not stock selection skills, they could adopt this approach. As some sector markets do outperform in some periods, this appears to be a defensible strategy.
However, as most funds within a sector do not outperform, either skills are required in manager selection, or a wide and diversified range of funds should be held in the sector portfolio, or investment should be in passive rather than active funds. The first of these would not appear prudent as the performance of individual funds is not persistent, and manager selection skills are required; the second requires a diversified portfolio of funds within a sector; whereas the third would be easier to achieve. This conclusion has to be qualified as, net of fees, for nine of the 11 sectors, a portfolio of active funds outperforms one of passive funds. We note a general trend to a higher proportion of passive funds, and that this is greater among sector funds.
In the stacked regression model for the case of two portfolios, the portfolios cover the full T months and share the factors X. OLS applied to the stacked system gives estimators that are just the estimators for $\theta_1$ and $\theta_2$ from the individual regressions. It is clear that this outcome extends to the case where we have more than two portfolios.
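The equivalence between the stacked regression and the individual regressions can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the return series `y1` and `y2`, the single simulated factor, and the helper names are all hypothetical, and the stacked design matrix is taken to be block-diagonal with the shared factor matrix X in each block.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated data (assumption): T = 60 months, an intercept and one factor.
rng = random.Random(1)
T = 60
factor = [rng.gauss(0.0, 1.0) for _ in range(T)]
X = [[1.0, f] for f in factor]
y1 = [0.1 + 0.9 * f + rng.gauss(0.0, 0.5) for f in factor]
y2 = [-0.2 + 1.2 * f + rng.gauss(0.0, 0.5) for f in factor]

# Individual regressions, one per portfolio.
theta1 = ols(X, y1)
theta2 = ols(X, y2)

# Stacked regression: block-diagonal design, portfolio 1 on top of portfolio 2.
zeros = [0.0, 0.0]
stacked_X = [row + zeros for row in X] + [zeros + row for row in X]
stacked = ols(stacked_X, y1 + y2)

print("individual:", theta1 + theta2)
print("stacked:   ", stacked)
```

Because the stacked design is block-diagonal, the normal equations separate into one block per portfolio, which is why the two sets of estimates coincide up to rounding.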

A.2 Simultaneous testing
If we test the hypothesis that a fund manager has no skill at significance level $\gamma$, the error rate (the probability of rejecting falsely) is $P\{R_i(\gamma) = 1 \mid H_{0,i}\} = \gamma$. If we test the performance of $I$ fund managers without skill simultaneously, each individually at level $\gamma$, the familywise error rate is
$$P\{R(\gamma) \geq 1\} = 1 - (1 - \gamma)^{I} \tag{18}$$
with $R(\gamma) = \sum_{i=1}^{I} R_i(\gamma)$. It is obvious that Eq. 18 converges to one with $I$ and it becomes certain that at least one false rejection will occur. The Bonferroni bound guards against this outcome and controls the familywise error rate at least at $\gamma$ by tightening the significance level for the individual tests to $\gamma_b = \gamma I^{-1}$, because
$$P\{R(\gamma_b) \geq 1\} \leq \pi_0 I \gamma_b = \pi_0 \gamma$$
where $\pi_0$ is the share of fund managers for whom the null hypothesis is correct. If indeed all fund managers are unskilled, $\pi_0 = 1$, the Bonferroni bound controls exactly at $\gamma$. If some fund managers have skill, $\pi_0 < 1$ and the Bonferroni bound controls at a significance level actually stricter than $\gamma$. This excess control comes at a cost to statistical power. 32
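A short numerical illustration shows how quickly the familywise error rate approaches one and how the Bonferroni level restores control. It is a sketch that assumes the individual tests are independent and that all nulls are true; the function names are hypothetical.

```python
def familywise_error_rate(gamma, n_tests):
    """Probability of at least one false rejection among n independent true nulls."""
    return 1.0 - (1.0 - gamma) ** n_tests


def bonferroni_level(gamma, n_tests):
    """Per-test significance level under the Bonferroni bound."""
    return gamma / n_tests


gamma = 0.05
for n in (1, 10, 100, 1000):
    unadjusted = familywise_error_rate(gamma, n)
    adjusted = familywise_error_rate(bonferroni_level(gamma, n), n)
    print(f"I={n:5d}  FWER at gamma: {unadjusted:.3f}  FWER at gamma/I: {adjusted:.3f}")
```

With several thousand funds, as in this study, an unadjusted test at the 5% level is practically certain to flag at least one unskilled manager as skilled, while the Bonferroni level keeps the familywise rate at or below 5% at the cost of very strict individual tests.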

A.3 Estimation of the share of correct nulls
We motivate the estimator for $\pi_0$ first with the graphical approach suggested by Schweder and Spjøtvoll (1982). For the actual estimation, we use a more elegant estimator. In order to estimate $\pi_0$, we use the observed cross-section of p-values, most of them generated under the null of managers with no skill, the rest generated under the alternative of managers with skill. The p-values have the distribution function
$$F(p) = \pi_0 F_0(p) + (1 - \pi_0) F_1(p)$$
For observations from the null, p-values will follow a uniform distribution, $F_0(p) = p$. 33 For observations from the alternative, p-values will be small and there will be a cut-off level $\lambda < 1$, so that $F_1(\lambda) = 1$. We can therefore write
$$F(p) = \begin{cases} \pi_0 p + (1 - \pi_0) F_1(p) & \text{for } p \in [0, \lambda) \\ \pi_0 p + (1 - \pi_0) & \text{for } p \in [\lambda, 1] \end{cases} \tag{22}$$
Figure 3 shows $F(p)$. To the left of $\lambda$, the slope is $F'(p) = \pi_0 + (1 - \pi_0) F_1'(p)$ and flattens out; to the right of $\lambda$, the slope is $F'(p) = \pi_0$ and constant. Therefore, $\lambda$ is the point at which $F(p)$ becomes linear. With $F(\lambda)$ from Eq. 22,
$$\pi_0 = \frac{1 - F(\lambda)}{1 - \lambda}$$
To estimate this, we could determine $\hat{\lambda}$ visually from Fig. 3 and estimate the distribution function as the fraction of p-values that are at most as large as $\hat{\lambda}$. 34 Storey (2002, Section 9) has suggested a rigorous estimator for $\lambda$ based on a bootstrap procedure. This estimator gives $\hat{\pi}_0 = 0.89$. 35 Figure 4 presents the histogram of the p-values for the individual tests of the null that a fund's manager has no skills.
33 If $t$ comes from the null distribution, it will not be in the acceptance region $A_\gamma$ with probability
$$P_0(t \notin A_\gamma) = P_0(p < \gamma) = F_0(\gamma) = \gamma \tag{21}$$
Fig. 4 Distribution of Gross $t(\alpha)$ p-values of All Active Funds: The p-values are one-sided, generated from the wild bootstrap procedure. The q-values are generated from the p-values and plotted as the dotted line.
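The idea behind the estimator of the share of correct nulls can be sketched in a few lines: beyond the cut-off, only null p-values remain and they are uniform, so the fraction of p-values above the cut-off, rescaled by the width of that interval, estimates the share. This sketch makes several assumptions that are not in the paper: a fixed cut-off of 0.5 rather than the bootstrap choice of Storey (2002), simulated p-values with a true share of 0.9, and a hypothetical function name. It is not a substitute for the R package the authors use.

```python
import random


def estimate_pi0(p_values, lam):
    """Share of true nulls: fraction of p-values above lam, divided by (1 - lam)."""
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (len(p_values) * (1.0 - lam)))


# Simulated cross-section (assumption): 90% of funds are generated under the null.
rng = random.Random(42)
true_pi0 = 0.9
n_funds = 5000
p_values = []
for _ in range(n_funds):
    if rng.random() < true_pi0:
        p_values.append(rng.random())          # null: uniform p-value
    else:
        p_values.append(rng.random() * 0.01)   # alternative: small p-value

pi0_hat = estimate_pi0(p_values, lam=0.5)
print(f"estimated share of correct nulls: {pi0_hat:.3f}")
```

The estimate is capped at one because sampling noise can push the raw ratio above one when almost all nulls are true.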

A.4 Fund shareclasses
Data directly reported from CRSP are at the shareclass level. CRSP provides a separator in the fund name (":" or "/"), and information after the separator denotes subclasses. We split the 'fund name' by this separator into the fund family name and subclass (A, B, C). This approach cannot distinguish all shareclasses thoroughly; thus, along with the name-splitting approach, we also use the method proposed by Gil-Bazo and Ruiz-verdú (2009), which uses the management company as the identifier for the shareclasses in the same fund family. The combined method can successfully separate fund family name and shareclasses.
34 The density function of Eq. 22 becomes constant after $\lambda$ and a histogram of the p-values is an alternative approach to determine $\hat{\lambda}$ visually.
35 We use the R package from https://github.com/StoreyLab/qvalue and apply the correction term explained in Storey (2002, p.483).
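The name-splitting step for share classes can be sketched as follows. The fund names are invented for illustration, the function names are hypothetical, and splitting at the last separator is an assumption; the second step, matching on the management company, is not shown.

```python
from collections import defaultdict


def split_fund_name(name):
    """Split a fund name at the last ':' or '/' into (base name, share class)."""
    cut = max(name.rfind(":"), name.rfind("/"))
    if cut == -1:
        return name.strip(), None  # no separator: treat as a single share class
    return name[:cut].strip(), name[cut + 1:].strip()


def group_share_classes(names):
    """Map each base name to the list of share classes found for it."""
    groups = defaultdict(list)
    for name in names:
        base, share_class = split_fund_name(name)
        groups[base].append(share_class)
    return dict(groups)


# Hypothetical fund names, for illustration only.
names = [
    "Example Real Estate Fund:A",
    "Example Real Estate Fund:B",
    "Example Growth Fund/C",
    "Example Income Fund",
]
print(group_share_classes(names))
```

Names without a separator are kept as their own group, which is where a second identifier such as the management company becomes useful.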

A.5 Exclusion of index funds
To ensure our results are purely driven by active management, we need to remove the passively operated index funds. CRSP provides the passively managed fund identifier 'index fund flag' since June 2008. Strict use of this method would omit some index funds whose inception dates are prior to 2003. Thus, we manually check the passively managed funds prior to 2003. First, we compile a list of common phrases that appear in the names of funds identified by CRSP as index funds, such as 'Index', 'Idx', 'Ix', 'Indx', 'NASDAQ', 'Nasdaq', 'Dow', 'Mkt', 'DJ', 'S & P 500', 'BARRA'. The use of these phrases has been shown by Gil-Bazo and Ruiz-verdú (2009) to give thorough coverage of index funds. We check the accuracy of this manual approach by applying it to the funds after 2008, which have the passive fund identifiers. We find that our approach successfully identifies all passive sector funds.
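The phrase-based screen can be sketched as a simple name filter. The phrase list is the one given above; the case-sensitive whole-word matching rule, the function name and the example fund names are assumptions made for illustration.

```python
import re

# Phrases listed in the text as typical of index fund names.
INDEX_PHRASES = [
    "Index", "Idx", "Ix", "Indx", "NASDAQ", "Nasdaq",
    "Dow", "Mkt", "DJ", "S & P 500", "BARRA",
]

# Whole-word, case-sensitive match (assumption) to limit false positives.
INDEX_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in INDEX_PHRASES) + r")\b"
)


def looks_like_index_fund(fund_name):
    """Return True if the fund name contains one of the index phrases."""
    return INDEX_PATTERN.search(fund_name) is not None


# Hypothetical fund names, for illustration only.
for name in ["Example S & P 500 Index Fund", "Example Idx Fund", "Example Real Estate Fund"]:
    print(name, "->", looks_like_index_fund(name))
```

Funds flagged this way would still need the manual check described above, since a phrase such as 'Dow' can also appear in the name of an actively managed fund.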