Do Stars Shine? Comparing the Performance Persistence of Star Sell-Side Analysts Listed by Institutional Investor, the Wall Street Journal, and StarMine

We investigate the profitability persistence of the investment recommendations from analysts listed in four different star rankings (Institutional Investor magazine, StarMine’s “Top Earnings Estimators” and “Top Stock Pickers”, and The Wall Street Journal) and show the predictive power of each evaluation methodology. We find that only Buy and Strong Buy recommendations from the entire group of Star analysts outperform those of the Non-Stars in the year after election, while Sell and Strong Sell recommendations perform no differently from those of the Non-Stars. We document that the highest average monthly abnormal return from holding a long-short portfolio, 0.97 %, is obtained by following the recommendations of the group of star sell-side analysts rated by StarMine’s “Top Earnings Estimators” during the period from 2003 to 2014. Since earnings are one of the main drivers of stock prices, these results are in line with the notion that focusing on superior earnings forecasts is one of the top requirements for successful stock picks.


Introduction
This study analyzes whether investors can profit from the recommendations of ranked security analysts. We examine whether an investor's choice of a rating “agency” matters and how well the methodologies used by different star rankings predict the investment value of the recommendations. By investigating the precision of the signals that the various methodologies use in determining who the stars are, we distinguish between those star-selection methodologies that capture short-term stock-picking profitability and those methodologies that emphasize more persistent skills of the analysts. As a result, this study documents that there are star-selection methods that select analysts on the basis of more enduring analyst skills, and, thus, their stars' performance persists even after the ranking announcement.
We compare the performance of the rankings by The Wall Street Journal and StarMine that are explicitly based on the analysts' past performance, which is objectively measured, with those of the Institutional Investor rankings that are based on subjective survey assessments by the analysts' buy-side clients. The expected differences between objective and subjective rankings in predicting stock recommendation performance depend on the persistence of the outperformance of a small group of objectively ranked analysts. Such outperformance is composed of both the stock-picking skill and luck. If outperformance is persistent, objective methods will have an edge over subjective methods. However, if outperformance is primarily due to luck and it is not persistent, then subjective rankings might work better if the buy-side has insight on which analysts have better stock-picking skills amidst the noise. We have to acknowledge that the different rankings could be directed to different clienteles rather than focusing on stock-picking skills. In contrast, our study focuses on the persistence in economic performance by the analysts, measured by portfolio returns.
Academic theory and banks do not reach the same conclusions about the value of security analysts. The semi-strong form of market efficiency states that investors should not be able to earn excess returns from trading on publicly available information, such as analysts' recommendations. However, banks and other firms spend large amounts of money on research departments and security analysts, presumably because they and their clients believe that security analysis can generate large abnormal returns. The importance of security analysis and analysts is also manifest in the establishment, in 1998, and growth of StarMine, a competitor to The Wall Street Journal and Institutional Investor's rankings of analysts. In the following, we analyze the economic value of security analysis, an activity performed by thousands of professionals in the finance industry with the goal of improving their clients' return performance.
The possibility that there could be profitable investment strategies based on the published recommendations of security analysts is supported by multiple studies (Stickel 1995; Womack 1996; Barber et al. 2001; Boni and Womack 2006; Barber et al. 2010; Loh 2010) that show that favorable (unfavorable) changes in individual analysts' recommendations are accompanied by positive (negative) returns at the time of their announcements. For example, early work by Womack (1996) documents a post-recommendation stock price drift that lasts up to one month for upgrades and up to six months for downgrades.
Our perspective, however, differs from that of the above-mentioned studies. While the studies cited focus on measuring the average price reaction to changes in individual analysts' recommendations, we compare the profitability of recommendations issued by different groups. For our sample period, we find that only Strong Buy and Buy recommendations of star analysts generated higher monthly average excess returns (alphas) (0.33 %) than those recommendations by Non-Stars (0.18 %), while the performance of Hold, Sell and Strong Sell recommendations was not significantly different from that of the Non-Stars.
Among the entire groups of stars, the best performance was observed for StarMine's “Top Earnings Estimators”, with a monthly excess return of 0.97 %, followed by The Wall Street Journal with 0.63 % and StarMine's “Top Stock Pickers” stars with 0.54 %. The worst performance was observed for Institutional Investor, with an excess return of 0.14 %. However, on a detailed level, the Long portfolio of StarMine's “Top Stock Pickers” is the number one portfolio, but its Short portfolio is the number three portfolio, which we interpret as suggesting that StarMine's “Top Stock Pickers” might focus more on buy recommendations. Comparing the Long portfolios of the top-ranked analysts, we find that the analyst ranked number one by StarMine's “Top Earnings Estimators” had higher returns than the group of Non-Star analysts and that the difference in returns was statistically significant.
Our results show that star analysts who are ranked in terms of the accuracy and timing of their earnings forecasts, as in the methodology of StarMine's “Top Earnings Estimators”, show performance that is more persistent from the year of evaluation to the year after than the star analysts who are ranked exclusively based on the previous performance of their recommendations (stars listed by StarMine's “Top Stock Pickers” and The Wall Street Journal). This result reveals that focusing on earnings forecasts when ranking analysts provides higher predictive power in selecting skilled analysts, while considering only the profitability of the previous year's recommendations leads to a large influence of luck. Previously, this result has also been documented by Loh and Mian (2006). We interpret our results as indirect empirical evidence in support of valuation models in the accounting and finance literature, underlining the role of future earnings in forecasting future stock price movements as in Ohlson (1995).
Our contribution is the comparison of four different star rankings with a focus on the profitability of investment recommendations using a recent dataset with a unique (hand-collected) list of star analysts. Emery and Li (2009) use the information ratio, which is the t-statistic of the intercept of the regression estimation, rather than a direct performance measure of the profitability of recommendations, as is used here. Fang and Yasuda (2014) use a 1994-2009 sample period and a 30-day holding period and find that recommendations issued by Institutional Investor stars have significantly higher returns than those of all other analysts (Non-Stars). In our study, we use a 2003-2014 sample period and a one-year time horizon for recommendations issued by analysts and find that Institutional Investor has no statistically significant predictive power. We document that other rankings using other measures in the evaluation of analysts give better predictive power.

Ranking evaluation approaches
Analysts are rated as a “star” based on the quality of their previous reports, the accuracy of their forecasts and the returns generated for clients (Loh and Mian 2006). The four different rankings of sell-side analysts in our study (see Table 1) can be divided into two main groups according to the evaluation approach used: objective (StarMine and The Wall Street Journal) or subjective (Institutional Investor). Two objective rankings are based exclusively on the investment value of recommendations: “Best on the Street,” issued by The Wall Street Journal, and “Top Stock Pickers,” issued by Thomson Reuters' StarMine. A third objective ranking is “Top Earnings Estimators,” also issued by StarMine. It measures the accuracy and timing of each analyst's earnings forecasts.

To select the members of the “All-America Research Team” ranking, Institutional Investor (I/I) magazine sends a questionnaire to buy-side investment managers asking them to evaluate various attributes of sell-side analysts. I/I magazine ranks three analysts in each industry and also provides the names of so-called “runners-up” who are promising and could possibly be chosen in subsequent years. This list of stars is published in October and is usually supplemented by the 12 attributes in the survey that investors view as the most important to possess. Attributes such as industry knowledge and integrity are listed among the most important, while stock selection and earnings estimates are among the lowest-ranked attributes. Thus, the I/I ranking is not primarily focused on stock-picking ability but rather covers a wide range of attributes that are perceived to relate directly or indirectly to the ability of an analyst to make profitable recommendations. These attributes could be of high value for some clientele, though such qualitative attributes cannot be measured by portfolio returns.
Previous research shows mixed results regarding the profitability of recommendations issued by I/I stars. Measuring the investment value of recommendations during the period from 1994 to 2009, Fang and Yasuda (2014) show that I/I stars outperformed the group of Non-Stars, finding Carhart 4-factor monthly alphas of 1.25 % for Long portfolios and −0.83 % for Short portfolios of I/I stars compared with 1.09 % and −0.71 % for Long and Short portfolios for Non-Stars, respectively. Using historical data from 1993 to 2005, Emery and Li (2009) investigate I/I and WSJ rankings. The authors identify the determinants of star status and compare the two rankings on the basis of earnings per share (EPS) accuracy and the industry-adjusted performance of investment recommendations in the year before and one year after analysts become stars. Emery and Li (2009) find, for the period from 1993 to 2005, that after becoming stars, star analysts' forecast accuracy of EPS does not differ from that of their non-star peers; the recommendations of I/I stars are not statistically better than those of Non-Stars, while the recommendations of WSJ stars are significantly worse. They conclude that both rankings are largely “popularity contests” and do not provide any significant investment value. In contrast, Leone and Wu (2007) investigate the investment value of I/I stars' recommendations issued from 1991 to 2000 and find that star analysts persistently issued profitable recommendations and that this outperformance was not due to luck but to a superior ability to pick stocks.
Since 1993, The Wall Street Journal (WSJ) has published a list of “Best on the Street” analysts (before 2000, this ranking was named “All-Star Analysts”), with five analysts ranked in each industry during 2003-2011 and three analysts per industry in the years 2012 and 2013. This ranking is based on the score that an analyst obtained during the previous year, calculated as the sum of one-day returns of recommendations (as if an investor invested one day before a recommendation is announced and realized the return by the end of the recommendation day) (Emery and Li 2009). Such an evaluation methodology focuses on short-term price forecasts and favors analysts who issue recommendations on days when a price changes the most. At the same time, it penalizes analysts who issue their recommendations before or after such days of sharp price changes. Additionally, to benefit fully from such recommendations, investors should be able to receive a recommendation one day before it is announced, which could be the case for a limited number of investors with privileged access to analysts' recommendations. According to Yaros and Imielinski (2013), WSJ's evaluation method is not able to avoid analysts who announce their recommendations on the same day but after a significant price change has already occurred. All of these considerations may generate significant randomness in the election of analysts into the WSJ star ranking. As mentioned earlier, Emery and Li (2009) find that, after becoming stars, WSJ star analysts issue recommendations that underperform the group of Non-Stars. They interpret this result as an effect of regression to the mean, as the short-term recommendation performance includes a substantial random component.
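The scoring rule described above can be illustrated with a minimal sketch. It assumes that the score is a plain sum of one-day returns, signed by the direction of the recommendation (positive for buy-type calls, negative for sell-type calls). The signing convention, the function names and the sample prices are illustrative assumptions and not the WSJ's published procedure.

```python
def one_day_return(prev_close, close):
    """Return from the close before the announcement day to the close of that day."""
    return close / prev_close - 1.0


def wsj_style_score(recs):
    """Sum signed one-day returns over an analyst's recommendations.

    Each item is (direction, prev_close, close); direction is +1 for a
    buy-type call and -1 for a sell-type call (an assumed convention).
    """
    return sum(d * one_day_return(p, c) for d, p, c in recs)


# Hypothetical recommendations: two calls in the right direction, one wrong.
recs = [(+1, 100.0, 104.0), (-1, 50.0, 49.0), (+1, 20.0, 19.5)]
score = wsj_style_score(recs)
print(round(score, 4))
```

Because each term is a single-day return, one recommendation issued on a day with a large price move can dominate the score, which is the source of randomness discussed above.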
Thomson Reuters' StarMine “Top Stock Pickers” (STM-TSP) and “Top Earnings Estimators” (STM-TEE), which both include three analysts per industry, have been issued annually since 1998. They are both issued around October each year (except for the lists that were announced in December 2009, May 2012, and August 2013). The STM-TSP ranking is based on the excess returns of a non-leveraged portfolio built on all of the recommendations of each analyst. The returns of each analyst are calculated using the long and short buy-and-hold portfolio method relative to the market capitalization-weighted portfolio of all of the stocks in a given industry. The portfolio is rebalanced each month and whenever the analyst changes a rating, adds coverage or drops coverage. The STM-TEE ranking measures the accuracy of each analyst's earnings forecasts and is a measure of relative accuracy, since the analysts are compared against their peers. The measure accounts for several factors: the analyst's forecast error, the variance of the analysts' errors, the analyst's error compared to other analysts, the timing of the estimates, and the absolute value of the actual earnings of the firm. The measure is computed daily and aggregated to provide scores on individual stocks, industries and the analyst overall (StarMine 2015b). Up to 2012, STM-TEE's evaluation was based on earnings forecasts from the previous calendar year. However, from 2012, STM-TEE uses earnings from the year immediately before the announcement of the ranking lists. To summarize, the STM-TEE ranking differs from the STM-TSP and WSJ rankings since it does not consider the investment value of analysts' recommendations; thus, STM-TEE does not measure the abnormal returns on portfolios.
Although StarMine's rankings appeared much later, they play an essential role in sell-side research by providing an “…influential and an important reference in the industry” (Kim and Zapatero 2011). According to Beyer and Guttman (2011) and Ertimur et al. (2011), many Wall Street firms use StarMine rankings when defining payments to their analysts. Recent work by Kerl and Ohlert (2015) investigates the accuracy of earnings per share forecasts and target prices of StarMine analysts compared with their non-star peers one year after the analysts became stars. They find that analysts possess a persistent ability to issue accurate earnings forecasts, and after becoming stars, they continue to issue more accurate earnings forecasts than non-star analysts. Regarding the accuracy of target prices (TP), the authors cannot find any difference between the two groups of analysts. The insignificant difference in TP forecasts could be due to the research methodology: star analysts with “Stock Picking Awards” and “Earnings Estimate Awards” are grouped together to compare their accuracy with that of Non-Stars without splitting the sample of StarMine's stars into Top Stock Pickers and Top Earnings Estimators. However, according to the StarMine methodology for determining the “Stock Picking Awards”, analysts are not evaluated on the basis of accuracy of EPS. Thus, it is possible that, even in the year before they receive an award, this mixed group of stars does not outperform non-star peers in terms of the accuracy of their forecasts. Furthermore, Kerl and Ohlert (2015) focus solely on the accuracy of EPS and TP and the factors that influence such accuracy and do not compare the performance of recommendations issued by star analysts with that of Non-Stars.

Data and descriptive statistics
We use four data sources. The Thomson Financials Institutional Brokers' Estimate System (I/B/E/S) Detail Recommendations File provides standardized stock recommendations for all of the various brokers' scales by mapping all of the recommendations on a final scale from 1 to 5, where 1 corresponds to “Strong Buy”, 2 to “Buy”, 3 to “Hold”, 4 to “Sell” and 5 to “Strong Sell”. The Daily Stock File from the Center for Research in Security Prices (CRSP) provides daily holding period stock returns, which include dividends as well as price and cash adjustments. The Fama-French Factors - Daily Frequency database provides daily returns for the factors of value-weighted market index, size, book-to-market and momentum.

We apply the following filters to the dataset. We keep only recommendations for stocks classified as ordinary shares or American Depository Receipts (CRSP Share Codes 10, 11, 12, 30, 31, and 32). To avoid the influence of “penny stocks” on our conclusions, we exclude recommendations for stocks with a price that is less than one dollar. We also exclude recommendations from anonymous analysts and those for which the brokerage firm's name or code is missing.
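The sample filters listed above can be sketched as a simple predicate over recommendation records. The record layout (the keys `share_code`, `price`, `analyst` and `broker`) and the sample rows are hypothetical and chosen only for illustration; they do not reflect the actual I/B/E/S or CRSP field names.

```python
# CRSP share codes kept in the sample (ordinary shares and ADRs).
ALLOWED_SHARE_CODES = {10, 11, 12, 30, 31, 32}


def keep_recommendation(rec):
    """Return True if a recommendation record passes all sample filters."""
    if rec["share_code"] not in ALLOWED_SHARE_CODES:
        return False  # not an ordinary share or ADR
    if rec["price"] < 1.0:
        return False  # penny stock
    if not rec.get("analyst") or not rec.get("broker"):
        return False  # anonymous analyst or missing broker name/code
    return True


# Hypothetical records.
sample = [
    {"id": 1, "share_code": 11, "price": 25.0, "analyst": "A1", "broker": "B1"},
    {"id": 2, "share_code": 14, "price": 30.0, "analyst": "A2", "broker": "B1"},
    {"id": 3, "share_code": 10, "price": 0.8, "analyst": "A1", "broker": "B1"},
    {"id": 4, "share_code": 31, "price": 12.0, "analyst": "", "broker": "B2"},
    {"id": 5, "share_code": 30, "price": 1.0, "analyst": "A3", "broker": "B3"},
]
kept = [r for r in sample if keep_recommendation(r)]
print([r["id"] for r in kept])
```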
We consider only recommendation changes and ignore re-iterations since previous research confirms that the changes carry more information than re-iterations (Boni and Womack 2006; Barber et al. 2010). Our final recommendation sample consists of three levels: “Long” (includes Strong Buy and Buy recommendations), “Hold”, and “Short” (includes Sell and Strong Sell recommendations). Thus, if a particular analyst for a given company issues a Buy recommendation soon after a Strong Buy, the second recommendation, that is, Buy, is considered to be a re-iteration and is thus omitted from our sample.
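The mapping from the five-point scale to the three levels, and the removal of re-iterations, can be sketched as follows. The tuple layout and the choice to treat an analyst's first recommendation on a stock as a change are assumptions made for this illustration.

```python
# I/B/E/S rating (1-5) mapped to the three levels used in the sample.
LEVEL = {1: "Long", 2: "Long", 3: "Hold", 4: "Short", 5: "Short"}


def recommendation_changes(recs):
    """Keep only changes of level for each (analyst, stock) pair.

    recs: iterable of (analyst, ticker, iso_date, rating).
    A recommendation on the same level as the previous one from the same
    analyst for the same stock is a re-iteration and is dropped.
    """
    last_level = {}
    changes = []
    for analyst, ticker, day, rating in sorted(recs, key=lambda r: r[2]):
        level = LEVEL[rating]
        key = (analyst, ticker)
        if last_level.get(key) != level:
            changes.append((analyst, ticker, day, level))
        last_level[key] = level
    return changes


# Hypothetical history: Strong Buy, then Buy (re-iteration), Hold, Strong Sell.
recs = [
    ("A", "XYZ", "2005-01-10", 1),
    ("A", "XYZ", "2005-02-01", 2),
    ("B", "XYZ", "2005-02-20", 2),
    ("A", "XYZ", "2005-03-15", 3),
    ("A", "XYZ", "2005-06-01", 5),
]
changes = recommendation_changes(recs)
for row in changes:
    print(row)
```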
We use an approach similar to that of Loh and Stulz (2011) for dealing with the changes in the overall rating distribution that occurred primarily due to the National Association of Securities Dealers (NASD) Rule 2711 in 2002. In response to NASD Rule 2711, many brokers changed from a five-point scale to a three-point scale for their recommendations (Kadan et al. 2009). By using the I/B/E/S Stopped Recommendations File, we locate the dates when brokers stopped all previously issued recommendations, and then check whether, in the following 60 days, a broker resumed coverage on a three-point scale by having a new ratings distribution of either [1,3,5] or [2,3,4]. If a broker stopped the recommendations in order to re-initiate them on a three-point scale, we arrange in sequence the recommendations before and after Rule 2711 as if there were no “Stop Recommendation” signal. According to our methodology, which is discussed in detail in Section 3.1 (Methods), we focus only on recommendation changes between levels. Thus, Buy and Strong Buy recommendations are considered to be on the same level, and if one follows another, we treat the latter as a re-iteration and exclude it from our analysis. The same is valid for Sell and Strong Sell recommendations. Hence, the subsequent recommendations after Rule 2711 that remain on the same level of Long, Hold, or Short as before Rule 2711 are treated as re-iterations and, thus, ignored. At the same time, recommendations that were resumed on a different level (e.g. changed from Long to Hold or Short) are considered to be recommendation changes and affect our portfolios. As a result, the changes in brokers' rating distributions do not affect the results of our tests.
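A minimal sketch of the 60-day check described above is given below. It assumes that a broker counts as having resumed on a three-point scale when every rating it issues within the window belongs to either {1, 3, 5} or {2, 3, 4}; this subset rule, the function name and the sample dates are illustrative assumptions rather than the exact procedure used.

```python
from datetime import date, timedelta

# The two three-point rating distributions named in the text.
THREE_POINT_SCALES = ({1, 3, 5}, {2, 3, 4})


def resumed_on_three_point_scale(stop_date, later_recs, window_days=60):
    """Check the ratings a broker issues within window_days after a stop.

    later_recs: list of (date, rating) issued by the broker after stop_date.
    Returns True if at least one rating falls in the window and all such
    ratings fit one of the three-point scales (assumed subset rule).
    """
    end = stop_date + timedelta(days=window_days)
    ratings = {r for d, r in later_recs if stop_date < d <= end}
    return bool(ratings) and any(ratings <= scale for scale in THREE_POINT_SCALES)


# Hypothetical broker that stops coverage and resumes with ratings 1, 3 and 5.
stop = date(2002, 9, 1)
resumed = [(date(2002, 9, 10), 1), (date(2002, 9, 20), 3), (date(2002, 10, 5), 5)]
print(resumed_on_three_point_scale(stop, resumed))
```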
Our final database contains 172,525 recommendation changes for 6443 companies listed on the NYSE, AMEX and NASDAQ markets that were announced between January 2002 and October 2014.
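The entire sample of analysts is divided into the following groups: (1) Stars and Non-Stars; (2) Institutional Investor (I/I), StarMine's “Top Earnings Estimators” (STM-TEE) and “Top Stock Pickers” (STM-TSP), and The Wall Street Journal (WSJ) stars; and (3) the number-one ranked analysts of each ranking (I/I-1, STM-TEE-1, STM-TSP-1 and WSJ-1). When a particular analyst is rated as a star in two different industries, the analyst is included only once in a particular group of stars. However, the same analyst can appear in more than one ranking group. The similarities between the lists are discussed below and are reported in Table 5.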
We compare these groups using two time frames: 1) The Year Before is the calendar year before a ranking is announced. For example, the WSJ list of stars is announced in May 2003. Thus, the previous calendar year, from January 2002 through December 2002, is the evaluation year for the WSJ ranking. As a result, the whole sample period for Year Before spans from January 2002 until December 2012. We exclude the first month of January 2002 from our regression analysis because some portfolios contained too few stocks and have extraordinary returns at the beginning of that month. We evaluate analysts during the evaluation year using our portfolio approach in order to compare the rankings in a uniform way independently of the methodology used by a particular ranking.

2) The Year After is the one-year period that begins on the day that a particular ranking is announced and ends when the next year's ranking list is announced (or a twelve-month period for the last year, 2013).

The average overlap among the ranking lists in each sample year is presented in Table 3. It shows the number of analysts listed by different rankings, the number of the same analysts in each pair of rankings, and the portion of the same analysts in each ranking list. Panel A presents these data for the entire groups of stars, while Panel B reports the results for the number-one ranked stars. The table also presents the percentages of analysts who appear in [...] Journal have the lowest similarity. Such overlap is expected given the degree of similarity in the evaluation methods used. It also shows how different the lists of Star analysts are, which might explain the differences in the returns from their recommendations.

Table 4 shows the number of firms in the sample, which ranged from 3356 in 2010 to 3935 in 2006, and the percentage of firms covered by each group. On average per year, I/I, STM-TEE, and STM-TSP star analysts covered 22-23 % of the firms in the sample, while WSJ stars covered 27 %. Out of the total number of 6443 firms in our sample, I/I stars covered 36 % of the firms, STM-TEE stars 47 %, STM-TSP stars 50 %, and WSJ stars 58 %. This difference suggests that these groups have different firm coverage, which could be explained by the fact that the WSJ list has the highest turnover of names (they issue recommendations for different universes of firms). Number-one ranked stars cover 10 % of the firms, except WSJ-1, which covers 8 % of the firms.
As seen in Table 5, the group of Stars issues on average 20 % of all recommendations in our sample. Both WSJ and I/I stars issue more recommendations than STM-TEE and STM-TSP stars.

To measure the profitability of the recommendations, we apply a well-established methodology by constructing dynamic portfolios. We construct buy-and-hold “Long”, “Hold”, and “Short” portfolios for each sub-group of analysts in the year subsequent to the year in which the rankings were assigned (referred to as Year After) and for the year during which the analysts were evaluated (referred to as Year Before) (Barber et al. 2006; Fang and Yasuda 2014). For each new Strong Buy or Buy recommendation, $1 is invested at the end of the recommendation announcement day (or at the close of the next trading day if the recommendation is issued after the closing of trading or on a non-trading day) into the “Long” portfolio. The stock is held in the portfolio for the following calendar year if there are no recommendation revisions or recommendation changes by the same analyst. If, during the following year, the analyst changes his or her recommendation level from Strong Buy or Buy to Hold or Sell or Strong Sell, then the stock is withdrawn from the “Long” portfolio and placed in the “Hold” or “Short” portfolio by the end of the trading day on which the new recommendation is issued (or at the close of the next trading day if the recommendation is issued after the closing of trading or on a non-trading day). If there is a recommendation revision, but the new recommendation is on the same level (that is, Buy or Strong Buy), then the stock is not kept in the same portfolio for an additional calendar year, but only until the next recommendation change within one year from the initial recommendation. Thus, re-iterations of recommendations are not included in the portfolios.
The same procedures are applied to the “Hold” (includes only Hold recommendations) and “Short” (includes Sell and Strong Sell recommendations) portfolios. As a result of this strategy, the calendar day t gross return on portfolio ρ includes from n = 1 to N_{ρt} recommendations and can be defined as:

$$R_{\rho t} = \frac{\sum_{n=1}^{N_{\rho t}} X_{n,t-1}\, R_{i_n,t}}{\sum_{n=1}^{N_{\rho t}} X_{n,t-1}} \qquad (1)$$

where R_{i_n,t} is the gross return of stock i_n on day t and X_{n,t-1} is the cumulative total gross return of stock i_n from the next trading day after a recommendation was added to the portfolio to day t-1, which is the previous trading day before t, that is:

$$X_{n,t-1} = R_{i_n,\,recdat_n+1} \cdot R_{i_n,\,recdat_n+2} \cdots R_{i_n,\,t-1} \qquad (2)$$

Daily excess returns for each group's “Long”, “Hold” and “Short” portfolios are estimated as an intercept (alpha) that is calculated according to the four-factor model proposed by Carhart (1997):

$$R_{\rho\tau} - Rf_{\tau} = \alpha_{\rho} + \beta_{\rho}\,(Rm_{\tau} - Rf_{\tau}) + s_{\rho}\,SMB_{\tau} + h_{\rho}\,HML_{\tau} + m_{\rho}\,UMD_{\tau} + \varepsilon_{\rho\tau} \qquad (3)$$

where Rm_τ is the daily market return; Rf_τ is the risk-free rate of return; SMB_τ is a size factor, that is, the difference between the value-weighted portfolio returns of small and large stocks; HML_τ is a book-to-market factor, that is, the difference between the value-weighted portfolio returns of high book-to-market and low book-to-market stocks; and UMD_τ is a momentum factor, that is, the difference in the returns of stocks with a positive return momentum and those with a negative return momentum over months τ-12 to τ-2.
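The daily portfolio return can be illustrated with a short sketch. It assumes the usual buy-and-hold weighting, in which each $1 position is weighted by its cumulative gross return X up to day t-1, so that the portfolio return is the X-weighted average of the day-t gross returns; the function names and the sample figures are hypothetical.

```python
def cumulative_gross_return(gross_returns):
    """X_{n,t-1}: product of daily gross returns from the day after the
    recommendation through day t-1 (1.0 for a position added on day t-1)."""
    x = 1.0
    for r in gross_returns:
        x *= r
    return x


def portfolio_gross_return(positions):
    """Day-t gross return of a buy-and-hold portfolio.

    positions: list of (history, r_today), where history holds the daily
    gross returns of the stock since it entered the portfolio (up to t-1)
    and r_today is its gross return on day t.
    """
    weights = [cumulative_gross_return(h) for h, _ in positions]
    weighted = sum(w * r for w, (_, r) in zip(weights, positions))
    return weighted / sum(weights)


# Two hypothetical positions: one held for a day (up 10 %), one just added.
positions = [([1.10], 1.02), ([], 0.99)]
r_t = portfolio_gross_return(positions)
print(round(r_t, 6))
```

The position that has already gained 10 % receives a weight of 1.10 against 1.00 for the new position, which is what distinguishes this buy-and-hold return from an equally weighted daily average.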
The alpha differentials (differences in alphas) are statistically tested using two approaches. Alphas for groups in the same year, that is, Year After or Year Before, are compared using daily differences in gross returns, which are regressed on four factors according to Equation (3). The intercept from this regression gives the difference in alpha, and a t-test indicates whether this difference is statistically significant. To compare excess returns between the Year After and the Year Before, the seemingly unrelated estimation is accompanied by a test for significant differences in the intercepts from various regressions (suest and test procedures in STATA). Even though all reported excess returns and alpha differentials are calculated on a daily basis, we report figures in monthly values by multiplying daily values by 21 trading days.

[...] portfolios, while the third row (Long-Short) presents the total return on all of the recommendations for a particular group, which is the Long minus the Short portfolio returns. As we can see in Table 6, Panel A, the Long-Short portfolio of Stars, with a monthly alpha of +0.53 %, did not perform significantly differently from that of the Non-Stars, with a monthly alpha of +0.47 %, leading to a statistically insignificant difference of 0.06 percentage points in abnormal returns for a Long-Short portfolio in the year after the rankings were published. Among the alpha differentials for all portfolios in the Year After, only that of the Long portfolio of Stars is statistically different from that of the Non-Stars. As can be expected, during the evaluation year (Panel B in Table 6), Stars had higher recommendation returns, of +1.08 %, than Non-Stars, with +0.40 %.
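The first of the two testing approaches, together with the conversion to monthly values, can be sketched in plain Python. The sketch regresses the daily return difference between two groups on the four factors and scales the intercept by 21 trading days; it omits the standard errors and the t-test, and the small least-squares helper, the function names and the synthetic data are illustrative assumptions.

```python
import math


def ols(y, X):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


def monthly_alpha_differential(ret_a, ret_b, factors, trading_days=21):
    """Intercept of the regression of (ret_a - ret_b) on the factors, per month."""
    diff = [a - b for a, b in zip(ret_a, ret_b)]
    return ols(diff, factors)[0] * trading_days


# Synthetic daily factors (market excess return, SMB, HML, UMD) for 60 days.
factors = [(0.01 * math.sin(0.7 * t), 0.005 * math.cos(1.3 * t),
            0.004 * math.sin(2.1 * t + 1.0), 0.006 * math.cos(0.4 * t))
           for t in range(60)]
ret_b = [1.0 + 0.8 * f[0] for f in factors]
# Group A earns 0.02 % more per day plus some extra size exposure.
ret_a = [rb + 0.0002 + 0.1 * f[1] for rb, f in zip(ret_b, factors)]
diff_alpha = monthly_alpha_differential(ret_a, ret_b, factors)
print(round(diff_alpha * 100, 4), "% per month")
```

Because the regression is run on the difference of two return series, the risk-free rate cancels, and the intercept directly measures the difference in alphas between the two groups.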
When we analyze the differences in returns from the Year Before to the Year After, as reported in Panel C, we conclude that the Stars do not continue to perform on the same level, which is reflected in a significant difference in the returns on their Long, Short and Long-Short portfolios, while the group of Non-Stars had an insignificant difference in the returns on their Long, Short and Long-Short portfolios. Hence, we conclude that only Buy and Strong Buy recommendations of the group of Stars persistently outperform those of their non-star peers, although the Stars show a decrease in their performance in the Year After.

Table 7 shows the excess returns from recommendations issued by entire groups of stars. As can be seen in Panel C of Table 7, the Long, Short and Long-Short portfolios from WSJ and STM-TSP stars exhibit the greatest significant decrease in performance after election as a star, that is, −1.48 % for WSJ and −1.37 % for STM-TSP for their Long-Short portfolios. This decrease can be explained as regression to the mean, which shows that it is very difficult to issue recommendations that consistently generate portfolios with very high abnormal returns. At the same time, STM-TEE shows persistence because the return on their Long-Short portfolio in the Year Before is insignificantly different from that in the Year After (albeit the group of STM-TEE has significantly different performance for their Short portfolio, as we can see in Panel C). For I/I stars, the drop is −0.26 %.

Table 8 shows the average monthly excess returns for the top-ranked analysts (number-one ranked analysts) for the Year After election (Panel A) and the Year Before (Panel B) and the difference between the Year After and the Year Before (Panel C). We find that the Hold portfolios for all groups and all time periods perform insignificantly differently from the market, having alphas insignificantly different from zero.

Results and discussion
In the Year After election, the groups of STM-TEE-1 and I/I-1 stars show positive and statistically significant alphas for their Long-Short portfolios, while STM-TSP-1 and WSJ-1 stars performed on the market level, which is explained by relatively low and insignificant alphas for their Short portfolios. The highest return, and the only statistically significant one among the Short portfolios in the Year After, was generated by the STM-TEE-1 group, with −0.70 %. Excess returns of Long portfolios for all groups in the Year After are positive and significantly different from zero. Panel C shows that Long, Short and Long-Short portfolios from STM-TSP-1 and WSJ-1 stars show a statistically significant decrease in performance from the Year Before to the Year After, while I/I-1 and STM-TEE-1 exhibit persistence from the Year Before to the Year After. Hence, comparing the returns in the Year After election with the Year Before election in Panel C of Table 8, the returns of STM-TSP-1 and WSJ-1 decrease, with a significant difference of −1.61 % and −2.67 % between alphas in the Year After and Year Before, respectively.

Table 7 Average monthly abnormal returns (alphas) for each group of star analysts. Rankings by Institutional Investor (I/I), Thomson Reuters' StarMine “Top Earnings Estimators” (STM-TEE) and “Top Stock Pickers” (STM-TSP), and The Wall Street Journal (WSJ). Portfolios are built according to recommendations: when a new recommendation is announced, $1 is invested in the recommended stock by the end of the trading day (or on the next trading day if the recommendation is issued after the closing of trading or is announced on a non-trading day), and the stock is held for one year or until the same analyst changes his or her recommendation or drops coverage, in which case the stock is withdrawn by the end of that trading day. All figures are obtained as intercepts from the regressions of the daily returns time series on four standard risk factors (Carhart's four-factor model)

In Tables 9 and 10, we report the alpha differentials obtained by comparing the abnormal returns between groups of stars in the Year Before and Year After. In both tables, we report [...]

Table 8 Average monthly abnormal returns (alphas) for each group of number-one ranked analysts. Rankings by Institutional Investor (I/I), Thomson Reuters' StarMine “Top Earnings Estimators” (STM-TEE) and “Top Stock Pickers” (STM-TSP), and The Wall Street Journal (WSJ). Indexation by -1 signifies a group of number-one ranked analysts. Portfolios are built according to recommendations: when a new recommendation is announced, $1 is invested in the recommended stock by the end of the trading day (or on the next trading day if the recommendation is issued after the closing of trading or is announced on a non-trading day), and the stock is held for one year or until the same analyst changes his or her recommendation or drops coverage, in which case the stock is withdrawn by the end of that trading day. All figures are obtained as intercepts from the regressions of the daily returns time series on four standard risk factors (Carhart's four-factor model)

For the Long portfolios (Panel A in Table 9), we find that the groups of I/I, STM-TSP and WSJ stars and their number-one sub-groups significantly outperformed Non-Stars, while the returns of the STM-TEE and number-one ranked STM-TEE-1 groups were insignificantly different from those of Non-Stars. Similar results were observed for the Short portfolios (Panel B in Table 9), where only STM-TSP and WSJ as well as their number-one sub-groups significantly outperformed Non-Stars. When we combine the performance of the Long and Short portfolios, that is, the Long-Short portfolio (Panel C), we can see that STM-TSP and WSJ stars outperformed the Non-Stars.
Such a finding is expected considering the similarity of the methods used by STM-TSP and WSJ rankings (recommendation-based) and our performance measures.
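The alphas discussed here are described as intercepts from regressions of daily portfolio returns on Carhart's four factors. As a rough illustration only, the following sketch estimates such an intercept with ordinary least squares in plain Python. The function names, the ordering of the factors (market, size, value, momentum), and the use of excess returns as the dependent variable are assumptions made for the example; the paper does not describe its estimation code or how daily intercepts are converted into monthly figures.

```python
from typing import List, Sequence


def ols_coefficients(y: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Return least-squares coefficients [intercept, b1, ..., bk].

    Solves the normal equations (A'A) b = A'y by Gauss-Jordan elimination,
    where A is X with a leading column of ones.
    """
    rows = [[1.0, *x] for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [A'A | A'y]
    a = []
    for i in range(k):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        a.append(lhs + [rhs])
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("regressors are (nearly) collinear")
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def carhart_alpha(excess_returns: Sequence[float],
                  factors: Sequence[Sequence[float]]) -> float:
    """Intercept of a regression of portfolio excess returns on four factors.

    Each row of `factors` is assumed to hold (market, size, value, momentum)
    factor returns for the same day as the matching portfolio return.
    """
    return ols_coefficients(excess_returns, factors)[0]
```

The returned alpha is in the same units and frequency as the input returns; any significance test on it would require standard errors, which this sketch does not compute.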
In line with the fact that STM-TSP and WSJ are based on the investment value of recommendations, while the methodologies of I/I and STM-TEE do not focus on the performance of the recommendations, we find that the Long, Short and Long-Short portfolios of the STM-TSP and WSJ groups and their number-one sub-groups perform significantly better than those of the I/I and STM-TEE groups and their number-one stars I/I-1 and STM-TEE-1 (Panels A, B and C in Table 9). As with the Long and Short portfolios in the Year Before, the Long-Short portfolios show a similar pattern: stars ranked by the methodologies that take into account the investment value of recommendations (STM-TSP and WSJ) outperformed those ranked by methodologies that do not use the recommendations in their election methods (I/I and STM-TEE), with those differences being statistically significant. This result is expected given the evaluation methodologies applied to rank stars, since both the STM-TSP and WSJ rankings objectively measure the investment value of the recommendations, whereas STM-TEE focuses on the accuracy and timing of earnings forecasts, and the I/I ranking uses investment value as one of several attributes in its subjective methodology (survey) to select stars.
Comparing the returns of the Long portfolios in the Year After of all of the groups of stars with those of Non-Stars (first column in Panel A of Table 10), we find that only the returns of I/I-1, STM-TSP and STM-TSP-1 stars significantly exceeded those of the group of Non-Stars at the 10 % significance level, while the entire groups of I/I, STM-TEE and WSJ as well as the number-one ranked stars from STM-TEE-1 and WSJ-1 did not significantly differ from Non-Stars. Analyzing Panel B of Table 10 for the Short portfolios in the Year After, we find that the differences between the excess returns of all of the groups of stars and those of Non-Stars were insignificant, except for I/I, who underperformed, and STM-TEE, who outperformed, the Non-Stars. This high performance of the Short portfolio for STM-TEE and low performance for I/I was reflected in how the returns of these groups differ from the others. Hence, the differences in returns among most of the Short portfolios in the Year After are insignificant, except for I/I being significantly lower than some other groups of stars, and STM-TEE being significantly better than STM-TSP and their sub-group of STM-TSP-1. The results for the Long and Short portfolios explain why the Long-Short portfolios of I/I underperformed Non-Stars and some of the groups of stars, while STM-TEE and their number-one analysts performed significantly better than the Non-Stars and STM-TSP. We interpret this result to reflect the importance of making accurate earnings forecasts in order to predict a future price decrease, which helped the Short portfolios of the STM-TEE stars to outperform those of Non-Stars as well as STM-TSP stars.
By analyzing the persistence in the performance of the Long portfolios from the Year Before to the Year After, we conclude that the groups of I/I-1, STM-TSP and STM-TSP-1 performed significantly better than Non-Stars in both the Year After and Year Before (Panel A in Tables 9 and 10), while the group of STM-TEE with their sub-group of number-one ranked STM-TEE-1 stars had insignificantly different returns from Non-Stars before and after rankings were announced. Additionally, it is important to mention that the difference in the returns of the Long portfolios in the Year Before (Panel A in Table 9) among the entire groups of star analysts shows that, while STM-TSP and WSJ stars and their sub-groups of number-one STM-TSP-1 and WSJ-1 stars significantly outperformed I/I, STM-TEE and their number-one sub-groups, the differences in returns among all of the groups in the Year After are insignificant (Table 10). This result confirms the assumption that, in most cases, there is regression to the mean, which explains why in the Year Before the differences in returns were mostly statistically significant while in the Year After all of the groups of stars perform insignificantly differently from each other.
Table 9 Alpha differentials in the Year Before, calculated as the excess return of the vertical group minus the excess return of the horizontal group of stars. Rankings by Institutional Investor (I/I) [...]
STM-TSP-1 1.38*** 1.38*** 1.19*** −0.13 −0.33 0.92** 1.20***
WSJ-1 2.85*** 2.84*** 2.65*** 1.33*** 1.13*** 2.38*** 2.66*** 1.46***
*** p < 0.01. ** p < 0.05. * p < 0.1
Figure 1 shows a comparison of the frequency of months in which a particular group is the best-performing group within its comparison pool. For example, using raw monthly returns (calculated as geometrically aggregated daily returns within a given month), Stars are compared with Non-Stars: the number of months in which Stars outperformed Non-Stars is divided by the total number of months in the sample period. These results are in line with the abnormal returns analyzed above. We observe that the Long portfolio of Stars outperformed that of Non-Stars in 59 % of the months in our sample period. However, the Short portfolio of Stars outperformed that of the Non-Stars in only 46 % of the months. In the pool with the entire groups of stars, for both Long and Short portfolios, STM-TEE, STM-TSP and WSJ stars show very similar frequencies of months in which a given group outperformed the others, and only I/I stars had a substantially lower number of months than the other three groups. For top-ranked analysts, the Long and Short portfolios of all sub-groups of stars exhibit similar patterns, with the WSJ-1 analysts having the highest frequency for both portfolios.
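The frequency measure described above can be sketched in a few lines. This is a minimal illustration, assuming daily simple returns keyed by ISO date strings and assuming that "outperformed" means a strictly higher raw monthly return; the names `monthly_returns` and `outperformance_frequency` are hypothetical, and for Short portfolios the comparison may need to be reversed depending on how outperformance is defined.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

DailyReturn = Tuple[str, float]  # ("YYYY-MM-DD", simple daily return)


def monthly_returns(daily: Iterable[DailyReturn]) -> Dict[str, float]:
    """Geometrically aggregate daily returns into one return per calendar month."""
    growth: Dict[str, float] = defaultdict(lambda: 1.0)
    for day, ret in daily:
        growth[day[:7]] *= 1.0 + ret  # key on "YYYY-MM"
    return {month: g - 1.0 for month, g in growth.items()}


def outperformance_frequency(group: Iterable[DailyReturn],
                             benchmark: Iterable[DailyReturn]) -> float:
    """Share of common months in which `group` beat `benchmark` on raw return."""
    a, b = monthly_returns(group), monthly_returns(benchmark)
    months = sorted(set(a) & set(b))
    if not months:
        raise ValueError("no overlapping months")
    wins = sum(1 for m in months if a[m] > b[m])
    return wins / len(months)
```

Called with the daily return series of a Stars portfolio and a Non-Stars portfolio, `outperformance_frequency` returns a value between 0 and 1 that corresponds to the percentages quoted in the text.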
Using the returns obtained by comparing the portfolio returns in the years after election and analyzing the frequency of months in which particular groups outperformed the others, we conclude as follows. Firstly, we document that only the Long portfolio of the group of Stars, which contains all unique names of ranked analysts from the I/I, WSJ and StarMine rankings, performs better than that of the Non-Stars. Note, though, that the Short and Long-Short (Long minus Short) portfolios of both groups have statistically insignificant differences in returns. Our finding that the group of Stars outperforms Non-Stars only for Buy and Strong Buy but not for Sell and Strong Sell recommendations is consistent with the hypothesis that reputation helps in mitigating conflicts of interest. According to Fang and Yasuda (2009), reputation effects from star rankings can improve the quality of analyst research, and this improvement might be greater for favorable ratings such as Buy and Strong Buy, which are more likely to be conflicted (Michaely and Womack 1999; Barber et al. 2006).
Considering that, in the year before election, Stars had significantly higher returns for their Long-Short portfolios and the differences between Long and Short portfolios were greater than in the year after, we conclude that, on average, Stars are not able to repeat the observed high excess returns after they are selected as Stars. However, this result does not imply that all ranking methodologies have the same low predictive power for future returns.
Secondly, we investigated the differences between subjective (I/I) and objective (StarMine and WSJ) rankings and found that the recommendations from analysts ranked by the subjective ranking of the Institutional Investor had relatively low returns in the Year Before as well as in the Year After election. Indeed, most of the portfolios of I/I stars underperformed the other groups of stars from objective rankings and even underperformed the non-star peers. Though this conclusion might be essential for investors who are concerned about the investment value of recommendations, it is important to remember that the Institutional Investor ranking does not consider the recommendations as the primary evaluation criterion for selecting analysts. According to a list of attributes mentioned by the Institutional Investor magazine and to Table 1 in Bagnoli et al. (2008), stock selection is not the number-one criterion institutions are looking for when ranking analysts. In fact, stock selection is typically not even among the top five attributes that institutional investors are looking for. Thus, our findings do not contradict the intentions of the I/I ranking (that is, to evaluate subjectively the services provided by sell-side analysts).
An additional point to consider concerns conflicts of interest, which analysts face if they want to be selected by the subjective rankings, and which could reduce the value for investors. Thus, analysts may issue overoptimistic recommendations on a stock that a particular buy-side client is overweighting. While this might lead to the poor performance of the analyst's stock picks, it might help to buy some votes in the coming I/I poll. This reasoning is in line with the previous findings of Mola and Guidolin (2009), who reported that I/I star analysts are more prone to issue too-optimistic ratings for the stocks that affiliated mutual funds hold.
Among the objective rankings, we find significant differences in their predictive power, which we discuss as the persistence of the performance from the Year Before to the Year After election. We document a significant decrease in performance from the Year Before to the Year After for the analysts who were ranked according to the investment value of the recommendations (STM-TSP and WSJ). This result might be explained by the low predictive power of such recommendation-based election methods, which leads to regression to the mean, whereby the previous year's best performers should exhibit results that are closer to the average in subsequent years. However, we observed significantly positive returns for the Long portfolios of the STM-TSP and STM-TSP-1 stars, which outperformed Non-Stars in the year after election, even though there is a decline in performance compared with the evaluation year. Overall, our results confirm the previous findings that reported low predictive power for the methods which focus solely on the past performance of the recommendations (Emery and Li 2009).
In contrast to the results for the objective recommendation-based rankings, for the STM-TEE ranking, which is based on the accuracy and timing of earnings forecasts, the returns for Long-Short portfolios in the Year After differ insignificantly from those of the Year Before. More importantly, this group outperforms Non-Stars and some other groups of stars (I/I and STM-TSP) in the year after election. We document that this outperformance is explained by the high excess returns for the Short portfolio of STM-TEE, while their Long portfolio was insignificantly different from those of the Non-Stars and of the other groups of stars. This conclusion emphasizes the importance of accurate earnings forecasts being used in the valuation models in order to predict future stock prices, especially decreases in prices, which leads to the outperformance among Short portfolios.
Overall, we conclude that stock-picking skill is difficult to capture by focusing only on the performance of the recommendations over a one-year horizon. However, methodologies which consider other fundamental skills, such as the earnings forecasts that are necessary for successful stock picks, give much higher predictive power for the performance of future recommendations, even though the previous-year returns of those earnings-based stars, as measured by our evaluation methodology, were not as high as those of the recommendation-based rankings.

Robustness test
In this paper we employ a method used by other researchers (Barber et al. 2007; Fang and Yasuda 2014), holding the stocks in the portfolio for one year or until the next recommendation change from the same analyst, whichever happens earlier. Our baseline analysis considers all recommendation changes issued by all analysts listed in the different rankings. In this section we discuss the results from imposing additional filters on the samples of recommendations and analysts. We also discuss alternative holding periods and an alternative construction of the Short portfolios.
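The holding rule can be illustrated with a short sketch. It assumes that "one year" means 365 calendar days and that the date of the analyst's next change (or dropped coverage) is known; both the function name `exit_date` and the `max_days` parameter are hypothetical, introduced only for this example.

```python
from datetime import date, timedelta
from typing import Optional


def exit_date(entry: date, next_change: Optional[date], max_days: int = 365) -> date:
    """Date on which a stock leaves the portfolio.

    The stock is held until the earlier of (a) `max_days` after entry and
    (b) the same analyst's next recommendation change or coverage drop.
    """
    horizon = entry + timedelta(days=max_days)
    if next_change is None:
        return horizon
    if next_change < entry:
        raise ValueError("next_change precedes entry")
    return min(horizon, next_change)
```

Setting `max_days` to 30, 45, 60, 90 or 180 gives the shorter holding periods examined later in this section.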
First, we exclude the recommendation changes which were issued on earnings announcement days in order to avoid contamination in measuring analyst stock-picking skill. While the previous study by Li et al. (2015) documents that recommendation changes contemporaneous with news have a higher market impact than the earnings announcements themselves, since such recommendations help investors interpret the news, some of these recommendations might simply be piggybacking on the news without providing more value to investors. Excluding the recommendations that were issued on the earnings announcement dates obtained from the Compustat database (which accounted for 3.4 % of recommendations), we find that our main conclusions remain the same, though the alphas decreased insignificantly. Excluding the recommendations that were issued within a three-day window around earnings announcements (17.3 % of recommendations), we find slight changes in the conclusions from our baseline analysis. Specifically, the Long portfolios of I/I in the Year After performed significantly better than those of the Non-Stars; also, the differences in returns for Long-Short portfolios between STM-TEE and STM-TSP became insignificant. However, the results of the comparison of the Long-Short portfolios of each group of stars with those of the Non-Stars remain unaffected: STM-TEE outperformed, while I/I underperformed, the group of Non-Stars.
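A filter of this kind might look as follows. The sketch assumes that the three-day window is centered on the announcement date and measured in calendar days, which the text does not specify; the `Recommendation` record, the function name and the `half_window_days` parameter are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List, NamedTuple, Set


class Recommendation(NamedTuple):
    ticker: str
    issued: date
    rating: str


def exclude_near_earnings(recs: Iterable[Recommendation],
                          earnings_dates: Dict[str, Set[date]],
                          half_window_days: int = 1) -> List[Recommendation]:
    """Drop recommendations issued within `half_window_days` of an earnings date.

    half_window_days=0 removes same-day recommendations only;
    half_window_days=1 removes those in a three-day window (assumed centered).
    """
    kept = []
    for rec in recs:
        dates = earnings_dates.get(rec.ticker, set())
        near = any(abs((rec.issued - d).days) <= half_window_days for d in dates)
        if not near:
            kept.append(rec)
    return kept
```

Running the portfolio construction on the filtered list, once with `half_window_days=0` and once with `half_window_days=1`, would correspond to the two exclusions described above.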
Second, we repeat our calculations using two filters on our sample of analysts: (i) a non-overlapping sample of stars; and (ii) in each industry, a limit on the number of stars to one, two, and three only (this applies to I/I and WSJ, while the StarMine lists remain unchanged). While the non-overlapping samples exclude those analysts who perform well according to some or even all evaluation methods (subjective or objective), such a limitation might help to distil the differences between the rankings' performance. By considering only three stars in each industry in the I/I and WSJ lists, we try to reduce the differences in the number of analysts in each group of stars and to eliminate the influence of bottom-ranked analysts (runners-up) on our conclusions. In both cases, we focused on the differences among groups of stars and found that, when focusing on only three stars per industry, our main conclusions remain the same. After examining the results for the non-overlapping samples, we found that the largest change was for the groups of I/I and STM-TEE stars, whose Long-Short portfolio returns were lower than in our baseline analysis. The decrease in the returns of some portfolios for non-overlapping groups of stars led to an insignificant difference in performance between the groups of STM-TEE and I/I and the STM-TSP stars. Since some star analysts appear in several rankings in a given year and the results are lower with the non-overlapping samples of STM-TEE and I/I, we interpret this fact as an indication that the best analysts are selected as stars in several rankings at the same time.
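One possible reading of the non-overlapping filter is to keep, for each ranking in a given year, only the analysts who appear in no other ranking that year. The sketch below implements that reading; the exact definition used in the study is not stated, so this is an assumption, and the function name and analyst identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def non_overlapping(rankings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only analysts who appear in exactly one ranking.

    `rankings` maps a ranking name to the set of analyst identifiers it
    lists for a single year.
    """
    counts = Counter(a for members in rankings.values() for a in members)
    return {name: {a for a in members if counts[a] == 1}
            for name, members in rankings.items()}
```

For example, with `{"I/I": {"a", "b"}, "WSJ": {"b", "c"}}` the analyst `"b"` is dropped from both groups, leaving `{"I/I": {"a"}, "WSJ": {"c"}}`.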
Third, we use some alternative portfolio constructs: (i) we take the immediate year before rankings were announced instead of the previous calendar year; and (ii) we include Hold recommendations into Short portfolios, thus having only two portfolios to consider, namely Long and Short.
We take the immediate year before the rankings' announcement in order to use data that are more current and relevant. The fact that the I/I and StarMine rankings are announced around October each year, while WSJ publishes its list of stars in May, potentially creates a bias, as the information used for the analysis is relatively stale for I/I and StarMine compared to WSJ. After we repeat our calculations for the new time frame of the Year Before, we find that the excess returns for I/I changed insignificantly, while the returns for WSJ and STM-TSP decreased. The decrease is especially pronounced for STM-TSP: since STM-TSP is published in October, the immediate year before its announcement overlaps less with the previous calendar year than is the case for WSJ, which is published in May. This decrease in the performance of recommendation-based rankings led to insignificant differences between the returns for the Long-Short portfolios of STM-TSP and WSJ and those of STM-TEE in the Year Before. Also, the difference between the Long-Short portfolios of STM-TSP and I/I stars became insignificant. These results are in line with our main findings, as they show that high excess returns were documented only during the evaluation year itself, while they regressed to the mean in subsequent time periods.
We also tested alternative portfolio constructs by including Hold recommendations in the Short portfolio, as in Barber et al. (2007) and Fang and Yasuda (2014). Taking into account that analysts are reluctant to issue sell recommendations (Barber et al. 2007), investors might treat Hold ratings as a signal to sell, especially when the Hold is a downgrade from Buy or Strong Buy. While this might lead to a negative price reaction to some Hold recommendations, the upgrades from Sell/Strong Sell to Hold are expected to generate near-market returns (Barber et al. 2010). We repeated our calculations and found that the alpha levels for Short portfolios that contain Hold/Sell/Strong Sell recommendations on a one-year horizon were close to zero for almost all portfolios. That result is expected considering that such Short portfolios had about 90 % weight in Hold recommendations and that, according to our results for the “Long, Hold, Short” portfolio calculations, Hold portfolios have alphas insignificantly different from zero for all groups of analysts (except for the Hold portfolio of I/I stars, see Panel A in Table 7).
Additionally, we reconstructed all portfolios using shorter holding periods, such as 30 days as in Fang and Yasuda (2014), and found that the Short portfolios (Sell/Strong Sell recommendations) for some groups of stars had a very low number of stocks. Specifically, the Short portfolios of STM-TSP had only 9.1 stocks on average, those of STM-TEE had 7.5 stocks, and the portfolios for the number-one ranked StarMine analysts had on average fewer than 5 stocks. We also tested other holding periods within a horizon of one year (namely, 45, 60, 90 and 180 days) and found that, up to 180 days, the Short portfolios of the number-one ranked stars had a relatively low number of stocks (fewer than 10). Hence, those portfolios were highly undiversified and their excess returns were unreliable. Consequently, we decided to keep a one-year horizon over which recommendations are considered valid if they are unchanged. With this horizon, the Short portfolios of the number-one stars had on average 20 stocks, and those of the entire groups of stars had on average more than 50 stocks during the investigated time period.
Finally, we tested combinations of different restrictions, such as a non-overlapping sample of only three stars in each industry, and excluding the recommendations on earnings announcement days. We find that all possible filters affect the alpha levels, but do not influence our main conclusions on the differences between groups of Stars and Non-Stars, while the differences among Long-Short portfolios for the groups of stars become statistically insignificant. All the results of the robustness tests remain unpublished and are available upon request from the authors.

Conclusion
The goal of this study was to determine whether star rankings can be employed as an indicator of the future profitability of analysts' recommendations. By using a unique database for the period from 2003 to 2014, we find that sell-side analysts indeed issue profitable recommendations. This conclusion is supported by the previous research of Mikhail et al. (2004), who find that sell-side analysts are persistent in issuing profitable recommendations. Our results challenge the finding by Emery and Li (2009) that star rankings are largely “popularity contests”. In our study, we found that only Buy and Strong Buy recommendations from the entire group of Star analysts outperform those of the Non-Stars in the year after election, while Sell and Strong Sell recommendations performed on par with those of the Non-Stars. While Stars had significantly higher returns for their portfolios during the evaluation year, the group of Stars was not able to repeat the excess returns at the same level after rankings were announced.
After investigating the differences among subjective (I/I) and objective (StarMine and WSJ) rankings, we found that the returns of the recommendations from analysts ranked by the subjective ranking from I/I underperformed most of the other groups of stars as well as the group of Non-Stars. However, this finding does not contradict the intentions of the I/I ranking, whose methodology does not focus on the investment value of recommendations. Among objective rankings, we found that the most persistent results were observed for the group of STM-TEE analysts, who were selected based on the accuracy and timing of the earnings forecasts. Their recommendations outperformed the groups of stars from STM-TSP and I/I as well as Non-Stars in the year after election. We document that the recommendation-based STM-TSP and WSJ rankings show a significant decrease in performance from the evaluation year to the year after rankings were published, thus confirming the previous findings that reported a low predictive power for the methods which focus solely on the past performance of the recommendations (Emery and Li 2009).
In summary, the choice of which analysts to work with is of great importance for the long-term growth of an investor's portfolio. Our results show that stock-picking skill is difficult to capture by focusing only on the performance of the recommendations over a one-year horizon. In our study, we provided empirical evidence regarding which star ranking of sell-side analysts a potential investor should have relied on, namely StarMine's “Top Earnings Estimators”. We find it comforting that the estimation of future earnings is important for predicting portfolio returns, since the valuation models used for valuing stocks are built on a company's future earnings. In conclusion, our results show that stock-picking ability reflects a set of skills that can be captured using more fundamental evaluation methods, such as those that consider earnings forecasts.