What reviews foretell about opening weekend box office revenue: the harbinger of failure effect in the movie industry

We empirically investigate the harbinger of failure phenomenon in the motion picture industry by analyzing pre-release movie reviews written by film critics. We find that harbingers of failure do exist. Their positive (negative) pre-release movie reviews provide a strong predictive signal that the movie will turn out to be a flop (success). This signal persists even for the top critic category, which usually consists of professional critics, indicating that expertise in a professional domain does not necessarily lead to correct predictions. Our findings challenge the current belief that positive reviews always help enhance box office revenue and shed new light on the influencer-predictor hypothesis. We further analyze the writing style of harbingers and provide new insights into their personality traits and cognitive biases.


Introduction
Reviews play a significant role in various industries and business settings. In retailing, for example, reviews can be directly related to the ultimate success or failure of a product (Ho-Dac et al., 2013; Chevalier & Mayzlin, 2006). In the motion picture industry, it is hypothesized that movie critics can play a dual role. They can act as influencers when their reviews influence consumers' movie-viewing behavior, and/or they can act as predictors when they merely predict it. Previous research on this question has produced mixed conclusions. Eliashberg and Shugan (1997) find that critics can correctly predict box office performance but do not influence it. Basuroy et al. (2003), on the other hand, find that critics can also influence the outcome of a movie during the early stages of a film's run. In this work, we offer new evidence on the influencer-predictor hypothesis by empirically investigating this question in a pre-release movie setting and demonstrate that a distinction should be made among film critics.
Pre-release screenings are a common practice in the motion picture industry. Producers and distributors invite film critics in order to gauge the wider audience appeal of the film. The film critics then publish their pre-release reviews on platforms such as Rotten Tomatoes so people can read the critics' opinions about the quality of the upcoming movie. However, as anecdotal evidence suggests, critics' opinions are not always in alignment with public taste. The movies Baywatch and Tomb Raider, for example, received positive pre-release reviews but generated dismal opening box office results. On a seemingly unrelated research topic, Anderson et al. (2015) demonstrated the existence of certain customers in the retail industry, called harbingers of failure, whose early adoption of a new product is a strong predictor that the product will fail in the market. Motivated by this finding, we empirically investigate in this work whether the harbinger phenomenon extends to the motion picture industry. Does the positive feedback given by some film critics before the release of a movie signal its financial failure instead of its future success, as currently believed?
To address this question, we use pre-release movie reviews from Rotten Tomatoes in combination with other movie-related data collected from different data sources. Our empirical findings document the existence of harbingers of failure in the motion picture industry. We identify these harbingers and show that they systematically misclassify flop movies as successful. Their positive pre-release reviews of a movie provide a strong predictive signal that the movie will fail financially in its opening weekend box office revenue. This finding persists even when we account for sleeper movies. Sleeper movies are critically acclaimed movies, such as The Shawshank Redemption and Children of Men, that tank in their opening weekend box office revenue but eventually become successful. Moreover, we find the harbinger effect to be symmetric. That is, there are harbinger critics that review a movie negatively and then the movie turns out to be successful. We then investigate if the harbinger phenomenon still holds true for the top critic category, which usually consists of professional and experienced critics. Surprisingly, we find a strong harbinger effect indicating that having expertise in a professional domain does not necessarily lead to correct predictions.
To gain further insights into the personality traits of harbinger film critics, we use text analytics to analyze their writing style and compare it with that of non-harbinger critics. We find that harbinger critics employ a formal and analytical style of writing and have a lower rate of self-reference pronouns compared to non-harbingers. This indicates that harbinger critics show overconfidence in their abilities, while non-harbinger critics are more self-reflective about the audience's opinion and convey more confidence in their reviews. Interestingly, we find that these differences persist when we stratify the analysis by the critic's status category. In addition, we find that top critic harbingers also use more adverbs than their counterparts in their positive reviews, a trait that is associated with extraversion and over-optimism (Tausczik & Pennebaker, 2010a).
To sum up, the main contributions of this work are as follows. First, we document the harbinger effect in the motion picture industry. This documentation sheds new light on the influencer-predictor hypothesis (Eliashberg & Shugan, 1997). We find that, in a pre-release setting, a distinction should be made among film critics. In particular, there exist harbinger critics who do not influence the outcome of a movie but rather mispredict it. Second, we add to the existing literature on harbingers by showing that the harbinger phenomenon is not limited to everyday consumers but extends to professionals with domain expertise. Last, we analyze the writing style of harbingers and provide insights into their personality traits. Previous research has used transactional data to provide correlational evidence that the mechanism behind the harbinger phenomenon is preference minorities (Waldfogel, 1999). In our case, we have text data, a much richer source that we use to offer evidence about the intrinsic personality characteristics of harbingers that drive their choices.

Harbingers of failure in the motion picture industry
Several works have explored the effect that movie reviews have on box office revenue (Reddy et al., 1998; Basuroy et al., 2006; Boatwright et al., 2007; Dellarocas et al., 2007; Duan et al., 2008; Chintagunta et al., 2010; Song et al., 2019). Litman (1983) suggested that movie critics' positive reviews can influence the popularity of a movie in the early weeks of its release, but never tested this hypothesis due to the absence of opening box office data. Eliashberg and Shugan (1997) first proposed and tested the two different roles that movie critics can play, namely influencers and predictors. They find that movie critics can predict the outcome of a movie but do not influence it. Basuroy et al. (2003) break down this hypothesis across the stages of a movie's lifetime and find that movie critics' opinions can have an influencer effect in the early stages of a film's run, before the word-of-mouth mechanism emerges. Moreover, movie critics' reviews are predictive of the box office outcome over the entire run of the movie. Both their positive and negative reviews are significantly correlated with box office revenue, with the impact of negative reviews (but not positive reviews) diminishing over time. Anderson et al. (2015) first documented the existence of harbingers of failure. These are consumers whose early endorsement of a new product is a strong signal that mainstream adoption of the product will not follow. They also find the harbinger effect to be symmetric. That is, there are customers who tend to avoid buying successful products. Simester et al. (2019) later documented the existence of the harbinger phenomenon in political campaign donations and zip codes. Both of these works study groups of customers who do not influence the decisions of others. Our work diverges from and expands on this subject by investigating how critics' reviews could influence the general audience in a pre-release movie setting.
Combining the above literature, we expect to find harbingers of failure in the motion picture industry, such that their positive pre-release movie reviews will signal the movie's failure at the opening box office, while their negative pre-release movie reviews will signal the movie's success at the opening box office.

Writing style and personality traits
Previous research has shown that the writing style of a person reflects the person's personality (Pennebaker & King, 1999;Küfner et al., 2010). There are three different writing styles, namely formal, analytic, and narrative, each associated with different personality traits and cognitive biases. In what follows, we develop our hypotheses on the potential writing style of harbinger and non-harbinger critics in the motion picture industry and link each writing style with their potential cognitive biases.

Extraversion
Extraversion is a major trait in the most prevalent theories of personality, such as Eysenck's three-factor model (Eysenck, 1991) and Costa and McCrae's five-factor model (Costa Jr & McCrae, 1992). Optimism bias is linked to extraversion and is the tendency of people to assign a greater probability to experiencing positive events while underestimating the probability of negative events. Studies report that a large majority of the population displays optimism bias across many different settings (Sharot, 2011). Moreover, people maintain their overly positive expectations even in the face of disconfirming evidence because they tend to update their beliefs more in response to positive than negative information about the future. On the basis of these findings, we hypothesize that harbinger movie critics will display optimism bias in their positive pre-release movie reviews. In terms of writing style, extraverted people tend to use fewer adjectives and more adverbs (Tausczik & Pennebaker, 2010b). Hence, we expect harbinger critics to use fewer adjectives and more adverbs in their positive reviews compared to non-harbinger critics. We propose the following hypothesis:

Hypothesis 1: Harbinger movie critics will use fewer adjectives and more adverbs in their positive reviews compared to non-harbingers to express their aspects of over-optimism.

Conscientiousness
Conscientiousness is another major personality trait. Conscientious people tend to be self-reflective, efficient, and organized (Barrick & Mount, 1991). Being a conscientious movie critic can be rewarding since the critic can put aside personal biases and reflect on how the audience will perceive a movie. Hence, we expect non-harbinger critics to be conscientious. In contrast, we hypothesize that harbinger critics will be more concerned with status and power, and less self-reflective. Writers with these attributes tend to employ a formal writing style (Pennebaker et al., 2003). Thus, we expect harbinger critics to employ a more formal writing style compared to non-harbingers. The formal writing style is characterized by an impersonal, objective, and precise use of language, and is associated with a lower rate of self-reference pronouns, fewer fillers, and fewer nonfluencies (Davis & Brock, 1975). It also includes a higher rate of uncommon words and hyphens. We propose the following hypothesis:

Hypothesis 2: Non-harbinger movie critics are more conscientious and self-reflective about general audience preferences compared to harbinger critics.

Overconfidence
Overconfidence bias is the tendency of people to overestimate their abilities relative to others (West & Stanovich, 1997). In the case of movie reviewers, this means that harbingers will place much more weight on their own opinion and believe that the general audience will agree with them. People who tend to be overconfident generally use I-words at very low levels. Hence, we expect harbinger critics to have a lower rate of self-reference pronouns, such as I, me, and my. In contrast, we expect non-harbingers to have a higher rate of first-person singular pronouns, to convey more confidence in their reviews, and a lower rate of first-person plural pronouns such as we, us, and our (Slatcher et al., 2007). High usage of we-words in movie reviews can be perceived by the audience as cold and not authentic. In summary, we propose the following set of hypotheses:

Hypothesis 3: Harbinger movie critics are more overconfident in their positive reviews than non-harbinger critics.
Hypothesis 4: Non-harbinger movie critics convey more confidence in their reviews compared to harbinger critics.

Data overview
To conduct this research, we combined data sets from three different sources. First, we collected movie-related variables, including movie title, genre, studio, whether the movie is a sequel, release date, opening weekend revenue, and gross domestic revenue, from Box Office Mojo. Second, we augmented our movie-related information by collecting movie budgets from The Numbers. We use this information to define our movie success metrics. Last, we obtained our review-related information from Rotten Tomatoes. Rotten Tomatoes is a review aggregator website for movies and television shows. It aggregates reviews across different outlets (e.g., blogs and websites) in order to increase their usefulness. The company was launched in 1998 by three undergraduate students at the University of California, Berkeley, and it is now owned by the American ticketing company Fandango. It attracts around 26 million unique monthly visitors globally, of which 14 million are in the USA. To become a critic for Rotten Tomatoes, one has to go through a vetting process that occurs twice a year and must have produced consistent review output for at least 2 years. Critics must display journalistic integrity and ethical behavior in their reviews. Rotten Tomatoes also has a top critic designation that is given to selected candidates after careful evaluation by its advisory committee. The requirements to become a top critic are stringent: for example, one needs to have written reviews professionally for a minimum of 5 years and to have a verifiable social media profile with a large audience. The long-term tenure of critics on Rotten Tomatoes is crucial for our investigation of harbingers, since in order to show their repeated tendency to misclassify flop movies, we need access to their long-term review data.
Our review data include the critic's name, the date the review was posted on Rotten Tomatoes, whether the critic is a top critic or not, the review's text, and the Fresh or Rotten rating. The latter variable is worth explaining further. Since Rotten Tomatoes aggregates reviews across different outlets, these reviews come with different grading scales. For example, some critics use a 4-point scale, some use a 10-point scale, and others provide a percentage. To avoid comparing grades across different scales, a team of curators at Rotten Tomatoes reads the reviews and classifies them as Fresh if they are positive, or Rotten if they are negative. Therefore, we also use the Fresh or Rotten rating to classify our reviews as positive or negative in our analysis. After merging all the aforementioned data, we end up with 448 movies released from 2015 to 2019, and 28,884 pre-release reviews written about them. A detailed overview of our final dataset and of all the variables used in our models can be found in Appendix A.
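The data assembly described above amounts to a pair of joins on the movie title. The sketch below uses hypothetical column names and values, not the paper's actual schema:

```python
import pandas as pd

# Hypothetical movie metadata (Box Office Mojo-style fields)
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "opening_weekend": [30_000_000, 10_000_000],
})
# Hypothetical production budgets (The Numbers-style fields)
budgets = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "budget": [50_000_000, 40_000_000],
})
# Hypothetical pre-release reviews (Rotten Tomatoes-style fields)
reviews = pd.DataFrame({
    "title": ["Movie A", "Movie A", "Movie B"],
    "critic": ["X", "Y", "Z"],
    "fresh": [True, False, True],  # curators' Fresh/Rotten label
})

# Join budgets onto movie metadata, then attach each review to its movie
movie_data = movies.merge(budgets, on="title", how="inner")
review_data = reviews.merge(movie_data, on="title", how="inner")
```

In practice, title-based joins require cleaning (release-year disambiguation, punctuation normalization), and the inner joins keep only movies present in all three sources, which is how a merged intersection like the paper's 448-movie sample arises.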

Defining whether a movie is a success or a flop
We first need to classify a movie as being successful or not. After consulting with movie industry experts, we classify a movie as successful if its Return on Investment (ROI) is greater than 0.5, where the ROI is calculated by dividing the opening weekend revenue of the movie by its production budget. In other words, a successful movie earns more than half of its production budget in the opening weekend. More formally, for movie i:

Success_i = 1 if ROI_i > 0.5, and 0 otherwise, where ROI_i = Opening Weekend Revenue_i / Production Budget_i

Overall, 44.86% of the movies in our dataset are successful.
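As a minimal sketch of this success rule (the function name is ours, not the paper's):

```python
def is_success(opening_weekend_revenue, production_budget):
    """A movie is successful if ROI = opening weekend revenue / production
    budget exceeds 0.5, i.e., it earns more than half its budget on opening."""
    roi = opening_weekend_revenue / production_budget
    return roi > 0.5

# A movie earning 60% of its budget on opening weekend is a success;
# one earning exactly 50% or less is a flop (the threshold is strict).
```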

Defining whether a movie critic is a harbinger
Following the methodology of Anderson et al. (2015), we adopt the flop affinity ratio to measure the propensity of a movie critic to review a flop movie positively. For each movie critic i, we calculate the proportion of positively reviewed movies that turned out to be flops over their total number of positively reviewed movies. That is:

Flop Affinity Ratio_i = Number of Flop Movies Reviewed Positively_i / Total Number of Movies Reviewed Positively_i
For example, if a movie critic gave positive reviews to ten movies before their release and eight of them turned out to be flops, then their flop affinity ratio is 0.8. A low flop affinity ratio indicates that the critic's opinion coincides with public taste, whereas a high flop affinity ratio implies that the critic is likely to be a harbinger of failure. Note here that the flop affinity ratio does not give a higher weight to critics with a higher total number of reviews: there might be two critics with a 0.5 flop affinity ratio, but one critic has written four reviews while the other has written twenty. To address this, we also use an alternative grouping method as a robustness check, in which we weigh the critic's number of positive/negative reviews with regard to their overall number of positive/negative reviews. The pattern of the results under this grouping method remains unchanged. For more details, see Appendix B.
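The ratio can be computed per critic as follows (a sketch; the pairing of review sentiment with movie outcome is assumed to come from the merged review data):

```python
def flop_affinity(reviews):
    """reviews: iterable of (is_positive, is_flop) pairs for one critic.
    Returns the share of the critic's positive reviews that went to flops,
    or None if the critic wrote no positive reviews."""
    positive_flops = [is_flop for is_positive, is_flop in reviews if is_positive]
    if not positive_flops:
        return None
    return sum(positive_flops) / len(positive_flops)

# The example from the text: ten positive reviews, eight of them on flops.
# Negative reviews (is_positive = False) are ignored by the ratio.
critic = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 3
# flop_affinity(critic) -> 0.8
```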
We then classify movie critics into five groups based on their flop affinity score. Group 1 represents the cohort with the lowest flop affinity scores, while group 4 sits at the other end of the spectrum. Last, critics who reviewed no more than two movies before their release are placed in a fifth, Other group. Since we cannot judge the harbinger effect of critics with such a limited number of written reviews, we exclude them from our analysis. Table 1 shows all the relevant information on the critics' grouping.
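One way to implement this grouping. Since the paper does not state the exact cutoffs, the sketch assumes an equal-sized quartile split of eligible critics by flop affinity:

```python
def assign_groups(critics, min_reviews=3):
    """critics: dict mapping critic -> (flop_affinity, n_prerelease_reviews).
    Critics with fewer than min_reviews pre-release reviews go to 'Other';
    the rest are split into four equal-sized groups by flop affinity,
    with Group 1 the lowest and Group 4 the highest."""
    eligible = sorted(
        (c for c, (fa, n) in critics.items() if n >= min_reviews),
        key=lambda c: critics[c][0],
    )
    groups = {c: "Other" for c, (fa, n) in critics.items() if n < min_reviews}
    k = len(eligible)
    for rank, critic in enumerate(eligible):
        # rank * 4 // k maps the sorted position to quartile 0..3
        groups[critic] = f"Group {rank * 4 // k + 1}"
    return groups
```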

Modeling
To investigate whether the harbinger effect exists in the movie industry, we estimate two competing logistic regression models:

Model 1: log(p_i / (1 − p_i)) = β_0 + β_1 · TotalPositiveReviews_i

Model 2: log(p_i / (1 − p_i)) = β_0 + Σ_{g=1}^{4} β_g · GroupgPositiveReviews_i

The unit of analysis for both models is a movie, denoted by i. The dependent variable is the log odds of success of movie i as defined in Section 3.2.1. The key difference between the two competing models is in their independent variables. TotalPositiveReviews_i in model 1 refers to the total number of positive reviews on movie i, and under this specification all critics are treated equally. Model 2, however, incorporates the harbinger effect by including all four groups of critics. For example, the variable Group1PositiveReviews_i counts the number of positive reviews posted by the critics in group 1 for movie i.
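A simulation sketch of this nested-model comparison: model 1 pools all positive reviews, model 2 splits them by critic group, and a likelihood-ratio test with 3 degrees of freedom compares the fits. The data are synthetic with a built-in harbinger pattern, and near-unpenalized scikit-learn logistic regressions stand in for the paper's estimation procedure:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
# Synthetic counts of positive reviews per movie from four critic groups
X_groups = rng.poisson(lam=3.0, size=(n, 4))
# Built-in harbinger pattern: group 1 reviews raise, group 4 reviews lower
# the log odds of opening-weekend success
true_logits = 0.6 * X_groups[:, 0] - 0.5 * X_groups[:, 3] - 0.3
y = (rng.random(n) < 1 / (1 + np.exp(-true_logits))).astype(int)

X_total = X_groups.sum(axis=1, keepdims=True)  # model 1: all critics pooled

def log_likelihood(model, X, y):
    p = model.predict_proba(X)[:, 1]
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# A large C approximates unpenalized maximum likelihood
m1 = LogisticRegression(C=1e6, max_iter=5000).fit(X_total, y)
m2 = LogisticRegression(C=1e6, max_iter=5000).fit(X_groups, y)

# Model 1 is nested in model 2 (equal group coefficients), so the
# likelihood-ratio statistic is chi-square with 3 degrees of freedom
lr_stat = 2 * (log_likelihood(m2, X_groups, y) - log_likelihood(m1, X_total, y))
p_value = stats.chi2.sf(lr_stat, df=3)
```

With the harbinger pattern built in, model 2 recovers a positive group 1 coefficient and a negative group 4 coefficient, and the likelihood-ratio test rejects the pooled specification.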

Model evaluation
To evaluate our models, we split our data into two non-overlapping sets across movies: the classification set and the prediction set. The classification set contains the 286 movies released from 2015 to 2017, of which 111 (38.8%) are successful, while the prediction set contains the 162 movies released from 2018 to 2019, of which 90 (55.5%) are successful. Note here that both sets are reasonably balanced. We use the classification set to obtain our model estimates. Holding the movie critic groups fixed, we then use these estimates to examine whether we can accurately predict the outcome of the movies in the prediction set. The fact that our movie critic groups do not change over time ensures that, if a harbinger effect indeed exists, harbinger critics repeatedly get it wrong.
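The temporal split can be sketched as follows (the record schema is hypothetical):

```python
def split_by_year(movies):
    """movies: list of dicts with a 'year' key (hypothetical schema).
    Classification set: 2015-2017 releases; prediction set: 2018-2019."""
    classification = [m for m in movies if 2015 <= m["year"] <= 2017]
    prediction = [m for m in movies if 2018 <= m["year"] <= 2019]
    return classification, prediction

catalog = [{"year": y} for y in (2015, 2016, 2017, 2018, 2019)]
train, test = split_by_year(catalog)
# len(train) -> 3, len(test) -> 2
```

Splitting by release year, rather than at random, mimics the real forecasting task: the model only ever predicts movies released after those it was estimated on.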
To measure the performance of our models in the prediction set, we report the AUCPR score. The primary reason we chose this metric is that AUCPR ignores true negatives and is much more sensitive to true positives, false positives, and false negatives than AUC. In our case, a true negative corresponds to the case where a movie critic predicted that a movie would be a flop and it turned out to be a flop; hence, it is not relevant to the harbinger effect. We also report the F1 score to evaluate the robustness of the findings.
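Both metrics are available in scikit-learn; a small sketch with made-up outcomes and scores:

```python
from sklearn.metrics import average_precision_score, f1_score

# Hypothetical outcomes (1 = success) and model success probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.90, 0.40, 0.70, 0.60, 0.20, 0.55, 0.80, 0.10]
y_pred = [int(p >= 0.5) for p in y_prob]

# average_precision_score summarizes the precision-recall curve (AUCPR);
# unlike ROC AUC, it never uses true negatives
aucpr = average_precision_score(y_true, y_prob)
f1 = f1_score(y_true, y_pred)
```

Here the probability ranking is perfect (every success scores above every flop), so AUCPR is 1.0, while the single false positive at the 0.5 threshold gives F1 = 8/9 ≈ 0.89.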

Harbinger effect
We present the results of our models in Table 2. The coefficients of interest in models 1 and 2 are estimated using the 286 movies in the classification set. We observe that model 2 significantly improves over model 1 in goodness of fit, as shown by the chi-square test on the likelihood ratio of the two models. Moreover, the coefficient estimates of group 1 and group 4 in model 2 turn out to be statistically significant. This means that reviews from these two critic groups are informative regarding the financial outcome of a movie. However, group 1 has positive marginal effects while group 4 has negative ones, indicating the existence of harbingers of failure among the movie critics. More specifically, the expected change in log odds for a one-unit increase in the reviews of group 1 is 0.59. Equivalently, a one-unit increase in the reviews of group 1 increases the odds of a movie being successful by 80%. On the other hand, the expected change in log odds for a one-unit increase in the reviews of group 4 is −0.47. In other words, a one-unit increase in the reviews of group 4 decreases the odds of success by about 37%. We should note here that the addition of the control variables shown in Appendix A does not change the results. Table 2 also shows the performance of our models in the prediction set. The AUCPR of model 2 is 0.70 compared to 0.56 for model 1, a relative improvement of 25%. Moreover, the F1 score rises substantially from 0.22 to 0.57, a relative improvement of 159%. As a robustness check, we replicate the analysis to account for sleeper movies and report the complete findings in Appendix C. The pattern of findings is unchanged. Furthermore, the coefficients of group 1 and group 4 are now larger in absolute size, indicating a stronger harbinger effect. The expected change in log odds for a one-unit increase in the reviews of group 4 is now −0.57, or equivalently, it decreases the odds of success by about 43%.
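The conversions between log-odds coefficients and percentage changes in odds quoted above follow from exponentiating the coefficient; a quick check (the function name is ours):

```python
import math

def odds_change_pct(coef):
    """Percent change in the odds of success per one-unit increase in a
    covariate with logistic regression coefficient `coef`."""
    return (math.exp(coef) - 1) * 100

# Group 1: coefficient 0.59 -> roughly +80% odds of success
# Group 4: coefficient -0.47 -> roughly -37% odds of success
group1 = odds_change_pct(0.59)
group4 = odds_change_pct(-0.47)
```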
Last, we also investigate whether the harbinger effect is symmetric in the case of movies and find strong evidence that this is the case. More specifically, there are film critics who review a movie negatively and the movie then turns out to be successful. We report the complete findings in Appendix D. We observe in Table 11 that both the coefficient estimates of group 1 and group 4 in model (D2) turn out to be statistically significant. Group 1 in this case has negative marginal effects, while group 4 has positive ones. The latter indicates that the more negative reviews given by high success-avoidance critics, the more likely it is that the movie will be a success. Specifically, the expected change in log odds for a one-unit increase in the negative reviews of group 4 is 0.75. In other words, a one-unit increase in the negative reviews of group 4 increases the odds of the movie being successful by 111%.

Writing style of harbingers
We use the Linguistic Inquiry and Word Count (LIWC) text analytics software to investigate whether there are any differences in writing style between harbinger and non-harbinger movie critics. The comparison revealed some interesting differences in the writing style of their positive and negative reviews, which are summarized in Table 3.
First, with regard to positive reviews, we find that harbinger movie critics exhibit a much lower rate of self-reference pronouns compared to non-harbinger critics. This indicates that harbinger critics show overconfidence in their abilities, while non-harbinger critics convey more confidence and leadership through their reviews, providing support for our hypotheses 3 and 4, respectively. Moreover, harbinger critics have a very low rate of fillers, which, in combination with the low rate of self-reference pronouns, indicates that they tend to employ a formal writing style. This provides support for our hypothesis 2 that harbinger critics are more concerned with status and power, and are less self-reflective about the audience's opinion. With respect to negative reviews, we observe an additional pattern. Harbinger critics tend to employ a combination of analytical and formal types of writing. Analytical writing seeks to go beyond the mere presentation of facts by providing substantive analysis and evaluation of a topic. This type of writing, however, increases cognitive complexity and might backfire when writing a negative movie review, as it can potentially create confirmation bias among the harbinger critics. More specifically, they might use analytical thinking to list all the potential factors of failure of a movie and write their review by sticking to information that confirms their preconceptions (Nickerson, 1998).
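LIWC is proprietary, but the self-reference measure it reports is essentially the share of word tokens falling in a pronoun category. A rough stand-in (the word list below is a small subset of LIWC's actual dictionary):

```python
import re

# First-person singular pronouns ("I-words")
SELF_PRONOUNS = {"i", "me", "my", "mine", "myself"}

def self_reference_rate(text):
    """Share of word tokens that are first-person singular pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in SELF_PRONOUNS for token in tokens) / len(tokens)

# "I found the pacing brilliant" -> 1 of 5 tokens is an I-word (rate 0.2)
```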
Motivated by the above findings, we investigate whether they still hold when we stratify across critic status category. The results for positive reviews are presented in Table 4 and for negative reviews in Table 12; both tables report the average percentage of each writing style metric and t-test results for harbinger and non-harbinger movie critics, stratified by status category and review sentiment. We find that non-harbinger critics use more self-reference pronouns in their positive reviews than harbinger critics regardless of status category, but this difference becomes even stronger in the case of top critics. This indicates that non-harbinger critics are more self-reflective overall compared to harbinger critics, who are overconfident in their abilities when writing a positive review. What is more, in the top-critic category, harbinger critics use fewer adjectives and more adverbs, two features that have been shown to be associated with extraversion (Dewaele & Furnham, 1999). Hence, we find support for our hypothesis 1 that top harbinger critics are over-optimistic in their positive reviews. With respect to negative reviews, we find that harbinger critics write in an analytical and formal style, whereas non-harbinger critics engage in a more narrative style of writing, and this difference becomes much stronger for the top-critic category.

Conclusions and discussion
Predicting a movie's box office revenue is one of the most fundamental needs in the motion picture industry. This becomes even more challenging when one tries to predict the opening weekend box office revenue, since there is little to no information available about the wider audience's reaction. As Cabral and Natividad (2016) show in their work, doing well at the box office during the opening weekend has an economically and statistically significant effect on the movie's eventual performance. Prior research has explored how different attributes of a movie, such as star inclusion (Elberse, 2007; Karniouchina, 2011; Liu et al., 2014), the activity level of editors and viewers of the movie's corresponding entry in Wikipedia (Mestyán et al., 2013), and competition among movies that are released at the same time (Ainslie et al., 2005; Hennig-Thurau et al., 2007; Delre et al., 2016), could predict box office performance. With regard to online movie reviews, Moon et al. (2010) show that movie ratings from professional critics and viewers' communities are predictive of total box office revenue. Moreover, Basuroy et al. (2003) find that both positive and negative reviews are significantly correlated with box office revenue, with the impact of negative reviews (but not positive reviews) diminishing over time. We diverge from this research and propose that a distinction should be made among movie critics because, as we demonstrate, not all positive (negative) reviews are a signal for success (failure). More specifically, we combine three different data sources to empirically investigate the harbinger of failure phenomenon in the motion picture industry. We analyze the pre-release reviews that film critics write about movies and find that harbingers of failure do exist. Their positive pre-release reviews provide a strong predictive signal that the movie will turn out to be a flop. Moreover, we find the harbinger effect to be symmetric.
That is, there are harbinger critics who give negative reviews and the movie turns out to be successful. These findings shed new light on the influencer-predictor hypothesis. We document that, in a pre-release setting, there is a portion of movie critics who neither influence nor correctly predict the outcome of a movie. On the contrary, the outcome of the movie turns out to be the exact opposite of their prediction.
We further analyze the writing style of film critics and connect it to potential cognitive biases that might give rise to the harbinger phenomenon. We find that harbinger critics engage in an analytical and formal style of writing and have a lower rate of self-reference pronouns compared to non-harbingers. These differences indicate that harbinger critics are less self-reflective about the audience's opinion compared to non-harbingers and tend to show overconfidence in their abilities. When we stratify the analysis across the critic status category, we find the aforementioned differences between harbingers and non-harbingers to be even more pronounced. In the cohort of top critics, which usually consists of professional and experienced reviewers, we find that top critic harbingers also use more adverbs than their counterparts in their positive reviews, indicating that they are over-optimistic in their assessments of movies. In the case of negative reviews, we find that top critic harbingers are significantly more analytical than their counterparts.
Our findings have important managerial implications for the motion picture industry and its key channel entities: movie studios, distribution companies, and movie theaters. First, our research provides a methodology based on pre-release film reviews that allows movie producers and distributors to identify early on which movies are going to perform badly. This will in turn allow them to make better pre-launch marketing decisions and save significant marketing costs on flop movies. It will also allow theaters to better allocate their theater space, a finite resource that is crucial to the success of theaters. Second, film studios can greatly benefit from identifying the harbinger critics and using them during the early stages of production. More specifically, movie studios can use harbinger critics to select the scripts that will maximize their box office revenue instead of relying on mere guesswork. Our approach complements that of Eliashberg et al. (2007), which uses natural-language processing to select winning scripts. Third, our approach can potentially serve as a diagnostic for reviewers across fields. It is crucial for companies that employ reviewers to know whether their reviewers' opinions can be used as a diagnostic tool to determine success or failure. Based on that, companies might want to reclassify who they designate as a "top reviewer" or create a new class of reviewers.
We should acknowledge here the limitations of this work and present potential future research avenues. The main limitation of our research, similar to Anderson et al. (2015) and Simester et al. (2019), is that we do not provide a causal explanation about where the harbingers' preferences are coming from. However, we do provide insights into their personality traits and cognitive biases based on their writing style. This paves new avenues for further experimental behavioral research about the underlying mechanisms of the harbinger phenomenon. A second potential limitation of our research is that movie producers and distributors could potentially strategically pick which critics to invite to their pre-release screenings. However, there is no substantial evidence that this is happening in the movie industry, as it would jeopardize the reputation of, and confidence in, movie studios and movie critics alike. Last, another possibility, unobserved to the researcher, is that critics are getting influenced by other critics before they submit their review by either reading their reviews or talking to them. Future research might include further contextual variables, such as the choice of movies to review by harbingers, the timing of the reviews, and the lack of learning, to further the theoretical understanding of the harbinger phenomenon in the motion picture industry.
To conclude, does the positive (negative) feedback given by film critics before the release of a movie signal its financial failure (success) rather than its success (failure), as currently believed? Our findings document that this is not always the case. At least in a pre-release movie setting, a distinction should be made among film critics because of the existence of harbinger critics, whose endorsement of a movie is a signal of the opposite outcome.

In addition to the flop affinity ratio, we group critics based on their F1 score. The F1 score originates from machine learning and measures a model's accuracy by taking the harmonic mean of precision and recall. That is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

In our setting, the F1 score can serve as a measure of a movie critic's taste relative to the public's taste. Specifically, precision is the number of correctly predicted positive cases divided by the total number of positive cases identified by the model, including both correct and incorrect predictions. Recall has the same numerator as precision, but its denominator is the number of true positive cases in reality. If a movie critic is construed as a predictive model, the critic's precision is the number of successful movies positively reviewed divided by the total number of movies positively reviewed, and the critic's recall is the number of successful movies positively reviewed divided by the number of actually successful movies the critic reviewed. The F1 score can therefore be rewritten as:

F1 = 2TP / (2TP + FP + FN)

where TP is the number of successful movies reviewed positively, FP is the number of flop movies reviewed positively, and FN is the number of successful movies reviewed negatively.
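The critic-as-classifier computation above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the input format (a list of (review_positive, movie_successful) pairs per critic) is an assumption made for the example.

```python
def critic_f1(reviews):
    """F1 score of a critic treated as a binary predictor of movie success.

    `reviews` is a hypothetical list of (review_positive, movie_successful)
    boolean pairs, one per movie the critic reviewed.
    """
    tp = sum(1 for pos, succ in reviews if pos and succ)      # successes reviewed positively
    fp = sum(1 for pos, succ in reviews if pos and not succ)  # flops reviewed positively
    fn = sum(1 for pos, succ in reviews if not pos and succ)  # successes reviewed negatively
    if 2 * tp + fp + fn == 0:
        return 0.0  # critic made no positive predictions and saw no successes
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    return 2 * tp / (2 * tp + fp + fn)

# Example: three positive reviews (two on successes, one on a flop),
# plus one negative review on a movie that succeeded.
reviews = [(True, True), (True, True), (True, False), (False, True)]
print(critic_f1(reviews))  # 2*2 / (2*2 + 1 + 1) = 4/6 ≈ 0.667
```

Note that a harbinger critic, who positively reviews flops and negatively reviews successes, accumulates FP and FN and is pushed toward a low F1 score, which is exactly what the quartile grouping below exploits.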
After computing the F1 score for each critic, we group the critics using the F1 score quartile cutoffs. Under this approach, the groups are distributed more evenly, and group 1 contains the largest number of harbingers of failure: their incorrect predictions inflate the denominator of the F1 formula, leading to lower F1 scores. We then estimate models 1 and 2 as in the flop affinity approach. A chi-square test on the likelihood ratio of models 1 and 2 shows that the improvement in model 2's fit is significant. The model results are summarized in the table below.
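Quartile-cutoff grouping of this kind can be sketched as follows. This is an illustrative helper under assumed names, not the paper's implementation; in particular, the tie-breaking at the cutoffs is a choice made for the example.

```python
import statistics

def group_by_f1_quartiles(f1_by_critic):
    """Assign each critic to one of four groups using F1 quartile cutoffs.

    Group 1 holds the lowest-F1 quartile (where harbingers concentrate);
    `f1_by_critic` is a hypothetical dict mapping critic -> F1 score.
    """
    q1, q2, q3 = statistics.quantiles(sorted(f1_by_critic.values()), n=4)

    def group(f1):
        if f1 <= q1:
            return 1
        if f1 <= q2:
            return 2
        if f1 <= q3:
            return 3
        return 4

    return {critic: group(f1) for critic, f1 in f1_by_critic.items()}

scores = {"a": 0.10, "b": 0.30, "c": 0.55, "d": 0.80}
print(group_by_f1_quartiles(scores))  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```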

Appendix C. Sleeper movies
To evaluate the robustness of our results, we rerun the analysis to account for sleeper movies, i.e., movies that tanked at the opening box office but became a success later on. Using overall domestic box-office revenue data, we find that 20 of the 448 movies in our dataset are sleeper movies. We re-classify movie critics into five groups based on their flop affinity score; Table 8 shows all the relevant information on the critics' grouping. We re-estimate models 1 and 2 and present the results in Table 9. First, we observe that the pattern of findings remains unchanged. Moreover, the coefficients of group 1 and group 4 are now larger in absolute size, indicating a stronger harbinger effect. More specifically, the expected change in log odds for a one-unit increase in the reviews of group 1 and group 4 is now 0.66 and −0.57, respectively. Equivalently, a one-unit increase in the reviews of group 1 increases the odds of a movie being successful by about 93%, while a one-unit increase in the reviews of group 4 decreases the odds of success by about 43%.
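The conversion from a logistic-regression coefficient (a change in log odds) to a percentage change in odds used above is a one-line exponentiation; a quick sketch makes the arithmetic checkable:

```python
import math

def odds_change(coef):
    """Percentage change in the odds of success for a one-unit increase
    in a predictor, given its logistic-regression coefficient (log-odds)."""
    return (math.exp(coef) - 1) * 100

# Coefficients from the sleeper-movie reanalysis reported in the text:
print(round(odds_change(0.66)))   # +93  (group 1: odds of success up ~93%)
print(round(odds_change(-0.57)))  # -43  (group 4: odds of success down ~43%)
```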

Appendix D. Success avoidance
We investigate here whether the harbinger effect is symmetric. To do this, we group critics based on their success avoidance score. For each movie critic i, we calculate the proportion of negatively reviewed movies that turned out to be successful over the critic's total number of negatively reviewed movies. That is:

Success Avoidance_i = Number of Successful Movies Reviewed Negatively_i / Number of Movies Reviewed Negatively_i

Following this definition, we obtain the grouping of critics shown in Table 10. Our two competing models are modified as follows:

Model 1: log(Odds of Success of Movie i) = α + β0 TotalNegativeReviews_i (D1)

Model 2: log(Odds of Success of Movie i) = α + β1 Group1NegativeReviews_i + β2 Group2NegativeReviews_i + β3 Group3NegativeReviews_i + β4 Group4NegativeReviews_i (D2)

Observing the results in Table 11, we can conclude that the harbinger effect is symmetric.
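The success avoidance score is the negative-review mirror of the flop affinity ratio and can be sketched the same way as the F1 computation. This is a minimal illustration under the same assumed input format (a list of (review_positive, movie_successful) pairs), not the authors' code:

```python
def success_avoidance(reviews):
    """Share of a critic's negatively reviewed movies that succeeded.

    `reviews` is a hypothetical list of (review_positive, movie_successful)
    boolean pairs. A high score marks a critic whose pans tend to land on
    movies that turn out to be hits.
    """
    negatives = [succ for pos, succ in reviews if not pos]
    if not negatives:
        return None  # undefined for critics with no negative reviews
    return sum(negatives) / len(negatives)

# Four negative reviews, three of them on movies that ended up successful,
# plus one positive review (ignored by the score).
reviews = [(False, True), (False, True), (False, True), (False, False), (True, True)]
print(success_avoidance(reviews))  # 3/4 = 0.75
```

The paper's grouping in Table 10 then bins critics on this score, analogous to the flop affinity grouping for positive reviews; the symmetry check is whether the group coefficients in Model 2 mirror those of the positive-review analysis.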