Write better, publish better

There is evidence that more readable abstracts and introductions help authors get cited. I show that, in economics, readability also affects the probability of publishing in a Top 5 journal (and, more generally, in a higher-ranked journal). I compute readability measures for a set of working papers and examine the journals in which they are published. My results suggest that previous estimates of the effect on citations are downward biased, as higher-ranked journals are more widely read and cited.


Introduction
In The Tyranny of the Top Five Journals, James Heckman and Sidharth Moktan state that "Without doubt, publication in the T5 [Top 5] is a powerful determinant of tenure and promotion in academic economics". They discuss the consequences of this state of the profession and propose solutions. Unfortunately, until the tyranny is brought down, publishing in the top five journals will remain of great importance for researchers, especially at the beginning of their careers. In this article, I ask whether writing clearly increases the likelihood of publishing in one of them.
Relying on formal readability measures, previous studies have shown that readability has a positive effect on citations (Dowling et al. 2018; McCannon 2019), on winning awards (Sawyer et al. 2008) and on downloads (Guerini et al. 2012). Journals also benefit: as shown by Richardson (1977) and Swanson (1948), more readable academic journals enjoy larger readerships.
Complementing Dowling et al. (2018) and McCannon (2019), who work with the abstracts from Economics Letters and the introductions of the American Economic […]

Methodology
As Dowling et al. (2018) argue, the abstract is the most widely read part of an article. Furthermore, there is some evidence of a correlation between the style of the abstract and that of other sections of a paper (Hartley et al. 2003). As long as this correlation holds in economics, studying the abstracts is justified. In the rest of the paper, I focus on the abstracts.
I scrape the abstracts of working papers from the NBER website for the period 1998-2019. In the analysis, however, I restrict the sample to working papers issued before 2015, since later ones might still be in the publication process. NBER working papers can only be uploaded by NBER members, which allows me to control for researcher quality, as members belong to the top institutions in the US. After cleaning the data, the final sample consists of 9757 working papers written by 7621 authors (Marino Fages 2019).
Linguistics offers a myriad of measures of writing quality, ranging from basic readability indices (based on the number of syllables per word, words per sentence, complex words and so on) to the use of adjectives or adverbs (Okulicz-Kozaryn 2013) and linguistic complexity (Lu et al. 2019). In this paper, I focus on traditional readability measures and use the terms "higher readability" and "better-written" interchangeably. However, the reader should note that these measures capture only some aspects of writing quality.
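Since there are hundreds of readability measures, I follow Hengel (2017), who argues that the most widely used, tested and reliable measures are Flesch Reading Ease (FRES), Flesch-Kincaid (FKS), Gunning Fog (FOG), the Simple Measure of Gobbledegook (SMOG) and Dale-Chall (DCS). The exact formulae are the following:

$$\text{FRES} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}$$
$$\text{FKS} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59$$
$$\text{FOG} = 0.4\left[\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{complex words}}{\text{words}}\right]$$
$$\text{SMOG} = 1.0430\,\sqrt{30\,\frac{\text{complex words}}{\text{sentences}}} + 3.1291$$
$$\text{DCS} = 0.1579\left(100\,\frac{\text{difficult words}}{\text{words}}\right) + 0.0496\,\frac{\text{words}}{\text{sentences}}$$

where a complex word is defined as one with three or more syllables, excluding common endings (e.g. '-ing'), and difficult words are those not found on a list of 3000 words understood by 80 percent of fourth-grade readers (aged 9-10).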
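As an illustration, the five scores can be computed from a handful of text counts. The following is a minimal sketch that uses the standard published constants. This is an assumption: the paper's own implementation may differ in details. It takes the counts as given. Counting syllables, complex words and difficult words (which requires the Dale-Chall word list) is out of scope here. SMOG is computed from the same complex-word count, and the 3.6365 adjustment that the original Dale-Chall formula adds when more than 5% of words are difficult is omitted. Both are assumptions. `TextCounts` and `readability_scores` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextCounts:
    """Raw counts for one abstract (hypothetical container; counting is not shown)."""
    words: int
    sentences: int
    syllables: int
    complex_words: int    # three or more syllables, excluding common endings
    difficult_words: int  # words not on the Dale-Chall list of familiar words


def readability_scores(c: TextCounts) -> dict[str, float]:
    """Standard formulas: a higher FRES means easier text; higher values of the others mean harder text."""
    wps = c.words / c.sentences  # average sentence length
    spw = c.syllables / c.words  # average syllables per word
    return {
        "FRES": 206.835 - 1.015 * wps - 84.6 * spw,
        "FKS": 0.39 * wps + 11.8 * spw - 15.59,
        "FOG": 0.4 * (wps + 100 * c.complex_words / c.words),
        # Assumption: SMOG uses the complex-word count as its polysyllable count.
        "SMOG": 1.0430 * math.sqrt(30 * c.complex_words / c.sentences) + 3.1291,
        # Assumption: no 3.6365 adjustment for texts with >5% difficult words.
        "DCS": 0.1579 * (100 * c.difficult_words / c.words) + 0.0496 * wps,
    }


# Hypothetical counts for a 100-word, five-sentence abstract.
example = TextCounts(words=100, sentences=5, syllables=170, complex_words=20, difficult_words=25)
print(readability_scores(example))
```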
Since readability increases with the FRES but decreases with the other measures, I multiply all the others by −1 so that a larger score always indicates better readability. After this transformation, all measures correlate strongly and positively. Table 1 presents the means and standard deviations of the main variables for the full sample and for the Top 5 journals (see "Appendix"). The Top 5 journals represent 17% of the sample, score higher on all readability measures and have fewer single-authored papers. The last two variables are the Scimago Journal Rank Indicator (1999) and the Impact Factor (1997); in both cases, a higher value indicates a better journal. I use these years to prevent the indices from being affected by the papers in the sample. The correlation between the Scimago index and the Impact Factor is 0.82. Table 2 reports the mean readability scores, the proportion of single-authored papers, the Scimago index and the Impact Factor separately for each Top 5 journal. Econometrica scores comparatively low on all readability measures, which narrows the gap between the Top 5 and full-sample averages in Table 1.

Descriptive statistics
I also include Economics Letters because it allows me to compare the abstracts of working papers with those of their published versions. Using the published versions (from 2003 to 2012), Dowling et al. (2018) obtain FRES = 42.46, FOG = −16.08 and SMOG = −14.20. All three scores are better than those of the working-paper versions, which is consistent with the submission process improving readability.

Analysis
In Table 3 (columns 1-5), I use OLS to estimate a linear probability model of Top 5 publication on each of the readability measures:

$$\text{Top5}_i = \alpha + \beta\,\text{ReadabilityScore}_i + \gamma'\,\text{Controls}_i + \varepsilon_i$$

where $\text{Top5}_i = 1$ if working paper $i$ is published in a Top 5 journal, so that the fitted value estimates $\Pr[\text{Top5}_i = 1]$. Since the results barely change across specifications, I report only the full specification, which controls for JEL codes, number of pages, number of coauthors and its square, and dummy variables for the working paper's year and the year of publication.
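A minimal numpy sketch of such a linear probability model, run on synthetic data, is shown below. All numbers are hypothetical. For brevity, the controls are only coauthors, its square and working-paper-year dummies. JEL codes, pages and publication-year dummies would enter the same way, as extra columns. The source does not say which standard errors were used. The heteroskedasticity-robust (HC1) errors below are an assumption, chosen because the error variance of a linear probability model depends on the regressors by construction. `lpm_ols` is a hypothetical helper.

```python
import numpy as np


def lpm_ols(y, score, controls):
    """OLS of a 0/1 outcome on a readability score plus controls.

    Returns (coefficient on the score, HC1 robust standard error).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), score, controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread * n / (n - k)  # HC1 small-sample correction
    return float(beta[1]), float(np.sqrt(cov[1, 1]))


# Synthetic example with hypothetical numbers (not the paper's data).
rng = np.random.default_rng(0)
n = 50_000
score = rng.normal(0.0, 5.0, n)       # a centred readability score
coauthors = rng.integers(1, 5, n)     # 1 to 4 authors
year = rng.integers(1998, 2015, n)    # working-paper year, 1998-2014
year_dummies = (year[:, None] == np.arange(1999, 2015)).astype(float)  # 1998 is the base
controls = np.column_stack([coauthors, coauthors**2, year_dummies])
p = np.clip(0.30 + 0.01 * score + 0.02 * (coauthors - 2), 0, 1)  # true slope: 0.01
top5 = rng.random(n) < p

b, se = lpm_ols(top5, score, controls)
print(f"coefficient on score: {b:.4f} (robust SE {se:.4f})")
```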
Almost all regressions show a significant positive effect of readability on the probability of publishing in a Top 5 journal, ranging between 0.1 and 0.6 percentage points per point of the score. This is not a small effect, given the standard deviations in Table 1 and the fact that the scores capture nothing about the papers' actual scientific contribution. For instance, a one-standard-deviation increase in the FRES is associated with a 1.16 percentage-point increase in the probability of being published in a Top 5 journal.
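Because the model is linear, the per-standard-deviation figure follows directly from the per-point coefficient. Let $\hat\beta_{\text{FRES}}$ denote the estimated FRES coefficient and $\sigma_{\text{FRES}}$ its sample standard deviation (Table 1). The calculation is then:

```latex
\Delta \Pr[\text{Top5} = 1] = \hat\beta_{\text{FRES}} \times \sigma_{\text{FRES}} = 1.16 \text{ percentage points}
```

The same scaling makes coefficients comparable across the five measures, whose ranges differ widely.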
The number of coauthors has a positive effect on Top 5 publication that diminishes as the number of coauthors grows (consistent with Hollis 2001). In unreported results, I also find that the number of coauthors is strongly associated with better readability scores, again with diminishing returns. As a robustness check, I standardize the five measures and take their average. The full regression with JEL codes gives a coefficient of 0.019 (p-value = 0.143), which means that a one-standard-deviation increase in this composite measure is associated with a 1.9 percentage-point increase in the probability of being published in a Top 5 journal.
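The composite measure can be sketched as follows: z-score each sign-oriented measure across papers, then average the z-scores within each paper. The scores are hypothetical. The use of the sample (rather than population) standard deviation is an assumption, since the paper does not specify it. `composite_index` is a hypothetical helper.

```python
from statistics import mean, stdev


def composite_index(rows: list[dict[str, float]], measures: list[str]) -> list[float]:
    """Z-score each sign-oriented measure across papers, then average per paper."""
    z = {}
    for m in measures:
        column = [row[m] for row in rows]
        mu, sd = mean(column), stdev(column)
        z[m] = [(x - mu) / sd for x in column]
    return [mean(z[m][i] for m in measures) for i in range(len(rows))]


# Hypothetical, already sign-oriented scores (higher = more readable).
rows = [
    {"FRES": 50.0, "FOG": -14.0},
    {"FRES": 40.0, "FOG": -16.0},
    {"FRES": 30.0, "FOG": -19.0},
]
index = composite_index(rows, ["FRES", "FOG"])
print([round(v, 3) for v in index])
```

Unless its components are perfectly correlated, an average of z-scores has a standard deviation below one. A coefficient on the raw average is therefore per unit of the index. Re-standardizing the average makes it exactly per standard deviation.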
As a second robustness check, in Table 3 (columns 6-10), I restrict the sample to working papers published in highly ranked journals (see "Appendix"). The coefficients do not change significantly and, if anything, become slightly larger. However, the coefficient on the number of coauthors is no longer significant.
Next, I extend the analysis to check whether readability is associated with the ranking of the journal more generally (i.e. not only Top 5). For this, I use two measures. First, I use the Scimago Journal Rank index (Table 4). To prevent the papers in the sample from affecting the index, I use the earliest available year (1999) and restrict the sample to papers issued from 2000 onwards. Again, I only present the full specification, controlling for JEL codes, number of pages, number of coauthors and its square, and dummy variables for the working paper's year and the year of publication. For all measures, readability has a significant effect on the Scimago index, whereas the number of coauthors has no effect. Second, I use the Impact Factor from 1997. In this case, I do not need to drop the years before 2000, but the sample is smaller because some journals have no Impact Factor. The coefficients are all positive and significant at the 1% level (the results are available in the do file). Here, the number of coauthors has a negative effect.
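The sample restrictions for the two journal-rank regressions can be expressed as a single rule. A paper is kept if it was issued after the index year, so that it cannot have influenced the index, and before 2015. Its journal must also have a value of the index. The sketch below is minimal. The `Paper` type, the `analysis_sample` helper, the journal names and the index values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    wp_year: int  # year the NBER working paper was issued
    journal: str  # journal in which it was eventually published


def analysis_sample(papers, index_by_journal, index_year, last_wp_year=2014):
    """Pair each eligible paper with its journal's ranking index.

    A paper is kept only if it was issued after `index_year` (so it cannot
    have influenced the index), no later than `last_wp_year` (later papers may
    still be in the publication process), and its journal has an index value.
    """
    sample = []
    for paper in papers:
        if not index_year < paper.wp_year <= last_wp_year:
            continue
        value = index_by_journal.get(paper.journal)
        if value is None:  # journal not covered by this index
            continue
        sample.append((paper, value))
    return sample


# Hypothetical papers and index values, for illustration only.
papers = [
    Paper(1998, "Journal A"),
    Paper(2001, "Journal A"),
    Paper(2003, "Journal B"),
    Paper(2016, "Journal A"),
]
sjr_1999 = {"Journal A": 5.0, "Journal B": 2.0}
if_1997 = {"Journal A": 3.0}  # Journal B has no Impact Factor

sjr_sample = analysis_sample(papers, sjr_1999, index_year=1999)
if_sample = analysis_sample(papers, if_1997, index_year=1997)
print(len(sjr_sample), len(if_sample))
```

With the 1997 Impact Factor, papers from 1998 and 1999 survive the year filter. Journals without a value drop out instead, which mirrors the trade-off described above.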
Finally, instead of focusing on papers, I aggregate at the journal level, keeping the score of the median paper. This allows me to check whether better-written journals rank higher. Despite having only 54 observations (see the list in the "Appendix"), three out of five measures have positive and significant coefficients (Table 5). As a robustness check, I run the same regression using the 1997 Impact Factor as the dependent variable. All the coefficients are positive but, probably because of a lack of power (only 43 observations), none is statistically significant; the smallest p-value is 0.12 (results available in the do file).
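The journal-level aggregation amounts to a group-by with a median. A minimal sketch for one measure follows. The journal names, the scores and the `journal_medians` helper are hypothetical.

```python
from collections import defaultdict
from statistics import median


def journal_medians(papers: list[tuple[str, float]]) -> dict[str, float]:
    """Collapse paper-level (journal, score) pairs into one median score per journal."""
    by_journal: dict[str, list[float]] = defaultdict(list)
    for journal, score in papers:
        by_journal[journal].append(score)
    return {journal: median(scores) for journal, scores in by_journal.items()}


# Hypothetical FRES scores, for illustration only.
papers = [
    ("Journal A", 40.0), ("Journal A", 45.0), ("Journal A", 60.0),
    ("Journal B", 30.0), ("Journal B", 35.0),
]
print(journal_medians(papers))
```

When a journal has an even number of papers, `statistics.median` averages the two middle scores. If the score of an actual median paper is wanted, `statistics.median_low` returns one of the observed values instead.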

Discussion
In the conclusion of his paper, McCannon (2019) argues that his estimates might be downward biased because they do not take into account the selection of papers by journals. My results support this hypothesis. Using a set of papers that is fairly homogeneous in quality, I provide suggestive evidence that the quality of the writing, measured by formal readability scores, is associated with better publication outcomes. However, because I lack a credible identification strategy, I cannot claim that these relations are causal. In particular, reverse causality might be an issue if researchers invest more in the writing of papers they expect to be of higher quality. I find a positive and significant effect of better-written abstracts on the probability of being published in a Top 5 journal. The effect is sizeable, considering that the measures capture nothing about the papers' actual scientific contribution. More generally, I find that higher readability is associated with publication in a higher-ranked journal (even when restricting the sample to the very top). These results also hold at the journal level: journal readability correlates with the journal's Scimago index.
Finally, the number of collaborators on a paper has a positive and significant effect on the readability measures and on the probability of being published in a Top 5 journal; however, I find no effect on the Scimago index and even a negative effect on the Impact Factor.
Although the data do not allow me to explore the mechanisms, two explanations for my results are plausible. First, from a psychological perspective, papers that are easier to read may be received more favorably by editors and reviewers because they require less effort. Second, papers circulated as working papers can be cited before they are published. If editors care only about getting their journal cited, they may accept only working papers that are already highly cited. In this case, the effect of readability on the rank of the journal in which an article is published would come indirectly from the effect of readability on citations of the working paper. Contrasting these two mechanisms would be an interesting next step.