Write better, publish better

Abstract

There is evidence that more readable abstracts and introductions help authors get cited. I show that, in economics, readability also has an effect on the probability of publishing in a Top 5 journal (and in a higher-ranked journal in general). I compute readability measures for a set of working papers and examine the journals in which they get published. My results suggest that previous estimates of the effect on citations are downward biased, as higher-ranked journals are more widely read and cited.

Introduction

In The Tyranny of the Top Five Journals (Note 1), James Heckman and Sidharth Moktan state that “Without doubt, publication in the T5 [Top 5] is a powerful determinant of tenure and promotion in academic economics”. They discuss the consequences of this state of the profession and propose solutions. Unfortunately, until the tyranny is brought down, it will remain of great importance for researchers, especially at the beginning of their careers, to publish in the top five journals. In this article, I ask whether writing clearly increases the likelihood of publishing in one of them.

Relying on formal readability measures, previous studies have shown that readability has a positive effect on citations (Dowling et al. 2018; McCannon 2019), on awards (Sawyer et al. 2008) and on downloads (Guerini et al. 2012). Journals also benefit: as Richardson (1977) and Swanson (1948) show, more readable academic journals enjoy larger readerships.

Complementing Dowling et al. (2018) and McCannon (2019), who work with the abstracts from Economics Letters and the introductions of the American Economic Review, respectively, I focus on the abstracts of working papers published on the National Bureau of Economic Research (NBER) website. It must be noted that only affiliated members can publish working papers at the NBER. On the website one can read that “the Research Associates at NBER hold tenured positions at their home institutions” and that they “are the leading scholars in their fields”. There are two reasons to focus on these working papers. First, it allows me to study a broader spectrum of journals. Second, since the affiliated members come overwhelmingly from top institutions, the sample should be more homogeneous in terms of researcher abilities, resources, and so on, which helps mitigate the potential selection problem that would arise if the quality of the writing is correlated with the quality of the research.

Methodology

As Dowling et al. (2018) argue, abstracts are the most widely read portion of an article. Furthermore, there is some evidence of a correlation between the style of the abstract and that of other sections of a paper (Hartley et al. 2003; see Note 2). As long as this correlation applies in the economics context, studying the abstracts is justified. In the rest of the paper I focus on the abstracts.

I scrape abstracts of working papers from the website of the NBER for the period 1998–2019. However, in the analysis, I restrict the sample to working papers issued before 2015, since more recent ones might still be in the publication process (Note 3). NBER working papers can only be uploaded by its members, which allows me to control for researcher quality, as they belong to the top institutions in the US. After cleaning the data, the final sample consists of 9757 working papers written by 7621 authors (Marino Fages 2019).

In linguistics, there is a myriad of measures of writing quality, ranging from basic readability indices (length of syllables, words, sentences, number of complex words and so on) to the use of adjectives or adverbs (Okulicz-Kozaryn 2013) and linguistic complexity (Lu et al. 2019). In this paper, I focus on traditional readability measures and throughout the paper I use the terms “higher readability” and “better-written” interchangeably. However, the reader should notice that these measures cover only some aspects of the quality of writing.

Since there are hundreds of readability measures, I follow Hengel (2017), who claims that the most widely used, tested and reliable measures are the following: Flesch Reading Ease (FRES), Flesch–Kincaid (FKS), Gunning Fog (FOG), Simple Measure of Gobbledegook (SMOG) and Dale–Chall (DCS). The exact formulae are the following (Note 4):

$$\begin{aligned} \text{DCS}&=3.6365*\mathbf{1}\left[ \frac{\text{difficult}}{\text{words}}>0.05\right] +15.79*\left( \frac{\text{difficult}}{\text{words}}\right) +0.0496*\left( \frac{\text{words}}{\text{sentences}}\right) \\ \text{FRES}&=206.835-1.015*\left( \frac{\text{words}}{\text{sentences}}\right) -84.6*\left( \frac{\text{syllables}}{\text{words}}\right) \\ \text{FKS}&=-15.59+0.39*\left( \frac{\text{words}}{\text{sentences}}\right) +11.8*\left( \frac{\text{syllables}}{\text{words}}\right) \\ \text{FOG}&=0.4*\left[ \left( \frac{\text{words}}{\text{sentences}}\right) +100*\left( \frac{\text{complex}}{\text{words}}\right) \right] \\ \text{SMOG}&=3.1291+1.043*\sqrt{\text{complex}*\frac{30}{\text{sentences}}} \end{aligned}$$

Here \(\mathbf{1}[\cdot ]\) equals 1 if the condition in brackets holds and 0 otherwise.

where a complex word is defined as one with three or more syllables excluding common endings (e.g. ‘-ing’) and difficult words are those not found on a list of 3000 words understood by 80 percent of fourth-grade readers (aged 9–10).

Since readability has a direct relation to the FRES but an inverse relation to the other measures, I multiply all the others by − 1 such that a larger score indicates better readability. After the transformation, all measures correlate strongly and positively.
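The formulae and the sign adjustment can be illustrated with a minimal sketch. It assumes the standard published constants (11.8 on the syllable term of Flesch–Kincaid and the 3.6365 Dale–Chall adjustment) and takes raw counts as inputs; the function name `readability_scores` and the example counts are hypothetical, and this is not the code used to produce the results in the paper.

```python
import math


def readability_scores(words, sentences, syllables, complex_words, difficult_words):
    """Return the five scores, sign-adjusted so that larger means more readable.

    All arguments are raw counts for one abstract. Counting syllables and
    matching words against the Dale-Chall list is left to a text library.
    """
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    share_complex = complex_words / words
    share_difficult = difficult_words / words

    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fks = -15.59 + 0.39 * words_per_sentence + 11.8 * syllables_per_word
    fog = 0.4 * (words_per_sentence + 100 * share_complex)
    smog = 3.1291 + 1.043 * math.sqrt(complex_words * 30 / sentences)
    # Dale-Chall adds 3.6365 only when more than 5% of the words are difficult.
    adjustment = 3.6365 if share_difficult > 0.05 else 0.0
    dcs = adjustment + 15.79 * share_difficult + 0.0496 * words_per_sentence

    # FRES already rises with readability; the other four rise with
    # difficulty, so they are multiplied by -1.
    return {"FRES": fres, "FKS": -fks, "FOG": -fog, "SMOG": -smog, "DCS": -dcs}


# Hypothetical counts for a 100-word, 5-sentence abstract.
scores = readability_scores(words=100, sentences=5, syllables=150,
                            complex_words=10, difficult_words=8)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Working from counts keeps the sketch independent of any particular syllable counter or word list, which is where implementations tend to differ.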

Descriptive statistics

Table 1 presents the means and standard deviations of the main variables for the full sample and the Top 5 journals (see “Appendix”). Papers published in the Top 5 journals represent 17% of the sample, have higher readability scores in all measures and include fewer single-authored papers. The last two variables are the Scimago Journal Rank Indicator (1999) and the Impact Factor (1997). In both cases, a higher value represents a better journal. I use these years to prevent the indexes from being affected by the papers in the sample (Note 5). The correlation between the Scimago Index and the Impact Factor is 0.82 (Note 6).

Table 1 Descriptive statistics

Table 2 summarizes the means of the readability scores, the proportion of single authors and the Scimago Index and Impact Factor for the Top 5 journals separately. Econometrica scores comparatively low in all measures, bringing the averages presented in Table 1 closer.

Table 2 Top 5 journals statistics

I also include Economics Letters, as it allows me to compare abstracts from working papers with the published version. Dowling et al. (2018), using the published version (from 2003 to 2012), obtain FRES = 42.46, FOG = −16.08 and SMOG = −14.20. All of them are better than in the working paper version, which is consistent with the submission process improving readability.

Analysis

In Table 3 (columns 1–5), I estimate by OLS a linear probability model (Note 7) of Top 5 on each of the readability measures: i.e. \(\text {Prob}[\text {Top}5=1]=\alpha +\beta *\text {ReadabilityScore}+\delta *\text {Controls}+\epsilon \). Since the results barely change across specifications, I only report the full specification regressions, where I control for JEL codes, number of pages, number of coauthors, its square, and dummy variables for the working paper’s year and the year of publication.
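As an illustration of the estimator only, the following sketch fits a linear probability model by OLS on synthetic data. It is a simplification under stated assumptions: there is a single regressor and no controls, the data are simulated rather than taken from the NBER sample, and the names `ols_slope_intercept`, `TRUE_ALPHA` and `TRUE_BETA` are hypothetical.

```python
import random


def ols_slope_intercept(x, y):
    """OLS of y on a constant and a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Synthetic data, NOT the NBER sample: the true slope is set to 0.02.
rng = random.Random(0)
TRUE_ALPHA, TRUE_BETA = 0.17, 0.02
readability = [rng.gauss(0.0, 1.0) for _ in range(20000)]
top5 = []
for score in readability:
    # Probability of a Top 5 publication, kept inside [0, 1].
    prob = min(max(TRUE_ALPHA + TRUE_BETA * score, 0.0), 1.0)
    top5.append(1 if rng.random() < prob else 0)

alpha_hat, beta_hat = ols_slope_intercept(readability, top5)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
```

Because the dependent variable is binary, the slope reads directly as a change in probability: with a standardized regressor, a slope of 0.02 corresponds to 2 percentage points per standard deviation.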

Table 3 OLS regressions of Top 5 on readability scores

Almost all regressions show a significant positive effect of readability on the probability of publishing in a Top 5 journal, ranging between 0.1 and 0.6 percentage points. This is not a small effect considering the standard deviations in Table 1 and the fact that it is not related to the actual scientific contribution of the papers. For instance, a one standard deviation increase in the FRES is associated with a 1.16 percentage point increase in the probability of being in a Top 5 journal.

The number of coauthors has a positive and decreasing effect on Top 5 (this is consistent with Hollis 2001). In non-reported results, I find that the number of coauthors is also strongly associated with better readability measures (again at a decreasing rate).

As a robustness check, I standardize the five measures and take their average. The full regression with the JEL codes gives a coefficient of 0.019 (p value = 0.143), which means that a one standard deviation increase in this global measure is associated with a 1.9 percentage point increase in the probability of being in a Top 5 journal.
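The construction of this global measure can be sketched as follows. This is a minimal illustration with made-up scores for four papers; the function names `standardize` and `composite_index` are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which version is used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert a list of scores to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def composite_index(measures):
    """Average the standardized measures paper by paper.

    `measures` maps a measure name to a list of sign-adjusted scores,
    one entry per paper, in the same paper order for every measure.
    """
    standardized = [standardize(scores) for scores in measures.values()]
    return [mean(paper_scores) for paper_scores in zip(*standardized)]


# Made-up scores for four papers (illustrative only).
measures = {
    "FRES": [30.0, 40.0, 50.0, 60.0],
    "FKS": [-16.0, -14.0, -12.0, -10.0],
    "FOG": [-20.0, -18.0, -17.0, -13.0],
    "SMOG": [-16.0, -15.0, -14.0, -13.0],
    "DCS": [-12.0, -11.0, -11.5, -10.0],
}
index = composite_index(measures)
print([round(value, 3) for value in index])
```

Standardizing before averaging puts the five measures on a common scale, so that none of them dominates the index simply because it has a wider range.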

As a second robustness check, in Table 3 (columns 6–10), I restrict the sample to working papers that were published in highly ranked journals (see “Appendix”). The effects do not change significantly and, if anything, get slightly stronger. However, the significance of the number of authors disappears.

Next, I extend the analysis to check whether readability is associated with the ranking of the journal in general (i.e. not only Top 5). For this, I use two measures. First, I use the Scimago Journal Rank Index (Table 4). To prevent the papers in the sample from affecting the index, I use the earliest available (1999) and restrict the sample to the papers issued from 2000 onwards. Again, I only present the full specification controlling for JEL codes, number of pages, number of coauthors and its square, and dummy variables for the working paper’s year and the year of publication. For all measures, readability has a significant effect on the Scimago index. On the other hand, the number of coauthors has no effect. Second, I use the Impact Factor from 1997. In this case I do not need to drop the years before 2000, but the sample is smaller because some journals have no Impact Factor. The coefficients are positive and significant at the 1% level in all cases (the results are available in the do file). The number of coauthors in this case has a negative effect.

Table 4 OLS regressions of Scimago index on readability scores

Finally, instead of focusing on papers, I aggregate at the journal level, keeping the score of the median paper. This allows me to check whether journals that are better written are higher in the ranking. In spite of having only 54 observations (see the list in the “Appendix”), three out of five measures are positive and significant (Table 5). As a robustness check, I run the same regression using the Impact Factor of 1997 as the dependent variable. All the coefficients are positive but, probably because of a lack of power (only 43 observations), none of them is statistically significant (Note 8). The smallest p value is 0.12 (results available in the do file).
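The aggregation step can be sketched as follows, assuming the paper-level data are available as (journal, score) pairs. The function name `median_score_by_journal`, the journal labels and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_score_by_journal(papers):
    """Collapse paper-level rows to one median readability score per journal.

    `papers` is an iterable of (journal, score) pairs.
    """
    scores = defaultdict(list)
    for journal, score in papers:
        scores[journal].append(score)
    return {journal: median(values) for journal, values in scores.items()}


# Made-up rows (illustrative only); journal names are placeholders.
papers = [
    ("Journal A", 41.0), ("Journal A", 45.0), ("Journal A", 39.0),
    ("Journal B", 35.0), ("Journal B", 37.0),
]
journal_scores = median_score_by_journal(papers)
print(journal_scores)
```

Using the median rather than the mean makes each journal's score less sensitive to a few unusually easy or unusually hard abstracts.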

Table 5 OLS regressions of Scimago index on median journal score

Discussion

In the conclusion of his paper, McCannon (2019) argues that his estimates might be downward biased because he is not taking into account the effect of the selection of papers by the journals. In this article, I confirm his hypothesis.

Using a fairly homogeneous set of papers in terms of quality, I provide suggestive evidence that the quality of the writing, measured by formal readability scores, is associated with better publications. However, because of the lack of a credible identification strategy, I cannot claim any causality in the relations. In particular, reverse causality might be an issue if researchers adapt the quality of the writing depending on their expectations of the quality of the papers.

I find a positive and significant effect of having better-written abstracts on the probability of being published in a Top 5 journal. The effect seems to be of great magnitude considering that the measures do not include anything related to the actual scientific contribution of the papers.

More generally, I find that higher readability is associated with a higher-ranked journal (even when restricting the sample to the very top). These results still hold at the journal level: a journal's readability correlates with its Scimago index.

Finally, the number of collaborators in a paper has a positive and significant effect on the readability measures and on the probability of publishing in a Top 5 journal. However, I find no effect on the Scimago index and even a negative effect on the Impact Factor.

Although the data do not allow me to explore the mechanisms, there are two possible reasons for my results. First, a psychological explanation would say that papers that are easier to read are received more favorably by editors and reviewers because they require less effort. Second (Note 9), if the papers are circulated as working papers, then they can already start being cited. In a model in which editors are only interested in getting the journal cited, they may only accept the already highly cited working papers. In this case, the effect of readability on the ranking of the journal in which the article gets published comes indirectly from the effect of readability on citations as a working paper. Contrasting these two mechanisms would be an interesting next step.

Notes

  1. Heckman and Moktan (2018).
  2. The study relies on papers from the Journal of Educational Psychology.
  3. Earlier working papers may also still be in the publication process; however, I consider this to be a reasonable cut-off.
  4. I compute the scores using textatistic 0.0.1 for Python (see https://pypi.org/project/textatistic/).
  5. For the analysis with the Scimago Journal Rank Indicator, I restrict the sample to working papers issued since 2000.
  6. For a comparison of the pros and cons of these indexes see Falagas et al. (2008).
  7. Results of probit models are similar.
  8. I get the same results if I use the Impact Factor of 1999 instead.
  9. I thank one of the referees for this insight.

References

  1. Dowling, M., Hammami, H., & Zreik, O. (2018). Easy to read, easy to cite? Economics Letters, 173, 100–103.
  2. Falagas, M. E., Kouranos, V. D., Arencibia-Jorge, R., & Karageorgopoulos, D. E. (2008). Comparison of SCImago journal rank indicator with journal impact factor. The FASEB Journal, 22(8), 2623–2628.
  3. Guerini, M., Pepe, A., & Lepri, B. (2012). Do linguistic style and readability of scientific abstracts affect their virality? In Sixth international AAAI conference on weblogs and social media (pp. 475–478).
  4. Hartley, J., Pennebaker, J. W., & Fox, C. (2003). Abstracts, introductions and discussions: How far do they differ in style? Scientometrics, 57(3), 389–398.
  5. Heckman, J. J., & Moktan, S. (2018). Publishing and promotion in economics: The tyranny of the top five. Technical Report, National Bureau of Economic Research.
  6. Hengel, E. (2017). Publishing while Female. Are women held to higher standards? Evidence from peer review.
  7. Hollis, A. (2001). Co-authorship and the output of academic economists. Labour Economics, 8(4), 503–530.
  8. Lu, C., Bu, Y., Dong, X., Wang, J., Ding, Y., Larivière, V., et al. (2019). Analyzing linguistic complexity and scientific impact. Journal of Informetrics, 13(3), 817–829.
  9. Marino Fages, D. (2019). Data for: Write Better, Publish Better, Mendeley Data, V1. https://doi.org/10.17632/mmfttcywgy.1.
  10. McCannon, B. C. (2019). Readability and research impact. Economics Letters, 180, 76–79.
  11. Okulicz-Kozaryn, A. (2013). Cluttered writing: Adjectives and adverbs in academia. Scientometrics, 96(3), 679–681.
  12. Richardson, J. V., Jr. (1977). Readability and readership of journals in library science. Journal of Academic Librarianship, 3(1), 20–22.
  13. Sawyer, A. G., Laran, J., & Xu, J. (2008). The readability of marketing journals: Are award-winning articles better written? Journal of Marketing, 72(1), 108–117.
  14. Swanson, C. E. (1948). Readability and readership: A controlled experiment. Journalism Bulletin, 25(4), 339–343.

Acknowledgements

I thank Facundo Albornoz, Federico Bernini, Kris Gulati, Gabriele Luchetti and two anonymous referees for useful comments; and Federico Bennett, Katie Harrison and Richard Mills for their help in improving the readability.

Author information

Corresponding author

Correspondence to Diego Marino Fages.

Appendix

See Table 6

Table 6 List of highly ranked journals with Scimago Index (1999) and Impact Factor (1997)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Marino Fages, D. Write better, publish better. Scientometrics 122, 1671–1681 (2020). https://doi.org/10.1007/s11192-019-03332-4

Keywords

  • Readability
  • Journals
  • Ranking
  • Top 5
  • NBER

JEL Classification

  • A1