No evidence of publication bias in climate change science
Non-significant results are less likely to be reported by authors and, when submitted for peer review, are less likely to be published by journal editors. This phenomenon, known collectively as publication bias, is seen in a variety of scientific disciplines and can erode public trust in the scientific method and the validity of scientific theories. Public trust in science is especially important for fields like climate change science, where scientific consensus can influence state policies on a global scale, including strategies for industrial and agricultural management and development. Here, we used meta-analysis to test for biases in the statistical results of climate change articles, including 1154 experimental results from a sample of 120 articles. Funnel plots revealed no evidence of publication bias: non-significant results showed no pattern of under-reporting, even at low sample sizes. However, we discovered three other types of systematic bias relating to writing style, the relative prestige of journals, and the apparent rise in popularity of this field. First, the magnitude of statistical effects was significantly larger in the abstracts than in the main body of articles. Second, this difference in effect sizes between abstracts and the main body of articles was especially pronounced in journals with high impact factors. Finally, the number of published articles about climate change and the magnitude of effect sizes therein both increased within 2 years of the seminal 2007 report by the Intergovernmental Panel on Climate Change.
Publication bias in scientific journals is widespread (Fanelli 2012). It leads to an incomplete view of scientific inquiry and results and presents an obstacle for evidence-based decision-making and public acceptance of valid, scientific discoveries and theories. A growing trend in scientific inquiry, as practiced in this article, includes the meta-analysis of large bodies of literature, a practice that is particularly susceptible to misleading and inaccurate results given a systematic bias in the literature (e.g. Michaels 2008; Fanelli 2012, 2013).
The role of publication bias in scientific consensus has been described in a variety of scientific disciplines, including but not limited to medicine (Kicinski 2013; Kicinski et al. 2015), social science (Fanelli 2012), ecology (Palmer 1999), and global climate change research (Michaels 2008; Reckova and Irsova 2015).
Despite widespread consensus among climate scientists that global warming is real and has anthropogenic roots (e.g., Holland 2007; Idso and Singer 2009; Anderegg et al. 2010), several end users of science, such as popular media, politicians, industrialists, and citizen scientists, continue to treat the facts of climate change as fodder for debate and denial. For example, Carlsson-Kanyama and Hörnsten Friberg (2012) found that only 30% of politicians and directors from 63 Swedish municipalities believed humans contribute to global warming; 61% of respondents were uncertain about the causes of warming, and 9% denied warming was real.
Much of this skepticism stems from an event that has been termed Climategate, when emails and files from the Climate Research Unit (CRU) at the University of East Anglia were copied and later exposed for public scrutiny and interpretation. Climate change skeptics claimed the IPCC 2007 report—the Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC 2007), which uses scientific facts to argue humans are causing climate change—was based on an alleged bias for positive results by editors and peer reviewers of scientific journals; editors and scientists were accused of suppressing research that did not support the paradigm for carbon dioxide-induced global warming. In 2010, the CRU was cleared by the Muir Russell Committee of any scientific misconduct or dishonesty (Adams 2010; but see Michaels 2010).
Although numerous reviews have examined the credibility of climate researchers (Anderegg et al. 2010), the scientific consensus on climate change (Doran and Kendall Zimmerman 2009), and the complexity of media reporting (Corner et al. 2012), few studies have undertaken an empirical review of the publication record to evaluate the existence of publication biases in climate change science. However, Michaels (2008) scrutinized global warming coverage in the two most prestigious journals, Nature and Science, and, using vote-counting meta-analysis, confirmed a skewed publication record. Reckova and Irsova (2015) also detected a publication bias after analyzing 16 studies of carbon dioxide concentrations in the atmosphere and changes in global temperature. Although publication biases were reported by Michaels (2008) and Reckova and Irsova (2015), the former test used a small set of pre-defined journals, while the latter lacked statistical power given a sample size of 16 studies. In contrast, here we conducted a meta-analysis on results from 120 reports and 31 scientific journals. Our approach expands upon the conventional definition of publication bias to include publication trends over time and in relation to seminal events in the climate change community, stylistic choices made by authors who may selectively report some results in abstracts and others in the main body of articles (Fanelli 2012), and patterns of effect size and reporting style in journals representing a broad cross-section of impact factors.
We tested the hypothesis of bias in climate change publications stemming from the under-reporting of non-significant results (Rosenthal 1979) using fail-safe sample sizes, funnel plots, and diagnostic patterns of variability in effect sizes (Begg and Mazumdar 1994; Palmer 1999, 2000; Rosenberg 2005). More specifically, we (a) examined whether non-significant results were omitted disproportionately in the climate change literature, (b) if there were particular trends of unexpected and abrupt changes in the number of published studies and reported effects in relation to IPCC 2007 and Climategate, (c) whether effects presented in the abstracts were significantly larger than those reported in the main body of reports, and (d) how findings from these first three tests related to the impact factor of journals.
Meta-analysis is a powerful statistical tool used to synthesize statistical results from numerous studies and to identify general trends in a field of research. Unfortunately, not all articles within a given field of science contain the statistical estimates required for meta-analysis (e.g., estimate of effect size, error, sample size). Therefore, the literature used in meta-analysis is often a sample of all available articles, analogous to the sampling framework used in ecology, where a sub-sample of a population is used to estimate parameters of the true population. For the purpose of our meta-analysis, we sampled articles from the body of literature that explores the effects of climate change on marine organisms. Marine species are exposed to a large array of abiotic factors that are linked directly to atmospheric climate change. For instance, oceans absorb heat from the atmosphere and mix with freshwater run-off from melting glaciers and ice caps, which changes ocean chemistry and puts stress on ocean ecosystems. For example, the resulting changes in ocean salinity and pH can inhibit calcification in shell-bearing organisms that are either habitat-forming (e.g., coral reefs, oyster reefs) or the foundation of food webs (e.g., plankton) (The Copenhagen Diagnosis 2009).
Our meta-analysis found no evidence of publication bias, in contrast to prior studies that were based on smaller sample sizes than used here (e.g., Michaels 2008; Reckova and Irsova 2015). We did, however, discover some interesting patterns in the number of climate change articles published over time and, within journal articles, stylistic biases by authors with respect to reporting large, statistically significant effects. Finally, we discuss these results in the context of the social responsibility borne by climate scientists and the challenges of communicating science to stakeholders and end users.
2 Materials and methods
Meta-analysis is a suite of data analysis tools that allow for quantitative synthesis of results from numerous scientific studies, now widely used in fields from medicine to ecology (Adams et al. 1997). Here, we randomly sampled articles from a broader body of literature about climate change in marine systems and extracted statistics summarizing the magnitude of effects, error, and experimental sample size for meta-analysis.
2.1 Data collection
We surveyed the scientific literature via the ISI Web of Science, Scopus and Biological Abstracts, and in the reference sections of identified articles for experimental results pertaining to climate change in ocean ecosystems. The search was performed with no restrictions on publication year, using different combinations of the terms: (acidification* AND ocean*) OR (acidification* AND marine*) OR (global warming* AND marine*) OR (global warming* AND ocean*) OR (climate change* AND marine* AND experiment*) OR (climate change* AND ocean* AND experiment*). The search was performed exclusively on scientific journals with an impact factor of at least 3 (Journal Citation Reports science edition 2010).
We restricted our analysis to a sample of articles reporting (a) an empirical effect size between experimental and control group, (b) a measure of statistical error, and (c) a sample size with specific control and experimental groups (see Supplementary Material S1–S3 for identification of studies). We identified 120 articles from 31 scientific journals published between the years 1997 and 2013, with impact factors ranging from 3.04 to 36.104 and a mean of 6.58. Experimental results (n = 1154) were extracted from the main body of articles; 362 results were also retrieved from the articles’ abstracts or summary paragraphs.
Data from the main body of articles and abstracts were analyzed separately to test for potential stylistic biases related to how authors report key findings. The two datasets, hereafter designated the “main” dataset and the “abstract” dataset, were also divided into three time periods based on each article’s date of acceptance: pre-IPCC 2007 (through November 2007), post-IPCC 2007/pre-Climategate (December 2007–November 2009), and post-Climategate (December 2009–December 2012). We used November 2007 as the publication date for the IPCC Fourth Assessment Report, which was an updated version of the original February 2007 release.
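The period assignment above is a simple date-binning rule. As an illustration only (the function name and exact day-level cutoffs are our own choices, inferred from the month boundaries stated in the text), it can be sketched in Python:

```python
from datetime import date

def period(accepted: date) -> str:
    """Assign an article to a time period by its acceptance date.
    Assumed cutoffs: IPCC AR4 = end of November 2007,
    Climategate = end of November 2009."""
    if accepted <= date(2007, 11, 30):
        return "pre-IPCC"
    if accepted <= date(2009, 11, 30):
        return "post-IPCC/pre-Climategate"
    return "post-Climategate"
```
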
We extracted graphical data using the software GraphClick v. 3.0 (2011). Each study could include several experimental results, which could result in non-independence bias driven by studies with relatively large numbers of results. Therefore, we assessed the robustness of our meta-analysis by re-running the analysis multiple times with data subsets consisting of one randomly selected result per article (Hollander 2008).
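The robustness check described above amounts to repeatedly drawing one result per article and recomputing the summary statistic. A minimal sketch in Python (the function name and the use of an unweighted mean are our own simplifications; the actual analysis used weighted effect sizes in MetaWin):

```python
import random
from collections import defaultdict
from statistics import mean

def one_per_article_means(results, n_iter=1000, seed=1):
    """results: list of (article_id, effect_size) pairs.
    Repeatedly draw one randomly chosen result per article and
    recompute the mean effect, to check that articles contributing
    many results do not drive the overall conclusion."""
    rng = random.Random(seed)
    by_article = defaultdict(list)
    for article, d in results:
        by_article[article].append(d)
    means = []
    for _ in range(n_iter):
        subset = [rng.choice(ds) for ds in by_article.values()]
        means.append(mean(subset))
    return means
```

If the distribution of resampled means overlaps the estimate from the full dataset, non-independence among results within articles is unlikely to bias the conclusions.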
Hedges’ d was calculated as the mean of the control group (XC) subtracted from the mean of the experimental group (XE), divided by the pooled standard deviation (s) and multiplied by a correction factor for small sample sizes (J); that is, d = J × (XE − XC) / s. However, since sample sizes vary among studies, and variance is a function of sample size, some form of weighting was necessary. In other words, studies with larger sample sizes are expected to have lower variances and accordingly provide more precise estimates of the true population effect size (Hedges and Olkin 1985; Shadish and Haddock 1994; Cooper 1998). Therefore, a weighted average was used in the meta-analysis to estimate the cumulative effect size (weighted mean) for the sample of studies (see Rosenberg et al. 2000 for details).
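The analysis itself was run in MetaWin; purely as an illustration, the calculation described above can be sketched in Python. The function names, the standard small-sample correction J = 1 − 3/(4N − 9), and the fixed-effect inverse-variance weighting are our own illustrative choices following Hedges and Olkin (1985):

```python
import numpy as np

def hedges_d(x_exp, x_ctrl):
    """Hedges' d: standardized mean difference (experimental minus
    control) with the small-sample correction factor J."""
    ne, nc = len(x_exp), len(x_ctrl)
    # Pooled standard deviation across the two groups
    s = np.sqrt(((ne - 1) * np.var(x_exp, ddof=1)
                 + (nc - 1) * np.var(x_ctrl, ddof=1)) / (ne + nc - 2))
    d = (np.mean(x_exp) - np.mean(x_ctrl)) / s
    j = 1 - 3 / (4 * (ne + nc) - 9)      # small-sample correction
    d_corr = j * d
    # Approximate sampling variance of the corrected effect size
    var_d = (ne + nc) / (ne * nc) + d_corr**2 / (2 * (ne + nc))
    return d_corr, var_d

def cumulative_effect(ds, vs):
    """Fixed-effect weighted mean: each study's weight is the inverse
    of its sampling variance, so precise studies count for more."""
    w = 1 / np.asarray(vs, dtype=float)
    return np.sum(w * np.asarray(ds, dtype=float)) / np.sum(w)
```
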
2.2 Funnel plot and fail-safe sample sizes
Both funnel plots and fail-safe sample sizes were inspected to test for under-reporting of non-significant effects, following Palmer (2000). Extreme publication bias (caused by under-reporting of non-significant results) would appear as a hole or data gap in a funnel plot. In the absence of bias, the density of points should be greatest around the mean effect and normally distributed around it at all sample sizes. To help visualize the threshold between significant and non-significant results, 95% significance lines were calculated for the funnel plots.
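The 95% significance lines mark where an effect's confidence interval just excludes zero; they converge toward zero as sample size grows. A minimal sketch in Python (the function names and the sd/sqrt(n) standard-error approximation are our own simplifications):

```python
import numpy as np

def significance_lines(n, sd=1.0):
    """Approximate 95% significance thresholds for a funnel plot of
    effect size against sample size: effects beyond +/-1.96 * SE(n)
    differ from zero at alpha = 0.05."""
    n = np.asarray(n, dtype=float)
    se = sd / np.sqrt(n)          # standard error shrinks with n
    return 1.96 * se, -1.96 * se

def is_significant(d, se):
    """True where the 95% confidence interval of an effect excludes zero."""
    return np.abs(np.asarray(d, dtype=float)) > 1.96 * np.asarray(se, dtype=float)
```

Under-reporting of non-significant results would show up as a deficit of points between the two lines, especially at small sample sizes where the band is widest.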
Robust Z-scores were used to identify possible outliers in the dataset, as such values could distort the mean and make the conclusions of a study less accurate or even incorrect. Instead of the dataset mean, robust Z-scores use the median, which has a higher breakdown point and is therefore less influenced by extreme values than the mean used in regular Z-scores (Rousseeuw and Hubert 2011). The cause of each identified outlier was carefully investigated before any value was excluded from the dataset (Table S1).
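A common form of the robust Z-score scales deviations from the median by the median absolute deviation (MAD). A minimal sketch in Python (the function names, the normal-consistency constant 1.4826, and the 3.5 flagging threshold are conventional choices from the robust-statistics literature, not values stated in the text):

```python
import numpy as np

def robust_z(x):
    """Robust Z-scores: deviations from the median, scaled by the MAD
    (median absolute deviation) times the normal-consistency constant
    1.4826, so scores are comparable to ordinary Z-scores for
    Gaussian data."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad)

def flag_outliers(x, threshold=3.5):
    """Flag values whose robust Z-score exceeds the threshold."""
    return np.abs(robust_z(x)) > threshold
```

Because the median and MAD are barely moved by a few extreme values, a genuine outlier receives a very large score instead of masking itself by inflating the mean and standard deviation.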
All data were analyzed with MetaWin v. 2.1.5 (Sinauer Associates Inc. 2000), and graphs for visual illustrations were created using the graphic data tool DataGraph v. 3.1.2 (VisualDataTools Inc. 2013).
3.1 Publication bias
3.2 Number of studies, effect size, and abstract versus main
3.3 Impact factor
Our meta-analysis did not find evidence of small, statistically non-significant results being under-reported in our sample of climate change articles. This result contrasts with the findings of Michaels (2008) and Reckova and Irsova (2015), both of whom found publication bias in the global climate change literature, albeit with smaller sample sizes and in other sub-disciplines of climate change science. Michaels (2008) examined articles from Nature and Science exclusively, and therefore his results were influenced strongly by the editorial positions of these high impact factor journals with respect to reporting climate change issues. We believe that the results presented here have added value because we sampled a broader range of journals, including some with relatively low impact factors, which is probably a better representation of potential biases across the entire field of study. Moreover, many end users and stakeholders of science, including other scientists and public officials, base their research and opinions on a much broader suite of journals than Nature and Science.
However, our meta-analysis did find multiple lines of evidence of biases within our sample of articles, which were perpetuated in journals of all impact factors and related largely to how science is communicated: The large, statistically significant effects were typically showcased in abstracts and summary paragraphs, whereas the lesser effects, especially those that were not statistically significant, were often buried in the main body of reports. Although the tendency to isolate large, significant results in abstracts has been noted elsewhere (Fanelli 2012), here we provide the first empirical evidence of such a trend across a large sample of literature.
We also discovered a temporal pattern in reporting biases, which appeared to be related to seminal events in the climate change community and may reflect a socio-economic driver of the publication record. First, there was a conspicuous rise in the number of climate change publications in the 2 years following IPCC 2007, which likely reflects the rising popularity (among the public and funding agencies) of this field of research and an increased appetite among journal editors to publish these articles. Concurrent with increased publication rates was an increase in the effect sizes reported in abstracts. Perhaps coincidentally, the apparent popularity of climate change articles (i.e., the number of published articles and reported effect sizes) plummeted shortly after Climategate, when the world media focused its scrutiny on this field of research (Fig. 1). After Climategate, reported effect sizes also dropped, as did the difference between effects reported in abstracts and those in the main body of articles. The positive effect after IPCC 2007 and the negative effect after Climategate may illustrate a combined effect of editors’ or referees’ publication choices and researchers’ propensity to submit articles. However, since meta-analysis is correlative, it does not elucidate the mechanisms underlying the observed patterns.
Similar stylistic biases were found when comparing articles from journals with high impact factors to those with low impact factors. High impact factors were associated with significantly larger reported effect sizes (and lower sample sizes; see Fig. 4); these articles also had a significantly larger difference between effects reported in abstracts and those in the main body of their reports (Fig. 3). This trend appears to be driven by a small number of journals with large impact factors; however, the result is consistent with previous studies. For example, our results corroborate others in showing that high impact journals typically report large effects based on small sample sizes (Fraley and Vazire 2014) and that high impact journals have shown publication bias in climate change research (Michaels 2008; further discussed in Radetzki 2010).
Stylistic biases are less concerning than a systematic tendency to under-report non-significant effects, assuming researchers read entire reports before formulating theories. However, most audiences, especially non-scientific ones, are more likely to read article abstracts or summary paragraphs only, without perusing technical results. The onus to effectively communicate science does not fall entirely on the reader; rather, it is the responsibility of scientists and editors to remain vigilant, to understand how biases may pervade their work, and to be proactive about communicating science to non-technical audiences in transparent and un-biased ways. Ironically, articles in high impact journals are those most cited by other scientists; therefore, the practice of sensationalizing abstracts may bias scientific consensus too, assuming many scientists may also rely too heavily on abstracts during literature reviews and do not spend sufficient time delving into the lesser effects reported elsewhere in articles.
Despite our sincerest aim of using science as an objective and unbiased tool to record natural history, we are reminded that science is a human construct, often driven by human needs to tell a compelling story, to reinforce the positive, and to compete for limited resources; the publication trends and communication biases documented here are evidence of that.
We are grateful to Roger Butlin, Christer Brönmark, A. Richard Palmer, Tobias Uller, Charlie Cornwallis and Johannes Persson for constructive advice, and a special thanks to Dean Adams. We are also thankful to Michael MacAskill, Jessica Gurevitch and Shinichi Nakagawa for technical advice. JH was funded by a Marie Curie European Reintegration Grant (PERG08-GA-2010-276915) and by the Royal Physiographic Society in Lund.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflicts of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.