Reproducibility is essential for reliable, robust, and rigorous scientific research. The majority of non-clinical research in the life sciences, including pharmacology, has been reported not to be reproducible (up to 89%; Freedman et al. 2015). Low reproducibility of non-clinical studies wastes money, time, and effort and may be considered an inappropriate use of experimental animals. Lack of reproducibility is also a major concern for the successful translation of non-clinical into clinical studies. Poor translation of promising treatments into patients has undesirable effects on both human health and the economy (Erdogan and Michel 2020). Several factors, including various types of bias such as selection, performance, detection, attrition, and publication bias, have been identified as contributing to poor reproducibility in non-clinical research. In recent years, many guidelines have emphasized that pre-specification of elements such as study design, execution, data analysis, and reporting is needed to avoid bias and enhance the reproducibility of research (Vollert et al. 2020).

The importance of adequate graphical depiction of data in biomedicine has been emphasized before (Franzblau and Chung 2012) but has not been linked to reproducibility. Outside of biomedicine, for instance in chemistry (Szafir 2018), sustainability research (Cüre et al. 2020), and accounting (Burgess et al. 2008), it has been highlighted how frequently used types of graphical representation may mislead readers. However, this aspect has received little attention in current guidelines related to reproducibility (Vollert et al. 2020). Using two examples from public media outside of the life sciences, we show how the unit of measure and the scaling of the y-axis can cause a biased perception of results, a phenomenon we propose to call perception bias.

Figure 1 shows the results of a 2019 state election in Germany. All three depictions of the election outcome are factually correct, and all three have been shown on federal public service media. However, depending on the unit of measure of the y-axis, each panel creates a very different perception of which party has “won”: the red party got the greatest share of votes (Fig. 1a), whereas, compared with the election 5 years earlier, the brown party gained the most percentage points (Fig. 1b) and the yellow party had the largest % increase (Fig. 1c). Unit of measure is a common issue in biomedicine, very often as part of normalization. For instance, protein expression data are often shown after normalization for expression of a reference gene product. This can be appropriate if expression of the reference gene product is stable but can become misleading if it is not. For example, treatment of human embryonic kidney cells stably transfected with β3-adrenoceptors with the agonist isoprenaline reduced the abundance of the G-protein α-subunits of Gi1 and Gs and did not change that of Gi2 (Michel-Reher and Michel 2013). Concomitantly, such treatment reduced by about 30% the abundance of GAPDH (Michel-Reher and Michel 2015), which is frequently used to normalize data derived from immunoblots. Had such normalization been applied to the G-protein expression data, the reductions in Gi1 and Gs would have disappeared, and the lack of change in Gi2 would have turned into an apparent increase, i.e., a very different conclusion would have been reached. A second example is the normalization of contraction data from organ bath experiments based on the size of the tissue strips (Erdogan et al. 2020). When contraction of urinary bladder strips from young and old rats was compared, a reduced response to the agonist carbachol was observed if contraction was normalized to strip weight, but no such difference was found when it was normalized to strip length (Schneider et al. 2005). A third and final example comes from the diabetes field. While animal models of type 1 diabetes typically have a reduced body weight, those of type 2 diabetes in most cases have an increased body weight; accordingly, the urinary bladder appears reduced or unchanged when normalized for body weight, but unchanged or increased, respectively, when not adjusted for body weight (Ellenbroek et al. 2018). Since the denominator body weight differentially affects type 1 and 2 diabetes, the decision to use this denominator biases findings from type 1 models towards finding greater and those from type 2 models towards finding smaller enlargement.
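The arithmetic behind the loading-control example can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the cited studies: the 30% reduction of the reference protein is taken from the text, whereas the function name `normalized_fold_change` and the two target values are hypothetical and chosen only to show the direction of the effect.

```python
def normalized_fold_change(target_fc: float, reference_fc: float) -> float:
    """Fold change of a target after division by a reference (loading control).

    Both arguments are treated/control ratios, so 1.0 means "no change".
    """
    if reference_fc <= 0:
        raise ValueError("reference fold change must be positive")
    return target_fc / reference_fc


# Reference protein falls by about 30% with treatment (value from the text).
reference_fc = 0.70

# Hypothetical targets: one that falls as much as the reference, one unchanged.
raw_fold_changes = {"target falling by 30%": 0.70, "unchanged target": 1.00}

for name, raw_fc in raw_fold_changes.items():
    norm_fc = normalized_fold_change(raw_fc, reference_fc)
    print(f"{name}: raw {raw_fc:.2f}, normalized {norm_fc:.2f}")
```

Under these assumptions, a target that falls in parallel with the reference appears unchanged after normalization (ratio 1.00), while a truly unchanged target appears increased (ratio of about 1.43).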

Fig. 1

Results of the election in the German state of Brandenburg held on 1 September 2019. a Share of vote. b Percentage point change of votes in comparison with 2014 values. c % change of votes in comparison with 2014 values (Statistisches Bundesamt 2019). (Figures were generated using GraphPad Prism, version 8.3)
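The three panels of Fig. 1 correspond to three different calculations on the same pair of numbers per party. The following Python sketch makes this explicit. The party labels and vote shares are hypothetical and are NOT the actual election results; the helper names `vote_metrics` and `leader_by` are likewise introduced here only for illustration.

```python
def vote_metrics(previous: float, current: float) -> dict:
    """Express one result in three units; inputs are vote shares in percent."""
    if previous <= 0:
        raise ValueError("previous share must be positive for a relative change")
    return {
        "share": current,                                          # as in panel a
        "point_change": current - previous,                        # as in panel b
        "percent_change": (current - previous) / previous * 100,   # as in panel c
    }


def leader_by(metrics: dict, key: str) -> str:
    """Return the party with the largest value for the given unit of measure."""
    best_party = None
    for party, values in metrics.items():
        if best_party is None or values[key] > metrics[best_party][key]:
            best_party = party
    return best_party


# Hypothetical shares in percent: (previous election, current election).
results = {"A": (32.0, 26.0), "B": (12.0, 23.0), "C": (1.5, 4.0)}

metrics = {}
for party, (previous, current) in results.items():
    metrics[party] = vote_metrics(previous, current)

leaders = {}
for key in ("share", "point_change", "percent_change"):
    leaders[key] = leader_by(metrics, key)
    print(f"largest {key}: party {leaders[key]}")
```

With these made-up numbers, each unit of measure singles out a different party, although all three values are derived from the same data.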

Figure 2 is an example from a leading German daily newspaper, Die Welt. It shows the reduction in the share of cars with a diesel engine among newly registered cars in Germany. The graph as published in the newspaper (left panel, Fig. 2a) uses a y-axis starting at 40%, and the regression line creates the perception that cars with a diesel engine will cease to exist soon (needless to say, this did not happen). If the same data are plotted using a neutral y-axis scale starting at 0 (right panel, Fig. 2b), the declining trend remains clear, but the decline appears much less steep, and there is no perception that diesel engines will cease to exist in the foreseeable future. A variation of this theme is the choice of aspect ratio, i.e., whether a graph is wide and short or tall and skinny, which can create distinct impressions. Returning to the specific example shown here, the extrapolation of the regression line beyond the measured data (as provided in the original publication) further creates a perception that would only be justified if past trends could be extrapolated into the distant future, which is generally accepted to be untrue.

Fig. 2

Percent of newly registered cars powered by a diesel engine in Germany. a Redrawn based on the originally published graph (i.e., identical scaling of y-axis) (Anonymous 2017). b Redrawn using y-axis starting at 0. (Figures were generated using GraphPad Prism, version 8.3)
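One way to quantify the effect of a truncated axis is to ask what fraction of the plotted axis height a given change occupies. The Python sketch below does this for the two scalings of Fig. 2. It rests on assumptions: the start and end values of the series and the upper axis limit of 50% are hypothetical (only the lower limits of 40% and 0 come from the text), and `visual_change_fraction` is an illustrative helper, not an established metric.

```python
def visual_change_fraction(first: float, last: float,
                           y_min: float, y_max: float) -> float:
    """Fraction of the axis height spanned by the change from first to last."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return abs(first - last) / (y_max - y_min)


# Hypothetical first and last values of the series, in percent.
first, last = 48.0, 41.0

# Same data, two axis scalings; the upper limit of 50 is an assumption.
truncated = visual_change_fraction(first, last, y_min=40.0, y_max=50.0)
zero_based = visual_change_fraction(first, last, y_min=0.0, y_max=50.0)

print(f"axis 40 to 50: decline spans {truncated:.0%} of the axis height")
print(f"axis 0 to 50:  decline spans {zero_based:.0%} of the axis height")
print(f"visual exaggeration factor: {truncated / zero_based:.1f}")
```

In this example the same decline of 7 percentage points fills most of the truncated axis but only a small part of the zero-based axis, which is why the two panels leave such different impressions.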

These simple examples highlight how easily the unit of measure and the scaling of the y-axis affect our perception of results, even for scientists who are used to assessing such results on a daily basis. Thus, the presentation of data has an impact on what the findings convey to readers and reviewers and can create misleading impressions about the outcomes of a study. Both real-life examples demonstrate that the choice of unit of measure (Fig. 1) and/or scaling of the y-axis (Fig. 2) can bias the perception and interpretation of data. In analogy to other types of bias that are well known to affect the reproducibility of non-clinical studies, we propose to call this phenomenon perception bias.

Based on these considerations, we propose the following:

  • As with other types of bias, pre-specifying in the study protocol what experimental findings to show and how to show them is an essential protection against bias in the graphical representation of data. However, we realize that this is not always feasible, particularly in exploratory studies. If pre-specification is not feasible, a second-best option is to specify the rules by which the unit of measure and the scaling of the y-axis will be determined. If neither is feasible, authors should carefully consider whether their choice of graphical representation may unduly nudge readers towards one of several interpretations of the data.

  • Axis scaling should cover the full range of the data. The scale should be biologically or clinically meaningful. The default should be to start at 0 for all ratio variables; while exceptions to this rule can be entirely appropriate (for instance when displaying mammalian body temperature, heart rate, or blood pressure), it is up to authors to explain why axis scaling does not start at 0. For instance, it has been argued that specific y-axis scaling may be appropriate to avoid overlap in graphics (In and Lee 2017). Authors should avoid emphasizing group differences with a truncated y-axis if the observed differences are small relative to the normal variability of a parameter.

  • Absolute changes describe something different from relative changes (see Fig. 1). Depending on the scientific question at hand, either can be justified, but the choice should not be driven by the desirability of the outcome. When in doubt, both should be presented.

  • Finally, within a given article authors should be as consistent as possible in the way they graphically represent the data.
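A rule for axis scaling that is fixed before the data are seen could be written down as a small function. The Python sketch below is one possible formulation, under stated assumptions: the function name `choose_y_limits`, its parameters, and the 5% padding are hypothetical choices made for illustration, not a recommendation from any guideline.

```python
from typing import Optional, Sequence, Tuple


def choose_y_limits(values: Sequence[float],
                    ratio_variable: bool = True,
                    justification: Optional[str] = None,
                    padding: float = 0.05) -> Tuple[float, float]:
    """Return (y_min, y_max) from a rule fixed in advance (illustrative only).

    Ratio variables start at 0 unless a justification for another scale
    is documented; in every case the limits cover the full data range.
    """
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    # Padding relative to the data range, with fallbacks for constant data.
    pad = (highest - lowest) * padding or abs(highest) * padding or 1.0
    if ratio_variable and justification is None:
        # Default rule: start at 0 (or lower, if the data include negatives).
        return (min(0.0, lowest), highest + pad)
    # Documented exception or non-ratio variable: show the data range.
    return (lowest - pad, highest + pad)


# Default for a ratio variable: the axis starts at 0.
print(choose_y_limits([41.0, 48.0]))

# Documented exception, e.g. body temperature in degrees Celsius.
print(choose_y_limits([36.5, 38.2],
                      justification="body temperature; 0 is not meaningful"))
```

Requiring an explicit `justification` string for any axis that does not start at 0 mirrors the proposal that authors should explain such a choice, and using the same function for every figure in an article supports consistent presentation.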

We hope that a more conscientious approach to choice of unit of measure and scale of axes will lead to more transparency in reporting and thereby help to improve reproducibility.