Choice of y-axis can mislead readers

Using two examples from the non-scientific literature, we show how the choice of unit of measure and scaling of the y-axis can cause a biased perception of data, a phenomenon we propose to call perception bias. We recommend pre-specifying the unit of measure or how it will be determined and whether outcome variables will be shown as absolute or relative/normalized changes, and typically starting the y-axis at 0 for ratio variables.

2018), sustainability research (Cüre et al. 2020) and accounting (Burgess et al. 2008), how frequently used types of graphical representation may mislead readers. However, this aspect has gained little attention in current guidelines related to reproducibility (Vollert et al. 2020). Using two examples from public media and outside of life sciences, we show how unit of measure and scaling of the y-axis can cause biased perception of results, a phenomenon we propose to call perception bias.
Figure 1 shows the results from a 2019 state election in Germany. All three depictions of the election outcome are factually correct, and all three have been shown on federal, public service media. However, depending on the unit of measure of the y-axis, each panel creates a very different perception of which party has "won": the red party got the greatest share of votes (Fig. 1a); compared with the election 5 years earlier, the brown party gained the most percentage points (Fig. 1b); and the yellow party had the largest % increase (Fig. 1c).
Unit of measure is a common issue in biomedicine, very often as part of normalization. For instance, protein expression data are often shown after normalization for expression of a reference gene product. This can be appropriate if expression of the reference gene product is stable but can become misleading if it is not. For example, treatment of human embryonic kidney cells stably transfected with β3-adrenoceptors with the agonist isoprenaline reduced the abundance of the G-protein α-subunits of Gi1 and Gs and did not change that of Gi2 (Michel-Reher and Michel 2013). Concomitantly, such treatment reduced the abundance of GAPDH, which is frequently used to normalize data derived from immunoblots, by about 30% (Michel-Reher and Michel 2015). Had such normalization been applied to the G-protein expression data, the reductions in Gi1 and Gs would have disappeared, and the lack of change in Gi2 would have turned into an apparent increase, because an unchanged signal divided by a reduced reference signal yields a larger ratio; i.e., a very different conclusion would have been reached.
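The arithmetic behind the GAPDH example can be sketched in a few lines of Python. The numbers are hypothetical and are not the published measurements: only the roughly 30% drop in GAPDH is taken from the text, while the 30% reduction assumed for Gi1 and Gs and the function names `normalize` and `percent_change` are our own illustrative choices.

```python
# Hypothetical band intensities (arbitrary units); control set to 1.0.
# Only the ~30% GAPDH reduction is taken from the text; the values for
# the target proteins are assumptions for illustration.
control = {"Gi1": 1.0, "Gs": 1.0, "Gi2": 1.0, "GAPDH": 1.0}
treated = {"Gi1": 0.7, "Gs": 0.7, "Gi2": 1.0, "GAPDH": 0.7}


def normalize(sample, reference="GAPDH"):
    """Divide each target signal by the reference signal of the same sample."""
    ref = sample[reference]
    return {name: value / ref for name, value in sample.items() if name != reference}


def percent_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return {name: 100.0 * (after[name] - before[name]) / before[name] for name in before}


targets = ["Gi1", "Gs", "Gi2"]
raw = percent_change({t: control[t] for t in targets}, {t: treated[t] for t in targets})
norm = percent_change(normalize(control), normalize(treated))

for t in targets:
    print(f"{t}: raw {raw[t]:+.0f}%, GAPDH-normalized {norm[t]:+.0f}%")
```

Under these assumed values, the 30% reductions vanish after normalization, whereas the unchanged target appears to rise by about 43%, simply because the denominator itself moved.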
A second example is the normalization of contraction data from organ bath experiments based on size of the tissue strips (Erdogan et al. 2020). Comparing contraction of urinary bladder strips from young and old rats, a reduced response to the agonist carbachol was observed when contraction was normalized to strip weight, but no such difference was found when it was normalized to strip length (Schneider et al. 2005).
A third and final example comes from the diabetes field. While animal models of type 1 diabetes typically have a reduced body weight, those of type 2 diabetes in most cases have an increased body weight; accordingly, the urinary bladder appears reduced or unchanged when normalized for body weight, but unchanged or increased, respectively, when not adjusted for body weight (Ellenbroek et al. 2018). Since the denominator body weight differentially affects type 1 and 2 diabetes, the decision to use this denominator biases findings from type 1 models towards finding greater and those from type 2 models towards finding smaller enlargement.
Figure 2 is an example from a leading German daily newspaper, Die Welt. It shows the reduction in the share of cars with a diesel engine among newly registered cars in Germany. The graph as published in the newspaper (left panel) uses a y-axis starting at 40%, and the regression line creates the perception that cars with a diesel engine will cease to exist soon (needless to say, this did not happen). If the same data are plotted using a neutral y-axis scale starting at 0 (right panel), the declining trend remains clear, but the decline is much less steep, and there is no perception that diesel engines will cease to exist in the foreseeable future. A variation of this theme is the choice of aspect ratio, i.e., whether a graph is wide and short or tall and skinny, which can create distinct impressions.
Getting back to the specific example shown here, the extrapolation of the regression line beyond the measured data (as provided in the original publication) further creates a perception that would only be justified if past trends could be extrapolated into the distant future, which is generally accepted to be untrue.
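How strongly the axis origin shapes this impression can be illustrated with a small calculation. The following is a minimal sketch with invented numbers, not the data shown in the newspaper; the axis maximum of 50% and the helper names `fit_line`, `years_until` and `height_fraction` are assumptions made for illustration only.

```python
# Hypothetical diesel shares (% of new registrations); NOT the newspaper's data.
years = [0, 1, 2, 3, 4]  # years since the first observation
shares = [48.0, 47.0, 46.0, 45.0, 44.0]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def years_until(level, slope, intercept):
    """Year at which the fitted line reaches `level` (requires a nonzero slope)."""
    return (level - intercept) / slope


def height_fraction(ys, axis_min, axis_max):
    """Share of the plot height spanned by the data on a given y-axis."""
    return (max(ys) - min(ys)) / (axis_max - axis_min)


slope, intercept = fit_line(years, shares)

# Compare a truncated axis (40-50%) with one starting at 0 (0-50%).
for axis_min in (40.0, 0.0):
    hit = years_until(axis_min, slope, intercept)
    frac = height_fraction(shares, axis_min, 50.0)
    print(f"axis from {axis_min:.0f}%: decline fills {frac:.0%} of the height, "
          f"line meets the axis floor after {hit:.0f} years")
```

With these invented values, the same decline of 4 percentage points fills 40% of the plot height on the truncated axis but only 8% on the axis starting at 0, and the fitted line reaches the bottom of the truncated plot after 8 years, whereas it would reach zero only after 48 years. Whether such a linear extrapolation is meaningful at all is a separate question.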
These simple examples highlight how easily unit of measure and scaling of the y-axis affect our perception of results, even for scientists who are used to assessing them on a daily basis. Thus, presentation of data has an impact on what the findings convey to readers/reviewers and can cause misleading impressions about outcomes of the study. Both real-life examples demonstrate that choice of unit of measure (Fig. 1) and/or scaling of the y-axis (Fig. 2) can cause bias in the perception and interpretation of the data. In analogy to other types of bias that are well known to affect reproducibility of non-clinical studies, we propose to call this phenomenon perception bias.
Based on these considerations, we propose the following:
- As with other types of bias, pre-specifying in the study protocol which experimental findings will be shown and how is an essential protection against bias in the graphical representation of data. However, we realize that this is not always feasible, particularly in exploratory studies. If pre-specification is not feasible, a second-best option is to specify the rules by which unit of measure and scaling of the y-axis will be determined. If neither is feasible, authors should carefully consider whether their choice of graphical representation may unduly nudge readers towards one of several interpretations of the data.
- Axis scaling should cover the full range of the data. The scale should be biologically or clinically meaningful. The default should be to start at 0 for all ratio variables; while exceptions to this rule can be appropriate (for instance when displaying mammalian body temperature, heart rate, or blood pressure), it is up to authors to explain why axis scaling does not start at 0. For instance, it has been argued that specific y-axis scaling may be appropriate to avoid overlapping in graphics (In and Lee 2017). Authors should avoid emphasizing group differences by a short y-axis if the observed differences are small relative to the normal variability of a parameter.
- Absolute changes describe something different than relative changes (see Fig. 1). Depending on the scientific question at hand, either can be justified, but the choice should not be driven by the desirability of the outcome. When in doubt, both should be presented.
- Finally, within a given article, authors should be as consistent as possible in the way they graphically represent the data.
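The point about absolute versus relative changes can be made concrete with a short calculation in the spirit of Fig. 1. The vote shares below are invented for illustration and are not the actual election results; only the party labels follow the figure, and the function names are our own.

```python
# Hypothetical vote shares in percent; NOT the actual election results.
previous = {"red": 32.0, "brown": 12.0, "yellow": 2.0}
current = {"red": 26.0, "brown": 23.0, "yellow": 4.0}


def point_change(party):
    """Absolute change in percentage points."""
    return current[party] - previous[party]


def relative_change(party):
    """Change relative to the previous share, in percent."""
    return 100.0 * (current[party] - previous[party]) / previous[party]


# Three equally correct ways of expressing the same outcome.
metrics = {
    "share of votes": lambda party: current[party],
    "percentage-point change": point_change,
    "relative change": relative_change,
}

# For each metric, find the party that ranks first.
winners = {name: max(current, key=metric) for name, metric in metrics.items()}

for name, party in winners.items():
    print(f"{name}: {party} ranks first")
```

In this sketch each of the three metrics puts a different party first, although all three are computed from the same two sets of numbers, which is why presenting both absolute and relative changes is the safer choice when in doubt.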
We hope that a more conscientious approach to choice of unit of measure and scale of axes will lead to more transparency in reporting and thereby help to improve reproducibility.