# Data reporting and visualization in ecology


## Abstract

The reporting and graphing of ecological data and statistical results often leave a lot to be desired. One reason can be a misunderstanding or confusion of some basic concepts in statistics such as standard deviation, standard error, margin of error, confidence interval, skewness of distribution and correlation. The implications of having small sample sizes are also often glossed over. In several situations, statistics and associated graphical representations are made for comparing groups of samples, where the issues become even more complex. Here, I aim to clarify these basic concepts and ways of reporting and visualizing summaries of variables in ecological research, both for single variables and for pairs of variables. Specific recommendations about better practice are made, for example describing precision of the mean by the margin of error and bootstrapping to obtain confidence intervals. The role of the logarithmic transformation of positive data is described, as well as its implications in the reporting of results in multiplicative rather than additive form. Comments are also made about ordination plots derived from multivariate analyses, such as principal component analysis and canonical correspondence analysis, with suggested improvements. Some data sets from this Kongsfjord special issue are amongst those used as examples.

## Notes

1. This same least-squares principle is used in a more general form in multivariate ordination methods, to define closest subspaces to a set of points, to be treated later.

2. Strictly speaking, for theoretical reasons, the sum of squared distances (i.e. the sum of the squared deviations from the mean) is divided by n − 1, not the sample size n, to obtain the “average”. In practice, this makes a noticeable difference only when the sample size is very small.

3. The lower 0.025 quantile is clearly 0, while the upper 0.975 quantile is interpolated between the 14th and 15th ordered values, 26 and 31, and lies closer to 31 than to 26. There are at least nine slightly different ways of computing this interpolation, as detailed in the documentation of the R function `quantile`; its default option was used to obtain the estimate of 29.25.
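
The computations described in notes 2 and 3 can be illustrated with a minimal Python sketch. All data values below are hypothetical: the second sample is chosen only so that its two smallest values are 0 and its 14th and 15th ordered values are 26 and 31, as in note 3, and a sample size of 15 is assumed. The function names are illustrative. `quantile_type7` implements the linear interpolation that the R documentation lists as type 7, the default of `quantile`.

```python
import math
import statistics


def variance_both_ways(values):
    """Return the variance computed with divisor n and with divisor n - 1."""
    return statistics.pvariance(values), statistics.variance(values)


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p            # zero-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # guard against running off the end at p = 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Note 2: hypothetical small sample (n = 8), where the divisor matters.
small = [2, 4, 4, 4, 5, 5, 7, 9]
var_n, var_n_minus_1 = variance_both_ways(small)
print(f"divisor n: {var_n:.3f}, divisor n - 1: {var_n_minus_1:.3f}")

# Note 3: hypothetical sample of 15 counts (only the extremes follow the note).
counts = [0, 0, 1, 2, 3, 4, 5, 7, 9, 11, 14, 18, 22, 26, 31]
lower = quantile_type7(counts, 0.025)
upper = quantile_type7(counts, 0.975)
print(f"0.025 quantile: {lower:.2f}, 0.975 quantile: {upper:.2f}")
```

Under these assumptions the two variances differ by the factor n/(n − 1) = 8/7, which shrinks towards 1 as n grows. The upper quantile sits 65% of the way from 26 to 31, which reproduces the 29.25 quoted in note 3.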

## References

• Aitchison J, Brown JAC (1957) The lognormal distribution. Cambridge University Press, Cambridge

• Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Methodol 57:289–300

• Davison AC, Hinkley DV (1997) Bootstrap methods and their applications. Cambridge University Press, Cambridge

• Feng C, Wang H, Lu N, Chen T, He H, Lu Y, Tu XM (2014) Log-transformation and its implications for data analysis. Shanghai Arch Psychiatry 26:105–109

• Gierlinski M (2015) Understanding statistical error: a primer for biologists. Wiley, New York

• Good P (2005) Permutation, parametric and bootstrap tests of hypotheses, 3rd edn. Springer, New York

• Greenacre M (2012) Contribution biplots. J Comput Graph Stat 22:107–122

• Greenacre M, Hastie T (2010) Dynamic visualization of statistical learning in the context of high-dimensional textual data. J Web Semant 8:163–168

• Huenerlage K, Graeve M, Buchholz F (2016) Lipid composition and trophic relationships of krill species in a high Arctic fjord. Polar Biol 39:1803–1817

• Krzywinski M, Altman N (2013) Error bars. Nat Methods 10:921–922

• Land CE (1972) An evaluation of approximate confidence interval estimation methods for lognormal means. Technometrics 14:145–158

• Limpert E, Stahel WE, Abbt M (2001) The log-normal distribution across the sciences. Bioscience 51:341–352

• Millard SP (2013) EnvStats: an R package for environmental statistics. Springer, New York

• Mislan KAS, Heer JM, White EP (2016) Elevating the status of code in ecology. Trends Ecol Evol 31:4–7

• Moran MD (2003) Arguments for rejecting the sequential Bonferroni in ecological studies. Oikos 100:403–405

• O’Hara R, Kotze J (2010) Do not log-transform count data. Methods Ecol Evol 1:118–122

• Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H (2015) Vegan: community ecology package. http://CRAN.R-project.org/package=vegan

• Parkin TB, Chester ST, Robinson JA (1990) Calculating confidence intervals for the mean of a lognormally distributed variable. Soil Sci Soc Am J 54:321–326

• Piquet AM, Maat DS, Confurius-Guns V, Sintes E, Herndl GJ, van de Poll W, Wiencke C, Buma AGJ, Bolhuis H (2016) Springtime dynamics, productivity and activity of prokaryotes in two Arctic fjords. Polar Biol 39:1749–1763

• Quinn GP, Keough MJ (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge

• StataCorp (2015) Stata statistical software: release 14. StataCorp LP, College Station

• Tian L (2005) Inferences on the mean of zero-inflated lognormal data: the generalized variable approach. Stat Med 24:3223–3232

• Tufte ER (2001) The visual display of quantitative information, 2nd edn. Graphics Press, Cheshire

• Voronkov A, Hop H, Gulliksen B (2016) Zoobenthic communities on hard-bottom habitats in Kongsfjorden, Svalbard. Polar Biol. doi:10.1007/s00300-016-1935-9

• Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond bar and line graphs: time for a new data presentation. PLoS Biol 13(4):e1002128. doi:10.1371/journal.pbio.1002128

• Wickham H, Stryjewski L (2011) 40 years of boxplots. http://vita.had.co.nz/papers/boxplots.pdf. Accessed 1 July 2016

## Acknowledgements

The author would like to express his sincere thanks to Haakon Hop for his encouragement and constant constructive feedback related to this article, to Markus Molis for our many discussions on this topic, and to Walter Zucchini for additional comments. Thanks are also due to Kim Huenerlage, Andrey Voronkov and Henk Bolhuis for kindly allowing the use of some of their data from this special Kongsfjorden issue of Polar Biology.

## Author information

### Corresponding author

Correspondence to Michael Greenacre.

This article belongs to the special issue on the “Kongsfjorden ecosystem—new views after more than a decade of research”, coordinated by Christian Wiencke and Haakon Hop.

## Electronic supplementary material

### Video S1

Video animation of the three-dimensional view of Figure 10, showing the true nature of the separation between the ellipsoidal confidence regions (GIF 1869 kb)

### Video S2

Video animation of the CCA ordination of Figure 11 when a third dimension is added. The video pauses when dimension 2 is horizontal and pointing to the right (i.e. Figure 11), and when dimension 3 is horizontal and pointing to the right, which shows that all confidence ellipses overlap on the third dimension (GIF 5084 kb)

## Cite this article

Greenacre, M. Data reporting and visualization in ecology. Polar Biol 39, 2189–2205 (2016). https://doi.org/10.1007/s00300-016-2047-2

• DOI: https://doi.org/10.1007/s00300-016-2047-2