Data reporting and visualization in ecology

  • Original Paper
  • Polar Biology

Abstract

The reporting and graphing of ecological data and statistical results often leave a lot to be desired. One reason can be a misunderstanding or confusion of some basic concepts in statistics such as standard deviation, standard error, margin of error, confidence interval, skewness of distribution and correlation. The implications of having small sample sizes are also often glossed over. In several situations, statistics and associated graphical representations are made for comparing groups of samples, where the issues become even more complex. Here, I aim to clarify these basic concepts and ways of reporting and visualizing summaries of variables in ecological research, both for single variables and for pairs of variables. Specific recommendations about better practice are made, for example describing precision of the mean by the margin of error and bootstrapping to obtain confidence intervals. The role of the logarithmic transformation of positive data is described, as well as its implications in the reporting of results in multiplicative rather than additive form. Comments are also made about ordination plots derived from multivariate analyses, such as principal component analysis and canonical correspondence analysis, with suggested improvements. Some data sets from this Kongsfjord special issue are amongst those used as examples.
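As a rough illustration of the bootstrap recommendation, the following Python sketch computes a percentile bootstrap confidence interval for a mean. The function name bootstrap_mean_ci, the sample values, the number of resamples and the seed are all assumptions made for illustration and are not taken from the article; the simple index-based percentile used here is only one of several possible choices.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean (illustrative sketch).

    Resamples the data with replacement, computes the mean of each
    resample, and reads off the lower and upper percentiles of the
    sorted resample means by index (no interpolation).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lo_idx = round(alpha * (n_resamples - 1))
    hi_idx = round((1 - alpha) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]


# Hypothetical, right-skewed abundance counts (not data from the article).
sample = [0, 0, 1, 2, 3, 5, 8, 12, 20, 45]
lower, upper = bootstrap_mean_ci(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The resulting interval need not be symmetric about the sample mean, which is one reason bootstrap limits can be more informative than a single margin of error when the data are skewed.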

Notes

  1. This same least-squares principle is used in a more general form in multivariate ordination methods, to define closest subspaces to a set of points, to be treated later.

  2. Strictly speaking, for theoretical reasons, the sum of squared distances (i.e. the sum of the squared deviations from the mean) is divided by n − 1, not the sample size n, to obtain the “average”. In practice, this only makes noticeable differences in the case of very small sample sizes.

  3. The lower 0.025 quantile is clearly 0, while the upper 0.975 quantile is based on an interpolation between the 14th and 15th ordered values of 26 and 31, and closer to the 31 than the 26. There are at least nine slightly different ways of computing this interpolation, as detailed in the documentation of the R function quantile, the default option of which was used to obtain the estimate of 29.25.
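
Notes 2 and 3 can be checked numerically with a short Python sketch. Everything in it is an assumption made for illustration except the values 26 and 31, the lower quantile of 0 and the estimate 29.25, which come from Note 3. In particular, the sample size of 15 is inferred from the interpolation falling between the 14th and 15th ordered values, and the other thirteen values are hypothetical. The function quantile_type7 is a hand-written version of the linear interpolation rule that R's quantile uses by default (type 7), not a call to R itself.

```python
import math
import statistics


def variance_ratio(values):
    """Ratio of the n - 1 variance to the divide-by-n variance (equals n / (n - 1))."""
    return statistics.variance(values) / statistics.pvariance(values)


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R's default rule)."""
    xs = sorted(values)
    n = len(xs)
    h = (n - 1) * p              # zero-based fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)      # guard against running past the last value
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Note 2: the choice of divisor matters for tiny samples and fades for larger ones.
small = [2.0, 4.0, 4.0, 6.0]       # hypothetical, n = 4
large = list(range(100))           # hypothetical, n = 100
print(variance_ratio(small))       # about 1.33, i.e. 4 / 3
print(variance_ratio(large))       # about 1.01, i.e. 100 / 99

# Note 3: hypothetical sample of 15 values whose 14th and 15th ordered
# values are 26 and 31 and whose smallest values are 0.
sample = [0, 0, 0, 1, 2, 3, 4, 6, 8, 10, 13, 17, 21, 26, 31]
lower = quantile_type7(sample, 0.025)
upper = quantile_type7(sample, 0.975)
print(lower, upper)                # 0 and approximately 29.25
```

With 15 values, the 0.975 quantile sits at position 14.65 in one-based terms, so the estimate is 26 + 0.65 × (31 − 26) = 29.25, which is closer to 31 than to 26 as the note says.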

Acknowledgements

The author would like to express his sincere thanks to Haakon Hop for his encouragement and constant constructive feedback on this article, to Markus Molis for many discussions on this topic, and to Walter Zucchini for additional comments. Thanks are also due to Kim Huenerlage, Andrey Voronkov and Henk Bolhuis for their cooperation in allowing the use of some of their data from this special Kongsfjorden issue of Polar Biology.

Author information

Corresponding author

Correspondence to Michael Greenacre.

Additional information

This article belongs to the special issue on the “Kongsfjorden ecosystem—new views after more than a decade of research”, coordinated by Christian Wiencke and Haakon Hop.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Video S1

Video animation of the three-dimensional view of Figure 10, showing the true nature of the separation between the ellipsoidal confidence regions (GIF 1869 kb)

Video S2

Video animation of the CCA ordination of Figure 11 when a third dimension is added. The video pauses when dimension 2 is horizontal and pointing to the right (i.e. Figure 11), and when dimension 3 is horizontal and pointing to the right, which shows that all confidence ellipses overlap on the third dimension (GIF 5084 kb)

About this article

Cite this article

Greenacre, M. Data reporting and visualization in ecology. Polar Biol 39, 2189–2205 (2016). https://doi.org/10.1007/s00300-016-2047-2

  • DOI: https://doi.org/10.1007/s00300-016-2047-2
