Background

An understanding of the relationships between plant diversity and vegetation productivity offers insight into plant communities and the goods and services they provide (Darwin 1859; Waide et al. 1999; MEA-team 2005; Braat and de Groot 2012; Harrison et al. 2014; Fanin et al. 2018). In recent decades these relationships have provoked much argument and ultimately this led to improved understanding through experiments with herbaceous communities (e.g., Naeem et al. 1994; Tilman et al. 1996). Nonetheless debate has persisted, from the past (Huston 1997; Hector 1998; Huston et al. 2000; Loreau et al. 2001; Wardle 2001), to the present (Sandau et al. 2017; Wright et al. 2017; Oram et al. 2018).

The relationship between forest diversity and productivity has generated particular interest given concerns over forest degradation and its implications for the global carbon cycle, water, climate and related processes and values (Nadrowski et al. 2010; Edwards et al. 2014; Mori et al. 2017). While large, long-term experiments appear the best way to infer causal relationships and are increasingly being implemented (Tobner et al. 2016; Verheyen et al. 2016; Fichtner et al. 2017; Bruelheide et al. 2019), they remain time-consuming and costly (Leuschner et al. 2009; Wang et al. 2016; Huang et al. 2018; Mori 2018). Furthermore, we must recognise when and how results from planted or modified forests provide insight into natural systems (and vice versa). Given this context, field observations may also provide valuable insights.

Here, we note various challenges for field-based studies under three headings: volume values, disturbance difficulties and inferential inquiries. Our review was stimulated by a high-profile 84-author study (Liang et al. 2016) that has already been cited nearly 400 times (Google Scholar, January 2020). We use this study for illustration. We find broad lessons for future work and foresee good potential for progress.

Challenges

Volume values: wood volume growth ≠ productivity ≠ value production

Here we explain why changes in forest stand volume are neither a meaningful measure of ecological productivity nor of economic benefits nor of any other clearly defined values which we can identify. We start by considering wood volumes. Equal volumes of wood grown in different forests can differ in mass as mean wood densities vary within and among forests (Baker et al. 2004; Swenson and Enquist 2007; Slik et al. 2008). Species-specific wood densities vary from 0.1 g∙cm−3 for Ochroma pyramidale (Cav. ex Lam.) Urb. (Malvaceae) to over 1.3 g∙cm−3 for Guaiacum officinale L. (Zygophyllaceae) and Brosimum rubescens Taub. (Moraceae) (Praciak et al. 2013). Some fast-growing pioneer species possess naturally hollow stems (e.g., Cecropiaceae and Caricaceae). These differences mean that stem-weighted mean wood densities of forest communities can differ more than twofold in a given location (Slik et al. 2008).
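The arithmetic behind this point can be sketched in a few lines of Python. This is a minimal illustration, not a measurement protocol: the stand labels and the 10 m3∙ha−1∙yr−1 volume increment are assumptions chosen for the example, and only the two density extremes (0.1 and 1.3 g∙cm−3) come from the text above. Real stand-level means differ far less than these single-species extremes (about twofold, per Slik et al. 2008).

```python
# Illustrative only: the volume increment and stand labels are assumptions.
G_PER_CM3_TO_T_PER_M3 = 1.0  # 1 g/cm3 is the same as 1 tonne/m3


def mass_increment(volume_m3_ha_yr, wood_density_g_cm3):
    """Dry-mass increment (t/ha/yr) implied by a volume increment and a mean wood density."""
    return volume_m3_ha_yr * wood_density_g_cm3 * G_PER_CM3_TO_T_PER_M3


volume_growth = 10.0  # m3/ha/yr, identical for both hypothetical stands

light_stand = mass_increment(volume_growth, 0.1)  # density near Ochroma pyramidale
dense_stand = mass_increment(volume_growth, 1.3)  # density near Guaiacum officinale
ratio = dense_stand / light_stand

print(f"light-wooded stand: {light_stand:.1f} t/ha/yr")
print(f"dense-wooded stand: {dense_stand:.1f} t/ha/yr")
print(f"mass ratio for equal volume growth: {ratio:.1f}")
```

The same volume increment therefore corresponds to very different mass increments, which is why volume growth cannot stand in for biomass production without density information.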

Differences in wood density relate to various factors including soil conditions and drought tolerance, but also to each species’ typical successional position and ability for rapid growth (van Nieuwstadt and Sheil 2005; Nepstad et al. 2007; Poorter et al. 2019). For example, in wet tropical forests when species are ordered from early through to late succession their characteristic wood densities generally increase and their maximum volume growth decreases (Ter Steege and Hammond 2001; Slik et al. 2008). Thus, mean (volume-, basal area- or stem-weighted) wood density within any wet climatic region tends to be lower in the early stages of secondary regrowth forest when compared to a site comprising relatively undisturbed old growth. As a tree’s carbon costs per unit wood volume are directly related to its wood density (King et al. 2006), volume growth rates also tend to be greater in (younger) post-disturbance forests than in late successional formations dominated by tree species with higher wood density (see blue and red lines in Fig. 1). Typical stand level biomass production changes with succession too. Contrasting patterns of volume growth and wood density may cancel to some degree, typically leading to a rapid early rise to reach high rates of stem biomass production in early succession and a gradual decline through mid- and late-succession (Lasky et al. 2014). In dry tropical forest contrasting trends can occur with denser wood found in early succession, and declining wood densities as the forest matures (Poorter et al. 2019). Whatever the underlying patterns, plot level wood density and disturbance histories are not independent.

Fig. 1

Schematic example of how species diversity (S), volume production and mean wood density may co-vary with disturbance and recovery in an example wet forest. The top schematic shows four idealized stages in forest recovery (I–IV) composed of three species: pioneer, early- and late-successional (after Connell 1978). These species possess characteristic volume-growths and wood-densities indicated by the relative size of the red and blue circles respectively on the adult trees. Species diversity in the four schematic successional stages shows a rise and fall with long-term forest recovery (a peak occurs between II and III when all three species have the potential to co-occur). The central graphic illustrates the rise and fall of diversity (continuous black line), declining volume growth (red dotted line), and increasing mean stand wood density (blue dashed line) with recovery (absence of disturbance) in a wet forest. The two lower figures show the potentially contrasting relationships between diversity, volume growth and wood density that may occur depending on the disturbance histories observed. This schematic is a stylized representation of patterns that will differ among locations (for example, wood densities may decline with succession in dry Neotropical forests).

Forest productivity is defined, measured and estimated in many ways. All methods involve assumptions, approximations and potential errors and biases (Sheil 1995a, 1995b; Waring et al. 1998; Clark et al. 2001a, 2001b; Chave et al. 2004; Roxburgh et al. 2005; Williams et al. 2005; Litton et al. 2007; Malhi 2012; Sileshi 2014; Talbot et al. 2014; Searle and Chen 2017; Šímová and Storch 2017; Kohyama et al. 2019). Typically, in ecological studies focused on forest stands (not individual trees), we are interested in net primary production (NPP) or major components of biomass such as above ground woody material, expressed as mass per unit area per unit time. Large scale studies suggest that forest properties, notably stand age and biomass, explain much of the variation in NPP (estimated annual dry-mass biomass production of root, stem, branch, reproductive structures and foliage) while climate often has surprisingly little influence (Michaletz et al. 2014). Such patterns differ among forests: notably, biomass and biomass production tend to be closely correlated in early succession as a stand establishes and grows to fill space, but this relationship tends to weaken and reverse in advanced succession (Lohbeck et al. 2015; Prado-Junior et al. 2016; Rozendaal et al. 2016).

In some situations, changes in forest volume production may covary with changes in market values. This would be the case when timber of all sizes, species and qualities is bundled together, as may arise when a forest stand is being managed to produce wood fibre or charcoal, but this generalisation is at best an approximation and is seldom true. In most stands not all volume is equally valued or valuable. Size matters: only stems with sufficient size and good form yield high value saw logs or veneer. Furthermore, relatively few tree species have high commercial value, especially in the tropics (Plumptre 1996). After a disturbance, it will generally be quicker for a forest to recover in terms of volume of small stemmed pioneers (low value volume), than to regenerate large stems of valuable dense timbered species (high value volume). These preferences are why timber extraction in species rich forests is usually selective: targeting only large stems of certain species. For example, commercial exploitation in Gabon typically involves less than one tree per ha (e.g., average 0.82 according to Medjibe et al. 2011). Note that once the small numbers of valued stems are removed the value of the remaining stand is much lower despite maintaining a similar volume and diversity. Such selective impoverishment has been widespread. An example is the Caribbean region, where high-value mahogany (Swietenia macrophylla King, Meliaceae) has long been sought, removed and depleted (Snook 1996). Even where there are opportunities to use a broad range of species, e.g., for charcoal, some stems are still likely to be treated separately as a result of their greater commercial value (Plumptre and Earl 1986).
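The effect of selective removal on stand value can be illustrated with a small worked example. Everything numeric here is hypothetical: the three stem classes, their volumes (m3 per ha) and their unit prices (arbitrary currency units per m3) are invented for illustration and are not taken from any of the cited studies.

```python
# Hypothetical stand: classes, volumes (m3/ha) and unit prices are invented for illustration.
stand = {
    "high-value timber species": {"volume": 15.0, "price": 200.0},
    "other large stems": {"volume": 135.0, "price": 10.0},
    "small pioneer stems": {"volume": 150.0, "price": 1.0},
}


def totals(classes):
    """Return (total volume, total value) summed over all stem classes."""
    volume = sum(c["volume"] for c in classes.values())
    value = sum(c["volume"] * c["price"] for c in classes.values())
    return volume, value


vol_before, val_before = totals(stand)

# Selective logging: remove only the few high-value stems.
logged = {name: dict(c) for name, c in stand.items()}
logged["high-value timber species"]["volume"] = 0.0
vol_after, val_after = totals(logged)

volume_lost = 1.0 - vol_after / vol_before
value_lost = 1.0 - val_after / val_before
print(f"volume lost: {volume_lost:.0%}, value lost: {value_lost:.0%}")
```

Under these assumed prices, removing 5% of the volume removes about two thirds of the value, so a volume-based measure would barely register the loss.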

These stem specific differences in value also explain why silviculture in mixed species forests aims not to improve overall stand volume growth (or diversity) but rather to favour the production of particular species (Dawkins and Philip 1998; Peña-Claros et al. 2008; Doucet et al. 2009). These differences also apply in low diversity systems. Consider a high value teak (Tectona grandis L.) forest and a nearby monoculture stand of an abundant, weak, hollow-stemmed species such as Cecropia (Cecropia peltata L.): volume production in these two stands represents very distinct commercial values. We are unaware of any general studies that indicate that other forest derived values, such as non-timber products, conservation benefits or hydrological function, vary in a predictable manner with stand volume or productivity (or related measures). Indeed, indications from studies of stand structure and other forest characteristics such as tree or mammal diversities or palm densities (though seldom examining volume or volume growth) suggest such relationships are unlikely (Clark et al. 1995; Beaudrot et al. 2016; Sullivan et al. 2017). Valid global relationships appear especially implausible. We cannot identify any clearly defined values that vary linearly with volume production.

Disturbance difficulties: how histories influence stand properties

Forests reflect their histories including past disturbances and subsequent recovery. These relationships remain central themes in forest ecology (e.g., Guariguata and Ostertag 2001; Sheil and Burslem 2003; Canham et al. 2004; Royo and Carson 2006; Ghazoul and Sheil 2010; Drake et al. 2011; Seidl et al. 2011; Ding et al. 2012; Gamfeldt et al. 2013; Chazdon 2014; Lasky et al. 2014; Rozendaal and Chazdon 2015; Scheuermann et al. 2018). Indeed, while themes have evolved, the importance of these temporal relationships has long been recognised in both temperate (for example, Transeau 1908; Gleason 1917; Tansley 1920; Phillips 1934; Clements 1936; Watt 1947; Langford and Buell 1969) and tropical contexts (see, e.g., Richards 1939; Eggeling 1947; Greig-Smith 1952; Hewetson 1956; Webb 1958; Aubréville 2015). Even the first major treatise to examine tropical forests in a global manner presented them in terms of succession and associated characteristics (Richards 1952). It is because of these somewhat predictable patterns, and the range of disturbance histories encompassed in most forest samples, that many stand properties co-vary; for example, stand turnover rates, wood density and tree-diversity (Sheil 1996; Slik et al. 2008). Similarly, because of such underlying relationships, we expect tree-diversity, productivity, and volume growth to co-vary with disturbance history.

Tree species richness typically rises in early succession as species accumulate, and, if conditions permit, ultimately falls in later stages due to competition and the failure of shade-intolerant species to replace themselves (see black line, S, in Fig. 1). This “humped” or unimodal pattern is the basis for Connell’s Intermediate Disturbance Hypothesis (Connell 1978, 1979; Sheil and Burslem 2003, 2013). Despite debate and frequent misrepresentation, there is general agreement that the original mechanisms proposed by Connell, involving competition-colonization trade-offs among species, are valid and ecologically relevant (e.g., see Fox 2013a, 2013b; Sheil and Burslem 2013; Huston 2014). While other factors play a role too, disturbance dependent mechanisms often contribute to observed patterns of species diversity (Kershaw and Mallik 2013; Huston 2014).

When sampling and disturbance histories permit, both sides of the successional rise-and-fall in the species-richness pattern become evident. Consider Guyana, where some areas of forest possess a higher proportion of faster growing but light-timbered species whereas other, typically lower-diversity, forests possess more dense-timbered, slow-growing species (Molino and Sabatier 2001; Ter Steege and Hammond 2001).

Sampling may not capture the full pattern. For example, evaluations of sites representing only the rising section of diversity and succession may be interpreted (incorrectly) as evidence against disturbance playing a positive role in maintaining diversity, when an absence of disturbance would nonetheless lead to a loss of species (for more detailed explanations please see Sheil and Burslem 2003).

Sometimes interpretations remain ambiguous. For example, a study of diversity and successional state across Ghana’s forests found that while the predicted rise and fall diversity patterns were detected across all the major forest types, disturbance appeared to contribute more to maintaining diversity in drier than in wetter sites. The most mature (i.e. apparently old growth) forests found in these wetter sites maintained most (but not all) of the species found in less advanced sites (Bongers et al. 2009). Among drier sites, the most mature forests showed less diversity compared to younger sites. These results cannot distinguish whether other mechanisms contributed more to maintaining diversity in wetter (versus drier) forests or whether there was simply a paucity of sufficiently undisturbed (late successional) examples to show what happens under these conditions (Bongers et al. 2009).

So diversity tends to rise in early succession, reach a peak and may then gradually decline if there is little disturbance. What about volume growth and productivity? After an extreme event, volume and biomass production grow before levelling off and gently declining with stand age (Lorenz and Lal 2010; Goulden et al. 2011; He et al. 2012; Lasky et al. 2014). This can be understood as a result of the initial influx and establishment of fast growing (in wet forest typically low-wood-density) early successional pioneer species, with the forest becoming more efficient at capturing light as the more shade-tolerant, later-successional species also become established. The decline likely results from the reduced efficiency (per unit area) of light interception and photosynthesis of larger (versus smaller) trees (Yoder et al. 1994; Niinemets et al. 2005; Nock et al. 2008; Drake et al. 2011; Quinn and Thomas 2015) and the proportion of energy invested in woody growth (Kaufmann and Ryan 1986; Mencuccini et al. 2005; Thomas 2010).

The consequence of these successional trends is that various stand properties tend to co-vary (see Fig. 1). Depending on if and how the rising and falling section of the relationships are represented in the data, this covariation alone can result in an increase (or decrease) in stand-volume-growth, or productivity, with increasing diversity (see lower insets in Fig. 1). Consider an old-growth forest disturbed by some event that opens the canopy (a windstorm or timber-cutting): fast growing pioneer species that were not previously present can now establish, boosting species numbers and volume growth. Such patterns neither prove nor disprove that diversity bolsters productivity but they show how correlations can arise independently of such relationships.
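How such covariation can arise without any causal link between diversity and growth can be sketched with a toy calculation. The two curves below are assumptions chosen only to mimic the qualitative shapes in Fig. 1 (a humped diversity curve and a declining volume-growth curve); the functional forms, the peak at 50 years and the sampling windows are all hypothetical. Both quantities depend only on time since disturbance.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def diversity(t):
    """Assumed humped diversity curve; peaks 50 years after disturbance."""
    return t * math.exp(-t / 50.0)


def volume_growth(t):
    """Assumed monotonic decline of volume growth with time since disturbance."""
    return math.exp(-t / 100.0)


# Two hypothetical samples of plots, differing only in the stand ages they span.
early_ages = list(range(5, 50, 5))    # rising limb of the diversity curve
late_ages = list(range(55, 300, 5))   # falling limb of the diversity curve

r_early = pearson([diversity(t) for t in early_ages],
                  [volume_growth(t) for t in early_ages])
r_late = pearson([diversity(t) for t in late_ages],
                 [volume_growth(t) for t in late_ages])

print(f"plots on rising limb:  r = {r_early:.2f}")
print(f"plots on falling limb: r = {r_late:.2f}")
```

The sign of the diversity-growth correlation flips between the two samples even though diversity has no effect on growth in this sketch; only the range of disturbance histories sampled has changed.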

Can we avoid the complications created by disturbance histories by using basal area as a proxy and including it as a random variable in our analyses? No—while basal area change can be useful as an immediate measure of disturbance there is no simple, one-to-one relationship between basal areas and successional stages or related processes. Partial basal area values can arise in many ways: for example, a value of 80% might result after several years of recovery following a large disturbance (a reduction to less than 50%), more recently after a lesser one (a reduction to 75%), or a cumulative consequence of many small events without sufficient time to recover (each event leading to only a few percent decline). In any case, few stands result from a single disturbance to an otherwise never-disturbed old-growth forest. Most experience a complex mixture of intrinsic and extrinsic disturbances of varying forms and magnitudes that to some degree decouples basal area from composition and other successional responses. Also, while basal area typically recovers quite rapidly this is not necessarily true for other stand characteristics (Rozendaal et al. 2019). Comparisons of regrowth and old-growth show how idiosyncratic recovery is, with many plots surpassing pre-disturbance reference levels of species richness in less than a decade while others don’t reach it in more than one century (see, e.g., Martin et al. 2013). Relatively rapid, but variable, recovery of species richness was also reported from a recent review of Neotropical sites (Rozendaal et al. 2019). For biomass, there is also variation with some sites recovering within a couple of decades and others not reaching original levels in 80 years (Martin et al. 2013; Poorter et al. 2016). Composition typically remains distinct for longer—decades or even centuries (see, e.g., Chai and Tanner 2011; Rozendaal et al. 2019).

How are basal area and successional state related? As basal area and compositional maturity both decline as a result of disturbance, and recover subsequently, we might expect that these variables would track each other yielding a clear positive relationship, but this is not necessarily the case. In forests subjected to repeated disturbance, basal area can become decoupled from composition. For example, consider any system in which stand basal area and composition (percentage of late successional species) both recover after disturbance, and in which stems can persist for decades, and subject it to just one disturbance: basal area and (some years later) composition will subsequently recover towards their pre-disturbance levels (Fig. 2 upper panel). Now subject this same system to stochastic disturbances over an extended period: if the disturbance events are sufficiently frequent and severe, any relationship between basal area and composition is readily obscured (see Fig. 2 lower panel). Our point here is not to identify specific conditions under which such decoupling occurs in a specific model—this will reflect many factors including both the vegetation persistence and response lag-times as well as details of the disturbance—but to recognise that it can plausibly do so in a wide range of cases that arise in nature. Studies of managed forests also show that, while some patterns appear typical, the nuances of stand structure, age and productivity cannot be readily captured in any one variable (Liira et al. 2007). For such reasons incorporating basal area, or similar univariate stand properties, in the analysis may influence results but not remove the impact of disturbance.

Fig. 2

Outputs from a simple simulation model in which “basal area” and “composition” (percentage late successional species) both recover after disturbance. Composition involves persistence (the composition of surviving stems is unchanged immediately following disturbance), lag-times and integration (the composition of recruits depends on canopy openness over previous years with more early successional species surviving in more open conditions). Panel a shows the simulated response over 400 years where a single event removes 90% of basal area in the tenth year. Panel b shows an example where, from year ten onwards, disturbances occur with a 5% probability each year. If a disturbance event occurs it removes a randomly generated fraction of basal area between 0 and 100% (skewed to lower values). While both basal area and our measure of composition decline with disturbance, and increase with recovery, the Pearson product moment correlations (r) between these variables are often negative (as in the example).
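A model of the kind described in the caption can be sketched as follows. This is not the model used to produce Fig. 2, whose equations are not given here; it is a minimal reconstruction under stated assumptions. The recovery rate (`growth`), background turnover (`turnover`), the ten-year integration window (`lag`) and the squared-uniform disturbance fraction are all hypothetical choices. Only the 400-year span, the 90% single event, the 5% annual disturbance probability and the skew towards small events follow the caption.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(years=400, p_disturb=0.05, single_event=None, seed=1,
             growth=0.05, turnover=0.02, lag=10):
    """Return yearly series of basal area and late-successional fraction (both 0..1)."""
    rng = random.Random(seed)
    basal, comp = 1.0, 1.0      # start as an undisturbed, late-successional stand
    history = [basal] * lag     # recent canopy closure experienced by recruits
    basal_series, comp_series = [], []
    for year in range(years):
        # Disturbance removes basal area; surviving stems keep their composition.
        if single_event is not None:
            if year == single_event[0]:
                basal *= 1.0 - single_event[1]
        elif year >= 10 and rng.random() < p_disturb:
            basal *= 1.0 - rng.random() ** 2   # removed fraction skewed to low values
        # Recruit composition integrates canopy closure over the previous `lag` years.
        recruit_comp = sum(history[-lag:]) / lag
        ingrowth = growth * (1.0 - basal)      # recovery towards full stocking
        replaced = turnover * basal            # background stem replacement
        new_basal = basal + ingrowth
        comp = ((basal - replaced) * comp
                + (replaced + ingrowth) * recruit_comp) / new_basal
        basal = new_basal
        history.append(basal)
        basal_series.append(basal)
        comp_series.append(comp)
    return basal_series, comp_series


single_basal, single_comp = simulate(single_event=(10, 0.9))
multi_basal, multi_comp = simulate(seed=1)
print("single event r:", round(pearson(single_basal, single_comp), 2))
print("stochastic   r:", round(pearson(multi_basal, multi_comp), 2))
```

Changing `seed`, `lag` or `turnover` shows how strongly the correlation between the two series depends on the disturbance sequence and on the assumed lag, which is the point at issue rather than any particular value of r.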

We are not claiming that succession provides a simple explanation of community change. Succession is only predictable in part (Norden et al. 2015). Patterns can be complex, context dependent and idiosyncratic (Chazdon 2003; Ghazoul and Sheil 2010; Sheil 2016; Bendix et al. 2017). They may include alternative pathways, or stall (Royo and Carson 2006; Norden et al. 2011; Tymen et al. 2016; Arroyo-Rodríguez et al. 2017; Ssali et al. 2017). Nonetheless, these patterns—however manifested—may be sufficiently consistent to influence statistically defined relationships among stand properties like diversity, volume growth and productivity.

We cannot understand forests separate from their disturbance histories. The importance of sampling and context means that generalisations may not be readily transferable from one data set to another unless we know, and can account for, such histories. Thus, we need to consider these factors explicitly and be wary of generalisations that neglect them. Simple fixes are unlikely to be effective.

Inferential inquiries: conclusions about causation

Determining causality has been a theme in the philosophy of science since Aristotle (Holland 1986)—and has fuelled analytical innovations concerning the ability to infer and assess causal effects using both experimental and non-experimental observations (e.g., Freedman 2006; Cox 2018). While some issues remain contested (see, e.g., Pearl 2018) there is broad consensus that correlation alone should not be assumed as strong evidence of causation in non-experimental data (Höfer et al. 2004) and statistical methods used to “draw causal inferences are distinct from those used to draw associational inferences” (Holland 1986). While many will consider this obvious, the prevalence and persistence of the problem justifies concern.

So, if we find it, how should we interpret a positive correlation between species diversity and productivity? Potential explanations abound. Maybe greater diversity causes greater productivity. This could result from a niche interpretation in which a greater diversity of species use resources more effectively due to their complementary use of resources in space and time (del Río et al. 2017; Williams et al. 2017; Lu et al. 2018). It could also result if species which occurred at lower abundances (as occurs for an average species in richer communities) tend to have greater productivity than common species, through “rare species advantage” (Bachelot and Kobe 2013) permitting better growth and productivity than in lower diversity systems (Mangan et al. 2010; LaManna et al. 2016). It can also arise through a “sampling effect” in which communities with more taxa are more likely to include high-productivity species (Huston 1997).

Maybe, rather than diversity facilitating productivity it is productivity that facilitates diversity (Waide et al. 1999; Coomes et al. 2009; Jucker et al. 2018). For example, there are data indicating that taller forests occur on richer, presumably more productive, soils (at least in Africa and Asia, Yang et al. 2016), and also that, all else being equal within a given region, taller forests tend to contain more species than shorter forests (Huston 1994; Duivenvoorden 1996).

A positive correlation could also result from shared causes. For example, both diversity and productivity may vary with climate, soil nutrients or disturbance histories (see previous section). Stem densities and numbers are also a plausible explanation, as the count of individuals is an upper bound to the possible number of species (Hurlbert 1971; Colwell et al. 2012) and denser forest also tends to be more productive (Michaletz et al. 2014) at least in early succession (Lohbeck et al. 2015; Prado-Junior et al. 2016; Rozendaal et al. 2016). In any case, stem numbers and related measures can vary due to sampling noise making any such recorded variables non-independent—with such influences being particularly important when plots are small (Colwell et al. 2012). Positive correlations could arise from more complex relationships too, for example, when observations span only the left-hand (e.g. low productivity) part of a unimodal rise-and-fall relationship where productivity determines diversity (Tilman 1982), or result from more complex sampling effects (see, e.g., Waide et al. 1999; Chase and Leibold 2002).
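The shared-cause case can be illustrated with simulated data. In this sketch a single site-level driver (standing in for climate, soil nutrients or disturbance history) affects both diversity and productivity, and there is no direct link between the two. The coefficients, noise levels, sample size and variable names are hypothetical and carry no empirical meaning.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after removing its least-squares linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


rng = random.Random(42)
n_plots = 2000

# Hypothetical shared driver, scaled 0..1 (e.g. a site fertility index).
driver = [rng.random() for _ in range(n_plots)]
# Diversity and productivity each respond to the driver, with independent noise.
diversity = [10 + 20 * z + rng.gauss(0, 2) for z in driver]
productivity = [5 + 10 * z + rng.gauss(0, 1) for z in driver]

r_raw = pearson(diversity, productivity)
# Correlation remaining once the shared driver is accounted for.
r_partial = pearson(residuals(diversity, driver), residuals(productivity, driver))

print(f"raw correlation:             {r_raw:.2f}")
print(f"after removing shared cause: {r_partial:.2f}")
```

The raw correlation is strong while the partial correlation is close to zero. In field data the driver is rarely measured this cleanly, so adjusting for an imperfect proxy would leave part of the spurious association in place.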

Explanations and mechanisms are not exclusive and may be valid concurrently. Grassland studies indicate that differences in diversity can be simultaneously a cause and a consequence of differences in productivity (e.g., Grace et al. 2016).

We should also expect interactions amongst causes, effects and mechanisms. For example, many of the underlying processes will respond to climate (Fei et al. 2018). Or, to take a more specific example, we know that the responses and the effect of disturbance will vary with the local species, and we know that this can be determined by context. Consider, for example, the forests of the islands of Krakatoa versus the Sumatra mainland: though many tree species are shared, many mainland species, including the regionally dominant dipterocarps, have failed to re-establish on the islands since the 1883 eruption due to dispersal limitation, which has limited the convergence of the regrowth forest (Whittaker et al. 1997). Mechanisms vary too. For example, niche complementarity can vary with composition, context (Fichtner et al. 2017) and disturbance history in both temperate and tropical forests (Lasky et al. 2014; Gough et al. 2016).

We don’t dispute that diversity generally contributes to increased productivity. That has been demonstrated many times in various systems including forests (Wang et al. 2016; Fichtner et al. 2017; Huang et al. 2018; Mori 2018). But that doesn’t mean that this contribution alone determines the relationship between tree diversity and stand productivity. Correlations arise in many ways. Recognising the multiple processes that might generate and influence a correlation between diversity and productivity is essential for correct interpretation.

Case study

Liang et al. (2016) presented a global evaluation of tree diversity versus what they called “productivity” and inferred that greater diversity leads to greater productivity. They used this result to estimate the economic value of the diversity in forests. By way of context, they argued the need for “accurate valuation of global biodiversity”. They quantified plot level tree species richness and volume productivity using 777,126 tree plots from 44 countries (the plots contain more than 30 million trees from 8737 identified species; coverage is uneven, with tropical forests being poorly represented). They used various analytical approaches, including spatially constrained resampling and regression. From these, they inferred a worldwide positive effect of tree species richness on tree volume productivity that varies somewhat among regions. They used the derived relationship to estimate that the economic value of biodiversity in maintaining commercial forest productivity is more than double the total estimated cost of effective conservation of all terrestrial ecosystems (between 166 billion and 490 billion USD∙yr−1). So how does this study measure up against our concerns?

Volume values

Liang et al. (2016) estimated changes in stem volume (m3∙ha−1∙yr−1) as their measure of productivity and associated values. This measure contrasts with more conventional assessments of productivity that consider rates of change in biomass or carbon stocks. Liang et al. defend their choice by noting that volume is easier to estimate and is sufficient for their goal of summarising overall forest product values. Assuming a linear relationship between stem volume productivity and “values” (we remain unclear how these values are circumscribed and what they represent; see below) they use their relationships to estimate that an evenly distributed worldwide decrease of tree species richness of 10% would reduce volume, and associated value, productivity by 2.1 to 3.1% which, using two alternative values for forest production, equates to costs of USD 13–23 billion per year. Volume growth is a poor proxy for timber value or carbon gains. Questions over which values might relate adequately to volume growth, and how, were debated previously. We will not repeat the details here (see Barrett et al. 2016; Paul and Knoke 2016). Our view is that the implied values are ill-defined and the underlying assumptions and relationships undemonstrated. We highlight that volume increment does not provide a meaningful measure of primary productivity nor does it equate to an increment in economic values.

Disturbance difficulties

Liang et al. (2016) sought to eliminate the influence of disturbance by excluding plots where forest harvest had exceeded 50% of stocking volume and by including basal area as a random variable in their analyses. Having taken these steps, they gave disturbance and recovery no further consideration. This is inadequate. There is no evidence that disturbance affects diversity and productivity only once stocking is reduced by over 50%, or that basal area is a valid and consistent, let alone sufficient, measure of succession (see previous discussions, and Fig. 2). The patterns they observe remain influenced by disturbance histories (as suggested in Fig. 1).

Inferential inquiries

Liang et al. (2016) find that, in general, higher diversity is associated with higher productivity. They interpret this as showing that greater diversity causes greater productivity and favour a niche interpretation. Indeed, this causation is assumed when they define the biodiversity-productivity-relationship as “the effect of biodiversity on ecosystem productivity”. While such a relationship likely exists, their estimates should be treated with caution as alternative influences and explanations remain unexamined.

Discussion and conclusions

We have highlighted pitfalls in the study of forest diversity and productivity and illustrated our concerns by showing that these faults are manifested in a highly cited, multi-author study in a prestigious journal. These pitfalls include the use of volume production as a measure of productivity and value; the neglect of disturbance histories; and interpreting a simple correlation as causal when other explanations exist. The problems would presumably be recognised and rectified given time, but in the meantime we observe these studies being cited as if they are established fact (see, e.g., Bruelheide et al. 2019). In fairness, we note that the problems we have detailed may not be common, and there are many more nuanced analyses in the literature (e.g., for USA forests, Fei et al. 2018). Nonetheless, that may change if flawed studies become influential. For example, we note that Luo et al. (2019), like Liang et al. (2016), adopt the same implicit causal assumptions and disregard alternatives. Forewarned is forearmed.

Our list of pitfalls and concerns is not exhaustive (see, e.g., Dormann et al. 2019). Other studies raise other concerns too. For example, one reviewer suggested that we assess some studies using European forest data: Jucker and colleagues concluded that aboveground stand biomass growth (not volume as in Liang et al. 2016), increased with tree stand species richness (Fig. 7 in Jucker et al. 2014a and Fig. 2 and Figure S11 in Jucker et al. 2016). Yet they also determined that neither stand basal area nor stem densities varied with species richness (Fig. S7 in Jucker et al. 2015 and Fig. S4 in Jucker et al. 2016) and indicated that neither mortality nor thinning varied with richness either (Jucker et al. 2014b, 2015). This raises questions concerning how biomass production can vary if basal area, mortality and thinning do not. As the reviewer noted, the claim that mortality does not vary with diversity may be the critical issue given results from other studies (for example, Liang et al. 2005, 2007; Lasky et al. 2014). Furthermore, the researchers suggested that canopy packing increased in response to species mixing and disregarded silvicultural activities as a possible cause (Jucker et al. 2015). However, such packing can be promoted by thinning: foresters often wish to ensure retained crop trees have the space and conditions for good growth and identify and remove trees that interfere with others or are likely to be suppressed. Even if limited thinning occurred early in the stands' development the effects could be long lasting and could subsequently appear to result from “species mixing” alone. We cannot judge these suggestions from available information. Clearly there is much still to clarify and in the meantime all such studies should be treated with skepticism, even if they are published in reputable journals.

Returning to our concerns with Liang et al. (2016): how can such problems, in particular the simplistic causal inference, go unrecognised by authors, reviewers and editors in a high profile peer reviewed article? Part of the explanation may be prevalence and plausibility: such inferences are common and the conclusions appear reasonable. Views can clearly differ and, while we make no claim to be beyond such criticisms ourselves, we advocate less tolerance of causal claims based on plausibility. While correlations can be interesting and important, we should be aware of, and explicit about, what we assume, infer and claim.

It is recognised in health and social sciences that elegant studies can gain undue prestige despite their failings (Ioannidis 2005; Smaldino and McElreath 2016; Camerer et al. 2018; Huebschmann et al. 2019). Our own numerous examples (e.g., Sheil 1995, 1996; Sheil et al. 1999, 2013, 2016, 2019; Sheil and Wunder 2002; Makarieva et al. 2014), and many others, suggest similar processes in other sciences including ecology, environment and climate. We all appreciate short, elegant articles but there is a cost to such simplification when key nuances and shortcomings are ignored or brushed aside. When presenting forest and biodiversity sciences to a wide readership we (authors, reviewers, editors and readers) must maintain our standards of self-critical framing and interpretation. We know that the relationships between diversity and productivity are likely to be complex, as indeed much of the debate over previous studies indicates (Huston 1997; Sandau et al. 2017; Wright et al. 2017; Fei et al. 2018). In such contexts, we should beware simplicity.

Despite our concerns, field observations remain valuable. While formal experiments are essential for controlling and clarifying many aspects of the diversity-productivity relationship for trees and forests, field observations offer additional insights.

Furthermore, our concerns about disturbance histories and successional influences can be addressed with a thorough evaluation of available data. For example, the influence of disturbance histories on forest diversity, productivity and other characteristics can be explored through permanent plots and other available data (see, e.g., Rozendaal and Chazdon 2015; Li et al. 2018; Scheuermann et al. 2018). Linking stand characteristics to known histories should also aid more general characterisation. The understanding available from such analyses, when combined with field experiments and critical reflection, offers further insights into forest communities and their values. In this sense, we support calls for a detailed and nuanced appraisal of how plant diversity contributes to biomass production and other ecosystem properties (Adair et al. 2018).