Background

Since the beginning of the industrial era, anthropogenic activities have led to increased atmospheric concentrations of greenhouse gases, which are currently reaching their highest levels in the last 800,000 years [1]. This has resulted in a measurable increase in both global air and water temperatures, a trend that is projected to intensify through the end of the twenty-first century [2]. Global warming is poised to be one of the most serious threats to aquatic ecosystems, both marine [1] and freshwater [3]. Various ecological responses to warming have been documented and demonstrated experimentally at all levels of biological organization, including changes in species distribution, phenology, growth, and metabolism, as well as in community structure, biodiversity and ecosystem functions [1, 3].

Understanding and predicting the consequences of climate change on species, communities, and ecosystems is challenging. It can be difficult to disentangle climate-driven signals from natural variability, especially when climate change acts in combination with other environmental stressors such as eutrophication and pollution [1]. Despite these difficulties, major scientific efforts, including substantial investment in experiments, have been made to elucidate the ecological effects of climate change using a range of methods: extrapolation, experiments, game-theory population models, phenomenological models, expert opinion, and outcome-driven modelling and scenarios [4]. Among experimental approaches, mesocosms have become progressively more popular as they narrow the gap between smaller-scale, less realistic microcosm experiments and larger-scale, more complex natural systems, in which mechanistic relationships are often difficult to identify [5].

Eugene P. Odum first coined the term ‘mesocosm’ to describe replicated experimental setups of a moderate size, where ‘parts (populations) and wholes (ecosystems) can be investigated simultaneously by a team of researchers’ [6]. Over the years, the term ‘mesocosm’ has been arbitrarily used to specify experimental enclosures of varied shape and volume, from one to thousands of litres [7]. Today, aquatic mesocosms are used in marine, estuarine, and freshwater systems. Enclosures of pelagic waters, in the laboratory (e.g., [8]), outdoors, or in situ (e.g., [9]) have been used to test the effects of warming on plankton communities. Benthic mesocosms have long been used in shallow freshwater ecosystems to establish mechanistic relationships between various stressors and population, community, or ecosystem dynamics [10]. More recently, benthic mesocosms have also been developed for marine environments to test the effect of warming, acidification, eutrophication and hypoxia on shallow coastal ecosystems while allowing for natural fluctuations, thus increasing realism [11].

Despite their widespread use in ecological studies, mesocosm experiments have been criticized as unrealistic and simplistic representations of ecological processes, producing results with limited relevance and applicability to natural ecosystems [12]. Mesocosm dimensions (volume), shape, settings, experimental duration, replication level, and other design characteristics can act as confounding factors that strongly influence the measured experimental effect [13]. To date, the most comprehensive review of mesocosm experiments as a tool for ecological climate change research was conducted by Stewart et al. in 2013 [5]. This extensive review covers terrestrial, marine and freshwater experiments, describes advantages and caveats of mesocosm methodologies, and illustrates the number of mesocosm studies in different categories. However, it does not provide quantitative measures of how ecological and methodological characteristics may influence the magnitude and direction of the effect sizes measured in these studies. Therefore, a new quantitative evaluation of the contribution of experimental mesocosms to climate change research is both timely and vital to better understanding the limitations and caveats associated with such an approach.

Objective of this review

This systematic review aims to identify the type, direction, and strength of species, community, and ecosystem responses to experimental warming in aquatic mesocosms. We will also investigate how the observed effects depend on several a priori ecological and methodological moderators. This global review will cover existing studies conducted in aquatic ecosystems (i.e., marine, estuarine, and freshwater), across all biogeographical regions, and with all species. Studies will be included based on the criteria described below.

Primary question

What are the type, direction, and strength of species, community, and ecosystem responses to warming in aquatic mesocosm studies?

Secondary question

How do experimental characteristics of aquatic mesocosms change the direction and magnitude of effect sizes in climate change research?

The list of components that will help guide the search and analysis of the extracted data is shown in Table 1.

Table 1 Guiding criteria for the literature search

Methods

Search strategy

We will use a pre-determined list of keywords to search for relevant studies in the academic databases Web of Science, Scopus and Google Scholar, and in non-academic websites using Google Custom International Governmental Organizations (IGO) search (https://cse.google.com/cse/home?cx=006748068166572874491:55ez0c3j3ey) and Google Custom Non-Governmental Organizations (NGO) search (https://cse.google.com/cse/home?cx=012681683249965267634:q4g16p05-ao). The list of search terms and Boolean operators that will be used to identify relevant aquatic mesocosm studies is provided in Table 2. Within each category (population, exposure and intervention, outcomes), the search terms will be combined in parentheses and separated using the Boolean operator ‘OR’. These categories will then be combined using the Boolean operator ‘AND’. An asterisk (*) indicates a ‘wildcard’, which allows databases to include multiple words with different prefixes or suffixes; for example, estuar* captures [estuary OR estuaries OR estuarine]. Quotation marks (“”) around two or more words restrict the search to instances where that exact phrase occurs.
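As an illustration of how the search strings are assembled, the following minimal Python sketch joins the terms within each category with ‘OR’ and then joins the categories with ‘AND’. The function name `build_query` and the example terms are hypothetical placeholders chosen for illustration; the actual search terms are those listed in Table 2, and individual databases may require additional field tags or syntax.

```python
def build_query(categories):
    """Join terms with OR inside each category, then join categories with AND."""
    groups = []
    for terms in categories.values():
        # Quote multi-word phrases so that the exact phrase is searched.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Placeholder terms for illustration only (not the terms of Table 2).
example_terms = {
    "population": ["estuar*", "freshwater", "marine"],
    "exposure": ["mesocosm*", "climate change"],
    "outcome": ["growth", "species richness"],
}

query = build_query(example_terms)
print(query)
# (estuar* OR freshwater OR marine) AND (mesocosm* OR "climate change") AND (growth OR "species richness")
```

Keeping the terms in a structure like this makes it straightforward to regenerate the string when a term is added or removed during scoping.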

Table 2 The search strings that will be used for the review

While reading the full-text publications we will look for further relevant material (e.g., cited papers) that may include useful data for this systematic review that were missed in our search of publication databases. In the case of papers reporting incomplete information, we will attempt to obtain the relevant information by contacting the authors. The resulting list of publications will be managed using reference management software (Mendeley), which will be used to eliminate redundant publications.

The degree of comprehensiveness of the search strategy and its ability to identify all relevant articles will be assessed using sensitivity analyses [14].

Article screening and study inclusion criteria

The eligibility of the articles returned by the search will be assessed against a set of inclusion criteria at three successive levels: title, abstract and full text. First, we will evaluate articles by title to remove citations spuriously returned by our search. Next, we will evaluate the remaining citations based on their abstracts to further remove unrelated citations. At this stage, all participating reviewers will assess an identical subset of the articles (5%), and a Kappa inter-rater agreement statistic [15] will be calculated based on the assessments. If the statistic indicates that reviewers are inconsistent in their assessment of article relevance, discrepancies will be discussed and the inclusion criteria will be clarified or revised to ensure that all authors apply them consistently. We will iterate this process until the computed Kappa statistic exceeds 0.6 [15]. Finally, the full text of the remaining articles will be evaluated for inclusion in the meta-analysis. If it is unclear whether an article meets the inclusion criteria at a given level of screening, it will be retained for evaluation at the next level. A table listing all articles excluded at the full-text stage, with reasons for exclusion (based on the inclusion criteria), will be provided as supplementary material to the systematic review.
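The agreement check can be sketched as follows. This minimal Python example assumes exactly two reviewers and uses Cohen's formulation of Kappa; with more than two reviewers, pairwise comparisons or a multi-rater statistic would be needed, and the protocol does not specify which. The function name and the screening decisions below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented screening decisions for ten abstracts (illustration only).
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "include", "exclude", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
needs_another_round = kappa <= 0.6  # threshold stated in the protocol
print(round(kappa, 3), needs_another_round)
```

In this invented case the reviewers agree on 8 of 10 abstracts, yet Kappa stays below the 0.6 threshold because much of that agreement is expected by chance, so the criteria would be discussed and the subset re-screened.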

Selected publications must contain the following information: (1) replication level/sample size, (2) averages (arithmetic means) of control and treatment groups, and (3) variance estimates (as standard deviation, standard error or confidence interval). Further evaluations will be based on whether populations, exposures, comparators, outcomes, and study types meet the following criteria:

Relevant populations

Any aquatic species, population or community, including marine, brackish, and freshwater systems.

Relevant exposures

An experiment that manipulates water temperature (warming) and is conducted in a mesocosm setup. We will deem all replicated experimental setups whose volume is equal to or larger than 1 L as mesocosms [7].

Relevant comparators

(1) Experiments comparing “treated” (warmed) and “control” (ambient temperature) conditions (CI); (2) Experiments comparing “before” (ambient temperature) and “after” (warmed) conditions (BA).

Relevant outcomes

We will search for a broad range of outcomes (i.e., ecological responses): (1) changes in species richness, evenness, and diversity, (2) changes in species and community metabolism (productivity, respiration, calcification), (3) changes in species survival, mortality, size and growth, (4) changes in nutrient flux (carbon, nitrogen, phosphorus, sulfur), and (5) changes in species and community resilience, stability or resistance.

Study quality assessment

Studies that pass the inclusion criteria described above will be evaluated for bias. Susceptibility to bias will be defined by any of the following factors: lack of true replication, lack of methodological information (e.g., sample size), uninterpretable outcomes, and difficulty in interpreting exposure (mesocosm setup) and intervention (warming treatment) data. Based on these criteria, studies will be categorized as having high, medium or low susceptibility to bias. Studies with high susceptibility will be excluded from the review. The list of studies, their level of susceptibility to bias (high, medium, low) and the justification for each categorization will be provided in the systematic review.

Data extraction and effect size calculation

Means, sample sizes and variance estimators will be extracted directly from the text and tables, or from figures using image analysis software (e.g., ImageJ). All three data components must be reported for a study to be included. Hedges’ g [16] will be used to calculate the effect size. Hedges’ g is a bias-corrected standardized mean difference, which estimates the difference in the response variable between the ‘treatment’ (i.e., warmed mesocosm units) and ‘control’ (ambient temperature) groups. This measure is standardized by the pooled within-group standard deviation, so that a given raw difference yields a smaller effect size when within-group variance is large, and it includes a correction for small sample sizes. In substance, this estimator transforms all effect sizes to a common metric, thus enabling the calculation of summary effects across data that may have been captured on different scales [17]. All extracted data records will be made available as additional files.
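For concreteness, the calculation can be sketched in Python as below, using the commonly cited formulas for the pooled standard deviation, the small-sample correction factor J = 1 - 3/(4df - 1), and the approximate sampling variance of g. The function names and input values are hypothetical, and the exact variant of the formulas applied in the review may differ.

```python
import math


def se_to_sd(se, n):
    """Convert a reported standard error of the mean to a standard deviation."""
    return se * math.sqrt(n)


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return Hedges' g and its approximate sampling variance."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd      # uncorrected standardized difference
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    g = j * d
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g


# Invented example: four warmed and four ambient mesocosms.
g, var_g = hedges_g(mean_t=12.0, sd_t=2.0, n_t=4, mean_c=10.0, sd_c=2.0, n_c=4)
print(round(g, 3), round(var_g, 3))
```

With only four replicates per group, the correction factor shrinks the uncorrected difference of 1.0 to roughly 0.87, which illustrates why the correction matters for mesocosm studies with low replication.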

Potential effect modifiers and reasons for heterogeneity

Study-level modifiers may contribute to the variation in effect size and can thus be regarded as potential effect-moderators [18]. These modifiers can be related to either the characteristics of the studied species/habitats/regions or the methodology used. For each outcome category we will define “characteristic” categorical moderators, e.g., system type (lentic, lotic, marine, estuarine), community type (benthic/pelagic/both), mesocosm size (volume), experimental duration, replication type and level (gradient, repeated treatment), mesocosm settings (indoors/outdoors/in situ), experimental design (closed/open/semi-open-system), water source (natural, artificial), focal taxa, focal taxa size, number of trophic levels, number of species, biogeographic region (marine) or ecoregion (freshwater and estuarine), latitude, longitude, crossed manipulation (none, acidification, nutrient addition/depletion, exposure to invasive species, oxygen depletion, toxins/pollutants, feeding regime, exposure to disease/parasites, salination, flow, precipitation and sea level). Each of these attributes will be identified for each study, as relevant.

Data synthesis and presentation

The effect size estimates from individual studies will be aggregated using the ‘metafor’ package in R [19], and presented in forest plots. Because we expect studies to be heterogeneous, the summary effect in each category will be calculated using a random-effects model. Funnel plots and the Trim and Fill algorithm [20] will be used to evaluate publication bias. To assess the relationship between potential effect-moderators and the effect size within each category, we will perform subgroup analyses using a mixed-effects model structure. These subgroup analyses will show which moderators, if any, have the greatest influence on the measured effect sizes, thereby informing aquatic scientists wishing to plan mesocosm experiments. Our results will also provide guidelines for interpretation of climate warming experiments by scientists, policy-makers, and the general public.
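To make the random-effects step concrete, the sketch below pools effect sizes in Python with the DerSimonian-Laird estimator of between-study variance. This is a deliberately simple stand-in: the analysis itself will be run in ‘metafor’, whose default estimator is REML, so results may differ slightly. The function name and the input values are invented for illustration.

```python
import math


def random_effects_dl(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Returns the summary effect, its standard error, and tau^2.
    """
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    summary = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return summary, se, tau2


# Invented effect sizes (Hedges' g) and sampling variances for two studies.
summary, se, tau2 = random_effects_dl([0.0, 1.0], [0.1, 0.1])
ci_low, ci_high = summary - 1.96 * se, summary + 1.96 * se
print(round(summary, 3), round(se, 3), round(tau2, 3))
```

Adding tau^2 to each study's variance widens the confidence interval relative to a fixed-effect model, which reflects the assumption that true effects differ among mesocosm studies.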