What can crop stable isotopes ever do for us? An experimental perspective on using cereal carbon stable isotope values for reconstructing water availability in semi-arid and arid environments

This study re-assesses and refines the use of crop carbon stable isotope values (Δ13C) to reconstruct past water availability. Triticum turgidum ssp. durum (durum wheat), Hordeum vulgare (six-row barley) and Sorghum bicolor (sorghum) were experimentally grown at three crop research stations in Jordan for up to three years under five different irrigation regimes: 0% (rainfall only), 40%, 80%, 100% and 120% of the crops’ optimum water requirements. The results show a large variation in carbon stable isotope values of crops that received similar amounts of water, either as absolute water input or as percentage of crop requirements. We conclude that C3 crop carbon stable isotope composition should be assessed using a climate zone specific framework. In addition, we argue that interpretation should be done in terms of extremely high values showing an abundance of water versus low values indicating water stress, with values in between these extremes best interpreted in conjunction with other proxy evidence. Carbon stable isotope values of the C4 crop Sorghum were not found to be useful for the reconstruction of water availability.


Introduction
The reconstruction of water availability is essential for understanding past societies, especially in semi-arid and arid environments where fluctuations in aridity can have considerable effects on food production and, by implication, social and economic security. Profound droughts have, for example, been linked to the abandonment of sites as well as social, economic or political 'collapse' (Kaniewski et al. 2013; Weiss 2015). Water management strategies, such as floodwater farming and irrigation, have been employed since prehistory to ensure stable harvests and to generate agricultural surplus (Finlayson et al. 2011), both of which arguably underpin the development of complex societies. The reconstruction of past water availability is therefore central to many important questions in archaeology.
The application of stable carbon isotope discrimination (Δ13C) of archaeobotanical remains to infer past water availability was pioneered in the 1990s (Araus and Buxó 1993; Araus et al. 1997a, b) and has since been regularly applied (Ferrio et al. 2005; Araus et al. 2007; Fiorentino et al. 2008; Riehl et al. 2008; Roberts et al. 2011; Masi et al. 2014; Caracuta et al. 2015; Mora-González et al. 2018). The advantages of the method are that crop remains are frequently present in the archaeological record, can be directly dated, and are easily linked to other archaeological remains. The carbon stable isotope composition of plant tissues primarily reflects water availability, so water management can be inferred by integrating the isotope data with palaeoclimate indicators for the time period in question. If the crop isotope signal suggests that water availability was greater than expected according to the climate proxies, it is likely that the crops received water in a form other than rainfall, such as through irrigation, artificial watering or by cultivation on alluvial fans (Ferrio et al. 2005).
The method has a solid theoretical basis (Farquhar et al. 1982, 1989). During uptake and assimilation of CO2, plants discriminate against the heavier isotope 13C. The magnitude of discrimination is largely dependent on the plants' photosynthetic pathway (C3, C4, CAM), but is also affected by environmental factors, most notably water availability, as confirmed by studies on modern plants such as wheat and barley (Craufurd et al. 1991; Araus et al. 1997a, b, 1999; Merah et al. 2001; Ferrio et al. 2005, 2007; Monneveux et al. 2005; Wallace et al. 2013). Following this, archaeobotanical crop Δ13C values have been linked with specific amounts of water input through a regression equation (Araus et al. 1997b, 2014). However, slopes and intercepts of the regression lines vary between studies (ESM 1), indicating that crop Δ13C values are not solely determined by water input. Indeed, research has shown that large variability can exist in Δ13C of crops grown under similar amounts of rainfall or irrigation (Flohr et al. 2011; Wallace et al. 2013), especially in semi-arid and arid environments (Riehl et al. 2014).
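As a point of reference, discrimination is conventionally derived from the measured δ13C of the plant material and of atmospheric CO2, following the standard definition in Farquhar et al. (1989). The conversion is not spelled out in the text, so the following is only a minimal sketch; the function name and the input values are hypothetical and chosen for illustration.

```python
def discrimination(delta_plant: float, delta_air: float) -> float:
    """Return carbon isotope discrimination (Δ13C, in ‰).

    Both arguments are δ13C values in ‰, following the standard
    definition Δ = (δ_air - δ_plant) / (1 + δ_plant / 1000).
    """
    return (delta_air - delta_plant) / (1.0 + delta_plant / 1000.0)


# Hypothetical inputs (not data from this study): a grain δ13C of -26‰
# and an atmospheric δ13C of -8‰.
example = discrimination(delta_plant=-26.0, delta_air=-8.0)
print(round(example, 2))
```

Because δ13C of atmospheric CO2 has changed through time, any application to archaeological material would need a period-appropriate δ_air value rather than the illustrative one used here.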
This variability could be due to related factors including evapotranspiration, which is in turn affected by temperature and wind speed, and soil characteristics like soil type, depth and ability to retain water, as is further investigated in this paper. In addition, plant Δ13C has been shown to be affected by several other variables not directly related to water availability, most notably salinity (Isla et al. 1998; Shaheen and Hood-Nowotny 2005; Yousfi et al. 2010), temperature, also when unrelated to water availability (O'Leary 1995), light intensity (Mulkey 1986; O'Leary 1995; Yakir and Israeli 1995) and nutrient supply (Toft et al. 1989; Choi et al. 2005; Cabrera-Bosquet et al. 2007; Serret et al. 2008) (but see Condon et al. 1992 and references therein). A correlation of Δ13C with altitude has also been demonstrated (Körner et al. 1988, 1991; Sparks and Ehleringer 1997; but see Friend et al. 1989; van de Water et al. 2002; Wang et al. 2010), although this is probably the result of a combination of different environmental factors which vary with altitude, most notably precipitation and temperature (Friend et al. 1989). In environments with closed canopies, such as dense forests, as much as 5-8‰ higher plant Δ13C values have been observed, possibly because of the recycling of 13C-depleted CO2 in such environments and/or low light intensities, in combination with physiological factors (van der Merwe and Medina 1991). This "canopy effect" is unlikely to affect crops grown in open fields.
While water availability is often the most limiting factor for plant growth and development, especially in semi-arid and arid environments, and consequently often overrides the effects of other variables (Salisbury and Ross 1992), such effects may nonetheless introduce (additional) variability. While such variation can be accounted for in modern situations where the specifics of growing conditions, such as soil characteristics, temperature, humidity or seasonality of rainfall are well documented, it poses a problem for the interpretation of archaeobotanical remains, where many of these factors are unknown. Consequently, Wallace et al. (2013) introduced bands of broad levels of crop water status (poorly watered, moderately watered, well watered) as an interpretative model to understand crop Δ13C values, and this has since been applied to archaeological samples by a number of researchers (Riehl et al. 2014; Vaiglova et al. 2014; Wallace et al. 2015; Styring et al. 2017). Cut-off points between these bands currently vary, however, with the cut-off point between moderately and well-watered for barley set at 18.5‰ based on field observations (Wallace et al. 2013), or 17‰ extrapolated from published observations and adjusted for a different environment (Riehl et al. 2014).
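The practical consequence of differing cut-off points can be sketched as follows. Only the two barley cut-offs quoted above are used; the function names, the treatment of a value exactly on the cut-off, and the example value are assumptions made for illustration.

```python
# Cut-offs (‰) between 'moderately watered' and 'well watered' barley,
# as quoted in the text above.
BARLEY_WELL_WATERED_CUTOFF = {
    "Wallace et al. 2013": 18.5,
    "Riehl et al. 2014": 17.0,
}


def is_well_watered(delta13c: float, cutoff: float) -> bool:
    # Assumption: a value exactly on the cut-off counts as well watered.
    return delta13c >= cutoff


def compare_frameworks(delta13c: float) -> dict:
    """Classify one barley Δ13C value under each published cut-off."""
    return {
        source: is_well_watered(delta13c, cutoff)
        for source, cutoff in BARLEY_WELL_WATERED_CUTOFF.items()
    }


# Hypothetical barley value of 17.8‰: the two frameworks disagree.
print(compare_frameworks(17.8))
```

A value between the two cut-offs is read as well watered under one framework and not under the other, which is the interpretative uncertainty addressed in this paper.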
Given the uncertainty that therefore still exists over the interpretation of stable isotope data from archaeological crop remains, there is clearly a need for further empirical studies of the relationship between carbon stable isotope discrimination in plants and irrigation levels, in order to define how to use Δ13C values to reconstruct past water availability. To this end, this paper uses a comprehensive data set of cereals which were experimentally grown for this purpose under five different irrigation regimes at three locations in Jordan for up to three years. Unlike previous studies, which largely focused on modern varieties grown in greenhouses or collected from fields with limited control over, or no monitoring of, water inputs, our experiments used traditional landraces of Triticum turgidum ssp. durum (durum wheat) and Hordeum vulgare (six-row barley) grown under controlled conditions outdoors and with daily monitoring of environmental factors, thus addressing important criticisms of existing data sets (Fiorentino et al. 2015). In addition to the C3 crops wheat and barley, the experiments also included the C4 crop Sorghum bicolor (sorghum). While it has been argued that there is no theoretical basis for considerable impact of water availability on the Δ13C values of C4 plants (Farquhar 1983), such an effect has nonetheless been observed in several different C4 taxa (Ghannoum et al. 2002; Wang et al. 2005; Buchmann et al. 2006; An et al. 2015). The present study was designed to test this further.

Average values for a smaller sample of wheat and barley and a single growing season of sorghum have been reported previously (Flohr et al. 2011; Stokes et al. 2011), but this paper presents the entire dataset for the first time. In our previous research it was noted that the variation observed within single irrigation regimes, thus between crops receiving similar amounts of their optimal water requirements, was very large between different crop growing sites, which we ascribed to environmental differences between the locations (Flohr et al. 2011). We therefore concluded that a general regression equation linking crop Δ13C values with specific amounts of water was not useful. However, we did not offer a clear alternative, nor did we take fully into account that, because of the outdoor setting of the experiments, crops did not always receive the amount of water indicated by their irrigation regime treatment (see "Materials and methods"). The first aim of the current paper is therefore to (re-)analyse the now available complete dataset to reinterpret, test and statistically explain our results. Moreover, with various interpretative frameworks currently in use by researchers, namely regression equation or bands with various cut-off points, we will use our data to test these frameworks and formulate new recommendations for future practice.

Crop growing experiments
Native landraces of Triticum turgidum ssp. durum, ACSAD 65 (durum wheat) and Hordeum vulgare, ACSAD 176 (six-row hulled barley), were grown over three consecutive years (2005-2008) at each of three different NCARE crop growing stations in Jordan, Deir 'Alla, Ramtha and Khirbet as-Samra (Fig. 1; Mithen et al. 2008). In addition, the C4 plant Sorghum bicolor (sorghum) was grown at Deir 'Alla, Ramtha and at a farm near Salt for 2 years (2009-2010). The sorghum was purchased in Amman, as it was unfortunately not possible to get a sufficient amount of sorghum seeds from a seed bank in time. Because too few sorghum seeds developed in the first year, more seeds had to be acquired the following year, and it cannot be excluded that these were of a different variety.
All the crop growing stations are located in the north of Jordan, but differ significantly in their micro-environments (Table 1; ESM 2 Tables S1-S2). With the exception of Salt, environmental conditions were closely monitored at or close to the sites, and soil characteristics have been intensively studied (Carr 2011, ESM 2 Table S3). Salinity largely remained within acceptable minimum levels of 4.5 dS/m for wheat (Acevedo et al. 2002) and 5 dS/m for barley (Katerji et al. 2006). No fertilizers were applied at any of the sites during the experiments, although part of the water used for irrigation at the three crop growing stations was reclaimed waste water, which contained plant beneficial nutrients (Carr 2011, ESM 2 Table S4).
The crops were grown in 5 × 5 m plots and received different amounts of water, applied through drip irrigation (Fig. 2). While this is not exactly comparable to ancient or traditional forms of water management (like canal, floodwater, or inundation irrigation), it ensured that as much of the applied water as possible reached the plants, thus minimising the 'noise' caused by immediate evaporation. As such, these experiments form a more controlled addition to research looking into the effects of traditional water management on Δ13C of crops, while at the same time still providing a field-based setting; because they were controlled and closely monitored, it is possible to assess the workings of the method more closely, which was our main aim.
In the first year, four irrigation regimes were applied: 0% (rainfall only), 80%, 100%, and 120% of the crops' optimum water requirements. In the second and third seasons, and for both seasons of sorghum cultivation, a 40% regime was added. On the day of sowing an additional 25 m³/d of water was applied to each plot (included in our analyses where relevant). The crop water requirements were calculated on a weekly basis according to FAO guidelines as follows: $ET_c = (K_c \times E_{pan} \times K_p \times K_r) / \text{irrigation effect}$, where $ET_c$ is the crop optimum water requirement, $K_c$ is the crop coefficient, $E_{pan}$ is pan evaporation in mm (evaporation measured using a pan holding a set amount of water), $K_p$ is the pan coefficient, $K_r$ is the soil evaporation reduction coefficient, and the irrigation effect corrects for the amount of surface wetted by irrigation (Allen et al. 1998). Irrigation to be applied was calculated taking into account $ET_c$ and precipitation. Detailed information can be found in ESM 3. It is important to note here that $K_c$ combines the effect of crop transpiration and soil evaporation, and is defined by plant species (but is the same for wheat and barley), by growing stage (initial, development, mid, and end/harvest stage), and climatic averages for a region (Doorenbos and Pruitt 1977; Allen et al. 1998).
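The weekly calculation described above can be sketched in a few lines. The crop water requirement follows the formula as given in the text; the step from requirement to applied irrigation (target share of the requirement minus rainfall, never below zero) is an assumption for illustration, since the exact procedure is documented in ESM 3. All coefficient values and function names are hypothetical.

```python
def crop_water_requirement(kc: float, e_pan: float, kp: float,
                           kr: float, irrigation_effect: float) -> float:
    """Weekly crop optimum water requirement (mm), as defined in the text."""
    return kc * e_pan * kp * kr / irrigation_effect


def irrigation_to_apply(etc: float, regime_fraction: float,
                        rainfall: float) -> float:
    """Weekly irrigation (mm) for a regime given as a fraction (0.8 = 80%).

    Assumption: rainfall is subtracted from the regime's target and the
    result is floored at zero; the study's exact procedure is in ESM 3.
    """
    return max(0.0, regime_fraction * etc - rainfall)


# Hypothetical weekly values, not taken from the study.
etc = crop_water_requirement(kc=1.0, e_pan=40.0, kp=0.75, kr=1.0,
                             irrigation_effect=1.0)
print(etc)                                  # weekly requirement in mm
print(irrigation_to_apply(etc, 0.8, 10.0))  # 80% regime, 10 mm of rain
print(irrigation_to_apply(etc, 0.4, 20.0))  # rain exceeds the 40% target
```

The last call illustrates a week in which rainfall alone exceeds the target for a low regime, so no irrigation is applied and the plot receives more water than its nominal regime implies.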
There were differences in the amount of water added to the plots, with some fields receiving more irrigation water than others, even though rainfall and evaporation were similar (ESM 2 Table S1; ESM 4). Such seeming discrepancies result from site-specific differences, such as the growing cycle of the plants expressed in differences in $K_c$, which varied due to non-water site-specific variables such as temperature. At times, weekly rainfall exceeded crop water requirements, which could lead to the crops receiving more water than intended for short intervals. This was an unavoidable consequence of growing the plants outside in fields rather than in greenhouses, but detailed monitoring meant it was possible to take this into account in our interpretation of the results (ESM 3, 4).
Wheat and barley were grown from autumn to spring, sown in November/December and harvested in April/May (ESM 2 Table S1). Sorghum, requiring higher temperatures, was grown from April until September. Within the growing season, different growing stages are recognised: the initial, development, mid and end stages (Doorenbos and Pruitt 1977; Allen et al. 1998), after which crops were left unirrigated to dry out (referred to in this paper as the 'drying stage', which follows the 'end stage'). The end stage is the stage from the start of maturation until full maturity. Grain filling takes place during the latter half of the mid stage and the early end stage. In some cases, netting was applied at the end of the season to prevent the grains being eaten by birds (ESM 2 Table S5). The crops were harvested in 50 cm grids to assess and even out intra-plot variation.

Sampling and sample preparation
Analyses using different numbers of grains showed that a minimum of seven grains should be averaged or homogenised per sample (ESM 3), in rough agreement with the number of six established by Riehl et al. (2008). Three samples of ten grains each were analysed per plot for barley, with a 'plot' defined as a specific combination of site, irrigation regime and growing season, for example the 100% barley plot at Ramtha in 2005-2006. Grains were randomly selected from several different ears, using randomly generated numbers which corresponded to grain positions on the ear, but avoiding unripe grains, and were taken from plants in different grid quadrants within each plot. Replication within each plot was very good, with the standard deviation for each plot mostly well below the expected natural intra-plot variation of 0.5‰ (Wallace et al. 2013). Tests with wheat showed a similarly good replication, so that only one sample of ten grains per plot was analysed for the remainder of the wheat plots. The sampling protocol for sorghum was the same as for wheat and barley; however, due to a scarcity of grains in some plots, a minimum of eight grains per sample and plot was homogenised. Because of the low number of grain samples, sorghum chaff was also analysed for Deir 'Alla 2009, 2010 and Ramtha 2010. All grains were washed in deionised water to remove surface contamination, frozen, freeze-dried, ground and homogenised with a mortar and pestle. This homogenisation is important because of differences in the isotopic composition of grain components. Our own tests found a mean difference of 0.4‰ in Δ13C between wheat grain endosperm and seed coat, while Heaton et al. (2009) observed a difference as large as 1‰.
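The replication check described above amounts to comparing the standard deviation of a plot's replicate samples with the expected intra-plot variation of 0.5‰. A minimal sketch, with hypothetical replicate values and a hypothetical function name, might look like this:

```python
from statistics import mean, stdev

# Expected natural intra-plot variation (‰), Wallace et al. (2013) as cited above.
EXPECTED_INTRA_PLOT_VARIATION = 0.5


def summarise_plot(sample_values: list) -> dict:
    """Summarise replicate Δ13C samples (‰) from a single plot."""
    sd = stdev(sample_values)  # sample standard deviation (n - 1)
    return {
        "mean": mean(sample_values),
        "sd": sd,
        "within_expected_variation": sd < EXPECTED_INTRA_PLOT_VARIATION,
    }


# Hypothetical values for three ten-grain barley samples from one plot.
print(summarise_plot([17.2, 17.4, 17.3]))
```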
Approximately 1 mg of homogenised grain was weighed into tin capsules for the carbon isotope analyses. Samples were analysed as duplicates on a Sercon Europa Geo 20-20 CF-IRMS (continuous-flow isotope ratio mass spectrometer) coupled to a Sercon elemental analyser in the School of Archaeology, Geography and Environmental Sciences at the University of Reading. Analytical precision (1 standard deviation), calculated from repeat analyses of internal reference materials calibrated against international standards, including a flour standard, was 0.1‰ or less.

Statistical analyses
The results were analysed using Genstat v. 13. Contrast analysis was used to assess and compare the effects of different irrigation levels within and between sites. Contrasts are more relevant than other post-hoc analyses (tests used to determine if statistically significant differences between groups exist) in situations where each of the different levels of a treatment is expected to have an effect (https://www.vsni.co.uk/products/genstat/htmlhelp/anova/MultipleComparisons.htm and advice from the statistical advisory service at the University of Reading). The effects of different environmental factors on isotopic composition were assessed by using stepwise linear regression, taking into account minimum, maximum and average temperature, relative humidity, rainfall, applied irrigation in mm, irrigation regime (as % of plants' optimum water input), total water input (rainfall + irrigation), evaporation, irradiance, vapour pressure deficit (VPD) and soil nutrients. Where applicable, these variables were assessed for the whole growing season (November/December to April/May), the grain filling period, calculated as 40 days before the end of irrigation to be comparable to other studies (ESM 3), as well as for each of the different growing stages (initial, development, mid, end and harvest) (Doorenbos and Pruitt 1977) and for combinations of these.

Results and discussion
Crop Δ13C and water availability
Barley and wheat
Δ13C and irrigation regime
Figures 3, 4a, b and Table 2 present the Δ13C values of barley and wheat grains (full results in ESM 2 Table S8). As expected, a positive relationship exists overall between the irrigation regime and Δ13C in barley and wheat grains when using data from all sites combined, although the correlation is only moderately strong to weak (barley p < 0.001, r² = 0.58; wheat p = 0.04, r² = 0.11; Table 3). For barley, there are significant differences between the average Δ13C values from the rain-fed and each of the irrigated plots, as well as between moderately irrigated (40%) and fully irrigated (100% and 120%) plots (Table 4). For wheat, however, the only significant difference is between the rain-fed and 120% irrigation plots. If data from each site are analysed separately, the correlation between irrigation regime and Δ13C is much stronger, explaining between 43% and 95% of variation (Fig. 3; Table 3). For barley, but not wheat, inter-annual variation in Δ13C is also significant at Khirbet as-Samra and Ramtha (p < 0.001), but not Deir 'Alla (p = 0.09). Δ13C of wheat is significantly lower than barley at Khirbet as-Samra and Ramtha (ANOVA, p < 0.001 and p = 0.002, respectively), but not at Deir 'Alla (p = 0.566). This reflects the increased drought stress for wheat at these sites (Flohr et al. 2011). A similar difference between wheat and barley grown under the same conditions was also found in other studies (Araus et al. 1997b; Ferrio et al. 2005; Wallace et al. 2013).
Δ13C and 'actual irrigation regime'
As explained above, the outdoor setting of the experiments meant that at times of significant rainfall the plots received more water than intended. In order to account for this, we recalculated how much of the crop requirements each plot actually received, termed here 'actual irrigation regime' (for details see ESM 3, 4 and ESM 2 Table S9). After recalculation, for the combined data set, the correlation between actual irrigation regime and grain Δ13C remains significant and moderately strong for barley (p < 0.001, r² = 0.56). For wheat, the correlation of grain Δ13C with 'actual irrigation regime' became stronger after recalculation, although it remains weak (p < 0.001, r² = 0.29). Within each site, however, correlations are less strong. Nevertheless, when the data for 'actual irrigation regime' are grouped into three categories of 0-50% (water stressed), 50-100% (moderate amount of water) and > 100% (abundance of water), there is a much clearer distinction between the groups than when using the original irrigation regimes, for which the additional rainfall had not fully been taken into account, hereafter 'nominal irrigation regime' (Fig. 4b). For barley, all three groups are significantly different from each other, while for wheat the 0-50% and 50-100% categories are significantly different from the > 100% irrigation category (Table 4). It should be acknowledged, however, that the 0-50% category is currently based on only four plots per crop, as some of the 0% plots did not produce any grains, or it became evident after calculation of the 'actual irrigation regime' that some of these plots had actually received more than 50% of their optimal water requirements. Sample sizes for this category (0-50%) should therefore ideally be increased in future experiments.
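The grouping into three categories can be sketched as follows. The ratio used here for the 'actual irrigation regime' (water received as a percentage of the crop's requirement) is a simplification of the recalculation documented in ESM 3 and 4, and the assignment of a value of exactly 50% to the lower band is an assumption; function names and example values are hypothetical.

```python
def actual_irrigation_regime(water_received_mm: float,
                             crop_requirement_mm: float) -> float:
    """Water received as a percentage of crop requirement.

    Assumption: a simple season-total ratio; the study's own
    recalculation is described in ESM 3 and 4.
    """
    return 100.0 * water_received_mm / crop_requirement_mm


def water_status_category(actual_regime_pct: float) -> str:
    """Assign one of the three categories used in the text."""
    # Assumption: exactly 50% falls in the lower band; exactly 100% is
    # 'moderate', because the upper category is defined as > 100%.
    if actual_regime_pct <= 50.0:
        return "water stressed"
    if actual_regime_pct <= 100.0:
        return "moderate amount of water"
    return "abundance of water"


# Hypothetical plot: 150 mm received against a 200 mm requirement.
pct = actual_irrigation_regime(150.0, 200.0)
print(pct, water_status_category(pct))
```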
Δ13C and water input
While it could be expected that 'actual irrigation regime' should give the best correlation with Δ13C, as it takes into account water inputs, losses and crop requirements, it is nonetheless also relevant to assess the effect of water input only (rainfall + irrigation in mm) on Δ13C, as a correlation between the two has been shown in other studies and used to compute water status in archaeological samples (Araus et al. 1997b, 2014). In our study, water input and Δ13C are significantly correlated for wheat and barley (p < 0.001), but again overall only moderately so for barley (r² = 0.61) and weakly for wheat (r² = 0.35). Unexpectedly, this is a slightly stronger correlation than between the pooled Δ13C data set and either the nominal or actual irrigation regime. With the exception of Deir 'Alla, correlations within the individual sites tend to be stronger here too (Fig. 5, ESM 2 Table S10).
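Why a pooled correlation can be weaker than the correlations within individual sites is easiest to see with a toy example. The data below are invented for illustration and are not measurements from this study: two hypothetical sites share the same slope of Δ13C against water input but differ in their baseline.

```python
def r_squared(x: list, y: list) -> float:
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Invented data: water input (mm) and Δ13C (‰) at two hypothetical sites.
water = [100, 200, 300]
site_a = [15, 16, 17]  # lower baseline
site_b = [17, 18, 19]  # same slope, higher baseline

print(r_squared(water, site_a))                   # within site A
print(r_squared(water, site_b))                   # within site B
print(r_squared(water + water, site_a + site_b))  # both sites pooled
```

Within each invented site the fit is perfect, while pooling the two lowers r² because the between-site offset is not explained by water input. This is the pattern that a site-specific or climate-zone-specific framework is meant to accommodate.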
Grain filling stage
Because grains form and ripen during the latter part of the growing season (the grain filling stage), a number of studies have used water input only during this period to establish a relationship with carbon isotope discrimination (Araus et al. 1997b). Our results indeed indicate that, in most cases, water inputs during the latter half of the season have the greatest influence on Δ13C. At site level, the relationship between Δ13C and water input during the mid, end and/or drying stages is significant, whilst this is either not, or less strongly, the case for the initial and development stages (ESM 2 Table S10). Nonetheless, in many cases water input before the grain filling period, that is during the early mid stage and at times also the earlier stages, or over the entire growing season, explains variation as well as or sometimes even better than water input during the grain filling period alone. This has also been observed in other studies (Sayre et al. 1995; Monneveux et al. 2005, 2006; Wallace et al. 2013) and can probably be explained by remobilization of previously fixed carbon within the plant and by water retained in the soil (Heaton et al. 2009; Wallace et al. 2013). Consequently, it can probably be concluded that the carbon stable isotope composition of grain reflects water input not only during the grain filling period, but over a larger part of the growing season. This makes the method more suitable for inferring water management practices in the past, as irrigation may not necessarily have always been applied during the relatively short grain filling period.

Sorghum
No significant differences exist between sorghum grown under the 40%, 80%, 100% and 120% nominal irrigation regimes (Table 2). Interestingly, for both grains and chaff, Δ13C values for the 0% plots are significantly higher than for the other regimes (Table 3), although absolute differences are small (0.4-0.8‰ and 0.3-0.5‰ for grains and chaff, respectively). Significant variation was observed between sites and different years, the exact causes of which could not be determined even after extensive analysis, although it is likely that differences in the environmental settings played a role. The lack of a relationship between Δ13C and water availability in the irrigated plots and the potential, but very [...] Henderson et al. (1998) measuring a small effect in sorghum. Because discrimination is small to start with, the effects of any environmental variables will only be minor (Farquhar 1983; Farquhar et al. 1989; Henderson et al. 1998). The low slope of the fitted curve and the fact that it levels out from 40% onwards indicate that it would be hard to use this effect for archaeological investigations, even more so because inter-site and inter-annual differences are large (up to 1.6‰).
In contrast to C3 plants, where un-irrigated values were lower than those of irrigated plants, Δ13C values of sorghum from the 0% regimes were higher than those from the irrigated plots. This observation is in agreement with other studies of C4 plants (Williams et al. 2001; Ghannoum et al. 2002) and is explained by the fact that in the C4 photosynthetic pathway, the amount of discrimination is determined not solely by internal CO2 concentration ($c_i$) but also by additional factors, mainly bundle sheath leakiness. This can result in higher rather than lower Δ13C when $c_i$ decreases (Farquhar 1983; Henderson et al. 1992).

Explaining variation in Δ13C: new insights
Like other studies, this investigation has clearly shown that there is an effect of water availability, measured both in terms of water input and irrigation regime, which takes account of water losses as well as inputs, on the Δ13C values of C3 but not C4 crops. Our study, however, also demonstrates that the overall correlation for the combined data set from all three sites is weak, especially for wheat. It therefore appears that neither irrigation regime nor water input, although each explains part of the variation, is the sole determining factor for cereal Δ13C. Instead, correlations are much stronger within each site than for the pooled data, which clearly indicates that factors that differ between the sites are responsible for the remaining variation. The sites differ considerably in their environment, especially in soils, precipitation and temperature. While the environmental factors at each site were relatively stable, these also varied between years of cultivation (Table 1, ESM 2 Table S1).

Although these differences were only significant for barley at Ramtha and Khirbet as-Samra (p < 0.001), they can give additional clues to the sources of variation affecting carbon isotope discrimination. In order to explain the variation, the relationship of Δ13C with multiple parameters in the sites' micro-climate was investigated. Using multivariate statistical analyses (GLM; stepwise regression), and taking into account plant physiology when interpreting statistical results, the most likely causes for the observed inter-site variation were identified as rainfall, temperature and perhaps nutrient availability (Table 5).
Precipitation is, after the irrigation regime, the most important factor (although it is acknowledged that they are interrelated), both in its amount and its regularity. Rainfall was taken into account when calculating how much irrigation water to apply, and any rainfall surplus was taken into account in the recalculated (actual) irrigation regimes (see "Materials and methods"). Nonetheless, this was not sufficient to explain all the remaining variation, and an additional effect from water retained in the soil at wetter sites is likely. In addition, the timing and periodicity of rainfall affect the amount of water actually available to a plant. For example, although many instances of small amounts of rain may not add up to a large total water input, they still ensure that a plant is not unduly water stressed. On the other hand, large, sudden bursts of rainfall would give a large water input, but much of this would not be available to the plants due to evaporation and drainage, leaving the plants prone to drought again after several days. While it makes the interpretation of Δ13C values more complicated, this 'amount-and-regularity effect' of rainfall interestingly strengthens the climatic signal of plant Δ13C. The wettest site, Deir 'Alla, had the highest ('wettest') Δ13C values, while the driest site, Khirbet as-Samra, had the lowest ('driest').
A second important factor is temperature, which mainly explains differences between Deir 'Alla on the one hand and Ramtha and Khirbet as-Samra on the other. There is no simple, linear relationship between temperature and grain Δ13C; rather, temperature determines how quickly a plant develops (Acevedo et al. 2002; Fitter and Hay 2002), which indirectly affects how much water is available. Deir 'Alla experiences relatively mild winters, so the cereals develop more quickly there and tend to ripen before the dry spring starts. At the other sites, most notably at Khirbet as-Samra, which has the lowest minimum temperatures, plant development takes longer and the plants were still ripening during the drier spring. This explanation is supported by our field observations as well as by the fact that minimum temperature during the development stage explains more variation than average or maximum temperature or temperature during other parts of the growing season (Table 5).
A detailed report on nitrogen isotopes in the crops in our study will be published separately, but it is interesting to note that grain nitrogen content (%N) is negatively correlated to Δ13C, and is, in our statistical model, a significant factor in explaining Δ13C (Table 5, ESM 2 Table S11). It is possible that grain %N reflects soil nitrogen content (Serret et al. 2008), although such a relationship was not clearly found in other studies (Daniel and Triboï 2002; Fraser et al. 2011). δ15N, another and potentially more reliable measure of soil nitrogen content, was correlated with %N as well as with Δ13C at Khirbet as-Samra and Deir 'Alla, but not at Ramtha, which was the only site not to have been previously manured and not to receive nitrogen through its irrigation water (Flohr et al., in prep.). Unfortunately, %N of the soils was not measured, so it is still not clear what %N of grain signifies and why it is correlated to Δ13C; consequently, it can neither be concluded nor excluded at the moment that nutrient availability has an effect on Δ13C (as reported by Cabrera-Bosquet et al. 2007; Serret et al. 2008).
Figure caption: Water input (precipitation + applied irrigation in mm) and Δ13C for wheat (top) and barley (bottom) during the last part of the growing season (left) and the whole growing season (right). Khirbet as-Samra, open triangles; Ramtha, grey squares; Deir 'Alla, black lozenges

How can archaeobotanical Δ13C values from arid and semi-arid regions be interpreted?
It can be concluded that water availability has a clear impact on Δ13C values in wheat and barley grain, although not in a straightforward way. Because of the effect of various often interlinked environmental variables, the same Δ13C value can reflect different amounts of water input or percentages of the crops' water requirements. When analysing archaeobotanical remains, past environmental factors will mostly be unknown, and their effects cannot therefore easily be untangled. For each individual experimental plot, a narrative explaining the variation might be constructed, using the detailed environmental information that we have available; however, this narrative would differ in each instance, and be based on information (daily or weekly rainfall/irrigation/temperature/evaporation/humidity etc.) that is not available for the archaeological samples. Even though the factors that are responsible for the observed variation in our study are mainly related to actual water availability, such as rainfall patterns or temperature, which affect crop development, the complex interrelationships between these variables make it difficult to interpret specific results. Our study area, the Southern Levant, where very different environments are found within relatively close distances, is a case in point here. Nonetheless, large crop Δ13C variation within and between sites was also observed in Syria and Greece (Wallace et al. 2013), where environmental variation is not quite as stark, so the effects of various environmental factors are important to take into account everywhere.
We therefore argue that, at least in semi-arid and arid environments, a generally applicable regression equation that links Δ13C to specific levels of water input is unlikely to be useful. When regression equations established in other studies (Araus et al. 1997b, 2014) are applied to our data set, large discrepancies between calculated and actual water input exist, with the equations sometimes vastly (frequently by more than ±100%) over- or underestimating how much water the crops received during grain filling (ESM Table S12). The differences between the values predicted by those regression equations and the data from our study might partly be explained by the drip irrigation used in our experiment, as it is more efficient than other types of watering; however, watering methods would have varied in the past too. In addition, the regression equations are based on water input during grain filling, while crop carbon stable isotope values also reflect water input in the earlier part of the season, as discussed above. Part of the discrepancy between the tested regression equations, such as that of Araus et al. (2014), and our data might be caused by at least one of our sites, Khirbet as-Samra, being located in a much drier environment than the regions on which the equation was based. Indeed, for wheat at Khirbet as-Samra, the equations underestimate how much water the crops received. A universal regression equation for water input and Δ13C to be applied to all environments therefore seems unfeasible.

Table 5 Results of statistical stepwise regression analysis of best fitting models with Genstat for all three sites combined (where r² was highest). Linear fits are given, except when another type of model would increase r² by more than 0.03. Analyses were conducted with and without %N as a factor, as it was not certain if this represented nutritional status or was a reflection of water status (see text)
If we divide our study area into a semi-arid region (combining Deir 'Alla and Ramtha) and an arid one (Khirbet as-Samra), this does not improve the situation for barley (r² = 0.54 for the semi-arid region mid-dry stages water input and Δ13C, compared to r² = 0.61 for the pooled sample of all sites together), while for wheat the variance remains substantial for the semi-arid region (r² = 0.17).
Because of the high observed variation in water input for crops with the same Δ13C, Wallace et al. (2013) proposed the use of bands of Δ values indicating 'poorly watered crops', 'moderately watered crops' and 'well watered crops'. The proposed cut-off points between these bands vary between different crops, because of their different water requirements; for barley they lie at < 17‰ for poorly watered, 17-18.5‰ for moderately watered and > 18.5‰ for well watered, while for wheat they were established at < 16‰, 16-17‰, and > 17‰ (Wallace et al. 2013). When applying this three-band interpretative system to our data set, it does not work well for discriminating between poorly and moderately watered barley crops, although well watered barley samples stand out clearly (Fig. 4b). In contrast, for wheat there is no clear separation in our study if the data from all environments are combined. However, once the data are disaggregated into semi-arid (Deir 'Alla and Ramtha) and arid environments (Khirbet as-Samra), it is possible to distinguish between three bands for barley and two bands (poorly/moderately and well watered) for wheat, although there is still some overlap between the ranges (Fig. 4c).
Where exactly to place the cut-off points between the bands depends on the criteria used and the number of available observations for the different categories. For example, for barley in the semi-arid region, 93% of well watered plots (n = 15) had Δ13C values above 19‰, and 11% of moderately or poorly watered plots (n = 9) had such values; using this value as a cut-off would therefore be reasonable. In contrast, 100% of well watered barley plots were above 18.5‰, but as many as 33% of the moderately watered plots (n = 9) were as well. However, with a slight increase of the cut-off point to 18.6‰, again only 11% of the moderately watered plots are covered. For wheat, 88% of well watered and only 11% of moderately watered plots had a Δ13C value above 18‰, and 89% of the moderately watered fields had values below 17.6‰; as such, there is a clear separation of the well watered and moderately watered bands. However, based on our study, it is unclear what values between 17.6‰ and 18‰ would indicate, as these were absent. These might seem small differences, but considering that the total range of Δ13C for each crop is only ~ 5‰, with most archaeological values covering an even smaller range, this is a relevant question. To establish cut-off points with more certainty, more samples should be studied. Until clear cut-off points can be calculated, applying a transitional range of lower certainty may be appropriate.
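The comparison of candidate cut-off points described above can be sketched in a few lines of Python. The function name and the Δ13C values used here are hypothetical and serve only as an illustration; they are not data from this study. The sketch simply tabulates, for each candidate cut-off, the share of plots in each watering category that lies above it.

```python
def share_above(values, cutoff):
    """Return the fraction of values strictly above the cut-off."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v > cutoff for v in values) / len(values)


# Hypothetical Δ13C values (‰), for illustration only
well_watered = [19.4, 19.1, 18.7, 19.8]
moderately_watered = [17.2, 18.6, 17.9, 16.8]

for cutoff in (18.5, 19.0):
    print(
        cutoff,
        share_above(well_watered, cutoff),
        share_above(moderately_watered, cutoff),
    )
```

A cut-off would be attractive where the first share is high and the second is low; with few plots per category, as here, such shares remain sensitive to individual observations.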
Based on our current data, cut-off points for very wet conditions lie at approximately > 18.5-19‰ for barley and > 17.5-18‰ for wheat in the semi-arid region and > 17.5‰ for barley and > 15‰ for wheat in the arid region. Dry conditions are represented by crop Δ13C values below 16.5‰ for barley and below around 15‰ for wheat in the semi-arid region, and below ~ 15‰ for barley and ~ 14‰ for wheat in the arid region. The large majority of samples from the two semi-arid sites are in rough agreement with the bands proposed by Wallace et al. (2013) whose sites were also in semi-arid climate zones, even though there is still some overlap between bands, especially of the moderately watered category with the poorly watered band. The cut-off points are similar between the studies, especially for barley, although the cut-off point between moderately and well watered wheat appears to be higher in our study area (> 17.6‰ compared to 17‰). For the arid climate zone, the boundaries need to be adjusted down by 1-3‰ compared to the semi-arid zone (Fig. 4c).
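As a rough illustration of how zone- and crop-specific bands might be applied, the following Python sketch encodes the approximate cut-offs listed above. The table layout, the function name and the label 'intermediate' are assumptions of this sketch; where the text gives a range (e.g. 18.5-19‰), the lower bound is used, which is also an assumption rather than a recommendation.

```python
# Approximate cut-offs (‰) taken from the paragraph above, stored as
# (dry_below, wet_above). Using the lower bound of stated ranges is an
# assumption of this sketch.
CUTOFFS = {
    ("semi-arid", "barley"): (16.5, 18.5),
    ("semi-arid", "wheat"): (15.0, 17.5),
    ("arid", "barley"): (15.0, 17.5),
    ("arid", "wheat"): (14.0, 15.0),
}


def classify(delta13c, zone, crop):
    """Assign a Δ13C value to a band for the given climate zone and crop."""
    dry_below, wet_above = CUTOFFS[(zone, crop)]
    if delta13c > wet_above:
        return "well watered"
    if delta13c < dry_below:
        return "poorly watered"
    # Values between the two cut-offs are left deliberately undecided.
    return "intermediate"


print(classify(17.0, "semi-arid", "wheat"))
print(classify(17.0, "arid", "wheat"))
```

The same value of 17.0‰ for wheat falls in different bands depending on the climate zone, which is the reason for keeping the cut-offs keyed by zone as well as by crop.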
Our data suggest that, while the bands proposed by Wallace et al. (2013) have a general validity for the reconstruction of plant water status, close attention must be paid to the (past) climatic setting of the site under investigation, as different baseline values or boundaries are required for different climatic zones. For the archaeological interpretation of the data, this can be crucial, as in wetter areas even relatively high crop Δ13C values may indicate that water availability decreased (Riehl et al. 2014). If this led to a reduction in crop yields, it could have proved very problematic for the communities growing the crops, especially if they practised agriculture to its full capacity. On the other hand, plants with lower Δ13C values in arid regions may appear stressed compared to those in wetter climates; however, if the reduction in moisture is small in relative terms, the affected populations may not experience resource stress, especially since they would likely be smaller or only seasonal in the first place.
It could perhaps be argued that the majority of archaeological sites would have been present in semi-arid rather than arid areas, but sites in (past) arid climate zones certainly existed: the town of Khirbet as-Samra itself has a Roman site. For such sites an adjusted baseline would be necessary to interpret isotope values correctly.
Bands for wet versus dry conditions therefore appear to work within the climatic zones for which they were defined. In the climatic zones covered in our study area, wet conditions resulted in significantly higher Δ13C values and dry conditions in significantly lower Δ13C. However, our study also highlights that for the semi-arid region (from which most samples were available) moderately watered crops show a large (> 3‰) range of values, with the distinction between poorly watered and moderately watered wheat being especially blurred. Values between 15.5‰ and 17‰ for wheat are commonly found in either group; this is the case for some barley values too, but for a much narrower range. In light of this, we suggest limiting the interpretation of Δ13C data to high and low values, while intermediate values, especially for wheat, should be viewed with caution. Interpretation of the latter can be improved using a multi-proxy approach, such as by using weed functional characteristics in conjunction with stable isotope analysis (Bogaard et al. 2016; Styring et al. 2016, 2017). Since many Δ13C values measured from archaeological samples fall in these intermediate ranges (Riehl et al. 2014; Vaiglova et al. 2014; Wallace et al. 2015; Styring et al. 2017), it is very important that this limitation is taken into account.
Beyond the Δ13C values themselves, there are two additional clues that carbon stable isotope data of archaeological crop samples can give about past agricultural practices at a site. First, Δ13C of different (sub-)species with different water requirements can be compared within the same site, ideally within the same period or even context. Because more drought-resistant crops such as barley will have higher Δ13C than less drought-tolerant ones, such as various wheats, when grown under the same conditions, this comparison may give additional clues to past water management practices (Wallace et al. 2015; Styring et al. 2017). For example, if wheat has a higher Δ13C than barley from the same context instead of the expected lower values, it can be deduced that it was less water stressed than barley and therefore that the wheat would have received more water, either artificially by watering, or by growing it in naturally wetter soils.
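The within-context comparison of crops can be expressed as a minimal Python sketch. The function name and the Δ13C values are hypothetical, and comparing simple means is an assumption of this sketch; it does not replace a statistical test of the difference.

```python
from statistics import mean


def wheat_higher_than_barley(wheat_values, barley_values):
    """Return True if mean wheat Δ13C exceeds mean barley Δ13C.

    Under equal growing conditions barley is expected to show the higher
    value, so True suggests that the wheat was less water stressed.
    """
    return mean(wheat_values) > mean(barley_values)


# Hypothetical Δ13C values (‰) from a single context, for illustration only
wheat = [17.8, 18.1, 17.9]
barley = [17.2, 17.5, 17.0]

print(wheat_higher_than_barley(wheat, barley))
```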
Secondly, Δ13C variability can help assess the magnitude of variation in conditions between different fields near a site (provided it can be reasonably assumed that the crops were cultivated locally). If variation is significantly larger than "normal" intra-plot variation (< 0.5‰ in our study, comparable to the 0.5‰ observed by Wallace et al. 2013), it is likely that the crops were grown under very different conditions to each other (Riehl et al. 2014; Styring et al. 2016). This may have been the case, for example, if some crops were irrigated while others were not.
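A simple screening step along these lines could look as follows in Python. The function name and the sample values are hypothetical, and measuring variation as the range (maximum minus minimum) is an assumption of this sketch, since other measures of spread could equally be used; the 0.5‰ default is the intra-plot figure quoted above.

```python
def exceeds_intra_plot_variation(values, threshold=0.5):
    """Return True if the range of Δ13C values exceeds the threshold (‰)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    return max(values) - min(values) > threshold


# Hypothetical Δ13C values (‰), for illustration only
same_conditions = [17.1, 17.3, 17.4]
mixed_conditions = [16.0, 17.2, 18.2]

print(exceeds_intra_plot_variation(same_conditions))
print(exceeds_intra_plot_variation(mixed_conditions))
```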

Conclusions and recommendations
In conclusion, the results of experimentally grown crops from three sites in Jordan show that C3 (wheat and barley) crop Δ13C is clearly affected by water availability, but not in a straightforward way, with various, often interlinked environmental factors also having an impact. Rainfall (and watering) patterns, temperature, affecting how quickly the plants develop and therefore how much of their growth season falls within the wet season, and possibly nutrient availability, all have an effect. However, these would be unknown for the period in the past when the crops were growing. Therefore, based on the experimentally grown crops from the three sites in Jordan (but keeping in mind that the number of poorly watered plots should be increased), we make the following recommendations regarding the interpretation of crop Δ13C:
1. The use of a general regression equation to quantify water input in absolute terms should be avoided. While our data indicate that these may work in some situations, variation in other cases is very large, and because of the number of "unknowns" when dealing with archaeological sites, it is usually impossible to know which conditions one is dealing with.
2. While the use of bands with baseline values indicating 'wet/well watered' or 'dry/poorly watered' conditions is the interpretive method of choice, boundaries for the bands will vary, not only between different crops but also between climatic zones (arid, semi-arid, temperate etc.). In this study, climatic zones were especially relevant for the interpretation of data obtained from wheat, as it is more sensitive to aridity than barley. This should be further tested for more arid and temperate sites, as most current modern comparative values come from semi-arid regions.
3. The interpretation of bands should focus on extreme 'well-watered' and 'poorly watered' values. According to our data, Δ13C values of > 17.6‰ for durum wheat and > 18.5‰ for six-row barley grown in semi-arid regions clearly indicate that the crops had plenty of available water, while values below 16.5‰ for barley indicate water stress in semi-arid regions; a number of intermediate values and especially the distinction between poorly and moderately watered wheat in semi-arid regions, however, are more ambiguous. These Δ13C values differ between crops; variation between different genotypes may also be present, which needs to be tested further. When Δ13C values become available for a larger range of taxa, genotypes, and environments, the cut-off points can be refined further.
Ultimately, while there are limitations in the use of crop Δ13C values for assessing past plant water availability, when used correctly and in conjunction with other environmental proxies and other forms of archaeological evidence, the method can be a powerful tool for our understanding of past agricultural practices and water availability. This is essential to start gaining an understanding of past society and the interactions of these societies with their environments.