Introduction

Rice is the food staple of half the world (Meharg and Zhao 2012). Unfortunately, it is also the dominant dietary source of the toxicants and carcinogens arsenic (As) (Carey et al. 2019) and cadmium (Cd) (Shi et al. 2020; Zhao et al. 2022). However, the risk posed by As and Cd from rice exposure varies globally, as these elements can range by more than two orders of magnitude in grain (Carey et al. 2019; Shahriar et al. 2020; Shi et al. 2020; Watanabe et al. 1996). Geographically, rice grain As concentrations follow a clear trend of low grain As at southern latitudes rising to high concentrations at temperate northern latitudes (Carey et al. 2019; Dai et al. 2022). No global geographic trend is apparent for Cd, with grain Cd concentrations varying greatly within and between different regions (Chen et al. 2018a; Fang et al. 2022; Otero et al. 2020; Qu et al. 2020; Shahriar et al. 2020; Shi et al. 2020; Watanabe et al. 1996; Wen et al. 2020). To understand the geographic risk posed by rice grain with respect to As and Cd, and to develop strategies to mitigate those risks, the reasons for the geographical distribution of As and Cd concentrations in rice grain need to be understood, which they are not at present (Carey et al. 2019; Shi et al. 2020; Zhao et al. 2022).

One reason why As and Cd may differ in global geographical trends is that there is natural variation in their concentrations in the parent materials that contribute to soil formation (Otero et al. 2020; Qu et al. 2020). Because As and Cd are both anthropogenic contaminants, soil pollution at a regional or global scale may impact grain concentrations (Carey et al. 2019; Shi et al. 2020). For example, elevated As in rice grain in Bangladesh and West Bengal, India, is caused by As-contaminated paddy irrigation water (Bose et al. 2022; Williams et al. 2009). An example for Cd is that base metal mining has contaminated the paddy soils of China and has led to elevated Cd concentrations in rice grain (Chen et al. 2018a,b; Fang et al. 2022; Mu et al. 2019).

When comparing different regions, besides variation in soil As and Cd concentrations, cultivar, climate and management conditions may also impact on As and Cd concentrations in rice grain (Adomako et al. 2009; Chen et al. 2019a, b; Ishikawa et al. 2016; Norton et al. 2009, 2010, 2012, 2013; Shi et al. 2020; Zhao et al. 2018). The chemical and physical properties of paddy soils will alter depending on region-specific paddy soil management practices such as low-land flooded or upland rain-fed systems (Mu et al. 2019; Qu et al. 2020; Wen et al. 2020). Also, the macro-elements aluminium (Al), carbon (C), iron (Fe), manganese (Mn), phosphorus (P), silicon (Si) and sulphur (S) interact to impact on the soil availability of both As and Cd. These macro-elements are all highly variable in soil and may be subject to global geographical trends that have consequences for rice grain As and Cd concentrations (Fang et al. 2022).

When investigating the reasons behind the global trends in rice grain As and Cd, it must also be considered that the two elements have opposing biogeochemical pathways with respect to soil redox chemistry (Arao et al. 2009; Xu et al. 2008). Redox is dependent on soil chemistry, microbial activity and field management. The dominant form of arsenic in soil solution is inorganic arsenic (iAs), which is the sum of the redox-interchangeable species arsenate (As(V)) and arsenite (As(III)) (Arao et al. 2009; Xu et al. 2008). Arsenate is immobilized in soils by insoluble iron (III) oxide under aerobic conditions. Flooding of soils, i.e., paddy management, leads to anaerobic conditions causing the dissolution of iron (III) and arsenate as soluble iron (II) and arsenite (As(III)) (Arao et al. 2009; Xu et al. 2008). This interconversion between arsenate and arsenite is microbially mediated (Afroz et al. 2019; Zhao et al. 2013a,b). Both arsenate and arsenite can be assimilated by rice roots as analogues of phosphate and silicic acid, respectively (Meharg and Zhao 2012). In contrast to As, Cd is mobilized under aerobic conditions, where insoluble sulphide (for which Cd has a high affinity) is oxidized to sulphate (Arao et al. 2009; Xu et al. 2008). With respect to root uptake, Cd is a Mn analogue, assimilated across the root plasma membrane via the transport protein Nramp5 (Chang et al. 2020a; Sasaki et al. 2012). Although paddy fields are flooded during most of rice’s growth cycle, they are dried out during grain-fill, when Cd can be readily accumulated (Arao et al. 2009). This switch from an anaerobic to an aerobic soil environment during rice’s lifecycle leads to elevation of both Cd and As in rice grain (Arao et al. 2009).

A further complication in considering global trends in rice grain arsenic concentration is that iAs is readily metabolised in soils to methylated forms, primarily dimethylarsinic acid (DMA) (Hossain et al. 2021; Norton et al. 2013; Zhao et al. 2013a). Arsenic methylation is conducted via microbial arsC and arsM genes (Afroz et al. 2019; Zhao et al. 2013a,b). Soil redox state also appears to regulate microbial methylation of As, as formation of DMA occurs under more reduced conditions (Hossain et al. 2021; Norton et al. 2013; Zhao et al. 2013a). DMA can be assimilated from soil by rice roots and translocated to grain. Both iAs and DMA are present in grain, though iAs dominates (Carey et al. 2019). Both iAs and DMA vary in rice grain concentration at a global geographical scale (Carey et al. 2019; Zhao et al. 2013a).

Microbial activity and diversity are central to As and Cd dynamics in paddy systems, as microbes drive redox through consumption of organic matter (Falkowski et al. 2008; Norton et al. 2013). Microbes also mediate As speciation (arsenate, arsenite, DMA) (Afroz et al. 2019; Zhao et al. 2013a,b). The size and activity of the microbial biomass are driven by soil chemistry, particularly soil organic matter (SOM), as it acts as a microbial substrate, with SOM consumption leading to more reduced and sustained anaerobism (Norton et al. 2013; Hossain et al. 2021). Furthermore, the composition of rhizosphere bacterial communities has been implicated in affecting rice grain As and Cd (Bose et al. 2022; Huang et al. 2021).

The overall aim of this current investigation is to identify soil chemical and microbial factors that explain the observed global patterns in rice grain As and Cd. To achieve this aim, 132 paired samples of soil, straw and grain were collected across a large geographical transect that ranged from latitude 12° south to 44° north, through 3 continents (Africa, Asia and Europe). Field-fresh soil was then transported to the laboratory and incubated for 10 days under flooded conditions. At the end of the experiment soil and soil porewater chemistry were assayed, including redox-relevant parameters such as Eh, pH and SOM, and soil DNA was extracted and subjected to amplicon sequencing (16S rRNA and ITS) using Illumina MiSeq. Microbial abundance in soil was related to soil and plant chemistry through factor analysis dimension reduction and correlation analysis. As such, this is the first systematic global study to identify soil As and soil Cd biogeochemical drivers that lead to higher levels of assimilation of these elements into the rice plant.

Materials and Methods

Field Sampling and Subsequent Sample Processing

Paddy fields from East Africa (A), Sri Lanka (L), Vietnam (V) and Southern Europe (S) were surveyed to collect grain, straw and soil during 2017/19, resulting in a total of 132 matched grain, straw and soil samples from A (33), L (30), V (38), S (31) for analysis, with sampling locations given in SI Fig. 1. All the samples were collected during the summer. At harvest, the centre of each field to be sampled was identified and then a 3 cm diameter core, down to 20 cm, was taken at the base of a rice plant at the centre, and at a plant 2 m from the centre to the north, east, south and west, i.e., 5 plants in total, and the cores were combined to give a bulked sample. Grain and straw from each of the plants at each of these locations were also sampled and combined to give a bulk sample. The soils were stored field wet at 4°C until used in incubation experiments. Straw and grain were air dried for storage until further processing. Air-dried grain was then manually dehusked to obtain the grain (kernel). Sub-samples of rice shoot, dehusked grain and soil were freeze-dried for 24 h using a Christ Alpha 1-4LD Plus freeze dryer. Grain, shoot and soil were powdered using a Retsch PM100 Ball Mill (Retsch, Germany) fitted with zirconium dioxide grinding chamber and balls.

Soil Incubation and Porewater Generation

To 50 ml centrifuge tubes, field-moist soil was added to give the equivalent of 2 g d.wt. (ascertained by drying a subsample of the field-moist soil in advance) and then double-distilled deionized water was added to give a 2 cm layer of standing water on top of the soil. The closed tubes were wrapped in aluminium foil and placed in an incubator for 14 days with a 28 °C daytime (16 h) and 25 °C night-time (8 h) cycle. On harvesting, tubes were first manually shaken and then centrifuged (Sorvall Legend RT at 4600 rpm) for 20 min. On termination of centrifugation a subsample of the water layer was decanted into a separate 15 ml centrifuge tube and then immediately frozen at -20 °C until further analysis. On another porewater subsample, -log [hydrogen ion] (pH) and oxidation reduction potential (Eh) were measured directly following centrifugation using a calibrated pH meter, pHenomenal PH 1000 H (VWR International, UK). For subsequent Total Organic Carbon (TOC) determination, a further 5 ml subsample per sample was transferred to weighed sterile 15 ml falcon tubes and 50 µl 69% Analar grade HNO3 (Merck, Sigma-Aldrich, USA) was added before analysis using a LOTIX (USA) TOC analyser.
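The dry-weight equivalence used to load the tubes can be illustrated with a short calculation. This is a minimal sketch only: the function name and the subsample masses are hypothetical, and it assumes the dry fraction of the pre-dried subsample is representative of the soil added to the tube.

```python
def fresh_mass_for_dry_target(target_dry_g, subsample_wet_g, subsample_dry_g):
    """Field-moist mass (g) that contains target_dry_g of dry soil.

    subsample_wet_g / subsample_dry_g are the masses of a subsample
    before and after drying (hypothetical inputs for illustration).
    """
    if subsample_wet_g <= 0 or not 0 < subsample_dry_g <= subsample_wet_g:
        raise ValueError("subsample masses must satisfy 0 < dry <= wet")
    dry_fraction = subsample_dry_g / subsample_wet_g
    return target_dry_g / dry_fraction


# Hypothetical subsample: 10.0 g of field-moist soil dries to 6.25 g.
mass_to_add = fresh_mass_for_dry_target(2.0, 10.0, 6.25)
print(f"Add {mass_to_add:.2f} g field-moist soil per tube")
```

With these illustrative masses the dry fraction is 0.625, so 3.2 g of field-moist soil would supply the 2 g d.wt. target.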

Porewater ICP-MS Analysis

To determine As speciation of porewaters, 0.5 ml of each porewater sample was diluted ten-fold by adding 4.5 ml of 1% HNO3 (v/v), made from 69% HNO3. Then, to 700 µl of the diluted porewater 7 µl of Prolabo Analar Normapur hydrogen peroxide, 30% (Prolabo, UK), was pipetted and mixed thoroughly. Samples were then run with standards prepared from a 100 µg/L DMA stock solution. A Thermo Dionex IC5000 Ion Chromatograph system, fitted with a Dionex IonPac AS7 RFIC analytical column (2 × 250 mm) and a Dionex AG7 guard column, was used to separate the As species iAs, DMA, monomethylarsonic acid (MMA), trimethylarsine oxide (TMAO) and tetramethylarsonium (TETRA). Authentic standards were run for all species. Mobile phase A contained 20 mM ammonium carbonate in deionised water and mobile phase B contained 200 mM ammonium carbonate in deionised water. The flow rate for the run was 0.3 ml/min using the following gradient programme: 100% mobile phase A at time = 0 min, followed by a linear change to 100% mobile phase B at time = 10 min, then a linear change back to 100% mobile phase A at time = 10.5 min, followed by 2 min equilibration, giving a total analysis time of 12.5 min. The ICP-MS (Thermo Scientific iCap Q) operating conditions were: forward RF power, 1550 W; nebuliser gas flow, ~1 L/min; nebuliser sample flow rate, ~0.35 ml/min. Helium (He) was used as a collision gas at a flow rate of 4.5 ml/min, and the sole element measured was As at a mass of 75. Total elemental analysis of porewater was conducted on the same sample preparation used for As speciation, but with 5 µl of rhodium (Rh) internal standard (Fluka Analytical, Sigma-Aldrich, USA) added. The ICP-MS (Thermo Scientific iCap Q) was calibrated with ‘Multi-Element 2’ (SPEX CLMS-2 Multi-Element Solution 2, matrix: 5% HNO3) and ‘Multi-Element 4’ (SPEX CLMS-4 Multi-Element Solution 4, matrix: water/Tr-HF) standards.
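The gradient programme described above can be written as a piecewise-linear function of run time. The sketch below is illustrative only: the function names are hypothetical, and the eluent concentration assumes ideal linear mixing of mobile phases A (20 mM) and B (200 mM), which is an idealisation of what the pump delivers to the column.

```python
def percent_b(t_min):
    """Percentage of mobile phase B at t_min minutes into the 12.5 min run."""
    if not 0.0 <= t_min <= 12.5:
        raise ValueError("time must lie within the 12.5 min run")
    if t_min <= 10.0:
        # Linear change from 100% A (0% B) at 0 min to 100% B at 10 min.
        return 100.0 * t_min / 10.0
    if t_min <= 10.5:
        # Linear return from 100% B at 10 min to 100% A at 10.5 min.
        return 100.0 * (10.5 - t_min) / 0.5
    # Final 2 min: equilibration at 100% A.
    return 0.0


def carbonate_mM(t_min, a_mM=20.0, b_mM=200.0):
    """Nominal ammonium carbonate concentration, assuming ideal mixing."""
    fraction_b = percent_b(t_min) / 100.0
    return a_mM * (1.0 - fraction_b) + b_mM * fraction_b


for t in (0.0, 5.0, 10.0, 10.25, 12.5):
    print(f"t = {t:5.2f} min: {percent_b(t):5.1f}% B, {carbonate_mM(t):6.1f} mM")
```

Under these assumptions the nominal eluent strength rises from 20 mM to 200 mM over the first 10 min and returns to 20 mM for the equilibration step.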

Plant and Soil Analysis

For As speciation of grain and straw, 10 ml of 1% Aristar HNO3 was added to ~ 100 mg of accurately weighed material, which was then microwave digested using a CEM Mars 6 1800W (USA). After microwaving, the digestate was centrifuged at 3500 rpm for 15 min, and then diluted to 10 ml with 1% HNO3. As speciation was then conducted on the diluted digestate using the same procedures as for porewaters. Three replicates of 100 mg certified reference material (CRM), rice flour NIST SRM 1568b, certified for As speciation, were run with each batch, plus a reagent blank, with 40 samples in total per batch.

For total elemental analysis of straw and grain, the same procedures were followed for digestion except that 2 ml of 69% Aristar HNO3 was added to each digestion tube, and then left to sit overnight. Then 2 ml of 30% Analar Normapur hydrogen peroxide (Prolabo, UK) was added to each tube and tubes were left open for 15 min before microwave digestion. The same methodology was followed for soil but with soils digested in closed Teflon-lined vessels. On cooling, 30 µl of Rh internal standard (Fluka Analytical, Sigma-Aldrich, USA) was added to each sample. Each tube was then made up to a final volume equivalent to 30 ml with deionised water. Total elemental analysis was then conducted by ICP-MS as per porewaters. Three replicates of 100 mg certified reference material (CRM), namely rice flour NIST SRM 1568b, soil CRM NCS ZC73007 (CNAC, China) and mixed Polish herbs INCT-MPH-2 (INCT, Poland), were run with each batch, plus a reagent blank, with 40 samples in total per batch. Soil organic matter (SOM) was analysed by combustion technologies.

Statistical Interpretation of the Chemical Data

To investigate the data, linear regression, Kruskal–Wallis one-way ANOVA, Principal Components Analysis (PCA) and Spearman’s Rank correlation analysis, where appropriate, were performed in GraphPad Prism (v.9) or SPSS (v.27). For both the parametric and PCA statistical testing, log10-transformed data were used.
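As an illustration of why the log10 transformation matters for the parametric and PCA analyses but not for the rank-based test, the following minimal sketch computes Spearman’s rank correlation in plain Python. It is not the GraphPad Prism or SPSS implementation; the function names and the example concentrations are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired concentrations (mg/kg), for illustration only.
soil = [1.2, 3.5, 8.3, 20.0, 95.0]
grain = [0.02, 0.05, 0.04, 0.09, 0.15]

rho_raw = spearman(soil, grain)
rho_log = spearman([math.log10(v) for v in soil],
                   [math.log10(v) for v in grain])
print(rho_raw, rho_log)
```

Because log10 is a monotonic transformation it leaves the ranks unchanged, so the two coefficients are identical; the transformation only affects methods that depend on the actual magnitudes, such as linear regression and PCA.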

Soil Microbial Diversity Assessment

At the end of the incubation experiment, soil was sampled, snap frozen and stored at -80°C for DNA extraction. DNA was extracted from 132 soil samples and from 1 negative control, using the PowerLyzer PowerSoil DNA extraction kit (Qiagen, UK) following the manufacturer’s protocol. Bead beating was performed with a Precellys Dual (Bertin Technologies, France) at 3 × 6000 rpm for 40 s. In between cycles, the tubes were kept on ice for 1 min. DNA quality was assessed on an agarose gel and with a NanoDrop 8000 spectrophotometer (Thermo Fisher Scientific, UK). Following this, 16S and ITS amplicon sequencing libraries were generated using the standard Illumina 2-step protocol (Caporaso et al. 2011), with integration of sample-specific barcodes. The target-specific primers 515F/806R (Caporaso et al. 2011) were used for generation of the 16S amplicon sequencing library. The target-specific primers ITS1F/ITS2R (Mello et al. 2011; White et al. 1990) were used for generation of the ITS amplicon sequencing library. Following this, each library was sequenced on one run on the Illumina MiSeq, using 300 bp paired-end sequencing.

Data processing of the amplicon data was done according to the Qiime2-2019.04 pipeline (Bolyen et al. 2019) as follows: DADA2 (Callahan et al. 2016) was employed for denoising and generation of amplicon sequence variants (ASVs). The blast consensus option was chosen for taxonomic assignment. The 16S ASV sequences were annotated against the SILVA database (version 138 SSU, Quast et al. 2012). The ITS ASV sequences were annotated against the UNITE database (version 8, 04/02/2020, Nilsson et al. 2019). Following this, the Qiime2-generated count table and taxonomy file for each dataset were exported for downstream analysis in R version 4.1.1 (www.R-project.org). Analysis in R was conducted as follows: The R package phyloseq version 1.36.0 (McMurdie and Holmes 2013) was used to subset the count table at phylum and genus level and for generation of phylum level relative abundance plots. DESeq2 version 1.32.0 (Love et al. 2014) was used for analysis of the phylum and genus level count tables. The likelihood ratio test (LRT) was employed for identification of significant regional differences and the Wald test for analysis of the pairwise comparisons A versus S, A versus L, A versus V, S versus V, S versus L and L versus V. DESeq2 results were compiled and ASVs with overall basemean < 10 removed. For integration of chemical data (soil porewater, soil, rice straw and rice grain) with soil microbial data (16S and ITS normalised counts at genus and phylum level), Spearman correlation analysis was performed with R package corrplot version 0.84 (Wei and Simko 2021). Results obtained from Spearman correlation analysis were subsequently integrated into each DESeq2 results table. Phylum and genus level annotated ASVs strongly correlating with grain iAs and/or grain Cd were identified (Spearman P value < 0.001). Following this, Spearman correlation plots and corresponding abundance heatmaps were generated with R packages corrplot version 0.84 and pheatmap version 1.0.12, respectively.
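The two filtering steps described above (removal of low-abundance ASVs, then selection of strongly correlated taxa) can be sketched as follows. This is an illustrative Python outline rather than the R code used in the study: the class, function and taxon names and all numerical values are hypothetical, and the base mean cut-off is applied as "keep if at least 10", following the statement that ASVs with basemean < 10 were removed.

```python
from dataclasses import dataclass


@dataclass
class TaxonResult:
    name: str            # hypothetical taxon label
    base_mean: float     # mean of normalised counts across all samples
    spearman_rho: float  # correlation with the grain trait of interest
    spearman_p: float    # P value of that correlation


def select_taxa(results, min_base_mean=10.0, max_p=0.001):
    """Return (positive, negative) taxon names passing both filters."""
    kept = [r for r in results
            if r.base_mean >= min_base_mean and r.spearman_p < max_p]
    positive = [r.name for r in kept if r.spearman_rho > 0]
    negative = [r.name for r in kept if r.spearman_rho < 0]
    return positive, negative


# Hypothetical results table, for illustration only.
results = [
    TaxonResult("taxon_A", 250.0, 0.62, 1e-5),   # kept, positive
    TaxonResult("taxon_B", 4.0, 0.70, 1e-6),     # dropped: base mean < 10
    TaxonResult("taxon_C", 80.0, -0.45, 5e-4),   # kept, negative
    TaxonResult("taxon_D", 120.0, 0.20, 0.03),   # dropped: P >= 0.001
]
positive, negative = select_taxa(results)
print(positive, negative)
```

Splitting the retained taxa by the sign of the correlation mirrors the way positively and negatively correlated groups are counted separately in the Results.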

Results

Chemical analysis resulted in the reporting of 89 determinants across soil, porewater, shoot and grain. The recoveries of parameters for which CRMs were applicable are presented in SI Table 1 for 17 elements that were reported for the CRM and 12 elements and As species in grain reported for the CRM. Other materials (porewaters and straw) had no CRM. It was decided to report all appropriate elements that had no CRMs, as they are of interpretative interest. Even though recovery is unknown for these elements, they were analysed according to a systematic procedure and, therefore, relative differences in those elements are appropriate to analyse with the multivariate statistical approaches used. Retention of all data was therefore thought appropriate so as not to decrease data richness, especially for components where no CRMs are available such as soil porewaters and rice straw, given that the multivariate statistics (PCA and Spearman’s rank) used relative differences between variables and that all analysis was fully randomized. Only for analytes where CRM recoveries are available are the empirical data reported. Reported recoveries range from 59% for Se in grain to 116% for Mo in soil. Recovery for grain Zn was also relatively low (60%), but besides Se and Zn all other variables were above 70%. As Se and Zn are important micro-nutrients it was decided to report these, with the proviso that the low recoveries for these two elements must be considered when interpreting the data.
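The recovery screening described above amounts to a simple ratio and threshold check. In the minimal sketch below the function names are hypothetical; the three recovery values are those quoted in the text, and the 70% threshold reflects the statement that all other variables were above 70%.

```python
def percent_recovery(measured, certified):
    """Recovery (%) of a CRM analyte: measured value relative to certified value."""
    if certified <= 0:
        raise ValueError("certified value must be positive")
    return 100.0 * measured / certified


def flag_low_recoveries(recoveries, threshold=70.0):
    """Names of analytes whose recovery (%) falls below the threshold."""
    return sorted(name for name, pct in recoveries.items() if pct < threshold)


# Recoveries quoted in the text (%); other analytes are omitted here.
reported = {"grain Se": 59.0, "grain Zn": 60.0, "soil Mo": 116.0}
low = flag_low_recoveries(reported)
print(low)
```

Analytes flagged in this way are still reported, but their absolute concentrations should be interpreted with the low recovery in mind.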

For selected elements and soil characteristics, the medians, 25th percentiles, maximum and minimum for the variables for each region along with the P value from Kruskal–Wallis analysis are reported in SI Table 2. All determinants were significantly different with the exception of SOM, soil Cr and Ni, porewater TMAO, straw Mo, and grain Ni and Zn. Medians and 25th percentiles for soil general characteristics (SOM, pH, Eh, TOC), and As (elemental As, or iAs and DMA, where appropriate), Cd and Zn in soil, porewater, shoot and grain are also shown in Fig. 1. Soil median SOM ranged between 7.2% and 10.4%, but variation within regions meant that there was no significant difference between these regions. Porewater TOC did differ between regions, with EU being the highest (36 mg/kg), more than 3-times higher than the lowest region, Vietnam, despite SOM percentage not being significantly different. The EU and Vietnam also contrasted most in pH and Eh, with these two parameters being 7.8 and -65 mV as compared to 5.6 and 55 mV for the EU and Vietnam, respectively.

Fig. 1

Boxplots of soil, shoot and grain characteristics for regions East Africa (EA), Sri Lanka (SL), Vietnam (VN) and Southern Europe (EU). The figure shows results obtained for soil, porewater, rice shoot and rice grain. SOM soil organic matter, pH − log [H+], Eh oxidation/reduction potential, TOC total organic carbon, As arsenic, Zn zinc, Cd cadmium, iAs inorganic arsenic, DMA dimethylarsinic acid. Plot line bar is for medians, and whiskers are the 25th percentiles

Soil total As median was highest for Vietnam (8.3 mg/kg) and lowest for Sri Lanka (2.32 mg/kg), yet Vietnam had the lowest porewater iAs and DMA concentrations, with the EU being the highest for both (Fig. 1). Regardless of these low porewater As species concentrations, both shoot and grain As concentrations in Vietnam were equivalent to those in the EU (0.096 and 0.017 mg/kg in EU grain for iAs and DMA, respectively), circa an order of magnitude higher than for East Africa, the lowest region for each of these chemicals. Soil Cd was also highest in EU (0.24 mg/kg), followed by Vietnam (0.13 mg/kg), being lowest in East Africa (0.049 mg/kg). However, porewater Cd was much more elevated in Vietnam, by an order of magnitude compared to all other regions, while EU was the lowest. Cd concentrations in porewaters matched well to pattern of the pattern, Vietnam being highest, EU lowest. However, Sri Lankan shoot and grain were the highest in Cd, while Vietnamese grain was intermediate, with East African and EU the lowest. With respect to Cd in grain, EU was the lowest by an order of magnitude with a median of 0.001 mg/kg, with all other regions above 0.01 mg/kg, and Sri Lanka ~ 3-times higher than East Africa and Vietnam at 0.0156 mg/kg. Soil Zn concentration medians varied twofold between the lowest in East Africa at 30 mg/kg and the highest in Vietnam at 65 mg/kg. Porewater Zn was tenfold higher in Vietnam compared to all other sites, but these differences between regions reduced greatly in shoots (10.5–28.2 mg/kg from lowest to highest), and all but disappeared in the grain, where median concentrations were very similar (region medians 11–13 mg/kg) and the difference between regions was non-significant (P = 0.547).

Given that many of the 87 parameters measured across soil, porewater, shoot and grain will cross-correlate, PCA was conducted to identify the key patterns in the data, focussing on what partitions with grain As species, Cd and Zn (Fig. 2). Factor analysis showed that As species in grain partition with those species in shoot, particularly iAs and DMA, in the bottom left quadrant of the PCA. However, the porewater As species are orthogonal to grain and straw. Soil total As is located in the same location as grain and shoot As species. It is notable that soil Al, Fe and Si separate well from grain and shoot As species, and soil total As, on the x-axis, suggesting that soils lower in aluminosilicates and Fe are higher in plant-available As. Interestingly, soil Pb, Mo, Rb and Zn are also co-located with As in soil, shoots and grain, while Cd is also in the same quadrant and, like As, Pb, Mo, Rb and Zn, well away from the origin, as are soil B and S. Porewater As species are associated with high TOC and high pH. Porewater Cd and Zn separate with Eh, and also with soil Zn, and are diagonally opposite to high pH. Grain and straw Cd partition together, but relatively close to the axis. SOM was a poor predictor of soil and grain characteristics, sitting near the origin in the PCA. Soil Ca was diagonally opposed to grain Cd in the factor analysis, showing a negative relationship between these two elements. Considering the region plot, all four regions separate well, showing that the geochemistry of each region is distinct, with East Africa and Sri Lanka being the most similar.

Fig. 2

PCA of soil and plant chemical characteristics, showing both the factor analysis and individual site regression plots. The factor analysis plot shows elements including Tot.As (total arsenic), cadmium (Cd) and arsenic species (DMA, MMA, TMAO, Tetra), as well as a range of other elements (standard chemical abbreviations), soil pH, soil Eh, TOC (total organic carbon), and sample location (altitude, longitude, latitude). Elements (Cd and Zn) and arsenic species (iAs, MMA, DMA, TMAO and TETRA) are shown in a larger font size. Zinc is further highlighted by a solid black fill for all compartments (soil, porewater, straw and grain) and arsenic species by a solid white fill. All other chemicals have a solid fill with no contrasting edge colour

To reduce the number of parameters, those with a factor analysis score of ± 0.5 in either x or y direction were selected to conduct Spearman’s Rank analysis (Fig. 3). Also, as As species cluster in the factor analysis (Fig. 2), total grain As was used, again to reduce the number of parameters for ease of visual interpretation. The parameters were organised in blocks that correlated either positively or negatively with each other, and it was observed that grain As and grain Cd were associated with completely different geochemical drivers. Grain As was positively correlated with As in soil, porewater and straw; soil B, Mg, S, Ca, Zn, Sr, Mo and Pb; porewater TOC, B and P; straw Cr and Sr; and grain B, P, Cr, Mn and Sr. Most of these parameters were positively correlated with pH, although, notably, grain As had a slight negative correlation with pH. Most of these parameters were also, generally, positively correlated with each other, with the exception of porewaters, which could be weakly or negatively correlated with soil, straw and grain parameters, but all porewater parameters were strongly correlated with each other. Grain As contrasted with grain Cd, which was positively correlated with porewater and straw Cd; soil Al, Si, Fe and Se; porewater Zn; straw Zn; and grain Co and Rb. Grain Cd and its positively correlated parameters generally had a positive correlation with Eh. Notably, grain Cd was negatively correlated with soil Ca. Straw Ni and grain Co had more complex correlations, being both positive and negative in a non-systematic way as compared to the two clear blocks just described for parameters that positively correlate with As and Cd, respectively.
As the factors that correlate with grain As and grain Cd contrast so sharply in their positive and negative correlations, some general statements can be made, namely that high grain As is associated with soils that are lower in pH when flooded and higher in B, Mg, S, Ca, Mn, Zn, Sr, Mo, Pb, and, surprisingly, Cd. Conversely, Cd is elevated in grain when soils have a high Eh when flooded and are high in Al, Si, Fe and Se. As noted, grain As had a positive correlation with soil As and soil Cd, while grain Cd has a negative association with soil Cd.
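The parameter selection rule used before the Spearman’s Rank analysis can be expressed compactly. The sketch below assumes that "a factor analysis score of ± 0.5 in either x or y direction" means an absolute loading of at least 0.5 on either of the two plotted axes; the function name, variable names and loading values are hypothetical.

```python
def select_by_loading(loadings, threshold=0.5):
    """Keep variables whose absolute loading on either axis reaches the threshold.

    loadings maps a variable name to its (x, y) loading pair.
    """
    return [name for name, (x, y) in loadings.items()
            if abs(x) >= threshold or abs(y) >= threshold]


# Hypothetical loadings, for illustration only.
loadings = {
    "grain_As": (-0.71, -0.40),      # kept: |x| >= 0.5
    "SOM": (0.05, -0.10),            # dropped: near the origin
    "porewater_Cd": (0.30, 0.62),    # kept: |y| >= 0.5
    "soil_Ca": (-0.20, 0.49),        # dropped: just under the threshold
}
print(select_by_loading(loadings))
```

Variables near the origin contribute little to either axis, so dropping them thins the correlation matrix without removing the parameters that structure the PCA.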

Fig. 3

Spearman’s rank analysis of variables (soil and plant chemical characteristics) from the PCA factor analysis (Fig. 2) that had a factor analysis score of ± 0.5 in either x or y direction. SO soil, PW porewater, ST straw, GR grain

Soil As concentrations range from < 1 mg/kg to ~ 100 mg/kg. Log-transformed data were used to normalize the regressions of soil-related data, and the findings are presented in SI Fig. 2. Relationships between soil As and grain iAs (P < 0.0001, R2 0.211) and DMA (P < 0.0001, R2 0.281) were significant. The slope of the grain iAs to soil relationship, 0.203, was steeper than for DMA, 0.143. Given the large number of samples and parameters analysed, it is easy to achieve high significance in correlations with a relatively low R2. Also, reporting on all potential correlations, or even a substantial subset, will lead to Type I statistical errors. Therefore, consideration of regression equations will, initially, be restricted to grain As and Cd against soil As and Cd (SI Fig. 2). Grain As and Cd both correlate with soil Cd (P < 0.001 in both cases, R2 0.237 for As and 0.170 for Cd), but with positive and negative slopes, respectively. However, only grain As significantly correlates with soil As (P = 0.028), whereas the grain Cd regression for soil As was non-significant (P = 0.112).
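To give the regression slopes a more tangible meaning, the sketch below converts a slope into a fold change. It assumes that both axes of the regressions are log10-transformed, so that a slope b implies the response scales as the predictor raised to the power b; the function name is hypothetical and the calculation ignores the uncertainty around each slope.

```python
def fold_change_in_response(slope, fold_change_in_predictor):
    """Fold change in y implied by a log-log slope for a given fold change in x."""
    if fold_change_in_predictor <= 0:
        raise ValueError("fold change must be positive")
    return fold_change_in_predictor ** slope


# Slopes quoted in the text for grain iAs and grain DMA against soil As.
ias_fold = fold_change_in_response(0.203, 10.0)
dma_fold = fold_change_in_response(0.143, 10.0)
print(f"Tenfold rise in soil As: grain iAs x{ias_fold:.2f}, grain DMA x{dma_fold:.2f}")
```

Under this assumption a tenfold increase in soil As corresponds to roughly a 1.6-fold increase in grain iAs and a 1.4-fold increase in grain DMA, which is consistent with the low R2 values: soil total As alone is a weak lever on grain As.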

Qiime2 analysis resulted in 83,126 representative sequences for 16S and 33,519 representative sequences for ITS. For 16S ASVs, the most abundant phyla identified were Proteobacteria, Chloroflexi, Actinobacteria, Acidobacteria, Firmicutes amongst others (Fig. 4a). For ITS ASVs, the most abundant phyla identified were Ascomycota, Basidiomycota, Mortierellomycota, Chytridiomycota (Fig. 4b).

Fig. 4

Phylum level relative abundance plot of A 16S ASVs and B ITS ASVs showing the median relative abundance of the most abundant organisms (relative abundance > 0.5%) in samples within each region. East Africa (n = 33), Sri Lanka (n = 30), Southern Europe (n = 31), Vietnam (n = 38). Organisms with an abundance < 0.5% are summed up under < 0.5%
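The pooling of rare phyla used in the relative abundance plot can be sketched for a single sample as follows. This is an illustrative outline, not the phyloseq code used for the figure (which plots the median across samples within each region); the function name, the two rare taxon labels and the counts are hypothetical.

```python
def relative_abundance(counts, min_percent=0.5, other_label="< 0.5%"):
    """Convert counts to percentages and pool taxa below min_percent."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    result = {}
    pooled = 0.0
    for taxon, count in counts.items():
        pct = 100.0 * count / total
        if pct < min_percent:
            pooled += pct          # rare taxa are summed into one category
        else:
            result[taxon] = pct
    if pooled > 0:
        result[other_label] = pooled
    return result


# Hypothetical phylum-level counts for one sample.
counts = {"Proteobacteria": 600, "Chloroflexi": 396, "rare_1": 3, "rare_2": 1}
abundance = relative_abundance(counts)
print(abundance)
```

Pooling keeps the plotted categories readable while ensuring the percentages for each sample still sum to 100.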

After normalisation, and filtering for basemean > 10, there were 38 phylum and 362 genus level 16S ASVs and 11 phylum and 88 genus level ITS ASVs retained. Of these, 33 phylum and 276 genus level 16S ASVs and 8 phylum and 62 genus level ITS ASVs showed significantly different abundance between regions (DESeq2, LRT test, adjusted P value < 0.01). Spearman correlation analysis of the normalised counts of each ASV against chemical porewater, soil, grain and shoot data allowed identification of microbial groups associated with accumulation of iAs and/or Cd in rice grain. The full results tables for 16S and ITS phylum and genus level ASVs (DESeq2 LRT test result, DESeq2 WALD test results, Spearman correlation P and R value, DESeq2 normalised counts) are in additional file 2: SI Table 3 shows phylum and SI Table 4 genus level results for 16S ASVs. SI Table 5 shows phylum and SI Table 6 genus level results for ITS ASVs. SI Table 7 shows the names of the publicly available fastq files for each sample.

For correlation of phylum level ASVs with grain iAs and/or grain Cd, using a Spearman correlation P value of < 0.001 as cut-off for significance, the following result was obtained. For 16S phylum level ASVs, the abundance of 19 (11 positively, 8 negatively) was significantly correlated with grain iAs (Fig. 5a) and the abundance of 9 (3 positively, 6 negatively) with grain Cd (Fig. 6a). For ITS phylum level ASVs, the abundance of 1 was significantly negatively correlated with grain iAs (Fig. 5a) and of 2 was significantly positively correlated with grain Cd (Fig. 6a). The corresponding abundance heatmaps for phylum level 16S and ITS ASVs are shown in Fig. 7a and 8a.

Fig. 5

Correlation Heatmap for 16S and ITS ASVs most highly correlated with grain iAs. Plot A and B: Spearman correlation plot of phylum (A) and genus level annotated ASVs (B) with TOC, pH, Eh and elements (Mn, Co, Ni, Cu, Zn, Se, Rb, Cd, As species) in PW (porewater), Soil, Grain and Straw. The scale ranges from dark pink (Spearman correlation = 1, positive correlation) to black (Spearman correlation = -1, negative correlation), with white indicating no correlation. Annotation of the organisms in A: Kingdom_Phylum. Annotation of organisms in B: Kingdom_Phylum_Family_Genus

Fig. 6

Correlation Heatmap for 16S and ITS ASVs most highly correlated with grain Cd. Plot A and B: Spearman correlation plot of phylum (A) and genus level annotated ASVs (B) with TOC, pH, Eh and elements (Mn, Co, Ni, Cu, Zn, Se, Rb, Cd, As species) in PW (porewater), Soil, Grain and Straw. The scale ranges from dark pink (Spearman correlation = 1, positive correlation) to black (Spearman correlation = -1, negative correlation), with white indicating no correlation. Annotation of the organisms in A: Kingdom_Phylum. Annotation of organisms in B: Kingdom_Phylum_Family_Genus

Fig. 7

Abundance Heatmap for 16S and ITS ASVs most highly correlated with grain iAs. Plot A and B: abundance heatmaps showing the abundance of the Phylum level (A) and Genus level annotated ASVs (B) in each sample within each region. Annotation of each sample is according to region A_ = East Africa (n = 33), L_ = Sri Lanka (n = 30), S_ = Southern Europe (n = 31), V_ = Vietnam (n = 38). The scale shows row-centred means ranging from bright yellow = 4 (higher abundance) to dark blue = -4 (lower abundance). Annotation of the organisms in A: Kingdom_Phylum. Annotation of organisms in B: Kingdom_Phylum_Family_Genus

Fig. 8

Abundance Heatmap for 16S and ITS ASVs most highly correlated with grain Cd. Plot A and B: abundance heatmaps showing the abundance of the Phylum level (A) and Genus level annotated ASVs (B) in each sample within each region. Annotation of each sample is according to region A_ = East Africa (n = 33), L_ = Sri Lanka (n = 30), S_ = Southern Europe (n = 31), V_ = Vietnam (n = 38). The scale shows row-centred means ranging from bright yellow = 4 (higher abundance) to dark blue = -4 (lower abundance). Annotation of the organisms in A: Kingdom_Phylum. Annotation of organisms in B: Kingdom_Phylum_Family_Genus
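The captions for Figs. 7 and 8 describe a row-centred colour scale running from -4 to 4. One common way to produce such a scale is to standardise each ASV (row) across samples to zero mean and unit standard deviation, then clip to the plotted range. The sketch below assumes that interpretation; the function name `row_scale`, the clipping limit and the toy values are illustrative and are not taken from the study.

```python
from statistics import mean, stdev


def row_scale(row, clip=4.0):
    """Centre one ASV's abundances on the row mean and scale by the row SD.

    Assumption: 'row-centred' is read here as a per-row z-score, clipped
    to the plotted range of -clip .. +clip.
    """
    m = mean(row)
    s = stdev(row)  # sample standard deviation
    if s == 0:
        return [0.0] * len(row)  # a constant row carries no contrast
    return [max(-clip, min(clip, (v - m) / s)) for v in row]


# Hypothetical abundances for one ASV in three samples.
scaled = row_scale([1.0, 2.0, 3.0])
flat = row_scale([7.0, 7.0, 7.0])
```

Scaling per row makes ASVs with very different absolute abundances comparable in one heatmap, at the cost of hiding which ASVs are most abundant overall.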

Genus level ASVs were tested for correlation with grain iAs, again using a Spearman correlation P value of < 0.001 as the cut-off for significance. For 16S genus level ASVs, the abundance of 91 was positively and of 27 negatively correlated with grain iAs; Fig. 5b shows the subset of these that are annotated at genus level (33 and 16, respectively). For ITS genus level ASVs, the abundance of 13 was positively and of 9 negatively correlated with grain iAs; Fig. 5b shows the subset of these that are annotated at genus level (3 and 7, respectively). The corresponding abundance heatmap is shown in Fig. 7b. This heatmap illustrates that organisms with strong positive correlation with grain iAs have significantly lower abundance in East African soils, in particular when compared to Southern European and/or Vietnamese soils. At genus level these included S-oxidising bacteria (Thiobacillus, Sulfurifustis), sulphate reducing bacteria (Desulfobacca, Desulfatiglans, Desulfomonile, Syntrophobacter), methanogenic archaea (Methanosarcina, Methanobacterium, Methanosaeta) and Colletotrichum (Kingdom Fungi) amongst others (Figs. 5b and 7b). Furthermore, the correlations observed between microbial ASVs and grain iAs at both phylum and genus level were matched by those with grain DMA and with rice straw iAs, MMA, DMA and TMAO, but were inverted for grain TMAO. They were also matched by the correlations with soil iAs, but not by those with soil porewater iAs or Eh. In contrast, organisms that showed strong negative correlation with grain iAs showed significantly higher abundance in East African soils, in particular when compared to Southern European and/or Vietnamese soils. At genus level these included Noviherbaspirillum, Azohydromonas, Solirubrobacter, Geodermatophilus, Ammoniphilus, Bacillus, Azospirillum, Microvirga, Massilia, Micromonospora (Kingdom Bacteria) and Rhizophlyctis, Curvularia, Scytalidium, Aspergillus, Sporisorium (Kingdom Fungi) amongst others (Figs. 5b and 7b). Negative correlations observed between microbial ASVs and grain iAs at both phylum and genus level were matched by those with grain DMA as well as with rice straw iAs, MMA, DMA and TMAO, and mirrored those with soil iAs, but not those with soil porewater iAs or Eh.

Genus level ASVs were tested for correlation with grain Cd, again using a Spearman correlation P value of < 0.001 as the cut-off for significance. For 16S genus level ASVs, 38 were positively and 109 negatively correlated with grain Cd; Fig. 6b shows the subset of these that are annotated at genus level (14 and 45, respectively). For ITS genus level ASVs, 18 were positively and 5 negatively correlated with grain Cd; Fig. 6b shows the subset of these that are annotated at genus level (9 and 1, respectively). The corresponding abundance heatmap is shown in Fig. 8b. Organisms that were positively correlated with grain Cd included significant numbers of bacteria (37) and fungi (18), but only 1 archaeon, and showed lowest abundance in Southern European soils (where lowest grain Cd was recorded) and highest abundance in Vietnamese and Sri Lankan soils (where highest grain Cd was recorded). At genus level these included Udaeobacter, Conexibacter, Terrabacter, Singulisphaera, Koribacter, Jatrophihabitans, Cupriavidus (Kingdom Bacteria, aerobic) and Scolecobasidium, Talaromyces, Lipomyces, Ganoderma, Gymnopilus, Entrophospora, Curvularia, Trichoderma, Tricholoma (Kingdom Fungi) amongst others (Figs. 6b and 8b). These organisms were also positively correlated with straw Cd as well as soil Eh, but not with soil Cd. Abundance of these organisms was also positively correlated with soil Se, straw Zn, straw Rb and grain Rb. Organisms that were negatively correlated with grain Cd, on the other hand, included mainly bacteria (106), but comparatively few fungi (5) and archaea (3). These generally showed highest abundance in Southern European soils and lowest abundance in Vietnamese and Sri Lankan soils (Figs. 6b and 8b). They were also negatively correlated with soil Eh and straw Cd, soil Se, straw Zn, straw Rb and grain Rb, but not with soil Cd.

Discussion

Presented here is the first systematic global study of the soil biogeochemical factors that regulate the accumulation of As and Cd in rice grain. A key strength of this study was that the varied nature of the soils used ensured that only the strongest associations would return as significant. Six major findings emerged. The first was that, globally, soil Cd and soil As were closely correlated, suggesting a common origin. The second was that while accumulation of As into rice shoot and rice grain was strongly linked to the concentration of As in soil, this was not the case for Cd. The third was that soil Eh showed positive correlation with grain Cd, but grain As showed no significant negative correlation with soil Eh. The fourth was that porewater As and porewater Cd showed no significant correlation with their respective grain concentrations. The fifth was that there was a significant interdependence with respect to translocation of some elements from soil to grain, and with distinct groups of soil microbes. The final major finding was that distinct fungal, bacterial and archaeal communities, indicative of the predominant soil properties in each global region, were observed, and these correlated with high or low As in rice grain and/or high or low Cd in rice grain.

The finding at a global scale that shoot and grain As were highly related to soil As confirms previous studies at a more localized scale (Lu et al. 2009) and at a multi-region scale (Adomako et al. 2009). However, the relationship between total soil As and grain As is less clear when considering upland (aerobic) and lowland (anaerobic) rice (Tran et al. 2020; Signes-Pastor et al. 2016). Also, a wide-scale Chinese survey found that grain As correlated better with calcium chloride extractable As than with total soil As (Mu et al. 2019).

It was found in this current study that shoot and grain As were associated with soils high in boron (B), Cd, molybdenum (Mo), P, lead (Pb), rubidium (Rb), S, and zinc (Zn). This may suggest an association with anthropogenic pollution (i.e., elevated Pb, Cd and Zn), but could also be due to certain soils being associated with mineralized zones or limestones contributing to the original parent rock composition. Anthropogenic pollution might explain the strong global association between soil As and soil Cd seen here. Such correlation between soil As and Cd was also identified in China, where mineral mining and industrial activities have been linked to co-pollution of paddy soils with these elements (Wang et al. 2023; Li et al. 2019). Furthermore, herbicides and insecticides can contain As (Bjørklund et al. 2020), phosphorus fertilizers can contain Cd (Loganathan et al. 2008; Zhao et al. 2015), and some manures can contain As (Zhao et al. 2015). Repeated application of these soil amendments could therefore also contribute to accumulation of As and Cd in paddy soil (Zhao et al. 2015).

It is well known that As is mobilised under low Eh anaerobic conditions and that Cd is mobilised under low pH and high Eh aerobic conditions. Arao et al. (2009) found that draining the soil during grain development (higher Eh) decreased As in rice grain, whereas keeping the soils flooded (lower Eh) enhanced grain As; the reverse was true for Cd. Experiments in which soil pH and Eh were artificially buffered over a range of rice relevant values (Reddy and Patrick 1977) also showed a positive linear association of Cd assimilation into the rice plant from low Eh (−200 mV) to high Eh (400 mV), but assimilation of Cd jumped 2.5-fold in the range between −100 and 0 mV, and was greatest at pH 5. While these authors only measured export of Cd to the shoot in their experiments, it can be assumed that the same holds for translocation from shoot to grain. At least this could be inferred from our results, which globally show a strong correlation between Cd in rice shoot and grain. The dissociation between soil Cd and grain Cd, and the negative correlation between soil pH and grain Cd, observed in the current study are also in agreement with a survey of Chinese paddy soils within a karst region, which are characterised by limestone. In this karst region study it was found that even where soil Cd concentrations were high, grain concentrations were low (Wen et al. 2020). This was attributed to low bioavailability of Cd due to the high soil pH in these limestone areas (Wen et al. 2020). Lime addition to soils reduces grain Cd, further establishing a link between uptake of this element and pH (Chen et al. 2018b). Yang et al. (2022), in another Chinese study, found that soil pH was the strongest predictor of grain Cd, but only weakly so.

Marin et al. (1993) investigated rice assimilation of As in plants grown in soil solution as well as in soils with natural As (not dosed) and soils dosed with monomethylarsonic acid (MMAA), a herbicide relevant to rice culture in the USA. For rice plants grown in soil suspensions, this study showed that a decrease in pH led to higher dissolved As concentrations. For rice grown in soil microcosms, it showed a considerable drop in movement of As into porewaters from Eh −200 to 400 mV, at all pH values tested, both for dosed and non-dosed soils. Hence bioavailability of As to the rice plant was shown to increase with decreasing soil pH and decreasing soil Eh, with higher levels of assimilation observed at the lower pH of 5.5 as opposed to 6.5 and 7.7. However, in the present study grain As showed a strong positive correlation with soil As, but not with Eh or pH. Soil redox appears to be more relevant to accumulation of As in rice grain when looking at different soil management strategies within one soil or similar soils, as in Arao et al. (2009).

Porewater As and porewater Cd showed no significant correlation with grain As and grain Cd, respectively. This is counterintuitive, as elements have to dissolve from soil into porewater before they can be taken up into the plant. However, porewater concentrations of As species are dynamic and evolve over time (Arao et al. 2009), and plants may integrate exposure over their growth cycle and export to grain during grain-fill (Carey et al. 2011). Furthermore, As species activity in porewater relates to the buffering capacity of the solid phase, termed resupply, and this is known to vary greatly for iAs and DMA in soils (Williams et al. 2011). This current study suggests that globally, the bulk of soil As in each soil is available for assimilation into rice grain, but that this is not the case for soil Cd. Instead, most of the soil Cd must be relatively inaccessible to the rice plant, and soils with highly variable levels of soil Cd may have similar amounts of root bioaccessible soil Cd.

An interdependence with respect to translocation of some elements was found here. Grain As showed positive correlation with grain Mn. The reverse was true for grain Cd, which instead showed positive correlation with grain Rb. In each case, the abundance of a distinct set of soil microbes was also correlated. Enhanced Rb uptake in plants is associated with acidic soils where K, of which Rb acts as an analogue, is usually depleted through leaching (Drobner and Tyler 1998). The interdependence of elements with respect to grain concentrations was also illustrated by Chen et al. (2020a, b), where a mutant (OsCADT1) that accumulated more Cd in root, but not in grain or shoot, was elevated in grain S and Se. Other examples are the OsNRAMP1 gene, which is jointly responsible for Cd and Mn accumulation and translocation to grain (Chang et al. 2020b), and the OsNRAMP5 gene, which is another influx transporter for Cd and Mn (Chang et al. 2020a). Cd is also intimately linked to soil S, with S complexation immobilizing Cd under reducing conditions (Hashimoto et al. 2016). Grain Se was positively correlated with grain Cd in a wider global survey of polished rice, while grain Mn and S were negatively correlated (Meharg et al. 2022), again suggesting interplay between grain Cd and the grain assimilation of these other elements. This is interesting, as application of Se is seen as a potential route to reduce Cd accumulation in rice root, straw and grain. This was verified in pot experiments, where a significant decrease of Cd in rice root, straw and grain was observed in response to application of Se to soil in the form of selenite or selenate (Huang et al. 2018).

Distinct fungal, bacterial and archaeal communities, indicative of the predominant soil properties in each global region, were observed, and these correlated with high or low As in rice grain and/or high or low Cd in rice grain. Vietnamese and Southern European soils exhibited higher abundance of S-oxidising bacteria, sulphate reducing bacteria and methanogenic archaea. The higher abundance of these organisms in these soils was strongly correlated with higher soil As, as well as with As in rice straw (iAs, DMA, MMA, TMAO) and in rice grain (iAs, DMA) and with Mn in rice grain. This is interesting, as Fe-Mn (hydr)oxides form Fe plaque around roots, which sequesters As in paddy soil (Tian et al. 2023). Soil microbes facilitate transformation of As (iAs III, iAs V, DMA, MMA) and the formation and dissolution of Fe plaque, which sequesters As, and consequently drive the availability not only of As (Jia et al. 2014), but also of Mn to rice plants. There is also a link between the As and S biogeochemical cycles, and the role of S-oxidising bacteria in release of arsenate is well established (Fisher et al. 2008). In the presence of sulphides, S-oxidising bacteria have been shown to transform arsenite to arsenate, and can also oxidise thioarsenates to produce arsenate and sulphate (Fisher et al. 2008) as well as grow on sulphide released from FeS under reducing conditions (Zecchin et al. 2019). Sulphate reducing bacteria, on the other hand, promote the formation of thioarsenates (Zecchin et al. 2019; Fisher et al. 2008) and have also been shown to drive methylation of As (Chen et al. 2019a, b), while methanogenic archaea have been implicated in demethylation of As (Chen et al. 2019a, b). An increase in the abundance of soil fungi and some aerobic bacteria was positively correlated with grain Cd, as well as with soil Eh, soil Se, grain Rb and grain Zn. These organisms showed lowest abundance in Southern European soils, where grain Cd was lowest.
Soil organic matter can decrease availability of Cd in soil (Khan et al. 2017). Microbial decomposition of organic matter under aerobic conditions is around 10 times faster than under anaerobic conditions (Kristensen 1995). This means that organic matter turnover is lower in soils with lower Eh, and rice paddy soils therefore accumulate more carbon than non-flooded arable cropping systems and orchards (Wu 2010). In the current study, the higher levels of Cd observed in rice grain from soils with higher Eh and lower pH could therefore be due to a higher rate of microbially mediated release of Cd bound to organic matter. Higher levels of soil Se have previously been linked to lower levels of leaching (Kang et al. 1991) and this implies that higher levels of soil Se would be expected in soils with higher Eh, which would explain the observed shift in soil microbes. The positive correlation of these organisms with grain Rb is most likely due to the reported increase in Rb uptake by plants in acidic soils (Drobner and Tyler 1998). The positive correlation of these organisms with grain Zn is most likely due to the fact that Cd is translocated into the rice plant via Zn transporters, as these have high affinity for both Zn and Cd (Zhang et al. 2022). The observed increase in abundance of a fraction of aerobic microbes could therefore be highly relevant to the biogeochemical cycling of Cd in paddy soil. It should be noted that the heatmaps also show within-region differences, indicative of sub-regional-specific microbial communities driven by sub-regional differences in soil geochemistry in Southern Europe (Signes-Pastor et al. 2016), Vietnam (Tran et al. 2020) and Sri Lanka (Perera et al. 2022). However, confirmation of any subregion effects on microbial community structure is beyond this study, as this would require higher numbers of replicates within each subregion. It is likely that subregion effects are linked to altitude and differences in management practices, but this is subject to further investigation.

However, it must be recognised, for all studies that try to relate microbial taxonomic diversity to grain Cd and As assimilation, that relationships between microbial diversity and plant and soil As variables, or any other variable, may be autocorrelative. As an example pertinent to paddy soils, there were considerable shifts in 16S rRNA diversity within a single soil across a redox gradient (Cai et al. 2022). Redox was also implicated in the decrease in arsM DNA copy number in rice rhizospheres, due to radial oxygen loss from roots, as compared to bulk soil, and was also associated with diversity shifts when planted and unplanted treatments were compared (Afroz et al. 2019). Hence the trends observed in the current study are very interesting and provide microbial targets for the study of paddy biogeochemical cycling, but are not conclusive.

In conclusion, none of the soil chemical or microbial predictors measured here has, by itself, a definitive role in grain As and grain Cd. Rather, there are associations of soil properties that can be used to describe a soil that may result in lower or higher grain As and/or Cd. High grain Cd and low As concentrations are associated with soils high in aluminium, iron and silicon, i.e., clay rich soils. Also, high grain Cd is associated with higher soil Eh and lower soil pH. Soil As was the strongest predictor of grain As, but soil Cd was a poor predictor of grain Cd. The microbial community is itself shaped by the overall characteristics of the soil, and the observed correlation between abundance of certain microbial groups and grain As (iAs, DMA) and grain Cd may simply be autocorrelative. However, within the context of grain As and grain Cd and comparisons of these on a global scale, the role of S-oxidising bacteria, sulphate reducing bacteria and methanogenic archaea in As biogeochemical cycles, and of fungi and aerobic bacteria in Cd biogeochemical cycles, in paddy soil is worth further investigation.