Recognition of trace element hyperaccumulation based on empirical datasets derived from XRF scanning of herbarium specimens

Hyperaccumulation is generally defined as plants exhibiting concentrations of metal(loid)s in their shoots at least an order of magnitude higher than those found in ‘normal’ plants, but this notional threshold appears to have limited statistical underpinning. The advent of massive (handheld) X-ray fluorescence (XRF) datasets of herbarium specimens makes it increasingly important to define accurate threshold criteria for recognising hyperaccumulation of metal(loid)s such as manganese, cobalt, nickel, zinc, arsenic, selenium, and rare earth elements. We use an extensive dataset of XRF elemental data from ~ 27,000 herbarium specimens, together with Inductively Coupled Plasma Atomic Emission Spectroscopy (ICP-AES) elemental data from 1710 specimens, to evaluate threshold values for hyperaccumulator plants. Because the dataset contains subpopulations, the distribution of elemental data was treated as a Gaussian mixture model, and the subpopulations were clustered into ‘normal’ and ‘hyperaccumulator’ classes. The historical hyperaccumulator thresholds were compared with the concentration at which the cumulative distribution function of the Gaussian model of the hyperaccumulator class reaches a probability of 99%. Our analysis of XRF data indicates that the historical thresholds for manganese (10,000 µg g−1), cobalt (300 µg g−1), nickel (1000 µg g−1), zinc (3000 µg g−1), arsenic (1000 µg g−1), and selenium (100 µg g−1) are substantially higher than the concentrations required to have a 99% probability of falling in the hyperaccumulator class: 1210 µg g−1 for manganese, 32 µg g−1 for cobalt, 280 µg g−1 for nickel, 181 µg g−1 for zinc, 8 µg g−1 for arsenic, and 10 µg g−1 for selenium. All of the historical hyperaccumulation thresholds exceed the mean concentration of the hyperaccumulator populations and fall in the far-right tail of the models. 
The historical thresholds for manganese, cobalt, nickel, zinc, arsenic, and selenium are considerably higher than necessary to identify hyperaccumulators. Our findings provide a more precise understanding of the statistical underpinnings of the phenomenon of hyperaccumulation, which will ensure consistency in reporting on these plants.


Introduction
Plant communities in the landscape directly relate to underlying environmental conditions, such as the chemical features of the soil (Küchler 1988). Plants take up not only essential nutrients from the soil but also potentially toxic non-essential elements (Page et al. 2006). The concentrations of these elements vary significantly, and high concentrations of non-essential elements require plant species to evolve specific adaptations to survive, or even thrive, in toxic soils such as metalliferous soils, including those derived from ultramafic bedrock (Baker and Brooks 1989; Baker et al. 2010). Plants exhibit three distinct response types in shoot metal(loid) accumulation relative to soil metal(loid) bioavailability, namely excluders, bioindicators, and (hyper)accumulators (Pollard 2002; Krämer 2010; van der Ent et al. 2013). The most common adaptation to toxic soils is the excluder response, in which plants restrict the uptake of metal(loid)s; once their physiological mechanisms can no longer cope, unregulated uptake leads to the death of the plant (Baker 1981). The less common adaptation is the indicator type, in which, compared with excluders, uptake is limited or controlled until phytotoxicity occurs. The last and rarest type of adaptation is (hyper)accumulation, a response in which metal(loid)s are actively taken up and concentrated in the above-ground shoot without any symptoms of toxicity (Baker 1981, 1988).
Hyperaccumulators are of special interest because these plants can be used to remediate contaminated soils (Chaney et al. 1997). In the 1970s, soil contamination became an important problem across Europe and the USA (Chaney et al. 2018), and hyperaccumulators were developed as a tool to remediate soils through phytoextraction (Chaney et al. 1997). Since then, hyperaccumulators, especially those for nickel (Ni), have been used as "metal crops" in phytomining, and field trials have been conducted in Albania (van der Ent et al. 2016) and Malaysia (Nkrumah et al. 2019). Hyperaccumulators can also be used to indicate metal-rich soils, as illustrated by a recent study in which the locations of herbarium specimens with high yttrium (Y) concentrations correlated with known geological occurrences of anomalous Y (van der Ent et al. 2023). Given the potential uses of hyperaccumulator plants, extensive efforts, ranging from field surveys to systematic X-ray fluorescence (XRF) scanning of herbarium specimens, have been made to discover more species suitable for phytoextraction and phytomining (Chaney et al. 2018).
The term 'hyperaccumulator' was first coined for the tree Pycnandra acuminata from New Caledonia, which has extraordinarily high Ni in its latex (Jaffré et al. 1976). The term was then used to define Ni hyperaccumulators (1000 µg g−1 dry weight) and subsequently extended to other elements, with threshold values set at 10,000 µg g−1 for manganese (Mn); 300 µg g−1 for cobalt (Co) and copper (Cu); 3000 µg g−1 for zinc (Zn); 1000 µg g−1 for arsenic (As) and for the sum of rare earth elements (REEs); and > 100 µg g−1 for selenium (Se), cadmium (Cd), and thallium (Tl), all on a dry-weight basis (Baker and Brooks 1989; Reeves 2003; van der Ent et al. 2013). Tentative hyperaccumulation thresholds have also been proposed for other elements, such as barium (Ba) at 1000 µg g−1, strontium (Sr) at 3000 µg g−1, tin (Sn) at 300 µg g−1, boron (B) at 3000 µg g−1, and antimony (Sb) at 1000 µg g−1 dry weight (van der Ent et al. 2021). Typically, these values are not derived from statistical analysis but are proposed as approximate values 2-3 orders of magnitude higher than in leaves from "normal" plants on "normal" soils, or at least one order of magnitude greater than in leaves from other plants growing on the same type of metalliferous soil (van der Ent et al. 2013). Given the difficulty of defining a truly "normal" specimen, for either plants or soils, a more rigorous assessment of these thresholds is both timely and needed.
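For screening large datasets, the historical and tentative thresholds listed above can be held in a simple lookup table. The following is a minimal Python sketch under stated assumptions: the names `HISTORICAL_THRESHOLDS`, `TENTATIVE_THRESHOLDS`, and `exceeds_threshold` are hypothetical, concentrations are assumed to be in µg g−1 dry weight, and a strict "greater than" comparison is assumed for every element (the text gives "> 100" explicitly only for Se, Cd, and Tl).

```python
# Hypothetical lookup tables; values (µg/g dry weight) are the thresholds listed in the text.
HISTORICAL_THRESHOLDS = {
    "Mn": 10_000, "Co": 300, "Cu": 300, "Ni": 1000, "Zn": 3000,
    "As": 1000, "REE": 1000,  # REE = sum of rare earth elements
    "Se": 100, "Cd": 100, "Tl": 100,
}
TENTATIVE_THRESHOLDS = {"Ba": 1000, "Sr": 3000, "Sn": 300, "B": 3000, "Sb": 1000}


def exceeds_threshold(element: str, conc_ug_g: float, include_tentative: bool = False) -> bool:
    """Return True if a dry-weight concentration exceeds the notional threshold.

    Assumes a strict '>' comparison for all elements; raises KeyError for
    elements without a listed threshold.
    """
    table = dict(HISTORICAL_THRESHOLDS)
    if include_tentative:
        table.update(TENTATIVE_THRESHOLDS)
    return conc_ug_g > table[element]


print(exceeds_threshold("Ni", 1500))  # True
print(exceeds_threshold("Zn", 2000))  # False
```

Keeping the tentative values in a separate table makes it explicit when a screen relies on thresholds that are only proposed.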
Historically, such statistical analyses have been difficult to perform because of the rarity of specimens displaying hyperaccumulation, which has led to different interpretations of the phenomenon and to overly strict adherence to precise thresholds for recognising hyperaccumulation (van der Ent et al. 2021). For example, the criterion for Ni hyperaccumulation was historically set at 10- to 1000-fold the average Ni concentration in plant leaves, on the basis of a bimodal distribution observed in previous studies (Brooks and Radford 1978). However, such bimodal distributions have been found in phylogenetically restricted or edaphically limited datasets, such as the genus Alyssum in the Brassicaceae or plants from ultramafic soils, but a lognormal distribution has also been observed in a tropical ultramafic dataset (Brooks and Radford 1978; Reeves 1992; Pollard et al. 2002; Reeves et al. 2017). A study that analysed a wide range of elements in plants from Sabah (Malaysia) found that Ni concentrations in plants occurring on ultramafic soils follow a distinct bimodal frequency distribution, with the two groups centred at 250 and 850 µg g−1 Ni (van der Ent et al. 2020). This bimodality suggests two groups of plants, with the group centred at 850 µg g−1 Ni having evolved adaptations in its uptake and translocation pathways (Merlot et al. 2014). Specimens in the first group, centred at 250 µg g−1 Ni, are 'normal' plants (i.e., non-hyperaccumulators), in which Ni concentrations are typically < 100 µg g−1 (Dalcorso et al. 2014). Meanwhile, hyperaccumulators fall within the right tail of the second group, centred at 850 µg g−1 Ni (van der Ent et al. 2020).
This distinction has broader implications. A bimodal frequency distribution suggests a qualitatively and quantitatively different group of plants with a distinct physiology (Pollard et al. 2002), whereas the tail of a continuous distribution would suggest that hyperaccumulation is an extension of normal physiological processes. By comparison, essential elements typically have log-normal frequency distributions, with concentrations of micronutrients (e.g., Cu, Ni, Zn) controlled within a narrow optimum range, although Mn has a comparatively wide range in many plants (van der Ent et al. 2020).
The use of handheld XRF instrumentation to obtain elemental concentration data from herbarium specimens, combined with the development of a universal data analysis pipeline for herbarium XRF data (Purwadi et al. 2022), is enabling the acquisition and processing of enormous datasets, which are increasingly large enough to address fundamental questions about the incidence of trace element hyperaccumulation (McCartha et al. 2019; van der Ent et al. 2019a; Do et al. 2020; Gei et al. 2020; Abubakari et al. 2021a, b, c; Belloeil et al. 2021). Standardised hyperaccumulation threshold values are essential to avoid ambiguity in scientific reporting and to identify genuine hyperaccumulator plants (van der Ent et al. 2013).
Here, we use a large dataset of XRF elemental data from ~ 27,000 herbarium specimens, complemented with a smaller dataset of Inductively Coupled Plasma Atomic Emission Spectroscopy (ICP-AES) data, to re-evaluate threshold values for recognising hyperaccumulation.

Materials and methods
Handheld XRF data acquired during previous studies, together with unpublished data, were combined to obtain a large dataset for statistical analysis (van der Ent et al. 2019b; Gei et al. 2020; Abubakari et al. 2021a, b, c). This yielded a total of 26,942 measurements: 5981 measurements covering five families, 21 genera, and 1245 species from Malaysia (van der Ent et al. 2019b); 10,062 measurements covering 96 families, 281 genera, and 1484 species from New Caledonia (Gei et al. 2020); 2779 measurements covering seven families, 449 genera, and 559 species from Papua New Guinea (Do et al. 2020); 6970 measurements covering seven families, 73 genera, and 266 species from Australia (Abubakari et al. 2021a, b, c); and 1150 measurements covering 38 families, 135 genera, and 251 species from Australia (unpublished). All data were acquired with the same handheld XRF instrument (Thermo Niton XL3t 950) and measurement protocol (30 s measurement time per specimen in 'Soils Mode', on top of the same titanium backing plate).
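The five contributing batches can be tallied to confirm that they sum to the stated total of 26,942 measurements. A minimal Python check using only the counts given above (the `batches` list is an illustrative structure, not part of the original processing pipeline):

```python
# Measurement counts per source, as listed in the text.
batches = [
    ("Malaysia (van der Ent et al. 2019b)", 5981),
    ("New Caledonia (Gei et al. 2020)", 10_062),
    ("Papua New Guinea (Do et al. 2020)", 2779),
    ("Australia (Abubakari et al. 2021a, b, c)", 6970),
    ("Australia (unpublished)", 1150),
]

total = sum(n for _, n in batches)
for source, n in batches:
    # Share of each batch in the combined XRF dataset.
    print(f"{source}: {n} ({100 * n / total:.1f}%)")
print("Total:", total)  # Total: 26942
```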
The data were processed using a universal pipeline in the GeoPIXE analysis package (CSIRO), which uses the Dynamic Analysis (DA) algorithm developed for nuclear microprobe techniques and synchrotron-based XRF (Ryan et al. 1990, 2005, 2015). The algorithm deconvolutes a spectrum into fluorescence components for each element through an iterative process that combines non-linear least-squares and linear fitting (Ryan et al. 2015). The fundamental parameters model developed for the instrument has been described elsewhere (Purwadi et al. 2022). The density and thickness of the herbarium specimens were predicted and coupled with instrument-related parameters, and the resulting spectra were processed in GeoPIXE (Purwadi et al. 2023), giving more accurate results than the empirical calibrations used in previous studies (van der Ent et al. 2019b; Gei et al. 2020; Abubakari et al. 2021a, b, c). An additional dataset of 1710 leaves measured by ICP-AES (van der Ent et al. 2020) was subjected to the same mixture-model fitting procedure for comparison.
In general, plants can be classified into normal and hyperaccumulator classes (Baker 1981; van der Ent et al. 2013). Two Gaussian components were therefore fitted to each ICP-AES and XRF concentration dataset to represent these assumed classes. To address the effect of the limit of detection (LOD) in both datasets, samples below the LOD for each element were assigned new values produced by regression on order statistics (ROS) (Cohn and Helsel 1988; Helsel 1990; Harter 2006) using the R package 'NADA'. ROS requires two input columns: one for the elemental concentration, and one holding 'true' if the concentration is above the LOD or 'false' if it is below the LOD. Rows below the LOD were first set to a concentration of 1, and ROS then estimated replacement values for them from a linear regression of the above-LOD ('true') concentrations on the normal quantiles of their plotting positions. The resulting data were log-transformed, and a two-component Gaussian mixture model was fitted to each elemental distribution using the R package 'mixR' (Yu 2022). Only Mn, Co, Ni, and Zn concentrations were available from the ICP-AES dataset, whereas the XRF data also included Y, Se, and As.
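The regression-on-order-statistics step can be sketched in a few lines. The Python sketch below is a simplified ROS for a single detection limit and is not a port of the NADA package's routine: it assigns Weibull-type plotting positions (non-detects occupy the lowest probabilities), regresses the log of the detected concentrations on their standard normal quantiles, and imputes non-detects from the fitted line. The function name `simple_ros`, the log transform inside the regression, and the synthetic lognormal data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm


def simple_ros(values, detected):
    """Replace non-detects by regression on order statistics (single LOD).

    values   : concentrations; entries for non-detects are placeholders (e.g. 1).
    detected : boolean array, True where the concentration is above the LOD.
    Returns a copy of `values` with non-detects replaced by modelled estimates.
    """
    values = np.asarray(values, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    n = values.size
    n_cens = int((~detected).sum())
    p_cens = n_cens / n  # fraction of the sample below the LOD

    # Plotting positions: non-detects occupy (0, p_cens), detects (p_cens, 1).
    det_sorted = np.sort(values[detected])
    n_det = det_sorted.size
    pp_det = p_cens + (1 - p_cens) * np.arange(1, n_det + 1) / (n_det + 1)
    pp_cens = p_cens * np.arange(1, n_cens + 1) / (n_cens + 1)

    # Linear regression of log concentration on the standard normal quantile.
    slope, intercept = np.polyfit(norm.ppf(pp_det), np.log(det_sorted), 1)

    out = values.copy()
    # Non-detects receive the modelled values; their order among themselves is arbitrary.
    out[~detected] = np.exp(intercept + slope * norm.ppf(pp_cens))
    return out


# Synthetic lognormal data for illustration only (NOT the study data).
rng = np.random.default_rng(1)
true = rng.lognormal(mean=3.0, sigma=1.0, size=2000)
lod = 30.0
detected = true >= lod
observed = np.where(detected, true, 1.0)  # non-detects set to 1, as in the text
filled = simple_ros(observed, detected)
print(f"{(~detected).sum()} non-detects imputed; mean log = {np.log(filled).mean():.2f}")
```

Because the imputed values come from the fitted distribution rather than a constant, the lower part of the histogram is filled in smoothly instead of piling up at a single substituted value.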

Results
In total, ~ 27,000 XRF data points (i.e., herbarium specimens) and ~ 1700 ICP-AES data points were collated and processed in this study. The handheld XRF instrument has a detection limit of 50-100 µg g−1 for most transition elements (Purwadi et al. 2022). Approximately 50% of all XRF-measured specimens had at least one element above the detection limit (14,425 specimens for Mn, 426 for Co, 2258 for Ni, 3596 for Zn, 77 for As, 81 for Se, and 105 for yttrium (Y); Figs. 1 and 2). By contrast, the previous ICP-AES measurements found at least one of Mn, Co, Ni, and Zn above the detection limit in > 80% of specimens. Because many specimens had metal(loid) concentrations below the detection limits, regression on order statistics was used to estimate values for these samples. F-statistics indicated that the regressions were significant (Figs. S1-S3). Two Gaussian models, representing normal (green) and hyperaccumulator (red) plants, were fitted to the ICP-AES Mn histogram (Fig. 1). For a specimen to be classified as an Mn hyperaccumulator with at least 99% probability, the XRF and ICP-AES datasets indicate minimum concentrations of 1210 µg g−1 and 2850 µg g−1, respectively. These values are less than a third of the historical Mn hyperaccumulator threshold of 10,000 µg g−1 (black dashed vertical lines), which falls on the far-right side of the hyperaccumulator (red) Gaussian model and histogram for both datasets. The historical threshold for Co hyperaccumulators, 300 µg g−1, lies in the far right of the hyperaccumulator Gaussian models and is 9-fold and 60-fold higher than the concentration required for a 99% probability of being a Co hyperaccumulator based on the XRF and ICP-AES datasets, respectively. The historical Ni threshold of 1000 µg g−1 lies below the mean of the hyperaccumulator Gaussian model in the ICP-AES dataset (Fig. 2). Nevertheless, it still exceeds the concentration corresponding to a 99% probability of being a Ni hyperaccumulator in both datasets. The historical Zn hyperaccumulator threshold of 3000 µg g−1 is at least 16 times greater than the Zn concentration required for a 99% probability of a sample being a Zn hyperaccumulator. The XRF dataset also provided As, Se, and Y concentrations (Fig. 3). As for the other elements, the historical hyperaccumulator thresholds of 1000 µg g−1 for As and 100 µg g−1 for Se are 125 and 10 times greater, respectively, than the concentration corresponding to a 99% probability of being a hyperaccumulator. Yttrium belongs to the REE group, and the historical REE hyperaccumulator threshold is a total REE concentration of 1000 µg g−1. This threshold is more than 90-fold the concentration required for a 99% probability of being a hyperaccumulator based on Y alone (with Y typically making up 10-20% of total REEs).
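The fold differences quoted above follow from dividing each historical threshold by the corresponding XRF-derived 99% concentration. A short Python check using only values reported in this study (the Y entry sets the total-REE threshold against a Y-only value, as the text does; the 11 µg g−1 Y value is the one reported in the Discussion):

```python
# Historical thresholds and XRF-derived 99%-probability concentrations (µg/g), from the text.
historical = {"Mn": 10_000, "Co": 300, "Ni": 1000, "Zn": 3000,
              "As": 1000, "Se": 100, "Y (REE total)": 1000}
xrf_99 = {"Mn": 1210, "Co": 32, "Ni": 280, "Zn": 181,
          "As": 8, "Se": 10, "Y (REE total)": 11}

# Fold difference between the historical threshold and the 99% concentration.
ratios = {el: historical[el] / xrf_99[el] for el in historical}
for el, r in ratios.items():
    print(f"{el}: historical threshold is {r:.1f}x the 99% concentration")
```

On the XRF values alone the ratios run from about 3.6 (Ni) to 125 (As); any lower ratio would have to come from a value not listed here, such as an ICP-AES estimate.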

Discussion
Hyperaccumulators have historically been classified by longstanding empirical thresholds (Baker and Brooks 1989). These thresholds were established on the basis of elemental concentrations several orders of magnitude higher than in other plants growing on the same soils (Reeves 2017; van der Ent et al. 2021). We modelled this as a two-population model, tested against a large dataset with a high fraction of hyperaccumulators, in order to establish a statistical basis for these thresholds. By fitting two Gaussian models, representing normal and hyperaccumulator plants, to the Mn, Co, Ni, Zn, As, Se, and Y concentrations, we assessed whether the historical hyperaccumulator thresholds are sufficient to distinguish hyperaccumulators from normal plants on the basis of elemental concentrations. These thresholds are clearly an abstraction imposed on a complex biological process, and the true distributions will vary with a host of underlying factors. A simple two-component model was chosen to avoid over-fitting and to maximise consistency with the longstanding threshold model in the literature. The results show that none of the historical thresholds falls within the Gaussian models of normal plants. In general, all historical hyperaccumulator thresholds exceed the mean concentration of the hyperaccumulator Gaussian models and are 1.5- to 125-fold the minimum concentration required to fall under the hyperaccumulator Gaussian models with a 99% probability. The frequency distributions of both Mn and Zn have long tails, possibly because both elements are essential to plant growth (Dalcorso et al. 2014). Manganese is more abundant in soil than Zn (600 µg g−1 vs. 71 µg g−1) (Taylor and McLennan; Reeves et al. 2017). Both Mn and Zn play crucial roles in the activation of many enzymes (Broadley et al. 2007; Schmidt and Husted 2019). Manganese is a constituent of the Mn cluster that catalyses water oxidation to release oxygen during photosynthesis (Schmidt and Husted 2019), whilst Zn is found in all six enzyme classes (hydrolases, oxidoreductases, lyases, transferases, ligases, and isomerases) (Gupta et al. 2016).
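The two-component model can be illustrated without R. The following Python sketch fits a two-component Gaussian mixture to log10-transformed concentrations by expectation-maximisation (a plain NumPy substitute for the 'mixR' package, not its API) and then scans for the lowest value at which the posterior probability of belonging to the hyperaccumulator component reaches 99%. Interpreting "99% probability" as posterior class membership, the use of log10, the percentile-based initialisation, the helper names `fit_two_gaussians` and `membership_threshold`, and the synthetic data are all assumptions.

```python
import numpy as np


def normal_pdf(x, mu, sd):
    """Gaussian probability density, vectorised over NumPy arrays."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_two_gaussians(y, n_iter=1000, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture to y by expectation-maximisation.

    Returns (weights, means, sds) sorted so that index 0 is the lower
    ('normal') component and index 1 the upper ('hyperaccumulator') component.
    """
    y = np.asarray(y, dtype=float)
    # Heuristic start: components at the 5th and 95th percentiles, pooled SD.
    mu = np.percentile(y, [5, 95])
    sd = np.full(2, y.std())
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        dens = w * normal_pdf(y[:, None], mu, sd)  # shape (n, 2)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: re-estimate weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / y.size
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Stop when the log-likelihood no longer improves.
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return w[order], mu[order], sd[order]


def membership_threshold(w, mu, sd, p=0.99, n_grid=20_001):
    """Lowest grid value, scanning up from the lower mean, at which the
    posterior probability of the upper component reaches p (None if never)."""
    grid = np.linspace(mu[0], mu[1] + 4 * sd[1], n_grid)
    dens = w * normal_pdf(grid[:, None], mu, sd)
    posterior = dens[:, 1] / dens.sum(axis=1)
    hits = np.nonzero(posterior >= p)[0]
    return grid[hits[0]] if hits.size else None


# Synthetic log10 concentrations for illustration only (NOT the study data).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(1.5, 0.4, 9000), rng.normal(3.0, 0.4, 1000)])

w, mu, sd = fit_two_gaussians(y)
t = membership_threshold(w, mu, sd)
print(f"weights={w.round(3)}, means={mu.round(2)}, sds={sd.round(2)}")
print(f"99% membership threshold ~ {10 ** t:.0f} ug/g")
```

The scan starts at the lower component's mean because, when the two components have unequal spreads, the posterior can rise again in the far-left tail; restricting the search avoids reporting such a spurious low threshold.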
The histograms of Co and Ni for both the XRF and ICP-AES datasets showed bimodal distributions. These distinct features of Co and Ni hyperaccumulator plants may be attributed to the availability of the two elements in the soil, as most Co and Ni hyperaccumulator species occur on ultramafic soils, which are enriched in both elements (van der Ent et al. 2015a, 2016; Echevarria 2018). The bimodal pattern may be more distinct in the ICP-AES dataset than in the XRF dataset because the detection limit of the XRF instrument is substantially higher than that of the ICP-AES method, so the ICP-AES results include more specimens at intermediate concentrations. The histogram of ICP-AES Ni concentrations shows an ideal fit of the Gaussian models, with one peak corresponding to normal plants and the other to hyperaccumulator plants.
Compared with Mn, Co, Ni, and Zn, fewer samples are above the detection limits for As, Se, and Y, which is expected since these elements are present at much lower concentrations in the environment. The minimum concentrations required to be clustered with the As, Se, and Y hyperaccumulator populations at a 99% probability are also considerably lower, at 8 µg g−1, 10 µg g−1, and 11 µg g−1, respectively. Any sample with As, Se, or Y detectable by the XRF instrument is already extraordinary, given the relatively high detection limits of the instrument used in this study.
The XRF instrumentation used in this study is most sensitive to elements with Z = 25-35 (Mn-Br), which include hyperaccumulated transition metals such as Ni and Zn. The limit of detection for these elements is in the range of 50-100 µg g−1 and increases sharply for elements both below (Z ≤ 13, e.g., Al) and above (Z ≥ 40, e.g., Zr) this range. Copper is highly regulated in most plants and is typically present at around 10 µg g−1 in leaves, which is below the instrumental LOD (94 µg g−1), so only cases of exceptional accumulation would be present in this dataset. Moreover, Cu (hyper)accumulation in plants is exceedingly rare outside the copper hills of Central Africa (Lange et al. 2017). Similarly, Ni is typically < 10 µg g−1 in plants (albeit somewhat higher, 20-50 µg g−1, in plants growing on ultramafic soils), and consequently only 8% of measurements (2258 out of 26,942) are above the LOD of 97 µg g−1. This compares with 14,425 (54%) above the LOD for Mn and 3596 (13%) for Zn. An important caveat of the XRF data is therefore that the dataset is strongly affected by the LODs of the method. Because so many specimens are below the LOD, most XRF results cannot be used directly, which makes it difficult to model the distribution of elemental concentrations below the LOD. Here, the ICP-AES dataset affords an opportunity to examine this effect, owing to the greater sensitivity of that method.
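The proportions of XRF measurements above the LOD can be recomputed from the per-element counts reported in the Results. A minimal Python check (the dictionary names are illustrative; all counts are from the text):

```python
# Total XRF measurements and per-element counts above the LOD, from the text.
n_total = 26_942
above_lod = {"Mn": 14_425, "Co": 426, "Ni": 2258, "Zn": 3596, "As": 77, "Se": 81, "Y": 105}

# Percentage of all specimens with a concentration above the LOD, per element.
shares = {el: 100 * n / n_total for el, n in above_lod.items()}
for el, pct in shares.items():
    print(f"{el}: {above_lod[el]} of {n_total} ({pct:.1f}%) above LOD")
```

On these counts Mn is about 54%, Zn about 13%, and Ni about 8%, while As, Se, and Y are each below 0.5%, which illustrates how strongly the LOD limits the usable XRF data.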
In conclusion, the results of this study show that the historical hyperaccumulator thresholds lie in the right tail of the hyperaccumulator Gaussian models. The historical thresholds are therefore acceptably conservative. The use of notional threshold values remains a crude way to detect hyperaccumulation. It is, however, practical and, when used sensibly, can guide the identification of extreme physiological behaviour in the absence of a physiological definition of hyperaccumulation (van der Ent et al. 2015b). These updated values, now underpinned by rigorous statistical analysis of a large population of hyperaccumulators, will help to distinguish genuine hyperaccumulators and ensure consistency in reporting on these plants.

Fig. 1
Fig. 1 The histograms of manganese (Mn) and cobalt (Co) concentrations. A histogram comparison between regression on order statistics (ROS) and constant-value replacement is shown in Fig. S1, and Tables S1-S2 provide a statistical summary of the original and ROS concentrations. A Gaussian mixture model is fitted to the histograms, resulting in two models

Fig. 2
Fig. 2 The histograms of nickel and zinc concentrations. A histogram comparison between regression on order statistics (ROS) and constant-value replacement is shown in Fig. S2, and Tables S1-S2 provide a statistical summary of the original and ROS concentrations. A Gaussian mixture model is fitted to the histograms, resulting in two models representing the

Fig. 3
Fig. 3 The histograms of arsenic, selenium, and yttrium concentrations. A histogram comparison between regression on order statistics (ROS) and constant-value replacement is shown in Fig. S3, and Table S1 provides a statistical summary of the