Chemical characterisation is rough: the impact of topography and measurement parameters on energy-dispersive X-ray spectroscopy in biominerals

Energy-dispersive X-ray spectroscopy (EDX) is a widely available, inexpensive method of characterizing the in-situ elemental composition of samples in Earth and life sciences. Common protocols and textbooks focussing on material sciences address EDX analysis of metallic samples that can be polished perfectly, whereas geoscientists often investigate specimens that have prominent topography and are composed of light, difficult-to-resolve elements. This is further compounded by the scarcity of literature surrounding the methodology of SEM–EDX in the field of palaeontology, leading to common misinterpretations and artefacts during data acquisition. Here, the common errors in elemental composition obtained with EDX arising from surface topography and from parameters subject to user decisions are quantified. As a model, fossil bioapatite (conodonts) and abiotic Durango apatite are used. It is shown that even microscale topography can distort measured composition by up to 34%, whereas topographic features such as tilt with respect to the electron beam lead to differences of up to 85%. Working distance was not the most important parameter affecting the results and led to differences in composition of up to 13%, whereas the choice of standard and its levelling with the sample surface led to inaccuracy reaching 33%. EDX results can also be affected by beam damage, and the effects of acceleration voltage on sample acquisition and resolution are quantified. An estimate is provided of the severity of errors associated with samples which cannot fully satisfy preparation requirements for EDX, such as holotypes, and with user decisions. Using a palaeontological example, recommendations are offered for the best parameters and the relative importance of error sources is assessed.


Introduction
Energy-dispersive X-ray spectroscopy (EDX, EDS, EDXS or XEDS; here referred to as EDX) is a technique which allows a non-destructive analysis of the elemental composition of materials. It is widely used, as it is user-friendly, inexpensive, quick, and found in many research and industrial facilities (Goldstein et al. 2017). When used with adequate standards and parameters, its accuracy and precision can exceed those of wavelength dispersive spectrometry (WDS, Ritchie et al. 2012). A side effect of the availability of EDX is a reduction in the expertise required of users, which has contributed to the method's unjust reputation as "semi-quantitative" (Newbury et al. 1995; Newbury and Ritchie 2014). A common EDX application is the elemental characterisation of biominerals, e.g., in medicine, forensics, archaeology and palaeontology. Biominerals record environmental conditions and the physiology of the organisms that created them, and the elemental characterisation of biomaterials can unlock insights into individual life histories of organisms (Mortensen and Rapp 1998; Parkinson et al. 2005; Shirley et al. 2018). Resolving these records at scales as short as days requires excellent spatial resolution and precision. In the case of limited or rare material, this is accompanied by the need for non-destructive methods. It may also preclude embedding and polishing, forcing the researcher to study an uneven surface (Newbury and Ritchie 2013), as would be the case for holotypes or museum loans (e.g., Gueriau et al. 2016; Murdock and Smith 2021). Are EDX analyses on uneven surfaces faulty to such a degree that they cannot offer any information? EDX detects X-rays emitted from the sample during bombardment by an electron beam to distinguish the elemental composition of the irradiated volume on a potentially submicron level.
When this electron bombardment occurs, characteristic X-rays are emitted from elements when their electrons make transitions from electron shells with higher atomic energy levels (more outward) towards lower energy ones. An EDX detector measures the relative abundance of emitted X-rays as a function of their energy and can provide qualitative (spot) and semi-quantitative (line or map) scans of element distributions. While frequently used, the method is easily susceptible to errors when parameters are not chosen correctly (Goldstein et al. 2017). Factors such as sample preparation (topography), working distance and acceleration voltage make precise quantification harder to achieve.
Here a quantitative evaluation is presented of user-controlled and sample-specific factors affecting EDX resolution and precision in common analytical applications encountered in Earth and life sciences. The theoretical background can be found in reviews by Ritchie et al. (2012) and Newbury and Ritchie (2015). Two materials were used for this investigation: (1) tooth-like remains of conodonts and (2) Durango apatite, commonly used as reference material for fission-track microprobe analysis (Jarosewich et al. 1980), oxygen isotope analyses (Sun et al. 2016) and synchrotron X-ray fluorescence of hard tissues (Anné et al. 2014, 2019) and preserved soft tissues (Manning et al. 2019). The comparison between these abiotic and biotic materials was selected owing to their similar composition, Ca₅(PO₄)₃(F,Cl,OH), as well as their importance and broad use in Earth sciences.
Conodonts are extinct marine vertebrates (for a discussion of their affinity, see Donoghue et al. 2000), mostly known in the geological record from their microscopic dental elements, consisting of bioapatite forming an enamel-like tissue. These elements are widely used to reconstruct oxygen, strontium and recently also calcium (Balter et al. 2019) isotope values of the seawater in which they lived, as well as neodymium isotope values and REE composition to deduce the seafloor conditions between the animal's death and burial (Wright et al. 1984; Trotter and Eggins 2006; Dopieralska et al. 2012; Trotter et al. 2016). For the span of their existence between the late Cambrian and the end of the Triassic Period, conodont elements are common tools for palaeotemperature and palaeoredox reconstructions thanks to their abundance in sedimentary rocks and their high stability (Trotter et al. 2007). Minor elements incorporated into their teeth have recently been exploited to answer palaeobiological questions, such as the animal's environmental and trophic niche, ontogeny and even phylogenetic affinity (Katvala and Henderson 2012; Zhuravlev 2017; Shirley et al. 2018; Terrill et al. 2018). This is particularly important in studying extinct organisms, where the biomineral is the only available record of their growth processes, physiology and environment, but rapid, inexpensive and mostly non-destructive chemical examination of small quantities of biominerals is equally important in archaeology, anthropology, sclerochronology and medicine (e.g., Quintela Souza de Moraes et al. 2015). Conodonts have been chosen in particular for this study due to their high susceptibility to electron beam damage, which results from ejection of material from the surface of the sample, heating, electrostatic charging and ionisation (Egerton et al. 2004). These problems are further compounded by the complexity of preparation (Pérez-Huerta et al. 2012; Shirley et al. 2020), providing a "worst case scenario" for the application of EDX. In a previous study (Shirley et al. 2020), the pitfalls and solutions to common problems of bioapatite preparation for microanalysis were illustrated; here, the effects of user-defined settings on the reproducibility and precision of EDX analyses in biological and abiotic apatite are quantified.

Conodonts
Ozarkodinid conodonts stored in the collections of the GeoZentrum Nordbayern (EJ-14-407) were used for this study. The conodonts were collected from the middle Silurian of Gotland, Sweden, and selected due to their high abundance and low Conodont Alteration Index (CAI ~ 1, Jarochowska et al. 2016). This indicates that they are in the best state of preservation obtainable and therefore suggests that the elemental composition has been minimally affected by burial or hydrothermal diagenesis. Conodont elements consist for the most part of lamellar tissue, which is structurally similar to enamel and consists of layers of francolite crystals in an organic matrix (Pietzner et al. 1968; Purnell and Donoghue 1998; Trotter et al. 2007).

Durango apatite
Durango apatite is a fluorapatite that is found as large, clear, yellow crystals within the open-pit mine at Cerro de Mercado, just north of Durango City, Mexico, that was deposited about 31 Ma (McDowell et al. 2005). This apatite demonstrates major element homogeneity of 1-5% relative standard deviation (RSD) at the 10 µm level (McDowell et al. 2005;Chew et al. 2016). Commercially sourced crystals were used. Durango apatite and conodonts have similar elemental compositions, but there is a clear difference in how they react to an electron beam. Durango apatite is much more stable and homogeneous compared to conodont bioapatite, allowing for multiple analyses to be conducted with little alteration to the crystals. On the other hand, conodonts are highly susceptible to beam damage (e.g., Pérez-Huerta et al. 2012) and have alternating layers with a variance in composition. This study focuses on four elements: calcium, phosphorus, fluorine and chlorine.

Preparation
All material was prepared following the steps outlined by Shirley et al. (2020). Each sample was suspended and oriented in EpoFix epoxy resin. After suspension, samples were ground with a progression of F800 and F1200 grit carborundum (silicon carbide) until the desired plane to be observed was reached. This was followed by polishing with a succession of 6, 3 and 1 µm diamond grit (Struers® DP-Spray P) on a Logitech WG lapping machine with a Logitech® EP1/EP2 polishing cloth and Struers® DP-Lubricant Red. The samples were then chemically polished using Struers® non-drying colloidal silica suspension for a period of 2-3 min. All samples were coated with 7 nm of carbon.

EDX
EDX measurements were conducted on a Tescan Vega XMU tungsten SEM using an Oxford Instruments X-Max 50 mm² silicon drift, solid-state detector and INCA software. In all cases, unless stated otherwise, the following parameters were used: a working distance of 11.5-12.5 mm, an acceleration voltage of 15 kV, tilt of 0°, spot size of 100-250 nm, calibrated against a cobalt standard. In addition, spectra were processed using the software NIST DTSA-II (Lorentz) and the "simulation alien" tool, which allows the user to simulate the acquisition of EDX spectra of a homogeneous material (Ritchie 2011; Goldstein et al. 2017). The first step in this analysis is to define the SEM parameters (detector type, angle of attack, optimal WD, environmental conditions), followed by defining an expected elemental composition. The parameters for this software were taken from the composition of Durango apatite as presented by Young (1969), from which the w% of each element was estimated for entry into NIST DTSA-II. Using a Monte Carlo model, a simulated spectrum was created that could be compared to the acquired spectra for quality assurance (Fig. 1). The application of this method allows the user to test parameters (WD, acceleration voltage, probe current etc.) before a sample is put in the chamber and to record the "best case" settings for their analysis. A typical EDX spectrum is depicted as a plot of X-ray counts vs. energy (in keV). Energy peaks are narrow peaks that represent the energy of characteristic X-rays emitted from individual elements within a material. A single element commonly produces several resolvable peaks that represent transitions between different electron shells (e.g., the spectrum in Fig. 1b displays three peaks attributed to Ca, corresponding to the Lα, Kα and Kβ emissions). The emissions that are detected, as well as the number of counts obtained during acquisition, can impact what elements are identified.
For example, a comparison between simulated data acquired at 5 kV shows a difficulty in identifying the Cl Kα and Ca Kβ so a study at this energy may under-represent these elements or not detect them at all (Fig. 1c). Similarly, increasing energies can cause a "wash out" effect where at 30 kV the F Kα may be mistaken as background noise when compared to the P Kα or Ca Kα (Fig. 1d). The application of simulations allows for multiple tests to be run before inserting the sample into the SEM, saving time and costs in the future.
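The step of estimating the w% of each element from a mineral formula, as needed for entry into a simulation tool, can be sketched as follows. This is a minimal illustration only: it assumes ideal end-member fluorapatite, Ca5(PO4)3F, as a stand-in for the measured Durango composition (which is not reproduced in this text), and the names `ATOMIC_WEIGHT`, `FLUORAPATITE` and `weight_percent` are hypothetical helpers, not part of NIST DTSA-II.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHT = {"Ca": 40.078, "P": 30.974, "O": 15.999, "F": 18.998}

# ASSUMPTION: ideal end-member fluorapatite, Ca5(PO4)3F, used in place of the
# measured Durango apatite composition, which is not given in the text.
FLUORAPATITE = {"Ca": 5, "P": 3, "O": 12, "F": 1}


def weight_percent(formula):
    """Return {element: weight percent} for {element: atoms per formula unit}."""
    masses = {el: n * ATOMIC_WEIGHT[el] for el, n in formula.items()}
    total = sum(masses.values())
    return {el: 100.0 * mass / total for el, mass in masses.items()}


if __name__ == "__main__":
    for element, wt in weight_percent(FLUORAPATITE).items():
        print(f"{element}: {wt:.2f} w%")
```

A real Durango crystal also contains Cl, OH and trace elements, so the values entered for an actual simulation should come from a published analysis rather than from the ideal formula.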

Data sets and statistical evaluation
Data analyses were performed using R Software version 4.0.3 (R Core Team 2020). The non-parametric Kruskal-Wallis test (Kruskal and Wallis 1952) was used to test for differences in concentration between groups and Levene's test (Levene 1960) for the homogeneity of variances. All EDX measurements and the R code used to evaluate them can be found as supplementary files in Shirley and Jarochowska (2021).
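For readers unfamiliar with the Kruskal-Wallis test, the following is a minimal sketch of the two-group case in plain Python. It is not the R code from the supplementary files; the function names and the example measurements are hypothetical, and the p value uses the chi-squared approximation with one degree of freedom, which for two groups reduces to a complementary error function.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two samples using the chi-squared approximation (df = 1)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:len(a)])
    sum_b = sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (sum_a ** 2 / len(a) + sum_b ** 2 / len(b)) - 3.0 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0.0:  # all observations identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    # Survival function of chi-squared with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


if __name__ == "__main__":
    # HYPOTHETICAL fluorine readings (w%), for illustration only.
    smooth = [3.6, 3.8, 3.7, 3.9, 3.5]
    rough = [2.6, 2.9, 3.1, 2.4, 2.8]
    h, p = kruskal_wallis_two_groups(smooth, rough)
    print(f"H = {h:.3f}, p = {p:.4f}")
```

With more than two groups, the same H statistic is compared against a chi-squared distribution with k − 1 degrees of freedom, which requires a statistics library or a dedicated routine.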

The impact of surface topography on measured concentrations
The topography of a sample can have drastic influence on the quantity and quality of characteristic X-rays that are received by the detector (Newbury and Ritchie 2014).
Here, this influence is quantified, thus emulating the situation of studying a sample with natural topography, e.g., a holotype. In order to quantify the error, 81 spot scans were conducted within a single area of the Durango crystal (Fig. 2a; Supplementary Table S1). Of these, 40 were acquired on what was identified as a smooth surface and 41 in a series of natural voids that occur within the sample (Fig. 2b). Only Cl content showed no difference between the two topographies (p value 0.828 in a Kruskal-Wallis chi-squared test, also used for the following three comparisons, Table 1). For F, the measured weight percent content was higher by 34% on the smooth surface (p = 4.932 × 10⁻⁵). For Ca, it was 9% higher on the rough surface (p = 0.004599). For P, it was 6% higher on the smooth surface (p = 0.002749). Additionally, the variance was significantly larger for all elements measured on rough surfaces (at α = 0.05, Levene's test), with the exception of Cl, where no difference could be detected.

Fig. 1 Graphical representation of EDX spectra highlighting (a) a spectrum at full length, (b) simulated data compared to acquired data and (c, d) a comparison of the simulated spectra and peak heights with increasing kV

The impact of sample tilt
Tilt is the angle at which the electron beam interacts with the specimen. Ideally the beam would strike the surface at 90° (a stage tilt of 0°), as this is what most detectors are calibrated for. Tilting the stage generates topography artificially in a single area, which makes it possible to quantify the extent to which topography affects perceived composition. It simulates improper preparation and the analysis of fossil specimens which cannot be sectioned and polished (Saleh et al. 2020; Murdock and Smith 2021). A series of spot scans were conducted as the sample was tilted in two distinct directions: (1) towards the detector up to 40° and (2) away from the detector up to − 20°, which was the limit the equipment was able to reach. Fifteen spot scans were made at 34 tilt angles running from − 20° to 40° (Fig. 3; Supplementary Table S2). Higher resolution (every 1°) was used for the range − 10° to + 10°, as this is the range that is most likely to pass as undetected or negligible. For all elements, concentrations obtained in tilted samples were lower than in those positioned horizontally, reaching down to 85% of the reference concentration (Cl at − 20°, Table 2), except for F, where concentrations were 15% higher than the reference when measured at − 20° (Table 2 and Fig. S1 in Shirley and Jarochowska 2021). In all cases, the concentrations measured at the lower and outer positions of the tilt values were significantly different from the reference measured at 0°. What is more, the variance in concentration of two elements was significantly different: lower for Ca measured in tilted samples and higher for Cl measured in tilted samples (Table 2).

Standards
As part of a typical workflow for EDX analysis, calibration is often carried out before the acquisition of spectra. This allows the detector to be calibrated using the signal from a sample with a known composition. Standards are, in the majority of cases, homogeneous metals that have a high conductivity and produce a large number of characteristic X-rays. In an ideal situation, this material would produce four characteristic peaks on a single spectrum (normally a combination of the Kα, Kβ, Lα and/or Lβ, depending on the material). However, in certain situations it is necessary to use elements in which this is not the case. For example, in measurements at a high resolution or in sensitive materials, lower acceleration voltages will be needed, not allowing for these four peaks to be observed. It is preferable to use elements with a similar atomic weight to that of the elements to be characterised. Figure 4 shows a comparison between calibrating the equipment using cobalt and silicon standards. Concentrations of all elements were higher by 6-7% following calibration with silicon and, except for Cl concentration, all differences were significant at α = 0.05 (Fig. 4, Table 3; Supplementary Table S3). It is important to place the sample and standard at the same level in the chamber. Figure 5 and Table 4 highlight the extent of this error, showing the mean concentration across all measurements. If the standard was placed 5 mm below the sample, it led to an under-representation of both P (− 1.6 w%) and F (− 0.1 w%), but Ca was over-represented (+ 2.1 w%). However, if it was placed 5 mm above, there was a more drastic over-representation of Ca (+ 12.4 w%), P (+ 4.7 w%) and F (+ 1.3 w%), i.e., up to 33% for Ca.
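The conversion between an absolute offset in w% and a relative error can be sketched as below. This is illustrative arithmetic only: the function names are hypothetical, and the reference Ca concentration is not stated in this section, so it is back-calculated here from the reported + 12.4 w% and the rounded figure of 33%.

```python
def relative_offset_percent(absolute_offset_wt, reference_wt):
    """Express an absolute offset (w%) relative to the reference value (w%)."""
    return 100.0 * absolute_offset_wt / reference_wt


def implied_reference_wt(absolute_offset_wt, relative_percent):
    """Back-calculate the reference value (w%) implied by a pair of offsets."""
    return 100.0 * absolute_offset_wt / relative_percent


if __name__ == "__main__":
    # Values reported for Ca with the standard 5 mm above the sample.
    # The 33% is rounded, so the implied reference is approximate.
    ca_reference = implied_reference_wt(12.4, 33.0)
    print(f"Implied Ca reference: {ca_reference:.1f} w%")
    # Same conversion applied to the + 2.1 w% offset (standard 5 mm below).
    print(f"Relative Ca offset, standard below: "
          f"{relative_offset_percent(2.1, ca_reference):.1f}%")
```

Because the same absolute offset translates into a much larger relative error for a minor element such as F than for Ca, reporting both forms helps when comparing elements.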

Working distance
To test the impact of working distance on measured composition, a series of 15 spot scans was conducted at 10 working distances between 10 and 19 mm. The system used for these tests is calibrated to have the ideal working distance between 11.5 and 12.5 mm (Fig. 6; Supplementary Table S5). Average concentrations of Ca, F and P were significantly different across the range of distances tested, with detected Ca concentration decreasing with increasing WD, F concentration increasing, and P concentration fluctuating without a trend. Ca concentration decreased by up to 2% at WD 19 mm compared to that measured at the optimal WD. F concentration was up to 13% higher at WD 19 mm compared with that at the optimal setting. No change in variances of concentrations could be detected for any of the four elements at α = 0.05. The same results (significant differences in Ca, F and P concentrations and none in the variance) were obtained for comparisons limited only to the optimal distance of 12 mm and directly adjacent positions of 11 mm and 13 mm.

Acceleration voltage and its impact on data acquisition
Biominerals such as enamel are susceptible to beam damage when subject to long exposure and high acceleration voltages. In order to estimate the effect of this damage on measured elemental composition, a sample of conodont lamellar tissue (enamel-like composite apatite-organic biomineral) was subject to four sessions of consecutive spot scans. The scan parameters (working distance, spot size, scan time) were as similar as possible for each experiment, with the only changing parameter being the acceleration voltage. A spot selected on the sample was subject to 30 spot scans for 60 s each. This was then repeated on a "fresh" spot for each acceleration voltage of 8, 10, 15 and 20 kV. Figure 7 highlights the change in elemental composition recorded during these repeated spot scans. The smallest changes (by + 0.2% in F to − 3% in Ca) were observed at 10 kV, followed by changes at 8 kV, where Ca concentration increased by 5% with each measurement and F concentration decreased by 1% with each measurement (Table 5, Fig. 7, Supplementary Table S6). At these two low voltages no significant change in F concentration over the 30 scans could be detected. At 15 kV, significant increases in concentrations over time were detected for Ca and F, with Ca concentration increasing on average by 16% with each measurement. Measurements at 20 kV resulted in substantial decreases in all measured concentrations, by up to − 20% for Ca and − 37% for F, and all these decreases were significant at α = 0.05 (Table 5).
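One simple way to summarise drift over consecutive spot scans is the slope of an ordinary least-squares line fitted to concentration against scan number. The sketch below assumes this approach for illustration; it is not stated here that the values in Table 5 were derived this way, and the function name and the example series are hypothetical.

```python
def linear_trend(values):
    """Return (slope, intercept) of a least-squares line through
    (1, values[0]), (2, values[1]), ..., i.e. change per consecutive scan."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two scans are needed to estimate a trend")
    scan_numbers = range(1, n + 1)
    mean_x = (n + 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in scan_numbers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scan_numbers, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # HYPOTHETICAL Ca readings (w%) from 30 repeated scans on one spot,
    # drifting downwards by 0.05 w% per scan.
    ca_series = [38.0 - 0.05 * scan for scan in range(1, 31)]
    slope, intercept = linear_trend(ca_series)
    print(f"Ca drift: {slope:+.3f} w% per scan (start ~ {intercept:.2f} w%)")
```

Dividing the slope by the first reading gives a relative change per scan, which is easier to compare between elements and voltages.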

Acceleration voltage and its impact on resolution
Following a study by Shirley et al. (2018), a comparison was made between different acceleration voltages and their impact on scan resolution. EDX line transects were conducted across individual growth layers of a single polished section through the conodont Ozarkodina confluens in order to differentiate Sr concentrations within the dark and light bands. Both tests were conducted using a WD of 13 mm and a spot size of 250 nm. Monte Carlo simulations on bioapatite indicate that the interaction volumes of backscattered electrons are 0.3 µm at 5 kV and 3 µm at 15 kV (Joy 2006; Shirley et al. 2020). The impact of the interaction volume on spatial resolution can be seen in Fig. 8. At 5 kV a clear distinction can be seen in the concentration of Sr (Lα emissions) between the light and dark bands, showing relative levels of strontium to be up to 48% higher in lighter areas.
In contrast, in the test at 15 kV there appears to be no correlation between the bands and the strontium concentration. This is because the interaction volume at 15 kV is too large to resolve the smaller features, producing a "wash-out" effect on the composition. However, using such a low voltage for EDX has a number of limitations: it limits the number of detectable elements and severely reduces the number of characteristic X-rays that are emitted, potentially increasing scan time and overall cost. The user needs to strike a balance between desired spatial resolution and appropriate acceleration voltages. Up to 8 kV is suggested, as Monte Carlo simulations show activation volumes of ~ 1 µm and this range drastically increases the number of elements that can be identified.
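As a quick cross-check of how interaction depth scales with acceleration voltage, the analytical Kanaya-Okayama electron range can be evaluated without running a Monte Carlo simulation. The sketch below substitutes this textbook approximation for the Monte Carlo model used in the study. It assumes ideal fluorapatite, Ca5(PO4)3F, a density of 3.2 g/cm³, and mass-fraction-weighted mean atomic number and weight, all of which are simplifications chosen for illustration.

```python
# Element data as (atomic number Z, atomic weight A in g/mol).
ELEMENTS = {"Ca": (20, 40.078), "P": (15, 30.974), "O": (8, 15.999), "F": (9, 18.998)}

# ASSUMPTIONS: ideal fluorapatite stoichiometry and a typical apatite density.
FORMULA = {"Ca": 5, "P": 3, "O": 12, "F": 1}
DENSITY_G_CM3 = 3.2


def mass_fractions(formula):
    """Return {element: mass fraction} for {element: atoms per formula unit}."""
    masses = {el: n * ELEMENTS[el][1] for el, n in formula.items()}
    total = sum(masses.values())
    return {el: mass / total for el, mass in masses.items()}


def kanaya_okayama_range_um(beam_kev, formula=FORMULA, density=DENSITY_G_CM3):
    """Kanaya-Okayama range in micrometres:
    R = 0.0276 * A * E0**1.67 / (Z**0.889 * rho),
    with E0 in keV, rho in g/cm^3 and mass-fraction-weighted mean A and Z."""
    fractions = mass_fractions(formula)
    mean_z = sum(f * ELEMENTS[el][0] for el, f in fractions.items())
    mean_a = sum(f * ELEMENTS[el][1] for el, f in fractions.items())
    return 0.0276 * mean_a * beam_kev ** 1.67 / (mean_z ** 0.889 * density)


if __name__ == "__main__":
    for kev in (5, 8, 10, 15, 20):
        print(f"{kev:>2} kV: range ~ {kanaya_okayama_range_um(kev):.2f} µm")
```

The range grows roughly as the beam energy to the power 1.67, so tripling the voltage from 5 to 15 kV enlarges the sampled depth by a factor of about six; this estimate describes electron penetration only and does not replace a full simulation of X-ray generation and absorption.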

Discussion
Although EDX has established itself as a cost-effective and easy to learn method in material and life sciences, the parameters which may affect the results and conclusions substantially are not documented sufficiently in publications in the fields of biology and palaeobiology. This, accompanied by user error in various analyses, reduces reproducibility and may result in analytical artifacts and erroneous values in quantitative measurements. This is further amplified by the fact that phosphatic biomaterials, such as those encountered in medicine or palaeontology, are susceptible to beam damage. The inclusion of abiotic apatite in this study allowed the quantification of the extent of common errors by facilitating a number of tests that cannot be conducted on bioapatite due to its susceptibility to beam damage.

Preparation
It has been shown here that the lack of adequate preparation, and thus of a flat surface, introduces systematic errors in EDX analysis. In the tests described here, spot scans on the surface of the sample show that elemental compositions are under-represented when measured on surfaces with substantial topography. It should also be mentioned that where one element is over-represented, another will be under-represented, as the reported concentrations of these elements depend directly on one another. The error recorded due to topography is normally within the range of 2-4 w% across all elements, but some outliers have been shown to be upwards of 34% of their reference concentration. However, outliers have also been recorded in samples that do have an adequate polish. One way to address this issue is to take several spot scans on the surface in order to rule out these outliers. However, one should be careful not to place these spot scans atop one another. As shown in Fig. 7, the interaction of the electron beam with biological samples can have a serious effect on data acquisition. Further evidence of the impact of topography on the measurements can be seen in the exaggerated tests, where tilting the sample either towards or away from the detector systematically changes detected concentrations by up to 15%. What is more, the increased variance of uneven surfaces affects measurement reproducibility negatively.

User error
User error occurs when incorrect parameters are used in the setup and calibration of the SEM before the analysis is conducted. When beginning any analysis, there is a series of user-defined parameters which change depending on the system. The parameter most commonly set incorrectly is the working distance. Here an attempt has been made to address the impact of this by testing data acquisition at various heights. As shown in Fig. 6, measurement accuracy, but not precision, is, for some elements, heavily affected (up to 13%) by WD values different from those optimal for the given SEM setup. Another common issue in user-defined parameters is the calibration of the system using a homogeneous metallic standard. Commonly there are software restrictions on the calibration standard that can be used (as is the case with the equipment used here). In such a case, when applying this full standard, an attempt was made to use elements that are close to the atomic value of the elements that are being recorded. As shown in Fig. 4, the selection of the correct standard is critical for accuracy. Cobalt as a standard allows two separate characteristic peaks (from the Kα and Lα emissions) for typical apatite composition to be acquired, providing a more robust standard than that of silicon. When calibrated against silicon, concentrations of all measured elements were overestimated. Newbury and Ritchie (2015) provided an exhaustive list of recommendations for quantitative EDX analysis to achieve maximum precision and accuracy through standardization following the k ratio protocol. The position of the standard relative to the specimen being analysed is important in the detection of element concentration. As shown in Fig. 5, either an increase or a decrease of the height of a standard relative to the sample results in, respectively, under- or over-estimation of the element's concentration.

Other considerations
The acceleration voltage of the electron beam affects the spatial resolution of the analysis, which can be important, e.g., for sclerochronology. Lower acceleration voltages provide a higher resolution, but this comes with the caveat of possibly limiting the number of elements that can be observed. Further factors that impact the quality and reproducibility of EDX analyses include software settings such as energy channels, dead time of the detector, process time, oxygen content of the sample, mole mass, type of detector, and even humidity and temperature of the surroundings (Newbury and Ritchie 2015). Here, only the parameters that were identified as the most commonly used incorrectly and those that were able to be controlled were addressed.
Furthermore, user-defined software settings and post-processing (e.g., normalised by 100%, oxygen-weight percent or mole mass) of the data can have an impact on both acquisition (e.g., dead time) and perceived chemical concentration. While these are important for data acquisition, they are beyond the scope of this paper.

Conclusions
This study provides one of the first quantifications of user-controlled and sample-specific errors in the chemical characterisation of palaeobiological materials. It has been shown that the surface topography has a substantial impact on data acquisition, mostly affecting the accuracy, i.e., resulting in systematic offsets between the values measured at correct and at incorrect parameters. This is further amplified if incorrect user-defined parameters are set before or during analysis. Biological apatite exemplifies two difficulties: it is susceptible to electron-beam damage and does not lend itself easily to polishing (Shirley et al. 2020). The latter problem, as shown here, decreases the reproducibility of EDX analyses. Based on this systematic study, it has been shown that it is worthwhile to dedicate time to establishing optimal parameters for a given material before undertaking actual measurements. For apatite, biological and abiological (macrocrystalline), the suggested parameters are a working distance of 11.5-12.5 mm (or as indicated by the producer of the particular setup), an acceleration voltage of 15 kV, a tilt of 0° and a spot size of 100-250 nm, all calibrated with the commonly used and inexpensive cobalt standard. However, every system is calibrated differently, so this study highlights the importance of becoming familiar with the available equipment in order to fully understand the parameters that should be applied.