1 Introduction

Differentiation between natural variability in risk element contents in soils and anthropogenic contamination remains a challenge despite long efforts in environmental geochemistry and soil geochemical mapping (Vácha et al. 2015; Reimann et al. 2019a, b; Zhang et al. 2020; Grygar et al. 2023). This particularly concerns weak (diffuse) contamination, the long-term impact of pedogenic element translocation in soil profiles, and the interpretation of element patterns in soils on geochemically anomalous rocks (Amorosi et al. 2014; Fabian et al. 2017; Reimann et al. 2019a, b; Protano et al. 2021; Reimann and Fabian 2022). Diffuse contamination and pedogenic processes (Fabian et al. 2017; Reimann et al. 2019a, b; Reimann and Fabian 2022) are often attributed to the “ambient” or “natural” background (Zhao et al. 2007; Ander et al. 2013; Mikkonen et al. 2018), which is an untenable simplification if geochemical patterns are to be truly understood and human impact unequivocally deciphered. The presence of geogenic anomalies requires geological maps to be included in data interpretation (Appleton and Adlam 2012; Ander et al. 2013; Labaz et al. 2019; Zhang et al. 2020; Beier et al. 2022), but this is often neglected (Grygar 2016, 2020). Geological maps should be used as early as the planning of sampling sites and then in examining and visualising data (Appleton and Adlam 2012; Vilà and Martínez-Lladó 2015), but this step is also frequently omitted. The differentiation between natural and anthropogenic factors thus remains a challenge for current environmental geochemistry.

Advanced and fast analytical techniques have allowed the production of extensive collections of soil compositional data in many countries, including the Czech Republic (Cornu et al. 2023). Data mining is thus of growing importance, and many conventional approaches need critical revision. Neglect of the fundamental properties of geochemical data and the belief that statistical methods can isolate any desired information in an automated way are quite widespread (Grygar 2016, 2020, 2022). Conventional multivariate methods are commonly used without considering the full range of real controlling factors (RCFs) of soil composition: soil provenance, sampling depth, grain size effects, and soil mineralogy (Baize and Sterckeman 2001; Grygar 2016, 2020, 2022; Négrel et al. 2021; Grygar et al. 2023). The recent devaluation of case studies in research publishing and advances in mathematical data processing have isolated geochemical data mining from empirical knowledge of the study areas and led to a growing number of studies without tangible results or empirically verifiable conclusions (Agyeman et al. 2022; Francos et al. 2022). Traditional holistic approaches to soil geochemistry data mining and expertise (Baize and Sterckeman 2001; Labaz et al. 2019) have been gradually replaced by experimental data processing tools, which may be mathematically highly sophisticated but are poorly interpretable and verifiable empirically. This is a worrisome general trend in geochemistry (Greenacre 2019; Razavi et al. 2022; Grygar 2022) and other branches of research (Rudin 2019). This paper shows how to take RCFs into account in case studies of soil contamination by risk elements.

The majority of recent studies on soil geochemistry have been based on dense mapping of large areas with analytically highly valuable datasets, but some were conducted in areas with complex bedrock geology, geography, and anthropogenic activities and presented without considering RCFs, which resulted in uncertainty in the identification of anthropogenic impacts (Zhang et al. 2020; Grygar et al. 2023). The impact of coal-fired power plants on soil contamination is a particular example of how difficult it can be to decipher the anthropogenic contribution to risk elements in soils. Pandey et al. (2016) reported on the impact of coal use on soils in eastern India; however, they did not take into account that the coal-bearing strata around the power plants are geogenically anomalous, with high Cu and Ni contents. Coal-fired power plants have been shown to increase Hg contents in soil (Fan et al. 2020) but not the contents of other elements. An insignificant effect of lignite or coal burning on surrounding soils was found by Agrawal et al. (2010), Nanos et al. (2015), and da Silva Júnior et al. (2019). The Most Basin in the northwestern Czech Republic, the national centre of soft coal mining and energy production, is characterised by soils with elevated contents of As, Be, Cd, Pb, and V (Vácha et al. 2015; Bednářová et al. 2016; Grygar et al. 2023), which has traditionally been attributed to the local industry (Vácha et al. 2015; Skála et al. 2022; Boahen et al. 2023). On the other hand, the Ore Mountains, with granitic and metamorphic rocks containing ore veins, and the predominantly basaltic České Středohoří Mts., which border the basin from the NW and SE, respectively, show anomalously high contents of several risk elements, such as As, Pb, and V (Vácha et al. 2015; Vöröš et al. 2019; Skála et al. 2022; Grygar et al. 2023).
These inherent geogenic anomalies must therefore be taken into account if the anthropogenic impacts on the soils of the Most Basin are to be identified beyond doubt. Reliable recognition of the real impacts of industrial activities on soils is an indispensable part of realistic and sustainable planning of future economic activities. This is particularly relevant to energy production with minimal negative impacts on nature, which is currently of great global importance.

The major factual aim of this work was to evaluate the impact of the soft-coal-fired power plant Ledvice (built in the 1960s; current installed capacity 770 MW) on risk elements in soils of the Most Basin, Czech Republic. The spatial distribution of risk elements around Ledvice was evaluated from soil maps of risk element contents obtained by conventional acid extractions within the state monitoring of agricultural soils and collected in the Register of Contaminated Areas (RKP; Vácha et al. 2015; Grygar et al. 2023). To limit the impact of bedrock geology on risk element contents in soils (Ander et al. 2013; Zhang et al. 2020; Grygar et al. 2023) and to include essential information on emission and geogenic controls (Baize and Sterckeman 2001), we sampled soil depth profiles and subjected them to more detailed element analyses. Aqua regia extractions and X-ray fluorescence (XRF) analyses were used to distinguish anthropogenic and geogenic controls on soil risk elements. The strategic aim was to demonstrate how to distinguish the geogenic and anthropogenic origins of risk elements in soils where both factors are interrelated. This case study shows which second step should follow the finding of elevated risk element contents in soils before they are attributed to human activities. The methods described in this work can be employed in future unbiased case studies of the impacts of industrial activities on agricultural soils in real, naturally complex landscapes under human pressure.

2 Study area and methods

2.1 Study area

The Most Basin is an initial rift formed by the tectonic evolution of central Europe (Rajchl et al. 2008). The rifting started with mafic effusive volcanism, which produced the České Středohoří Mountains (Figs. 1 and S1 to S3). Subsequent subsidence of the basin floor produced a flat landscape over an area of approximately 10 × 40 km, in which peat accumulation and its subsequent development produced an approximately 30-m-thick coal seam. The uplift of the Ore Mountains (Fig. 1) associated with the rift is still ongoing; it has resulted in the exhumation of felsic intrusive and metamorphic rocks, mostly gneiss, with frequent ore veins. Polymetallic ores were mined in the Ore Mountains (Fig. S1, mines in the mountains), and the local soils thus show anomalous contents of As and Pb (Grygar et al. 2023). Soil washes from the steep southeastern slopes below the ridge towards the basin floor in Czechia resulted in massive deposition of unconsolidated sediments on the basin floor, the extent of which is shown in Fig. S2. The Most Basin floor was considerably changed by mining and overburden heaps, as documented by the geological map (Fig. S2) and the basin floor topography, which shows depressions of former mines and elevated areas of heaps (Figs. S1 and S3). These facts must be taken into account in planning and performing soil sampling.

Fig. 1

Map of the study area with major power plants and positions of soil sampling sites

2.2 Geochemical maps of soils based on RKP

The soil geochemistry database RKP consists of two datasets of risk element contents: one subset was produced by cold HNO3 extraction according to older legislation, and the other by aqua regia (AR) extraction according to currently valid legislation. The former extraction was performed by shaking 10 g of dry sample (< 2 mm) for 6 h with 100 mL of cold 2 mol L−1 HNO3 (Zbíral et al. 2004). In the latter procedure, a 3-g air-dried sample was mixed with 21 mL of concentrated HCl and 7 mL of concentrated HNO3, left to stand for 16 h, and then boiled under reflux for 2 h. The geochemical maps from the HNO3 and AR subsets were joined on the basis of the classification of heterogeneities in the empirical cumulative distribution functions (ECDFs) of the two data series, i.e., using the same approach as in our preceding work (Grygar et al. 2023). Cold HNO3 extracted approximately 20% of As, 90% of Cd, 50% of Cu, 80% of Pb, and 30% of Zn relative to the amounts extracted by aqua regia (Grygar et al. 2023). Geochemical maps with classified layers were preferred to interpolated maps for RKP visualisation, as substantiated by Appleton and Adlam (2012) and Grygar et al. (2023). Sampling for national soil maps is rather uneven over the mapped surface due to the uneven distribution of agricultural fields. This spatial heterogeneity is particularly obvious in the Most Basin, where agricultural fields and meadows form mosaics separated by anthropogenically altered parts of the basin floor and by hills of the České Středohoří (Figs. S2 and S3), which is not favourable for spatial interpolation.
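To give a sense of scale for the extraction ratios quoted above, the following sketch divides a cold-HNO3 content by the approximate fraction of the AR-extractable amount that cold HNO3 recovers. It is only a hypothetical illustration: the RKP maps in this work were joined by classifying ECDF heterogeneities, not by such scaling, and the helper name and example contents are assumptions.

```python
# Approximate fractions of the aqua-regia-extractable amount recovered by
# cold 2 mol/L HNO3, as quoted in the text (after Grygar et al. 2023).
HNO3_FRACTION_OF_AR = {"As": 0.20, "Cd": 0.90, "Cu": 0.50, "Pb": 0.80, "Zn": 0.30}


def rough_ar_equivalent(element: str, hno3_mg_kg: float) -> float:
    """Return an order-of-magnitude AR-equivalent content (mg/kg).

    Hypothetical helper for illustration only; the RKP maps were joined
    by ECDF classification, not by this scaling.
    """
    if element not in HNO3_FRACTION_OF_AR:
        raise KeyError(f"no extraction ratio available for {element}")
    return hno3_mg_kg / HNO3_FRACTION_OF_AR[element]


# Hypothetical cold-HNO3 contents (mg/kg)
for el, value in [("As", 4.0), ("Zn", 30.0), ("Cd", 0.45)]:
    print(el, round(rough_ar_equivalent(el, value), 2))
```

Because the ratios differ so much between elements (0.2 for As, 0.9 for Cd), raw HNO3 and AR values are not directly comparable element by element.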

2.3 Sampling of soil profiles and sample processing

The positions of the soil depth profiles are shown in Figs. 1 and S3. The target area includes the coal-bearing, heavily industrialised part of the Most Basin (north‒west) and its coal-free part (south‒west), bordered by hills of the České Středohoří Mountains. The southeastern part of the target area is mainly used for agriculture and is separated from the Most Basin by the hills of the České Středohoří Mountains.

Sampling was carried out at least 50 m from public highways. Sampling on the basin floor is complicated by anthropogenic landscape modifications; thus, each site was preselected using geological maps (to cover all bedrock geologies) and aerial photographs (to evaluate land cover and the extent of anthropogenic alteration). A preselected site was rejected if field inspection showed any signs of anthropogenic interference other than tillage, such as altered topography or concrete or artificial waste material on the land surface or in topsoils. Most soil profiles on the basin floor were taken on sedimentary rocks, which were expected to be relatively homogeneous. Additionally, representative soil profiles were obtained from mafic volcanic rocks. Soil profiles on the basin floor were sampled with a soil auger in 10–20-cm depth steps, and the total thickness of the profile was 60–220 cm depending on the depth of bedrock. Soils derived from gneiss in the Ore Mountains were sampled with a shovel from the top 30 cm in areas adjacent to the target area (Fig. 1). Soil samples were air-dried under ambient conditions and finally dried at 50 °C. Dry samples were sieved through a 2-mm nylon mesh.

2.4 Analyses of element contents

Total element contents were analysed by XRF of pulverised samples. For total contents, soil samples (< 2 mm) were pulverised in a planetary micromill, poured into nylon cells with Mylar foil bottoms, and analysed using an Epsilon 3X spectrometer (PANalytical, the Netherlands) equipped with an X-ray tube (Ag anode, up to 50 kV) and a Peltier-cooled large-area Si-drift detector. Single-element calibration of the analytical fluorescence lines was performed with the certified reference materials IAEA SL-1, IAEA SL-3, and IAEA-314 Stream Sediment (IAEA, Vienna, Austria), 2702 Inorganics in Marine Sediment, 1646a Estuarine Sediment, 2704 Buffalo River Sediment, 2710a Montana I Soil, and 2711a Montana II Soil (NIST, USA), CTA-FFA-1 Fine Fly Ash (Institute of Nuclear Chemistry and Technology, Warsaw, Poland), BCR 176R, and ERM-CC141 Loam Soil (Institute for Reference Materials and Measurements, Geel, Belgium). Regression coefficients of the calibration lines are shown in Table S1, and XRF analyses of the certified reference materials are shown in Table S2.
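Single-element calibration of this kind amounts to a linear regression of fluorescence line intensity against certified content. The sketch below is a generic least-squares version with invented intensities; it is not the spectrometer software routine, and the function names are hypothetical.

```python
import math


def fit_calibration(certified, intensity):
    """Least-squares line: intensity = slope * content + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation
    coefficient of the calibration points.
    """
    n = len(certified)
    if n != len(intensity) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mx = sum(certified) / n
    my = sum(intensity) / n
    sxx = sum((x - mx) ** 2 for x in certified)
    syy = sum((y - my) ** 2 for y in intensity)
    sxy = sum((x - mx) * (y - my) for x, y in zip(certified, intensity))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def content_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate content in an unknown sample."""
    return (intensity - intercept) / slope


# Hypothetical certified Pb contents (mg/kg) and invented net intensities
pb_certified = [20.0, 50.0, 150.0, 1000.0]
pb_intensity = [0.41, 1.02, 3.05, 20.2]
slope, intercept, r = fit_calibration(pb_certified, pb_intensity)
print(f"slope={slope:.5f} intercept={intercept:.4f} r={r:.4f}")
print(round(content_from_intensity(1.5, slope, intercept), 1))
```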

For pseudo-total contents, soil samples (size fraction < 2 mm, 0.5 g) were extracted with 6 mL HCl and 2 mL HNO3 in a Multiwave 5000 microwave digestion system (Anton Paar) following the European standard “EN ISO 54321:2021: Soil, treated biowaste, sludge and waste—Digestion of aqua regia soluble fractions of elements.” After centrifugation, the supernatant was diluted and analysed using an Agilent 7900 ICP‒MS. The quality of the analysis was checked using ERM–CC141 Loam Soil (Institute for Reference Materials and Measurements, Geel, Belgium); the results for this material are shown in Table S3.

2.5 Data sources

2.5.1 Geographical information systems (GIS)

A geological map of the Czech Republic and a map of mining impacts, both produced by the Czech Geological Survey (CGS), are available on the CGS server (http://www.geology.cz/extranet/mapy/mapy-online/wms) and online at www.geology.cz. The online geological map is based on the original collection of geological maps at a scale of 1:50,000 (GeoČR50), which also includes information on the Quaternary cover. The legends of the geological maps were simplified to major rock types, as shown in Fig. S2. MS Excel was used for processing rock type data. A digital elevation model (DMR 5G) was obtained from the laser scanning (LiDAR) datasets of the Czech Office for Surveying, Mapping, and Cadastre (ČÚZK) from 2016 (Figs. 1 and S3). MS Excel was also used to prepare input files for GIS from RKP. Data sources were processed into final maps using ArcGIS Pro 3.1.1 software (ESRI, Redlands, CA).

2.5.2 Data mining tools

ECDFs (Sinclair 1976, 1991; Fan et al. 2020; Zhang et al. 2020) were used to define class boundaries for soil geochemistry maps with classified layers based on RKP data from national monitoring. Breaks and gaps in the ECDFs separate populations of polymodal distributions and subpopulations of anomalously low or high values (Sinclair 1976, 1991; Reimann et al. 2005; Zuo et al. 2021). Boxplots were employed to examine controlling factors, as they are common tools of exploratory data analysis in geochemistry (Galán et al. 2008; Ander et al. 2013; Vilà and Martínez-Lladó 2015; Fan et al. 2020; Zhang et al. 2020). Geochemical normalisation was used to correct risk element contents for dilution by quartz, calcite, or organic matter and for soil textural effects (Matys Grygar and Popelka 2016). To gain better insight into the data structure and to define the soil provenance control, partial least squares discriminant analysis (PLS-DA) was performed. PLS-DA is a supervised method that combines feature extraction and discriminant analysis in a single algorithm and is well suited for high-dimensional data. PLS-DA was performed on the R platform (R Core Team 2021).
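The ECDF and one simple way of locating a candidate break can be sketched as follows. In this work, breaks and gaps were assessed by inspecting the ECDFs; the automated "widest gap on a log scale" rule below is only a hypothetical heuristic for illustration, and the Pb values are invented.

```python
import math


def ecdf(values):
    """Return (value, cumulative probability) pairs of the empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def largest_log_gap_break(values):
    """Return the geometric midpoint of the widest gap between sorted values.

    The gap is measured on a log10 scale because geochemical data are
    usually right-skewed. Hypothetical heuristic, not the authors' procedure.
    """
    xs = sorted(v for v in values if v > 0)
    if len(xs) < 2:
        raise ValueError("need at least two positive values")
    low, high = max(zip(xs, xs[1:]), key=lambda pair: math.log10(pair[1] / pair[0]))
    return math.sqrt(low * high)


# Invented topsoil Pb contents (mg/kg) with two apparent populations
pb = [20, 22, 25, 27, 30, 31, 90, 95, 110]
for value, prob in ecdf(pb):
    print(f"{value:>5} {prob:.2f}")
print("candidate class boundary:", round(largest_log_gap_break(pb), 1))
```

An automated rule like this can only propose a break; whether it separates geochemically meaningful populations still has to be judged against the geology.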

3 Results

3.1 Maps of pseudo-total element contents from topsoil mapping

Systematic soil monitoring within the RKP has provided a detailed picture of the spatial distribution of risk elements in topsoils. Topsoil geochemical maps of the target area show highly heterogeneous distributions of Pb (Fig. 2), Zn (Fig. 3), As, Cd, Cu, and Hg (Figs. S4 to S7). The highest contents of soil As and Pb are found along the NW boundary of the target area, i.e., within the extent of washes from the Ore Mountains (Figs. 2 and S4). Hotspot 1 (Figs. 2 and S4) lies practically below the historical ore mines (orange crosses in the Ore Mountains in Fig. S1). Additionally, below the Ore Mountains, there is a cluster of soils with elevated contents of all risk elements northwest of the refinery (hotspot 2 in Figs. 2 and 3). Another cluster of elevated contents of As, Cd, Hg, and Zn (hotspot 3 in Figs. 3, S4, S5 and S7) forms an arc from the north‒east to the south‒east of the Ledvice power plant. No other prominent hotspots are found in the topsoils of the target area; rather, isolated elevated points are scattered over the rest of the target area. Copper does not show elevated values in the Most Basin, and higher Cu contents are scattered over the entire area, including the agricultural southeastern part (Fig. S6 in Supplementary Materials).

Fig. 2

Map of Pb soil content in the target area constructed from state monitoring RKP. Hotspots are numbered (1 to 3). The dashed line shows the estimated extent of Quaternary washes from the Ore Mountains to the Most Basin floor

Fig. 3

Map of Zn soil content in the target area constructed from state monitoring RKP. Other explanations as in the legend to Fig. 2

3.2 Interelement relationships in total element contents

The RKP monitoring did not include lithogenic elements, and it was thus indispensable to fill this gap and examine a possible geogenic control of the risk element contents. Relationships among the lithogenic elements Al, Si, Ca, Ti, K, and Zr in soils are driven by parent geology, i.e., by provenance, as shown in Fig. 4A. PLS-DA shows three end-members of soil parent rocks: mafic volcanic rocks (Cu-Fe-Ti element association), felsic gneiss (K-Rb-Si association), and carbonate-bearing sediments (Ca-Sr association). Cretaceous sediments in the target area vary from marls to mudstones; the soils derived from these rocks are decalcified to an extent that depends on soil maturity, as shown by the line of sample points spread along the Ca and Sr rays in the PLS-DA plot. Gneiss rocks in the Ore Mountains and basin floor sediments derived from them bear a felsic signature (K-Rb-Si) and show an association with Pb. Mafic rocks of the České Středohoří Mountains are rich in Ti and Fe and show an association with Cu. Loess and other Quaternary deposits were formed by mixing these three main end-members in proportions depending on the site position in the basin. Clastic Miocene sediments are situated in the centre of the plot. PLS-DA did not isolate contamination as a single component, as each of the three examined risk elements (Cu, Pb, and Zn) has its own specific geogenic control.

Fig. 4

PLS-DA: (A) total element contents by XRF, with soil samples post-stratified according to the geological map of the bedrock at the sampling sites; (B) pseudo-total contents of risk elements and selected lithogenic elements

Variable quartz and carbonate proportions in soils call for geochemical normalisation of risk element contents. Scatterplots (Fig. 5) show that Fe performs well as a lithogenic element for geochemical normalisation, correcting risk element contents for their dilution by quartz and carbonate. Elements such as Al, Ti, and Rb did not show a clear relation to these three risk elements (not shown). The scatter plots show considerably elevated Pb in soils developed on gneiss (in agreement with PLS-DA; Fig. 4A). High Pb/Fe was also found in some soils on loess and other Quaternary sediments, which can contain admixtures of dust or washes derived from downslope deposits from the Ore Mountains (the extent of those washes is shown in Figs. 2 and 3). For the evaluation of anthropogenic contamination by Pb, the contribution of gneiss-derived soils must thus be taken into account, for which the PC2 scores perform well (Fig. 4A). The scatter plot for Zn (Fig. 5) shows contamination by this element in some soils developed on loess, at contents considerably exceeding those in soils in the Ore Mountains or any other end-member shown in Fig. 6, which excludes the possibility that Zn in these basin floor soils is controlled geogenically by material from the Ore Mountains. Variations in soil Cu content are clearly related to Fe content (Fig. 5), which also reflects the specific contribution of mafic rocks to the geogenically elevated Cu content (Fig. 4A).
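The normalisation and the comparison with a background function can be written down as simple arithmetic. The sketch assumes a linear background of the form element = slope · Fe + intercept; the slope and intercept below are invented placeholders, not values read from the hand-drawn lines in Fig. 5, and the function names are hypothetical.

```python
def fe_ratio(element_mg_kg: float, fe_mg_kg: float) -> float:
    """Fe-normalised content (dimensionless element/Fe mass ratio)."""
    if fe_mg_kg <= 0:
        raise ValueError("Fe content must be positive")
    return element_mg_kg / fe_mg_kg


def excess_over_background(element_mg_kg, fe_mg_kg, slope, intercept=0.0):
    """Measured content minus a linear background function of Fe (mg/kg).

    Positive values suggest an input beyond what the Fe-bound lithogenic
    background explains; slope and intercept are hypothetical placeholders.
    """
    background = slope * fe_mg_kg + intercept
    return element_mg_kg - background


# Hypothetical Zn background: 2.5 mg Zn per 1000 mg Fe
ZN_SLOPE = 2.5e-3
samples = [(80.0, 30000.0), (300.0, 30000.0), (60.0, 22000.0)]  # (Zn, Fe) in mg/kg
for zn, fe in samples:
    print(f"Zn/Fe={fe_ratio(zn, fe):.2e}  excess={excess_over_background(zn, fe, ZN_SLOPE):.1f}")
```

Dividing by Fe suppresses dilution by quartz and carbonate, which contain little Fe, so soils with the same geogenic signature but different quartz or carbonate proportions end up with similar ratios.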

Fig. 5

Scatter plots of total Cu, Pb, and Zn vs. Fe for all soil samples (from all depths). The black lines are drawn by hand to indicate possible background functions

Fig. 6

Pb/Fe ratio plotted against the component (PC2) indicative of the Ore Mountains provenance. Only lithogenic elements were used for PLS-DA. Red arrows indicate Pb contamination not related to geogenic factors, while most soils with Pb/Fe above the ECDF threshold have high PC2, indicative of the Ore Mountains provenance. Colour coding of soil bedrock is the same as in Fig. S2. The probability P of the null hypothesis of no relationship between the variables was obtained from Pearson correlation

For provenance evaluation, PLS-DA was performed again with lithogenic elements only, to avoid circularity in the evaluation of Pb variation. PC2 scores of individual samples were only negligibly affected by removing Cu, Pb, and Zn from the calculations. Figure 6 shows the Pb/Fe ratio plotted against PC2 for lithogenic elements, whose positive values indicate the Ore Mountains (gneiss) provenance of soils, similar to Fig. 4A. The modal Pb/Fe in all soils of our dataset is 8·10⁻⁴, whereas the mean value for the upper continental crust (UCC) is 4.3·10⁻⁴ (Rudnick and Gao 2003), showing an overall local enrichment of soils on the Most Basin floor in Pb. All samples with Pb/Fe > 12·10⁻⁴ (ECDF threshold) are topsoils, but some of them bear the Ore Mountains signature, for which the median Pb/Fe is 30·10⁻⁴.
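The numbers in this paragraph can be combined into a simple screening rule. In the sketch below, the UCC Pb/Fe (4.3·10⁻⁴) and the ECDF threshold (12·10⁻⁴) come from the text, whereas treating PC2 > 0 as a sharp cut-off for the Ore Mountains signature is an assumption made here for illustration; the PC2 scores would have to come from a PLS-DA of lithogenic elements only. The function name and category labels are hypothetical.

```python
UCC_PB_FE = 4.3e-4            # upper continental crust (Rudnick and Gao 2003)
ECDF_THRESHOLD_PB_FE = 12e-4  # break in the Pb/Fe ECDF reported in the text


def screen_pb(pb_mg_kg: float, fe_mg_kg: float, pc2: float):
    """Return (Pb/Fe, enrichment vs UCC, category) for one soil sample.

    Categories: 'background' (not above the ECDF threshold), 'geogenic'
    (above the threshold with positive PC2, i.e. Ore Mountains signature),
    'candidate' (above the threshold without that signature).
    The PC2 > 0 cut-off is a hypothetical simplification.
    """
    ratio = pb_mg_kg / fe_mg_kg
    enrichment = ratio / UCC_PB_FE
    if ratio <= ECDF_THRESHOLD_PB_FE:
        category = "background"
    elif pc2 > 0:
        category = "geogenic"
    else:
        category = "candidate"
    return ratio, enrichment, category


# The modal Pb/Fe of 8e-4 corresponds to roughly 1.9 times the UCC value
print(round(8e-4 / UCC_PB_FE, 2))
print(screen_pb(60.0, 30000.0, -0.5))
```

Only samples in the 'candidate' class would call for further examination as possible anthropogenic Pb.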

3.3 Interelement relationships in pseudo-total element contents

Although XRF is efficient in lithogenic element analysis, its detection limits and selectivity for risk elements other than Cu, Pb, and Zn are insufficient; therefore, ICP‒MS analysis of aqua regia extracts was also used. Aqua regia extracted variable percentages of total element contents, and interelement relationships of pseudo-totals must thus be evaluated separately. High fractions were extracted for the risk elements Cu (approximately 90%), Pb (approximately 60%), and Zn (approximately 80%) as well as for Ca (approximately 100%) and Fe (approximately 70%), whereas lower proportions of the lithogenic elements Al (approximately 50%, with high scatter), K (approximately 30%, with high scatter), and Ti (approximately 10%, with high scatter) are AR extractable. Pseudo-total contents of lithogenic elements other than Ca and Fe therefore cannot be recommended for provenance analyses because of their weaker relation to total contents. We thus selected As, Ca, Cd, Co, Cu, Pb, Fe, and Zn for PLS-DA (Fig. 4B). Plots of the first two components in PLS-DA of AR extracts (Fig. 4B) and of XRF total contents (Fig. 4A) are similar. The three longest vectors in Fig. 4B belong to Ca (carbonate-bearing Cretaceous rocks), Co-Fe (with Cu; mafic volcanic rocks), and Pb (by analogy with Fig. 4A, attributable to the Ore Mountains provenance). The vectors of Zn and Ca are roughly opposite, similar to the PLS-DA of total element contents. The vector of As is closest to that of Pb.
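The extraction yields quoted above are ratios of AR pseudo-total to XRF total content for the same samples. A minimal sketch, with invented paired values and hypothetical function names, of how the median yield and its spread might be summarised per element:

```python
from statistics import median


def extraction_yields(ar_mg_kg, xrf_mg_kg):
    """Per-sample AR extraction yield in percent of the XRF total."""
    if len(ar_mg_kg) != len(xrf_mg_kg):
        raise ValueError("paired data required")
    return [100.0 * ar / total for ar, total in zip(ar_mg_kg, xrf_mg_kg)]


def summarise(yields):
    """Median yield and min-max range (a crude indicator of scatter)."""
    return median(yields), min(yields), max(yields)


# Invented paired analyses of four samples (mg/kg): (AR values, XRF totals)
data = {
    "Cu": ([27.0, 45.0, 18.0, 60.0], [30.0, 50.0, 20.0, 66.0]),
    "Ti": ([300.0, 650.0, 420.0, 500.0], [4000.0, 5200.0, 3500.0, 6100.0]),
}
for element, (ar, xrf) in data.items():
    med, lo, hi = summarise(extraction_yields(ar, xrf))
    print(f"{element}: median {med:.0f} % (range {lo:.0f}-{hi:.0f} %)")
```

A low and variable yield, as in the invented Ti example, means the pseudo-total tracks the total content only loosely, which is why such elements are poor provenance proxies.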

3.4 Topsoil enrichment in the target area

Topsoil contamination mapping must be accompanied by an examination of topsoil enrichment, i.e., of the ratios of topsoil to subsoil concentrations of risk elements. Because risk element contents in soils around the Ledvice power plant depend on soil provenance (Figs. 4A and 6), element contents cannot be directly compared among soil profiles without considering lithogenic elements. Figure 7 compares average normalised risk element contents from XRF analyses in the top 40 cm (ploughed layer, topsoil) with the averages for the subsoil at depths of 40–80 cm in individual soil profiles. The three risk elements (Cu, Pb, and Zn) are surface enriched in a considerable portion of the soil profiles. The enrichment can have various causes, including element translocation by natural pedogenic factors (Reimann et al. 2019b); therefore, two lines were drawn in Fig. 7: the black 1:1 line for no surface enrichment and the grey line for 20% surface enrichment, estimated as the maximum natural enrichment based on the findings of Reimann et al. (2019b). In some profiles, the topsoils are enriched considerably above this upper estimate, indicating possible anthropogenic contamination.
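The comparison in Fig. 7 can be sketched for a single profile as follows. Assumptions: samples are assigned to the 0–40 cm or 40–80 cm layer by their mid-depth, layer values are plain (not thickness-weighted) means of Fe-normalised contents, and the profile data and function names are invented for illustration.

```python
from statistics import mean

NATURAL_MAX_ENRICHMENT = 1.2  # 20 % upper estimate of natural surface enrichment


def layer_mean(samples, top_cm, bottom_cm):
    """Mean element/Fe ratio of samples whose mid-depth lies in [top, bottom).

    samples: iterable of (depth_from_cm, depth_to_cm, element_mg_kg, fe_mg_kg)
    """
    ratios = [el / fe for d0, d1, el, fe in samples if top_cm <= (d0 + d1) / 2 < bottom_cm]
    if not ratios:
        raise ValueError(f"no samples between {top_cm} and {bottom_cm} cm")
    return mean(ratios)


def topsoil_enrichment(samples):
    """Return (T/B ratio of Fe-normalised contents, exceeds natural limit?)."""
    ratio = layer_mean(samples, 0, 40) / layer_mean(samples, 40, 80)
    return ratio, ratio > NATURAL_MAX_ENRICHMENT


# Invented Zn profile in 20-cm steps: (from_cm, to_cm, Zn mg/kg, Fe mg/kg)
profile = [(0, 20, 120, 30000), (20, 40, 110, 30000), (40, 60, 70, 30000), (60, 80, 70, 30000)]
ratio, flagged = topsoil_enrichment(profile)
print(round(ratio, 2), flagged)
```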

Fig. 7

Fe-normalised total risk element contents in topsoil (ploughed layer) plotted against contents in bottom soils of soil profiles developed on sedimentary rocks. The black line is the 1:1 line for no surface enrichment, and the grey line marks the estimated maximum natural enrichment of 20%. Colour coding of the parent geology is the same as in Fig. S2

To evaluate the general influence of the Ledvice power plant, topsoil enrichment was also evaluated for risk elements obtained by AR extraction, as more risk elements were determined in this way than by XRF. The AR contents were not normalised because of the weaker relation of pseudo-totals to Fe; instead, the raw AR contents in topsoil were divided by those in the deeper soil layer. In Fig. 8, soil profiles are split into two groups: the Ledvice area (< 15 km from the power plant) and the rest of the target area. Topsoils in the Ledvice area are slightly more enriched in As, Pb, and Zn than those in the surrounding landscape, in agreement with the maps based on the national soil monitoring RKP (Figs. 2, 3 and S4). Topsoils of several profiles near Ledvice are severely contaminated by As or Zn (Fig. 8), but the overall impact of the power plant is rather weak when medians are compared for individual risk elements. The wide scatter of Cd enrichment is surprising (Fig. 8). A more detailed examination of content depth profiles in the Ledvice area is thus necessary to understand the Cd pattern.
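The grouping behind Fig. 8 reduces to splitting profiles at the 15-km distance and comparing medians of the raw AR topsoil/subsoil ratios. A sketch with invented profile records; the dictionary keys and the function name are hypothetical.

```python
from statistics import median


def median_tb_by_zone(profiles, element, cutoff_km=15.0):
    """Median raw T/B ratio for profiles nearer than / not nearer than cutoff_km."""
    near = [p["tb"][element] for p in profiles if p["distance_km"] < cutoff_km]
    far = [p["tb"][element] for p in profiles if p["distance_km"] >= cutoff_km]
    if not near or not far:
        raise ValueError("both zones need at least one profile")
    return median(near), median(far)


# Invented T/B ratios (topsoil 0-40 cm divided by subsoil 40-80 cm)
profiles = [
    {"distance_km": 1.0, "tb": {"Zn": 2.6, "Cd": 0.4}},
    {"distance_km": 6.0, "tb": {"Zn": 1.2, "Cd": 1.5}},
    {"distance_km": 12.0, "tb": {"Zn": 1.1, "Cd": 1.1}},
    {"distance_km": 20.0, "tb": {"Zn": 1.0, "Cd": 1.2}},
    {"distance_km": 30.0, "tb": {"Zn": 1.05, "Cd": 0.9}},
]
for el in ("Zn", "Cd"):
    near, far = median_tb_by_zone(profiles, el)
    print(f"{el}: near {near:.2f}, elsewhere {far:.2f}")
```

The median is robust to the few strongly enriched profiles, which is why a severe local contamination can coexist with only a modest difference between the zone medians.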

Fig. 8

The ratios of raw pseudo-total contents of risk elements in topsoil (depth of 0–40 cm) to subsoil (depth of 40–80 cm) stratified according to position with respect to the Ledvice power plant

3.5 Examination of soil profiles

Entire soil profiles were used to examine the provenance control of risk element contents in soils. Because PLS-DA showed an opposition between Zn content and carbonate content (Figs. 4A and B), an increase in risk elements associated with a decrease in Ca cannot be attributed to anthropogenic contamination. Risk element contents that increase simultaneously with Fe content represent only lithological effects, not contamination, as shown in Fig. 5. The Pb content reflects the contribution of material derived from soils transported from the Ore Mountains, which can be qualitatively evaluated from PC2 scores similar to those in Fig. 4A, but calculated from the element subset without risk elements to avoid circular argumentation.

Figure 9 shows depth profiles of selected element contents or their ratios in Ledvice village, 1 km from the power plant (Fig. S3). Two profiles were sampled there; their topsoil enrichments in Zn were the highest in our collection (Fig. 8). The profiles did not show much contribution from the Ore Mountains provenance; thus, natural factors are not expected to alter the depth profiles. The topsoil enrichment in Zn and Pb is typical of emission load, but the Cd depth profile is surprising: it shows a subsurface peak of almost 3 mg kg−1, apparently on top of the loess stratum with elevated carbonate content (Fig. 9, depth between 50 and 70 cm). The Ledvice profile is thus extremely contaminated not only by Zn (300 mg kg−1) but also by Cd; however, Cd has been washed down within the soil column, which weakens the surface contamination signal in the RKP monitoring. This subsurface Cd peak also limits the applicability of topsoil/subsoil (T/B) concentration ratios (Fig. 8). Cd has been translocated to deeper strata because of its relatively high mobility in soils (Liénard and Colinet 2016).
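Why a subsurface Cd peak weakens the T/B signal can be shown with a few lines of arithmetic. The Cd values below are invented and only loosely shaped like the profile described above (a peak approaching 3 mg kg−1 between 50 and 70 cm); they are not the measured Ledvice data, and the function names are hypothetical.

```python
from statistics import mean


def tb_ratio(samples, split_cm=40, bottom_cm=80):
    """Raw topsoil/subsoil ratio: mean above split vs split-bottom (by mid-depth)."""
    top = [v for d0, d1, v in samples if (d0 + d1) / 2 < split_cm]
    sub = [v for d0, d1, v in samples if split_cm <= (d0 + d1) / 2 < bottom_cm]
    return mean(top) / mean(sub)


def peak_interval(samples):
    """Depth interval (from_cm, to_cm) holding the maximum content."""
    d0, d1, _ = max(samples, key=lambda s: s[2])
    return d0, d1


# Invented Cd profile in 10-cm steps: (from_cm, to_cm, Cd mg/kg)
cd = [(0, 10, 0.9), (10, 20, 0.9), (20, 30, 0.8), (30, 40, 0.8),
      (40, 50, 1.2), (50, 60, 2.8), (60, 70, 2.5), (70, 80, 1.0)]
print("T/B =", round(tb_ratio(cd), 2))
print("peak at", peak_interval(cd), "cm")
```

Here the T/B ratio falls below 1 even though the profile as a whole carries an emission load; only sampling the whole profile reveals where the Cd actually resides.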

Fig. 9

Depth profile with total contents of lithogenic elements (Ca, PC2, and Fe) and pseudo-total contents of risk elements in Ledvice village

Figure S8 shows a profile situated near the boundary of hotspots 2 and 3 (Figs. 2 and 3; the position of the profile is shown in Fig. S3). This profile also documents the provenance control of risk elements at the very foot of the Ore Mountains slopes. The entire profile is characterised by high PC2 and Fe and low Ca, typical of the Ore Mountains (gneiss) provenance; in agreement with this, even the bottom soil in Jeníkov has very high As and Pb contents, and the profile shows little surface enrichment relative to deeper layers.

Figure S9 shows the profile with the largest As enrichment in topsoil relative to subsoil (Fig. 8). The profile is 6 km from the Ledvice power plant (Fig. S3), but it lies within the extent of washes from the Ore Mountains (Fig. 2). The increase in As (as well as Pb) towards the top is accompanied by a considerable increase in PC2 and Fe and a considerable decline in Ca, which can be attributed to a change in soil provenance rather than to anthropogenic contamination.

4 Discussion

4.1 Real impact of coal burning on soil contamination in the Most Basin

Identification of anthropogenic contamination in the Most Basin is complicated by the contribution of washes (colluvial and alluvial sediments) from the Ore Mountains, which have geogenically high As and Pb and elevated Zn relative to the Cretaceous sediments on the Most Basin floor (Figs. 4A and B) and relative to the Czech and global backgrounds (Vácha et al. 2002; Grygar et al. 2023; Pavlíková 2023). The minimal extent of those washes on the basin floor (dashed line in Figs. 2, 3 and S2) is a conservative estimate based on topsoil lithology from geological maps and basin topography; the real extent is likely larger because of the further spread of loess over the basin floor in the Pleistocene. Given the Ore Mountains provenance, the impact of power plant emissions on As and Pb in basin floor soils would be low compared with the geogenic contribution.

The impact of coal burning on basin floor soils is unequivocal for Hg, Cd, and Zn in hotspot 3 (Figs. 3, 9, S5 and S7) east of the Ledvice power plant. Topsoil contamination by Zn around the Ledvice power plant is also shown in Figs. 9 and 10, where topsoil enrichment relative to subsoil is given for the total Zn content and the total Zn/Fe ratio of the soil profiles sampled for this work. Interestingly, no such hotspot was found around the other power plants in the southeastern part of the Most Basin (the positions of the power plants are shown in Fig. 1), partly because not all power plants are surrounded by agricultural soils. Notably, the general topsoil contamination in the Most Basin is not dramatically larger than elsewhere in the target area (Fig. 8), except for the fields in the village of Ledvice itself (Fig. 10).

Fig. 10

The ratios of total Zn content (A) and total Zn/Fe (B) in topsoil (0–40 cm) to subsoil (depth 40–80 cm) plotted against the distance of the soil profile from the Ledvice power plant. The grey rectangle indicates the extent of Zn emission contamination. The probability P of the null hypothesis of no distance effect was obtained from Pearson correlation
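The P values in Figs. 6 and 10 come from Pearson correlation. The sketch below computes r and the corresponding t statistic with n − 2 degrees of freedom using only the Python standard library; obtaining P from t requires the Student t distribution (for example, scipy.stats.pearsonr returns r and P directly). The distance and T/B values are invented, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared with t on n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented data: distance to the power plant (km) and topsoil/subsoil Zn ratio
distance = [1, 1, 4, 6, 9, 12, 18, 25, 31]
zn_tb = [2.7, 2.4, 1.4, 1.2, 1.3, 1.1, 1.0, 1.1, 1.0]
r = pearson_r(distance, zn_tb)
print(f"r = {r:.2f}, t = {t_statistic(r, len(distance)):.2f}")
```

A negative r indicates that topsoil enrichment decreases with distance from the source, the pattern expected for an emission load.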

4.2 Importance of provenance and sampling depth

The element composition of soils depends on the soil-forming rocks (Galán et al. 2008; Amorosi et al. 2014; Vilà and Martínez-Lladó 2015; Négrel et al. 2021; Grygar et al. 2023). It might seem simple to post-stratify soil datasets according to the geological map and thus correct the geochemical data for soil provenance, but the real situation can be more complicated, as we show for the Most Basin. Basic geological maps, including the Czech Geological Map 1:50,000 used in this work, do not contain information on the provenance of loess and other Quaternary sediments that is relevant for soil composition (Grygar et al. 2023). Loess and other sediments formed in the Quaternary through alluvial and aeolian surface transport can be derived from anomalous rocks and can thus have anomalous risk element contents even without a clear relation to the actual bedrock, as was also observed elsewhere in Europe by Labaz et al. (2019). For full implementation of soil provenance effects on risk elements, highly detailed, specialised soil and stratigraphic maps reflecting all relevant RCFs must be used (Amorosi et al. 2014), but these are usually not available. The rational solution is thus to decipher local anomalies; e.g., in the Most Basin, PLS-DA of lithogenic elements can reveal the geochemically anomalous soil components derived from the Ore Mountains and from the Cretaceous sediments. Provenance changes within soil profiles can be identified by simple lithogenic element proxies for those anomalies (Fig. 4A; PC2 and Ca content, respectively) and their changes simultaneous with the risk element depth profiles, as shown in Figs. 9, S8, and S9. This requires expert knowledge instead of automated statistical routines, as any such routine can work reliably only if RCFs are taken into account.
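The idea of checking whether a risk element changes together with a provenance proxy can be sketched as a comparison of relative changes between the bottom and the top of a profile. Ca is used as the proxy here because relative changes of PC2 scores, which can be zero or negative, are not meaningful. The 20 % tolerance, the function names, and the data are hypothetical, and such a rule is no substitute for the expert inspection of whole depth profiles advocated above.

```python
def relative_change(top: float, bottom: float) -> float:
    """Relative change from the bottom (reference) to the top of a profile."""
    if bottom == 0:
        raise ValueError("bottom value must be non-zero")
    return (top - bottom) / bottom


def provenance_change_suspected(ca_top, ca_bottom, tolerance=0.2):
    """True if the Ca proxy changes by more than `tolerance` (hypothetical limit)."""
    return abs(relative_change(ca_top, ca_bottom)) > tolerance


def interpret(pb_top, pb_bottom, ca_top, ca_bottom):
    """Crude reading of a topsoil Pb increase in light of the Ca proxy."""
    if relative_change(pb_top, pb_bottom) <= 0:
        return "no topsoil increase"
    if provenance_change_suspected(ca_top, ca_bottom):
        return "increase may reflect provenance change"
    return "increase not explained by Ca proxy; examine further"


# Invented profiles (Pb in mg/kg, Ca in %)
print(interpret(80, 40, 1.0, 5.0))  # Ca drops by 80 % towards the top
print(interpret(80, 40, 2.1, 2.0))  # Ca nearly constant
```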

The top/bottom (T/B) approach can also fail to identify the emission load. Comparing the characteristics of topsoils and deeper soils is considered an efficient way to distinguish emission contamination from anomalous bedrock (Baize and Sterckeman 2001; Lienard and Colinet 2016; Fan et al. 2020; Pasieczna et al. 2020). The T/B approach has also been applied as a comparison of the ECDFs of the two data series (Fabian et al. 2017; Reimann et al. 2019a, b; Fan et al. 2020; Reimann and Fabian 2022). It relies on the limited vertical mobility of risk elements in soil profiles (Lienard and Colinet 2016). However, the choice of sampling depths can be more critical than might be expected. The thickness of topsoils contaminated by emissions is expected to be at most approximately 50 cm (Baize and Sterckeman 2001; Fan et al. 2020; Protano et al. 2021), depending on soil permeability (rate of eluviation), intensity of bioturbation, and depth of tillage. The soil profiles on loess in the Most Basin show that the emission input can be translocated to deeper strata (Fig. 9); here, shallow sampling would not fully reveal the past emission load of Cd. The sampling depth of subsoils should also reflect local conditions and be planned carefully. For example, Pasieczna et al. (2020) sampled subsoils at 80–100 cm, whereas Fan et al. (2020) and Zhang et al. (2020) sampled subsoils even deeper, at 150–180 cm. The approach can also be limited by the dissimilarity of biologically active soil horizons, the A horizon being most exposed to element translocation by plants (Reimann et al. 2005; Labaz et al. 2019). Soil profiles can include genetically distinct strata, in which subsoils are not adequate analogues of topsoils (Labaz et al. 2019). Generally, the deeper the reference soil sample, the more likely it is to have a different provenance from the topsoil. In one profile in the Most Basin, the soil provenance changed just below the ploughed horizon (Fig. S9); there, the T/B approach would produce misleading results unless provenance proxies were examined. In studies aimed at reliably deciphering anthropogenic impacts in contaminated soils, entire soil profiles should be sampled and interpreted individually against local reference profiles, as demonstrated by Baize and Sterckeman (2001) and Syu et al. (2023).
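The profile-by-profile screening argued for above can be sketched as follows: the T/B ratio of a risk element is computed together with the T/B ratio of a lithogenic provenance proxy (here Ca, as a stand-in), and profiles in which the proxy ratio departs strongly from 1 are flagged, because their risk element T/B ratio may reflect a provenance change rather than emission input. The function names, the tolerance of 0.25 natural-log units, and all values are hypothetical choices, not thresholds or data from this study.

```python
import numpy as np


def tb_ratio(top, bottom):
    """Topsoil/subsoil (T/B) ratio of an element content, profile by profile."""
    return np.asarray(top, dtype=float) / np.asarray(bottom, dtype=float)


def screen_profiles(risk_top, risk_bottom, proxy_top, proxy_bottom, tolerance=0.25):
    """Risk-element T/B ratios plus a mask of profiles where they are interpretable.

    A profile is flagged (mask False) when the T/B ratio of the lithogenic provenance
    proxy departs from 1 by more than `tolerance` natural-log units, i.e. when topsoil
    and subsoil probably differ in provenance. The default tolerance is hypothetical.
    """
    risk_tb = tb_ratio(risk_top, risk_bottom)
    proxy_tb = tb_ratio(proxy_top, proxy_bottom)
    same_provenance = np.abs(np.log(proxy_tb)) <= tolerance
    return risk_tb, same_provenance


# Hypothetical profiles (Zn in mg/kg, Ca in wt%), NOT data from this study.
zn_top, zn_bottom = [120, 95, 180, 150], [80, 90, 70, 60]
ca_top, ca_bottom = [0.9, 1.0, 1.1, 0.8], [1.0, 1.1, 4.0, 0.85]

zn_tb, ok = screen_profiles(zn_top, zn_bottom, ca_top, ca_bottom)
for k, (r, good) in enumerate(zip(zn_tb, ok), start=1):
    status = "interpretable" if good else "provenance change: check full profile"
    print(f"profile {k}: Zn T/B = {r:.2f} ({status})")
```

In this toy example the third profile has the highest Zn T/B ratio, but its Ca content also drops sharply towards the topsoil, so it is flagged for individual examination of the entire profile rather than read directly as emission load.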

4.3 Difference in using total and pseudo-total contents

The soil provenance in any study area can be deciphered from lithogenic elements such as Al, Si, Na, K, Ti, Fe, and Zr (Galán et al. 2008; Vilà and Martínez-Lladó 2015; Négrel et al. 2021). Acid digestion without HF, including AR, retrieves only a portion of the lithogenic elements, and this portion varies with the actual soil provenance (Vilà and Martínez-Lladó 2015; Négrel et al. 2021). The AR-extractable contents of lithogenic elements are therefore only weakly related to their total contents (Sect. 3.3). Lithogenic element contents proved indispensable for provenance tracing in the Most Basin (Fig. 4A); however, they have not yet been used in geochemical soil mapping or in the interpretation of local geochemical maps, despite considerable previous interest in NW Czech soils (Vácha et al. 2015; Bednářová et al. 2016; Skála et al. 2022; Boahen et al. 2023). Without analysis of lithogenic elements, misleading conclusions on the human impact on the Most Basin soils are inevitable.

Related topics in environmental geochemical data processing are geochemical normalisation, the construction of background functions (Fig. 5), and similar uses of relationships between risk and lithogenic elements (Fig. 10). Geochemical normalisation is an empirical way to reduce the natural variability of risk element contents in sediments (Covelli and Fontolan 1997; Grygar and Popelka 2016; Lučić et al. 2023). It is also efficient for soils derived from sedimentary rocks (Baize and Sterckeman 2001; Hamon et al. 2004; Bravo et al. 2019); however, its performance must always be checked empirically (Zhao et al. 2007; Grygar and Popelka 2016). Figure 10 shows the benefit of geochemical normalisation: two profiles with high topsoil Zn enrichment (Orasice 1 and Milešov 3) also show lithological differences between topsoils and subsoils. This lithological change was corrected by Fe normalisation, owing to the general association of Zn with Fe (Fig. 5), so the impact of the Ledvice power plant becomes clearer after normalisation (Fig. 10). The use of element ratios is also consistent with the fundamental requirements of compositional data processing (Greenacre 2019; Hron et al. 2021) and should thus be preferred from a mathematical perspective as well.
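A background function of the kind shown in Fig. 5 can be sketched as an ordinary least-squares line of Zn against Fe, fitted to reference samples assumed to be uncontaminated; enrichment is then the ratio of measured Zn to the Zn predicted at the same Fe content. The linear form, the choice of reference samples, the function names, and the numbers below are illustrative assumptions rather than the procedure used for Fig. 5. The simple Zn/Fe ratio corresponds to assuming a background line through the origin.

```python
import numpy as np


def fit_background(fe_ref, zn_ref):
    """Least-squares background line Zn = a + b * Fe from reference samples."""
    # np.polyfit returns coefficients from the highest power down: [slope, intercept]
    b, a = np.polyfit(np.asarray(fe_ref, float), np.asarray(zn_ref, float), deg=1)
    return a, b


def enrichment(fe, zn, a, b):
    """Measured Zn divided by the background Zn expected at the same Fe content."""
    expected = a + b * np.asarray(fe, dtype=float)
    return np.asarray(zn, dtype=float) / expected


# Hypothetical reference samples assumed uncontaminated (Fe in wt%, Zn in mg/kg).
fe_ref = [2.0, 2.5, 3.0, 3.5, 4.0]
zn_ref = [50.0, 60.0, 70.0, 80.0, 90.0]
a, b = fit_background(fe_ref, zn_ref)

# Two hypothetical test samples: one on the background line, one twice above it.
ef = enrichment([2.0, 3.0], [50.0, 140.0], a, b)
print(f"background: Zn = {a:.1f} + {b:.1f} * Fe; enrichment = {np.round(ef, 2)}")
```

An enrichment near 1 indicates a content explained by the lithogenic association alone; as the text stresses, whether such a background function actually performs well must be checked empirically for each dataset.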

However, geochemical normalisation of pseudo-total contents (Zhao et al. 2007; Pandey et al. 2016; Pasieczna et al. 2020), although frequently applied, is questionable. The interelement relationships in pseudo-total contents are weakened by scatter in element extractability. Geochemical normalisation and the difference between total and pseudo-total element contents in data evaluation are demonstrated in Fig. S10, where ECDFs of Zn and of Zn/Fe ratios for the soils derived from loess in the target area are compared. The purpose of geochemical normalisation is to “sharpen” the main population, and thus improve the identification of concentration anomalies, by correcting element contents for lithological variability and matrix dilution (Covelli and Fontolan 1997; Hamon et al. 2004; Grygar and Popelka 2016; Bravo et al. 2019; Lučić et al. 2023). This “sharpening” is, however, more efficient with total contents (left column in Fig. S10), which yield two clearly defined concentration populations and a gap separating the high outliers, as desired for successful data exploration (Sinclair 1976, 1991). The pseudo-total element concentrations do not produce such a clear data structure (right column in Fig. S10). If total contents of both risk and lithogenic elements were available from soil monitoring, contamination could be detected more sensitively and correctly, which is undoubtedly a major requirement for future environmental geochemistry studies.
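The effect of normalisation on data structure can be explored numerically with ECDFs. The sketch below uses synthetic (hypothetical) data in which Zn follows Fe with small scatter and 20 of 200 samples are enriched threefold; it compares the robust spread of log10 values and locates the largest gap in the sorted data for Zn and for Zn/Fe. The helper names and the median absolute deviation as a "sharpness" measure are assumptions; the sketch illustrates the "sharpening" idea only and does not reproduce Fig. S10.

```python
import numpy as np


def ecdf(values):
    """Empirical cumulative distribution function: sorted values and cumulative probabilities."""
    x = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p


def robust_log_spread(values):
    """Median absolute deviation of log10 values: a scale-free width of the main population."""
    logv = np.log10(np.asarray(values, dtype=float))
    return float(np.median(np.abs(logv - np.median(logv))))


def largest_gap(values):
    """Index (in sorted order) and width of the largest gap between consecutive log10 values."""
    logv = np.sort(np.log10(np.asarray(values, dtype=float)))
    gaps = np.diff(logv)
    i = int(np.argmax(gaps))
    return i, float(gaps[i])


# Synthetic data (hypothetical): Zn proportional to Fe with ~5 % lognormal scatter,
# the last 20 of 200 samples enriched threefold to mimic emission contamination.
rng = np.random.default_rng(0)
fe = rng.uniform(1.5, 4.5, size=200)                             # wt%
zn = 25.0 * fe * rng.lognormal(mean=0.0, sigma=0.05, size=200)   # mg/kg
zn[180:] *= 3.0

for label, v in (("Zn", zn), ("Zn/Fe", zn / fe)):
    x, p = ecdf(v)
    i, width = largest_gap(v)
    print(f"{label}: log10 MAD = {robust_log_spread(v):.3f}, "
          f"largest gap at ECDF p = {p[i]:.2f} (width {width:.3f} log10 units)")
```

With these synthetic inputs the Zn/Fe ratios form a narrow main population with a wide gap just above ECDF p = 0.90, whereas raw Zn spreads with Fe and the enriched samples overlap the background; in real pseudo-total data, variable extractability of Fe would blur this separation.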

5 Conclusion

Conventional soil mapping, such as RKP, which has been performed within the framework of Czech national legislation, has provided a valuable overview of soil contents of risk elements at large spatial scales, covering agricultural soils at affordable costs. Similarly, many other national soil and sediment mapping projects sample only topsoils or unstratified sediments and use acid extractions as in RKP. In the hotspots of soil risk elements found by such monitoring projects, particularly where industrial activities have been inherently related to local geogenic anomalies, more detailed sampling and holistic data mining must be employed to separate natural and anthropogenic controls of risk element contents. We have shown above how to perform sampling, analyses, and data mining to achieve this understanding: complete soil depth profiles must be sampled, equal attention must be paid to risk elements and lithogenic elements, and soil profiles must be evaluated individually with respect to the actual local conditions, such as provenance variability and contaminant translocation within soil profiles. The routinely obtained topsoil maps based on RKP or other common projects thus only provide the first step by revealing areas of further interest; then, a holistic approach to the individual target areas is needed, with methods tailored to the actual local situation. Reliable deciphering of anthropogenic contamination must be based on expert-based approaches and an understanding of local specificities. It is very likely that many areas considered anthropogenically contaminated in previous studies in fact reflect previously underestimated natural variability. The relevance of unbiased studies is growing in the current period of environmental concern, because overestimating anthropogenic impacts on soils could unjustifiably hamper industrial activities needed for a sustainable future of mankind.