Prediction of cadmium content using machine learning methods

Keçeci, Mehmet; Gökmen, Fatih; Usul, Mustafa; Koca, Celal; Uygur, Veli

doi:10.1007/s12665-024-11672-5


Original Article
Open access
Published: 27 May 2024

Volume 83, article number 362 (2024)

Environmental Earth Sciences


Abstract

Heavy metals are among the most environmentally hazardous pollutants in agricultural soils, threatening human health and several ecosystem services. Cadmium (Cd) is a highly toxic element that differs distinctly from other heavy metals in its high mobility in soil environments. The study aimed to evaluate the Cd concentration of soils in the Konya plain, with specific attention to soil fertilization, mainly with phosphorus fertilizers. A total of 538 surface (0–20 cm) soil samples were analyzed to determine basic physical and chemical properties and total phosphorus (P) and Cd concentrations. Descriptive statistics, machine learning, and regression models were used to assess the accumulation of Cd in soils. Decision Trees, Linear Regression, Random Forest, and XGBoost machine learning methods were used for Cd prediction. The XGBoost model proved to be the best prediction model, with a coefficient of determination of 98.1%. Electrical conductivity, pH, CaCO₃, silt, and P were used in the Cd estimation of the XGBoost model and explained 56.51% of the total variance in relation to measured soil properties. The results revealed that a machine learning algorithm could be useful for estimating Cd concentration in soils from basic physical and chemical soil properties.


Introduction

The presence of cadmium (Cd), a heavy metal, in soil poses a substantial threat to agricultural productivity. The increasing deposition of heavy metals in soil and the environment has been attributed to factors such as the burning of fossil fuels, mining operations, non-standard use of pesticides and fertilizers, release of municipal effluents, transportation, and other industrial processes (Vaverková et al. 2019). These activities have contributed to ecological imbalances. Industrialization and the adoption of intensive farming practices have accelerated the accumulation of heavy metals in agricultural regions, resulting in a range of soil-related problems including land desertification, soil pollution, reduced soil fertility, soil erosion, and salinization (Nosrati and Collins 2019; Zhang et al. 2020). Heavy metal contamination in soil poses distinct difficulties because of its persistence, irreversibility, restricted movement, high toxicity, and complex chemistry (Tsai et al. 2019; Dhaliwal et al. 2020; Zhao et al. 2023). Non-point heavy metal contamination in agricultural soils is mostly attributed to the extensive use of fertilizers and pesticides in agricultural production (De Souza et al. 2016; Kurwadkar 2019; Dong et al. 2020), in particular phosphorus and biologically derived fertilizers that contain these heavy metals (Chao et al. 2019). In addition, the misuse of various chemicals and the use of acidic fertilizers and pesticides in agricultural settings have caused soil acidification in numerous regions (Wang et al. 2019a; Xu et al. 2020), which increases the solubility and mobility of heavy metals.

Although new methods have been developed to reduce the heavy metal content of fertilizers, phosphorus fertilizers may still contain high amounts of heavy metals because of the composition of the phosphate rocks used in their production (Nicholson et al. 2003; Lambert et al. 2007; Nziguheba and Smolder 2008; Kacar 2013). Superphosphate, double superphosphate, triple superphosphate, and compound phosphorus fertilizers produced with phosphoric and sulphuric acid or by the wet combustion method carry higher potential hazards (Kacar 2013). In contrast, P fertilizers made with phosphoric acid produced by dry combustion contain negligible Cd (Kacar 2013; Lopez-Valdez and Fernandez-Luqueno 2014). Nitrogen and potassium fertilizers contain relatively less Cd (Senesi and Polemio 1981; Molina et al. 2009; Jiao et al. 2012; Benson et al. 2014).

While traditional statistical methods can provide descriptive explanations, regression approaches such as machine learning can describe the data set more meaningfully. The high heterogeneity of agricultural soils reduces the predictive performance of traditional regression models (Padarian et al. 2019). Unlike traditional regression models, machine learning models can handle multivariate and complex data structures. Machine learning has been used to predict environmental processes and assess pollutant toxicity from existing data (Gao et al. 2021; Yang et al. 2021). Cadmium poses a significant threat to ecosystem services, and competent tools such as machine learning are needed to model its behaviour in complex systems such as soils (Huang et al. 2023).

Previous studies have primarily focused on potential sources of Cd and other heavy metals in contaminated soils of urban areas (Yadav et al. 2019; Wang et al. 2019b; Vural et al. 2021), industrialized and mining areas (Koca 2019; Wanhong et al. 2020), agricultural lands (Chakraborty et al. 2019; Ozkan and Uygur 2019; Taspinar et al. 2022; Bayraklı et al. 2023; Gunal et al. 2023), and urban parks and public playgrounds (Luo et al. 2019; Yesil and Yesil 2019). Although there is little research on machine learning modelling specifically for Cd in soils (Huang et al. 2023), several studies have modelled heavy metals (Hu et al. 2020; Zhang et al. 2020, 2021; Zhao et al. 2023). Advances in machine learning techniques have substantially improved forecast precision. For instance, Zhang et al. (2020) found that support vector machine (SVM) and random forest (RF) models could effectively predict As, Zn, Pb, Hg, Ni, Cu, and Cr in urban soil. Zhang et al. (2021) examined As, Zn, Pb, Hg, Ni, and Cu in soil and found that RF models can effectively predict urban soil heavy metal levels. Hu et al. (2020) compared random forest, gradient-boosted machine, and generalized linear models. Zhao et al. (2023) employed ten machine learning methods to forecast the uptake of Cd, Hg, As, and Pb in rice grain; the extremely randomized trees model performed best for rice-Cd and rice-Hg, whereas the random forest model yielded the best results for As and Pb. As a heavy metal, Cd presents substantial hazards and shows a wider spectrum of pollution from agricultural practices, mainly fertilization. It is therefore crucial to examine its presence in soil systems in relation to routinely analysed soil properties. However, research employing machine learning approaches to forecast the concentration of available Cd in soil remains scarce.

Most traditional statistical methods for predictive modelling assume a Gaussian relationship between the response variable and the predictors. Most methods also assume that the effect of interactions between predictors on the response is constant across the whole range of predictor values. Neither assumption is justified in most circumstances. However, using a quadratic function to describe Gaussian responses in machine learning models has often yielded high prediction accuracy. Advances in computing and data science make it possible to obtain strong predictions with machine learning methods at multivariate and heterogeneous soil sites. Detailed data on Cd, the most dynamic and mobile of the heavy metals, in farmland soils of the Konya plain, the primary wheat-producing region of Türkiye, are still scarce at a large scale. In this study, we aimed to predict the Cd concentration of soils from basic soil properties using machine learning and regression models.

Material and methods

According to the Köppen classification, Konya province has a cold semi-arid climate (BSk), while the Trewartha classification categorizes it as temperate continental (Dc) (Peel et al. 2007; Yılmaz and Çiçek 2018). The mean temperature in summer is approximately 30 °C, with chilly nights, while the mean winter temperature is −4.2 °C. Annual precipitation is low (325 mm on average) and occurs primarily in winter and spring.

This study focuses on the Great Konya Basin, an important agricultural basin in Türkiye. The region is known for its diverse fruit production, particularly sweet cherry, apple, peach, and other fruits. Where irrigation water is available, the area is also suitable for cash crops such as corn and clover, whereas wheat and other drought-resistant cereals are grown in rain-fed areas (Fig. 1). The consistent use of commercial fertilizers, together with some organic fertilizers, has sustained orchard yields. However, because of freshwater scarcity in the region, a very large portion of the agricultural land is under a low-input rain-fed system. Therefore, drought-resistant cereals such as wheat and barley are the most common field crops in the area.

The altitude of the Great Konya Basin is approximately 1000 m. The basin has been shaped by the influx of multiple rivers, and lacustrine carbonate formation and diagenesis play a significant role in determining the physicochemical and mineralogical characteristics of its soils (de Meester 1971; Ozaytekin et al. 2012). The soils were primarily formed on Quaternary sediments (de Meester 1970a, b). These sediments were deposited in a shallow lake during the Late Pleistocene, forming several sandy beach ridges and sand plains. The diversity of these sediments led to the formation of different physiographic units, including uplands, colluvial slopes, piedmont plains, bajadas, terraces, alluvial plains, lacustrine plains, and soft-calcareous lake bottoms (de Meester 1970a, b). Flat Neogene limestone terraces lie on the periphery of the Konya Basin. The terrain slopes gently toward the central region, which is locally dissected by erosion gullies. The southern part of the basin is characterized by alluvial plains and fans composed of sediments ranging from coarse sand to clay (Roberts et al. 1979; Roberts 1983).

Soil sampling

The distribution of soil sampling locations within the study area is shown in Fig. 1. Soil samples were collected from the 0–20 cm depth and homogenized using standard soil ploughing techniques. A total of 538 soil samples were randomly collected (Fig. 2): 357 from agricultural fields (mainly cereals), 85 from irrigated corn fields and fruit orchards, 47 from non-agricultural areas, 41 from pastures, and 8 from inside the city (Fig. 1).

The soil samples were air-dried and then sieved through a 2 mm sieve. Physical and chemical attributes were determined with conventional techniques applicable to calcareous soils (Sparks 1996; Kacar 2009). Particle-size distribution (clay, silt, and sand fractions) was determined by the hydrometer method. Electrical conductivity (EC, used to calculate total soluble salts) and pH were measured in saturation paste with an EC probe and a combined pH electrode, respectively (Rhoades 1996). Cation exchange capacity (CEC) was determined by the sodium acetate (1 M, pH 8.2) method (Sommer and Miller 1996). Organic matter content was determined by the Walkley–Black dichromate oxidation method (Nelson and Sommer 1996), calcium carbonate content with a pressure calcimeter (Loeppert and Suarez 1996), and available phosphorus by the NaHCO₃ method (Kuo 1996). The soil samples were digested with aqua regia at a 3:1 volumetric ratio of HNO₃ to HCl. The concentrations of phosphorus (P) and cadmium (Cd) in the digests were then analyzed by inductively coupled plasma optical emission spectroscopy (ICP-OES) on a PerkinElmer Optima 2100 instrument, following the United States Environmental Protection Agency protocol (USEPA 1996).

Statistical procedures

The descriptive statistics of all soil properties were presented as mean, median, standard deviation, variance, skewness, kurtosis, range, minimum, maximum, and coefficient of variation (Table 1). The relationships between measured soil parameters were then determined using Spearman correlation.
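Spearman's coefficient is simply Pearson's correlation computed on ranks, which makes it robust to the skewed distributions typical of soil Cd data. The sketch below illustrates this, assuming tied values receive the average of their rank positions; the function names and sample values are hypothetical, and the study itself used R rather than this code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed variables."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical Cd (mg/kg) and clay (%) values, not data from this study.
    cd = [0.10, 0.25, 0.18, 0.30, 0.22]
    clay = [12, 30, 20, 35, 20]
    print(round(spearman(cd, clay), 3))
```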

Table 1 Descriptive statistics for measured soil properties


The data were first partitioned into a training set (70% of the data) and a testing set (30%). The models were trained on the 70% training portion of the measured soil properties, which was further split at random into learning (70%) and validation (30%) subsets. These subsets were used to train a variety of machine learning and regression models. With Cd as the response variable, the optimal parameters of each model were selected by a grid search over the parameter space. The final model was chosen based on the root mean squared error (RMSE), with preference given to the model with the lowest RMSE. All statistical analyses were conducted in R version 4.3.0 using RStudio (R Core Team 2023).
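The nested 70/30 partitioning and RMSE-based grid search can be sketched as follows. This is an illustration under stated assumptions, not the authors' R workflow: the data are synthetic, and a hypothetical one-predictor k-nearest-neighbour regressor (`knn_predict`) stands in for the real models so that there is a single hyperparameter (`k`) to tune.

```python
import math
import random


def split(rows, frac, rng):
    """Randomly split rows into two parts, with `frac` of the rows in the first part."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def knn_predict(train, x, k):
    """Hypothetical stand-in model: mean Cd of the k rows with the closest predictor value."""
    nearest = sorted(train, key=lambda r: abs(r[0] - x))[:k]
    return sum(r[1] for r in nearest) / len(nearest)


def grid_search(learn, valid, grid):
    """Return the hyperparameter value with the lowest validation RMSE, plus all scores."""
    scores = {}
    for k in grid:
        preds = [knn_predict(learn, x, k) for x, _ in valid]
        scores[k] = rmse([y for _, y in valid], preds)
    best = min(scores, key=scores.get)
    return best, scores


rng = random.Random(42)
# Synthetic (predictor, Cd) pairs; 538 rows only to mirror the sample count.
data = [(x, 0.1 + 0.002 * x + rng.gauss(0, 0.02))
        for x in (rng.uniform(0, 100) for _ in range(538))]

train, test = split(data, 0.70, rng)    # 70 % training, 30 % testing
learn, valid = split(train, 0.70, rng)  # training -> 70 % learning, 30 % validation
best_k, scores = grid_search(learn, valid, grid=[1, 3, 5, 10, 20])
# Refit on the full training set with the selected k, then score once on the test set.
test_rmse = rmse([y for _, y in test], [knn_predict(train, x, best_k) for x, _ in test])
print(len(train), len(test), len(learn), len(valid), best_k, round(test_rmse, 4))
```

The test set is touched only once, after the hyperparameter has been fixed on the validation subset; comparing several algorithms would repeat the grid search per algorithm and keep the one with the lowest RMSE.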

Machine learning algorithm

The multivariate adaptive regression splines (MARS) algorithm, introduced by Friedman (1991), is a nonparametric machine learning technique for classification and regression problems, especially for data with nonmonotonic or nonlinear characteristics. MARS builds a set of piecewise linear basis functions that together predict the values of the continuous variable of interest.
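The piecewise linear building blocks of MARS are hinge functions of the form max(0, x − t) and max(0, t − x) around a knot t. The sketch below only evaluates an already fitted additive (degree-1) MARS model; the forward selection and backward pruning that choose knots are not shown, and the coefficients, knots, and the helper names `hinge` and `mars_predict` are hypothetical.

```python
def hinge(x, knot, direction=+1):
    """MARS hinge basis: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(sample, intercept, terms):
    """Evaluate an additive MARS model.

    terms: list of (coefficient, variable_name, knot, direction) tuples.
    """
    return intercept + sum(c * hinge(sample[var], knot, d) for c, var, knot, d in terms)


# Hypothetical coefficients and knots, not values fitted in this study.
model_terms = [
    (0.05, "pH", 8.0, +1),   # slope of the response above pH 8.0
    (-0.02, "pH", 8.0, -1),  # response decreases as pH drops below 8.0
    (0.01, "EC", 1.5, +1),   # slope of the response above EC 1.5
]
sample = {"pH": 8.4, "EC": 2.5}
print(mars_predict(sample, 0.12, model_terms))
```

Because each hinge is zero on one side of its knot, the fitted response can change slope at the knot, which is how MARS captures nonlinear and nonmonotonic relationships with linear pieces.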

Decision trees are tree-based algorithms used to predict a quantitative feature (Ali et al. 2015). Breiman et al. (1984) developed the classification and regression trees (CART) procedure for this purpose. CART is a binary tree model in which each node is recursively divided into two child nodes. The algorithm iteratively partitions the learning dataset into increasingly homogeneous nodes, aiming to reduce the error variance across both the training and test sets.
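A single CART regression step can be illustrated as an exhaustive search for the threshold on one predictor that minimizes the summed squared error of the two child nodes. This sketch assumes the usual squared-error impurity; the recursion over nodes, the search across several predictors, and pruning are omitted, and the data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the node mean (regression impurity)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) that minimises left SSE + right SSE on one predictor."""
    pairs = sorted(zip(x, y))
    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_threshold = cost, threshold
    return best_threshold, best_cost


# Hypothetical clay (%) and Cd (mg/kg) values.
clay = [10, 12, 15, 30, 32, 35]
cd = [0.10, 0.11, 0.09, 0.25, 0.27, 0.26]
print(best_split(clay, cd))
```

Applying the same search recursively to each child node, and over every candidate predictor, yields the binary tree described above.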

The random forest approach is a nonparametric algorithm, first introduced by Breiman (2001), that can handle both classification and regression tasks. It combines many regression trees, each constructed with randomly selected predictors. Bootstrapping is used to draw a random sample of the observations for each tree, and only a random subset of the predictors is considered when splitting the nodes. As a result, each regression tree in the forest uses a different combination of predictors at its root and internal nodes. Svetnik et al. (2003) proposed that the mean of the values at the leaf nodes reached across all trees can be used as the predicted value of the dependent variable.
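The two sources of randomness (bootstrap samples of observations and random predictor subsets) and the averaging of tree outputs can be shown with a deliberately small forest. To stay short, this sketch assumes each tree is a one-split stump, so the predictor subset is drawn once per tree rather than at every node as in a full random forest; the rows and all names are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(rows, features):
    """Fit a one-split regression tree, searching only the given predictor indices."""
    ys = [y for _, y in rows]
    overall = sum(ys) / len(ys)
    best_cost, best_rule = sse(ys), (None, None, overall, overall)  # default: no split
    for f in features:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best_cost = cost
                best_rule = (f, t, sum(left) / len(left), sum(right) / len(right))
    f, t, left_mean, right_mean = best_rule
    return lambda x: left_mean if f is None or x[f] <= t else right_mean


def fit_forest(rows, n_trees, m_try, rng):
    """Grow n_trees stumps, each on a bootstrap sample with m_try random predictors."""
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]          # observations, with replacement
        feats = rng.sample(range(n_features), m_try)     # random predictor subset
        forest.append(fit_stump(boot, feats))
    return forest


def predict_forest(forest, x):
    """Random forest prediction: the mean of the individual tree predictions."""
    return sum(tree(x) for tree in forest) / len(forest)


rng = random.Random(0)
# Hypothetical rows: ([pH, clay %], Cd mg/kg); not data from this study.
rows = [([7.6 + 0.1 * i, 10 + 3 * i], 0.08 + 0.02 * i) for i in range(10)]
forest = fit_forest(rows, n_trees=50, m_try=1, rng=rng)
print(round(predict_forest(forest, [8.0, 22]), 3))
```

Averaging many trees grown on different resamples and predictor subsets reduces the variance of any single tree, which is the main motivation for the ensemble.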

XGBoost is an efficient and scalable implementation of the gradient boosting framework introduced by Friedman (2001). Like the decision tree algorithm, XGBoost is a tree-based regression method that uses the same kind of decision rules. Yu et al. (2020) state that XGBoost combines a collection of classification and regression trees into an expression that fits the training dataset well. Moreover, XGBoost can exploit sparsity and counter overfitting through shrinkage and regularization (Gertz et al. 2020). During training, XGBoost uses decision trees to separate observations into two groups at each split and to identify the variables that improve the model's performance. It also favours computational efficiency over the inclusion of superfluous variables (Gertz et al. 2020). Trees are added sequentially, each fitted to the remaining errors (gradients) of the current ensemble, so that bias is progressively reduced while shrinkage and regularization keep variance in check.
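The core boosting loop with shrinkage and an L2 penalty on leaf values can be sketched for squared-error loss, where every gradient is g = ŷ − y and every hessian is 1. Under that assumption the regularized leaf value is −G/(n + λ) and the split gain is ½[G_L²/(n_L + λ) + G_R²/(n_R + λ) − G²/(n + λ)] − γ. This is a minimal one-predictor sketch with depth-1 trees and hypothetical data, not the xgboost library, which adds deeper trees, column subsampling, sparsity handling, and other optimizations.

```python
def leaf_weight(g, lam):
    """Regularised optimal leaf value for squared loss (every hessian h_i = 1)."""
    return -sum(g) / (len(g) + lam)


def structure_score(g, lam):
    """G^2 / (H + lambda) term of the split-gain formula."""
    return sum(g) ** 2 / (len(g) + lam)


def fit_stump(x, g, lam, gamma):
    """Choose the split on x with the largest regularised gain; return a predictor."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    parent = structure_score(g, lam)
    best_gain, best = 0.0, None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue
        gl = [g[i] for i in order[:k]]
        gr = [g[i] for i in order[k:]]
        gain = 0.5 * (structure_score(gl, lam) + structure_score(gr, lam) - parent) - gamma
        if gain > best_gain:
            best_gain = gain
            best = ((lo + hi) / 2, leaf_weight(gl, lam), leaf_weight(gr, lam))
    if best is None:  # no split beats gamma: use a single leaf
        w = leaf_weight(g, lam)
        return lambda v: w
    t, w_left, w_right = best
    return lambda v: w_left if v <= t else w_right


def boost(x, y, n_rounds=50, eta=0.3, lam=1.0, gamma=0.0):
    """Gradient boosting for squared loss with shrinkage (eta) and L2 leaf penalty (lam)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        g = [p - target for p, target in zip(pred, y)]  # gradient of 0.5 * (pred - y)^2
        tree = fit_stump(x, g, lam, gamma)
        trees.append(tree)
        pred = [p + eta * tree(v) for p, v in zip(pred, x)]  # shrunken update
    return lambda v: base + eta * sum(tree(v) for tree in trees)


# Hypothetical single predictor and response with a step in the middle.
x = list(range(10))
y = [0.1] * 5 + [0.3] * 5
model = boost(x, y)
print(round(model(2), 3), round(model(7), 3))
```

Each round fits a new tree to the current residual errors, and the learning rate `eta` deliberately takes only part of each correction, which is the shrinkage mentioned above.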

Model comparison criteria

The metrics commonly used for model comparison include root mean squared error (RMSE), standard deviation ratio (SDR), coefficient of determination (R²), and coefficient of variation (CV) (Table 2). A good fit is indicated by low RMSE, CV, and SDR values and by high values of R² and the correlation coefficient (r). The goodness-of-fit criteria were computed with the ehaGoF package developed by Eyduran (2019).
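The comparison criteria can be computed directly from observed and predicted values. The definitions below are common textbook forms and are assumptions here: Table 2 and the ehaGoF package may differ in details such as sample versus population standard deviation or how CV is normalized. The function name and example values are hypothetical.

```python
import math
from statistics import mean, pstdev


def goodness_of_fit(actual, predicted):
    """Return RMSE, SDR, CV (%), R^2 and Pearson r under common textbook definitions."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in residuals)
    rmse = math.sqrt(ss_res / len(actual))
    sdr = pstdev(residuals) / pstdev(actual)  # SD of residuals / SD of observations
    cv = 100 * rmse / mean(actual)            # RMSE relative to the observed mean, in %
    ma, mp = mean(actual), mean(predicted)
    ss_tot = sum((a - ma) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    r = cov / math.sqrt(ss_tot * sum((p - mp) ** 2 for p in predicted))
    return {"RMSE": rmse, "SDR": sdr, "CV": cv, "R2": r2, "r": r}


# Hypothetical observed and predicted Cd values (mg/kg).
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.18, 0.31, 0.39]
print({k: round(v, 4) for k, v in goodness_of_fit(obs, pred).items()})
```

A perfect model gives RMSE = SDR = CV = 0 and R² = r = 1, which is why low values of the first three and high values of the last two indicate a good fit.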

Table 2 Assessing the Performance of the Models used to estimate Cd concentration of soils


Results and discussion

Correlation analysis was used to determine the relationship between Cd concentration in soils and soil properties, including EC, pH, total phosphorus (P), organic matter (OM), calcium carbonate equivalent (CaCO₃), available phosphorus (P₂O₅), cation exchange capacity (CEC), sand, silt, and clay (Fig. 3). The correlation test helps clarify how the variables affect Cd levels in soil and guides the selection of variables for the machine learning and regression models. Soil Cd concentration was significantly correlated (P < 0.01) with EC (r = 0.16), pH (r = 0.17), P (r = 0.26), P₂O₅ (r = 0.22), CEC (r = 0.13), silt (r = 0.16), clay (r = 0.22), CaCO₃ (r = −0.28), and sand (r = −0.26). All measured soil properties except organic matter were significantly correlated with Cd, which different mechanisms may explain. The positive relation between electrical conductivity and Cd concentration may be attributed to the relatively high mobility of both salts and Cd in the soil environment (Kabata-Pendias 2011). Moreover, irrigated agricultural lands have been under an increasing salinity threat (Usta 1999) and also require larger fertilizer inputs than the typical rain-fed management system. CEC and clay content are related to Cd sorption, which results in accumulation.
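A quick worked check shows why correlations as small as |r| = 0.13 reach P < 0.01 here: with a large sample, the test statistic t = r·√((n − 2)/(1 − r²)) grows with √n. This sketch assumes all 538 samples entered the analysis and uses the usual large-sample t approximation for a (rank) correlation; the critical value 2.585 is an approximation for 536 degrees of freedom, and the r values are those reported above. Significance therefore does not imply a strong association: even |r| = 0.28 corresponds to r² ≈ 0.08.

```python
import math


def t_statistic(r, n):
    """Large-sample t statistic for a correlation coefficient, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def critical_r(t_crit, n):
    """Smallest |r| that reaches the critical t value for a sample of size n."""
    return t_crit / math.sqrt(n - 2 + t_crit ** 2)


N = 538         # assumption: every sample entered the correlation analysis
T_CRIT = 2.585  # approximate two-sided critical t for P = 0.01 with 536 df
reported = {"EC": 0.16, "pH": 0.17, "P": 0.26, "P2O5": 0.22, "CEC": 0.13,
            "silt": 0.16, "clay": 0.22, "CaCO3": -0.28, "sand": -0.26}

for name, r in reported.items():
    print(f"{name}: r = {r:+.2f}, t = {t_statistic(r, N):+.2f}")
print(f"|r| needed for P < 0.01: {critical_r(T_CRIT, N):.3f}")
```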

The concentration of heavy metals in soils is influenced by various factors, including pH, soil organic matter, cation exchange capacity, and clay content (Alloway 2012). Chavez et al. (2015) examined the correlation between Cd and soil properties and found a strong association between Cd and clay, CEC, pH, P₂O₅, and P. According to Seshadri et al. (2016), the application of phosphorus fertilizers has been identified as a significant contributor to soil Cd contamination. The present finding that EC, pH, P₂O₅, P, CEC, silt, and clay are positively correlated with Cd is consistent with the existing literature. Cd reacts with the lime present in the environment to form an insoluble compound (Sø et al. 2011). According to Cattani et al. (2008), the incorporation of CaCO₃ resulted in a 25% decrease in the Cd content of rice. The negative relationship between sand content and Cd concentration can be attributed to the smaller surface area of sand and its ease of leaching. The negative correlation with lime content, in turn, is presumably linked to a relative decrease in the proportion of soil components with high sorption capacity, such as clay and sesquioxide minerals, and to the incorporation of raw parent material in shallow soils on hilly topography (de Meester 1971; Ozaytekin et al. 2012).

Chemometric relations of cadmium

The performance of the MARS, decision tree, linear regression, random forest, and XGBoost models is given in Table 3. Models can be used successfully for prediction when the RMSE and SDR values are close to zero (Grzesiak and Zaborski 2012), the coefficient of variation (CV) is less than 10% (Wilding et al. 1994), and the coefficient of determination (R²) approaches 100%. By these criteria, the XGBoost algorithm emerged as a dependable machine learning technique for forecasting Cd concentration (Table 3). The XGBoost model showed a strong positive correlation between predicted and actual values (Fig. 4). Its very high coefficient of determination (0.981) compared with the other machine learning models indicates excellent predictive ability (Fig. 4, Table 3).

Table 3 Model performance assessment of relationships between soil properties and Cd


Figure 5 presents the impact of the explanatory variables used in the XGBoost model on Cd. EC (12.77%) and pH (11.78%), parameters related to solubility and equilibrium in soils, are important predictors of Cd in the XGBoost model. Available phosphorus (P₂O₅), which increases with the application of phosphorus fertilizers that are used to sustain agricultural production and contain trace amounts of Cd, is also important in estimating Cd (11.30%). In the study area, a former lake bottom, the physical parameters silt (10.25%) and CaCO₃ (10.25%) are also important. Parameters such as EC, pH, silt, and CaCO₃, which reflect the genetic characteristics of soils, play a dominant role in Cd estimation. At the same time, available phosphorus and total P, which are related to fertilization and controlled by anthropogenic factors, together account for 21.19% and rank first in Cd estimation. These five parameters together accounted for 56.51% of the variance in the estimation of Cd by the XGBoost model. Topography could also contribute significantly as a driving force, but it was not considered in this study.
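The paper does not state which importance measure underlies Fig. 5. One common convention for tree ensembles, assumed here, is to sum the loss reduction (gain) of every split made on each predictor and express it as a percentage of the total, so that the shares of all predictors add up to 100% and the combined share of a group of predictors is the sum of their individual shares. The split records and the function name below are hypothetical.

```python
def relative_importance(split_gains):
    """Total the gain of every split on each feature and express it as % of all gain."""
    totals = {}
    for feature, gain in split_gains:
        totals[feature] = totals.get(feature, 0.0) + gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {feature: 100 * gain / grand_total for feature, gain in ranked}


# Hypothetical (feature, gain) records collected from the splits of a boosted ensemble.
splits = [("EC", 4.0), ("pH", 3.0), ("EC", 1.0), ("silt", 2.0)]
importance = relative_importance(splits)
print(importance)
# Combined share of a group of features, e.g. the two highest-ranked ones:
print(sum(list(importance.values())[:2]))
```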

Machine learning, which has begun to be used for modelling natural ecosystems, has not yet gained the prominence it merits in soil science. Tırınk and Öztürk (2023) used MARS and XGBoost algorithms to predict PM10 particulate matter. Faraz et al. (2023) estimated the body weight of sheep using MARS and XGBoost algorithms. Demirel et al. (2023) estimated wheat grain yield using MARS and CHAID algorithms. Eren et al. (2023) modelled DNA methylation of wheat under in vitro conditions using machine learning methods. A literature search on the Web of Science platform on May 16, 2023, retrieved 19 scientific papers concerning XGBoost in soil science. Most of these publications built models from data acquired by spectroscopic techniques, such as VNIRS, and/or from remote sensing (Andrade et al. 2020; Mahmoudzadeh et al. 2020; Abedi et al. 2021; Ågren et al. 2021; Nguyen et al. 2021; Xu et al. 2022). Publications found through open-access online searches are broadly similar in subject matter (Chen et al. 2022; Ge et al. 2021; Dhaliwal et al. 2022; Ye et al. 2021; Budak et al. 2023). Budak et al. (2023) demonstrated that boosting algorithms produced the most robust estimates. Similarly, Gokmen et al. (2023) used XGBoost with a similar approach to estimate the available boron content of soils, achieving high reliability (R² = 0.997). Huang et al. (2023) investigated the performance of various models under different constraints and found that the post-constraint XGBoost model predicted the concentration of soil available Cd with high accuracy (R² = 0.81) while keeping the predicted values within the theoretical boundaries. The explanatory variables with the greatest impact in the XGBoost model of Cd concentration are mainly those that were significant in the correlation analysis. The XGBoost regression model is therefore expected to estimate total cadmium in soils with high reliability.

Conclusions

This study tested several machine learning and regression models that provide a new perspective on the relationship between Cd levels and soil properties. The developed XGBoost model is expected to help estimate Cd levels, assess economic and environmental impacts in a variety of local settings, and guide decision-making by environmental policymakers with a relatively limited number of soil tests. For farmers, this study suggests that phosphorus fertilization should be done with caution, taking into account solubility and equilibrium relationships based on easily measurable soil parameters such as pH and EC. For legislators and administrators, phosphorus fertilization strategies should be examined more carefully when national fertilization programs are prepared. Especially in arid and semi-arid areas, fertilization in excess of crop needs should be avoided. The present research evaluated the Cd levels of Konya Plain soils and showed that the XGBoost algorithm estimated soil Cd concentration from measured soil properties better than the other methods tested. These findings from the use of XGBoost in this setting can guide further studies and research planning in environmental and interdisciplinary fields. Because soil properties, topography, agronomic management, and cover crops vary widely, it should be investigated whether this approach can be applied to other ecosystems of different sizes, crops, and varieties to improve prediction accuracy.

Data and code availability

To access the data and code, please contact the author Fatih Gökmen.

References

Abedi F, Amirian-Chakan A, Faraji M, Taghizadeh-Mehrjardi R, Kerry R, Razmjoue D, Scholten T (2021) Salt dome related soil salinity in southern Iran: prediction and mapping with averaging machine learning models. Land Degrad Dev 32(3):1540–1554
Ågren AM, Larson J, Paul SS, Laudon H, Lidberg W (2021) Use of multiple LIDAR-derived digital terrain indices and machine learning for high-resolution national-scale soil moisture mapping of the Swedish forest landscape. Geoderma 404:115280
Ali M, Eyduran E, Tariq MM, Tirink C, Abbas F, Bajwa MA et al (2015) Comparison of artificial neural network and decision tree algorithms used for predicting live weight at post weaning period from some biometrical characteristics in Harnai sheep. Pakistan J Zool 47(6):1579–1585
Alloway BJ (ed) (2012) Heavy metals in soils: trace metals and metalloids in soils and their bioavailability, vol 22. Springer Science & Business Media
Andrade R, Silva SHG, Weindorf DC, Chakraborty S, Faria WM, Mesquita LF et al (2020) Assessing models for prediction of some soil chemical properties from portable X-ray fluorescence (pXRF) spectrometry data in Brazilian Coastal Plains. Geoderma 357:113957
Anonymous (2024) Corine maps database. https://corinecbs.tarimorman.gov.tr/corine. Accessed 09.01.2024
Bayraklı B, Dengiz O, Özyazıcı MA, Koç Y, Kesim E, Türkmen F (2023) Assessment of heavy metal concentrations and behavior in cultivated soils under humid-subhumid environmental condition of the Black Sea region. Geoderma Reg 32:e00593
Benson NU, Anake WU, Etesin UM (2014) Trace metals levels in inorganic fertilizers commercially available in Nigeria. J Sci Res 3(4):610–620
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Chapman and Hall, London
Budak M, Günal E, Kılıç M, Çelik İ, Sırrı M, Acir N (2023) Improvement of spatial estimation for soil organic carbon stocks in Yuksekova plain using Sentinel 2 imagery and gradient descent–boosted regression tree. Environ Sci Pollut Res 30(18):53253–53274
Cattani I, Romani M, Boccelli R (2008) Effect of cultivation practices on cadmium concentration in rice grain. Agron Sustain Dev 28:265–271
Chakraborty P, Sampath S, Mukhopadhyay M, Selvaraj S, Bharat GK, Nizzetto L (2019) Baseline investigation on plasticizers, bisphenol A, polycyclic aromatic hydrocarbons and heavy metals in the surface soil of the informal electronic waste recycling workshops and nearby open dumpsites in Indian metropolitan cities. Environ Pollut 248:1036–1045
Chao Z, Zhi-Xing G, Yu-Zhi Y, Min C, Xue-Ying L, Ru-Tian B (2019) Spatial and temporal changes of farmland soil acidification and their influencing factors in different regions of Guangdong Province, China. Ying yong sheng tai xue bao = J Appl Ecol 30(2):593–601
Chavez E, He ZL, Stoffella PJ, Mylavarapu RS, Li YC, Moyano B, Baligar VC (2015) Concentration of cadmium in cacao beans and its relationship with soil cadmium in southern Ecuador. Sci Total Environ 533:205–214
Chen Y, Ma L, Yu D, Zhang H, Feng K, Wang X, Song J (2022) Comparison of feature selection methods for mapping soil organic matter in subtropical restored forests. Ecol Ind 135:108545
de Meester T (1970a) Soil map of the Great Konya Basin. Agricultural University, Wageningen
de Meester T (1970b) Soils of the Great Konya Basin, Turkey. Agricultural Resource Report 740, pp290. Agricultural University, Wageningen
de Meester T (1971) Highly calcareous lacustrine soils in the Great Konya Basin, Turkey. Wageningen University and Research
De Souza RV, Garbossa LHP, Campos CJA, Vianna LDN, Vanz A, Rupp GS (2016) Metals and pesticides in commercial bivalve mollusc production areas in the North and South Bays, Santa Catarina (Brazil). Mar Pollut Bull 105(1):377–384
Demirel F, Eren B, Yilmaz A, Türkoğlu A, Haliloğlu K, Niedbała G et al (2023) Prediction of grain yield in wheat by CHAID and MARS Algorithms Analyses. Agronomy 13(6):1438
Dhaliwal SS, Singh J, Taneja PK, Mandal A (2020) Remediation techniques for removal of heavy metals from the soil contaminated through different sources: a review. Environ Sci Pollut Res 27:1319–1333
Dhaliwal JK, Panday D, Saha D, Lee J, Jagadamma S, Schaeffer S, Mengistu A (2022) Predicting and interpreting cotton yield and its determinants under long-term conservation management practices using machine learning. Comput Electron Agric 199:107107
Dong W, Zhang Y, Quan X (2020) Health risk assessment of heavy metals and pesticides: a case study in the main drinking water source in Dalian, China. Chemosphere 242:125113
Eren B, Türkoğlu A, Haliloğlu K, Demirel F, Nowosad K, Özkan G, Niedbała G, Pour-Aboughadareh A, Bujak H, Bocianowski J (2023) Investigation of the influence of polyamines on mature embryo culture and DNA methylation of wheat (Triticum aestivum L.) using the machine learning algorithm method. Plants 12:3261
Eyduran E (2019) ehaGoF: Calculates goodness of fit statistics. R package version 0.1.0. https://CRAN.R-project.org/package=ehaGoF
Faraz A, Tırınk C, Önder H, Şen U, Ishaq HM, Tauqir NA et al (2023) Usage of the XGBoost and MARS algorithms for predicting body weight in Kajli sheep breed. Trop Anim Health Prod 55(4):276. https://doi.org/10.1007/s11250-023-03700-6
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19(1):1–67
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Gao F, Shen Y, Sallach JB, Li H, Liu C, Li Y (2021) Direct prediction of bioaccumulation of organic contaminants in plant roots from soils with machine learning models based on molecular structures. Environ Sci Technol 55(24):16358–16368
Ge X, Ding J, Jin X, Wang J, Chen X, Li X et al (2021) Estimating agricultural soil moisture content through UAV-based hyperspectral images in the arid region. Remote Sens 13(8):1562
Gertz M, Große-Butenuth K, Junge W, Maassen-Francke B, Renner C, Sparenberg H, Krieter J (2020) Using the XGBoost algorithm to classify neck and leg activity sensor data using on-farm health recordings for locomotor-associated diseases. Comput Electron Agric 173:105404
Gökmen F, Uygur V, Sukuşu E (2023) Extreme gradient boosting regression model for soil available boron. Eurasian Soil Sci. https://doi.org/10.1134/S1064229322602128
Grzesiak W, Zaborski D (2012) Examples of the use of data mining methods in animal breeding. In: Data mining applications in engineering and medicine, pp 303–324
Günal E, Budak M, Kılıç M, Cemek B, Sırrı M (2023) Combining spatial autocorrelation with artificial intelligence models to estimate spatial distribution and risks of heavy metal pollution in agricultural soils. Environ Monit Assess 195(2):317
Hu B, Xue J, Zhou Y, Shao S, Fu Z, Li Y, Shi Z (2020) Modelling bioaccumulation of heavy metals in soil-crop ecosystems and identifying its controlling factors using machine learning. Environ Pollut 262:114308
Huang J, Fan G, Liu C, Zhou D (2023) Predicting soil available cadmium by machine learning based on soil properties. J Hazard Mater 460:132327
Jiao W, Chen W, Chang AC, Page AL (2012) Environmental risks of trace elements associated with long-term phosphate fertilizers applications: a review. Environ Pollut 168:44–53
Kabata-Pendias A (2011) Trace elements in soils and plants. CRC Press
Kacar B (2009) Toprak analizleri [Soil analyses]. Nobel Yayın, Ankara
Kacar B (2013) Temel gübre bilgisi [Basic fertilizer knowledge]. Nobel Yayın, Ankara
Koca YK (2019) Evaluation of environmental pollutant effect of thermal power plant in soil pollution regulations. KSU J Agric Nat 22(Suppl 1):148–153
Kuo S (1996) Phosphorus. Methods Soil Anal 5:870–920
Kurwadkar S (2019) Occurrence and distribution of organic and inorganic pollutants in groundwater. Water Environ Res 91(10):1001–1008
Lambert R, Grant C, Sauvé S (2007) Cadmium and zinc in soil solution extracts following the application of phosphate fertilizers. Sci Total Environ 378(3):293–305
Loeppert RH, Suarez DL (1996) Carbonate and gypsum. Methods Soil Anal 5:438–474
Lopez-Valdez F, Fernandez-Luqueño F (2014) Fertilizers: components, uses in agriculture and environmental impacts. Nova Science Publishers, Inc
Luo P, Xiao X, Han X, Ma Y, Sun X, Jiang J, Wang H (2019) Application of different single extraction procedures for assessing the bioavailability of heavy metal(loid)s in soils from overlapped areas of farmland and coal resources. Environ Sci Pollut Res 26:14932–14942
Mahmoudzadeh H, Matinfar HR, Taghizadeh-Mehrjardi R, Kerry R (2020) Spatial prediction of soil organic carbon using machine learning techniques in western Iran. Geoderma Reg 21:e00260
Molina M, Aburto F, Calderón R, Cazanga M, Escudey M (2009) Trace element composition of selected fertilizers used in Chile: phosphorus fertilizers as a source of long-term soil contamination. Soil and Sediment Contam 18(4):497–511
Nelson DW, Sommers LE (1996) Total carbon, organic carbon, and organic matter. Methods Soil Anal 5:962–1010
Nguyen TG, Tran NA, Vu PL, Nguyen QH, Nguyen HD, Bui QT (2021) Salinity intrusion prediction using remote sensing and machine learning in data-limited regions: a case study in Vietnam’s Mekong Delta. Geoderma Reg 27:e00424
Nicholson FA, Smith SR, Alloway BJ, Carlton-Smith C, Chambers BJ (2003) An inventory of heavy metals inputs to agricultural soils in England and Wales. Sci Total Environ 311(1–3):205–219
Nosrati K, Collins AL (2019) A soil quality index for evaluation of degradation under land use and soil erosion categories in a small mountainous catchment, Iran. J Mountain Sci 16(11):2577–2590
Nziguheba G, Smolders E (2008) Inputs of trace elements in agricultural soils via phosphate fertilizers in European countries. Sci Total Environ 390(1):53–57
Ozaytekin HH, Mutlu HH, Dedeoglu M (2012) Soil formation on a calcic chronosequence of Ancient Lake Konya in Central Anatolia, Turkey. J Afr Earth Sci 76:66–74
Ozkan A, Uygur V (2019) Determination of heavy metal concentrations in agricultural lands of Amik Plain with MP-AES. Fresenius Environ Bull 28(1):416–425
Padarian J, Minasny B, McBratney AB (2019) Machine learning and soil sciences: A review aided by machine learning tools. Soil 6(1):35–52
Peel MC, Finlayson BL, McMahon TA (2007) Updated world map of the Köppen-Geiger climate classification. Hydrol Earth Syst Sci 11(5):1633–1644
R Core Team (2023) R: A language and environment for statistical computing, version 4.3.0. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Rhoades JD (1996) Salinity: electrical conductivity and total dissolved solids. Methods Soil Anal 5:417–435
Roberts N (1983) Age, palaeoenvironments, and climatic significance of late Pleistocene Konya Lake, Turkey. Quat Res 19(2):154–171
Senesi N, Polemio M (1981) Trace element addition to soil by application of NPK fertilizers. Fertil Res 2:289–302. https://doi.org/10.1007/BF01050200
Seshadri B, Bolan NS, Wijesekara H, Kunhikrishnan A, Thangarajan R, Qi F et al (2016) Phosphorus–cadmium interactions in paddy soils. Geoderma 270:43–59
Sø HU, Postma D, Jakobsen R, Larsen F (2011) Sorption of phosphate onto calcite; results from batch experiments and surface complexation modeling. Geochim Cosmochim Acta 75(10):2911–2923
Sparks DL (1996) Methods of soil analysis, Part 3: Chemical methods. Soil Sci Soc Am Book Series 5
Sumner ME, Miller WP (1996) Cation exchange capacity and exchange coefficients. Methods Soil Anal 5:1201–1229
Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43(6):1947–1958
Taşpınar K, Ateş Ö, Özge Pınar M, Yalçın G, Kızılaslan F, Fidantemiz YF (2022) Soil contamination assessment and potential sources of heavy metals of Alpu plain, Eskişehir, Turkey. Int J Environ Health Res 32(6):1282–1290
Tırınk S, Öztürk B (2023) Evaluation of PM10 concentration by using MARS and XGBoost algorithms in Iğdır Province of Türkiye. Int J Environ Sci Technol 20(5):5349–5358
Tsai MS, Chen MH, Lin CC, Liu CY, Chen PC (2019) Children’s environmental health based on birth cohort studies of Asia (2)–air pollution, pesticides, and heavy metals. Environ Res 179:108754
USEPA (1996) Guidance on use of modeled results to demonstrate attainment of the ozone NAAQS. EPA-454/B-95-007, June 1996. http://www.epa.gov/ttn/scram/ (file name: "O3TEST")
Vural H, Meral A, Şenyiğit Doğan S (2021) Changes in the heavy metal levels in highway landscaping and protective effect of vegetative materials. Appl Environ Soil Sci 2021:1–9
Wang P, Sun Z, Hu Y, Cheng H (2019a) Leaching of heavy metals from abandoned mine tailings brought by precipitation and the associated environmental impact. Sci Total Environ 695:133893
Wang S, Cai LM, Wen HH, Luo J, Wang QS, Liu X (2019b) Spatial distribution and source apportionment of heavy metals in soil from a typical county-level city of Guangdong Province, China. Sci Total Environ 655:92–101
Wanhong L, Fang L, Fan W, Maiqi D, Tiansen L (2020) Industrial water pollution and transboundary eco-compensation: analyzing the case of Songhua River Basin, China. Environ Sci Pollut Res 27:34746–34759
Wilding LP, Bouma J, Goss DW (1994) Impact of spatial variability on interpretive modeling. Quant Model Soil Forming Process 39:61–75
Xu D, Carswell A, Zhu Q, Zhang F, de Vries W (2020) Modelling long-term impacts of fertilization and liming on soil acidification at Rothamsted experimental station. Sci Total Environ 713:136249
Xu S, Zhao Y, Wang M, Shi X (2022) A comparison of machine learning algorithms for mapping soil iron parameters indicative of pedogenic processes by hyperspectral imaging of intact soil profiles. Eur J Soil Sci 73(1):e13204
Vaverková MD, Maxianová A, Winkler J, Adamcová D, Podlasek A (2019) Environmental consequences and the role of illegal waste dumps and their impact on land degradation. Land Use Policy 89:104234
Yadav IC, Devi NL, Singh VK, Li J, Zhang G (2019) Spatial distribution, source analysis, and health risk assessment of heavy metals contamination in house dust and surface soil from four major cities of Nepal. Chemosphere 218:1100–1113
Yang H, Huang K, Zhang K, Weng Q, Zhang H, Wang F (2021) Predicting heavy metal adsorption on soil with machine learning and mapping global distribution of soil adsorption capacities. Environ Sci Technol 55(20):14316–14328
Ye Z, Sheng Z, Liu X, Ma Y, Wang R, Ding S et al (2021) Using machine learning algorithms based on GF-6 and Google Earth Engine to predict and map the spatial distribution of soil organic matter content. Sustainability 13(24):14055
Yesil P, Yesil M (2019) Heavy metal pollution in children’s playgrounds in Ordu, Turkey. Fresenius Environ Bull 28:5090–5098
Yılmaz E, Çiçek İ (2018) Detailed Köppen-Geiger climate regions of Turkey (Türkiye’nin detaylandırılmış Köppen-Geiger iklim bölgeleri). J Human Sci 15(1):225–242
Yu X, Wang Y, Wu L, Chen G, Wang L, Qin H (2020) Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting. J Hydrol 582:124293
Zhang H, Yin S, Chen Y, Shao S, Wu J, Fan M, Gao C (2020) Machine learning-based source identification and spatial prediction of heavy metals in soil in a rapid urbanization area, eastern China. J Clean Prod 273:122858
Zhang H, Yin A, Yang X, Fan M, Shao S, Wu J, Gao C (2021) Use of machine-learning and receptor models for prediction and source apportionment of heavy metals in coastal reclaimed soils. Ecol Ind 122:107233
Zhao B, Zhu W, Hao S, Hua M, Liao Q, Jing Y, Gu X (2023) Prediction heavy metals accumulation risk in rice using machine learning and mapping pollution risk. J Hazard Mater 448:130879


Funding

Open access funding provided by the Scientific and Technological Research Council of Türkiye (TÜBİTAK). The authors thank TAGEM-TOPRAK SU for its financial support of the project (reference no. 2012/148).

Author information

Authors and Affiliations

Fertilizer and Water Resources Research Institute, Yenimahalle, Ankara, Turkey
Mehmet Keçeci & Celal Koca
Department of Soil Science and Plant Nutrition, Agricultural Faculty, Iğdır University, Iğdır, Turkey
Fatih Gökmen
Ministry of Agriculture and Forestry, Çankaya, Ankara, Turkey
Mustafa Usul
Department of Soil Science and Plant Nutrition, Agricultural Faculty, Isparta University of Applied Sciences, Isparta, Turkey
Veli Uygur


Contributions

The authors confirm their contributions to the paper as follows: study conception, design, and data collection: Mehmet Keçeci, Mustafa Usul, and Celal Koca; analysis and interpretation of results and draft manuscript preparation: Veli Uygur and Fatih Gökmen. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Fatih Gökmen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Keçeci, M., Gökmen, F., Usul, M. et al. Prediction of cadmium content using machine learning methods. Environ Earth Sci 83, 362 (2024). https://doi.org/10.1007/s12665-024-11672-5


Received: 20 June 2023
Accepted: 11 May 2024
Published: 27 May 2024
DOI: https://doi.org/10.1007/s12665-024-11672-5


Introduction