Abstract
Purpose
Accurate regional-scale soil contamination assessment must account for the great spatial heterogeneity of soil environments. Although numerous methods exist for distinguishing natural from anthropogenic element contents, few studies address on-site determination, and few site-specific, sensitive references are available. In this study, the site background concentration is estimated as an on-site reference for soil contamination assessment.
Materials and methods
Here, a support vector machine (SVM) is used to predict the site background concentration from nine factors that influence soil formation. Three machine-learning algorithms, considered efficient at solving optimization problems, are used to select the optimal SVM parameters: (1) a grid search algorithm, (2) a genetic algorithm, and (3) a particle swarm optimization algorithm.
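As an illustration of the third parameter-selection strategy, the following is a minimal particle swarm optimization sketch in plain Python. It is not the authors' implementation. The abstract does not state which SVM parameters were tuned, so the two-dimensional search space (log2 of a penalty parameter C and log2 of an RBF kernel width gamma), the swarm settings, and the function names `pso_minimize` and `toy_cv_error` are all assumptions made for this example. The objective is a made-up quadratic surface standing in for a cross-validation error; in practice it would be replaced by a function that trains an SVM and returns its validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(params):
    """Hypothetical stand-in for a cross-validation error surface.

    Its minimum is placed arbitrarily at log2(C) = 2, log2(gamma) = -3.
    """
    log2_c, log2_gamma = params
    return (log2_c - 2.0) ** 2 + (log2_gamma + 3.0) ** 2


# Assumed search ranges for log2(C) and log2(gamma).
search_bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best_params, best_err = pso_minimize(toy_cv_error, search_bounds)
print("best log2(C), log2(gamma):", best_params, "error:", best_err)
```

A grid search would evaluate the same objective on a fixed lattice of parameter values, and a genetic algorithm would evolve a population by selection, crossover, and mutation; all three only need a function that maps candidate parameters to an error value.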
Results and discussion
Model performance was evaluated using the squared correlation coefficient and the root-mean-square error. Subsequent application of the models to soil contamination assessment demonstrated that indiscriminately applying a single reference to all soil types in an environmental site assessment may result in under- or over-estimation of contamination. These problems are likely to be resolved by using predicted site background concentrations to distinguish contaminated from uncontaminated regions.
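The two evaluation metrics named above can be computed as in the following minimal sketch. The function names and the sample values are invented for illustration and are not data from the study. The squared correlation coefficient is taken here as the square of the Pearson correlation between observed and predicted values, which is an assumption about the exact definition used.

```python
import math


def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up values: every prediction is offset from the observation by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

r2 = squared_correlation(observed, predicted)
err = rmse(observed, predicted)
print("squared correlation:", r2)
print("RMSE:", err)
```

In this made-up case the predictions are perfectly correlated with the observations yet uniformly biased, so the squared correlation is at its maximum while the RMSE is not zero. Reporting both metrics therefore captures something that either one alone would miss.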
Conclusions
We conclude that an SVM based on the factors that influence soil formation is an effective method for predicting site background concentration and that it substantially improved the suitability of background references at our study site. With slight modifications, this approach could be applied to other regions and soil types.
Acknowledgments
This study was supported by the Specific Research of the Public Service on Environmental Protection in China (No. 201509031) and the National Natural Science Foundation of China (No. 41303069).
Additional information
Responsible editor: Jun Zhou
Cite this article
Wu, J., Teng, Y., Chen, H. et al. Machine-learning models for on-site estimation of background concentrations of arsenic in soils using soil formation factors. J Soils Sediments 16, 1787–1797 (2016). https://doi.org/10.1007/s11368-016-1374-9
DOI: https://doi.org/10.1007/s11368-016-1374-9