Machine-learning models for on-site estimation of background concentrations of arsenic in soils using soil formation factors

  • Soils, Sec 5 • Soil and Landscape Ecology • Research Article
Journal of Soils and Sediments

Abstract

Purpose

Accounting for the great spatial heterogeneity of soil environments is essential for accurate soil contamination assessment at the regional scale. Although numerous methods exist for distinguishing natural from anthropogenic element contents, few studies address on-site determination, and few site-specific, sensitive references are available. In this study, the site background concentration is estimated as an on-site reference for soil contamination assessment.

Materials and methods

Here, a support vector machine (SVM) is used to predict the site background concentration from nine influential factors of soil formation. Three machine-learning algorithms, which are considered efficient at solving optimization problems, are used to select the optimal parameters of the SVM: (1) a grid search algorithm, (2) a genetic algorithm, and (3) a particle swarm optimization algorithm.
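To illustrate how such parameter selection can work, the following is a minimal sketch of two of the three strategies named above (grid search and particle swarm optimization) applied to a two-parameter search. It is not the authors' implementation. The function `validation_error` is a hypothetical stand-in for the cross-validated error of an SVM, and the search variables `log_c` and `log_gamma`, the search range, and the swarm settings are all assumptions chosen for illustration.

```python
import random


def validation_error(log_c, log_gamma):
    """Hypothetical stand-in for the validation error of an SVM.

    A real study would train the SVM with the candidate parameters
    and return its cross-validated error here instead.
    """
    return (log_c - 3.0) ** 2 + 0.5 * (log_gamma + 2.0) ** 2


def grid_search(objective, lo, hi, steps):
    """Evaluate the objective on a regular grid; return (error, x, y) of the best node."""
    best = None
    for i in range(steps):
        for j in range(steps):
            x = lo + (hi - lo) * i / (steps - 1)
            y = lo + (hi - lo) * j / (steps - 1)
            err = objective(x, y)
            if best is None or err < best[0]:
                best = (err, x, y)
    return best


def particle_swarm(objective, lo, hi, n_particles=20, n_iter=100, seed=0):
    """Basic particle swarm optimization over a 2-D box; return (error, x, y)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed values)
    pos = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_err = [objective(*p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_err[k])
    gbest, gbest_err = pbest[g][:], pbest_err[g]   # best position seen by the swarm
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Move the particle and clamp it to the search box.
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            err = objective(*pos[k])
            if err < pbest_err[k]:
                pbest[k], pbest_err[k] = pos[k][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[k][:], err
    return gbest_err, gbest[0], gbest[1]


grid_best = grid_search(validation_error, -8.0, 8.0, 17)
pso_best = particle_swarm(validation_error, -8.0, 8.0)
```

Grid search evaluates every node of a fixed lattice, so its cost grows with the grid resolution, whereas the swarm concentrates evaluations near promising regions. A genetic algorithm would plug into the same objective in the same way.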

Results and discussion

Model performance was evaluated using the squared correlation coefficient and the root-mean-square error. Subsequent application of the models to soil contamination assessment demonstrated that indiscriminate use of a single reference across all soil types in an environmental site assessment may result in under- or over-estimation of contamination. These problems are likely to be resolved by using site background concentration predictions to delineate contaminated and uncontaminated regions.
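The two evaluation metrics mentioned above can be sketched as follows. This assumes that the squared correlation coefficient is the square of the Pearson correlation between observed and predicted values, which the abstract does not state explicitly. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_correlation(observed, predicted):
    """Square of the Pearson correlation coefficient (assumed definition)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
error = rmse(observed, predicted)
r2 = squared_correlation(observed, predicted)
```

In this toy case the predictions are perfectly correlated with the observations yet systematically too large, so the squared correlation is high while the error is not small. This is one reason to report both metrics together.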

Conclusions

We conclude that an SVM based on the factors influencing soil formation is an effective method for predicting site background concentrations and that it substantially improved the suitability of background references at our study site. With slight modifications, this approach could be applied to other regions and soil types.

Acknowledgments

This study was supported by the Specific Research of the Public Service on Environmental Protection in China (No. 201509031) and the National Natural Science Foundation of China (No. 41303069).

Author information

Corresponding author

Correspondence to Yanguo Teng.

Additional information

Responsible editor: Jun Zhou

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(DOCX 50 kb)

About this article

Cite this article

Wu, J., Teng, Y., Chen, H. et al. Machine-learning models for on-site estimation of background concentrations of arsenic in soils using soil formation factors. J Soils Sediments 16, 1787–1797 (2016). https://doi.org/10.1007/s11368-016-1374-9

  • DOI: https://doi.org/10.1007/s11368-016-1374-9
