Geochemical Prospectivity Mapping Through a Feature Extraction–Selection Classification Scheme

  • Hamid Zekri
  • David R. Cohen (corresponding author)
  • Ahmad Reza Mokhtari
  • Abbas Esmaeili
Original Paper


Abstract

Machine learning (ML) schemes can enhance success in geochemical prospectivity mapping. This study examines the effectiveness of several feature extraction and selection approaches, combined with a variety of ML algorithms and applied to multielement soil and lithogeochemical data, in identifying new prospective Pb–Zn mineralisation in the Irankuh area. Singular value decomposition (SVD) was used as a dimensionality reduction technique to remove noise from the geochemical data. This was followed by feature selection techniques, including filter-based methods such as principal component analysis (PCA), Pearson’s correlation coefficient (PCC), correlation-based feature selection (CFS) and information gain ratio (IGR), as well as wrapper models, in combination with support vector machines, logistic regression and random forests. The performance of the ML algorithms, assisted by the feature extraction and selection methods, was then assessed by 10-fold cross-validation with separate training and testing data subsets. SVD boosted the performance of all three classifiers. The ML algorithms were particularly effective when using two transformed principal components that are linked to a suite of elements associated with the sulphide mineralisation and to variations in the host lithologies. PCA and PCC were generally the most effective feature selection methods for support vector machines. Logistic regression provided better classification with PCA, IGR and a wrapper model, whereas random forests delivered more accurate outcomes with PCA and PCC. A geochemical prospectivity map of the study area was derived from support vector machines trained with two principal components, the best performing ML scheme, and generated three new and distinct targets for more detailed exploration.
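As a rough illustration of two of the steps named in the abstract, the sketch below ranks features by the absolute value of Pearson's correlation coefficient with a class label (a filter-style selection) and builds 10-fold cross-validation index splits. It is not the authors' implementation: the function names, the element columns and all numeric values are made up for illustration, and the SVD denoising and classifier training steps are not covered.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Rank feature names by |PCC| with the class label, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical toy values, for illustration only (not data from the study).
# Label 1 stands for "mineralised", 0 for "background".
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
columns = {
    "Zn": [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.2, 2.9, 3.1, 3.3],
    "Pb": [0.5, 0.9, 0.4, 1.0, 0.6, 1.1, 1.6, 0.9, 1.5, 1.2],
    "Sr": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
}

ranking = rank_features(columns, labels)
splits = list(k_fold_indices(len(labels), k=10))
```

In a fuller workflow, the top-ranked features would be passed to a classifier, and each (train, test) pair would be used to fit and score the model once, with the scores averaged across the ten folds.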


Keywords

Machine learning; Feature extraction; Feature selection; Singular value decomposition; Geochemical prospectivity mapping



Acknowledgements

The authors would like to thank the Bama Mining Company, its CEO Mr Hasan Eslami, and Mr Rabbani and Mr Ladvar, without whose help this work would not have been possible.

Supplementary material

11053_2018_9422_MOESM1_ESM.docx (35 kb)
Supplementary material 1 (DOCX 34 kb)



Copyright information

© International Association for Mathematical Geosciences 2018
corrected publication 2018

Authors and Affiliations

  1. Department of Mining Engineering, Isfahan University of Technology, Isfahan, Iran
  2. School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, Australia
  3. Department of Geology, Bama Mining Company, Isfahan, Iran
