Optimal Thresholding of Predictors in Mineral Prospectivity Analysis

  • Original Paper
  • Published in Natural Resources Research

Abstract

Some methods for analysing mineral prospectivity, especially the weights of evidence technique, require the predictor variables to be binary values. When the original evidence data are numerical values, such as geochemical indices, they can be converted to binary values by thresholding. When the evidence layer is a spatial feature such as a geological fault system, it can be converted to a binary predictor by buffering at a suitable cut-off distance. This paper reviews methods for selecting the best threshold or cut-off value and compares their performance. The review covers techniques which are well known in prospectivity analysis as well as unfamiliar techniques borrowed from other literature. Methods include maximisation of the estimated contrast, Studentised contrast, \(\chi ^2\) test statistic, Youden criterion, statistical likelihood, Akman–Raftery criterion, and curvature of the capture–efficiency curve. We identify connections between the different methods, and we highlight a common technical error in their application. Simulation experiments indicate that the Youden criterion has the best performance for selection of the threshold or cut-off value, assuming that a simple binary threshold relationship truly holds. If the relationship between predictor and prospectivity is more complicated, then the likelihood method is the most easily adaptable. The weights-of-evidence contrast performs poorly overall. These conclusions are supported by our analysis of data from the Murchison goldfields, Western Australia. We also propose a bootstrap method for calculating standard errors and confidence intervals for the location of the threshold.

Notes

  1. While we have assumed that grid cell area is negligible in order to simplify and clarify the equations, this assumption is also pragmatically justifiable. If the survey region is 100 km across and grid cells are 100 metre squares, then there are a million grid cells, each with an area fraction of \(10^{-6}\), and our simplified equations are accurate to 4 decimal places.

  2. The original survey was compiled by Dr. Jonathan Huntington, CSIRO, and has been analysed by Berman (1986) and others. Coordinates were kindly provided to us by Dr. Mark Berman and Dr. Andy Green, CSIRO. The survey data are publicly available in the R package spatstat (Baddeley et al. 2015; Baddeley and Turner 2005) as the dataset copper. The data in Figure 5 are supplied in Online Resource 1.

  3. Harris and Pan (1999) used a related measure of classification performance which they call exploration performance. Exploration is viewed as the task of selecting the optimum threshold for allocating grid cells to the mineralised class, and it is assumed that all grid cells classified by the system as mineralised are retained as ground for further exploration. Classification performance is measured by the percentage of total grid cells that must be retained to ensure that different percentages of mineralised grid cells are retained when various cut-off probabilities for mineralisation are applied to neural network output.

  4. The Youden index is traditionally denoted by the letter J, but we have called it Y for a more mnemonic notation.

  5. For a detailed derivation of this likelihood, see the appendix Likelihood Function for Threshold Model. For a different, more intuitively accessible explanation, see Baddeley et al. (2015, pp. 132–135, 342–343).

  6. For experts in statistics, we note that the central limit theorem does not apply to this problem: the asymptotic law of the log-likelihood is a compound Poisson process, not a Gaussian process (Pflug 1983; Kutoyants 1998, eq. (5.3), p. 184).

  7. Akman and Raftery (1986) denoted their criterion by the letter Y, but we use \(\hbox {AR}\) for a more mnemonic notation.

  8. The technical requirements for the Student’s t distribution are not satisfied: in particular the numerator and denominator of (17) are not independent variables.

  9. For a given value z of the predictor, the prospectivity \(\rho (z)\) is defined as the expected density of deposits per unit area amongst those grid cells with predictor value equal to z.
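The arithmetic in Note 1 can be checked with a short Python sketch. The function name `cell_area_fraction` and the assumption of a square survey region tiled exactly by square cells are illustrative choices, not part of the original analysis.

```python
def cell_area_fraction(region_width_m, cell_width_m):
    """Return (number of cells, area fraction of one cell) for a square
    region tiled by square grid cells.

    Assumes (for illustration) that the cell width divides the region width.
    """
    cells_per_side = round(region_width_m / cell_width_m)
    n_cells = cells_per_side ** 2
    return n_cells, 1.0 / n_cells


# Survey region 100 km across, grid cells 100 m square (values from Note 1).
n_cells, fraction = cell_area_fraction(100_000.0, 100.0)
print(n_cells, fraction)
```

With these inputs there are a million cells, each covering one millionth of the region, which is the area fraction quoted in Note 1.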

References

  • Agterberg, F. (1974). Automatic contouring of geological maps to detect target areas for mineral exploration. Journal of the International Association for Mathematical Geology, 6, 373–395.

  • Agterberg, F. (1992). Combining indicator patterns in weights of evidence modeling for resource evaluation. Nonrenewable Resources, 1, 39–50.

  • Agterberg, F. (2011). A modified weights-of-evidence method for regional mineral resource evaluation. Natural Resources Research, 20(2), 95–101.

  • Agterberg, F. (2014). Geomathematics: Theoretical foundations, applications, and future developments (Vol. 18). Cham: Springer.

  • Agterberg, F., & Bonham-Carter, G. (1999). Logistic regression and weights of evidence modeling in mineral exploration. In K. Dagdelen (Ed.), Proceedings, 28th international symposium on computer applications in the mineral industries – APCOM 99 (pp. 483–590). Golden, Colorado: Colorado School of Mines. (ISBN 0-918062-12-8)

  • Agterberg, F., & Bonham-Carter, G. (2005). Measuring the performance of mineral-potential maps. Natural Resources Research, 14(1), 1–17.

  • Agterberg, F., Bonham-Carter, G., Cheng, Q., & Wright, D. (1993). Weights of evidence modeling and weighted logistic regression for mineral potential mapping. In J. Davis & U. Herzfeld (Eds.), Computers in geology—25 years of progress (pp. 13–32). New York: Oxford University Press.

  • Agterberg, F., & Cheng, Q. (2002). Conditional independence test for weights-of-evidence modeling. Natural Resources Research, 11, 249–255.

  • Akman, V., & Raftery, A. (1986). Asymptotic inference for a change-point Poisson process. Annals of Statistics, 14(4), 1583–1590.

  • Allison, P. (2002). Missing data. Thousand Oaks, CA: Sage.

  • Anonymous. (2011). “Significant”. Retrieved from https://xkcd.com/882 (Web comic published on the XKCD website on 06 April 2011)

  • Baddeley, A. (2018). A statistical commentary on mineral prospectivity analysis. In B.D. Sagar, Q. Cheng, & F. Agterberg (Eds.), Handbook of mathematical geosciences: Fifty Years of IAMG (pp. 25–65). International Association for Mathematical Geosciences.

  • Baddeley, A., Berman, M., Fisher, N., Hardegen, A., Milne, R., Schuhmacher, D., et al. (2010). Spatial logistic regression and change-of-support for Poisson point processes. Electronic Journal of Statistics, 4, 1151–1201. https://doi.org/10.1214/10-EJS581.

  • Baddeley, A., Chang, Y., Song, Y., & Turner, R. (2012). Nonparametric estimation of the dependence of a spatial point process on a spatial covariate. Statistics and Its Interface, 5, 221–236.

  • Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial point patterns: Methodology and applications with R. London: Chapman and Hall/CRC.

  • Baddeley, A., & Turner, R. (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6), 1–42. (URL: www.jstatsoft.org, ISSN: 1548-7660)

  • Ballantyne, C., & Cornish, R. (1979). Use of the chi-square test for the analysis of orientation data. Journal of Sedimentary Research, 49(3), 773–776.

  • Barnard, G. (1959). Control charts and stochastic processes. Journal of the Royal Statistical Society, Series B, 21, 239–271.

  • Basseville, M., & Nikiforov, I. (1993). Detection of abrupt changes: Theory and applications. Englewood Cliffs, NJ: Prentice-Hall.

  • Berman, M. (1986). Testing for spatial association between a point process and another stochastic process. Applied Statistics, 35, 54–62.

  • Bhattacharya, G., & Brockwell, P. (1976). The minimum of an additive process with applications to signal estimation and storage theory. Zeitschrift fuer Wahrscheinlichkeitstheorie und verwandte Gebiete, 37, 51–75.

  • Bhattacharya, G., & Johnson, R. (1968). Nonparametric tests for shift at unknown time point. Annals of Mathematical Statistics, 39, 1731–1743.

  • Bierlein, F., Murphy, F., Weinberg, R., & Lees, T. (2006). Distribution of orogenic gold deposits in relation to fault zones and gravity gradients: targeting tools applied to the Eastern Goldfields, Yilgarn Craton, Western Australia. Mineralium Deposita, 41, 107–126.

  • Bierlein, F., Northover, H., Groves, D., Goldfarb, R., & Marsh, E. (2008). Controls on mineralisation in the Sierra Foothills gold province, central California, USA: a GIS-based reconnaissance prospectivity analysis. Australian Journal of Earth Sciences, 55, 61–78.

  • Boleneus, D., Raines, G., Causey, J., Bookstrom, A., Frost, T., & Hyndman, P. (2001). Assessment method for epithermal gold deposits in northeast Washington State using weights-of-evidence GIS modeling (Open-File Report Nos. 2001–501). US Geological Survey.

  • Bonham-Carter, G. (1994). Geographic information systems for geoscientists: modelling with GIS (No. 13). Kidlington, Oxford, UK: Pergamon Press/Elsevier.

  • Bonham-Carter, G., & Agterberg, F. (1990). Application of a microcomputer-based geographic information system to mineral-potential mapping. In J.T. Hanley & D.F. Merriam (Eds.), Microcomputer applications in geology 2 (pp. 49–74). Amsterdam: Pergamon.

  • Bonham-Carter, G., Agterberg, F., & Wright, D. (1990). Weights of evidence modelling: a new approach to mapping mineral potential. In F. Agterberg & G. Bonham-Carter (Eds.), Statistical applications in the earth sciences (pp. 171–183). Ottawa: Geological Survey of Canada. (Proceedings of the Colloquium on Statistical Applications in the Earth Sciences hosted by the Geological Survey of Canada in Ottawa on 14–18 November, 1988)

  • Breiman, L., Friedman, J., Stone, C., & Olshen, R. (1984). Classification and regression trees. London: Chapman and Hall/CRC.

  • Brown, W. (2002). Artificial neural networks: A new method for mineral-prospectivity mapping (Ph.D. thesis). University of Western Australia.

  • Brown, W., Gedeon, T., Baddeley, A., & Groves, D. (2002). Bivariate J-function and other graphical statistical methods help select the best predictor variables as inputs for a neural network method of mineral prospectivity mapping. In U. Bayer, H. Burger, & W. Skala (Eds.), IAMG 2002: 8th annual conference of the international association for mathematical geology (Vol. 1, pp. 257–268).

  • Carlton, M., & Devore, J. (2014). Probability with applications in engineering, science, and technology. New York: Springer. https://doi.org/10.1007/978-1-4939-0395-5.

  • Carranza, E. (2004). Weights of evidence modeling of mineral potential: a case study using small number of prospects, Abra, Philippines. Natural Resources Research, 13(3), 173–187.

  • Carranza, E. (2009). Data-driven modeling of mineral prospectivity. In M. Hale (Ed.), Handbook of exploration and environmental geochemistry 11: Geochemical anomaly and mineral prospectivity mapping in GIS (pp. 249–310). Elsevier.

  • Cassard, D., Billa, M., Lambert, A., Picot, J., Husson, Y., & Lassere, J. (2008). Gold predictivity mapping in French Guiana using an expert-guided data-driven approach based on a regional-scale GIS. Ore Geology Reviews, 34(3), 471–500.

  • Cervi, F., Berti, M., Borgatti, L., Ronchetti, F., Manenti, F., & Corsini, A. (2010). Comparing predictive capability of statistical and deterministic methods for landslide susceptibility mapping: a case study in the northern Apennines (Reggio Emilia Province, Italy). Landslides, 7(4), 433–444.

  • Chen, Y., & Wu, W. (2019). Isolation Forest as an alternative data-driven mineral prospectivity mapping method with a higher data-processing efficiency. Natural Resources Research, 28(1), 31–46.

  • Cheng, Q. (2004). Application of weights of evidence method for assessment of flowing wells in the Greater Toronto Area, Canada. Natural Resources Research, 13, 77–86.

  • Cheng, Q. (2007). Mapping singularities with stream sediment geochemical data for prediction of undiscovered mineral deposits in Gejiu, Yunnan Province, China. Ore Geology Reviews, 32, 314–324.

  • Cheng, Q. (2008). Non-linear theory and power-law models for information integration and mineral resources quantitative assessments. Mathematical Geosciences, 40(5), 503–532.

  • Chernoff, H., & Rubin, H. (1956). The estimation of the location of a discontinuity in density. In Proceedings, Third Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 19–37).

  • Chernoff, H., & Zacks, S. (1964). Estimating the current mean of a normal distribution which is subjected to change in time. Annals of Mathematical Statistics, 35, 999–1018.

  • Chernoyarov, O., Kutoyants, Y., & Top, A. (2018). On multiple change-point estimation for Poisson process. Communications in Statistics - Theory and Methods, 47(5), 1215–1233. https://doi.org/10.1080/03610926.2017.1317810.

  • Commenges, D., & Seal, J. (1985). The analysis of neuronal discharge sequences: change-point estimation and comparison of variances. Statistics in Medicine, 4, 91–104.

  • Conover, W. (1999). Practical nonparametric statistics (3rd ed.). New York: Wiley.

  • Darkhovsky, B. (1976). A nonparametric method for the a posteriori detection of the “disorder” time of a sequence of independent random variables. Theory of Probability and its Applications, 21, 178–183.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the E-M algorithm. Journal of the Royal Statistical Society B, 39, 1–22.

  • Deshayes, J. (1984). Ruptures de modèles pour les processus de Poisson. Annales Scientifiques Univ Clermont-Ferrand II, 78, 1–7.

  • Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap (Vol. 57). London: Chapman and Hall.

  • Fabbri, A., & Chung, C.-J. (2008). On blind tests and spatial prediction models. Natural Resources Research, 17(2), 107–118.

  • Filzmoser, P., Garrett, R., & Reimann, C. (2005). Multivariate outlier detection in exploration geochemistry. Computers and Geosciences, 31, 579–587.

  • Fisher, R. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society, Series A, 222(594–604), 309–368.

  • Ford, A., Miller, J., & Mol, A. (2016). A Comparative Analysis of Weights of Evidence, evidential belief functions, and fuzzy logic for mineral potential mapping using incomplete data at the scale of investigation. Natural Resources Research, 25, 19–33.

  • Foxall, R., & Baddeley, A. (2002). Nonparametric measures of association between a spatial point process and a random set, with geological applications. Applied Statistics, 51(2), 165–182.

  • Galun, S., & Trifonov, A. (1982). Detection and estimation of the time when the Poisson flow intensity changes. Automation and Remote Control, 43(6), 782–790.

  • Gardner, L. (1969). On detecting changes in the mean of normal variables. Annals of Mathematical Statistics, 40, 116–126.

  • Garrett, R. (1989). The chi-square plot: a tool for multivariate outlier recognition. Journal of Geochemical Exploration, 32(1), 319–341.

  • Geological Survey of Western Australia. (1994). MINEDEX database. (https://dmp.wa.gov.au/Mines-and-mineral-deposits-1502.aspx)

  • Ghannadpour, S., & Hezarkhani, A. (2016). Exploration geochemistry data-application for anomaly separation based on discriminant function analysis in the Parkam porphyry system. Geosciences Journal, 20(6), 837–850.

  • Goldfarb, R., & Groves, D. (2015). Orogenic gold: Common or evolving fluid and metal sources through time. Lithos, 233, 2–26.

  • Goodacre, A., Bonham-Carter, G., Agterberg, F., & Wright, D. (1993). A statistical analysis of the spatial association of seismicity with drainage patterns and magnetic anomalies in western Quebec. Tectonophysics, 217, 285–305.

  • Gorney, R., Ferris, D., Ward, A., & Williams, L. (2011). Assessing channel-forming characteristics of an impacted headwater stream in Ohio, USA. Ecological Engineering, 37(3), 418–430.

  • Groves, D., Goldfarb, R., Knox-Robinson, C., Ojala, J., Gardoll, S., Yun, G., et al. (2000). Late-kinematic timing of orogenic gold deposits and significance for computer-based exploration techniques with emphasis on the Yilgarn Block, Western Australia. Ore Geology Reviews, 17, 1–38.

  • Groves, D., & Santosh, M. (2016). The giant Jiaodong gold province: The key to a unified model for orogenic gold deposits? Geoscience Frontiers, 7, 409–417.

  • Hájek, J., & Rényi, A. (1955). Generalization of an inequality of Kolmogorov. Acta Math. Acad. Sci. Hungar., 6, 281–283.

  • Harrell, F. (2001). Regression modeling strategies. New York: Springer.

  • Harris, D., & Pan, G. (1999). Mineral favourability mapping: a comparison of artificial neural networks, logistic regression, and discriminant analysis. Natural Resources Research, 8, 93–109.

  • Harris, D., Zurcher, L., Stanley, M., Marlow, J., & Pan, G. (2003). Comparative analysis of favorability mappings by weights of evidence, probabilistic neural networks, discriminant analysis, and logistic regression. Natural Resources Research, 12(4), 241–255.

  • Harris, J., Grunsky, E., Behnia, P., & Corrigan, D. (2015). Data- and knowledge-driven mineral prospectivity maps for Canada’s North. Ore Geology Reviews, 71, 788–803.

  • Hinkley, D. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57, 1–17.

  • Hinkley, D. (1971). Inference about the change-point from the cumulative sum test. Biometrika, 58, 509–523.

  • Hochberg, Y., & Tamhane, A. (1987). Multiple comparison procedures. New York: Wiley.

  • Hogg, R., & Craig, A. (1970). Introduction to mathematical statistics (3rd ed.). London: Macmillan.

  • Hosmer, D., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). Hoboken: Wiley.

  • Hsu, J. (1996). Multiple comparisons: theory and methods. London: Chapman and Hall.

  • Kalbfleisch, J. (1985). Probability and statistical inference. Volume 2: Statistical inference (Second ed.). New York: Springer.

  • Kander, Z., & Zacks, S. (1966). Test procedures for possible changes in parameters of statistical distributions occurring at unknown time points. Annals of Mathematical Statistics, 37, 1196–1210.

  • Kendall, M. G., & Stuart, A. (1973). The advanced theory of statistics (3rd ed., Vol. 2). London: Charles Griffin and Company Ltd.

  • Knox-Robinson, C., & Groves, D. (1997). Gold prospectivity mapping using a geographic information system (GIS), with examples from the Yilgarn Block of Western Australia. Chronique de la Recherche Minière, 529, 127–138.

  • Krzanowski, W., & Hand, D. (2009). ROC curves for continuous data. London/Boca Raton: Chapman and Hall/CRC Press.

  • Kutoyants, Y. (1998). Statistical inference for spatial Poisson processes (Vol. 134). New York: Springer.

  • Lehmann, E. L. (1999). Elements of large-sample theory. New York: Springer.

  • Leonard, T. (1978). Density estimation, stochastic processes and prior information (with discussion). Journal of the Royal Statistical Society, Series B, 40, 113–146.

  • Li, N., Bagas, L., Li, X., Xiao, K., Li, Y., Ying, L., et al. (2016). An improved buffer analysis technique for model-based 3D mineral potential mapping and its application. Ore Geology Reviews, 76, 94–107.

  • Lindsey, J. (1996). Parametric statistical inference. Oxford: Clarendon Press.

  • Little, R., & Rubin, D. (2002). Statistical analysis with missing data (2nd ed.). Hoboken: Wiley.

  • Liu, J., & Cheng, Q. (2019). A modified weights-of-evidence method for mineral potential prediction based on structural equation modeling. Natural Resources Research, 28, 1037–1053.

  • Liu, Y., Cheng, Q., Xia, Q., & Wang, X. (2014). Mineral potential mapping for tungsten polymetallic deposits in the Nanling metallogenic belt, South China. Journal of Earth Science, 25, 689–700.

  • Loader, C. (1992). A log-linear model for a Poisson process changepoint. Annals of Statistics, 20, 1391–1411.

  • Murphy, S., & van der Vaart, A. (2000). On profile likelihood. Journal of the American Statistical Association, 95(450), 449–465.

  • Nam, B.-H., & D’Agostino, R. (2002). Discrimination index, the area under the ROC curve. In C. Huber-Carol, N. Balakrishnan, M. Nikulin, & M. Mesbah (Eds.), Goodness-of-Fit Tests and Model Validity (pp. 267–279). Basel: Birkhäuser.

  • Neuhäuser, B., & Terhorst, B. (2007). Landslide susceptibility assessment using “weights-of-evidence” applied to study area at the Jurassic escarpment (SW-Germany). Geomorphology, 86(1–2), 12–24.

  • Page, E. (1954). Continuous inspection schemes. Biometrika, 41, 100–115.

  • Page, E. (1957). On problems in which a change in a parameter occurs at an unknown point. Biometrika, 44, 248–252.

  • Payne, C., Cunningham, F., Peters, K., Nielsen, S., Puccioni, E., Wildman, C., et al. (2015). From 2D to 3D: Prospectivity modelling in the Taupo Volcanic Zone, New Zealand. Ore Geology Reviews, 71, 558–577.

  • Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175. (Series 5).

  • Pflug, G. (1983). The limiting log-likelihood process for discontinuous density families. Zeitschrift fuer Wahrscheinlichkeitstheorie und verwandte Gebiete, 64, 15–35.

  • Polykretis, C., & Chalkias, C. (2018). Comparison and evaluation of landslide susceptibility maps obtained from weights of evidence, logistic regression, and artificial neural network models. Natural Hazards, 93, 249–274.

  • Pons, O. (2018). Estimations and tests in change-point models. Singapore: World Scientific.

  • Porwal, A., Gonzalez-Alvarez, I., Markwitz, V., McCuaig, T., & Mamuse, A. (2010). Weights-of-evidence and logistic regression modeling of magmatic nickel sulfide prospectivity in the Yilgarn Craton, Western Australia. Ore Geology Reviews, 38(3), 184–196.

  • Pratt, J. (1959). On a general concept of “In Probability”. Annals of Mathematical Statistics, 30, 549–558.

  • R Development Core Team. (2018). R: A language and environment for statistical computing [computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org/ (ISBN 3-900051-07-0)

  • Raftery, A., & Akman, V. (1986). Bayesian analysis of a Poisson process with a change-point. Biometrika, 73(1), 85–89.

  • Read, T., & Cressie, N. (1988). Goodness-of-fit statistics for multivariate data. New York: Springer.

  • Rice, J. (2006). Mathematical statistics and data analysis (3rd ed.). New York: Duxbury.

  • Robert, F., Poulson, K., Cassidy, K., & Hodgson, C. (2005). Gold metallogeny of the superior and Yilgarn Cratons. In J. Hedenquist, J. Thompson, R. Goldfarb, & J. Richards (Eds.), Economic geology one hundredth anniversary volume (pp. 1001–1033). Littleton, Colorado, USA: Society of Economic Geologists (ISBN 978-1-887483-01-8).

  • Romero-Calcerrada, R., Barrio-Parra, F., Millington, J., & Novillo, C. (2010). Spatial modeling of socioeconomic data to understand patterns of human-caused wildfire ignition risk in the SW of Madrid (central Spain). Ecological Modelling, 221(1), 34–45.

  • Romero-Calcerrada, R., & Luque, S. (2006). Habitat quality assessment using weights-of-evidence based GIS modelling: The case of Picoides tridactylus as species indicator of the biodiversity value of the Finnish forest. Ecological Modelling, 196(1–2), 62–76.

  • Rubin, H. (1961). The estimation of discontinuities in multivariate densities, and related problems in stochastic process. In Proceedings, Fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 563–574). University of California Press.

  • Ruopp, M., Perkins, N., Whitcomb, B., & Schisterman, E. (2008). Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection. Biometrical journal, 50(3), 419–430.

  • Sager, T. (1982). Nonparametric maximum likelihood estimation of spatial patterns. Annals of Statistics, 10, 1125–1136.

  • Schaeben, H. (2014). Targeting: Logistic regression, special cases and extensions. ISPRS International Journal of Geo-Information, 3, 1387–1411.

  • Schaeben, H., & Semmler, G. (2016). The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression. Frontiers of Earth Science, 10(3), 389–408.

  • Schafer, J. (1997). Analysis of incomplete multivariate data. London: Chapman and Hall.

  • Sen, A., & Srivastava, M. (1975). On tests for detecting change in mean. Annals of Statistics, 3, 98–108.

  • Severini, T. (2000). Likelihood methods in statistics. Oxford: Oxford University Press.

  • Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46, 561–584.

  • Shewhart, W. (1983). Economic control of quality of manufactured product. Princeton, NJ: Van Nostrand Reinhold.

  • Silverman, B. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

  • Smith, A. (1975). A Bayesian approach to inference about a change-point in a sequence of random variables. Biometrika, 62, 407–416.

  • Solomon, M., & Groves, D. (1994). The geology and origin of Australia’s mineral deposits. New York: Oxford University Press.

  • Stephens, M. (1986). Tests based on EDF statistics. In R. D’Agostino & M. Stephens (Eds.), Goodness-of-fit techniques (Vol. 68, pp. 97–193). New York: Marcel Dekker.

  • Vach, W. (1994). Logistic regression with missing values in the covariates. Berlin: Springer.

  • van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton: Chapman and Hall.

  • Wand, M., & Jones, M. (1995). Kernel smoothing. London: Chapman and Hall.

  • Wang, G., Du, W., & Carranza, E. (2016). Remote sensing and GIS prospectivity mapping for magmatic-hydrothermal base- and precious-metal deposits in the Honghai district, China. Journal of African Earth Sciences, 128, 97–115.

  • Wang, G., Li, R., Carranza, E., Zhang, S., Yan, C., Zhu, Y., et al. (2015). 3D geological modeling for prediction of subsurface Mo targets in the Luanchuan district, China. Ore Geology Reviews, 71, 592–610.

  • Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.

  • Watkins, K., & Hickman, A. (1990). Geological evolution and mineralization of the Murchison Province, Western Australia (Bulletin No. 137). Geological Survey of Western Australia. (Published by Department of Mines, Western Australia, 1990. Available online from Department of Industry and Resources, State Government of Western Australia, www.doir.wa.gov.au)

  • West, W., & Ogden, T. (1997). Continuous-time estimation of a changepoint in a Poisson process. Journal of Statistical Computation and Simulation, 56(4), 293–302.

  • Wilk, M., & Gnanadesikan, R. (1968). Probability plotting methods for the analysis of data. Biometrika, 55, 1–17.

  • Witt, W., Ford, A., Hanrahan, B., & Mamuse, A. (2013). Regional-scale targeting for gold in the Yilgarn Craton: Part 1 of the Yilgarn gold exploration targeting atlas (Report No. 125). Perth, Western Australia: Geological Survey of Western Australia.

  • Xiao, K., Li, N., Porwal, A., Holden, E., Bagas, L., & Lu, Y. (2015). GIS-based 3D prospectivity mapping: A case study of Jiama copper-polymetallic deposit in Tibet, China. Ore Geology Reviews, 71, 611–632.

  • Yang, F., Wang, G., Santosh, M., Li, R., Tang, L., Cao, H., et al. (2017). Delineation of potential exploration targets based on 3D geological modeling: A case study from the Laoangou Pb–Zn–Ag polymetallic ore deposit, China. Ore Geology Reviews, 89, 228–252.

  • Yeomans, C. (2018). Enhancing the geological understanding of Southwest England using machine learning algorithms (unpublished doctoral dissertation). Camborne School of Mines.

  • Youden, W. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.

  • Zacks, S. (1983). Survey of classical and Bayesian approaches to the change-point problem: Fixed sample and sequential procedures of testing and estimation. In M. Rizvi, J. Rustagi, & D. Siegmund (Eds.), Recent advances in statistics: Papers in honour of Herman Chernoff on his sixtieth birthday (pp. 245–269). New York/London: Academic Press.

  • Zhang, N., & Zhou, K. (2015). Mineral prospectivity mapping with weights of evidence and fuzzy logic methods. Journal of Intelligent and Fuzzy Systems, 29(6), 2639–2651.

Acknowledgments

We thank Kassel Hingee for his insightful contributions to the initial research work which led to this article. In initial phases of this research, Adrian Baddeley was funded by the Australian Research Council under a Discovery Outstanding Researcher Award, and hosted by Professor Eun-Jung Holden of the Centre for Exploration Targeting (CET) at the University of Western Australia. Aloke Phatak was partially supported by the Australian Government through the Australian Research Council’s Industrial Transformation Training Centres scheme (project IC180100030). We warmly thank the reviewers for their comments, which have greatly helped us to improve the paper.

Author information

Corresponding author

Correspondence to Adrian Baddeley.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 212 KB)

Supplementary material 2 (pdf 174 KB)

Appendices

Capture–Efficiency Curve as a CDF

In the “Threshold Selection Using the Capture–Efficiency Curve” section, “Principle” section, we mentioned that the capture–efficiency curve can be regarded as a cumulative distribution function (cdf) in its own right. Here we clarify that comment.

A more subtle interpretation of the capture–efficiency curve, familiar to statisticians, is the “transformation to uniformity” or “probability integral transformation” (Kendall and Stuart 1973, p. 459 ff.; Hogg and Craig 1970, pp. 349–350). Suppose that the original predictor Z is replaced by a new predictor V, defined at each spatial location u by \(V(u) = G(Z(u)) = a(Z(u))/a\). In words, at a given spatial location u, the value of V(u) is the fraction of the area of the survey region where the original predictor Z does not exceed the value Z(u) which it takes at u. If Z is distance-to-nearest-fault, then V(u) is the area fraction occupied by the buffer at distance equal to the distance from u to the faults, that is, the buffer whose boundary passes through the point u. Then, the capture–efficiency curve is the cumulative distribution function of the transformed predictor V at the deposit points, while the diagonal line is the cumulative distribution function of V over all spatial locations in the survey region.
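As an informal illustration of this transformation, the following Python sketch computes \(V = G(Z)\) on a raster of equal-area pixels and then evaluates the empirical cdf of V at the deposit locations. The function names and the toy data are assumptions made for illustration only; they are not taken from the analysis in the paper.

```python
from bisect import bisect_right

def area_fraction_transform(pixel_values):
    """Return the function G, where G(z) is the fraction of
    equal-area pixels whose predictor value does not exceed z."""
    ordered = sorted(pixel_values)
    total = len(ordered)

    def G(z):
        return bisect_right(ordered, z) / total

    return G

def capture_efficiency(pixel_values, deposit_values):
    """Points (s, F) of the capture-efficiency curve, obtained as the
    empirical cdf of V = G(Z) evaluated at the deposit locations."""
    G = area_fraction_transform(pixel_values)
    v = sorted(G(z) for z in deposit_values)
    n = len(v)
    return [(s, (i + 1) / n) for i, s in enumerate(v)]

# Hypothetical toy data: distance to nearest fault on 10 pixels,
# and the same predictor observed at 4 deposit locations.
pixels = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
deposits = [0.5, 1.0, 1.5, 4.0]
curve = capture_efficiency(pixels, deposits)
```

In this toy example three of the four deposits lie within the 30% of the area closest to the faults, so the curve rises well above the diagonal at small area fractions.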

Likelihood Function for Threshold Model

This appendix provides elementary explanations for three points: the appearance of the binomial probability distribution discussed in the “Significance Tests” section, “Significance Test for a Binary Predictor” section; the form of the likelihood function based on this distribution; and the form of the likelihood discussed in the section on “Threshold Selection Using Change-Point Analysis”, “Profile Likelihood for the Threshold Model” section, which results from the assumption that the grid cells are very small in area relative to the survey region.

To simplify discussion, we assume that the geometry of the survey region S and of the prospective map feature B are fixed and known in advance, while the mineral deposit locations are discovered during the survey. In the notation of the section on Binary Predictors, the survey region has area a and the feature B has area \(a_B\).

The survey process begins by dividing the survey region into N grid cells of equal area, then determining whether each grid cell contains or does not contain a deposit. For our purposes the results of the survey are the count \(n_B\) of grid cells inside B which contain a deposit, and the count \(n_{\overline{B}}\) of grid cells outside B which contain a deposit. The total number of grid cells containing deposits is \(n = n_B + n_{\overline{B}}\).

Uniform Prospectivity Model Using Binomial Probabilities

We first consider the simple model in which prospectivity is uniform over the entire survey region. Each grid cell has the same probability p of containing a deposit. The outcomes in different grid cells are assumed to be statistically independent. There are N grid cells altogether. Therefore, the number n of grid cells containing a deposit follows a binomial distribution on N trials with success probability p; the probability that exactly n grid cells contain deposits is

$$\begin{aligned} {N \atopwithdelims ()n} p^n (1-p)^{N-n} . \end{aligned}$$
(26)

The expected total number of grid cells which contain deposits is Np.

The same principle applies to the grid cells inside the feature B; the probability that exactly \(n_B\) grid cells inside B contain deposits is

$$\begin{aligned} {N_B \atopwithdelims ()n_B} p^{n_B} (1-p)^{N_B-n_B} , \end{aligned}$$
(27)

where \(N_B = (a_B/a) N\) is the number of grid cells that constitute the feature B. The expected number of grid cells inside B that contain deposits is \(p N_B = p (a_B/a) N\).

Again this applies to the grid cells outside B; the probability of obtaining \(n_{\overline{B}}\) grid cells outside B which contain deposits is

$$\begin{aligned} {N_{\overline{B}}\atopwithdelims ()n_{\overline{B}}} p^{n_{\overline{B}}} (1-p)^{N_{\overline{B}}-n_{\overline{B}}} , \end{aligned}$$
(28)

where \(N_{\overline{B}}= (a_{\overline{B}}/a) N\) is the number of grid cells in \({\overline{B}}\). The expected number of grid cells outside B that contain deposits is \(p N_{\overline{B}}= (a_{\overline{B}}/a) p N\).

Combining (27) and (28), the probability of obtaining \(n_B\) cells with deposits inside B and \(n_{\overline{B}}\) cells with deposits outside B is

$$\begin{aligned} {N_B \atopwithdelims ()n_B} p^{n_B} (1-p)^{N_B-n_B} {N_{\overline{B}}\atopwithdelims ()n_{\overline{B}}} p^{n_{\overline{B}}} (1-p)^{N_{\overline{B}}-n_{\overline{B}}} \end{aligned}$$

which simplifies to

$$\begin{aligned} {N_B \atopwithdelims ()n_B} {N_{\overline{B}}\atopwithdelims ()n_{\overline{B}}} p^{n} (1-p)^{N-n} . \end{aligned}$$
(29)

This combining is justified because individual grid cell outcomes, and hence outcomes in disjoint parts of the survey region, are independent.

The likelihood function of the model is the probability (29) of observing the data \((n_B,n_{\overline{B}})\), treated as a function of the parameter p:

$$\begin{aligned} L(p) = {N_B \atopwithdelims ()n_B} {N_{\overline{B}}\atopwithdelims ()n_{\overline{B}}} p^{n} (1-p)^{N-n} . \end{aligned}$$

Constant factors that do not depend on p can be omitted, so the likelihood would often be reported as

$$\begin{aligned} L(p) = p^{n} (1-p)^{N-n} . \end{aligned}$$
(30)
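A minimal numerical sketch of likelihood (30), assuming hypothetical counts N and n: the logarithm of (30) is evaluated in Python and compared at several values of p. It is largest at \(p = n/N\), the standard maximum likelihood estimate of a binomial proportion.

```python
import math

def log_lik_uniform(p, n, N):
    """Logarithm of likelihood (30): n ln p + (N - n) ln(1 - p)."""
    return n * math.log(p) + (N - n) * math.log(1.0 - p)

# Hypothetical survey: N grid cells, n of which contain a deposit.
N, n = 10_000, 40
p_hat = n / N  # maximum likelihood estimate of p

# The log-likelihood is lower on either side of p_hat.
at_mle = log_lik_uniform(p_hat, n, N)
nearby = [log_lik_uniform(p_hat * f, n, N) for f in (0.5, 0.9, 1.1, 2.0)]
```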

Simple Threshold Model Using Binomial Probabilities

In the simple threshold model we assume that prospectivity is higher inside the feature B. Each grid cell inside B has probability \(p_B\) of containing a deposit, while each grid cell outside B has a different probability \(p_{\overline{B}}\) of containing a deposit, where \(p_{\overline{B}}< p_B\). We simply replace p by \(p_B\) in equation (27) to find that the probability of obtaining exactly \(n_B\) cells with deposits inside B is

$$\begin{aligned} {N_B \atopwithdelims ()n_B} p_B^{n_B} (1-p_B)^{N_B-n_B} . \end{aligned}$$
(31)

Replacing p by \(p_{\overline{B}}\) in Eq. (28), the probability of obtaining \(n_{\overline{B}}\) grid cells outside B which contain deposits is

$$\begin{aligned} {N_{\overline{B}}\atopwithdelims ()n_{\overline{B}}} p_{\overline{B}}^{n_{\overline{B}}} (1-p_{\overline{B}})^{N_{\overline{B}}-n_{\overline{B}}} . \end{aligned}$$
(32)

Combining (31) and (32), the probability of obtaining \(n_B\) cells containing deposits inside B and \(n_{\overline{B}}\) cells containing deposits outside B is

$$\begin{aligned} {N_B \atopwithdelims ()n_B} p_B^{n_B} (1-p_B)^{N_B-n_B} {N_{\overline{B}}\atopwithdelims ()n_{\overline{B}}} p_{\overline{B}}^{n_{\overline{B}}} (1-p_{\overline{B}})^{N_{\overline{B}}-n_{\overline{B}}} . \end{aligned}$$

The likelihood function for the simple threshold model is (again omitting constant factors)

$$\begin{aligned} L(p_B, p_{\overline{B}}) = p_B^{n_B} (1-p_B)^{N_B-n_B} p_{\overline{B}}^{n_{\overline{B}}} (1-p_{\overline{B}})^{N_{\overline{B}}-n_{\overline{B}}} . \end{aligned}$$
(33)

The likelihood now has two arguments \(p_B\) and \(p_{\overline{B}}\) representing the probabilities of a deposit for grid cells inside and outside B, respectively. Notice that if the two probabilities were equal, \(p_B = p_{\overline{B}}= p\), then (33) would collapse to (30).
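The collapse of (33) to (30) can be checked numerically. The sketch below uses hypothetical counts and also evaluates both log-likelihoods at their maximising values, \(p_B = n_B/N_B\), \(p_{\overline{B}}= n_{\overline{B}}/N_{\overline{B}}\) and \(p = n/N\), which are the usual binomial estimates; the variable names are illustrative.

```python
import math

def log_lik_uniform(p, n, N):
    """Logarithm of likelihood (30)."""
    return n * math.log(p) + (N - n) * math.log(1.0 - p)

def log_lik_threshold(p_in, p_out, n_in, N_in, n_out, N_out):
    """Logarithm of likelihood (33), for cells inside and outside B."""
    return (n_in * math.log(p_in) + (N_in - n_in) * math.log(1.0 - p_in)
            + n_out * math.log(p_out) + (N_out - n_out) * math.log(1.0 - p_out))

# Hypothetical counts: cells and deposit cells inside and outside B.
N_in, n_in = 2_000, 28
N_out, n_out = 8_000, 12
N, n = N_in + N_out, n_in + n_out

# Equal probabilities: (33) collapses to (30).
p = 0.004
collapsed = log_lik_threshold(p, p, n_in, N_in, n_out, N_out)
uniform = log_lik_uniform(p, n, N)

# Maximised log-likelihoods under each model; the gain cannot be negative
# because the uniform model is a special case of the threshold model.
gain = (log_lik_threshold(n_in / N_in, n_out / N_out, n_in, N_in, n_out, N_out)
        - log_lik_uniform(n / N, n, N))
```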

Rescaling in Terms of Density

Since the grid cells are artificial (and in particular their size is an arbitrary choice), it is useful to rescale the equations so that they depend as little as possible on the grid geometry. This can be done by using the average density of deposits, \(\mu\), defined as the expected number per unit area.

In the uniform prospectivity model, the expected total number of grid cells that contain deposits is Np. The average density is therefore \(\mu = Np/a\). Noting that the area of one grid cell is \(\epsilon = a/N\), we see that the average density is equal to \(\mu = p/\epsilon\). Equivalently, \(p = \mu \epsilon\), that is, the probability of a deposit in any given cell is equal to the average density of deposits times the area of the grid cell. Replacing p by \(\mu \epsilon\) in Eq. 30, and removing constant factors, we get the likelihood

$$\begin{aligned} L(\mu ) = \mu ^{n} (1-\mu \epsilon )^{N-n} . \end{aligned}$$
(34)

In the simple threshold model, we have two different densities inside and outside the feature B. Inside B, the expected number of grid cells containing deposits is \(N_B p_B\) and the area is \(a_B= N_B \epsilon\) so the density is \(\mu _B = N_B p_B/a_B = p_B/\epsilon\). Outside B, the density is \(\mu _{\overline{B}}= p_{\overline{B}}N_{\overline{B}}/a_{\overline{B}}= p_{\overline{B}}/\epsilon\). Substituting into (33) we get the likelihood

$$\begin{aligned} L(\mu _B, \mu _{\overline{B}}) = \mu _B^{n_B} (1-\epsilon \mu _B)^{N_B-n_B} \mu _{\overline{B}}^{n_{\overline{B}}} (1- \epsilon \mu _{\overline{B}})^{N_{\overline{B}}-n_{\overline{B}}} . \end{aligned}$$
(35)

Small Grid Cells

Finally, we suppose that the grid cells are very small. Then N is very large and \(\epsilon = a/N\) is very small. For the uniform prospectivity model, in the likelihood (34) the term \((1-\mu \epsilon )^{N-n}\) converges to the exponential \(\exp (-\mu N \epsilon ) = \exp (-\mu a)\) as N becomes large, giving the likelihood

$$\begin{aligned} L(\mu ) = \mu ^{n} \exp (-\mu a). \end{aligned}$$

For the simple threshold model, the likelihood (35) similarly converges to

$$\begin{aligned} L(\mu _B, \mu _{\overline{B}}) = \mu _B^{n_B} \mu _{\overline{B}}^{n_{\overline{B}}} \exp (-\mu _B a_B - \mu _{\overline{B}}a_{\overline{B}}). \end{aligned}$$

Taking logarithms, the log-likelihood for the null model of uniform prospectivity is

$$\begin{aligned} \ln L(\mu ) = n \ln \mu -\mu a , \end{aligned}$$
(36)

and for the simple threshold model

$$\begin{aligned} \ln L(\mu _B, \mu _{\overline{B}}) = n_B \ln \mu _B + n_{\overline{B}}\ln \mu _{\overline{B}}-\mu _B a_B - \mu _{\overline{B}}a_{\overline{B}}. \end{aligned}$$
(37)
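The small-cell limit can be illustrated numerically. The sketch below, which assumes hypothetical values of a, n and \(\mu\), compares the logarithm of the grid-based likelihood (34) with the limiting log-likelihood (36) as the number of grid cells N increases.

```python
import math

def log_lik_grid(mu, n, a, N):
    """Logarithm of (34) with cell area eps = a / N."""
    eps = a / N
    return n * math.log(mu) + (N - n) * math.log1p(-mu * eps)

def log_lik_limit(mu, n, a):
    """Limiting log-likelihood (36): n ln mu - mu a."""
    return n * math.log(mu) - mu * a

# Hypothetical survey region of area a with n deposits.
a, n, mu = 100.0, 40, 0.4

# Absolute difference between (34) and (36) on finer and finer grids.
gaps = [abs(log_lik_grid(mu, n, a, N) - log_lik_limit(mu, n, a))
        for N in (10**3, 10**4, 10**5, 10**6)]
```

The differences shrink roughly in proportion to 1/N, which is consistent with treating the grid cells as negligibly small.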

If the feature B is the region determined by thresholding a spatial predictor function Z(u) to have values less than or equal to a threshold z, then \(a_B = a(z)\) is the area of this region, \(n_B = n(z)\) is the number of deposits with predictor values less than or equal to z, and we have \(a_{\overline{B}}= a - a_B\) and \(n_{\overline{B}}= n - n_B = n - n(z)\), so that Eq. 37 is translated into Eq. 10 of the paper.
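To show how (37) can be used to compare candidate thresholds, the following sketch substitutes the maximising values \(\mu _B = n_B/a_B\) and \(\mu _{\overline{B}}= n_{\overline{B}}/a_{\overline{B}}\) (obtained by setting the derivatives of (37) to zero) and picks the threshold with the largest resulting value. Eq. 10 is not reproduced in this appendix, so the code should be read as a sketch under that assumption; the candidate values are hypothetical.

```python
import math

def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def profile_log_lik(n_z, a_z, n, a):
    """Value of (37) at mu_B = n(z)/a(z) and mu_Bbar = (n - n(z))/(a - a(z)).
    At these values mu_B * a_B + mu_Bbar * a_Bbar equals n."""
    n_out, a_out = n - n_z, a - a_z
    return xlogy(n_z, n_z / a_z) + xlogy(n_out, n_out / a_out) - n

# Hypothetical data: total area a, total deposits n, and for each
# candidate threshold z the count n(z) and area a(z) below it.
a, n = 100.0, 20
candidates = [(1.0, 6, 10.0), (2.0, 14, 20.0), (3.0, 16, 40.0), (4.0, 18, 70.0)]

scores = {z: profile_log_lik(n_z, a_z, n, a) for z, n_z, a_z in candidates}
best_z = max(scores, key=scores.get)
```

Only thresholds with \(0< a(z) < a\) are meaningful here, since otherwise one of the two densities is undefined.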

Connections Between the Threshold Selection Criteria

In this appendix, we provide proofs of claims made in the “Significance Tests” section and “Mathematical Connections between the Criteria” section regarding relationships between various threshold selection criteria.

Forms of the \(\chi ^2\) Statistic

First, we prove the claim in the “Significance Tests” section that the general form of the \(\chi ^2\) statistic (18) reduces to the special form (19) in this case. For any feature B, define \(e_B = (n/ a) a_B = ( a_B/ a) n\) and \(e_{\overline{B}}= (n/ a) a_{\overline{B}}= ( a_{\overline{B}}/ a) n\), the expected counts in B and \({\overline{B}}\), respectively, when the deposits are randomly distributed with constant density. Then

$$\begin{aligned} n_{\overline{B}}- e_{\overline{B}}= (n - n_B) - (n - e_B) = -(n_B - e_B) \end{aligned}$$

so that

$$\begin{aligned} X^2&= \frac{(n_B - e_B)^2}{e_B} + \frac{(n_{\overline{B}}- e_{\overline{B}})^2}{e_{\overline{B}}} = \frac{(n_B - e_B)^2}{e_B} + \frac{(n_B - e_B)^2}{e_{\overline{B}}} = (n_B - e_B)^2 \left( \frac{1}{e_B} + \frac{1}{e_{\overline{B}}} \right) \\&= (n_B - e_B)^2 \frac{e_{\overline{B}}+ e_B}{ e_B e_{\overline{B}}} = \frac{n}{e_B e_{\overline{B}}} (n_B - e_B)^2, \end{aligned}$$

which is equivalent to (19).
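The algebra above can be verified numerically. In this sketch, which uses hypothetical counts and areas, the two-term form of \(X^2\) and the single-term form derived from it give the same value.

```python
def chi2_two_terms(n_in, n, a_in, a):
    """X^2 as the sum over B and its complement of (observed - expected)^2 / expected."""
    e_in = n * a_in / a
    e_out = n - e_in
    n_out = n - n_in
    return (n_in - e_in) ** 2 / e_in + (n_out - e_out) ** 2 / e_out

def chi2_one_term(n_in, n, a_in, a):
    """X^2 in the reduced form n (n_B - e_B)^2 / (e_B e_Bbar)."""
    e_in = n * a_in / a
    e_out = n - e_in
    return n * (n_in - e_in) ** 2 / (e_in * e_out)

# Hypothetical data: 14 of 20 deposits lie in a feature covering 20% of the area.
x2_a = chi2_two_terms(14, 20, 20.0, 100.0)
x2_b = chi2_one_term(14, 20, 20.0, 100.0)
```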

Connection Between Akman–Raftery and \(\chi ^2\) Criteria

Next we prove the connection between \(\hbox {AR}(z)\) and X(z) stated in Eq. 14, in the “Mathematical Connections between the Criteria” section, and again in the “Significance Tests” section, “Akman–Raftery (Constrained \(\chi ^2\) Test)” section.

For candidate threshold value z, let n(z) be the number of deposits below the threshold, a(z) the area of study region below the threshold, n the total number of deposits, a the total area of study region. Define \(s(z) = a(z)/a\), the area fraction.

Then, from the definition (13) of the Akman–Raftery criterion,

$$\begin{aligned} \hbox {AR}(z)&= \sqrt{s(z) (1-s(z))} \left( \frac{n(z)}{s(z)} - \frac{n - n(z)}{1- s(z)} \right) \\&= \sqrt{s(z) (1-s(z))} \frac{ n(z) (1 - s(z)) - s(z) (n - n(z)) }{ s(z) (1-s(z)) } \\&= \sqrt{s(z) (1-s(z))} \frac{ n(z) - n s(z) }{ s(z) (1-s(z)) } \\&= n \frac{ \frac{n(z)}{n} - s(z) }{ \sqrt{s(z) (1-s(z))} } \\&= \sqrt{n} X(z). \end{aligned}$$

This proves (14).

Relation Between \(\chi ^2\) Statistic and Youden Criterion

Here, we prove the claim, made in the “Mathematical Connections between the Criteria” section, that the \(\chi ^2\) statistic is the standardised version of the Youden criterion.

Under the null hypothesis that the density of deposits is uniform, if we treat the total number of deposits as fixed, then for any threshold z the count n(z) follows a binomial distribution with n trials and success probability \(p = a(z)/ a = s(z)\). This distribution has variance \(n p(1-p) = n s(z) (1-s(z))\). Accordingly \(Y(z) = (n(z)/n) - s(z)\) has variance \(s(z)(1-s(z))/n\) so that the standard error of Y(z), under the null hypothesis of a uniform density of deposits, is \(\hbox {se}_0(Y(z)) = \sqrt{s(z) (1-s(z))/n}\). Finally, inspecting (7) yields (15).
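The relations between the Youden criterion, its standardised version and the Akman–Raftery criterion can be checked with a few lines of Python. The sketch takes X(z) to be \(Y(z)/\hbox {se}_0(Y(z))\), as the text above implies, and uses the first line of the Akman–Raftery derivation as the definition of \(\hbox {AR}(z)\); the numerical inputs are hypothetical.

```python
import math

def youden(n_z, n, s):
    """Y(z) = n(z)/n - s(z)."""
    return n_z / n - s

def standardised_youden(n_z, n, s):
    """X(z) = Y(z) / se_0(Y(z)), with se_0 = sqrt(s (1 - s) / n)."""
    return youden(n_z, n, s) / math.sqrt(s * (1.0 - s) / n)

def akman_raftery(n_z, n, s):
    """AR(z) = sqrt(s (1 - s)) * (n(z)/s - (n - n(z))/(1 - s))."""
    return math.sqrt(s * (1.0 - s)) * (n_z / s - (n - n_z) / (1.0 - s))

# Hypothetical data: 14 of 20 deposits below a threshold covering 20% of the area.
n_z, n, s = 14, 20, 0.2
y = youden(n_z, n, s)
x = standardised_youden(n_z, n, s)
ar = akman_raftery(n_z, n, s)
```

With these inputs \(\hbox {AR}(z)\) equals \(\sqrt{n} X(z)\), and \(X(z)^2\) is the \(\chi ^2\) statistic for the same feature.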

Derivation of Optimal Threshold for Gradual Decline in Prospectivity

This appendix provides the proof of the claim in the “Gradual Decline in Prospectivity” section of the “Discussion” section that, if the prospectivity (i.e. the density of deposit points) is a decreasing function of the predictor value, then the optimal threshold for the Youden criterion is the threshold at which that function equals the average density of deposits over the survey region.

We define the prospectivity as the spatially varying density (intensity) of deposit points considered as a function \(\lambda (u)\) of spatial location u. In any given grid cell, the expected number of deposits is equal to \(\lambda (u)\epsilon\) where u is the location of the cell centre and \(\epsilon > 0\) is the cell area. See Baddeley (2018) for an explanation.

Assume first that the density of deposits depends only on the predictor Z. That is, we assume that \(\lambda (u)\) depends on Z(u) through the relation \(\lambda (u) = \rho (Z(u))\), where \(\rho\) is a nonnegative function. Several examples of the relationship between predictor and prospectivity are shown in the top row of Figure 23; these are graphs of \(\rho (z)\) against z. Assume further, as we did in “Gradual Decline in Prospectivity” section, that \(\rho (z)\) is a decreasing function of z, such as those in the left and right columns of Figure 23. Recall that G(z) is the spatial cumulative distribution function of the predictor over the study region S, that is,

$$\begin{aligned} G(z) = \frac{1}{ a} \int _S {{\mathbf {1}}} \{ Z(u) \le z \} \, \mathrm{d}u, \end{aligned}$$

where \({{\mathbf {1}}} \{ Z(u) \le z \}\) is the indicator function, equal to 1 if \(Z(u) \le z\) and equal to 0 otherwise. The expected total number of deposits \({{\mathbb {E}}}[n]\) is equal to the integral of the intensity function,

$$\begin{aligned} {{\mathbb {E}}}[n] = \int _S \lambda (u) \, \mathrm{d}u = \int _S \rho (Z(u)) \, \mathrm{d} u . \end{aligned}$$

This integral over the spatial domain S can be transformed into a one-dimensional integral

$$\begin{aligned} {{\mathbb {E}}}[n] = a \int _{-\infty }^\infty \rho (z) \, \mathrm{d} G(z). \end{aligned}$$

The cumulative distribution function of the values of Z at the deposit points, averaged over all random outcomes, is

$$\begin{aligned} {\widetilde{F}}(z) = \frac{ a \, \int _{-\infty }^z \rho (v) \, \mathrm{d} G(v) }{ a \, \int _{-\infty }^\infty \rho (v) \, \mathrm{d} G(v) } = \frac{1}{\mu } \int _{-\infty }^z \rho (v) \, \mathrm{d} G(v), \end{aligned}$$
(38)

where \(\mu = {{\mathbb {E}}}[n]/ a\) is the average density of deposits over the whole domain, and v is the dummy variable of integration.

The expected capture–efficiency curve is the graph of \({\widetilde{F}}(z)\) against G(z) for all z; equivalently it is the graph of the function \(s \mapsto {\widetilde{F}}(G^{-1}(s))\) where \(G^{-1}\) is the inverse function of G. Assuming differentiability, the slope of the expected capture–efficiency curve is

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d} s} {\widetilde{F}}(G^{-1}(s)) = \frac{{\widetilde{F}}^\prime (G^{-1}(s))}{G^\prime (G^{-1}(s))}. \end{aligned}$$

But from (38) we have \({\widetilde{F}}^\prime (z) = \rho (z) G^\prime (z)/\mu\), so that the slope of the expected capture–efficiency curve at a given area fraction s is equal to \(\rho (z)/\mu\), where \(z = G^{-1}(s)\).

The Youden method selects the point on the capture–efficiency curve with slope equal to 1. Ignoring sampling variability, that is, if we replace the observed capture–efficiency curve by the expected capture–efficiency curve, the Youden method selects the point with slope \(\rho (z)/\mu = 1\), i.e., it selects the threshold value z for which \(\rho (z) = \mu\).
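A small numerical sketch may help to fix ideas. It assumes, purely for illustration, that the predictor is uniformly distributed over the region on the interval [0, 1], so that \(G(z) = z\), and that \(\rho (z) = \exp (-kz)\) with a hypothetical decay rate k. The expected Youden criterion \({\widetilde{F}}(z) - G(z)\) is maximised on a fine grid and the maximiser is compared with the solution of \(\rho (z) = \mu\).

```python
import math

K = 3.0  # hypothetical decay rate of prospectivity with predictor value

def rho(z):
    """Assumed prospectivity as a decreasing function of the predictor."""
    return math.exp(-K * z)

# Average density: integral of rho(z) dG(z) over [0, 1] with G(z) = z.
mu = (1.0 - math.exp(-K)) / K

def expected_youden(z):
    """F~(z) - G(z), where F~(z) = (1/mu) * integral of rho from 0 to z."""
    captured = (1.0 - math.exp(-K * z)) / (1.0 - math.exp(-K))
    return captured - z

# Maximise the expected Youden criterion over a fine grid of thresholds.
grid = [i / 10_000 for i in range(10_001)]
z_youden = max(grid, key=expected_youden)

# Threshold at which rho(z) equals the average density mu.
z_exact = -math.log(mu) / K
```

Under these assumptions the two thresholds agree to within the grid spacing, in line with the result proved above.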

Non-negligible Grid Cell Size

Equations in the paper assume, for simplicity and clarity, that the area of a grid cell is negligible. To be precise, if \(\epsilon\) denotes the area of one grid cell, then we assume that \(n \epsilon\), the total area of all cells containing deposits, can be treated as zero. This appendix lists the modifications to these equations that are necessary when \(n\epsilon\) is not negligible.

In Eq. 1, the revised formulae are

$$\begin{aligned} W_{+}= \ln \frac{n_B}{ a_B - \epsilon n_B} - \ln \frac{n}{ a} \end{aligned}$$
(39)
$$\begin{aligned} W_{-}= \ln \frac{n_{\overline{B}}}{ a_{\overline{B}}- \epsilon n_{\overline{B}}} - \ln \frac{n}{ a} \end{aligned}$$
(40)
$$\begin{aligned} {\widehat{C}}= \ln \frac{n_B}{ a_B - \epsilon n_B} - \ln \frac{n_{\overline{B}}}{ a_{\overline{B}}- \epsilon n_{\overline{B}}}. \end{aligned}$$
(41)

Equation 2 becomes

$$\begin{aligned} {\widehat{C}}(z) = \ln \left( \frac{n(z)}{ a(z) - \epsilon n(z)} \bigg / \frac{n-n(z)}{ a- a(z) - \epsilon (n-n(z))} \right). \end{aligned}$$
(42)

Equation 3 becomes

$$\begin{aligned} \hbox {se} ({\widehat{C}}) = \sqrt{ \frac{1}{n_B} + \frac{1}{n_{\overline{B}}} + \frac{1}{ a_B/\epsilon - n_B} + \frac{1}{ a_{\overline{B}}/\epsilon - n_{\overline{B}}} } \end{aligned}$$
(43)

and Eq. 5 becomes

$$\begin{aligned} \hbox {se}({\widehat{C}}(z)) = \sqrt{ \frac{1}{n(z)} + \frac{1}{n-n(z)} + \frac{1}{ a(z)/\epsilon - n(z)} + \frac{1}{( a - a(z))/\epsilon - (n - n(z))} }. \end{aligned}$$
(44)

Equation (8) for the Youden criterion becomes

$$\begin{aligned} Y(z) = \frac{n(z)}{n} - \frac{ a(z)-\epsilon n(z)}{ a - \epsilon n}. \end{aligned}$$
(45)

There are no changes in other equations, except that equation (12) does not hold. In the right panel of Figure 10, the ROC curve should be used instead of the capture–efficiency curve.
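For readers who wish to apply these corrections, the following Python sketch transcribes Eqs. (42), (44) and (45). The function and argument names are illustrative, and the numerical inputs are hypothetical. Setting the cell area to zero in the contrast and Youden functions removes the correction terms, which gives a simple check on the transcription.

```python
import math

def contrast(n_z, a_z, n, a, eps):
    """Estimated contrast from Eq. (42), with grid cell area eps."""
    inside = n_z / (a_z - eps * n_z)
    outside = (n - n_z) / (a - a_z - eps * (n - n_z))
    return math.log(inside / outside)

def se_contrast(n_z, a_z, n, a, eps):
    """Standard error of the contrast from Eq. (44); requires eps > 0."""
    return math.sqrt(1.0 / n_z
                     + 1.0 / (n - n_z)
                     + 1.0 / (a_z / eps - n_z)
                     + 1.0 / ((a - a_z) / eps - (n - n_z)))

def youden(n_z, a_z, n, a, eps):
    """Youden criterion from Eq. (45)."""
    return n_z / n - (a_z - eps * n_z) / (a - eps * n)

# Hypothetical data: 14 of 20 deposits below the threshold,
# which covers 20 of 100 area units; each grid cell has area 0.01.
n_z, a_z, n, a, eps = 14, 20.0, 20, 100.0, 0.01
c_hat = contrast(n_z, a_z, n, a, eps)
se_hat = se_contrast(n_z, a_z, n, a, eps)
y_hat = youden(n_z, a_z, n, a, eps)
```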

Cite this article

Baddeley, A., Brown, W., Milne, R.K. et al. Optimal Thresholding of Predictors in Mineral Prospectivity Analysis. Nat Resour Res 30, 923–969 (2021). https://doi.org/10.1007/s11053-020-09769-2
