Abstract
The objective of prospectivity modeling is to predict the conditional probability of the presence (T = 1) or absence (T = 0) of a target T given favorable or prohibitive predictors B, or to construct a two-class {0, 1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists’ favorite method of prospectivity modeling because of its apparent simplicity. However, the numerical simplicity is deceptive, as it follows from the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced; they are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view of prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxing modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to “validate” this approach. Using the same fabricated dataset, it is shown that BoostWofE cannot generally compensate for a lack of conditional independence, whatever the order in which the predictors are processed consecutively. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
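The contrast drawn in the abstract between WofE and logistic regression with interaction terms can be illustrated with a minimal Python sketch. The counts below are hypothetical and chosen only so that the two indicator predictors are conditionally dependent given T = 1; they are not the fabricated dataset discussed in the article, and the function names are illustrative. The sketch assumes every cell contains both presences and absences, so that all logits are finite. Under that assumption the maximum-likelihood coefficients of the saturated logistic model (main effects plus interaction) can be obtained by counting, and the model reproduces the empirical conditional probabilities exactly, whereas conventional WofE does not.

```python
import math
from itertools import product

# Hypothetical counts n[(b1, b2, t)]; illustrative only, not data from the article.
# Given T = 1 the predictors are dependent: P(B1=1, B2=1 | T=1) = 0.60,
# while P(B1=1 | T=1) * P(B2=1 | T=1) = 0.5625.
counts = {
    (0, 0, 0): 40, (0, 0, 1): 2,
    (0, 1, 0): 20, (0, 1, 1): 3,
    (1, 0, 0): 20, (1, 0, 1): 3,
    (1, 1, 0): 10, (1, 1, 1): 12,
}

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def n(b1=None, b2=None, t=None):
    """Sum the counts over all cells matching the given values (None = any)."""
    return sum(
        c for (x1, x2, y), c in counts.items()
        if (b1 is None or x1 == b1)
        and (b2 is None or x2 == b2)
        and (t is None or y == t)
    )

def empirical(b1, b2):
    """Empirical conditional probability P(T = 1 | B1 = b1, B2 = b2)."""
    return n(b1, b2, 1) / n(b1, b2)

def wofe(b1, b2):
    """Conventional WofE: prior log odds plus one weight per predictor."""
    prior = math.log(n(t=1) / n(t=0))
    w1 = math.log((n(b1=b1, t=1) / n(t=1)) / (n(b1=b1, t=0) / n(t=0)))
    w2 = math.log((n(b2=b2, t=1) / n(t=1)) / (n(b2=b2, t=0) / n(t=0)))
    return sigmoid(prior + w1 + w2)

def saturated_logistic(b1, b2):
    """Logistic model with interaction term; coefficients obtained by counting."""
    l = {cell: logit(empirical(*cell)) for cell in product((0, 1), repeat=2)}
    beta0 = l[0, 0]
    beta1 = l[1, 0] - l[0, 0]
    beta2 = l[0, 1] - l[0, 0]
    beta12 = l[1, 1] - l[1, 0] - l[0, 1] + l[0, 0]
    return sigmoid(beta0 + beta1 * b1 + beta2 * b2 + beta12 * b1 * b2)

if __name__ == "__main__":
    for b1, b2 in product((0, 1), repeat=2):
        print(b1, b2,
              round(empirical(b1, b2), 4),
              round(wofe(b1, b2), 4),
              round(saturated_logistic(b1, b2), 4))
```

Running the script prints, for each predictor pattern, the empirical conditional probability next to the WofE estimate and the estimate from the model with interaction term. The last two columns differ whenever conditional independence is violated, while the first and third agree up to rounding.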
Cite this article
Schaeben, H., Semmler, G. The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression. Front. Earth Sci. 10, 389–408 (2016). https://doi.org/10.1007/s11707-016-0595-y
DOI: https://doi.org/10.1007/s11707-016-0595-y