Abstract
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
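To make the core statistical effect concrete, the short simulation below is an illustrative sketch (ours, not code from the paper; the names chance_score and max_of_n are hypothetical). It scores candidate items on pure-noise data, then applies an MCP: keep the item with the maximum score. Although every candidate's true merit is exactly chance, the score of the selected item rises steadily with the number of candidates compared, which is why an unadjusted evaluation function overestimates the merit of whatever an induction algorithm selects.

```python
import random

random.seed(0)

def chance_score(n_examples=100):
    # Accuracy of a random binary predictor against the labels: each
    # example matches with probability 0.5, so every candidate's true
    # merit is exactly chance (0.5).
    return sum(random.random() < 0.5 for _ in range(n_examples)) / n_examples

def max_of_n(n_candidates):
    # A multiple comparison procedure: score every candidate item with
    # the evaluation function and select the maximum score.
    return max(chance_score() for _ in range(n_candidates))

trials = 500
for n in (1, 10, 100):
    mean_best = sum(max_of_n(n) for _ in range(trials)) / trials
    print(f"candidates compared: {n:3d}   mean score of selected item: {mean_best:.3f}")
```

The same simulation illustrates the logic of the adjustments named above: a Bonferroni adjustment tests each of the n comparisons at significance level α/n, holding the probability that any chance score clears the threshold near α, while a randomization test estimates the reference distribution of the maximum score directly by rescoring under shuffled (noise) labels, much as this sketch does.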
Cite this article
Jensen, D.D., Cohen, P.R. Multiple Comparisons in Induction Algorithms. Machine Learning 38, 309–338 (2000). https://doi.org/10.1023/A:1007631014630