Multiple Comparisons in Induction Algorithms

Jensen, David D.; Cohen, Paul R.

doi:10.1023/A:1007631014630

Published: March 2000

Machine Learning, Volume 38, pages 309–338 (2000)

Abstract

A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
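
The effect of selecting the maximum of several scores can be illustrated with a small sketch. It is not taken from the article: the function names are hypothetical, and it assumes that the n candidate scores are independent draws from a standard normal distribution when no item is truly better than the others. Under that assumption, the chance that the best of 20 items looks significant at a nominal 0.05 level is about 0.64, and a Bonferroni adjustment (testing each item at 0.05 / 20) brings the overall rate back below 0.05.

```python
import random
from statistics import NormalDist


def prob_max_exceeds(alpha: float, n: int) -> float:
    """Probability that the maximum of n independent null scores exceeds
    the critical value chosen for a single comparison at level alpha."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni_alpha(alpha: float, n: int) -> float:
    """Per-comparison level that keeps the overall error rate at or
    below alpha when n comparisons are made (Bonferroni adjustment)."""
    return alpha / n


def simulate_max_exceeds(alpha: float, n: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of prob_max_exceeds.

    Assumption (illustrative only): scores are independent standard
    normal values, i.e. every item is equally uninformative.
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    hits = 0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(n))
        if best > critical:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha, n = 0.05, 20
    print("unadjusted, analytic: ", prob_max_exceeds(alpha, n))
    print("unadjusted, simulated:", simulate_max_exceeds(alpha, n))
    print("Bonferroni, analytic: ", prob_max_exceeds(bonferroni_alpha(alpha, n), n))
```

Real evaluation scores in an induction algorithm are usually correlated, so the independence assumption overstates the inflation somewhat. Randomization testing and cross-validation, which the abstract also names, do not depend on that assumption.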

Author information

Authors and Affiliations

Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA, 01003-4610, USA
David D. Jensen & Paul R. Cohen

About this article

Cite this article

Jensen, D.D., Cohen, P.R. Multiple Comparisons in Induction Algorithms. Machine Learning 38, 309–338 (2000). https://doi.org/10.1023/A:1007631014630

Issue Date: March 2000
DOI: https://doi.org/10.1023/A:1007631014630
