Abstract
Learning multiple descriptions for each class in the data has been shown to reduce generalization error, but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps to explain this variation. Our hypothesis is that the amount of error reduction is linked to the degree to which the descriptions for a class make errors in a correlated manner. We present a precise and novel definition for this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make errors in a correlated manner. We also show empirically that it is possible to learn descriptions that make less-correlated errors in domains in which many ties in the search evaluation measure (e.g., information gain) are encountered during learning. Finally, the paper presents results that help to explain when and why multiple descriptions help greatly (domains with irrelevant attributes) and when they help less (domains with large amounts of class noise).
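To make the central notion concrete, the following sketch computes a simple proxy for "correlated errors": the fraction of test examples on which two classifiers both err, among those on which at least one errs. This is an illustrative measure only, assumed for exposition; it is not the paper's precise definition, and the function name and example data are hypothetical.

```python
import numpy as np

def correlated_error_proportion(errors_a, errors_b):
    """Illustrative proxy (not the paper's definition): fraction of
    examples on which both classifiers err, among the examples on
    which at least one of them errs."""
    a = np.asarray(errors_a, dtype=bool)
    b = np.asarray(errors_b, dtype=bool)
    either = np.logical_or(a, b)
    if either.sum() == 0:
        return 0.0  # neither classifier makes any errors
    return float(np.logical_and(a, b).sum() / either.sum())

# Hypothetical example: two classifiers evaluated on ten test
# examples (True = the classifier misclassified that example).
errs_1 = [True, False, True, False, False, True, False, False, False, False]
errs_2 = [True, False, False, True, False, True, False, False, False, False]
print(correlated_error_proportion(errs_1, errs_2))  # 2 shared / 4 total = 0.5
```

Under this proxy, a value near 1 means the descriptions tend to fail on the same examples (so voting among them recovers little), while a value near 0 means their errors are spread over different examples, which is the regime in which combining multiple descriptions reduces error most.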
Ali, K.M., Pazzani, M.J. Error reduction through learning multiple descriptions. Mach Learn 24, 173–202 (1996). https://doi.org/10.1007/BF00058611