Machine Learning, Volume 23, Issue 1, pp 33–46

Scaling up inductive learning with massive parallelism

  • Foster John Provost
  • John M. Aronis

Abstract

Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases a million or more examples may be necessary to learn important special cases with confidence. Such tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller data sets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
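The kind of data parallelism the abstract describes can be illustrated with a small sketch: each training example conceptually occupies one processor, and a candidate rule's coverage statistics are computed by broadcasting each condition to all examples at once. The sketch below uses NumPy vector operations as a stand-in for the CM-2's SIMD layout; the function name, rule representation, and data are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of data-parallel rule evaluation, in the spirit of the
# paper's CM-2 approach. NumPy boolean vectors stand in for the SIMD
# "one example per processor" layout; all names here are illustrative.
import numpy as np

def rule_coverage(data, labels, rule):
    """Count positive/negative examples matched by a conjunctive rule.

    data   : (n_examples, n_attributes) integer array of attribute values
    labels : (n_examples,) boolean array, True = positive class
    rule   : list of (attribute_index, value) conditions, ANDed together
    """
    match = np.ones(len(data), dtype=bool)   # empty rule matches everything
    for attr, value in rule:
        match &= (data[:, attr] == value)    # one parallel test per condition
    pos = int(np.count_nonzero(match & labels))
    neg = int(np.count_nonzero(match & ~labels))
    return pos, neg

# Tiny usage example with synthetic data and a planted concept.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(1000, 5))
y = X[:, 0] == 1                             # concept: attribute 0 equals 1
pos, neg = rule_coverage(X, y, [(0, 1)])     # rule exactly matching the concept
```

The key point is that the cost of scoring a rule is dominated by a constant number of vectorized passes over the example set, rather than a per-example loop, which is why this style of evaluation scales to millions of examples on parallel hardware.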

Keywords

inductive learning · parallelism · small disjuncts


Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Foster John Provost (1)
  • John M. Aronis (2)
  1. NYNEX Science and Technology, White Plains
  2. Intelligent Systems Laboratory, University of Pittsburgh, Pittsburgh
