Data Mining and Knowledge Discovery, Volume 6, Issue 4, pp 361–391

High-Performance Commercial Data Mining: A Multistrategy Machine Learning Application

  • William H. Hsu
  • Michael Welge
  • Tom Redman
  • David Clutter

Abstract

We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on the design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, together with several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves a linear speedup owing to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and comparable to that of the best extant search-space-based wrappers.

Keywords: constructive induction, scalable high-performance computing, real-world decision support applications, relevance determination, genetic algorithms, software development environments for knowledge discovery in databases (KDD)
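The wrapper approach described in the abstract treats attribute subset selection as a search over bit masks, scoring each candidate subset by the cross-validated accuracy of the classifier induced from it. The sketch below is a minimal, hypothetical illustration of such a GA wrapper, not the Jenesis implementation: the function names and GA parameters are assumptions, and scikit-learn's decision tree stands in for the decision tree inducers used in the paper.

```python
# Hypothetical sketch of a GA wrapper for attribute subset selection.
# All names and parameter defaults here are illustrative assumptions,
# not the authors' Jenesis system; a scikit-learn decision tree stands
# in for the paper's decision tree inducers.
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(mask, X, y, cv=5):
    """Cross-validated accuracy of a tree trained on the masked attributes."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty attribute subset cannot classify
    return cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, cols], y, cv=cv).mean()

def ga_feature_selection(X, y, pop_size=30, generations=20,
                         p_crossover=0.7, p_mutation=0.02, seed=0):
    rng = random.Random(seed)
    n = X.shape[1]
    # Each individual is a bit mask over the n attributes.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness evaluation dominates the cost and is embarrassingly
        # parallel: each individual could be scored on a separate cluster
        # node, which is the task parallelism behind the linear speedup.
        scored = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        elite = scored[: pop_size // 2]          # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            if rng.random() < p_crossover:       # one-point crossover
                cut = rng.randrange(1, n)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation with per-bit probability p_mutation.
            child = [bit ^ (rng.random() < p_mutation) for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, X, y))

if __name__ == "__main__":
    data = load_breast_cancer()
    best = ga_feature_selection(data.data, data.target)
    print("selected attributes:", [i for i, b in enumerate(best) if b])
```

In a cluster deployment like the paper's Beowulf configuration, the per-individual fitness calls inside each generation would be farmed out to worker nodes (e.g., through a master-worker message-passing layer) rather than evaluated serially as in this single-process sketch.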



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • William H. Hsu 1, 2
  • Michael Welge 2
  • Tom Redman 2
  • David Clutter 2

  1. Department of Computing and Information Sciences, Kansas State University, Manhattan, KS
  2. Automated Learning Group, National Center for Supercomputing Applications (NCSA), Champaign, IL
