Data Mining and Knowledge Discovery, Volume 4, Issue 4, pp 251–280

Discovering Interesting Patterns for Investment Decision Making with GLOWER ☹—A Genetic Learner Overlaid with Entropy Reduction

  • Vasant Dhar
  • Dashin Chou
  • Foster Provost

Abstract

Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), using the results to identify one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.

data mining, knowledge discovery, machine learning, genetic algorithms, financial prediction, rule learning, investment decision making, systematic trading

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Vasant Dhar (1)
  • Dashin Chou (2)
  • Foster Provost (2)
  1. Stern School of Business, New York University, New York, USA
  2. Stern School of Business, New York University, New York, USA
