Abstract
Economics and biology agree that successful behavior in a stochastic environment responds to the variance of potential outcomes. Yet when the biological and economic paradigms are mated in a learning classifier system (LCS), the decision-making agents, called classifiers, typically ignore risk. Since risk management is a fundamental problem of learning, LCS have not always performed as well as theoretically predicted. This paper develops a novel model of risk-neutral reinforcement learning in a traditional Bucket Brigade credit-allocation market under the pressure of a Genetic Algorithm. I demonstrate the applicability of the basic model to the classical LCS design and reexamine two basic issues where traditional LCS performance fails to meet expectations: default hierarchies and long chains of coupled classifiers. Risk-neutrality and noisy probabilistic auctions create dynamic instability in both areas, while identical preferences result in market failure in default hierarchies and exponential attenuation of price signals down classifier chains. Despite the limitations of simple risk-neutral classifiers, I show they’re capable of cheap short-run emulation of more rational behaviors. Still, risk-neutral information markets are a dead end. The model suggests a path toward a new type of LCS built on stable, heterogeneous, and risk-averse preferences under efficient auctions, with access to more complete markets exploitable by competing risk management strategies. This will require a radical rethinking of the evolutionary and economic algorithms, but ultimately heralds a return to a market-based approach to LCS.
Notes
See Drugowitsch [6] for an interesting attempt to build LCS from first principles in a probabilistic, Bayesian framework. The result looks very different from the simple traditional LCS studied here.
Daniel Bernoulli proposed the natural logarithm of wealth for a utility function, which maximizes the geometric mean growth rate resulting from risky returns.
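As a rough illustration of this note, the following Python sketch compares the arithmetic and geometric means of a risky wealth multiplier. The gamble, its probabilities, and the function names are hypothetical and chosen only for illustration; they do not come from the paper's model.

```python
import math

def arithmetic_mean(outcomes):
    """Expected wealth multiplier; outcomes is a list of (probability, multiplier)."""
    return sum(p * m for p, m in outcomes)

def expected_log_growth(outcomes):
    """Expected natural-log utility of the multiplier (Bernoulli's criterion)."""
    return sum(p * math.log(m) for p, m in outcomes)

def geometric_mean(outcomes):
    """Long-run per-round growth factor under repeated play."""
    return math.exp(expected_log_growth(outcomes))

# Hypothetical gamble: wealth is multiplied by 1.5 or by 0.6 with equal odds.
gamble = [(0.5, 1.5), (0.5, 0.6)]

print("arithmetic mean:", arithmetic_mean(gamble))
print("geometric mean: ", geometric_mean(gamble))
print("expected log growth:", expected_log_growth(gamble))
```

A risk-neutral agent, looking only at the arithmetic mean, would accept this gamble every round, while an agent with log utility would decline it because its expected log growth is negative.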
Following Savage, a number of others developed variations on the subjective expected utility model with alternative axiomatic foundations. For an early but thorough review, see Fishburn [10].
Modern behavioral approaches such as Kahneman and Tversky’s prospect theory [11] and its derivatives typically apply subjective preferences to both probabilities and magnitudes. Many LCS implementations apply ad hoc nonlinear transformations of the match specificity without any supporting behavioral theory; Holland [12], for example, takes a base-2 logarithm.
The probability a classifier has of selling its output and the reward received upon completing a sale are assumed to be independent of the price paid for an input.
This Reduction of Compound Lotteries is an explicit axiom or a derived result in most expected utility models, but it doesn’t always hold for decision-makers as complex as human beings (Budescu and Fischer [19]).
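To make the reduction concrete, here is a minimal Python sketch that collapses a two-stage lottery into a simple one and checks that an expected utility maximizer values both forms equally. The lottery, the identity utility, and all function names are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def reduce_compound(compound):
    """Collapse a list of (probability, simple lottery) into {outcome: probability}.
    Each simple lottery is a list of (probability, outcome)."""
    reduced = defaultdict(Fraction)
    for p_outer, lottery in compound:
        for p_inner, outcome in lottery:
            reduced[outcome] += p_outer * p_inner
    return dict(reduced)

def expected_utility(simple, utility):
    """Expected utility of a reduced lottery {outcome: probability}."""
    return sum(p * utility(x) for x, p in simple.items())

def staged_expected_utility(compound, utility):
    """Expected utility evaluated stage by stage, without reducing first."""
    return sum(
        p_outer * sum(p_inner * utility(x) for p_inner, x in lottery)
        for p_outer, lottery in compound
    )

half = Fraction(1, 2)
# Hypothetical two-stage lottery: a coin flip decides between
# a second coin flip over (100, 0) and a sure outcome of 0.
compound = [
    (half, [(half, 100), (half, 0)]),
    (half, [(Fraction(1), 0)]),
]
simple = reduce_compound(compound)
print(simple)
print(expected_utility(simple, lambda x: x))
print(staged_expected_utility(compound, lambda x: x))
```

Exact fractions are used so that the two evaluations can be compared for equality without floating-point noise.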
More complex classifiers able to monitor and attempt to predict the bidding behavior of competitors may not make for smarter bidders. As Vickrey [15] showed, demand-revealing behavior can be the optimal strategy even when bidders can fully observe the bids of rivals, as in the English or progressive “open” auctions, so there’s little justification for additional complexity here.
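The demand-revealing property can be checked numerically in the sealed-bid second-price auction that Vickrey analyzed. The sketch below is illustrative only: the payoff function, the tie-breaking rule, and the grid of bids are assumptions, and it is not the probabilistic auction used in the LCS studied here.

```python
def second_price_payoff(value, bid, rival_high):
    """Payoff to one bidder in a sealed-bid second-price auction.
    The bidder wins only with a strictly higher bid (assumed tie rule)
    and then pays the highest rival bid."""
    if bid > rival_high:
        return value - rival_high
    return 0.0

def truthful_is_weakly_dominant(value, grid):
    """Check, over a grid of own bids and rival bids, that bidding one's
    true value never does worse than any alternative bid."""
    return all(
        second_price_payoff(value, value, rival)
        >= second_price_payoff(value, bid, rival)
        for bid in grid
        for rival in grid
    )

# Hypothetical grid of bids from 0.0 to 2.0 in steps of 0.1.
grid = [i / 10 for i in range(21)]
print(all(truthful_is_weakly_dominant(v, grid) for v in grid))
```

Because the price paid does not depend on the winner's own bid, shading or inflating the bid can only change whether the bidder wins, never how much a winner pays, which is why knowing rivals' bids adds nothing in this setting.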
The optimal bid that satisfies Eq. (9) can be found analytically only in the risk-neutral case of a linear value function, v(w); under nonlinear preferences it must be found numerically.
‘Constraint’ is a term inherited from the SEU and other economic models, but here the budget isn’t constrained in the traditional sense, because it depends on the classifier’s choice of bid.
Traditional LCS must initialize classifier wealths from some uniform distribution.
References
Bernoulli D (1738) Exposition of a new theory on the measurement of risk. Translated in 1954 in Econometrica, 22(1):23–36
Von Neumann J, Morgenstern O (1944) Theory of games and economic behavior. Princeton University Press, Princeton
Real L, Caraco T (1986) Risk and foraging in stochastic environments. Ann Rev Ecol Syst 17:371–390
Holland JH, Reitman JS (1978) Cognitive systems based on adaptive algorithms. In: Waterman DA, Hayes-Roth F (eds) Pattern directed inference systems. Academic Press, Waltham
Wilson SW, Goldberg DE (1989) A critical review of classifier systems. In: Schaffer JD (ed) Proceedings from the third international conference on genetic algorithms. Morgan Kaufmann, pp 244–255
Drugowitsch J (2008) Design and analysis of learning classifier systems: a probabilistic approach. Springer, Berlin
Savage LJ (1954) The foundations of statistics. Wiley, New York
Bayes T (1763) An essay toward solving a problem in the doctrine of chances. Philos Trans R Soc 53:370–418
Ellsberg D (1961) Risk, ambiguity, and the savage axioms. Quart J Econ 75:643–669
Fishburn PC (1981) Subjective expected utility: a review of normative theories. Theory Decis 13(2):139–199
Kahneman D, Tversky A (1979) Prospect theory: an analysis of decision under risk. Econometrica 47(2):263–292
Holland JH (1992) Adaptation in natural and artificial systems, 2nd edn. MIT Press, Cambridge
Grefenstette JJ (1991) Conditions for implicit parallelism. In: Rawlins GJE (ed) Foundations of genetic algorithms. Morgan Kaufmann Publishers, Waltham
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Boston
Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J Financ 16(1):8–37
De Groot MH (1970) Optimal statistical decisions. McGraw-Hill, New York
Baum EB, Durdanovic I (2000) Evolution of cooperative problem solving in an artificial economy. Neural Comput 12:2743–2775
Goldberg DE (1990) Probability matching, the magnitude of reinforcement, and classifier system bidding. Machine Learn 5:407–425
Budescu DV, Fischer I (2001) The same but different: an empirical investigation of the reducibility principle. J Behav Decision-Making 14:187–206
Riolo RL (1987a) Bucket brigade performance: I. long sequences of classifiers. In: Grefenstette JJ (ed.) Proceedings from the second international conference on genetic algorithms. Lawrence Erlbaum Associates, pp 184–195
Riolo RL (1987b) Bucket brigade performance: II. default hierarchies. In: Grefenstette JJ (ed.) Proceedings from the second international conference on genetic algorithms. Lawrence Erlbaum Associates, pp 196–201
Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
Arrow KJ (1971) Essays in the theory of risk bearing. North-Holland, Amsterdam
Real LA (1987) Objective benefit versus subjective perception in the theory of risk-sensitive foraging. Am Nat 130(3):399–411
Healy PJ, Moore DA (2007) Bayesian overconfidence. SSRN: http://ssrn.com/abstract=1001820 or http://dx.doi.org/10.2139/ssrn.1001820
Kovacs T (2002) A comparison of strength and accuracy-based fitness in learning classifier systems. Dissertation, University of Birmingham
Wilson SW (1986) Hierarchical credit allocation in a classifier system. Research Memo RIS No. 37r. The Rowland Institute of Science
Holland JH (1985) Properties of the bucket brigade algorithm. In: Proceedings from the first international conference on genetic algorithms. Lawrence Erlbaum, pp 1–7
Holland JH (1986) Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning II. Morgan Kaufmann, Waltham
Wilson SW (1989) Bid competition and specificity reconsidered. Complex Syst 2:705–723
Smith RE, Goldberg DE (1991) Variable default hierarchy separation in a classifier system. Found Genet Algorithms 1:141–167
Booker LB (2000) Do we really need to estimate rule utilities in classifier systems? In: Lanzi PL, Stolzmann W, Wilson SW (eds) Lecture notes in artificial intelligence 1813. Springer, Berlin
Holland JH (1995) Hidden order: how adaptation builds complexity. Addison-Wesley, Boston
Smith JTH (2010) Implicit fitness and heterogeneous preferences in the genetic algorithm. In: Proceedings of the 12th annual genetic and evolutionary computation conference (GECCO), ACM
Holland JH, Miller JH (1991) Artificial adaptive agents in economic theory. Am Econ Rev 81(2):365–370
Robson AJ (2001) The biological basis of economic behavior. J Econ Lit 39(1):11–33
Rayo L, Becker G (2007) Evolutionary efficiency and happiness. J Political Econ 115(2):302–337
Netzer N (2009) Evolution of time preferences and attitudes toward risk. Am Econ Rev 99(3):937–955
Acknowledgments
This paper began development in Stephanie Forrest’s Complex Adaptive Systems seminar at the University of New Mexico. I am grateful to Dr. Forrest as well as Janie M. Chermak at UNM and John H. Miller at CMU/SFI, and anonymous reviewers for feedback that substantially helped me clarify arguments, improve examples, and fix mistakes. All remaining errors are my own. Cheers!
Cite this article
Smith, J.T.H. Risk neutrality in learning classifier systems. Evol. Intel. 5, 69–86 (2012). https://doi.org/10.1007/s12065-012-0079-2
DOI: https://doi.org/10.1007/s12065-012-0079-2