Reference Work Entry

Encyclopedia of Machine Learning and Data Mining

pp 217-224


Classifier Systems

  • Pier Luca Lanzi, Politecnico di Milano


Synonyms

Genetics-based machine learning; Learning classifier systems


Definition

Classifier systems are rule-based systems that combine temporal difference learning or supervised learning with a genetic algorithm to solve classification and reinforcement learning problems. Classifier systems come in two flavors: Michigan classifier systems, which are designed for online learning, but can also tackle offline problems; and Pittsburgh classifier systems, which can only be applied to offline learning.

In Michigan classifier systems (Holland 1976), learning is viewed as an online adaptation process to an unknown environment that represents the problem and provides feedback in terms of a numerical reward. Michigan classifier systems maintain a single candidate solution consisting of a set of rules, or a population of classifiers. Michigan systems apply (1) temporal difference learning to distribute the incoming reward to the classifiers that are accountable for it; and (2) a genetic algorithm to select, recombine, and mutate individual classifiers so as to improve their contribution to the current solution.

In contrast, in Pittsburgh classifier systems (Smith 1980), learning is viewed as an offline optimization process in which a genetic algorithm alone is applied to search for the best solution to a given problem. In addition, Pittsburgh classifier systems maintain not one, but a set of candidate solutions. While in the Michigan classifier system each individual classifier represents a part of the overall solution, in the Pittsburgh system each individual is a complete candidate solution (itself consisting of a set of classifiers). The fitness of each Pittsburgh individual is computed offline by testing it on a representative sample of problem instances. The individuals compete among themselves through selection, while crossover and mutation recombine solutions to search for better solutions.

Motivation and Background

Machine learning is usually viewed as a search process in which a solution space is explored until an appropriate solution to the target problem is found (Mitchell 1982) (see Supervised Learning). Machine learning methods are characterized by the way they represent solutions (e.g., using decision trees, rules), by the way they evaluate solutions (e.g., classification accuracy, information gain) and by the way they explore the solution space (e.g., using a general-to-specific strategy or a specific-to-general strategy).

Classifier systems are methods of genetics-based machine learning introduced by Holland, the father of genetic algorithms. They made their first appearance in Holland (1976) where the first diagram of a classifier system, labeled “cognitive system,” was shown. Subsequently, they were described in detail in the paper “Cognitive Systems based on Adaptive Algorithms” (Holland and Reitman 1978). Classifier systems are characterized by a rule-based representation of solutions and a genetics-based exploration of the solution space. While other rule learning methods, such as CN2 (Clark and Niblett 1989) and FOIL (Quinlan and Cameron-Jones 1995), generate one rule at a time following a sequential covering strategy (see Covering Algorithm), classifier systems work on one or more solutions at once, and they explore the solution space by applying the principles of natural selection and genetics.

In classifier systems (Holland 1976; Holland and Reitman 1978; Wilson 1995), machine learning is modeled as an online adaptation process to an unknown environment, which provides feedback in terms of a numerical reward. A classifier system perceives the environment through its detectors and, based on its sensations, it selects an action to be performed in the environment through its effectors. Depending on the efficacy of its actions, the environment may eventually reward the system. A classifier system learns by trying to maximize the amount of reward it receives from the environment. To pursue such a goal, it maintains a set (a population) of condition-action-prediction rules, called classifiers, which represents the current solution. Each classifier’s condition identifies some part of the problem domain; the classifier’s action represents a decision on the subproblem identified by its condition; and the classifier’s prediction, or strength, estimates the value of the action in terms of future rewards on that subproblem. Two separate components, credit assignment and rule discovery, act on the population with different goals. Credit assignment, implemented either by methods of temporal difference or supervised learning, exploits the incoming reward to estimate the action values in each subproblem so as to identify the best classifiers in the population. At the same time, rule discovery, usually implemented by a genetic algorithm, selects, recombines, and mutates the classifiers in the population to improve the current solution.

Classifier systems were initially conceived as modeling tools. Given a real system with unknown underlying dynamics, for instance a financial market, a classifier system would be used to generate a behavior that matched the real system. The evolved rules would provide a plausible, human readable model of the unknown system – a way to look inside the box. Subsequently, with the developments in the area of machine learning and the rise of reinforcement learning, classifier systems have been more and more often studied and presented as alternatives to other machine learning methods. Wilson’s XCS (1995), the most successful classifier system to date, has proven to be both a valid alternative to other reinforcement learning approaches and an effective approach to classification and data mining (Bull 2004; Bull and Kovacs 2005; Lanzi et al. 2000).

Kenneth de Jong and his students (de Jong 1988; Smith 1980, 1983) took a different perspective on genetics-based machine learning and modeled learning as an optimization process rather than an adaptation process as done in Holland (1976). In this case, the solution space is explored by applying a genetic algorithm to a population of individuals, each representing a complete candidate solution – that is, a set of rules or a production system (de Jong 1988; Smith 1980). At each cycle, a critic is applied to each individual (to each set of rules) to obtain a performance measure that is then used by the genetic algorithm to guide the exploration of the solution space. The individuals in the population compete among themselves through selection, while crossover and mutation recombine solutions to search for better ones.

The approaches of Holland (Holland 1976; Holland and Reitman 1978) and de Jong (de Jong 1988; Smith 1980, 1983) have been extended and improved in several ways (see Lanzi et al. (2000) for a review). The models of classifier systems that are inspired by the work of Holland (1976) at the University of Michigan are usually called Michigan classifier systems; the ones that are inspired by Smith (1980, 1983) and de Jong (1988) at the University of Pittsburgh are usually termed Pittsburgh classifier systems – or briefly, Pitt classifier systems.

Pittsburgh classifier systems separate the evaluation of candidate solutions, performed by an external critic, from the genetic search. As they evaluate candidate solutions as a whole, Pittsburgh classifier systems can easily identify and emphasize sequentially cooperating classifiers, which is particularly helpful in problems involving partial observability. In contrast, in Michigan classifier systems credit assignment is focused: the classifiers that actually produce the reward are identified, so learning is much faster, but sequentially cooperating classifiers are more difficult to spot. As Pittsburgh classifier systems apply the genetic algorithm to a set of solutions, they only work offline, whereas Michigan classifier systems work online, although they can also tackle offline problems. Finally, the design of Pittsburgh classifier systems involves decisions as to how an entire solution should be represented and how solutions should be recombined – a task which can be daunting. In contrast, the design of Michigan classifier systems involves simpler decisions about how a rule should be represented and how two rules should be recombined. Accordingly, while the representation of solutions and its related issues play a key role in Pittsburgh models, Michigan models easily work with several types of representations (Lanzi 2001; Lanzi and Perrucci 1999; Mellor 2005).

Structure of the Learning System

Michigan and Pittsburgh classifier systems were both inspired by the work of Holland on the broadcast language (Holland 1975). However, their structures reflect two different ways to model machine learning: as an adaptation process in the case of Michigan classifier systems; and as an optimization problem, in the case of Pittsburgh classifier systems. Thus, the two models, originating from the same idea (Holland’s broadcast language), have radically different structures.

Michigan Classifier Systems

Holland’s classifier systems define a general paradigm for genetics-based machine learning. The description in Holland and Reitman (1978) provides a list of principles for online learning through adaptation. Over the years, such principles have guided researchers who developed several models of Michigan classifier systems (Butz 2002; Wilson 1994, 1995, 2002) and applied them to a large variety of domains (Bull 2004; Lanzi and Riolo 2003; Lanzi et al. 2000). These models extended and improved Holland’s original ideas, but kept all the ingredients of the original recipe: a population of classifiers, which represents the current system knowledge; a performance component, which is responsible for the short-term behavior of the system; a credit assignment (or reinforcement) component, which distributes the incoming reward among the classifiers; and a rule discovery component, which applies a genetic algorithm to the classifiers to improve the current knowledge.

Knowledge Representation

In Michigan classifier systems, knowledge is represented by a population of classifiers. Each classifier is usually defined by four main parameters: the condition, which identifies some part of the problem domain; the action, which represents a decision on the subproblem identified by its condition; the prediction or strength, which estimates the amount of reward that the system will receive if its action is performed; and finally, the fitness, which estimates how good the classifier is in terms of problem solution.

The knowledge representation of Michigan classifier systems is extremely flexible. Each of the four classifier components can be tailored to fit the needs of a particular application, without modifying the main structure of the system. In problems involving binary inputs, classifier conditions can be simply represented using strings defined over the alphabet {0, 1, #}, as done in Holland and Reitman (1978), Goldberg (1989), and Wilson (1995). In problems involving real inputs, conditions can be represented as disjunctions of intervals, similar to the ones produced by other rule learning methods (Clark and Niblett 1989). Conditions can also be represented as general-purpose symbolic expressions (Lanzi 2001; Lanzi and Perrucci 1999) or first-order logic expressions (Mellor 2005). Classifier actions are typically encoded by a set of symbols (either binary strings or simple labels), but continuous real-valued actions are also available (Wilson 2007). Classifier prediction (or strength) is usually encoded by a parameter (Goldberg 1989; Holland and Reitman 1978; Wilson 1995). However, classifier prediction can also be computed using a parameterized function (Wilson 2002), which results in solutions represented as an ensemble of local approximators – similar to the ones produced in generalized reinforcement learning (Sutton and Barto 1998).
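For concreteness, a classifier with a ternary condition over {0, 1, #} might be sketched as follows. This is an illustrative Python fragment, not taken from any particular implementation; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # string over {'0', '1', '#'}; '#' is "don't care"
    action: int        # decision advocated on the matched subproblem
    prediction: float  # estimated payoff if the action is performed
    fitness: float     # quality of the classifier, used by rule discovery

    def matches(self, inputs: str) -> bool:
        # '#' matches any input bit; '0'/'1' must match exactly.
        return all(c == '#' or c == b for c, b in zip(self.condition, inputs))

cl = Classifier(condition="1#0", action=1, prediction=10.0, fitness=0.5)
print(cl.matches("110"))  # True: '#' covers the second bit
print(cl.matches("111"))  # False: the third bit differs
```

The '#' symbol is what gives the representation its generalization capability: a single condition with don't-care symbols covers a whole subspace of inputs.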

Performance Component

A simplified structure of Michigan classifier systems is shown in Fig. 1. We refer the reader to Goldberg (1989) and Holland and Reitman (1978) for a detailed description of the original model and to Butz (2002) and Wilson (1994, 1995, 2001) for descriptions of recent classifier system models.
Classifier Systems, Fig. 1

Simplified structure of a Michigan classifier system. The system perceives the environment through its detectors and (1) it builds the match set containing the classifiers in the population that match the current sensory inputs; then (2) all the actions in the match set are evaluated, and (3) an action is selected to be performed in the environment through the effectors

A classifier system learns through trial and error interactions with an unknown environment. The system and the environment interact continually. At each time step, the classifier system perceives the environment through its detectors; it builds a match set containing all the classifiers in the population whose condition matches the current sensory input. The match set typically contains classifiers that advocate contrasting actions; accordingly, the classifier system evaluates each action in the match set, and selects an action to be performed balancing exploration and exploitation. The selected action is sent to the effectors to be executed in the environment; depending on the effect that the action has in the environment, the system receives a scalar reward.
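The performance cycle just described can be sketched in a few lines of Python. This is a simplified, hypothetical fragment (function names, the fitness-weighted action evaluation, and the epsilon-greedy selection are illustrative choices, not a prescribed design):

```python
import random

def match_set(population, inputs):
    # Classifiers whose ternary condition matches the current sensory input.
    return [cl for cl in population
            if all(c == '#' or c == b for c, b in zip(cl["condition"], inputs))]

def action_values(match):
    # Fitness-weighted average of the predictions advocating each action.
    values = {}
    for a in {cl["action"] for cl in match}:
        adv = [cl for cl in match if cl["action"] == a]
        num = sum(cl["fitness"] * cl["prediction"] for cl in adv)
        den = sum(cl["fitness"] for cl in adv)
        values[a] = num / den
    return values

def select_action(values, epsilon=0.1):
    if random.random() < epsilon:          # explore
        return random.choice(list(values))
    return max(values, key=values.get)     # exploit

population = [
    {"condition": "1#0", "action": 0, "prediction": 5.0, "fitness": 1.0},
    {"condition": "11#", "action": 1, "prediction": 9.0, "fitness": 1.0},
]
m = match_set(population, "110")
print(select_action(action_values(m), epsilon=0.0))  # 1 (higher predicted payoff)
```

The epsilon parameter controls the exploration/exploitation balance mentioned above: with small epsilon the system mostly exploits the action with the highest estimated payoff, but occasionally explores alternatives.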

Credit Assignment

The credit assignment component (also called reinforcement component; Wilson 1995) distributes the incoming reward to the classifiers that are accountable for it. In Holland and Reitman (1978), credit assignment is implemented by Holland’s bucket brigade algorithm (Holland 1986), which was partially inspired by the credit allocation mechanism used by Samuel in his pioneering work on learning checkers-playing programs (Samuel 1959).

In the early years, classifier systems and the bucket brigade algorithm were confined to the evolutionary computation community. The rise of reinforcement learning increased the connection between classifier systems and temporal difference learning (Sutton 1988; Sutton and Barto 1998): in particular, Sutton (1988) showed that the bucket brigade algorithm is a kind of temporal difference learning, and similar connections were also made in Watkins (1989) and Dorigo and Bersini (1994). Later, the connection between classifier systems and reinforcement learning became tighter with the introduction of Wilson’s XCS (1995), in which credit assignment is implemented by a modification of Watkins’ Q-learning (Watkins 1989). As a consequence, in recent years classifier systems have often been presented as methods of reinforcement learning with genetics-based generalization (Bull and Kovacs 2005).
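A Q-learning-style update of this kind can be sketched as follows. This is a simplified illustration (the function and parameter names are assumptions, with beta the learning rate and gamma the discount factor, following standard reinforcement-learning convention): classifiers in the previous action set move their predictions toward the discounted payoff.

```python
def update_predictions(prev_action_set, reward, next_best_value,
                       beta=0.2, gamma=0.71):
    # next_best_value: highest predicted payoff available in the next state.
    target = reward + gamma * next_best_value   # discounted payoff estimate
    for cl in prev_action_set:
        # Widrow-Hoff delta rule: move each prediction toward the target.
        cl["prediction"] += beta * (target - cl["prediction"])

action_set = [{"prediction": 10.0}]
update_predictions(action_set, reward=0.0, next_best_value=20.0)
print(round(action_set[0]["prediction"], 2))  # 10.84
```

With reward 0 and a next-state value of 20, the target is 0.71 × 20 = 14.2, so the prediction moves a fraction beta of the way from 10.0 toward 14.2.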

Rule Discovery Component

The rule discovery component is usually implemented by a genetic algorithm that selects classifiers in the population with probability proportional to their fitness; it copies the selected classifiers and applies genetic operators (usually crossover and mutation) to the offspring classifiers; the new classifiers are inserted in the population, while other classifiers are deleted to keep the population size constant.
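The genetic cycle on classifier conditions might be sketched as below, assuming ternary string conditions; the operator choices (fitness-proportionate "roulette wheel" selection, one-point crossover, per-symbol mutation) are classic textbook options, and all names and parameter values are illustrative.

```python
import random

def roulette(population):
    # Fitness-proportionate selection over the whole population.
    total = sum(cl["fitness"] for cl in population)
    r = random.uniform(0, total)
    for cl in population:
        r -= cl["fitness"]
        if r <= 0:
            return cl
    return population[-1]

def crossover(c1, c2):
    # One-point crossover on two ternary condition strings.
    point = random.randrange(1, len(c1))
    return c1[:point] + c2[point:], c2[:point] + c1[point:]

def mutate(condition, mu=0.04):
    # Each symbol is replaced with probability mu.
    alphabet = "01#"
    return "".join(random.choice(alphabet) if random.random() < mu else c
                   for c in condition)

parents = [{"condition": "1#0", "fitness": 2.0},
           {"condition": "0#1", "fitness": 1.0}]
child1, child2 = crossover(roulette(parents)["condition"],
                           roulette(parents)["condition"])
print(mutate(child1), mutate(child2))
```

Offspring produced this way are inserted into the population, and other classifiers are deleted to keep the population size constant.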

Classifier selection plays a central role in rule discovery: it depends on the definition of classifier fitness and on the subset of classifiers considered during the selection process. In Holland and Reitman (1978), classifier fitness coincides with classifier prediction, while selection is applied to all the classifiers in the population. This approach results in a pressure toward classifiers predicting high returns, but typically tends to produce overly general solutions. To avoid such solutions, Wilson (1995) introduced the XCS classifier system in which accuracy-based fitness is coupled with a niched genetic algorithm. This approach results in a pressure toward accurate maximally general classifiers, and has made XCS the most successful classifier system to date.
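The accuracy-based fitness of XCS can be sketched as follows. The shape of the accuracy function and the parameter names (eps0, alpha, nu, beta) follow common algorithmic descriptions of XCS, but the exact values here are illustrative: accuracy is maximal while the prediction error stays below a tolerance eps0 and falls off as a power of the error beyond it, and fitness tracks the classifier’s accuracy relative to the other classifiers in its action set (the niche).

```python
def accuracy(error, eps0=0.01, alpha=0.1, nu=5.0):
    # Full accuracy below the error tolerance, steep power-law decay above it.
    return 1.0 if error < eps0 else alpha * (error / eps0) ** -nu

def update_fitness(action_set, beta=0.2):
    kappas = [accuracy(cl["error"]) for cl in action_set]
    total = sum(kappas)
    for cl, k in zip(action_set, kappas):
        # Fitness moves toward the classifier's relative accuracy in the niche.
        cl["fitness"] += beta * (k / total - cl["fitness"])

aset = [{"error": 0.005, "fitness": 0.5},   # accurate classifier
        {"error": 0.5,   "fitness": 0.5}]   # inaccurate classifier
update_fitness(aset)
print(aset[0]["fitness"] > aset[1]["fitness"])  # True
```

Because fitness is relative to the niche rather than to the raw payoff, overgeneral classifiers (which predict inconsistently, hence with high error) lose out to accurate ones even when their average payoff is high.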

Pittsburgh Classifier Systems

The idea underlying the development of Pittsburgh classifier systems was to show that interesting behaviors could be evolved using a simpler model than the one proposed by Holland with Michigan classifier systems (Holland 1976; Holland and Reitman 1978).

In Pittsburgh classifier systems, each individual is a set of rules that encodes an entire candidate solution; each rule has a fixed length, but each rule set (each individual) usually contains a variable number of rules. The genetic operators, crossover and mutation, are tailored to the rule-based, variable-length representation. The individuals in the population compete among themselves, following the selection-recombination-mutation cycle that is typical of genetic algorithms (Goldberg 1989; Holland 1975). While in Michigan classifier systems individuals in the population (the single rules) cooperate, in Pittsburgh classifier systems there is no cooperation among individuals (the rule sets), so that the genetic algorithm operation is simpler for Pittsburgh models. However, as Pittsburgh classifier systems explore a much larger search space, they usually require more computational resources than Michigan classifier systems.

The pseudo-code of a Pittsburgh classifier system is shown in Fig. 2. At first, the individuals in the population are randomly initialized (line 2). At time t, the individuals are evaluated by an external critic, which returns a performance measure that the genetic algorithm exploits to compute the fitness of individuals (lines 3 and 10). Following this, selection (line 6), recombination, and mutation (line 7) are applied to the individuals in the population – as done in a typical genetic algorithm. The process stops when a termination criterion is met (line 4), usually when an appropriate solution is found.
Classifier Systems, Fig. 2

Pseudo-code of a Pittsburgh classifier system
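The cycle of Fig. 2 can be rendered as a minimal Python sketch, under the assumption of an external critic `evaluate(rule_set)` returning a performance score; all names are hypothetical, tournament selection stands in for whatever selection scheme a concrete system uses, and crossover splices two variable-length rule sets at random cut points.

```python
import random

def pittsburgh_ga(evaluate, random_rule_set, pop_size=20, generations=50):
    # Initialize a population of candidate solutions (rule sets).
    population = [random_rule_set() for _ in range(pop_size)]
    for _ in range(generations):                        # termination criterion
        fitness = [evaluate(ind) for ind in population]  # external critic
        def pick():  # binary tournament selection
            i, j = random.randrange(pop_size), random.randrange(pop_size)
            return population[i] if fitness[i] >= fitness[j] else population[j]
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            # Variable-length crossover: splice the parents at random cuts.
            cut1 = random.randrange(len(p1) + 1)
            cut2 = random.randrange(len(p2) + 1)
            offspring.append(p1[:cut1] + p2[cut2:])
        population = offspring
    return max(population, key=evaluate)

# Toy usage: "rules" are plain integers; the critic rewards covering
# distinct rules while lightly penalizing rule-set size.
best = pittsburgh_ga(
    evaluate=lambda ind: len(set(ind)) - 0.1 * len(ind),
    random_rule_set=lambda: [random.randrange(5) for _ in range(4)],
)
print(isinstance(best, list))  # True
```

Note that each individual manipulated by the loop is an entire rule set, which is what makes the Pittsburgh search space so much larger than the Michigan one.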

The design of Pittsburgh classifier systems follows the typical steps of genetic algorithm design, which means deciding how a rule set should be represented, what genetic operators should be applied, and how the fitness of a set of rules should be calculated. In addition, Pittsburgh classifier systems need to address the bloat phenomenon (Tackett 1994) that arises with any variable-sized representation, like the rule sets evolved by Pittsburgh classifier systems. Bloat can be defined as the growth of individuals without an actual fitness improvement. In Pittsburgh classifier systems, bloat increases the size of candidate solutions by adding useless rules to individuals, and it is typically limited by introducing a parsimony pressure that discourages large rule sets (Bassett and de Jong 2000). Alternatively, Pittsburgh classifier systems can be combined with multi-objective optimization, so as to separate the maximization of rule set performance from the minimization of rule set size.
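A simple parsimony pressure can be sketched as a size-penalized fitness; the function name and penalty coefficient are illustrative, not from any specific system:

```python
def parsimonious_fitness(rule_set, raw_performance, penalty=0.01):
    # Subtract a penalty proportional to rule-set size, so that equally
    # accurate but smaller rule sets win under selection.
    return raw_performance - penalty * len(rule_set)

print(round(parsimonious_fitness(["r"] * 10, raw_performance=0.9), 2))  # 0.8
print(round(parsimonious_fitness(["r"] * 50, raw_performance=0.9), 2))  # 0.4
```

The multi-objective alternative mentioned above instead treats performance and size as two separate objectives, avoiding the need to choose a penalty coefficient by hand.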

Examples of Pittsburgh classifier systems include SAMUEL (Grefenstette et al. 1990), the Genetic Algorithm Batch-Incremental Concept Learner (GABIL) (de Jong and Spears 1991), GIL (Janikow 1993), GALE (Llorá 2002), and GAssist (Bacardit 2004).


Applications

Classifier systems have been applied to a large variety of domains, including computational economics (e.g., Arthur et al. 1996), autonomous robotics (e.g., Dorigo and Colombetti 1998), classification (e.g., Barry et al. 2004), fighter aircraft maneuvering (Bull 2004; Smith et al. 2000), and many others. Reviews of classifier system applications are available in Lanzi et al. (2000), Lanzi and Riolo (2003), and Bull (2004).

Programs and Data

The major sources of information about classifier systems are the LCSWeb, maintained by Alwyn Barry, and www.learning-classifier-systems.org, maintained by Xavier Llorà.

Several implementations of classifier systems are freely available online. The first standard implementation of Holland’s classifier system in Pascal was described in Goldberg (1989), and it is available at http://www.illigal.org/; a C version of the same implementation, developed by Robert E. Smith, is available at http://www.etsimo.uniovi.es/ftp/pub/EC/CFS/src/. Another implementation of an extension of Holland’s classifier system in C, by Rick L. Riolo, is available at http://www.cscs.umich.edu/Software/Contents.html. Implementations of Wilson’s XCS (1995) are distributed by Alwyn Barry at the LCSWeb, by Martin V. Butz (at www.illigal.org), and by Pier Luca Lanzi. Among the implementations of Pittsburgh classifier systems, the Samuel system is available from Alan C. Schultz at http://www.nrl.navy.mil/; Xavier Llorà distributes GALE (Genetic and Artificial Life Environment), a fine-grained parallel genetic algorithm for data mining, at www.illigal.org/xllora.


Copyright information

© Springer Science+Business Media New York 2017