Encyclopedia of Systems and Control

Living Edition
| Editors: John Baillieul, Tariq Samad

Stochastic Games and Learning

  • Krzysztof Szajowski

DOI: https://doi.org/10.1007/978-1-4471-5102-9_33-2

Abstract

A stochastic game was introduced by Lloyd Shapley in the early 1950s. It is a dynamic game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage, the game is in a certain state. The players select actions, and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state, and the play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

A learning problem arises when the agent does not know the reward function or the state transition probabilities. If an agent learns its optimal policy directly, without knowing either the reward function or the state transition function, such an approach is called model-free reinforcement learning. Q-learning is an example of such an approach.

Q-learning has been extended to a noncooperative multi-agent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions and performs updates based on the assumption of Nash equilibrium behavior with respect to the current Q-values. The challenge is the convergence of the learning protocol.

Keywords

Markov decision process · Repeated game · Equilibrium · Dynamic programming · Reinforcement learning · Asynchronous dynamic programming · Q-learning

Introduction

A Stochastic Game

Definition 1 (Stochastic games).

A stochastic game is a dynamic game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage, the game is in a certain state. The players select actions, and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The process is repeated at the new state, and the play continues for a finite or infinite number of stages.

The total payoff to a player can be defined in various ways. It depends on the payoffs at each stage and on the strategies chosen by the players. The aim of the players is to control their total payoffs in the game by choosing appropriate actions.

The notion of a stochastic game was introduced by Lloyd Shapley (1953) in the early 1950s. Stochastic games generalize both Markov decision processes (see also MDP) and repeated games. A repeated game is equivalent to a stochastic game with a single state. The stochastic game is played in discrete time with the past history as common knowledge for all the players. An individual strategy for a player is a map which associates with each given history a probability distribution on the set of actions available to that player. The players’ actions at stage n determine the players’ payoffs at this stage and the distribution of the state \(s \in\mathfrak{S}\) at stage n + 1.

Learning

Learning is acquiring new, or modifying and reinforcing existing, knowledge, behaviors, skills, values, or preferences, and it may involve synthesizing different types of information. The ability to learn is possessed by humans, animals, and some machines, which are later referred to as agents. In the context of this entry, learning refers to a particular class of stochastic game-theoretic models.

Definition 2 (Learning in stochastic games).

A learning problem arises when an agent does not know the reward function or the state transition probabilities. If the agent learns its optimal policy directly, without knowing either the reward function or the state transition function, such an approach is called model-free reinforcement learning. Q-learning is an example of such an approach.

Learning models constitute a branch of a larger literature. Players follow a form of behavioral rule, such as imitation, regret minimization, or reinforcement. Learning models are most appropriate in settings where players have a good understanding of their strategic environment and where the stakes are high enough to make forecasting and optimization worthwhile. Known approaches include minimax-Q (Littman 1994), Nash-Q (Hu and Wellman 1998), adjustment of learning rates (“Win or Learn Fast”, WoLF; Bowling and Veloso 2001), and multiple-timescale Q-learning (Leslie and Collins 2005).

Model of Stochastic Game

Let us assume that the environment is modeled by the probability space \(\left (\Omega,\mathcal{F},\mathbf{P}\right )\). An N-person stochastic game is described by the objects \(\left (\mathfrak{N},\mathfrak{S},X_{k},A_{k},r_{k},q\right )\) with the interpretation below (a minimal code sketch of these objects is given at the end of this section):
  1. \(\mathfrak{N}\) is the set of players, with \(\left \vert \mathfrak{N}\right \vert= N \in\mathbb{N}\).

  2. \(\mathfrak{S}\) is the set of states of the game, and it is finite.

  3. \(\overrightarrow{X} = X_{1} \times X_{2} \times \ldots \times X_{N}\) is the space of action profiles, where \(X_{k}\) is a nonempty, finite space of actions of player k.

  4. The \(A_{k}\)'s are correspondences from \(\mathfrak{S}\) into nonempty subsets of \(X_{k}\). For each \(s \in\mathfrak{S}\), \(A_{k}(s)\) represents the set of actions available to player k in state s. For \(s \in\mathfrak{S}\), denote \(\overrightarrow{A}(s) = A_{1}(s) \times A_{2}(s) \times \ldots \times A_{N}(s)\).

  5. \(r_{k} : \mathfrak{S} \times \overrightarrow{X} \rightarrow \mathbb{R}\) is the payoff function of player k.

  6. q is a transition probability from \(\mathfrak{S} \times \overrightarrow{X}\) to \(\mathfrak{S}\), called the law of motion among states. If s is the state at a certain stage of the game and the players select \(\overrightarrow{x} \in \overrightarrow{A}(s)\), then \(q\left(\cdot\mid s,\overrightarrow{x}\right)\) is the probability distribution of the next state of the game.
The stochastic game generates two processes:
  1. \(\{\sigma _{n}\}_{n=1}^{T}\) with values in \(\mathfrak{S}\)

  2. \(\{\alpha _{n}\}_{n=1}^{T}\) with values in \(\overrightarrow{X}\)
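To make the objects \(\left (\mathfrak{N},\mathfrak{S},X_{k},A_{k},r_{k},q\right )\) concrete, the following is a minimal Python sketch of one possible representation of a finite stochastic game. The class name StochasticGame, the dictionary encoding of the correspondences \(A_{k}\), and the toy two-player example at the end are illustrative assumptions, not part of the formal model above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

State = int
JointAction = Tuple[int, ...]  # one action index per player

@dataclass
class StochasticGame:
    """Finite N-person stochastic game (N, S, X_k, A_k, r_k, q) -- illustrative sketch."""
    n_players: int
    states: List[State]                                    # the finite state set S
    actions: Dict[Tuple[State, int], List[int]]            # A_k(s): (s, k) -> available actions
    reward: Callable[[State, JointAction], Sequence[float]]        # r(s, x) -> (r_1, ..., r_N)
    transition: Callable[[State, JointAction], Dict[State, float]]  # q(. | s, x) as a distribution

    def admissible(self, s: State) -> List[JointAction]:
        """Enumerate the joint action set A(s) = A_1(s) x ... x A_N(s)."""
        joint = [()]
        for k in range(self.n_players):
            joint = [x + (a,) for x in joint for a in self.actions[(s, k)]]
        return joint

# A toy two-player, two-state example (purely hypothetical numbers).
game = StochasticGame(
    n_players=2,
    states=[0, 1],
    actions={(s, k): [0, 1] for s in (0, 1) for k in (0, 1)},
    reward=lambda s, x: (float(s + x[0]), float(s + x[1])),
    transition=lambda s, x: {0: 0.5, 1: 0.5} if x[0] == x[1] else {s: 1.0},
)
print(game.admissible(0))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```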

Strategies

Let \(\mathfrak{H} = \mathfrak{S}_{1} \times \overrightarrow{X}_{1} \times \mathfrak{S}_{2} \times \cdots \) be the space of all infinite histories of the game and \(\mathfrak{H}_{n} = \mathfrak{S}_{1} \times \overrightarrow{X}_{1} \times \mathfrak{S}_{2} \times \overrightarrow{X}_{2} \times \cdots \times \mathfrak{S}_{n}\) the space of histories up to stage n.

Definition 3.

A player’s strategy \(\pi =\{\alpha _{n}\}_{n=1}^{T}\) consists of random maps \(\alpha _{n} : \Omega\times \mathfrak{H}_{n} \rightarrow X\). In other words, the strategy associates with each given history a probability distribution on the set of actions available to the player. If \(\alpha _{n}\) depends on the history only, it is called deterministic.

The strategies can be described mathematically as follows:
  1. For player \(i \in\mathfrak{N}\), a deterministic strategy specifies a choice of actions for the player at every stage of every possible history.

  2. A mixed strategy is a probability distribution over deterministic strategies.

  3. Restricted classes of strategies:
    1. A behavioral strategy – a mixed strategy in which the mixing takes place at each history independently.

    2. A Markov strategy – a behavioral strategy such that, for each time t, the distribution over actions depends only on the current state, although the distribution may be different at time t than at time t′ ≠ t.

    3. A stationary strategy – a Markov strategy in which the distribution over actions depends only on the current state (not on the time t).

Types of Total Payoffs

For any profile of strategies \(\pi = (\pi _{1},\ldots,\pi _{N})\) of the players and every initial state \(s_{1} = s \in\mathfrak{S}\), a probability measure \(P_{s}^{\pi }\) and a stochastic process \(\{\sigma _{n},\alpha _{n}\}\) are defined on \(\mathfrak{H}\) in a canonical way, where the random variables \(\sigma _{n}\) and \(\alpha _{n}\) describe the state and the actions chosen by the players, respectively, at the nth stage of the game. Let \(E_{s}^{\pi }\) denote the expectation operator with respect to the probability measure \(P_{s}^{\pi }\). For each profile of strategies \(\pi = (\pi _{1},\ldots,\pi _{N})\) and every initial state \(s \in\mathfrak{S}\), the following criteria are considered (a numerical evaluation of the discounted payoff under a fixed stationary profile is sketched after the list below):
  1. The expected T-stage payoff to player k, for any finite horizon T, defined as
    $$\Phi _{k}^{T}(\pi )(s) = E_{s}^{\pi }\left (\sum _{n=1}^{T}r_{k}(\sigma _{n},\alpha _{n})\right )$$

  2. The β-discounted expected payoff to player k, where β ∈ (0, 1) is called the discount factor, defined as
    $$\Phi _{k}^{\beta }(\pi )(s) = E_{s}^{\pi }\left (\sum _{n=1}^{\infty }\beta ^{n-1}r_{k}(\sigma _{n},\alpha _{n})\right )$$

  3. The average payoff per unit time for player k, defined as
    $$\Phi _{k}(\pi )(s) =\limsup _{T\rightarrow \infty } \frac{1}{T}\,\Phi _{k}^{T}(\pi )(s)$$
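When all players use stationary strategies, the β-discounted payoffs can be computed exactly: averaging the stage payoffs and the law of motion over the players’ mixed actions in each state yields a Markov reward process, and \(\Phi _{k}^{\beta } = (I-\beta \bar{P})^{-1}\bar{r}_{k}\). The sketch below illustrates this evaluation; the array conventions (a transition tensor q[s, x, s'] and a reward tensor r[k, s, x] indexed over enumerated joint actions x) are assumptions made for the example.

```python
import numpy as np

def discounted_payoffs(q, r, policy, beta):
    """Phi_k^beta(pi)(s) for a fixed stationary profile.

    q      : array (S, X, S)  -- q(s' | s, joint action x)
    r      : array (N, S, X)  -- r_k(s, x)
    policy : array (S, X)     -- probability of joint action x in state s under pi
    beta   : discount factor in (0, 1)
    Returns an array (N, S) of discounted payoffs.
    """
    n_states = q.shape[0]
    p_bar = np.einsum('sx,sxt->st', policy, q)      # state transition matrix under pi
    r_bar = np.einsum('sx,ksx->ks', policy, r)      # expected stage payoff per player
    inv = np.linalg.inv(np.eye(n_states) - beta * p_bar)
    return r_bar @ inv.T                            # Phi_k = (I - beta P)^{-1} r_bar_k

# Tiny hypothetical example: 2 states, 2 joint actions, 1 player.
q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[[1.0, 0.0], [0.0, 2.0]]])            # shape (1, 2, 2)
policy = np.array([[0.5, 0.5], [1.0, 0.0]])
print(discounted_payoffs(q, r, policy, beta=0.9))
```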

Equilibria

Let \(\pi ^{{\ast}} = \left (\pi _{1}^{{\ast}},\ldots,\pi _{N}^{{\ast}}\right ) \in\Pi \) be a fixed profile of the players’ strategies. For any strategy \(\pi _{k} \in\Pi _{k}\) of player k, we write \(\left (\pi _{-k}^{{\ast}},\pi _{k}\right )\) to denote the strategy profile obtained from \(\pi ^{{\ast}}\) by replacing \(\pi _{k}^{{\ast}}\) with \(\pi _{k}\).

Definition 4 (A Nash equilibrium).

A strategy profile \(\pi ^{{\ast}} = \left (\pi _{1}^{{\ast}},\ldots,\pi _{N}^{{\ast}}\right ) \in\Pi \) is called a Nash equilibrium (in \(\Pi \)) for the average payoff stochastic game if no unilateral deviations from it are profitable, that is, for each \(s \in\mathfrak{S}\),
$$\Phi _{k}(\pi ^{{\ast}})(s) \geq\Phi _{k}(\pi _{-k}^{{\ast}},\pi _{k})(s)$$
for every player k and any strategy \(\pi _{k} \in\Pi _{k}\).

Definition 5 (An \(\boldsymbol{\epsilon }\)-Nash equilibrium).

A strategy profile \(\pi ^{{\ast}} = \left (\pi _{1}^{{\ast}},\ldots,\pi _{N}^{{\ast}}\right )\) is called an \(\epsilon \)-(Nash) equilibrium of the average payoff stochastic game if for every \(k \in\mathfrak{N}\), we have
$$\Phi _{k}(\pi ^{{\ast}})(s) \geq\Phi _{k}(\pi _{-k}^{{\ast}},\pi _{k})(s)-\epsilon,$$
for the given \(\epsilon > 0\) and all \(\pi _{k} \in\Pi _{k}\).

Nash equilibria and \(\epsilon \)-Nash equilibria are defined analogously for the T-stage and β-discounted stochastic games. A numerical check of the \(\epsilon \)-equilibrium condition for the β-discounted case is sketched below.
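To complement the definitions, the following sketch checks the no-profitable-deviation condition numerically for the β-discounted criterion in a two-player game. With player 2’s stationary strategy fixed, player 1 faces an ordinary Markov decision process, so the supremum over all of player 1’s strategies is attained by a stationary best response and can be computed by value iteration. The array conventions (q[s, a1, a2, s'], r1[s, a1, a2]) and the function names are assumptions made for this illustration.

```python
import numpy as np

def best_response_value(q, r1, pi2, beta, iters=2000):
    """Optimal beta-discounted value of player 1 against a fixed stationary pi2.

    q   : array (S, A1, A2, S) -- q(s' | s, a1, a2)
    r1  : array (S, A1, A2)    -- player 1's stage payoff
    pi2 : array (S, A2)        -- player 2's stationary mixed strategy
    """
    # Marginalize player 2 out: player 1 faces an ordinary MDP.
    q1 = np.einsum('sabt,sb->sat', q, pi2)   # (S, A1, S)
    r1m = np.einsum('sab,sb->sa', r1, pi2)   # (S, A1)
    v = np.zeros(q.shape[0])
    for _ in range(iters):                   # value iteration
        v = np.max(r1m + beta * (q1 @ v), axis=1)
    return v

def is_eps_equilibrium_for_player1(q, r1, pi1, pi2, beta, eps):
    """Check Phi_1(pi*) >= sup_{pi_1} Phi_1(pi_1, pi_2*) - eps in every state."""
    # Payoff of the fixed profile (pi1, pi2) via the linear system (I - beta P) Phi = r_bar.
    p = np.einsum('sa,sb,sabt->st', pi1, pi2, q)
    rb = np.einsum('sa,sb,sab->s', pi1, pi2, r1)
    phi1 = np.linalg.solve(np.eye(q.shape[0]) - beta * p, rb)
    return np.all(phi1 >= best_response_value(q, r1, pi2, beta) - eps)
```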

Construction of an Equilibrium

For stochastic games with a finite state space and finite action spaces, the existence of a stationary equilibrium has been shown (cf. Herings and Peeters 2004). The stationary strategies at time t do not depend on the entire history of the game up to that time. This allows reduction of the problem of finding discounted stationary equilibria in a general n-person stochastic game to that of finding a global minimum in a nonlinear program with linear constraints. Solving this nonlinear program is equivalent to solving a certain nonlinear system for which it is known that the objective value in the global minimum is zero (cf. Filar et al. 1991). However, as is noted by Breton (1991), the convergence of an optimization algorithm to the global optimum is not guaranteed.

The solution of a finite horizon, finite stochastic game can be constructed by dynamic programming (see, e.g., Nowak and Szajowski 1998; Tijms 2012). For discounted games, the construction of the solution is based on the following equivalence (the two-person case is presented here for simplicity):
  1. \(\left (\pi _{1}^{{\ast}},\pi _{2}^{{\ast}}\right )\) is an equilibrium point in the discounted stochastic game with equilibrium payoffs \(\left (\Phi _{1}^{\beta }\left (\overrightarrow{\pi }^{{\ast}}\right ),\Phi _{2}^{\beta }\left (\overrightarrow{\pi }^{{\ast}}\right )\right )\).

  2. For each \(s \in\mathfrak{S}\), the pair \(\left (\pi _{1}^{{\ast}}(s),\pi _{2}^{{\ast}}(s)\right )\) constitutes an equilibrium point in the static bimatrix game \(\left (B_{1}(s),B_{2}(s)\right )\) with equilibrium payoffs \(\left (\Phi _{1}^{\beta }\left (s,\overrightarrow{\pi }^{{\ast}}\right ),\Phi _{2}^{\beta }\left (s,\overrightarrow{\pi }^{{\ast}}\right )\right )\), where, for players k = 1, 2 and pure actions \((a_{1},a_{2}) \in A_{1}(s) \times A_{2}(s)\) (the admissible action space at state s), the entries of \(B_{k}(s)\) corresponding to \((a_{1},a_{2})\) are given by
    $$b_{k}(s,a_{1},a_{2}) := (1-\beta )r_{k}(s,a_{1},a_{2}) +\beta \, E_{s}^{(a_{1},a_{2})}\Phi _{k}^{\beta }\left (\overrightarrow{\pi }^{{\ast}}\right )$$
    (1)
    An algorithm for recursive computation of stationary equilibria in stochastic games can be derived from (1). It starts with bimatrix games for β = 0, and then a careful equilibrium selection process guarantees its convergence under mild assumptions on the model (see, e.g., Herings and Peeters 2004). One sweep of this recursion is sketched after the list.
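The sketch below shows one sweep of the recursion based on (1) for the two-person case. The bimatrix equilibrium subroutine is deliberately left abstract: solve_bimatrix is an assumed helper returning the mixed strategies and payoffs of one equilibrium of the bimatrix game \((B_{1},B_{2})\) (it could be supplied, for instance, by a Lemke–Howson routine from an external package).

```python
import numpy as np

def auxiliary_games(q, r, phi, beta):
    """Build the bimatrix games (B_1(s), B_2(s)) of Eq. (1) from current payoff estimates phi.

    q   : array (S, A1, A2, S) -- transition law q(s' | s, a1, a2)
    r   : array (2, S, A1, A2) -- stage payoffs of the two players
    phi : array (2, S)         -- current estimates of Phi_k^beta
    """
    # b_k(s, a1, a2) = (1 - beta) r_k(s, a1, a2) + beta * sum_{s'} q(s'|s, a1, a2) phi_k(s')
    expected_cont = np.einsum('sabt,kt->ksab', q, phi)
    return (1.0 - beta) * r + beta * expected_cont   # array (2, S, A1, A2)

def recursion_step(q, r, phi, beta, solve_bimatrix):
    """One sweep: solve the auxiliary bimatrix game in every state and update phi."""
    b = auxiliary_games(q, r, phi, beta)
    new_phi = np.empty_like(phi)
    strategies = []
    for s in range(q.shape[0]):
        x1, x2, v1, v2 = solve_bimatrix(b[0, s], b[1, s])   # assumed helper
        strategies.append((x1, x2))
        new_phi[0, s], new_phi[1, s] = v1, v2
    return new_phi, strategies
```

A naive fixed-point iteration would repeat recursion_step from an initial guess; as noted above, the algorithm with the convergence guarantee additionally starts from the β = 0 bimatrix games and applies a careful equilibrium selection, which this sketch does not capture.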

A Brief History of the Research on Stochastic Games

The notion of a stochastic game was introduced by Shapley (1953) in the early 1950s. It is a dynamic game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage, the game is in a certain state. The players select actions, and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The process is repeated at the new state, and the play continues for a finite or an infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

The theory of nonzero-sum stochastic games with the average payoffs per unit time for the players started with the papers by Rogers (1969) and Sobel (1971). They considered finite state spaces only and assumed that the transition probability matrices induced by any stationary strategies of the players are irreducible. Until now, only special classes of nonzero-sum average payoff stochastic games have been shown to possess Nash equilibria (or \(\epsilon \)-equilibria). A review of various cases and results for generalization to infinite state spaces can be found in the survey paper by Nowak and Szajowski (1998).

Learning in Stochastic Games

The problem of an agent learning to act in an unknown world is both challenging and interesting. Reinforcement learning has been successful at finding optimal control policies for a single agent operating in a stationary environment, specifically a Markov decision process. Learning to act in multi-agent systems poses additional challenges (see the following surveys: Shoham and Leyton-Brown 2009, Chap. 7; Weiß and Sen 1996; Buşoniu et al. 2010). We provide here an overview of the general idea of learning for single-agent and multi-agent systems:
  1. The goal of single-agent reinforcement learning is to determine the optimal value function and a control policy which maximizes the payoff. The model of such a system can be built within the framework of Markov decision processes with discounted payoff. Suppose the policy is stationary and defined by a function \(h : \mathfrak{S} \rightarrow X\). Such a policy defines what action should be taken in each state: \(\alpha _{n}(\cdot ) := h(\cdot )\). There are various ways to learn the optimal policy. The most straightforward way is based on the Q-values \(Q^{h}(s,a) = E\left(\sum _{j=0}^{\infty }\beta ^{j}r_{j+1}\right)\), the expected discounted payoff when the play starts in state s with action a and follows h thereafter, where \(r_{j+1}\) denotes the payoff received at stage j + 1. The greedy action is \(a =\mathop{ \arg \max }\limits _{a'\in A(s)}Q^{h}(s,a')\) (see the article on Q-learning in Reinforcement learning). A minimal tabular Q-learning sketch is given after this list.

  2. Multi-agent reinforcement learning can be employed to solve a single task, or an agent may be required to perform a task in an environment with other agents, either human, robot, or software ones. In either case, from an agent’s perspective, the world is not stationary. In particular, the behavior of the other agents may change as they also learn to better perform their tasks. This type of multi-agent, nonstationary world creates a difficult problem for learning to act in such environments. The nonstationary scenario can be viewed as a game with multiple players. In game theory, in the study of such problems, there is generally an underlying assumption that the players have similar adaptation and learning abilities. Therefore, the actions of each agent affect the task achievement of the other agents. This allows one to build the value of the game and an equilibrium strategy profile step by step.
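The following is a minimal sketch of tabular, model-free Q-learning for a single agent under the discounted criterion, as mentioned in item 1 above. The environment is accessed only through a sampling function env_step; this function, the ε-greedy exploration, and the constant learning rate alpha are illustrative assumptions rather than parts of the entry.

```python
import random
from collections import defaultdict

def q_learning(env_step, states, actions, beta=0.95, episodes=500,
               horizon=100, alpha=0.1, epsilon=0.1):
    """Model-free tabular Q-learning sketch.

    env_step(s, a) must return (reward, next_state); the transition law and
    reward function themselves are never used, only sampled -- the agent is
    model-free in the sense of Definition 2.
    """
    Q = defaultdict(float)   # Q[(s, a)], initialized to 0

    def greedy(s):
        return max(actions(s), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(horizon):
            # epsilon-greedy exploration
            a = random.choice(actions(s)) if random.random() < epsilon else greedy(s)
            reward, s_next = env_step(s, a)
            # Q-learning update toward the sampled Bellman target
            target = reward + beta * max(Q[(s_next, a2)] for a2 in actions(s_next))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q, greedy
```

The learned greedy rule plays the role of the stationary policy h above. The multi-agent variants cited earlier (minimax-Q, Nash-Q) keep the same update structure but replace the maximum over the agent’s own actions in the target by the value of a stage game built from joint-action Q-values.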

Stochastic games can be seen as an extension of the single-agent Markov decision process framework to multiple agents whose actions all impact the resulting rewards and the next state. They can also be viewed as an extension of the framework of matrix games. Such a view emphasizes the difficulty of finding the optimal behavior in stochastic games, since the optimal behavior of any one agent depends on the behavior of the other agents. A comprehensive study of multi-agent learning techniques for stochastic games does not yet exist. The interested reader may consult the monographs by Fudenberg and Levine (1998) and Shoham and Leyton-Brown (2009) and the special issue of the Artificial Intelligence journal (Vohra and Wellman 2007).

Despite its interesting properties, Q-learning is a slow method that requires a long period of training to learn an acceptable policy. In practice, parallel computing implementations of Q-learning are used to mitigate this problem.

Summary and Future Directions

Details concerning solution concepts for stochastic games can be found in Filar and Vrieze (1997). Refinements of the Nash equilibrium concept are known from dynamic economic games (see Myerson 1978). The Nash equilibrium concept may be extended gradually when the rules of the game are interpreted in a broader sense, so as to allow preplay or even intraplay communication. A well-known extension of the Nash equilibrium is Aumann’s correlated equilibrium (see Aumann 1987), which depends only on the normal form of the game. Two other solution concepts for multistage games have been proposed by Forges (1986): the extensive form correlated equilibrium, where the players can observe private exogenous signals at every stage, and the communication equilibrium, where the players are furthermore allowed to transmit inputs to an appropriate device at every stage. An application of the notion of correlated equilibria to stochastic games can be found in Nowak and Szajowski (1998).

In economics, in the context of economic growth problems, Ramsey (1928) introduced the overtaking optimality criterion; it was introduced independently for repeated games by Rubinstein (1979). The criterion has been investigated for some stochastic games by Carlson and Haurie (1995), Nowak (2008), and others. The existence of overtaking optimal strategies is a subtle issue, and there are counterexamples showing that one has to be careful with making statements on overtaking optimality.

Regarding stochastic games and learning, let us mention that the first ideas can be found in the papers by Brown (1951) and Robinson (1951) on fictitious play. Some convergence results for fictitious play are given by Shoham and Leyton-Brown (2009, Theorem 7.2.5). An important example showing non-convergence was given by Shapley (1964). In multi-person stochastic games and learning, convergence to equilibria is a basic stability requirement (see, e.g., Greenwald and Hall 2003; Hu and Wellman 2003). This means that the agents’ strategies should eventually converge to a coordinated equilibrium. The Nash equilibrium is used most frequently, but its usefulness has been questioned. For instance, Shoham and Leyton-Brown (2009) argue that the link between stage-wise convergence to Nash equilibria and performance in stochastic games is unclear.

Bibliography

  1. Aumann RJ (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55:1–18. doi:10.2307/1911154
  2. Bowling M, Veloso M (2001) Rational and convergent learning in stochastic games. In: Proceedings of the 17th international joint conference on artificial intelligence (IJCAI), Seattle, pp 1021–1026
  3. Breton M (1991) Algorithms for stochastic games. In: Raghavan TES, Ferguson TS, Parthasarathy T, Vrieze OJ (eds) Stochastic games and related topics: in honor of Professor L. S. Shapley, vol 7. Springer Netherlands, Dordrecht, pp 45–57. doi:10.1007/978-94-011-3760-7_5
  4. Brown GW (1951) Iterative solution of games by fictitious play. In: Koopmans TC (ed) Activity analysis of production and allocation. Wiley, New York, Chap. XXIV, pp 374–376
  5. Buşoniu L, Babuška R, Schutter BD (2010) Multi-agent reinforcement learning: an overview. In: Srinivasan D, Jain LC (eds) Innovations in multi-agent systems and application–1. Springer, Berlin, pp 183–221
  6. Carlson D, Haurie A (1995) A turnpike theory for infinite horizon open-loop differential games with decoupled controls. In: Olsder GJ (ed) New trends in dynamic games and applications. Annals of the international society of dynamic games, vol 3. Birkhäuser, Boston, pp 353–376
  7. Filar J, Vrieze K (1997) Competitive Markov decision processes. Springer, New York
  8. Filar JA, Schultz TA, Thuijsman F, Vrieze OJ (1991) Nonlinear programming and stationary equilibria in stochastic games. Math Program 50(2, Ser A):227–237. doi:10.1007/BF01594936
  9. Forges F (1986) An approach to communication equilibria. Econometrica 54:1375–1385. doi:10.2307/1914304
  10. Fudenberg D, Levine DK (1998) The theory of learning in games, vol 2. MIT, Cambridge
  11. Greenwald A, Hall K (2003) Correlated-Q learning. In: Proceedings of the 20th international conference on machine learning (ICML-03), Washington, DC, 21–24 Aug 2003, pp 242–249
  12. Herings PJ-J, Peeters RJAP (2004) Stationary equilibria in stochastic games: structure, selection, and computation. J Econ Theory 118(1):32–60. doi:10.1016/j.jet.2003.10.001
  13. Hu J, Wellman MP (1998) Multiagent reinforcement learning: theoretical framework and an algorithm. In: Proceedings of the 15th international conference on machine learning, New Brunswick, pp 242–250
  14. Hu J, Wellman MP (2003) Nash Q-learning for general-sum stochastic games. J Mach Learn Res 4:1039–1069
  15. Leslie DS, Collins EJ (2005) Individual Q-learning in normal form games. SIAM J Control Optim 44(2):495–514. doi:10.1137/S0363012903437976
  16. Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 13th international conference on machine learning, New Brunswick, pp 157–163
  17. Myerson RB (1978) Refinements of the Nash equilibrium concept. Int J Game Theory 7(2):73–80. doi:10.1007/BF01753236
  18. Nowak AS (2008) Equilibrium in a dynamic game of capital accumulation with the overtaking criterion. Econ Lett 99(2):233–237. doi:10.1016/j.econlet.2007.05.033
  19. Nowak AS, Szajowski K (1998) Nonzero-sum stochastic games. In: Bardi M, Raghavan TES, Parthasarathy T (eds) Stochastic and differential games: theory and numerical methods. Annals of the international society of dynamic games, vol 4. Birkhäuser, Boston, pp 297–342. doi:10.1007/978-1-4612-1592-9_7
  20. Ramsey F (1928) A mathematical theory of savings. Econ J 38:543–559
  21. Robinson J (1951) An iterative method of solving a game. Ann Math 2(54):296–301. doi:10.2307/1969530
  22. Rogers PD (1969) Nonzero-sum stochastic games. PhD thesis, University of California, Berkeley. ProQuest LLC, Ann Arbor
  23. Rubinstein A (1979) Equilibrium in supergames with the overtaking criterion. J Econ Theory 21:1–9. doi:10.1016/0022-0531(79)90002-4
  24. Shapley L (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100. doi:10.1073/pnas.39.10.1095
  25. Shapley L (1964) Some topics in two-person games. Ann Math Stud 52:1–28
  26. Shoham Y, Leyton-Brown K (2009) Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press, Cambridge. doi:10.1017/CBO9780511811654
  27. Sobel MJ (1971) Noncooperative stochastic games. Ann Math Stat 42:1930–1935. doi:10.1214/aoms/1177693059
  28. Tijms H (2012) Stochastic games and dynamic programming. Asia Pac Math Newsl 2(3):6–10
  29. Vohra R, Wellman M (eds) (2007) Foundations of multi-agent learning. Artif Intell 171:363–452
  30. Weiß G, Sen S (eds) (1996) Adaption and learning in multi-agent systems. Proceedings of the IJCAI’95 workshop, Montréal, 21 Aug 1995, vol 1042. Springer, Berlin. doi:10.1007/3-540-60923-7

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. Faculty of Fundamental Problems of Technology, Institute of Mathematics and Computer Science, Wroclaw University of Technology, Wroclaw, Poland