1 Introduction

Fifteen years of research on quantum games have produced many ideas of what a quantum game might look like and how it might be played. Certainly, the quantum scheme for \(2 \times 2\) games introduced in [1] (the EWL scheme) has become one of the most common models, and it has already found application in more complex games (see, for example, [2]). However, the more complex the classical game is, the more sophisticated techniques are required to find the players’ optimal strategies in an EWL-type scheme. While in the scheme for \(2 \times 2\) games the result of the game depends on six real parameters (each player’s strategy is a unitary operator from \(\mathsf {SU}(2)\), defined by three real parameters), an EWL-type scheme for \(3\times 3\) games would already require taking 16 parameters into account [3, 4]. One way to avoid cumbersome calculations when studying a game in the quantum domain was presented in [5] (see also the recent papers [6–8] and [9] based on this scheme). The authors defined a model (the MW scheme) for a quantum game in which the players’ unitary strategies were restricted to the identity and the bit-flip operator. The game became quantum if the players’ local operators were performed on some fixed entangled state \(|\varPsi \rangle \) (called the players’ joint strategy). The MW scheme appears to be much simpler than the EWL scheme. The number of pure strategies of each player is the same as in the classical game [10]. Thus, the complexity of finding a rational solution is similar in a classical game and its quantum counterpart. Unfortunately, that simple scheme exhibits some undesirable properties that we pointed out in [11]. First, the MW scheme implies a non-classical game even if the players’ joint strategy is an unentangled state. In particular, if a player’s qubit is in an equal superposition of the computational basis states, she cannot affect the game outcome, in contrast to her strategic position in the classical game. Moreover, the players have no impact on the form of the initial state. In paper [11], we showed that the above-mentioned drawbacks vanish if the players are allowed to choose between the basis state that represents the classical game and the state \(|\varPsi \rangle \). In this paper, we continue that line of research. We give a formal description of players’ strategies that includes the choice of the initial state in the MW scheme. This allows us to move beyond the bimatrix games examined in [11] and consider more general normal-form games. Then, we study possible applications of the scheme.

Some knowledge of game theory is required to follow this paper. While the theory of bimatrix games is commonly used in quantum game theory, the notion of the normal representation of extensive games may be less familiar to readers who deal with quantum games. Therefore, we encourage the reader who is not familiar with extensive game theory to consult one of the textbooks [12, 13].

2 Refinement of the Marinatto–Weber scheme

In paper [11], we introduced a new scheme for playing finite bimatrix games in the quantum domain. The idea behind the scheme is that the players can choose whether they play a classical game or its quantum counterpart defined by the MW scheme. In the case of the quantum model for \(2\times 2\) bimatrix games, this means that the players choose their local operations: the identity \({\mathbb {1}}\) or the Pauli operator \(\sigma _{x}\), and, additionally, they decide whether the chosen operators are performed on the state \(|00\rangle \) or on some fixed state \(|\varPsi \rangle \in \mathbb {C}^2\otimes \mathbb {C}^2\). We now give a formal description of the scheme.

2.1 Quantum model for \(2\times 2\) bimatrix game

Let us consider a \(2\times 2\) game

$$\begin{aligned} \begin{pmatrix}(a_{00},b_{00}) &{} (a_{01}, b_{01}) \\ (a_{10}, b_{10}) &{} (a_{11}, b_{11})\end{pmatrix}, ~~\text{ where }~~(a_{ij},b_{ij})\in \mathbb {R}^2. \end{aligned}$$
(1)

The quantum scheme for game (1) is defined on an inner product space \((\mathbb {C}^2)^{\otimes 4}\) by the following components:

  1.

    A positive operator \(H\),

    $$\begin{aligned} H = ({\mathbb {1}}\otimes {\mathbb {1}} - |11\rangle \langle 11|)\otimes |00\rangle \langle 00| + |11\rangle \langle 11|\otimes |\varPsi \rangle \langle \varPsi |, \end{aligned}$$
    (2)

    where \(|\varPsi \rangle \in \mathbb {C}^2\otimes \mathbb {C}^2\) such that \(\Vert |\varPsi \rangle \Vert = 1\),

  2.

    Players’ pure strategies: \(P^{(1)}_{i}\otimes U^{(3)}_{j}\) for player 1, \(P^{(2)}_{k}\otimes U^{(4)}_{l}\) for player 2, where \(i,j,k,l =0,1\), and the upper indices identify the subspace \(\mathbb {C}^2\) of \((\mathbb {C}^2)^{\otimes 4}\) on which the operators

    $$\begin{aligned} P_{0} = |0\rangle \langle 0|,~ P_{1} = |1\rangle \langle 1|, \quad U_{0} = {\mathbb {1}},~ U_{1} = \sigma _{x}, \end{aligned}$$
    (3)

    are defined. That is, player 1 acts on the first and third qubit and player 2 acts on the second and fourth one. The order of qubits is in line with the upper indices.

  3.

    Measurement operators \(M_{1}\) and \(M_{2}\) are given by the formula

    $$\begin{aligned} M_{1(2)} = {\mathbb {1}}\otimes {\mathbb {1}}\otimes \left( \sum _{x,y = 0,1}a_{xy}(b_{xy})|xy\rangle \langle xy|\right) , \end{aligned}$$
    (4)

    where \(a_{xy}\) and \(b_{xy}\) are the payoffs from (1).

The scheme proceeds in a similar way to the MW scheme or the EWL scheme: the players determine the final state by choosing their strategies and acting on the operator \(H\). As a result, they determine the following density operator:

$$\begin{aligned} \rho _{\mathrm{f}}= & {} \left( P^{(1)}_{i}\otimes P^{(2)}_{k}\otimes U^{(3)}_{j}\otimes U^{(4)}_{l}\right) H \left( P^{(1)}_{i}\otimes P^{(2)}_{k}\otimes U^{(3)}_{j}\otimes U^{(4)}_{l}\right) \nonumber \\= & {} {\left\{ \begin{array}{ll}|11\rangle \langle 11|\otimes \left( U^{(3)}_{j}\otimes U^{(4)}_{l}|\varPsi \rangle \langle \varPsi | U^{(3)}_{j}\otimes U^{(4)}_{l}\right) &{}\hbox {if }i=k=1 \\ |ik\rangle \langle ik|\otimes \left( U^{(3)}_{j}\otimes U^{(4)}_{l}|00\rangle \langle 00|U^{(3)}_{j}\otimes U^{(4)}_{l}\right) &{}\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(5)

Next, the payoffs for players 1 and 2 are

$$\begin{aligned} \mathrm{tr}(\rho _{\mathrm{f}}M_{1})~~\text{ and }~~\mathrm{tr}(\rho _{\mathrm{f}}M_{2}). \end{aligned}$$
(6)
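For concreteness, the construction (2)–(6) can be put into a few lines of code. The following is a minimal numerical sketch, assuming the numpy library; the joint strategy \(|\varPsi \rangle \) and the payoff values are illustrative choices, not taken from the text.

```python
# Minimal sketch of scheme (2)-(4), assuming numpy; payoffs and |Psi> are illustrative.
import numpy as np

ket = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
I2 = np.eye(2)
SX = np.array([[0.0, 1.0], [1.0, 0.0]])                          # sigma_x
P = {0: np.outer(ket[0], ket[0]), 1: np.outer(ket[1], ket[1])}   # projectors of (3)
U = {0: I2, 1: SX}                                               # local unitaries of (3)

def kron(*xs):
    out = xs[0]
    for x in xs[1:]:
        out = np.kron(out, x)
    return out

def proj(v):
    return np.outer(v, v.conj())

# Joint strategy |Psi> in C^2 (x) C^2; here, for illustration, (|00> + |11>)/sqrt(2).
psi = (kron(ket[0], ket[0]) + kron(ket[1], ket[1])) / np.sqrt(2)

ket00, ket11 = kron(ket[0], ket[0]), kron(ket[1], ket[1])
H = kron(np.eye(4) - proj(ket11), proj(ket00)) + kron(proj(ket11), proj(psi))   # Eq. (2)

# Payoff measurements of (4); the tables a, b are placeholder payoffs of game (1).
a = np.array([[3.0, 0.0], [5.0, 1.0]])
b = np.array([[3.0, 5.0], [0.0, 1.0]])

def payoff_operator(table):
    inner = sum(table[x, y] * proj(kron(ket[x], ket[y])) for x in (0, 1) for y in (0, 1))
    return kron(np.eye(4), inner)

M1, M2 = payoff_operator(a), payoff_operator(b)

def payoffs(i, j, k, l):
    """Payoffs (6) for the pure profile P_i (x) U_j of player 1 and P_k (x) U_l of player 2."""
    S = kron(P[i], P[k], U[j], U[l])     # qubit order 1, 2, 3, 4 as in the text
    rho_f = S @ H @ S                    # Eq. (5); S is Hermitian here
    return np.trace(rho_f @ M1).real, np.trace(rho_f @ M2).real

print(payoffs(0, 0, 0, 0))   # classical outcome (a_00, b_00)
print(payoffs(1, 1, 1, 1))   # local operations act on |Psi> instead of |00>
```

Mixed strategies are handled by averaging the resulting density operators with the weights \(p_{ij}q_{kl}\), exactly as in (7) below.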

Similar to the MW scheme, each player is allowed to use mixed strategies, i.e., to choose her own strategies according to some probability distribution. Let \((p_{ij})_{i,j=0,1}\) be a probability distribution over the set \(\left\{ P^{(1)}_{i} \otimes U^{(3)}_{j}:i,j =0,1\right\} \) and \((q_{kl})_{k,l = 0,1}\) be a probability distribution over \(\left\{ P^{(2)}_{k}\otimes U^{(4)}_{l}:k,l = 0,1\right\} \). Then, the resulting density operator takes the form

$$\begin{aligned} \rho _{\mathrm{f}} = \sum _{i,j,k,l = 0,1}p_{ij}q_{kl}\left( P^{(1)}_{i}\otimes P^{(2)}_{k}\otimes U^{(3)}_{j}\otimes U^{(4)}_{l}\right) H\left( P^{(1)}_{i}\otimes P^{(2)}_{k}\otimes U^{(3)}_{j}\otimes U^{(4)}_{l}\right) . \end{aligned}$$
(7)

Note that scheme (2)–(4) generalizes the classical way of playing the game. If the players’ strategy profile takes the form

$$\begin{aligned} P^{(1)}_{0}\otimes P^{(2)}_{0} \otimes U^{(3)}_{j} \otimes U^{(4)}_{l}, \end{aligned}$$
(8)

the players’ payoffs depend on \(U^{(3)}_{j}\) and \(U^{(4)}_{l}\) and are equal to

$$\begin{aligned} \mathrm{tr}\left( \left( U^{(3)}_{j} \otimes U^{(4)}_{l}|00\rangle \langle 00| U^{(3)}_{j} \otimes U^{(4)}_{l}\right) \sum _{x,y = 0,1}a_{xy}(b_{xy})|xy\rangle \langle xy|\right) = a_{jl}(b_{jl}). \end{aligned}$$
(9)

Obviously, if \(U^{(3)}_{j}\) and \(U^{(4)}_{l}\) are chosen according to some probability distributions \(\{p_{00}, p_{01}\}\) and \(\{q_{00}, q_{01}\}\), respectively, the resulting distribution over \(a_{jl}(b_{jl})\) coincides with the one given by the corresponding mixed strategy profile in game (1). As a result, scheme (2)–(4) determines a game that is a complete quantization of (1) (see [14] for the definition of complete quantization).

Nash equilibrium In non-cooperative quantum game theory, the Nash equilibrium is the most commonly used solution concept. It is defined as a profile of strategies, one for each player, in which each strategy is a best response to the other strategies. In view of scheme (2)–(4), it is a mixed strategy profile \(\left( (p^*_{ij})_{i,j=0,1},(q^*_{kl})_{k,l=0,1}\right) \) that solves the following optimization problems:

$$\begin{aligned}&(p^*_{ij})\in \arg \!\max _{(p_{ij})}\mathrm{tr}\left( \sum _{i,j,k,l = 0,1}p_{ij}q^*_{kl}S_{ikjl}HS_{ikjl}M_{1}\right) ,\end{aligned}$$
(10)
$$\begin{aligned}&(q^*_{kl})\in \arg \!\max _{(q_{kl})}\mathrm{tr}\left( \sum _{i,j,k,l = 0,1}p^*_{ij}q_{kl}S_{ikjl}HS_{ikjl}M_{2}\right) , \end{aligned}$$
(11)

where \(S_{ikjl} = P^{(1)}_{i}\otimes P^{(2)}_{k}\otimes U^{(3)}_{j} \otimes U^{(4)}_{l}\). As in classical game theory, we can simplify conditions (10) and (11) and only check whether \((p^*_{ij})\) or \((q^*_{kl})\) yields a payoff equal to the maximum payoff attainable with pure strategies. More formally, condition (10) is equivalent to the following one:

$$\begin{aligned} \mathrm{tr}\left( \sum _{i,j,k,l = 0,1}p^*_{ij}q^*_{kl}S_{ikjl}HS_{ikjl}M_{1}\right) = \max _{i,j = 0,1}{\mathrm{tr}\left( \sum _{k,l = 0,1}q^*_{kl}S_{ikjl}HS_{ikjl}M_{1}\right) }.\qquad \end{aligned}$$
(12)

This follows from the fact that \(\mathrm{tr}(\rho _{\mathrm{f}}M_{1})\) for the density operator \(\rho _{\mathrm{f}}\) given by (7) is a convex combination of the elements

$$\begin{aligned} \mathrm{tr}\left( \sum _{k,l = 0,1}q^*_{kl}S_{ikjl}HS_{ikjl}M_{1}\right) ~~\text{ for }~~i,j =0,1 \end{aligned}$$
(13)

with weights \(p_{ij}\). In a similar way, we can simplify condition (11).

Bimatrix form The game given by scheme (2)–(4) can be expressed in bimatrix form. Each entry of the bimatrix is a pair \(\left( \mathrm{tr}(\rho _{\mathrm{f}}M_{1}), \mathrm{tr}(\rho _{\mathrm{f}}M_{2})\right) \) of payoffs that corresponds to a particular profile \(P^{(1)}_{i}\otimes P^{(2)}_{k}\otimes U^{(3)}_{j} \otimes U^{(4)}_{l}\). As a result, we obtain

$$\begin{aligned} \begin{pmatrix} (a_{00},b_{00}) &{} (a_{01}, b_{01}) &{} (a_{00},b_{00}) &{} (a_{01}, b_{01}) \\ (a_{10},b_{10}) &{} (a_{11}, b_{11}) &{} (a_{10},b_{10}) &{} (a_{11}, b_{11}) \\ (a_{00},b_{00}) &{} (a_{01}, b_{01}) &{} (\alpha _{00},\beta _{00}) &{} (\alpha _{01}, \beta _{01}) \\ (a_{10},b_{10}) &{} (a_{11}, b_{11}) &{} (\alpha _{10},\beta _{10}) &{} (\alpha _{11}, \beta _{11}) \end{pmatrix}, \end{aligned}$$
(14)

where the rows and columns correspond to player 1’s and player 2’s strategies \(P_{0}\otimes U_{0}, P_{0}\otimes U_{1}, P_{1}\otimes U_{0}, P_{1}\otimes U_{1}\), respectively, and

$$\begin{aligned} (\alpha _{ij},\beta _{ij}) = \left( \mathrm{tr}(\rho _{ij}M_{1}), \mathrm{tr}(\rho _{ij}M_{2})\right) ~~\text{ for }~~\rho _{ij} = |11\rangle \langle 11|\otimes \left( U_{i}\otimes U_{j} |\varPsi \rangle \langle \varPsi | U_{i}\otimes U_{j}\right) . \end{aligned}$$
(15)

Bimatrix (14) is a very convenient way to study the game determined by scheme (2)–(4). Once the entries \(\left( \mathrm{tr}(\rho _{\mathrm{f}}M_{1}), \mathrm{tr}(\rho _{\mathrm{f}}M_{2})\right) \) are specified, we can leave the quantum formalism aside and work with (14). This is due to the linearity of the trace, which makes the density operator (7) and the corresponding probability distribution over pure strategies equivalent in the sense of the generated outcomes. For example, in order to find Nash equilibria, we can use the techniques for bimatrix games instead of conditions (10) and (11).
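For instance, a brute-force search over pure strategy profiles suffices for pure Nash equilibria. The helper below is a short sketch, assuming numpy; the \(2\times 2\) tables used in the demonstration are placeholder payoffs, and the same function applies unchanged to the \(4\times 4\) bimatrix (14) once its entries have been computed (e.g., with the payoffs helper from the sketch in Sect. 2.1).

```python
# Brute-force pure Nash equilibria of a bimatrix game, assuming numpy.
import numpy as np
from itertools import product

def pure_nash_equilibria(A, B):
    """All pure equilibria (row, col) of the bimatrix game with payoff matrices A, B."""
    eqs = []
    for r, c in product(range(A.shape[0]), range(A.shape[1])):
        row_best = A[r, c] >= A[:, c].max()    # no profitable deviation for the row player
        col_best = B[r, c] >= B[r, :].max()    # no profitable deviation for the column player
        if row_best and col_best:
            eqs.append((r, c))
    return eqs

# Demonstration on placeholder 2x2 payoffs; for (14), pass the 4x4 tables instead.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = np.array([[3.0, 5.0], [0.0, 1.0]])
print(pure_nash_equilibria(A, B))   # [(1, 1)]
```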

Note that bimatrix (14) clearly shows the role of components \(P_{i}\) of players’ strategies. Namely, the operations \(U^{(3)}_{j}\otimes U^{(4)}_{l}\) are performed on state \(|\varPsi \rangle \) if and only if both players form profile \(P^{(1)}_{1}\otimes P^{(2)}_{1} \otimes U^{(3)}_{j}\otimes U^{(4)}_{l}\).

The scheme can be generalized to include more than one joint strategy \(|\varPsi \rangle \). Let us define the operator \(H\) on \(\left( \mathbb {C}^{n+1}\otimes \mathbb {C}^{n+1}\right) \otimes \left( \mathbb {C}^2\otimes \mathbb {C}^2\right) \),

$$\begin{aligned} H = \left( {\mathbb {1}}_{(n+1)^2\times (n+1)^2} - \sum ^n_{i=1}|ii\rangle \langle ii|\right) \otimes |00\rangle \langle 00| + \sum ^n_{i=1}|ii\rangle \langle ii|\otimes |\varPsi _{i}\rangle \langle \varPsi _{i}| \end{aligned}$$
(16)

and players’ pure strategies

$$\begin{aligned} P^{(1)}_{i}\otimes U^{(3)}_{j}, P^{(2)}_{k}\otimes U^{(4)}_{l} \in \{|0\rangle \langle 0|, |1\rangle \langle 1|, \dots , |n\rangle \langle n|\}\otimes \{{\mathbb {1}}, \sigma _{x}\}. \end{aligned}$$
(17)

In this case, the local operators \(U^{(3)}_{j}\otimes U^{(4)}_{l}\) are performed on the state \(|\varPsi _{i}\rangle \) if and only if the resulting strategy profile takes the form \(|ii\rangle \langle ii|\otimes U^{(3)}_{j}\otimes U^{(4)}_{l}\).

2.2 Quantum model for general bimatrix games

We showed in [10] how to construct the scheme for any finite bimatrix game according to the MW model. The key elements of the scheme are appropriately defined operators for the players. In the case of an \((n+1)\times (m+1)\) bimatrix game,

$$\begin{aligned} \begin{pmatrix}(a_{00},b_{00}) &{} (a_{01}, b_{01}) &{} \cdots &{} (a_{0m}, b_{0m})\\ (a_{10}, b_{10}) &{} (a_{11}, b_{11}) &{} \cdots &{} (a_{1m}, b_{1m})\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ (a_{n0},b_{n0}) &{} (a_{n1}, b_{n1}) &{} \cdots &{} (a_{nm},b_{nm})\end{pmatrix}, ~~(a_{ij},b_{ij})\in \mathbb {R}^2. \end{aligned}$$
(18)

Here \(n,m\ge 1\), and player 1 (player 2) has \(n+1\) operators \(U_{i}\) (\(m+1\) operators \(V_{j}\)) defined on the space \(\mathbb {C}^{n+1}\) (\(\mathbb {C}^{m+1}\)) that act on the basis states \(\{|0\rangle , |1\rangle , \dots , |n\rangle \}\) (\(\{|0\rangle , |1\rangle , \dots , |m\rangle \}\)) as follows:

$$\begin{aligned}&\!\!U_{0}|i\rangle = |i\rangle , ~~U_{1}|i\rangle = |i+1 ~\mathrm{mod}~ n+1\rangle , ~\dots ~~U_{n}|i\rangle = |i+n ~\mathrm{mod}~ n+1\rangle ;\qquad \end{aligned}$$
(19)
$$\begin{aligned}&\!\!V_{0}|i\rangle = |i\rangle , ~~V_{1}|i\rangle = |i+1 ~\mathrm{mod}~ m+1\rangle , ~\dots ~~V_{m}|i\rangle = |i+m ~\mathrm{mod}~ m+1\rangle .\qquad \quad \end{aligned}$$
(20)

In view of (19) and (20), scheme (2)–(4) can be generalized by the players’ strategies

$$\begin{aligned} \{P_{0},P_{1}\}\otimes \{U_{0},U_{1},\dots ,U_{n}\}~~\text{ and }~~\{P_{0},P_{1}\}\otimes \{V_{0}, V_{1},\dots , V_{m}\}, \end{aligned}$$
(21)

and by the positive operator of the same form as (2), but with the outer product operators \(|00\rangle \langle 00|\) and \(|\varPsi \rangle \langle \varPsi |\) defined on \(\mathbb {C}^{n+1}\otimes \mathbb {C}^{m+1}\).
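The operators (19)–(20) are cyclic shifts of the computational basis. A small sketch, assuming numpy, is given below; the \(3\times 3\) case (\(n=2\)) is only an example.

```python
# Cyclic shift operators of (19)-(20), assuming numpy.
import numpy as np

def shift_operator(s, dim):
    """Permutation matrix sending |i> to |(i + s) mod dim> on C^dim."""
    op = np.zeros((dim, dim))
    for i in range(dim):
        op[(i + s) % dim, i] = 1.0
    return op

# Player 1's local operators for a 3x3 bimatrix game (n = 2):
U0, U1, U2 = (shift_operator(s, 3) for s in range(3))
print(U1 @ np.array([1.0, 0.0, 0.0]))   # |0> -> |1>, i.e. [0. 1. 0.]
```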

3 Quantum approach to finite normal-form games

In the previous section, we formalized the refinement of the MW scheme that was introduced in [11]. We obtained a scheme that can be applied to any finite bimatrix game. In this section, we construct a framework for general normal-form games. The term normal-form game has two main meanings. One concerns a strategic game given a priori. It is defined by a triple \((N,\{S_{i}\}_{i\in N}, \{u_{i}\}_{i\in N})\), where \(N\) is the set of players and, for \(i\in N\), the components \(S_{i}\) and \(u_{i}\) are player \(i\)’s strategy set and payoff function, respectively. The second meaning concerns a strategic game \((N,\{S_{i}\}_{i\in N}, \{u_{i}\}_{i\in N})\) that is generated by a game in extensive form. The strategic game obtained in this way is called the normal representation of the extensive game. In what follows, we extend scheme (2)–(4) to cover both cases.

3.1 Strategic-form game

The difference between bimatrix games and finite strategic games is that more than two players (say \(n\) players) are allowed in the latter case. Therefore, operator (2) has to be modified in such a way that it simply outputs a density operator after the \(n\) players’ strategies act on it.

For simplicity of our analysis, we restrict our attention to \(n\)-person strategic games in which each \(S_{i}\) has two elements. The extension of scheme (2)–(4) is now defined on the space \((\mathbb {C}^2)^{\otimes n} \otimes (\mathbb {C}^2)^{\otimes n}\) with the positive operator \(H\),

$$\begin{aligned} H = \left( {\mathbb {1}}^{\otimes n} - (|1\rangle \langle 1|)^{\otimes n}\right) \otimes (|0\rangle \langle 0|)^{\otimes n} + (|1\rangle \langle 1|)^{\otimes n} \otimes |\varPsi \rangle \langle \varPsi |, \end{aligned}$$
(22)

where \(|\varPsi \rangle \in (\mathbb {C}^{2})^{\otimes n}\), \(\Vert |\varPsi \rangle \Vert = 1.\) Each player \(i\in \{1,\dots , n\}\) has a strategy determined by (3) that acts on qubits \(i\) and \(n+i\), i.e., it is of the form \(P^{(i)}_{j_{i}}\otimes U^{(n+i)}_{j_{n+i}},\) where \(j_{i}, j_{n+i} = 0,1.\) As a result, a profile of the players’ strategies forms the operator \(\left( \bigotimes ^n_{i=1}P^{(i)}_{j_{i}}\right) \otimes \left( \bigotimes ^{n}_{i=1}U^{(n+i)}_{j_{n+i}}\right) \), which results in the following density operator:

$$\begin{aligned} \rho _{\mathrm{f}}= & {} \left[ \left( \bigotimes ^n_{i=1}P^{(i)}_{j_{i}}\right) \otimes \left( \bigotimes ^{n}_{i=1}U^{(n+i)}_{j_{n+i}}\right) \right] H\left[ \left( \bigotimes ^n_{i=1}P^{(i)}_{j_{i}}\right) \otimes \left( \bigotimes ^{n}_{i=1}U^{(n+i)}_{j_{n+i}}\right) \right] \nonumber \\= & {} {\left\{ \begin{array}{ll}(|1\rangle \langle 1|)^{\otimes n} \otimes \left[ \left( \bigotimes \nolimits _{i=1}^{n}U^{(n+i)}_{j_{n+i}}\right) |\varPsi \rangle \langle \varPsi | \left( \bigotimes \nolimits _{i=1}^{n}U^{(n+i)}_{j_{n+i}}\right) \right] &{}\hbox {if } j_{1},\dots , j_{n} = 1 \\ \bigotimes \nolimits _{i=1}^{n}|j_{i}\rangle \langle j_{i}|\otimes \left[ \left( \bigotimes \nolimits _{i=1}^{n}U^{(n+i)}_{j_{n+i}}\right) (|0\rangle \langle 0|)^{\otimes n}\left( \bigotimes \nolimits _{i=1}^{n}U^{(n+i)}_{j_{n+i}}\right) \right] &{}\hbox { otherwise}.\end{array}\right. }\nonumber \\ \end{aligned}$$
(23)

Finally, we define for each player \(i\) the payoff measurement \(M_{i}\),

$$\begin{aligned} M_{i} = {\mathbb {1}}^{\otimes n}\otimes \left( \sum _{x_{1},\dots ,x_{n}=0,1} a^i_{x_{1},\dots ,x_{n}}|x_{1}\dots x_{n}\rangle \langle x_{1}\dots x_{n}|\right) , \end{aligned}$$
(24)

where \(a^i_{x_{1},\dots ,x_{n}}\) is player \(i\)’s payoff in the classical game that corresponds to the strategy profile consisting of the \((x_{1} + 1)\)th strategy of player 1, the \((x_{2} + 1)\)th strategy of player 2, ..., and the \((x_{n} + 1)\)th strategy of player \(n\). It is not difficult to check that scheme (22)–(24) generalizes an \(n\)-person strategic game with two strategies for each player. If the joint strategy \(|\varPsi \rangle \) is not played, i.e., the component \(\left( \bigotimes ^n_{i=1}P^{(i)}_{j_{i}}\right) \) of a strategy profile is not equal to \((|1\rangle \langle 1|)^{\otimes n}\), then

$$\begin{aligned}&\mathrm{tr}{\left[ \left( \bigotimes ^{n}_{i=1}|j_{i}\rangle \langle j_{i}|\right) \otimes \left[ \left( \bigotimes ^n_{i=1}U^{(n+i)}_{j_{n+i}} \right) (|0\rangle \langle 0|)^{\otimes n}\left( \bigotimes ^n_{i=1}U^{(n+i)}_{j_{n+i}} \right) \right] M_{i}\right] } \nonumber \\&\quad = \mathrm{tr}{\left[ \left( \bigotimes ^{n}_{i=1}|j_{i}\rangle \langle j_{i}|\right) \otimes \left( \bigotimes ^{n}_{i=1}|j_{n+i}\rangle \langle j_{n+i}|\right) M_{i} \right] } = a^{i}_{j_{n+1},\dots ,j_{2n}}. \end{aligned}$$
(25)

Thus, consider the strategic-form game \(\left( N, \{S_{i}\}_{i\in N}, \{u_{i}\}_{i\in N}\right) \) with

$$\begin{aligned} N = \{1,\dots , n\},~~S_{i} = \left\{ s^{(i)}_{0}, s^{(i)}_{1}\right\} ,~~u_{i}\left( s^{(1)}_{k_{1}},\dots , s^{(n)}_{k_{n}}\right) = a^i_{k_{1},\dots , k_{n}}. \end{aligned}$$
(26)

The game generated by scheme (22)–(24) is equivalent to game (26) if the strategies \(s^{(i)}_{0}\) and \(s^{(i)}_{1}\) are identified, respectively, with \(U^{(n+i)}_{0}\) and \(U^{(n+i)}_{1}\) for each \(i\).

Example 1

Let us consider the three-person Prisoner’s Dilemma that was studied in the quantum domain (via the EWL scheme) by Du et al. [15]. In terms of matrices the game is defined as follows:

$$\begin{aligned} \begin{pmatrix}(3,3,3) &{} (2,5,2) \\ (5,2,2) &{} (4,4,0)\end{pmatrix}, \quad \begin{pmatrix}(2,2,5) &{} (0,4,4) \\ (4,0,4) &{} (1,1,1)\end{pmatrix}. \end{aligned}$$
(27)

Here, players 1 and 2 choose between the rows and the columns, respectively, whereas player 3 chooses between the matrices. We recall that the only Nash equilibrium in (27) is the profile consisting of the players’ second strategies. Thus, the most reasonable result of the game is \((1,1,1)\). Similar to the best-known 2-person Prisoner’s Dilemma, the players would increase their payoffs if at least two of them played their first strategies. However, the first strategy cannot be played by a rational player since, for each profile of the opponents’ strategies, it always yields a worse payoff than the second strategy. In what follows, we apply scheme (22)–(24) to game (27). According to the reasoning used immediately before Example 1, we identify each player’s strategies in game (27) with the local operators \(U_{0}\) and \(U_{1}\). Moreover, let us assume that player \(i\), \(i=1,2,3\), acts on the system of the \(i\)th and \((i+3)\)th qubits. As a result, scheme (22)–(24) comes down to one defined on \((\mathbb {C}^{2})^{\otimes 3} \otimes (\mathbb {C}^2)^{\otimes 3}\) with the positive operator

$$\begin{aligned} H = \left( {\mathbb {1}}^{\otimes 3} - |111\rangle \langle 111|\right) \otimes |000\rangle \langle 000| + |111\rangle \langle 111| \otimes |\varPsi \rangle \langle \varPsi |, \end{aligned}$$
(28)

the strategy set of player \(i\)

$$\begin{aligned} \left\{ P^{(i)}_{0}\otimes U^{(i+3)}_{0}, P^{(i)}_{0}\otimes U^{(i+3)}_{1}, P^{(i)}_{1}\otimes U^{(i+3)}_{0}, P^{(i)}_{1}\otimes U^{(i+3)}_{1}\right\} , \end{aligned}$$
(29)

and the triple of payoff operators

$$\begin{aligned} (M_{1}, M_{2}, M_{3})= & {} {\mathbb {1}}^{\otimes 3}\otimes \bigl [(3,3,3)|000\rangle \langle 000| + (2,2,5)|001\rangle \langle 001| \nonumber \\&+ (2,5,2)|010\rangle \langle 010|+ (0,4,4)|011\rangle \langle 011| + (5,2,2)|100\rangle \langle 100| \nonumber \\&+ (4,0,4)|101\rangle \langle 101|+ (4,4,0)|110\rangle \langle 110| + (1,1,1)|111\rangle \langle 111|\bigr ].\nonumber \\ \end{aligned}$$
(30)

Let us now fix the players’ joint strategy \(|\varPsi \rangle \) as:

$$\begin{aligned} |\varPsi \rangle = \frac{1}{2}\left( |001\rangle + |010\rangle + |100\rangle + |111\rangle \right) \end{aligned}$$
(31)

and determine the resulting players’ payoffs that correspond to profiles

$$\begin{aligned} \bigotimes ^{3}_{k=1}P^{(k)}_{j_{k}}\otimes \bigotimes ^6_{k=4}U^{(k)}_{j_{k}},~j_{k} \in \{0,1\}. \end{aligned}$$
(32)

Note that for fixed \(\bigotimes ^6_{k=4} U^{(k)}_{j_{k}}\), the value

$$\begin{aligned} \mathrm{tr}{\left[ \left( \bigotimes ^{3}_{k=1}P^{(k)}_{j_{k}} \otimes \bigotimes ^6_{k=4} U^{(k)}_{j_{k}}\right) H \left( \bigotimes ^{3}_{k=1}P^{(k)}_{j_{k}} \otimes \bigotimes ^6_{k=4} U^{(k)}_{j_{k}}\right) M_{i}\right] },~i=1,2,3 \end{aligned}$$
(33)

is the same for each \(\bigotimes ^{3}_{k=1}P^{(k)}_{j_{k}} \ne |111\rangle \langle 111|\). Therefore, the problem of determining all the 64 payoff profiles actually reduces to determining \(64 - 6\cdot 8 = 16\) of them. For example,

$$\begin{aligned}&\left( P^{(1)}_{1}\otimes P^{(2)}_{0} \otimes P^{(3)}_{0} \otimes U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) \nonumber \\&\quad H\left( P^{(1)}_{1}\otimes P^{(2)}_{0} \otimes P^{(3)}_{0} \otimes U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) \nonumber \\&\quad = |100\rangle \langle 100|\otimes \left( U^{(4)}_{0} \otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) |000\rangle \langle 000|\left( U^{(4)}_{0} \otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) \nonumber \\&\quad = |100\rangle \langle 100| \otimes |001\rangle \langle 001|. \end{aligned}$$
(34)

Then,

$$\begin{aligned} \mathrm{tr}{\left( |100\rangle \langle 100| \otimes |001\rangle \langle 001|M_{i}\right) } = {\left\{ \begin{array}{ll}2 &{}\hbox {if }i\in \{1,2\} \\ 5 &{}\hbox {if }i=3.\end{array}\right. } \end{aligned}$$
(35)

Hence, we obtain the same payoffs if \(P^{(1)}_{1}\otimes P^{(2)}_{0} \otimes P^{(3)}_{0}\) is replaced by any \(P^{(1)}_{j_{1}}\otimes P^{(2)}_{j_{2}} \otimes P^{(3)}_{j_{3}} \ne |111\rangle \langle 111|\). For the case \(P^{(1)}_{1}\otimes P^{(2)}_{1} \otimes P^{(3)}_{1}\), we have

$$\begin{aligned}&\left( P^{(1)}_{1}\otimes P^{(2)}_{1} \otimes P^{(3)}_{1} \otimes U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) \nonumber \\&\quad H\left( P^{(1)}_{1}\otimes P^{(2)}_{1} \otimes P^{(3)}_{1} \otimes U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) \nonumber \\&\quad = |111\rangle \langle 111| \otimes \left( U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) |\varPsi \rangle \langle \varPsi |\left( U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) \nonumber \\&\quad = |111\rangle \langle 111|\otimes \left( |\varPsi '\rangle \langle \varPsi '|\right) \end{aligned}$$
(36)

where \(|\varPsi '\rangle = (|000\rangle + |011\rangle + |101\rangle + |110\rangle )/2\). State (36) implies the payoff

$$\begin{aligned} \mathrm{tr}{\left[ |111\rangle \langle 111|\otimes \left( U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) |\varPsi \rangle \langle \varPsi |\left( U^{(4)}_{0}\otimes U^{(5)}_{0} \otimes U^{(6)}_{1}\right) M_{i}\right] } = \frac{11}{4}\nonumber \\ \end{aligned}$$
(37)

for each \(i=1,2,3\). Having determined the payoffs associated with each strategy profile, we can describe the game given by scheme (28)–(30) with the use of four matrices

We see from the matrix representation that there are two types of pure Nash equilibria. The first one corresponds to the unique equilibrium in game (27), and it is generated by profiles

$$\begin{aligned} \bigotimes ^3_{k=1}P^{(k)}_{j_{k}} \otimes \bigotimes ^{6}_{k=4} U^{(k)}_{1}~~\text{ where }~~(j_{1},j_{2},j_{3}) \in \{(0,0,0), (1,0,0), (0,1,0), (0,0,1)\}. \end{aligned}$$
(38)

Each profile of (38) is a Nash equilibrium since each player’s unilateral deviation from the equilibrium strategy yields a payoff of 0 or 1. This also follows from the construction of (22)–(24). Namely, if a player cannot cause the joint strategy \(|\varPsi \rangle \) to be played by changing her own strategy, the equilibrium analysis is restricted to studying the local operations on the state \(|000\rangle \). That, in turn, coincides with the problem of finding Nash equilibria in game (27), and \( \bigotimes ^{6}_{k=4}U^{(k)}_{1}\) is just the counterpart of the profile of the players’ second strategies that forms the unique equilibrium in (27). However, in contrast to (27), the quantum game has another equilibrium, given by the profile

$$\begin{aligned} \bigotimes ^3_{k=1}P^{(k)}_{1} \otimes \bigotimes ^6_{k=4}U^{(k)}_{1}. \end{aligned}$$
(39)

Indeed, player 1 suffers a loss of at least 1/4 by unilaterally deviating from strategy \(P^{(1)}_{1}\otimes U^{(4)}_{1}\), and the same holds for players 2 and 3. Profile (39) is more profitable than (38) since it yields 11/4 for each player instead of 1. Thus, the players gain by making use of the joint strategy \(|\varPsi \rangle \), i.e., by playing \(\bigotimes ^3_{k=1}P^{(k)}_{1}\).
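The above analysis can be reproduced numerically. The sketch below, assuming numpy, builds the operator (28), the payoff operators (30) and the joint strategy (31), confirms that profile (39) yields 11/4 for every player, and checks that player 1’s unilateral deviations from (39) lose at least 1/4.

```python
# Numerical check of Example 1, assuming numpy.
import numpy as np
from itertools import product

ket = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
I2, SX = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
P = {0: np.outer(ket[0], ket[0]), 1: np.outer(ket[1], ket[1])}
U = {0: I2, 1: SX}

def kron(*xs):
    out = xs[0]
    for x in xs[1:]:
        out = np.kron(out, x)
    return out

def basis(bits):
    return kron(*[ket[b] for b in bits])

def proj(v):
    return np.outer(v, v)

# Classical payoffs of game (27), read off from the payoff operators (30)
payoff = {(0, 0, 0): (3, 3, 3), (0, 0, 1): (2, 2, 5), (0, 1, 0): (2, 5, 2), (0, 1, 1): (0, 4, 4),
          (1, 0, 0): (5, 2, 2), (1, 0, 1): (4, 0, 4), (1, 1, 0): (4, 4, 0), (1, 1, 1): (1, 1, 1)}

ones, zeros = basis((1, 1, 1)), basis((0, 0, 0))
psi = sum(basis(b) for b in [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]) / 2      # Eq. (31)
H = kron(np.eye(8) - proj(ones), proj(zeros)) + kron(proj(ones), proj(psi))        # Eq. (28)
M = [kron(np.eye(8), sum(v[i] * proj(basis(b)) for b, v in payoff.items()))
     for i in range(3)]                                                            # Eq. (30)

def payoffs(j):
    """Payoff triple for the pure profile with indices j = (j1, ..., j6)."""
    S = kron(P[j[0]], P[j[1]], P[j[2]], U[j[3]], U[j[4]], U[j[5]])
    rho = S @ H @ S
    return tuple(np.trace(rho @ Mi).real for Mi in M)

print(payoffs((1, 1, 1, 1, 1, 1)))      # profile (39): 11/4 = 2.75 for each player
# Player 1's unilateral deviations (changing j1 and/or j4) lose at least 1/4:
losses = [2.75 - payoffs((a, 1, 1, b, 1, 1))[0]
          for a, b in product((0, 1), repeat=2) if (a, b) != (1, 1)]
print(min(losses))                      # 0.25
```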

3.2 Normal representation of extensive games

Given an extensive-form game, one can construct a representation of that game in strategic (normal) form. The resulting strategic game and the given extensive game have the same set of players and the same set of strategies for each player. The payoff functions are determined by the payoffs generated by the strategies in the extensive game. The normal representation turns out to be a very convenient way to study the extensive game. In particular, while we lose the sequential structure, we obtain a simpler form of the game that is sufficient for finding all the Nash equilibria.

In our earlier paper [16], we introduced a quantum scheme for playing an extensive game by using its normal representation. Based on the MW and EWL schemes, we assigned an action at each information set in the extensive game to a local operation on a particular qubit in the quantum game. As a result, the number of qubits on which each player was allowed to perform local operations was equal to the number of her information sets. In what follows, we extend this idea to the refinement of the MW scheme. This means that, in addition to the multiple choice of \({\mathbb {1}}\) and \(\sigma _{x}\), the players specify the state on which they perform the local operators.

Let us modify (22) to cover the normal-form game determined by an extensive game with the set of players \(\{1,2,\dots , k\}\) and \(n\) information sets, \(n \ge k\). The positive operator is now defined on \((\mathbb {C}^2)^{\otimes k} \otimes (\mathbb {C}^{2})^{\otimes n}\) by the formula

$$\begin{aligned} H = \left( {\mathbb {1}}^{\otimes k} - (|1\rangle \langle 1|)^{\otimes k} \right) \otimes (|0\rangle \langle 0|)^{\otimes n} + (|1\rangle \langle 1|)^{\otimes k}\otimes |\varPsi \rangle \langle \varPsi |, \end{aligned}$$
(40)

where \(|\varPsi \rangle \in (\mathbb {C}^{2})^{\otimes n}\) and \(\Vert |\varPsi \rangle \Vert = 1.\) Let \(\xi :\{k+1,k+2,\dots , k+n\} \rightarrow \{1,2, \dots , k\}\) be a surjective map. We define player \(i\)’s set of strategies as follows

$$\begin{aligned} \left\{ P^{(i)}_{j_{i}}\otimes \bigotimes _{y\in \xi ^{-1}(i)}U^{(y)}_{j_{y}}:j_{i}, j_{y}\in \{0,1\}\right\} , \end{aligned}$$
(41)

where \(P^{(i)}_{j_{i}}\) and \(U^{(y)}_{j_{y}}\) are defined by (3). As a possible application of (40)–(41), let us consider the following example:

Example 2

(Four-stage centipede game) A centipede game is a 2-person extensive game in which the players move one after another for finitely many rounds. In some sense, it can be treated as an extensive counterpart of the Prisoner’s Dilemma. While both players are able to obtain a high payoff, their rationality leads them to one of the worst outcomes. An example of a four-stage centipede game is shown in Fig. 1. Each player has two information sets (in this case, they are represented by the nodes of the game tree) with two available actions at each of them. Each player can stop the game (action S) or continue the game (action C), giving the other player the opportunity to make her choice. One way to learn how the game may end is by backward induction. If player 2 is to choose at her second information set, she certainly plays action \(S\) since she obtains 5 instead of the 4 that results from playing action \(C\). Since the players’ rationality is common knowledge, player 1 knows that by playing \(C\) at her second information set, she ends up with payoff 3. Thus, player 1 chooses \(S\), which yields 4. A similar analysis shows that the players choose action \(S\) at their first information sets. Consequently, backward induction predicts the outcome \((2,0)\). As we focus on normal-form games, we construct the normal representation associated with the game in Fig. 1. Let us first determine the players’ strategies. We recall that a player’s strategy in an extensive game is a function that assigns an action to each information set of that player. Thus, each player has four strategies in the case of a four-stage centipede game. They can be written in the form \(SS, SC, CS\), and \(CC\), where, for example, \(CS\) means that a player chooses \(C\) at her first information set and \(S\) at the second one. Once the strategies are specified, we determine the payoffs that correspond to all possible strategy profiles. For example, \((SC, CC)\) determines the outcome \((2,0)\) since player 1’s strategy \(SC\) specifies action \(S\) at her first information set. On the other hand, profile \((CC,CS)\) corresponds to the payoff \((3,5)\) as player 1 always plays \(C\) and player 2 chooses \(S\) at her second information set. The players’ strategies together with the payoffs corresponding to the strategy profiles define the following normal representation

(42)

By using bimatrix (42), we can learn that rational players always choose action \(S\) at their first information sets. More formally, there are four pure Nash equilibria: \((SS,SS), (SS, SC), (SC, SS)\), and \((SC, SC)\), each resulting in outcome \((2,0)\).

Fig. 1 Extensive-form representation of a four-stage centipede game (left) and the corresponding payoff polytope (right)

Let us consider the four-stage centipede game in terms of (40)–(41). We have \(k=2\) and \(n=4\). Thus, operator (40) comes down to

$$\begin{aligned} H = \left( {\mathbb {1}}\otimes {\mathbb {1}} - |11\rangle \langle 11|\right) \otimes |0000\rangle \langle 0000| + |11\rangle \langle 11| \otimes |\varPsi \rangle \langle \varPsi |. \end{aligned}$$
(43)

Let us assume that player 1 (player 2) performs her local operations on the third and fifth (fourth and sixth) qubits, i.e., we define the map \(\xi :\{3,4,5,6\} \rightarrow \{1,2\}\) by setting \(\xi (\{3,5\}) = \{1\}\) and \(\xi (\{4,6\}) = \{2\}\). According to (41), the strategies of player 1 and player 2 take the form, respectively,

$$\begin{aligned} P^{(1)}_{j_{1}} \otimes U^{(3)}_{j_{3}}\otimes U^{(5)}_{j_{5}}~~\text{ and }~~P^{(2)}_{j_{2}} \otimes U^{(4)}_{j_{4}}\otimes U^{(6)}_{j_{6}}~~\text{ for }~~j_{k}\in \{0,1\}. \end{aligned}$$
(44)

In order to generalize game (42), we specify payoff operators (24) as follows

Setting \(|\varPsi \rangle = (|1010\rangle + |1011\rangle )/\sqrt{2}\) and determining

$$\begin{aligned} \mathrm{tr}{\left[ \left( P^{(1)}_{j_{1}}\otimes P^{(2)}_{j_{2}}\otimes \bigotimes ^{6}_{k=3}U^{(k)}_{j_{k}}\right) H\left( P^{(1)}_{j_{1}} \otimes P^{(2)}_{j_{2}}\otimes \bigotimes ^{6}_{k=3}U^{(k)}_{j_{k}}\right) M_{i}\right] } \end{aligned}$$
(45)

for player \(i\in \{1,2\}\) and \(j_{1},\dots , j_{6} \in \{0,1\}\), we obtain the following normal-form game:

(46)

where \(A_{j_{1}j_{3}j_{5}} = P^{(1)}_{j_{1}}\otimes U^{(3)}_{j_{3}} \otimes U^{(5)}_{j_{5}}\) and \(B_{j_{2}j_{4}j_{6}} = P^{(2)}_{j_{2}}\otimes U^{(4)}_{j_{4}} \otimes U^{(6)}_{j_{6}}\). The game given by (46) extends (42) to local operations on \(|\varPsi \rangle \langle \varPsi |\). If players 1 and 2 restrict their strategies, for example, to \(P^{(1)}_{0} \otimes U^{(3)}_{j_{3}}\otimes U^{(5)}_{j_{5}}\) and \(P^{(2)}_{0} \otimes U^{(4)}_{j_{4}}\otimes U^{(6)}_{j_{6}}\), \(j_{3}, \dots , j_{6} \in \{0,1\}\), bimatrix (46) boils down to (42) (with the unique equilibrium outcome \((2,0)\)). In the full game (46), however, there is another Nash equilibrium

$$\begin{aligned} (A_{100}, B_{110}) = P^{(1)}_{1}\otimes P^{(2)}_{1}\otimes U^{(3)}_{0}\otimes U^{(4)}_{1}\otimes U^{(5)}_{0} \otimes U^{(6)}_{0} \end{aligned}$$
(47)

that is not available in the classical game. Moreover, profile (47) implies the pair of payoffs \((9/2,9/2)\), which is the best possible symmetric outcome in (42) (see the payoff polytope in Fig. 1).
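This payoff pair can be checked with a short computation. The sketch below, assuming numpy, builds the operator (43), applies profile (47), and combines the resulting distribution over the four action qubits with the two terminal payoffs quoted above, \((3,5)\) and \((6,4)\); the remaining payoffs of the game in Fig. 1 are not needed for this profile and are not reproduced here.

```python
# Check of the payoff pair (9/2, 9/2) for profile (47), assuming numpy.
import numpy as np

ket = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
I2, SX = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
P1 = np.outer(ket[1], ket[1])

def kron(*xs):
    out = xs[0]
    for x in xs[1:]:
        out = np.kron(out, x)
    return out

def proj(v):
    return np.outer(v, v)

psi = (kron(ket[1], ket[0], ket[1], ket[0]) + kron(ket[1], ket[0], ket[1], ket[1])) / np.sqrt(2)
zeros4, ones2 = kron(*(ket[0],) * 4), kron(ket[1], ket[1])
H = kron(np.eye(4) - proj(ones2), proj(zeros4)) + kron(proj(ones2), proj(psi))     # Eq. (43)

# Profile (47): P_1 (x) P_1 (x) U_0 (x) U_1 (x) U_0 (x) U_0
S = kron(P1, P1, I2, SX, I2, I2)
rho = S @ H @ S

# Marginal distribution over qubits 3-6 (1 = action C, 0 = action S)
probs = np.diag(rho).reshape(4, 16).sum(axis=0)
terminal = {0b1110: (3, 5), 0b1111: (6, 4)}    # the only outcomes in the support
expected = sum(probs[k] * np.array(v) for k, v in terminal.items())
print(expected)                                # [4.5 4.5]
```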

The main advantage of model (40)–(41), or equivalently (22)–(24), is that a classical normal-form game and its quantum counterpart have similar complexity. In particular, given any 2-person finite extensive game with \(k\) strategies for each player, the normal-form game implied by scheme (40)–(41) is just a \(2k \times 2k\) bimatrix game. As a result, there is no significant difference in the problem of determining Nash equilibria in the two games.

Fig. 2 \(N\)-stage centipede game

Example 3

(N-stage centipede game) Let us consider a centipede game where this time the number of stages is any even integer \(n\ge 2\). The extensive form of this game is given in Fig. 2. Similar to the four-stage centipede game, the \(n\)-stage case also has the unique equilibrium outcome \((2,0)\). Rational players choose action \(S\) at their own information sets even though the game enables the players to obtain payoffs close to the number of stages. We have learned from the preceding example that there is a unique symmetric and Pareto-optimal Nash equilibrium if (42) is extended to (46). It turns out that this result is valid in the general case. That is, there is a Nash equilibrium that implies the payoff \(n + 1/2\) for both players (the pair of payoffs \((n + 1/2, n+ 1/2)\) is indeed a Pareto-optimal outcome since it is the midpoint of the segment whose endpoints are \((n-1,n+1)\) and \((n+2,n)\)). In order to prove the existence of that equilibrium, let us generalize (43) and (44) to an arbitrary \(n\)-stage centipede game. Since there are two players and \(n\) information sets in the game, the positive operator \(H\) and the players’ strategies are given by (40) and (41) for \(k=2\). We assume that players 1 and 2 perform their local operators on the qubits with odd and even indices, respectively. Thus, the map \(\xi :\{3,4,\dots , n+2\} \rightarrow \{1,2\}\) is given by the formula

$$\begin{aligned} \xi (x) = {\left\{ \begin{array}{ll}1 &{}\hbox { if }x\hbox { is odd}\\ 2 &{}\hbox { if } x\hbox { is even}.\end{array}\right. } \end{aligned}$$
(48)

The appropriately generalized payoff operators take the form

(49)

Let us consider the state \(|\varPsi \rangle \in (\mathbb {C}^{2})^{\otimes n}\),

$$\begin{aligned} |\varPsi \rangle = \frac{|1010\dots 1010\rangle + |1010 \dots 1011\rangle }{\sqrt{2}} \end{aligned}$$
(50)

and a strategy profile \(U^* \otimes V^*\) such that

$$\begin{aligned} U^* = P^{(1)}_{1}\otimes U^{(3)}_{0}\otimes U^{(5)}_{0}\otimes \dots \otimes U^{(n+1)}_{0}, \quad V^* = P^{(2)}_{1}\otimes U^{(4)}_{1}\otimes U^{(6)}_{1}\otimes \dots \otimes U^{(n)}_{1}\otimes U^{(n+2)}_{0}. \end{aligned}$$
(51)

First note that strategy profile \(U^*\otimes V^*\),

$$\begin{aligned} U^*\otimes V^* = |11\rangle \langle 11| \otimes {\mathbb {1}}^{(3)}\otimes \sigma ^{(4)}_{x} \otimes {\mathbb {1}}^{(5)} \otimes \sigma ^{(6)}_{x} \otimes \dots \otimes {\mathbb {1}}^{(n-1)} \otimes \sigma ^{(n)}_{x} \otimes {\mathbb {1}}^{(n+1)} \otimes {\mathbb {1}}^{(n+2)} \end{aligned}$$
(52)

implies the payoffs

$$\begin{aligned} \mathrm{tr}{\left[ \left( U^*\otimes V^*\right) H\left( U^*\otimes V^*\right) M_{i}\right] } = n + \frac{1}{2}~~\text{ for }~~i = 1,2. \end{aligned}$$
(53)

Let \(U = P^{(1)}_{j_{1}}\otimes \bigotimes _{y\in \xi ^{-1}(1)}U^{(y)}_{j_{y}}\) be an arbitrary strategy of player 1. If \(j_{1} = 0\), then

$$\begin{aligned}&\!\!\!\left( U\otimes V^*\right) H\left( U\otimes V^*\right) \\&= |01\rangle \langle 01|\otimes \left( U^{(3)}_{j_{3}} \otimes \dots \otimes U^{(n+1)}_{j_{n+1}}\right) |01\dots 0100\rangle \langle 01\dots 0100|\\&\quad \times \left( U^{(3)}_{j_{3}} \otimes \dots \otimes U^{(n+1)}_{j_{n+1}}\right) . \end{aligned}$$

Since player 1 cannot affect the system of the \((n+2)\)th qubit, we have

(54)

In the case of \(j_{1} = 1\),

$$\begin{aligned}&\left( U\otimes V^*\right) H\left( U\otimes V^*\right) \nonumber \\&\quad = |11\rangle \langle 11|\otimes \left( U^{(3)}_{j_{3}} \otimes \dots \otimes U^{(n+1)}_{j_{n+1}}\right) |\varphi \rangle \langle \varphi |\left( U^{(3)}_{j_{3}} \otimes \dots \otimes U^{(n+1)}_{j_{n+1}}\right) , \end{aligned}$$
(55)

where \(|\varphi \rangle = (1/\sqrt{2})(|11\dots 10\rangle + |11\dots 11\rangle ).\) From (53), we know that player 1 obtains \(n + 1/2\) if \(U = U^*\). Thus, the form of (49) implies that a strategy \(U \ne U^*\) would increase player 1’s payoff only if \(U\) made the magnitude of the amplitude of \(|11\dots 11\rangle \) higher than \(1/\sqrt{2}\). However, this is not possible because of the form of \(U\). As a result, we have proved that \(U^*\) is a best response of player 1 to \(V^*\) over all her pure strategies. Using an argument similar to the one concerning the equivalence of (10) and (12), we conclude that \(U^*\) is a best response of player 1 to \(V^*\) over all her (pure and mixed) strategies. In a similar way, we can show that player 2’s strategy \(V^*\) is a best response to \(U^*\).
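The central step, that no pure strategy of player 1 with \(j_{1}=1\) can raise the magnitude of the amplitude of \(|11\dots 11\rangle \) above \(1/\sqrt{2}\), can also be verified by enumeration. A small sketch in plain Python is given below; the value \(n=8\) is only an example.

```python
# Enumeration supporting the best-response argument; plain Python, n = 8 is an example.
from itertools import product

def max_all_ones_amplitude(n):
    """Maximum over player 1's pure strategies (with j_1 = 1) of the amplitude of
    |1...1> in the state (55), when player 2 plays V* of (52)."""
    # |phi> of (55): equal superposition of 11...10 and 11...11, amplitudes 1/sqrt(2)
    branches = [(1,) * (n - 1) + (0,), (1,) * n]
    best = 0.0
    # Player 1 may flip any subset of qubits 3, 5, ..., n+1 (positions 0, 2, ..., n-2)
    for flips in product((0, 1), repeat=n // 2):
        amp = 0.0
        for bits in branches:
            out = list(bits)
            for k, f in enumerate(flips):
                out[2 * k] ^= f
            if all(out):
                amp += 1 / 2 ** 0.5
        best = max(best, amp)
    return best

print(max_all_ones_amplitude(8))   # 0.7071... = 1/sqrt(2): the bound is never exceeded
```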

4 Conclusions

The aim of our research was to formalize our idea about the MW-type schemes. As a result, we have shown that the players’ strategies do not have to be unitary operators, or even superoperators, in a quantum game. Apart from unitary operators, they may include projectors that determine the state on which the unitary operations are performed. Thus, the initial state does not have to be a density operator. Certainly, the scheme is in accordance with the laws of quantum mechanics. The resulting state is given by a density operator, and therefore the payoff measurement is well defined. Another positive feature of the scheme is the form in which the resulting game can be studied. Given a bimatrix game, the scheme outputs a bimatrix game. Consequently, finding optimal strategies for the players is of similar complexity in the classical game and in its quantum counterpart. In addition, our model enables us to consider extensive games via the normal representation. Moreover, the example of the general centipede game has shown that the analysis does not have to be limited to simple games. We suppose that these features may attract the attention of researchers to the refinement of the MW scheme.