A quantum approach to twice-repeated 2 × 2 game

This work proposes an extension of the well-known Eisert–Wilkens–Lewenstein scheme for playing a twice repeated 2 × 2 game using a single quantum system with ten maximally entangled qubits. The proposed scheme is then applied to the Prisoner’s Dilemma game. Rational strategy profiles are examined in the presence of limited awareness of the players. In particular, the paper considers two cases of a game between a classical player and a quantum player: the first, in which the classical player does not know that his opponent is a quantum one, and the second, in which the classical player is aware of it. To this end, the notion of unawareness is used, and the extended Nash equilibria are determined.


Introduction
In recent years, the field of quantum computing has developed significantly. One of its related aspects is quantum game theory, which merges ideas from quantum information [1] and game theory [2] to open up new opportunities for finding optimal strategies for many games. The concept of quantum strategy was first mentioned in [3], where a simple extensive game called PQ Penny Flip was introduced. The paper showed that one player could always win if he was allowed to use quantum strategies against the other player restricted to the classical ones. Next, Eisert, Wilkens and Lewenstein proposed a quantum scheme for the Prisoner's Dilemma game based on entanglement [4]. Their solution leads to a Nash equilibrium that is a Pareto-optimal payoff point.
Since then, many other examples of quantum games have been proposed. A good overview of quantum game theory can be found in [5]. One of the latest trends is to study quantum repeated games [6,7]. In particular, the quantum repeated Prisoner's Dilemma [8,9] was investigated. In [8], the idea was to classically repeat the Prisoner's Dilemma with strategy sets extended to include some special unitary strategies. That enabled one to study conditional strategies similar to the ones defined in the classical repeated Prisoner's Dilemma, for example, the "tit for tat" or Pavlov strategies.
We present a different approach, taking advantage of the fact that a repeated game is a particular case of an extensive-form game. A twice repeated 2 × 2 game is an extensive game with five information sets for each of the two players. Instead of using a classically repeated scheme based on two entangled qubits [8], we consider a twice repeated game as a single quantum system which requires ten maximally entangled qubits. Our scheme uses the quantum framework introduced in [10] and recently generalized in [11], according to which choosing an action in an information set is identified with applying a unitary operation to a qubit.
In this paper, we examine one of the most interesting cases in quantum game theory: the problem in which one of the players has access to the full range of unitary strategies whereas the other player can only choose from unitary operators that correspond to the classical strategies. Additionally, we examine the quantum game in terms of players' limited awareness of the available strategies. We use the concept of games with unawareness [12-14] to check to what extent two different factors, access to quantum strategies and game perception, affect the result of the game.

Preliminaries
In what follows, we give a brief review of the basic concepts of games with unawareness. The reader who is not familiar with this topic is encouraged to see [12]. Introductory examples and application of the notion of games with unawareness to quantum games can be found in [15,16].

Strategic game with unawareness
A strategic-form game with unawareness is defined as a family of strategic-form games. The family specifies how each player perceives the game, how she perceives the other players' perceptions of the game, and so on. To be more precise, let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game. This is the game played by the players, which is also called the modeler's game. Each player may have a restricted view of the game, i.e., she may not be aware of the full description of $G$. Hence, $G_v = (N_v, ((S_i)_v)_{i \in N_v}, ((u_i)_v)_{i \in N_v})$ denotes player $v$'s view of the game for $v \in N$. That is, the player $v$ views the set of players, the sets of players' strategies, and the payoff functions as $N_v$, $(S_i)_v$ and $(u_i)_v$, respectively. In general, each player also considers how each of the other players views the game. Formally, with a finite sequence of players $v = (i_1, \dots, i_n)$ there is associated a game $G_v$. This is the game that player $i_1$ considers that player $i_2$ considers that … player $i_n$ is considering. A sequence $v$ is called a view. The empty sequence $v = \emptyset$ is assumed to be the modeler's view, i.e., $G_{\emptyset} = G$. We denote an action profile […]
A collection $\{G_v\}_{v \in V}$, where $V$ is a collection of finite sequences of players, is called a strategic-form game with unawareness and the collection of views $V$ is called its set of relevant views if the following properties are satisfied: (1) […]

Extended Nash equilibrium
A basic solution concept for predicting players' behavior is a Nash equilibrium [17].
Definition 2 A strategy profile $s^* = (s_1^*, s_2^*, \dots, s_n^*)$ is a Nash equilibrium if for each player $i \in \{1, \dots, n\}$ and each strategy $s_i$ of player $i$

$$u_i(s^*) \ge u_i(s_i, s^*_{-i}),$$

where $s^*_{-i} := (s_j^*)_{j \ne i}$.
In order to define the Nash-type equilibrium for a strategic-form game with unawareness, the notion of strategy profile needs to be redefined.

Definition 3 Let $\{G_v\}_{v \in V}$ be a strategic-form game with unawareness. An extended strategy profile (ESP) in this game is a collection of (pure or mixed) strategy profiles $\{(\sigma)_v\}_{v \in V}$ […] (6)
To illustrate (6), let us take the game $G_{12}$, the game that player 1 thinks that player 2 is considering. If player 1 assumes that player 2 plays strategy $(\sigma_2)_{12}$ in the game $G_{12}$, she must assume the same strategy in the game $G_1$ that she considers, i.e., $(\sigma_2)_1 = (\sigma_2)_{12}$.
The next step is to extend rationalizability from strategic-form games to games with unawareness.
Consider a strategic-form game with unawareness $\{G_v\}_{v \in V}$. For every relevant view $v \in V$, the relevant views as seen from $v$ are defined to be $V_v = \{\tilde{v} \in V : v\hat{\ }\tilde{v} \in V\}$. Then, the game with unawareness as seen from $v$ is defined by $\{G_{v\hat{\ }\tilde{v}}\}_{\tilde{v} \in V_v}$. We are now in a position to define the Nash equilibrium in strategic-form games with unawareness.
The first part of the definition (rationalizability) is similar to the standard Nash equilibrium, where it is required that each strategy in the equilibrium is a best reply to the other strategies of that profile. For example, according to Definition 4, player 2's strategy $(\sigma_2)_1$ in the game of player 1 has to be a best reply to player 1's strategy $(\sigma_1)_{12}$ in the game $G_{12}$. On the other hand, in contrast to the concept of Nash equilibrium, $(\sigma_1)_{12}$ does not have to be a best reply to $(\sigma_2)_1$ but to strategy $(\sigma_2)_{121}$.
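The set of relevant views as seen from a given view, $V_v = \{\tilde{v} : v\hat{\ }\tilde{v} \in V\}$, can be illustrated with a short sketch. Views are modeled as tuples of player labels and concatenation as tuple addition. The set `V` below is a hypothetical finite toy collection chosen only for illustration; it is not the set of relevant views used in the paper and it ignores the closure properties a genuine set of relevant views must satisfy.

```python
from typing import FrozenSet, Tuple

# A view is a finite sequence of players; the empty tuple is the modeler's view.
View = Tuple[int, ...]


def views_seen_from(v: View, relevant: FrozenSet[View]) -> FrozenSet[View]:
    """Return V_v = {w in V : v^w in V}, where ^ denotes concatenation."""
    return frozenset(w for w in relevant if v + w in relevant)


# Hypothetical toy set of views for two players (illustration only).
V = frozenset({(), (1,), (2,), (1, 2), (2, 1)})

# Views that player 1 reasons about: her own view and her view of player 2.
print(sorted(views_seen_from((1,), V)))
# From the modeler's view every relevant view is visible.
print(views_seen_from((), V) == V)
```

In this toy example, the game with unawareness as seen from player 1 consists of the games indexed by the views 1 and 12.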
The following proposition shows that the notion of extended Nash equilibrium (ENE) coincides with the standard one for strategic-form games when all views share the same perception of the game.
Proposition 1 Let $G$ be a strategic-form game and $\{G_v\}_{v \in V}$ a strategic-form game with unawareness such that for some $v \in V$, we have $G_{v\hat{\ }\bar{v}} = G$ for every $\bar{v}$ such that $v\hat{\ }\bar{v} \in V$. Let $\sigma$ be a strategy profile in $G$. Then, $\sigma$ is a Nash equilibrium for $G$ if and only if $(\sigma)_v = \sigma$ is part of an ENE for $\{G_v\}_{v \in V}$, and this ENE also satisfies $(\sigma)_v = (\sigma)_{v\hat{\ }\bar{v}}$.

Remark 1
We see from (3) and (6) […] Then, we get […] For this reason, we often restrict ourselves to $N \cup \{\emptyset\}$ throughout the paper.

The concept of a finitely repeated game assumes playing a normal-form game (a stage of the repeated game) a fixed number of times (see, for example, [18]). The players are informed about the results of consecutive stages. Let us consider a 2 × 2 bimatrix game

$$\begin{pmatrix} (a_{00}, b_{00}) & (a_{01}, b_{01}) \\ (a_{10}, b_{10}) & (a_{11}, b_{11}) \end{pmatrix}. \tag{8}$$

The two-stage 2 × 2 bimatrix game can be easily depicted as an extensive-form game (see Fig. 1). The first stage of the twice repeated 2 × 2 game is the part of the game where the players specify an action C or D at the information sets 1.1 and 2.1.
When the players choose their actions, the result of the first stage is announced. Since they know the result of the first stage, they can choose different actions at the second stage depending on that result. Hence, the next four game trees from Fig. 1 are required to describe the repeated game. Each player has five information sets at which he specifies his own actions: player 1's information sets are denoted by 1.1, 1.2, 1.3, 1.4 and 1.5, and player 2's information sets by 2.1, 2.2, 2.3, 2.4 and 2.5. Note that player 2's information sets consist of two nodes connected by dotted lines. This is intended to show player 2's lack of knowledge about the previous move of player 1. Recall that a player's strategy is a function that assigns to each information set of that player an action available at that information set. In our example, this means that each player's strategy specifies an action at the first stage and four actions at the second stage. For example, strategy (C, C, D, D, C) of a player in the game given in Fig. 1 says that the player chooses action C at the first stage, and depending on one of the four possible results of the first stage, he chooses actions C, D, D, C, respectively. If player 1 plays that strategy whereas player 2 chooses, for example, (D, C, D, C, C), then the resulting strategy vector determines the unique path from the node 1.1 that intersects the nodes 2.1, 1.2 and 2.3 and gives the payoff outcome $(a_{01} + a_{10}, b_{01} + b_{10})$.
The players can also choose their own actions in a random way, i.e., according to some probability distribution determined by themselves. Such strategies are called behavioral strategies (see, for example, [2]).

Definition 6
A behavior strategy of a player in an extensive-form game is a function mapping each of his information sets to a probability distribution over the set of possible actions at that information set.
For example, in the case of the game given by Fig. 1, player 1's and player 2's behavioral strategies are determined by quintuples $(p_1, p_2, p_3, p_4, p_5)$ and $(q_1, q_2, q_3, q_4, q_5)$, respectively, in which $p_i$ and $q_i$ are the probabilities of choosing their first strategy at information sets 1.i and 2.i. The payoff outcome resulting from the players' general behavioral strategies is […]
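The expected payoff under behavioral strategies can be sketched numerically. The sketch below is an illustration under stated assumptions, not the paper's formula: it assumes that the second-stage information sets 2, 3, 4 and 5 are reached after the first-stage results (first, first), (first, second), (second, first) and (second, second), respectively, and that `A[i][j]`, `B[i][j]` hold the payoffs $a_{ij}$, $b_{ij}$. The function names and the numerical payoffs are hypothetical.

```python
import numpy as np


def stage_distribution(p: float, q: float) -> np.ndarray:
    """Outcome probabilities for one stage, ordered (0,0), (0,1), (1,0), (1,1).

    p and q are the probabilities that player 1 and player 2 choose
    their first action (index 0).
    """
    return np.array([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])


def expected_payoffs(A, B, p, q):
    """Expected payoffs of the two-stage game for quintuples p and q.

    Assumption: entry k + 1 of p and q is used after the k-th first-stage
    outcome in the ordering of stage_distribution.
    """
    a = np.asarray(A, dtype=float).ravel()  # (a00, a01, a10, a11)
    b = np.asarray(B, dtype=float).ravel()  # (b00, b01, b10, b11)
    first = stage_distribution(p[0], q[0])
    u1 = u2 = 0.0
    for k, prob in enumerate(first):
        second = stage_distribution(p[k + 1], q[k + 1])
        u1 += prob * (a[k] + second @ a)
        u2 += prob * (b[k] + second @ b)
    return u1, u2


# Illustrative payoffs (hypothetical numbers).
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(expected_payoffs(A, B, [1] * 5, [1] * 5))      # both always choose the first action
print(expected_payoffs(A, B, [0.5] * 5, [0.5] * 5))  # uniform randomization everywhere
```

Pure strategies are recovered by taking all probabilities equal to 0 or 1, in which case the result does not depend on the assumed ordering whenever a player uses the same action at every second-stage information set.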

Construction of a twice repeated 2 × 2 quantum game
We propose a scheme of playing a twice repeated 2 × 2 game. It is based on the protocol introduced in [10], where a quantum approach to general finite extensive games was considered. A two-stage 2 × 2 game is an example of an extensive game with ten information sets. According to the idea presented in [9], we associate choosing an action at an information set with a unitary operation performed on a qubit. As a result, each player specifies a unitary action on each of five qubits. To be more specific, let us consider a 2 × 2 bimatrix game (8). We define a triple $\Gamma_{QQ}$ […] (10), where
- $H$ is the Hilbert space $(\mathbb{C}^2)^{\otimes 10}$.
- SU(2) is the special unitary group of degree 2. The commonly used parameterization for $U \in \mathrm{SU}(2)$ is given by […]
- $|\Psi_f\rangle$ is the final state determined by a strategy $\bigotimes_{j=1}^{5} U_j(\theta_j, \alpha_j, \beta_j) \in \mathrm{SU}(2)^{\otimes 5}$ of player 1 and a strategy $\bigotimes_{j=6}^{10} U_j(\theta_j, \alpha_j, \beta_j) \in \mathrm{SU}(2)^{\otimes 5}$ of player 2 according to the following formula: […]
- the payoff vector function $(u_1, u_2)$ is given by […] where […] (14)
The construction (14) of the operator $X$ results from the following reasoning. First note that the information sets 1.1, …, 1.5 of player 1 are associated with the first five qubits, and the information sets 2.1, …, 2.5 of player 2 are associated with the other five qubits. Now, consider, for example, the outcome $(2a_{00}, 2b_{00})$. In the classical case, that payoff outcome is obtained if the players choose their first strategies at the information sets 1.1, 2.1, 1.2 and 2.2. These information sets are assigned to the first, sixth, second and seventh qubit, respectively. Therefore, the state $|0\rangle$ measured on each of those qubits results in the outcome $(2a_{00}, 2b_{00})$ in the quantum game. In a similar way, we can justify the other terms of (14). The scheme defined by (10)-(14) is an extension of the classical way of playing the game. As in the case of the standard Eisert-Wilkens-Lewenstein scheme, the model $\Gamma_{QQ}$ determines a game equivalent to the classical one when the strategy sets of the players are suitably restricted.
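The role of the SU(2) parameterization can be illustrated with a small numerical sketch. The displayed formula is not reproduced in this text, so the matrix below is an assumption: it is one commonly used three-parameter form, chosen because it yields the identity at $(0, 0, 0)$ and $i\sigma_x$ at $(\pi, 0, 0)$, in agreement with the statement that the action D corresponds to $i\sigma_x$. The function name `su2` is hypothetical.

```python
import numpy as np


def su2(theta: float, alpha: float, beta: float) -> np.ndarray:
    """One common parameterization of SU(2) (assumed, not taken from the text)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array(
        [
            [np.exp(1j * alpha) * c, 1j * np.exp(1j * beta) * s],
            [1j * np.exp(-1j * beta) * s, np.exp(-1j * alpha) * c],
        ]
    )


sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

U = su2(0.7, 1.1, -0.4)  # arbitrary parameter values
print(np.allclose(U @ U.conj().T, np.eye(2)))           # unitarity
print(np.isclose(np.linalg.det(U), 1.0))                # unit determinant
print(np.allclose(su2(0, 0, 0), np.eye(2)))             # first classical action
print(np.allclose(su2(np.pi, 0, 0), 1j * sigma_x))      # second classical action
```

Under this assumed form, the two classical actions are the special cases $\theta = 0$ and $\theta = \pi$ with vanishing phases.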

Proposition 2 The game determined by […] is outcome-equivalent to the two-stage 2 × 2 bimatrix game.
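The claim that classical actions reproduce the classical game can be checked numerically in a sketch of the ten-qubit scheme. Several ingredients are assumptions, since the corresponding formulas are not reproduced in this text: the entangling operator is taken as $J = (1^{\otimes 10} + i\sigma_x^{\otimes 10})/\sqrt{2}$, the final state as $J^{\dagger}(U_1 \otimes \dots \otimes U_{10})J|0\rangle^{\otimes 10}$, and the second-stage qubit pairs (2,7), (3,8), (4,9), (5,10) are assumed to follow the first-stage results (0,0), (0,1), (1,0), (1,1). The function names and payoff values are hypothetical.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)                        # counterpart of action C
D = 1j * np.array([[0, 1], [1, 0]], dtype=complex)   # i*sigma_x, counterpart of action D


def final_state(ops):
    """Final state for ten one-qubit operators (qubits 1-5: player 1, 6-10: player 2)."""
    on_zeros = reduce(np.kron, [U[:, 0] for U in ops])  # (U_1 x ... x U_10)|0...0>
    on_ones = reduce(np.kron, [U[:, 1] for U in ops])   # (U_1 x ... x U_10)|1...1>
    phi = (on_zeros + 1j * on_ones) / np.sqrt(2)
    # sigma_x on every qubit flips all bits, which reverses the amplitude vector.
    return (phi - 1j * phi[::-1]) / np.sqrt(2)


def expected_payoffs(psi, A, B):
    """Expected payoffs from a computational-basis measurement of psi."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    probs = np.abs(psi) ** 2
    u1 = u2 = 0.0
    for index in np.flatnonzero(probs > 1e-12):
        bits = [(int(index) >> (9 - k)) & 1 for k in range(10)]  # bits[0] is qubit 1
        x, y = bits[0], bits[5]         # first-stage results
        k = 1 + 2 * x + y               # assumed second-stage information set
        x2, y2 = bits[k], bits[5 + k]   # second-stage results
        u1 += probs[index] * (A[x, y] + A[x2, y2])
        u2 += probs[index] * (B[x, y] + B[x2, y2])
    return u1, u2


A = [[3, 0], [5, 1]]  # illustrative payoffs a_ij (hypothetical numbers)
B = [[3, 5], [0, 1]]  # illustrative payoffs b_ij
print(expected_payoffs(final_state([I2] * 10), A, B))  # both players always choose C
print(expected_payoffs(final_state([D] * 10), A, B))   # both players always choose D
```

Because the identity and $i\sigma_x$ commute with $\sigma_x^{\otimes 10}$, the assumed entangler cancels for classical actions and the measurement outcome is deterministic.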

Twice-repeated quantum Prisoner's Dilemma with unawareness
The Prisoner's Dilemma is one of the most interesting problems in game theory. It shows how the individual rationality of the players can lead them to an inefficient result. Let us consider a general form of the Prisoner's Dilemma

$$\begin{array}{c|cc} & C & D \\ \hline C & (R, R) & (S, T) \\ D & (T, S) & (P, P) \end{array} \tag{24}$$

where $T > R > P > S$. The payoff profile (R, R) of (24) is more beneficial to both players than (P, P). However, each player obtains a higher payoff by choosing D instead of C (in other words, the strategy C is strictly dominated by D). As a result, the rational strategy profile is (D, D), and it implies the payoff P for each player. A similar scenario occurs in the case of a finitely repeated Prisoner's Dilemma game. By induction, it can be shown that playing the action D at each stage of a finitely repeated Prisoner's Dilemma constitutes the unique Nash equilibrium.
We assume that the modeler's game $G_{\emptyset}$ (the game that is actually played by the players) is defined by (10). Player 1, being aware of all the unitary strategies, also views the quantum game, i.e., $G_1 = \Gamma_{QQ}$. Next, we assume that player 2 perceives the game to be the classical one. In other words, player 2 views the game of the form […] We then assume that player 1 finds that player 2 is considering $\Gamma_{CC}$, and higher-order views $v \in \{21, 121, 212, \dots\}$ are associated with $\Gamma_{CC}$. We thus obtain a game with unawareness $\{\Gamma_v\}_{v \in V_0}$ defined as follows: […]
In what follows, we determine the players' rational strategies by applying the notion of extended Nash equilibrium. First, we need to formulate the lemma that specifies player 1's best reply to the Nash equilibrium strategy of the classical twice repeated Prisoner's Dilemma. Recall that the action D corresponds to $i\sigma_x$ in the quantum scheme (10). This implies that $(i\sigma_x)^{\otimes 5}$ is a counterpart of the unique Nash equilibrium strategy (D, D, D, D, D) in the classical game. The following result is a part of the extended Nash equilibrium.
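The dominance argument for the one-stage game can be verified by brute force. The sketch below enumerates the pure strategy profiles of a bimatrix game and keeps those in which neither player gains by deviating. The numerical values of T, R, P and S are illustrative assumptions satisfying $T > R > P > S$, and the function name is hypothetical.

```python
from itertools import product


def pure_nash_equilibria(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B).

    A[i][j] and B[i][j] are the payoffs of player 1 and player 2 when
    player 1 chooses row i and player 2 chooses column j.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        best_for_1 = all(A[i][j] >= A[k][j] for k in range(rows))
        best_for_2 = all(B[i][j] >= B[i][l] for l in range(cols))
        if best_for_1 and best_for_2:
            equilibria.append((i, j))
    return equilibria


T, R, P, S = 5, 3, 1, 0   # illustrative values with T > R > P > S
A = [[R, S], [T, P]]      # player 1's payoffs; index 0 is C, index 1 is D
B = [[R, T], [S, P]]      # player 2's payoffs

print(pure_nash_equilibria(A, B))  # → [(1, 1)], i.e. the profile (D, D)

# D strictly dominates C for player 1: a higher payoff against either column.
print(all(A[1][j] > A[0][j] for j in range(2)))  # → True
```

The same function applies to any finite bimatrix game given as nested lists.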

Remark 2
It is worth noting that the strategy (26) turns out to be a nontrivial extension of the quantum player's best reply to strategy iσ x in the one-stage Prisoner's Dilemma.
Recall that according to [4,19], the Eisert-Wilkens-Lewenstein approach to game (24) is defined by the final state […] and the measurement operator […] In case […] Thus, the set of player 1's best replies to $i\sigma_x$ is […]
We know from classical game theory that the unique Nash equilibrium in the twice repeated Prisoner's Dilemma is (D, D, D, D, D). In terms of the EWL scheme, that profile can be written as $(i\sigma_x)^{\otimes 5}$. Therefore, […] In order to prove that $(\sigma)_1 = (\sigma_1, \sigma_2)_1 = (\tau^*, (i\sigma_x)^{\otimes 5})$, we first note from the definition of extended strategy profile that […] According to Definition 4, player 1's strategy $(\sigma_1)_1$ has to be a best reply to $(\sigma_2)_1 = (i\sigma_x)^{\otimes 5}$ in the game $\Gamma_1 = \Gamma_{QQ}$. Since player 1 has access to all the unitary actions, by Lemma 1, his best reply to $(i\sigma_x)^{\otimes 5}$ is $(\sigma_1)_1 = \tau^*$ given by (26). Finally, (6) implies that […]
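For the one-stage game, a short worked computation illustrates how a quantum player can exploit the classical action $i\sigma_x$. It is a sketch under assumptions that are not taken from this text: the entangling operator is $J = (1 \otimes 1 + i\sigma_x \otimes \sigma_x)/\sqrt{2}$, the final state is $J^{\dagger}(U_1 \otimes U_2)J|00\rangle$, the first qubit belongs to player 1, and player 1 is assumed to play $i\sigma_z$, which is one example of a reply and not the full set of best replies.

```latex
% Assumptions: J = (1 \otimes 1 + i\,\sigma_x \otimes \sigma_x)/\sqrt{2},
% player 1 plays i\sigma_z, player 2 plays i\sigma_x (the action D).
\begin{align*}
J|00\rangle
  &= \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + i|11\rangle\bigr), \\
(i\sigma_z \otimes i\sigma_x)\,J|00\rangle
  &= \tfrac{1}{\sqrt{2}}\bigl(-|01\rangle + i|10\rangle\bigr), \\
J^{\dagger}(i\sigma_z \otimes i\sigma_x)\,J|00\rangle
  &= \tfrac{1}{2}\bigl(-|01\rangle + i|10\rangle + i|10\rangle + |01\rangle\bigr)
   = i|10\rangle .
\end{align*}
```

Under these assumptions the measurement yields $|10\rangle$ with certainty, which corresponds to the classical result (D, C) and hence to the payoff pair (T, S) of (24), the highest payoff available to player 1.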

Higher-order unawareness
In the previous section, we considered a typical case in which one of the players is aware of quantum strategies, whereas the other player views the classical game. We showed that the quantum player obtains the best possible payoff resulting from playing an extended Nash equilibrium. An interesting question that arises here is whether the strategic position of the classical player can be improved by increasing his awareness of the game. Let us consider the case in which player 1 views the quantum game. In addition, player 2 is aware that player 1 uses quantum strategies ($\Gamma_2 = \Gamma_{QC}$), and he knows that player 1 views the quantum strategies ($\Gamma_{21} = \Gamma_{QC}$). The formal description of the problem is twofold. Player 1 can perceive the game with quantum strategies for both players ($\Gamma_1 = \Gamma_{QQ}$), or he may think that he is the only one who has access to all the unitary strategies ($\Gamma_1 = \Gamma_{QC}$). As long as player 1 finds that player 2 is considering the classical game $\Gamma_{CC}$ (i.e., the view 12 is associated with $\Gamma_{CC}$ in both descriptions), both ways describe the same problem. Formally, the case in which the classical player is aware that player 1 uses quantum strategies is given by the collections of games […] (37)
In order to find the reasonable outcome of (37), we need to determine player 2's best reply to $\tau^*$.

Lemma 2 Player 2's best reply to $\tau^*$ is […] (38)
Proof Since player 2's expected payoff is linear in the probabilities assigned to the pure strategies of $\{1, i\sigma_x\}^{\otimes 5}$ when player 1's strategy is fixed, a mixed best reply cannot lead to a higher payoff than a pure one. It is therefore sufficient to compare the expected payoffs of player 2 that correspond to strategy profiles from $\tau^* \otimes \{1, i\sigma_x\}^{\otimes 5}$. We obtain four different outcomes, listed by strategy profile and player 2's payoff: […] From the fact that $T > R > P > S$, we see that player 2's best reply is given by (38) for every $\theta_5 \in [0, \pi/2]$.
Lemma 2 enables us to determine all the extended Nash equilibria in $\{\Gamma_v\}$ defined by (37).

Summary and conclusions
In this paper, we proposed a new scheme for a twice repeated quantum game based on the fact that it is a particular case of an extensive-form game. We analyzed the scheme for a twice repeated Prisoner's Dilemma game, with a focus on the situation where the players have different perceptions of the game, described by the formalism of games with unawareness [12].
In particular, we determined the extended Nash equilibrium for the case where one player has access to the full range of quantum strategies, while the other perceives the game as a classical one. We found the best replies of the quantum player to the classical equilibrium strategy. This result is an extension of the corresponding one-stage version of the game, and it similarly allows the quantum player to get the best possible outcome.
We also discussed higher-order unawareness, where we slightly increase the game perception of the classical player, so that he knows that his opponent is actually a quantum player, while the quantum player is not aware of that knowledge of the classical player. We showed that this situation improves the strategic position of the classical player. As a result of playing the extended Nash equilibrium, the difference between the classical and the quantum player's payoffs is always nonnegative, and strictly positive as long as the parameter $\theta_5 \ne 0$ in player 1's equilibrium action $\tau^*$. Therefore, the average payoff of the classical player is greater than the payoff of the quantum player.
Our results showed that the proposed scheme is a nontrivial generalization of the well-known EWL scheme. It can be easily extended to any repeated 2 × 2 quantum game. Additionally, in the future it should be possible to implement our scheme on already existing quantum hardware such as IBM Q or Rigetti Computing. Research based on the proposed scheme is also promising in the coming era of the quantum internet, as indicated by emerging quantum network simulators such as SimulaQron, which can be used to simulate two players playing over a quantum network.
Funding The research presented in this paper has been partially supported by the funds of Polish Ministry of Science and Higher Education assigned to AGH University of Science and Technology in Kraków, and Pomeranian University in Slupsk.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.