
1 Introduction

1.1 Background - The Need of BBB TBC

Together with the development of authenticated encryption (AE) in the CAESAR competition [1] and the ongoing lightweight cryptography competition [64], tweakable block ciphers (TBCs) are playing an increasingly important role. Besides the plaintext, a TBC takes a tweak as an additional input, which can be viewed as an index into the underlying block cipher, so that it becomes a family of (independent) block ciphers rather than a single block cipher instance. Its formalization was motivated by the need for (more than one) independent block ciphers in some modes, e.g., OCB [67], where using multiple independent ciphers or keys could cause efficiency issues. In contrast, with a TBC that lends itself to efficient (both software and hardware) implementations, a new block cipher instance is obtained by simply choosing a new tweak value.

Beyond-Birthday-Bound Security. Most of the current (tweakable) block cipher standards have a block length of 128 bits or less, providing a security level of at most 64 bits when instantiated in designs offering only birthday-bound security. Such a security level has become largely inadequate [35]. Even worse, in order to save hardware implementation costs, many lightweight block cipher designs tend to have a smaller block length such as 64 bits, providing only 32 bits of birthday security. Hence, the need for modes providing BBB security is emerging, as has been observed by Gueron and Lindell [35] and in the whitepaper [2].

There are two different ways to construct TBCs. Following the modular approach, they can be built from classical block ciphers via various modular constructions, and security is ensured by a reduction to that of the underlying block ciphers. Alternatively, one could appeal to (probably more efficient) dedicated algorithms, the security guarantees of which come from comprehensive cryptanalysis. Below we’ll review both methods.

1.2 Modular Approach: TBCs from Block Ciphers

A classical and popular approach is to construct TBCs from existing (traditional) block ciphers in a black-box fashion. Such proposals are further divided into two classes. The “old school” approach, initiated by Liskov et al.  [54], works in the so-called standard model and models the underlying block cipher as a pseudorandom permutation. The “new school” approach, recently popularized by Mennink [56], models the block cipher as an ideal cipher. The two approaches differ not only in their security assumptions, but also in their design philosophies. Concretely, standard assumption-based constructions typically try to avoid tweak-dependent rekeying, which is deemed (arguably) costly. Another shortcoming of rekeying is the unavoidable “hybrid security loss” in the security bounds [58, 69] (some constructions withstand this loss using carefully chosen parameters [17, 61]). Such a loss does not appear in the ideal cipher model, and many constructions leverage this to obtain good bounds and efficiency at the same time. Indeed, ideal cipher-based TBCs have achieved \(\ge n\)-bit security within 1 or 2 cipher calls [43, 53, 77].

In this paper we follow the standard model. In this respect, the original paper of Liskov et al.  [54] proposed two constructions that were subsequently named \(\mathsf {LRW1}\) and \(\mathsf {LRW2}\) by Landecker et al.  [51]. The former is based on a block cipher E with key space \(\mathcal {K}_E\) and message space \(\{0,1\}^n\), and is defined as

$$\begin{aligned} \mathsf {LRW1} ((K,K'),T,X)=E_{K'}\big (T\oplus E_K(X)\big ), \end{aligned}$$
(1)

where \((K,K')\in \mathcal {K}_E\times \mathcal {K}_E\) is the key, \(T\in \mathcal {T}\) is the tweak, and \(X\in \{0,1\}^n\) is the message. Unfortunately, it is only CPA secure up to a tight birthday bound, i.e., \(2^{n/2}\) adversarial queries. Indeed, achieving CCA security was an important motivation for their second proposal \(\mathsf {LRW2}\), which is based on a block cipher E with message space \(\{0,1\}^n\) and an almost XOR-universal (AXU) family of hash functions \(\mathcal {H}=(H_K)_{K\in \mathcal {K}_H}\) from some set \(\mathcal {T}\) to \(\{0,1\}^n\), and is defined as

$$\begin{aligned} \mathsf {LRW2} ((K,K'),T,X)=H_{K'}(T)\oplus E_K(H_{K'}(T)\oplus X), \end{aligned}$$
(2)

where \((K,K')\in \mathcal {K}_E\times \mathcal {K}_H\) is the key. This construction was proved CCA secure in [54] up to a tight birthday bound. In pursuit of beyond-birthday-bound (BBB) secure TBCs, subsequent works, pioneered by Landecker et al.  [51], studied cascades of \(\mathsf {LRW2}\) (with independent underlying keys): the 2-cascade was first proved secure up to about \(2^{2n/3}\) queries [51], later improved to a tight bound of \(2^{3n/4}\) queries [44, 59], while the r-cascade for general r was proved secure up to roughly \(2^{\frac{rn}{r+2}}\) adversarial queries.
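To make Eqs. (1) and (2) concrete, the following is a minimal toy sketch and not a secure implementation. It assumes 8-bit blocks, models the block cipher E by a key-seeded random permutation table (the helper `toy_cipher` is hypothetical), and instantiates the AXU family as multiplication by the hash key in GF(2^8), which is one standard choice rather than anything prescribed by [54].

```python
import random

N_BITS = 8  # toy block size (assumption; real ciphers use n = 64 or 128)


def toy_cipher(key):
    """Hypothetical stand-in for E_K: a key-seeded random permutation of {0,1}^8."""
    table = list(range(1 << N_BITS))
    random.Random(key).shuffle(table)
    return table


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def lrw1(k, k2, t, x):
    """Eq. (1): E_{K'}(T xor E_K(X))."""
    return toy_cipher(k2)[t ^ toy_cipher(k)[x]]


def lrw2(k, k2, t, x):
    """Eq. (2): H_{K'}(T) xor E_K(H_{K'}(T) xor X), with H_{K'}(T) = K' * T."""
    mask = gf_mul(k2, t)
    return mask ^ toy_cipher(k)[mask ^ x]
```

For every fixed key and tweak, both functions are permutations of the message space, which is the defining property of a TBC.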

A somewhat independent series of works considered tweakable Even-Mansour (\(\mathsf {TEM}\)) ciphers that are built upon public random permutations [18, 20, 57], which could also be instantiated with fixed-key block ciphers. It is important to note their security is only provable in the ideal (permutation) model.

1.3 Development of Dedicated TBCs

The Tweakey framework, introduced in 2014 by Jean et al.  [41], provides a general guideline for TBC designs. The core idea is to treat the key and tweak in the same way during the primitive design process, so that the cryptanalysis can be unified and becomes simpler than before. The word “tweakey” was coined to reflect the combined input of tweak and key. Following the Tweakey framework, various dedicated algorithms such as Deoxys-BC in the Deoxys AE design [42], SKINNY [7], and Kiasu [40] have been proposed. In detail, SKINNY takes lightweightness into account, and hence uses a lightweight linear layer (a 0/1 matrix that is almost MDS rather than MDS), although it still follows an AES-like design strategy. To date, Deoxys is one of the finalists of the CAESAR competition and SKINNY is one of the lightest TBCs in terms of area in optimized hardware implementations.

When the tweak is long, TBC-based designs [3, 38] can take advantage of it to efficiently process additional input such as associated data. There is also a recent direction of designing TBCs with short tweaks to offer a small family of nonetheless independent block ciphers [12], where tweaks are mainly used as domain separators in the design of authenticated encryption schemes.

It is well known that hiding the key of a block cipher requires several iterations of a simple round function. Since the Tweakey framework does not distinguish key and tweak, the tweak input is iterated for the same number of rounds as well. We notice that, rather than needing to be hidden, a tweak serves as no more than an index to the block cipher in most use cases, and is even assumed to be under the attacker’s full control in some cryptanalytic settings. Hence, the required level of “protection” for a tweak is essentially lower than that for the key. Inspired by this observation, a natural question is: what is the minimum number of iterations (or tweak additions) required to produce a provably secure TBC (especially one with BBB security)?

1.4 Our Approach (Hybrid of Two Approaches), Provable Security of TBC Modes, and Instantiation with Long-Standing Modules (Similar with AES-PRF)

We seek an approach that sits between the above two and (hopefully) enjoys the advantages of both, i.e., achieving (some level of) provable guarantees and high efficiency at the same time. Our result is a new design of dedicated TBCs based on AES. Our approach is “prove-then-prune”, i.e., proving security and then instantiating with a scaled-down primitive (a reduced-round block cipher). This strategy has been used in symmetric designs for a long time, see e.g., [60] (while the terminology is due to Hoang et al.  [37]). Below we elaborate in detail.

\(\mathsf {TNT}\): A New TBC Construction with BBB Security. Our starting point is a new block cipher-based TBC construction with provable BBB security. Concretely, the idealized version of our mode is built upon three secret independent random permutations \(\pi _1,\pi _2\), and \(\pi _3\), and is defined as

$$\begin{aligned} \mathsf {TNT} ^{\pi _1,\pi _2,\pi _3}(T,X)=\pi _3\big (T\oplus \pi _2\big (T\oplus \pi _1(X)\big )\big ), \end{aligned}$$

as pictured in Fig. 1. We call our mode \(\mathsf {TNT}\), short for Tweak-aNd-Tweak. It can also be viewed as a cascaded \(\mathsf {LRW1}\) TBC construction (if we “split” \(\pi _2\) into two permutations, then the scheme turns into a cascade of two \(\mathsf {LRW1}\) constructions).

Fig. 1. The \(\mathsf {TNT} ^{\pi _1,\pi _2,\pi _3}\) mode with the notations (for the intermediate values) used in this paper.
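The cascade view of \(\mathsf {TNT}\) can be made explicit. As a sketch, assume \(\pi _2\) is written as a composition \(\pi _2=\sigma '\circ \sigma \) of two permutations (the names \(\sigma \) and \(\sigma '\) are introduced here only for illustration), and write \(\mathsf {LRW1} ^{\pi _a,\pi _b}(T,X)=\pi _b\big (T\oplus \pi _a(X)\big )\) following Eq. (1). Then

$$\begin{aligned} \mathsf {LRW1} ^{\sigma ',\pi _3}\big (T,\mathsf {LRW1} ^{\pi _1,\sigma }(T,X)\big ) =&\pi _3\Big (T\oplus \sigma '\big (\sigma \big (T\oplus \pi _1(X)\big )\big )\Big ) \\ =&\pi _3\big (T\oplus \pi _2\big (T\oplus \pi _1(X)\big )\big ) =\mathsf {TNT} ^{\pi _1,\pi _2,\pi _3}(T,X). \end{aligned}$$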

While the original (two-permutation-based) \(\mathsf {LRW1}\) construction was proved CPA secure up to the birthday bound of \(2^{n/2}\) queries, which turns out to be tight, the security of \(\mathsf {TNT}\) (or cascaded \(\mathsf {LRW1}\)) has remained a long-standing open problem. In this paper, using the \(\chi ^2\) technique recently proposed by Dai et al.  [24], we prove that the idealized \(\mathsf {TNT}\) construction is CCA secure up to the BBB level of \(2^{2n/3}\) queries. To our knowledge, this constitutes the first “non-trivial” application of the \(\chi ^2\) technique to domain expanding constructions, and our proof thus demonstrates relevant issues and their solutions.
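As a rough illustration of what these query bounds mean in bits, the sketch below compares the birthday level \(n/2\) with the \(2n/3\) level proved for \(\mathsf {TNT}\) at two common block lengths. It ignores constant factors, and the helper name `security_bits` is ours.

```python
from fractions import Fraction


def security_bits(n, exponent):
    """Return log2 of a query bound of the form 2^(exponent * n)."""
    return float(Fraction(exponent) * n)


for n in (64, 128):
    birthday = security_bits(n, Fraction(1, 2))
    tnt = security_bits(n, Fraction(2, 3))
    print(f"n={n}: birthday {birthday:.1f} bits, TNT proof {tnt:.1f} bits")
```

For \(n=128\) this gives 64 bits at the birthday level against roughly 85 bits for \(\mathsf {TNT}\).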

We refer to Table 1 for a summary of the comparison with existing TBC constructions (we omit the \(\mathsf {TEM}\) ciphers as they either appear a bit theoretical or are specific to sponges [57]). It is rather difficult to make a comparison with the ideal cipher-based designs [43, 53, 56, 77]. In general, they achieve \({\ge } n\) bits of security (as mentioned) at the expense of a smaller safety margin (a similar concern has been raised in other settings [36]). Also, their provable bounds should be interpreted with a bit of caution [58]. In terms of efficiency, it is widely believed that the tweak-dependent rekeying used in the above designs as well as in [61] is a bit costly, particularly when AES-NI is available.

It appears that \(\mathsf {LRW2}\) and its cascades are the closest designs. In short, while \(\mathsf {LRW2}\) and \(\mathsf {CLRW2}\) accept long tweaks, their use of an AXU hash is expected to result in lower efficiency when n-bit tweaks already suffice: the additional AXU hash usually leads to lower software efficiency and/or higher gate counts, as additional registers and operations are needed.

Table 1. Comparison with previous TBCs. The column \(\otimes \)/AXU states if the design relies on AXU hash or field multiplications \(\otimes \). The column tdk states if the design relies on tweak-dependent rekeying. For all the ideal cipher-based designs, we assume using an ideal cipher with n-bit keys and n-bit blocks.

Instantiation from AES. To take advantage of AES-NI for better software performance, it is natural to instantiate TNT with AES. To further improve software performance, we reduce the number of rounds of the permutations \(\pi _1\), \(\pi _2\), and \(\pi _3\) to \(6\), \(6\), and \(6\) rounds respectively (rather than the full AES itself); the resulting instance is named TNT-AES. Although it is no longer possible to assume the round-reduced AES to be ideal, we show, through comprehensive cryptanalysis, that the security of TNT-AES is sound. A similar design strategy was introduced by Hoang et al.  [37] and used in the design of AES-PRF [60] by Mennink and Neves. The estimated performance shows that, with the help of AES-NI, TNT-AES is among the fastest TBCs in software, and in some cases it can be implemented as lightly as AES itself in area-constrained hardware environments thanks to the simplicity of TNT, with a smaller area than most existing TBCs.

Organization. The rest of the paper is organized as follows. Section 2 gives the preliminaries necessary for the introduction of the new mode in Sect. 3. The security of TNT is proven in Sect. 4. Section 5 proposes a concrete AES-based design following TNT, and finally Sect. 6 concludes the paper.

2 Preliminary

2.1 Notation

For a finite set \(\mathcal {X}\), \(X \xleftarrow {\$} \mathcal {X}\) denotes selecting an element from \(\mathcal {X}\) uniformly at random and \(|\mathcal {X}|\) denotes its cardinality.

2.2 TBC and Its Security

A tweakable permutation with tweak space \(\mathcal {T}\) and message space \(\mathcal {M}\) is a mapping \(\widetilde{\varPi }:\mathcal {T}\times \mathcal {M}\rightarrow \mathcal {M}\) such that for any tweak \(T\in \mathcal {T}\), \(X\mapsto \widetilde{\varPi } (T,X)\) is a permutation of \(\mathcal {M}\). We denote \(\textsf {TP}(\mathcal {T},n)\) the set of all tweakable permutations with tweak space \(\mathcal {T}\) and message space \(\{0,1\}^n\). A tweakable block cipher with key space \(\mathcal {K}\), tweak space \(\mathcal {T}\), and message space \(\mathcal {M}\) is a mapping \(\mathsf {TBC}:\mathcal {K}\times \mathcal {T}\times \mathcal {M}\rightarrow \mathcal {M}\) such that for any key \(K\in \mathcal {K}\), \((T,X)\mapsto \mathsf {TBC}(K,T,X)\) is a tweakable permutation in \(\textsf {TP}(\mathcal {T},n)\).

A secure TBC should be indistinguishable from a tweakable random permutation. As our mode \(\mathsf {TNT}\) is specified in an idealized manner, our security definition is also given for such cases. For this, we denote \(\textsf {P}(n)\) the set of all n-bit permutations. By default, we always allow \(\mathcal {D} \) to make forward and inverse queries to its tweakable permutation oracle (though we do not write this explicitly). With these, for the TBC construction \(C^{\pi _1,\ldots ,\pi _r}\) built upon r independent secret n-bit permutations, we define the advantage of any distinguisher \(\mathcal {D} \) breaking its strong tweakable pseudorandomness (STPRP) as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {stprp}}_{C}(\mathcal {D}) =\Big |\Pr [\pi _1,\ldots ,\pi _r\xleftarrow {\$}\textsf {P}(n):\mathcal {D} ^{C^{\pi _1,\ldots ,\pi _r}}=1]-\Pr [\widetilde{\varPi } \xleftarrow {\$}\textsf {TP}(\mathcal {T},n):\mathcal {D} ^{\widetilde{\varPi }}=1]\Big |. \end{aligned}$$

And for any non-negative integer q, we define the insecurity of \(C^{\pi _1,\ldots ,\pi _r}\) as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {stprp}}_{C}(q)=\text {max}_{\mathcal {D}}\mathbf {Adv}^{\mathrm {stprp}}_{C}(\mathcal {D}), \end{aligned}$$

where the maximum is taken over all distinguishers \(\mathcal {D} \) making exactly q queries to the oracle.

The above definition focuses on the information-theoretic setting. Later in Sect. 5 we will instantiate the multiple secret permutations \(\pi _1,\ldots ,\pi _r\) with multiple “independent” block ciphers \(E_1,\ldots ,E_r\) using the same secret key K (thus the key space does not increase with the number of permutations). Proving the indistinguishability of the two systems \((\pi _1,\ldots ,\pi _r)\) and \(((E_1)_K,\ldots ,(E_r)_K)\) seems out of reach of current techniques (note that existing works typically instantiate \(\pi _1,\ldots ,\pi _r\) with the same block cipher using r independent keys \(K_1,\ldots ,K_r\), which differs from our setting). As such, our mode \(\mathsf {TNT}\) will be specified only in the idealized manner.

2.3 \(\chi ^2\) Method

For the proof, we will employ the \(\chi ^2\) method of Dai et al.  [24]. We recall this technique here. Below we mainly follow the notation of Dai et al. (with some necessary supplementary notation borrowed from Chen et al.  [13]). Concretely, consider two stateless systems \(\mathbf {S}_0\) and \(\mathbf {S}_1\) (e.g., \(\mathbf {S}_0\) and \(\mathbf {S}_1\) may be the tweakable random permutation \(\widetilde{\varPi } \) and the TNT construction \(\textsf {TNT} ^{\pi _1,\pi _2,\pi _3}\) respectively) and any computationally unbounded deterministic distinguisher \(\mathcal {D} \) that has query access to either of these systems. The distinguisher’s goal is to distinguish the two systems. It is well known that the distinguishing advantage \(\mathbf {Adv}_{\mathbf {S}_0,\mathbf {S}_1}(\mathcal {D})\) is bounded by the statistical distance \(\Vert \textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\cdot )-\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(\cdot )\Vert \), where \(\textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\cdot )\) and \(\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(\cdot )\) are the respective probability distributions of the answers obtained by \(\mathcal {D} \). The \(\chi ^2\) method is concerned with bounding \(\Vert \textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\cdot )-\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(\cdot )\Vert \). To this end, if we denote the maximum number of queries by q, we can define a transcript \(\mathcal {Q} =(\tau _1,\ldots ,\tau _q)\) with \(\tau _i=(T_i,X_i,Y_i)\), and let \(\mathcal {Q} _{\ell }=(\tau _1,\ldots ,\tau _{\ell })\) for every \(\ell \le q\). The distinguisher \(\mathcal {D} \) can make its queries adaptively, but as it makes them in a deterministic manner, the \(\ell \)-th query input is determined by the first \(\ell -1\) query-responses \(\mathcal {Q} _{\ell -1}\).

For system \(\mathbf {S}_b\) with \(b\in \{0,1\}\) and fixed tuple \(\mathcal {Q} _{\ell -1}\), we denote by \(\textsf {p}_{\mathbf {S}_b,\mathcal {D}}(\mathcal {Q} _{\ell -1})\) the probability that \(\mathcal {D} \) interacting with \(\mathbf {S}_b\) yields transcript \(\mathcal {Q} _{\ell -1}\) for its first \(\ell -1\) queries. If \(\textsf {p}_{\mathbf {S}_b,\mathcal {D}}(\mathcal {Q} _{\ell -1})>0\), then we denote by \(\textsf {p}_{\mathbf {S}_b,\mathcal {D}}(R_{\ell }\mid \mathcal {Q} _{\ell -1})\) the conditional probability that \(\mathcal {D} \) receives response \(R_{\ell }\) upon its \(\ell \)-th query, given transcript \(\mathcal {Q} _{\ell -1}\) of the first \(\ell -1\) queries (that deterministically fixes the \(\ell \)-th query). Define for any \(\ell \in \{1,\ldots ,q\}\) and any query-response tuple \(\mathcal {Q} _{\ell -1}\):

$$\begin{aligned} \chi ^2\big (\mathcal {Q} _{\ell -1}\big ) = \sum _{R_\ell } \frac{\big (\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(R_\ell \mid \mathcal {Q} _{\ell -1})-\textsf {p}_{\mathbf {S}_0,\mathcal {D}}(R_\ell \mid \mathcal {Q} _{\ell -1})\big )^2}{\textsf {p}_{\mathbf {S}_0,\mathcal {D}}(R_\ell \mid \mathcal {Q} _{\ell -1})}, \end{aligned}$$
(3)

where the sum is taken over all \(R_\ell \) in the support of the distribution \(\textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\cdot \mid \mathcal {Q} _{\ell -1})\). The \(\chi ^2\) method states the following:

Lemma 1

(\(\chi ^2\) method [24, Lemma 3]). Consider a fixed deterministic distinguisher \(\mathcal {D} \) and two systems \(\mathbf {S}_0,\mathbf {S}_1\). Suppose that for any \(\ell \in \{1,\ldots ,q\}\) and any query-response tuple \(\mathcal {Q} _\ell \), \(\textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\mathcal {Q} _\ell )>0\) whenever \(\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(\mathcal {Q} _\ell )>0\). Then:

$$\begin{aligned} \Vert \textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\cdot )-\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(\cdot )\Vert \le \bigg (\frac{1}{2}\sum _{\ell =1}^q\mathbf {E}\big [\chi ^2(\mathcal {Q} _{\ell -1})\big ]\bigg )^{1/2}, \end{aligned}$$
(4)

where the expectation is taken over \(\mathcal {Q} _{\ell -1}\) of the \(\ell -1\) first answers sampled according to interaction with \(\mathbf {S}_1\).
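To see Lemma 1 at work on something simpler than \(\mathsf {TNT}\), the sketch below evaluates Eqs. (3) and (4) for a toy pair of systems chosen only for illustration. Here \(\mathbf {S}_0\) answers each of q distinct queries uniformly from N values, while \(\mathbf {S}_1\) answers uniformly from the values it has not returned before, so the support condition of the lemma holds. In this toy case \(\chi ^2(\mathcal {Q} _{\ell -1})\) depends only on \(k=\ell -1\) and equals \(k/(N-k)\), so the expectation in Eq. (4) is trivial. The function names are ours.

```python
from fractions import Fraction
from math import sqrt


def chi2_step(N, k):
    """Eq. (3) for the toy pair: S_0 is uniform over N answers, S_1 is
    uniform over the N - k answers not seen in the k earlier responses."""
    p0 = Fraction(1, N)
    total = Fraction(0)
    for r in range(N):
        seen = r < k  # label the k already-seen answers 0..k-1 (w.l.o.g.)
        p1 = Fraction(0) if seen else Fraction(1, N - k)
        total += (p1 - p0) ** 2 / p0
    return total


def chi2_bound(N, q):
    """Right-hand side of Eq. (4) for q distinct queries."""
    total = sum((chi2_step(N, l - 1) for l in range(1, q + 1)), Fraction(0))
    return sqrt(float(total / 2))
```

The exact arithmetic with `Fraction` makes it easy to confirm the closed form \(k/(N-k)\) for small parameters.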

3 The Idealized \(\mathsf {TNT}\) Mode

In this section, we describe our mode \(\mathsf {TNT}\). As discussed in Sect. 2, we only give its idealized description, which is built upon secret random permutations rather than efficient block ciphers.

Concretely, \(\mathsf {TNT}\) is built upon three independent secret random permutations \(\pi _1,\pi _2\), and \(\pi _3\), and is formally defined as

$$\begin{aligned} \mathsf {TNT} ^{\pi _1,\pi _2,\pi _3}(T,X)=\pi _3\big (T\oplus \pi _2\big (T\oplus \pi _1(X)\big )\big ). \end{aligned}$$
(5)
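Eq. (5) can be sketched in a few lines. The toy model below assumes 8-bit blocks and uses seeded random lookup tables in place of the secret random permutations; the class name `ToyTNT` and its helpers are ours. The intermediate values follow the notation used later in the proof: \(S=\pi _1(X)\), \(U=T\oplus S\), \(V=\pi _2(U)\), \(W=T\oplus V\), and \(Y=\pi _3(W)\).

```python
import random

N_BITS = 8  # toy block size (assumption)


def random_permutation(rng):
    """Return a random permutation of {0,1}^N_BITS and its inverse as tables."""
    table = list(range(1 << N_BITS))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


class ToyTNT:
    """Idealized TNT of Eq. (5) over three independent random permutations."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.pi1, self.pi1_inv = random_permutation(rng)
        self.pi2, self.pi2_inv = random_permutation(rng)
        self.pi3, self.pi3_inv = random_permutation(rng)

    def forward(self, t, x):
        s = self.pi1[x]      # S = pi_1(X)
        u = t ^ s            # U = T xor S
        v = self.pi2[u]      # V = pi_2(U)
        w = t ^ v            # W = T xor V
        return self.pi3[w]   # Y = pi_3(W)

    def inverse(self, t, y):
        w = self.pi3_inv[y]
        v = t ^ w
        u = self.pi2_inv[v]
        s = t ^ u
        return self.pi1_inv[s]
```

Because each step is invertible for a fixed tweak, `inverse` undoes `forward`, which matches the forward and inverse oracle access granted to the distinguisher in Sect. 2.2.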

4 Security Proof for TNT Mode

Theorem 1

When \(q\le 2^n/2\), it holds

$$\begin{aligned} \mathbf {Adv}_{\mathsf {TNT}}^{\mathrm {stprp}}(q)\le \frac{8q^{1.5}}{2^n}. \end{aligned}$$
(6)
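To read Eq. (6) numerically, it helps to work in logarithms: the bound equals \(2^{3+1.5\log _2 q-n}\), so it reaches 1 at \(q=2^{2n/3}/4\). The following sketch, with function names of our choosing, evaluates the bound and inverts it for a target advantage.

```python
def tnt_bound_log2(n, log2_q):
    """log2 of 8 * q^1.5 / 2^n from Eq. (6), where q = 2^log2_q."""
    if log2_q > n - 1:
        raise ValueError("Theorem 1 requires q <= 2^n / 2")
    return 3 + 1.5 * log2_q - n


def log2_q_for_advantage(n, log2_adv):
    """log2 of the query count at which the bound of Eq. (6) equals 2^log2_adv."""
    return (log2_adv + n - 3) / 1.5
```

For example, with \(n=128\) and \(q=2^{64}\) the bound is \(2^{-29}\), whereas a birthday-bound construction offers no guarantee at that point.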

Proof

In our proof, \(\mathbf {S}_0\) denotes the tweakable random permutation \(\widetilde{\varPi } \), while \(\mathbf {S}_1\) denotes the \(\textsf {TNT} ^{\pi _1,\pi _2,\pi _3}\) TBC. The condition stated in Lemma 1, i.e., \(\forall \mathcal {Q} _\ell \), \(\textsf {p}_{\mathbf {S}_0,\mathcal {D}}(\mathcal {Q} _\ell )>0\) whenever \(\textsf {p}_{\mathbf {S}_1,\mathcal {D}}(\mathcal {Q} _\ell )>0\), is clearly satisfied.

Given \(\mathcal {Q} _{\ell -1}\), let \(T_\ell \) be the tweak of the \(\ell \)-th query (note that it is determined by \(\mathcal {Q} _{\ell -1}\)). It is easy to see that, regardless of the direction of this query, it holds

$$\textsf {p}_{\widetilde{\varPi },\mathcal {D}}(R_\ell \mid \mathcal {Q} _{\ell -1})=\frac{1}{2^n-\mu _\ell },$$

where \(\mu _\ell \le \ell -1\) is the frequency of the tweak value \(T_\ell \) in \(\mathcal {Q} _{\ell -1}\), i.e.,

$$\mu _\ell =\Big |\big \{(X,Y):(T_\ell ,X,Y)\in \mathcal {Q} _{\ell -1}\big \}\Big |.$$
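The ideal-world probability can be pictured through lazy sampling. The sketch below is a simplified model that handles forward queries only; the class name and its methods are ours. It keeps one partial permutation per tweak and answers a fresh query uniformly among the \(2^n-\mu \) outputs not yet used under that tweak.

```python
import random
from fractions import Fraction


class LazyTweakablePermutation:
    """Lazily sampled tweakable random permutation (forward queries only)."""

    def __init__(self, n, seed=0):
        self.size = 1 << n
        self.rng = random.Random(seed)
        self.tables = {}  # tweak -> {x: y}

    def mu(self, t):
        """Number of distinct earlier queries made under tweak t."""
        return len(self.tables.get(t, {}))

    def fresh_probability(self, t):
        """Probability of each admissible answer to a fresh query under tweak t."""
        return Fraction(1, self.size - self.mu(t))

    def query(self, t, x):
        table = self.tables.setdefault(t, {})
        if x not in table:
            used = set(table.values())
            free = [y for y in range(self.size) if y not in used]
            table[x] = self.rng.choice(free)
        return table[x]
```

Queries under other tweaks do not change `mu(t)`, which is why only the frequency of \(T_\ell \) appears in the ideal-world probability.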

The real world probability \(\textsf {p}_{\textsf {TNT},\mathcal {D}}(R_\ell \mid \mathcal {Q} _{\ell -1})\) however depends on the concrete state of the \(\ell \)-th query and \(\mathcal {Q} _{\ell -1}\), for which we distinguish eight cases as follows.

Case 1: the \(\ell \)-th query is forward, and \(X_\ell ,Y_\ell \in \mathcal {Q} _{\ell -1}\), i.e., \(\exists T',X',T^*,Y^*:(T',X',Y_\ell ),(T^*,X_\ell ,Y^*)\in \mathcal {Q} _{\ell -1}\). We write

$$\begin{aligned} \textsf {p}_{\mathsf {TNT},\mathcal {D}}(Y_\ell \mid \mathcal {Q} _{\ell -1}) =&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \\ =&\sum _{\mathbf {Inter}}\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ]\cdot \Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}], \end{aligned}$$

where the sum is taken over all the vectors of intermediate values

$$\mathbf {Inter} =\Big ((S_1,\ldots ,S_{\ell -1}),(U_1,\ldots ,U_{\ell -1}),(V_1,\ldots ,V_{\ell -1}),(W_1,\ldots ,W_{\ell -1})\Big )$$

that are possible to appear given \(\mathcal {Q} _{\ell -1}\).

Now, for a certain intermediate vector \(\mathbf {Inter} \), it can be seen that there are three possibilities, according to which we divide all intermediate vectors into three disjoint classes \(\mathcal {A}\), \(\mathcal {B}\), and \(\mathcal {C}\):

  • Class \(\mathcal {A}\): \(\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ]=1\);

    • i.e., the vector \(\mathbf {Inter}\) specifies \(S_\ell \) and \(W_\ell \) as the values corresponding to \(X_\ell \) and \(Y_\ell \), as well as an input-output relation on \(\pi _2\) (subsequently abbreviated as \(\pi _2\)-relation) \((U_i,V_i)\) such that \(T_\ell \oplus S_\ell =U_i\) and \(T_\ell \oplus W_\ell =V_i\).

  • Class \(\mathcal {B}\): \(\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ]=\frac{1}{N-\beta (\mathbf {Inter})}\), where \(N=2^n\) and \(\beta (\mathbf {Inter})\) is the number of distinct values in \((U_1,\ldots ,U_{\ell -1})\);

    • i.e., the two corresponding values \(U_\ell =T_\ell \oplus S_\ell \) and \(V_\ell =T_\ell \oplus W_\ell \) (as before) are “free”, so that \(\Pr [\pi _2(U_\ell )=V_\ell \mid \mathbf {Inter} ]=\frac{1}{N-\beta (\mathbf {Inter})}\).

  • Class \(\mathcal {C}\): \(\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ]=0\).

    • i.e., the two corresponding values \(U_\ell =T_\ell \oplus S_\ell \) and \(V_\ell =T_\ell \oplus W_\ell \) (as before) are “contradictory” to \(\mathbf {Inter}\) : there exists a \(\pi _2\)-relation \((U_i,V_i)\) in \(\mathbf {Inter}\) such that

      • \(T_\ell \oplus S_\ell =U_i\) yet \(T_\ell \oplus W_\ell \ne V_i\); or

      • \(T_\ell \oplus S_\ell \ne U_i\) yet \(T_\ell \oplus W_\ell =V_i\).
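The three classes can be summarized as a small decision procedure. In the sketch below, the function `classify` and the representation of the \(\pi _2\)-relations in \(\mathbf {Inter}\) as a dictionary mapping \(U_i\) to \(V_i\) are our own illustrative choices; \(\beta (\mathbf {Inter})\) is then the size of that dictionary.

```python
from fractions import Fraction


def classify(n, pi2_relations, t, s, w):
    """Return (class, Pr[TNT(T, X) -> Y | Inter]) for the Case 1 analysis.

    pi2_relations maps each known U_i to V_i; s and w are the values S and W
    that Inter assigns to the queried X and the candidate Y.
    """
    u, v = t ^ s, t ^ w
    if pi2_relations.get(u) == v:
        return "A", Fraction(1)      # the pi_2-relation (u, v) is already fixed
    if u in pi2_relations or v in pi2_relations.values():
        return "C", Fraction(0)      # exactly one side collides: contradiction
    beta = len(pi2_relations)        # number of distinct U values
    return "B", Fraction(1, (1 << n) - beta)
```

Only classes \(\mathcal {A}\) and \(\mathcal {B}\) contribute to the probability of the response, since class \(\mathcal {C}\) has conditional probability 0.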

By these, we have

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \nonumber \\ =&\sum _{\mathbf {Inter} \in \mathcal {A}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}] + \sum _{\mathbf {Inter} \in \mathcal {B}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]\cdot \frac{1}{N-\beta (\mathbf {Inter})}. \end{aligned}$$
(7)

With this, we derive upper and lower bounds as follows.

The Upper Bound: It’s easy to see \(\beta (\mathbf {Inter})\le \ell -1\). By this and Eq. (7), it holds

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \nonumber \\ \le&\Pr [\mathbf {Inter} \in \mathcal {A}\mid \mathcal {Q} _{\ell -1}] + \underbrace{\Pr [\mathbf {Inter} \in \mathcal {B}\mid \mathcal {Q} _{\ell -1}]}_{\le 1}\cdot \frac{1}{2^n-\ell } . \end{aligned}$$
(8)

It remains to bound \(\Pr [\mathbf {Inter} \in \mathcal {A}\mid \mathcal {Q} _{\ell -1}]\). For this, note that once the values in \(\mathbf {Inter}\) except for \((S_\ell ,W_\ell )\) have been fixed, the number of choices for \((S_\ell ,W_\ell )\) is at least \((2^n-\alpha (\mathcal {Q} _{\ell -1}))(2^n-\gamma (\mathcal {Q} _{\ell -1}))\ge 2^{2n}/4\), where \(\alpha (\mathcal {Q} _{\ell -1})\le q\le 2^n/2\) and \(\gamma (\mathcal {Q} _{\ell -1})\le q\le 2^n/2\) are the number of distinct values in \((S_1,\ldots ,S_{\ell -1})\) and \((W_1,\ldots ,W_{\ell -1})\). Out of these \(\ge 2^{2n}/4\) choices, the number of choices that ensure the desired property \(\mathsf {TNT} (T_\ell ,X_\ell )=Y_\ell \) is at most \(\ell -1\), which results from the following selection process: we first pick a pair of input-output \((U_i,V_i)\) with \(i\le \ell -1\), and then set \(S_\ell =T_\ell \oplus U_i\) and \(W_\ell =T_\ell \oplus V_i\). Therefore, \(\Pr [\mathbf {Inter} \in \mathcal {A}\mid \mathcal {Q} _{\ell -1}]\le \frac{4\ell }{2^{2n}}\), and thus the upper bound in this case is

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \le \frac{4\ell }{2^{2n}} + \frac{1}{2^n-\ell }. \end{aligned}$$
(9)

The Lower Bound: It can be seen \(\beta (\mathbf {Inter})\ge \mu _\ell \), since every previous query under the tweak \(T_{\ell }\) gives rise to a unique pair \((U,V)\) in \(((U_1,V_1),\ldots ,(U_{\ell -1},V_{\ell -1}))\). Therefore, still from Eq. (7), we have

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \ge&\sum _{\mathbf {Inter} \in \mathcal {B}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]\cdot \frac{1}{2^n-\mu _\ell } \\ =&\Pr [\mathbf {Inter} \in \mathcal {B}\mid \mathcal {Q} _{\ell -1}]\cdot \frac{1}{2^n-\mu _\ell }. \end{aligned}$$

As before, out of the \((2^n-\alpha (\mathcal {Q} _{\ell -1}))(2^n-\gamma (\mathcal {Q} _{\ell -1}))\) choices of \((S_\ell ,W_\ell )\), the number of choices that ensure the desired property \(T_\ell \oplus S_\ell \notin \{U_1,\ldots ,U_{\ell -1}\}\) and \(T_\ell \oplus W_\ell \notin \{V_1,\ldots ,V_{\ell -1}\}\) is at least \((2^n-\ell )^2\). This means \(\Pr [\mathbf {Inter} \in \mathcal {B}\mid \mathcal {Q} _{\ell -1}]\ge \frac{2^n-\ell }{2^n-\alpha (\mathcal {Q} _{\ell -1})}\cdot \frac{2^n-\ell }{2^n-\gamma (\mathcal {Q} _{\ell -1})}\ge (1-\frac{\ell }{2^n})^2\ge 1-\frac{2\ell }{2^n}\), and thus

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \ge \Big (1-\frac{2\ell }{2^n}\Big )\cdot \frac{1}{2^n-\mu _\ell }. \end{aligned}$$
(10)

Summary. In all, in the first case, we have

$$\begin{aligned}&\Big |\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}]-\frac{1}{2^n-\mu _\ell }\Big | \nonumber \\ \le&\max \bigg \{\frac{4\ell }{2^{2n}}+\frac{\ell -\mu _\ell }{(2^n-\mu _\ell )(2^n-\ell )},\frac{2\ell }{2^n}\cdot \frac{1}{2^n-\mu _\ell }\bigg \} \le \frac{8\ell }{2^{2n}}. \end{aligned}$$
(11)
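The final inequality in Eq. (11) can be checked exhaustively for small parameters. The sketch below does so with exact rational arithmetic for all \(1\le \ell \le 2^n/2\) and \(0\le \mu _\ell \le \ell -1\); it is a sanity check under these assumed ranges, not a substitute for the proof, and the function names are ours.

```python
from fractions import Fraction


def case1_gap_bound(n, l, mu):
    """The max{...} expression inside Eq. (11)."""
    N = 1 << n
    upper = Fraction(4 * l, N * N) + Fraction(l - mu, (N - mu) * (N - l))
    lower = Fraction(2 * l, N) * Fraction(1, N - mu)
    return max(upper, lower)


def check_eq11(n):
    """Check max{...} <= 8*l / 2^(2n) over the whole admissible range."""
    N = 1 << n
    return all(
        case1_gap_bound(n, l, mu) <= Fraction(8 * l, N * N)
        for l in range(1, N // 2 + 1)
        for mu in range(l)
    )
```

The check relies on \(\ell \le 2^n/2\), which makes both \(2^n-\ell \) and \(2^n-\mu _\ell \) at least \(2^n/2\).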

Case 2: the \(\ell \)-th query is forward, and \(X_\ell \in \mathcal {Q} _{\ell -1}\), \(Y_\ell \notin \mathcal {Q} _{\ell -1}\), i.e., \(\exists T',Y':(T',X_\ell ,Y')\in \mathcal {Q} _{\ell -1}\), yet \(\forall T,X:(T,X,Y_\ell )\notin \mathcal {Q} _{\ell -1}\). Now, for a certain intermediate vector \(\mathbf {Inter} \), there are three possibilities, according to which we divide all intermediate vectors into three disjoint classes \(\mathcal {A}\), \(\mathcal {B}\), and \(\mathcal {C}\):

  • Class \(\mathcal {A}\): there does not exist \((U_i,V_i)\) such that \(U_i=T_\ell \oplus S_\ell \), where \(S_\ell \) is specified by \(\mathbf {Inter}\) and corresponds to \(X_\ell \).

  • Class \(\mathcal {B}\): there exists \((U_i,V_i)\) such that \(U_i=T_\ell \oplus S_\ell \), and \(\Pr [\pi _3(T_\ell \oplus V_i)=Y_\ell ]=\frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}\), where \(\gamma (\mathcal {Q} _{\ell -1})\) is the number of distinct values in \((Y_1,\ldots ,Y_{\ell -1})\).

  • Class \(\mathcal {C}\): there exists \((U_i,V_i)\) such that \(U_i=T_\ell \oplus S_\ell \), and \(\Pr [\pi _3(T_\ell \oplus V_i)=Y_\ell ]=0\).

By these, we have

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \nonumber \\ =&\sum _{\mathbf {Inter} \in \mathcal {A}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]\cdot \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ] \nonumber \\&+ \sum _{\mathbf {Inter} \in \mathcal {B}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]\cdot \frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}. \end{aligned}$$
(12)

The Upper Bound: For this we need to consider \(\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ]\) for any \(\mathbf {Inter} \in \mathcal {A}\). Let \(U_\ell =T_\ell \oplus S_\ell \). Then it can be seen

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ] \nonumber \\ =&\sum _{V_\ell \in \{0,1\}^n} \Pr [\pi _2(U_\ell )=V_\ell \mid \mathbf {Inter} ]\cdot \Pr [\pi _3(T_\ell \oplus V_\ell )=Y_\ell \mid \mathbf {Inter} ] \\ \le&\underbrace{\sum _{V_\ell \in \{0,1\}^n} \Pr [\pi _2(U_\ell )=V_\ell \mid \mathbf {Inter} ]}_{\le 1} \cdot \frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}. \nonumber \end{aligned}$$
(13)

By this, the upper bound in this case is

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \le&\sum _{\mathbf {Inter} \in \mathcal {A}\cup \mathcal {B}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]\cdot \frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})} \\ \le&\frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}\le \frac{1}{2^n-\ell }. \end{aligned}$$

The Lower Bound: Still by Eq. (13), for any \(\mathbf {Inter} \in \mathcal {A}\) we have

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ] \\ \ge&\sum _{W_\ell \in \mathcal {GW}} \Pr [\pi _2(U_\ell )=T_\ell \oplus W_\ell \mid \mathbf {Inter} ]\cdot \Pr [\pi _3(W_\ell )=Y_\ell \mid \mathbf {Inter} ] , \end{aligned}$$

where \(\mathcal {GW}\) (“good W set”) is the set of \(W_\ell \) such that:

  • \(W_\ell \notin \{W_1,\ldots ,W_{\ell -1}\}\), and

  • \(T_\ell \oplus W_\ell \notin \{V_1,\ldots ,V_{\ell -1}\}\).

It can be seen that \(|\mathcal {GW}|\ge 2^n-\ell -\ell +\mu _\ell =2^n-2\ell +\mu _\ell \): the reason is, for any \((T_i,X_i,Y_i)\in \mathcal {Q} _{\ell -1}\) with \(T_i=T_\ell \), \(W_\ell \ne W_i\Leftrightarrow T_\ell \oplus W_\ell \ne V_i\). On the other hand, \(\Pr [\pi _3(W_\ell )=Y_\ell \mid \mathbf {Inter} ]=\frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}\ge \frac{1}{2^n-\mu _\ell }\), and \(\Pr [\pi _2(U_\ell )=T_\ell \oplus W_\ell \mid \mathbf {Inter} ]=\frac{1}{2^n-\beta (\mathbf {Inter})}\ge \frac{1}{2^n-\mu _\ell }\). Therefore, for any \(\mathbf {Inter} \in \mathcal {A}\) we have

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ] \ge \frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2}. \end{aligned}$$
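The counting argument for \(|\mathcal {GW}|\) can be checked on a toy instance. The sketch below is only an illustration under stated assumptions: it uses a small block size, random permutations for \(\pi _1,\pi _2\), and it assumes that \(\mu _\ell \) counts the earlier queries whose tweak equals \(T_\ell \) (the definition of \(\mu _\ell \) lies outside this section). The function name and parameters are hypothetical.

```python
import random

def good_w_experiment(n=6, num_prev=20, num_tweaks=4, seed=0):
    """Return (|GW|, 2^n - 2*ell + mu) for one random toy transcript."""
    rng = random.Random(seed)
    size = 1 << n
    pi1 = list(range(size))
    pi2 = list(range(size))
    rng.shuffle(pi1)
    rng.shuffle(pi2)
    # Distinct earlier queries (T_i, X_i); few tweaks so that repeats occur.
    domain = [(t, x) for t in range(num_tweaks) for x in range(size)]
    prev = []
    for t, x in rng.sample(domain, num_prev):
        v = pi2[t ^ pi1[x]]          # V_i = pi2(T_i xor pi1(X_i))
        prev.append((t, v, t ^ v))   # W_i = T_i xor V_i is the input of pi3
    t_new = 0                        # tweak of the ell-th query
    ell = num_prev + 1
    # Assumption: mu counts earlier queries with the same tweak.
    mu = sum(1 for t, _, _ in prev if t == t_new)
    used_v = {v for _, v, _ in prev}
    used_w = {w for _, _, w in prev}
    good = [w for w in range(size)
            if w not in used_w and (t_new ^ w) not in used_v]
    return len(good), size - 2 * ell + mu

if __name__ == "__main__":
    print(good_w_experiment())
```

For earlier queries with the same tweak, both exclusion conditions remove the same value, which is why \(\mu _\ell \) is added back in the bound.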

By these and Eq. (12), we have

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \\ \ge&\sum _{\mathbf {Inter} \in \mathcal {A}\cup \mathcal {B}}\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]\cdot \frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2} \\ =&\Big (1-\Pr [\mathbf {Inter} \in \mathcal {C}\mid \mathcal {Q} _{\ell -1}]\Big )\cdot \frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2}. \end{aligned}$$

To bound \(\Pr [\mathbf {Inter} \in \mathcal {C}\mid \mathcal {Q} _{\ell -1}]\), note that if \(\mathbf {Inter} \in \mathcal {C}\), then there exists \(Y_i\in \{Y_1,\ldots ,Y_{\ell -1}\}\) such that \(\Pr [\mathsf {TNT} (T_\ell ,X_\ell )=Y_i\mid \mathcal {Q} _{\ell -1}]=1\). For each such \(Y_i\) the probability is at most \(\frac{4\ell }{2^{2n}}\) as analyzed in Case 1. Since there are at most \(\ell -1\le \ell \) choices for this \(Y_i\), we obtain

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \ge \Big (1-\frac{4\ell ^2}{2^{2n}}\Big )\cdot \frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2} \end{aligned}$$

as the lower bound. Further note that

$$\begin{aligned}&\frac{1}{2^n-\mu _\ell }-\Big (1-\frac{4\ell ^2}{2^{2n}}\Big )\cdot \frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2} \\ \le&\frac{1}{2^n-\mu _\ell }-\frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2}+\frac{4\ell ^2}{2^{2n}}\cdot \frac{2^n-2\ell +\mu _\ell }{(2^n-\mu _\ell )^2} \\ \le&\frac{2(\ell -\mu _\ell )}{(2^n-\mu _\ell )^2} + \frac{8\ell ^2}{2^{3n}} \le \frac{8\ell }{2^{2n}} + \frac{8\ell }{2^{2n}} = \frac{16\ell }{2^{2n}} . \end{aligned}$$

Summary. In all, in the second case, we have

$$\begin{aligned}&\Big |\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}]-\frac{1}{2^n-\mu _\ell }\Big | \nonumber \\ \le&\max \bigg \{\frac{1}{2^n-\ell }-\frac{1}{2^n-\mu _\ell }, \frac{16\ell }{2^{2n}} \bigg \} \le \frac{16\ell }{2^{2n}}. \end{aligned}$$
(14)
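As a sanity check of the Case 2 estimate, the following sketch evaluates the upper and lower bounds exactly (with rational arithmetic) for a toy block size and compares both gaps with \(16\ell /2^{2n}\). The ranges \(\ell \le 2^n/2\) and \(0\le \mu _\ell <\ell \) are assumptions made here for the experiment; the function names are hypothetical.

```python
from fractions import Fraction

def case2_gaps(n, ell, mu):
    """Exact gaps between the Case 2 bounds and the target 1/(2^n - mu)."""
    N = 1 << n
    target = Fraction(1, N - mu)
    upper = Fraction(1, N - ell)
    lower = (1 - Fraction(4 * ell * ell, N * N)) * Fraction(
        N - 2 * ell + mu, (N - mu) ** 2)
    return upper - target, target - lower

def check_case2(n=8):
    """True if both gaps stay below 16*ell/2^(2n) on the assumed range."""
    N = 1 << n
    for ell in range(1, N // 2 + 1):       # assumption: ell <= 2^n / 2
        for mu in range(ell):              # assumption: 0 <= mu < ell
            up_gap, low_gap = case2_gaps(n, ell, mu)
            if max(up_gap, low_gap) > Fraction(16 * ell, N * N):
                return False
    return True

if __name__ == "__main__":
    print(check_case2(8))
```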

Case 3: the \(\varvec{\ell }\)-th query is forward, and \({\varvec{X}_{\varvec{\ell }}}\, \varvec{\notin }\, {\varvec{\mathcal {Q}}_{\varvec{{\ell -1}}}}\), \({\varvec{Y}_{\varvec{\ell }}} \,{\varvec{\in }} \,{\varvec{\mathcal {Q}}_{\varvec{\ell -1}}}\). The analysis is similar to Case 2 by symmetry, resulting in the same bound

$$\begin{aligned} \Big |\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}]-\frac{1}{2^n-\mu _\ell }\Big | \le \frac{16\ell }{2^{2n}}. \end{aligned}$$
(15)

Case 4: the \(\varvec{\ell }\)-th query is forward, and \({\varvec{X}_{\varvec{\ell }}}{\varvec{,}}\,{\varvec{Y}_{\varvec{\ell }}}\,{\varvec{\notin }}\,{\varvec{\mathcal {Q}}_{\varvec{\ell -1}}}\). The analysis for this case closely resembles that of Case 2. First, the same upper bound

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}] \le \frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}\le \frac{1}{2^n-\ell } \end{aligned}$$

can be established. Second, for any \(\mathbf {Inter} \) such that \(\Pr [\mathbf {Inter} \mid \mathcal {Q} _{\ell -1}]>0\), we have

$$\begin{aligned}&\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ] \nonumber \\ \ge&\sum _{S_\ell \in \mathcal {GS},W_\ell \in \mathcal {GW}} \Pr [\pi _1(X_\ell )=S_\ell \mid \mathbf {Inter} ]\cdot \Pr [\pi _2(T_\ell \oplus S_\ell )=T_\ell \oplus W_\ell \mid \mathbf {Inter} ] \\&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \cdot \Pr [\pi _3(W_\ell )=Y_\ell \mid \mathbf {Inter} ] , \end{aligned}$$

where \(\mathcal {GS}\) is the set of \(S_\ell \) such that:

  • \(S_\ell \notin \{S_1,\ldots ,S_{\ell -1}\}\), and

  • \(T_\ell \oplus S_\ell \notin \{U_1,\ldots ,U_{\ell -1}\}\),

and \(\mathcal {GW}\) is the set of \(W_\ell \) such that:

  • \(W_\ell \notin \{W_1,\ldots ,W_{\ell -1}\}\), and

  • \(T_\ell \oplus W_\ell \notin \{V_1,\ldots ,V_{\ell -1}\}\).

It is easy to see \(|\mathcal {GS}|,|\mathcal {GW}|\ge 2^n-2\ell +\mu _\ell \), \(\Pr [\pi _1(X_\ell )=S_\ell \mid \mathbf {Inter} ]=\frac{1}{2^n-\alpha (\mathcal {Q} _{\ell -1})}\ge \frac{1}{2^n-\mu _\ell }\), \(\Pr [\pi _3(W_\ell )=Y_\ell \mid \mathbf {Inter} ]=\frac{1}{2^n-\gamma (\mathcal {Q} _{\ell -1})}\ge \frac{1}{2^n-\mu _\ell }\), and \(\Pr [\pi _2(U_\ell )=T_\ell \oplus W_\ell \mid \mathbf {Inter} ]=\frac{1}{2^n-\beta (\mathbf {Inter})}\ge \frac{1}{2^n-\mu _\ell }\). Therefore, we have

$$\begin{aligned} \Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathbf {Inter} ] \ge \frac{(2^n-2\ell +\mu _\ell )^2}{(2^n-\mu _\ell )^3}, \end{aligned}$$

for which

$$\frac{1}{2^n-\mu _\ell }-\frac{(2^n-2\ell +\mu _\ell )^2}{(2^n-\mu _\ell )^3}\le \frac{4(\ell -\mu _\ell )(2^n-\ell )}{(2^n-\mu _\ell )^3}\le \frac{16\ell }{2^{2n}}.$$
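The first step of the chain above is a difference of squares, \((2^n-\mu _\ell )^2-(2^n-2\ell +\mu _\ell )^2=4(\ell -\mu _\ell )(2^n-\ell )\). The short sketch below checks this identity and the final estimate exactly for a toy block size, again assuming \(\ell \le 2^n/2\) and \(0\le \mu _\ell <\ell \); the function names are hypothetical.

```python
from fractions import Fraction

def case4_terms(n, ell, mu):
    """Return the three quantities compared in the Case 4 chain."""
    N = 1 << n
    gap = Fraction(1, N - mu) - Fraction((N - 2 * ell + mu) ** 2, (N - mu) ** 3)
    middle = Fraction(4 * (ell - mu) * (N - ell), (N - mu) ** 3)
    final = Fraction(16 * ell, N * N)
    return gap, middle, final

def check_case4(n=8):
    """True if gap == middle and middle <= final on the assumed range."""
    N = 1 << n
    for ell in range(1, N // 2 + 1):       # assumption: ell <= 2^n / 2
        for mu in range(ell):              # assumption: 0 <= mu < ell
            gap, middle, final = case4_terms(n, ell, mu)
            if gap != middle or middle > final:
                return False
    return True
```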

Therefore,

$$\begin{aligned}&\Big |\Pr [\mathsf {TNT} (T_\ell ,X_\ell )\rightarrow Y_\ell \mid \mathcal {Q} _{\ell -1}]-\frac{1}{2^n-\mu _\ell }\Big | \nonumber \\ \le&\max \bigg \{\frac{1}{2^n-\ell }-\frac{1}{2^n-\mu _\ell }, \frac{16\ell }{2^{2n}} \bigg \} \le \frac{16\ell }{2^{2n}}. \end{aligned}$$
(16)

To conclude, when the \(\ell \)-th query is forward, from Eqs. (11), (14), (15), and (16) we have

$$\begin{aligned} \Big (\textsf {p}_{\mathsf {TNT},\mathcal {D}}(Y_\ell \mid \mathcal {Q} _{\ell -1})-\frac{1}{2^n-\mu _\ell }\Big )^2 \le \Big (\frac{16\ell }{2^{2n}}\Big )^2 \le \frac{256\ell ^2}{2^{4n}}. \\ \end{aligned}$$

The remaining Cases 5, 6, 7, and 8 concern the situation where the \(\ell \)-th query is backward; their analyses are similar to those of Cases 1, 2, 3, and 4 by symmetry, resulting in the same bound

$$\begin{aligned} \Big (\textsf {p}_{\mathsf {TNT},\mathcal {D}}(X_\ell \mid \mathcal {Q} _{\ell -1})-\frac{1}{2^n-\mu _\ell }\Big )^2 \le \Big (\frac{16\ell }{2^{2n}}\Big )^2 \le \frac{256\ell ^2}{2^{4n}}. \end{aligned}$$

Consequently,

$$\begin{aligned} \chi ^2(\mathcal {Q} _{\ell -1})\le \sum _{R_\ell }\frac{256\ell ^2/2^{4n}}{1/(2^n-\mu _\ell )}\le 2^n\cdot 2^n \cdot \frac{256\ell ^2}{2^{4n}}\le \frac{256\ell ^2}{2^{2n}}, \end{aligned}$$

and

$$\frac{1}{2}\sum _{\ell =1}^q\mathbf {E}\big [\chi ^2(\mathcal {Q} _{\ell -1})\big ]\le \frac{1}{2}\sum _{\ell =1}^q\frac{256\ell ^2}{2^{2n}}\le \frac{1}{2}\cdot \frac{128q^3}{2^{2n}}=\frac{64q^3}{2^{2n}},$$

which implies Eq. (6) by Lemma 1.    \(\square \)

5 Concrete Proposals

In this section, we propose our instantiation of the \(\textsf {TNT}\) construction based on AES, which allows fast software implementations when AES-NI is available. We call the instantiation TNT-AES. To also benefit from the long-standing security of AES, we keep the modifications to AES to a minimum. Following these considerations, we only extend the number of rounds without any modification to the round function or key schedule, and pick the respective numbers of rounds for the three permutations \(\pi _1, \pi _2\), and \(\pi _3\) so that the design is secure against all relevant attacks. More explicitly, when the tweak \(T = 0\), TNT-AES simply becomes AES with more rounds, which clearly leaves a higher security margin than AES. Besides, we let the last round be complete instead of omitting the MixColumns operation. In the remainder of the section, we give the description of TNT-AES, followed by a comprehensive cryptanalysis, and a comparison of software and hardware performance against other existing TBCs with similar security levels.

5.1 Instantiation Based on AES

The Advanced Encryption Standard (AES) [23] is an iterated block cipher with block size 128 bits and secret key sizes 128, 192, and 256 bits. The internal state of AES, as well as the round keys, can be represented as a \(4 \times 4\) matrix whose elements are bytes (8 bits). The round function consists of four basic transformations in the following order (see Fig. 2):

  • SubBytes (SB) is a nonlinear substitution that applies the same S-box to each byte of the internal state.

  • ShiftRows (SR) is a cyclic rotation of the i-th row by i bytes to the left, for \(i = 0, 1, 2, 3\).

  • MixColumns (MC) is a multiplication of each column with a Maximum Distance Separable (MDS) matrix over \(\mathrm {GF}(2^8)\).

  • AddRoundKey (AK) is an exclusive-or with the round key.

Fig. 2. AES round function

At the very beginning of the encryption, an additional pre-whitening key addition is performed, and the last round is different from the normal rounds by omitting the MixColumns operation. AES-128, AES-192, and AES-256 share the same round function with different numbers of rounds: 10, 12, and 14, respectively.
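The four transformations listed above can be sketched in a few lines. This is a minimal, unoptimized illustration rather than a reference implementation: the state is assumed to be 16 bytes in column-major order (byte `4*c + r` is row `r` of column `c`), the S-box is derived from field inversion and the affine map instead of being tabulated, and all function names are hypothetical. The flag `mix_columns=False` gives the standard AES last round.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def build_sbox():
    """Derive the AES S-box: field inversion followed by the affine map."""
    sbox = []
    for v in range(256):
        inv = next((c for c in range(1, 256) if gmul(v, c) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def aes_round(state, round_key, mix_columns=True):
    """One AES round: SubBytes, ShiftRows, (MixColumns), AddRoundKey."""
    s = [SBOX[b] for b in state]                                        # SB
    s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]  # SR
    if mix_columns:                                                     # MC
        mixed = []
        for c in range(4):
            a = s[4 * c: 4 * c + 4]
            for r in range(4):
                mixed.append(xtime(a[r]) ^ xtime(a[(r + 1) % 4])
                             ^ a[(r + 1) % 4] ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
        s = mixed
    return bytes(x ^ k for x, k in zip(s, round_key))                   # AK
```

In TNT-AES every round, including the last one, would be called with `mix_columns=True`.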

The key schedule of AES transforms the master key into subkeys that are used in each of the rounds. Here, we describe the key schedule of AES-128. The 128-bit master key is divided into four 32-bit words (W[0], W[1], W[2], W[3]), then W[i] for \(i\geqslant 4\) is computed as

$$\begin{aligned} W[i] = {\left\{ \begin{array}{ll} W[i-4] \oplus \textsf {SB}(\mathtt{RotByte}(W[i-1]))\oplus Rcon[i/4] &{} i \equiv 0~\mathrm{mod}~4, \\ W[i-4] \oplus W[i-1] &{} \mathrm{otherwise.} \end{array}\right. } \end{aligned}$$

The i-th round key is the concatenation of 4 words \(W[4i]~\Vert ~W[4i+1]~\Vert ~W[4i+2]~\Vert ~W[4i+3]\). RotByte is a cyclic shift by one byte to the left, and Rcon are the round constants defined as

$$\begin{aligned} Rcon[i] = {\left\{ \begin{array}{ll} 1 &{} i = 1, \\ 2 \cdot Rcon[i-1] &{} \mathrm{otherwise,} \end{array}\right. } \end{aligned}$$

where ‘\(\cdot \)’ denotes multiplication in \(\mathrm {GF}(2^8)\) with irreducible polynomial \(x^8+x^4+x^3+x+1\).
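The key schedule described above can be sketched as follows. The code is an illustration only: it assumes that the first round constant actually used (for \(i=4\)) equals 1, as in the AES standard, and it simply keeps doubling the constant in \(\mathrm {GF}(2^8)\) when more than 10 rounds are requested, which is our reading of "naturally extended". The round constant is XORed into the first byte of the word. Function names are hypothetical.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def build_sbox():
    """Derive the AES S-box: field inversion followed by the affine map."""
    sbox = []
    for v in range(256):
        inv = next((c for c in range(1, 256) if gmul(v, c) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

def expand_key(key, rounds=10):
    """Return rounds + 1 round keys (16 bytes each) from a 128-bit key."""
    sbox = build_sbox()
    w = [list(key[4 * i: 4 * i + 4]) for i in range(4)]
    rcon = 1                                  # constant used for i = 4
    for i in range(4, 4 * (rounds + 1)):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = t[1:] + t[:1]                 # RotByte: rotate left by a byte
            t = [sbox[b] for b in t]          # SB applied to each byte
            t[0] ^= rcon                      # add the round constant
            rcon = xtime(rcon)                # next constant: 2 * previous
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [bytes(b for word in w[4 * r: 4 * r + 4] for b in word)
            for r in range(rounds + 1)]
```

Calling `expand_key(key, rounds=18)` yields the 19 round keys that an 18-round extension would need.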

Although AES-128 consists of 10 rounds, it can be naturally extended to more rounds, each composed of all 4 transformations (AddRoundKey \(\circ \) MixColumns \(\circ \) ShiftRows \(\circ \) SubBytes), and the pre-whitening key addition to the first round is kept as it is. Then, TNT-AES[\(n_1\), \(n_2\), \(n_3\textsf {]}\) is defined to be the extension of AES to \((n_1+n_2+n_3)\) rounds, i.e., \(\pi _1, \pi _2, \pi _3\) are of \(n_1, n_2, n_3\) full AES rounds respectively, and the 128-bit tweak is XOR-ed into the internal state at the output of \(\pi _1\) and \(\pi _2\). It is natural to set \(n_1 = n_3\) due to the symmetry of the design. Concretely, we define TNT-AES[\(6,6,6 \textsf {]}\), and will use TNT-AES to denote this choice for the sake of simplicity. We will justify the round numbers in the security analysis below.
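The layout of TNT-AES[\(n_1\), \(n_2\), \(n_3\)] can be summarized by where the tweak enters the round sequence. The sketch below only captures this layout: the round function and round keys are passed in as parameters (any function mapping a 16-byte state and a 16-byte round key to a 16-byte state), and the name `tnt_layout_encrypt` is hypothetical. It is not a vetted implementation of the cipher.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def tnt_layout_encrypt(plaintext, tweak, round_keys, round_fn, n1=6, n2=6, n3=6):
    """Apply n1 + n2 + n3 full rounds, XORing the tweak after pi_1 and pi_2."""
    total = n1 + n2 + n3
    if len(round_keys) != total + 1:
        raise ValueError("need one pre-whitening key plus one key per round")
    state = xor_bytes(plaintext, round_keys[0])      # pre-whitening key addition
    for r in range(1, total + 1):
        state = round_fn(state, round_keys[r])       # full round, incl. MixColumns
        if r in (n1, n1 + n2):                       # outputs of pi_1 and pi_2
            state = xor_bytes(state, tweak)
    return state
```

With an all-zero tweak the two XORs vanish and the function reduces to the plain iteration of \(n_1+n_2+n_3\) rounds, matching the remark that TNT-AES with \(T=0\) is AES with more rounds.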

5.2 Preliminary Cryptanalysis

In this subsection, we give our preliminary cryptanalysis of TNT-AES. As TNT-AES consists of \(18 \) rounds in total, which is \(8 \) more rounds than AES-128, we expect a higher security margin for TNT-AES when the tweak is treated as a given constant. Hence, we focus only on the cases where the tweak helps the attack from the cryptanalyst’s point of view, i.e., it is assumed that the tweak is under the attacker’s full control (open tweak) and possibly extends the existing attacks against round-reduced AES. Under such a setting, we examine the most efficient attacks, in terms of the number of attacked rounds, against TNT-AES and claim the absence of key-recovery attacks against the full TNT-AES in the single-key setting. While we do not claim security in the related-key setting for TNT-AES due to the lack of a security proof for TNT in such a setting, our preliminary cryptanalysis below shows that there is no key-recovery attack there either.

Following the proven security bound of \(\mathsf {TNT}\), TNT-AES offers 2n/3-bit security, i.e., there exists no key-recovery attack, given that the data (the combination of tweak and plaintext with no restriction on individual input) and time complexities are bounded by \(2^{2 \cdot 128/3} \simeq 2^{85}\). Since there is no known attack against \(\mathsf {TNT}\) matching the \(2^{2n/3}\) bound, all our security analyses of TNT-AES follow the \(2^n = 2^{128}\) bound for both data and time. This allows TNT-AES to offer higher security strength should a better than 2n/3-bit bound be proven for TNT. In summary, we claim that there is no shortcut attack on TNT-AES better than the generic attacks against the corresponding TNT mode.

In what follows, explicit security margins are given under each attack method whenever possible. Before moving to the individual attack methods, we give an overview of the impact of the tweak on the security at the model level. As mentioned above, the security margin will be higher for TNT-AES when the tweak is a given constant, and we call such a tweak inactive. When the tweak is active, it may be used to cancel differences in differential attacks, or as the source of the input structure in integral attacks. Under the single-key setting, the activeness of the round functions will be consistent within each of the three permutations \(\pi _1\), \(\pi _2\), and \(\pi _3\). This allows us to use 0/1 to denote the activeness of the permutations, with 1 for active and 0 for inactive, and a simple exhaustive search shows that the possible activity patterns are \(\{(0,1,0),(0,1,1),(1,1,0),(1,0,1), (1,1,1)\}\) for differential attacks, and \(\{(1,1,1), (1,1,0), (0,1,1)\}\) for integral attacks and the like.
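The exhaustive search over differential activity patterns is not spelled out in the text; one plausible way to model it is sketched below. The model is an assumption: each of the plaintext difference and the tweak difference is either zero or nonzero, a permutation is active exactly when its input difference is nonzero, and the XOR of two nonzero differences may or may not cancel. Only the differential case is modeled, and the function names are hypothetical.

```python
from itertools import product

def xor_outcomes(x_nonzero, y_nonzero):
    """Possible 'nonzero' flags of x XOR y, given the flags of x and y."""
    if x_nonzero and y_nonzero:
        return (0, 1)                 # the two differences may cancel or not
    return (1,) if (x_nonzero or y_nonzero) else (0,)

def differential_activity_patterns():
    """Enumerate (pi_1, pi_2, pi_3) activity flags in the single-key setting."""
    patterns = set()
    for dp, dt in product((0, 1), repeat=2):   # plaintext / tweak difference
        if not (dp or dt):
            continue                           # skip the trivial zero input
        a1 = dp                                # pi_1 sees the plaintext difference
        for a2 in xor_outcomes(a1, dt):        # input of pi_2: out(pi_1) xor tweak
            for a3 in xor_outcomes(a2, dt):    # input of pi_3: out(pi_2) xor tweak
                patterns.add((a1, a2, a3))
    return patterns

if __name__ == "__main__":
    print(sorted(differential_activity_patterns()))
```

Under this model the enumeration returns exactly the five differential patterns listed above; in particular, (1, 0, 1) arises only when the tweak difference cancels the output difference of \(\pi _1\).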

Differential and Linear Attacks. In the single-key setting, we employ the known results on 4-round AES to justify the security of TNT-AES. It is well-known that there are at least 25 active S-boxes in 4 rounds of AES, which ensures that there exists no 4-round differential characteristic (resp. linear approximation) with differential probability (resp. linear correlation) higher than \(2^{-6 \times 25}\) (resp. \(2^{-3 \times 25}\)) [22]. For the maximum expected differential and linear probability (MEDP and MELP), known results can be obtained following the work of Keliher and Sui [47], which suggests that the upper bound on the MEDP (and MELP) of 4-round AES is about \((53/2^{34})^4 \approx 2^{-113}\). For TNT-AES in the single-key setting where the difference can be injected on the plaintext or the tweak, there is at least one active permutation among \(\pi _1, \pi _2,\pi _3\) since their activity patterns fall in \(\{(0,1,0),(0,1,1),(1,1,0),(1,0,1), (1,1,1)\}\). As long as \(\pi _2\) is active, there must be more than 25 active S-boxes. The case of (1, 0, 1) happens only when the first addition of the tweak cancels out the differences introduced from the plaintext through \(\pi _1\), and the same difference is then re-introduced into \(\pi _3\) through the second addition of the tweak. Because the same tweak is added and the tweak difference is the same as well, \(\pi _1\) and \(\pi _3\) can be concatenated with respect to differences. Note that \(\pi _1+\pi _3\) is of \(12 \) rounds in total, out of which any 4 consecutive rounds ensure 25 active S-boxes. We also note that the security analysis of TNT under such a setting is very similar to that of AES-PRF [60], except that one has control over the extra input tweak in TNT added to the unknown internal state.

In the related-key setting, we only consider differential cryptanalysis, as there is no cancellation of active S-boxes between subkeys and the state in linear approximations. In [73], it is shown that in the related-key setting, there are at least 21 active S-boxes in 6 consecutive rounds of AES-128, and the optimal 6-round differential has probability \(2^{-131}\). Therefore, no useful related-key differential characteristic covering more than \(\pi _2\) can be found no matter whether there is a difference in the tweak or not.

Impossible-Differential Attacks. In [71], it is proven that there does not exist any truncated impossible-differential of AES which covers more than 5 rounds. Furthermore, the best impossible-differential attack, in terms of the number of attacked rounds, covers 7 rounds of AES-128 [55]. Following a similar discussion as for differential attacks, when \(\pi _2\) is active, the impossible-differential attack does not apply naturally since \(\pi _2\) is of \(6 \) rounds, more than what an impossible-differential distinguisher can cover. For the case of activity pattern (1, 0, 1), there are \(12 \) rounds in total for \(\pi _1+\pi _3\), more than the best attack against AES-128 can cover.

The Demirci-Selçuk Meet-in-the-Middle Attack. The Demirci-Selçuk meet-in-the-middle attack led to the best cryptanalytic result on 7 rounds of AES-128 in the single-key setting, where data/time/memory complexities are below \(2^{100}\) [25]. The distinguisher covers 4 rounds, following a differential characteristic. Note, the distinguisher here tries to limit the number of possibilities for the actual values related to the differential characteristic, and it is not clear how the addition of the tweak helps reduce that. Actually, it is not even clear the addition of round key can help reduce the counts either. Hence, round keys are treated as independent fixed constants in such attacks. Thus, we can treat the tweak in the same way. Therefore, the Demirci-Selçuk meet-in-the-middle attack would work in the same way on TNT-AES as on AES, and 7 rounds of TNT-AES can be attacked.

Yoyo Tricks. In [68], Rønjom et al. presented several key-independent yoyo-distinguishers on 3- to 5-round AES, which require up to \(2^{25.8}\) data and \( 2^{24.8} \) XOR computations. A key-independent impossible-differential yoyo-distinguisher on 6-round AES requiring \(2^{122.83}\) data was also proposed. Besides, a key-recovery attack on 5-round AES with practical complexities was devised based on the 4-round yoyo-distinguisher. In these attacks, the attacker queries a pair of plaintexts to the encryption, applies a swap operation to the obtained pair of ciphertexts to generate new queries to the decryption, and observes the difference in the obtained pair of plaintexts; she may then continue to construct new pairs of plaintexts by swapping words in the obtained pairs and iterate the same procedure sufficiently many times. It can be seen that, instead of collecting all chosen plaintexts/ciphertexts (CPs/CCs) at once, these attacks use adaptively-chosen-plaintexts/-ciphertexts (ACPs/ACCs). In TNT-AES, tweaks are always inputs to the encryption/decryption and are never output. So, for activity pattern (0, 1, 1) (resp. (1, 1, 0) for decryption), the attacker cannot play the yoyo game by adaptively choosing and observing the differences of tweak pairs and ciphertext (resp. plaintext) pairs. Accordingly, we claim that these yoyo-distinguishers and yoyo-distinguisher-based key-recovery attacks cannot be directly applied in their current form to TNT-AES.

Subspace Trail Attacks. Subspace trail cryptanalysis [32] can be seen as a generalization of invariant subspace cryptanalysis [52], while it can be launched independently of specific choices of round constants or subkeys. By analyzing subspace trails, Grassi et al. re-interpreted the 3-round truncated differential and integral, the 4-round impossible-differential and integral distinguishers on AES [33]. Besides, new distinguishers on round-reduced AES were found using subspace trail cryptanalysis, including the 5-round impossible-differential distinguisher [33], the 5-round multiple-of-8 distinguisher [34], the 4-round mixture-differential [31], and the 5-round (probabilistic, threshold, and impossible) mixture-differential distinguishers [30]. Exploiting the 4-round mixture-differential distinguisher, a record key-recovery attack on 5-round AES-128 in the single-key model was set [4]. In [6], Bardeh and Rønjom proposed the exchange attacks. Like yoyo and mixture-differential attacks, exchange attacks also involve swap (exchange) operations on the pairs of chosen data. On 6-round AES, the exchange distinguisher requires \( 2^{88.2} \) CPs and \( 2^{88.2} \) encryptions. In the attacks, new plaintext pairs are obtained by exchanging a certain active diagonal of other pairs that differ in diagonals, and an invariant property on the number of active columns of the differences of ciphertext pairs under such an exchange operation is considered.

Using subspace trail cryptanalysis and comparing with distinguishers on round-reduced AES, we analyze distinguishers and corresponding attacks on round-reduced TNT-AES. The activity patterns of the three permutations that we considered are (0, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0), and (1, 1, 1). The activity pattern (0, 1, 0) requires that all differences come from the tweaks and are canceled by the same tweaks after \( n_2 \) (i.e., \( 6 \)) AES-rounds, for which no shortcut method is known so far. Considering that all subspace-trail-based distinguishers on round-reduced AES cover no more than \( n_2 \) (i.e., \( 6 \)) AES-rounds, it seems hard to construct an exploitable subspace trail under activity patterns (0, 1, 1), (1, 1, 0), (1, 1, 1), which involve more than a chunk of active \( 6 \)-round AES. The activity pattern (1, 0, 1) implies that the coset of the subspace related to the internal states at the end of \( \pi _1 \) (resulting from a set of plaintexts) equals a coset of the same subspace formed by the chosen tweaks (and the differences between tweak pairs should cancel the differences caused by the plaintext pairs), and thus the coset of the subspace formed by the chosen tweaks will cause the internal states at the beginning of \( \pi _3 \) to form a coset of the same subspace. A subspace trail on internal states can be seen as bypassing \( \pi _2 \) via choosing a coset of a subspace for the tweak. Thus, devising an attack using a subspace trail under activity pattern (1, 0, 1) requires that one can devise a subspace trail attack on the concatenated permutation \( \pi _3 \circ \pi _1 \) that is of \( (n_1 + n_3) \) AES-rounds, which is unknown when \( (n_1 + n_3) > 6 \). In Appendix A, we discuss in detail the subspace-trail-based distinguishers and key-recovery attacks on round-reduced TNT-AES.

Cube Attack, Dynamic Cube Attack. AES is immune to cube attacks [27] or dynamic cube attacks [28] due to the high algebraic degree of the AES S-box. Specifically, the algebraic degree is 7 for one round of AES and increases to 32 (\({<}7^2\)) and 128 (\({<}32\times 7\)) for two and three rounds. Therefore, AES, which has 10 rounds, is believed to be resistant to such types of attacks. So is TNT-AES since it has more rounds than AES.

Integral Attacks and Division Property. The integral attacks utilize an integral distinguisher for 3 rounds (or 4 rounds without MixColumns for the last round), with a starting point of ALL values for a diagonal and a BALANCED output, i.e., the sum of each individual byte is 0. The best attack setting will be to utilize the degrees of freedom from the tweak to achieve the distinguisher starting from the input of \(\pi _2\) in forward direction with activity pattern (0, 1, 1) (or output of \(\pi _2\) in backward direction with activity pattern (1, 1, 0)). The attack will start with a fixed plaintext, and take ALL values of a diagonal from tweak. Thus, the target is \(\pi _2\) + \(\pi _3\) only with a secret input to \(\pi _2\). In the key-recovery phase, the attacker is able to append one round only, so this attack will work for at most \(n_1+5\) out of \((n_1+n_2+n_3)\) rounds, i.e., \(6 + 5\) out of \(18 \) for TNT-AES.

The division property due to Todo et al. [74, 75] can be viewed as an extension of the integral distinguisher, which has been successfully applied to many block ciphers. However, there are no reported results on AES better than the integral attack so far.

Slide Attacks. The slide attack was first described by Biryukov and Wagner [10, 11] in 1999 to attack round-reduced DES. The core idea is to make use of the similarity of the round functions and that of the key schedule. Thus, the difference between the encryption process in its original form and the one shifted by one (or a few) rounds is under control, e.g., with high probability. The addition of the tweak allows canceling the difference in at most one round, while TNT-AES has 8 more rounds than AES-128. Hence, we expect a higher security margin here. Furthermore, there is no reported slide attack against full AES-128 so far.

(Related-Subkey) Boomerang Attacks. Boomerang attacks [76] construct long distinguishers by connecting two short differential characteristics. Recently, a new tool named Boomerang Connectivity Table [16] was proposed to formulate the dependency between the two differential characteristics and offer guidance towards better boomerang distinguishers. We utilize the framework of the boomerang connectivity table when mounting boomerang attacks on TNT-AES. First, we consider the single-key setting where the difference can be introduced on the plaintext or the tweak. When the difference is introduced only on the tweak, as shown in Fig. 3 in Appendix B, high-probability boomerang distinguishers can be constructed on \(n_1 + n_2 + n_3\) rounds, where \(n_1,n_3\) can be any number and \(n_2<6\). When \(n_2\ge 6\), such high-probability distinguishers do not exist. Note that these distinguishers with zero plaintext and ciphertext difference are not useful in key-recovery attacks. When the difference is also introduced to the plaintext or ciphertext, by making \(\pi _2\) inactive through the tweak difference, the cipher can be seen as \(\pi _1\circ \pi _3\) with respect to differences and boomerang attacks of \(n_2+r\) rounds can be mounted, where r is the number of rounds that boomerang attacks of AES-128 can cover and is 5. That is, only 11 rounds can be attacked. Next, we consider the related-subkey setting where the key difference can be injected on a round key. The related-subkey setting is more powerful and usually allows longer boomerang distinguishers than the related-key setting where the difference is injected on the master key. In the related-subkey setting, there exists a 6-round boomerang distinguisher of AES-128 with probability \(2^{-109.42}\) [70]. This distinguisher can be naturally extended to the 7 middle rounds of TNT-AES with the same probability under the condition that the tweak difference cancels the input difference or the output difference of the 6-round boomerang distinguisher. When we add one more round to the bottom or to the top of the 7-round distinguisher, the number of active S-boxes will increase by at least one, leading to a negligible probability. Therefore, there seem to be no boomerang distinguishers of TNT-AES in the related-subkey setting that cover more than 8 rounds.

5.3 Performance

Software Performance. We estimate the software performance of TNT-AES on the basis of the best results of AES software provided by Park et al.  [65]. In what follows, we consider both “Plaintext” and “Tweak” as data since when used in some authenticated encryption schemes, both of them are used to process data such as associated data. Hence, the software performance is then calculated as the total number of CPU cycles divided by the total byte length of plaintext and tweak of the TBCs. To obtain a fair comparison, we estimate the same for other existing TBCs as well (omitting their additional cost for updating tweaks), using the following formula:

$$\begin{aligned} \text {original speed} \times \frac{\text {block size}}{\text {block size + tweak size}}. \end{aligned}$$
(17)

For TNT-AES, the number of rounds differs from that of AES. To evaluate the performance, we scale the speed of AES by a corresponding factor. Accordingly, the formula we use to calculate the software speed of TNT-AES is (where AES means AES-128):

$$\begin{aligned} \text {speed of } \textsf {AES} \times \frac{\text {block size}}{\text {block size + tweak size}} \times \frac{{\textsf {TNT}{\text {-}}{\textsf {AES}}}\text { round number}}{\textsf {AES} \text { round number}}. \end{aligned}$$
(18)
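Formulas (17) and (18) amount to two simple scalings, which the sketch below makes explicit. The function names are hypothetical and the input value in the usage line is a placeholder, not a measurement; speeds are in cycles per byte.

```python
def tbc_speed(original_cpb, block_bits=128, tweak_bits=128):
    """Formula (17): count both plaintext and tweak bytes as processed data."""
    return original_cpb * block_bits / (block_bits + tweak_bits)

def tnt_aes_speed(aes_cpb, block_bits=128, tweak_bits=128,
                  tnt_rounds=18, aes_rounds=10):
    """Formula (18): additionally scale AES-128 speed by the round ratio."""
    return (aes_cpb * block_bits / (block_bits + tweak_bits)
            * tnt_rounds / aes_rounds)

if __name__ == "__main__":
    placeholder_aes_cpb = 1.0          # hypothetical value, for illustration only
    print(tnt_aes_speed(placeholder_aes_cpb))
```

With a 128-bit block, a 128-bit tweak, and 18 versus 10 rounds, the overall factor applied to the AES-128 speed is \(0.5 \times 1.8 = 0.9\).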

We note that the optimization technique proposed in [65] is for the CTR mode of AES, which extends the counter-mode caching [9, 78]. It caches and reuses intermediate results up to AES round 1 (R1) or up to AES round 2 (R2). For TNT-AES, the tweak is not added until round \( (n_1 + 1) \); thus, this optimization technique is applicable. In contrast, for other TBCs in which tweaks are added before the first round, this technique may not be applicable.

Table 2 presents the estimated results on software performance of TNT-AES, together with the results of other TBCs under the similar setting (considering both “Plaintext” and “Tweak” as data).

Table 2. A table of comparison with other TBCs on software (all TBCs are with 128-bit block, 128-bit master key). The platform is Intel Haswell CPU i7-4770, which is the commonly used CPU in references [8, 40, 42, 65].

Rekeying and Retweaking. To illustrate a scenario that profits considerably from a tweakable block cipher that processes the tweak efficiently, we compare the performance of retweaking in TNT-AES with that of rekeying in AES-128. Table 3 reports the timing results. Because, in the AES-NI set, the reciprocal throughput of the AESKEYGENASSIST instruction that assists the key schedule is higher than that of the AESENC instruction that executes one round of encryption, Table 3 shows that rekeying in AES is slower, whereas retweaking in TNT-AES benefits a lot from the fast AES-NI instruction for encryption.\(^{1}\)

Hardware Performance. We estimate the hardware performance of TNT-AES with area minimization as the optimization target. The current record for minimized area of AES is held by the bit-serial implementations provided by Jean et al. [39]. Apart from AES, Jean et al. also provided bit-serial implementations of the tweakable block cipher SKINNY. Using those state-of-the-art results provided by Jean et al. [39], we estimate the area and latency of TNT-AES and make comparisons with other TBCs. The results are summarized in Table 5.

In the table, results for AES, SKINNY-128-256, and Deoxys-BC-256 are all from existing studies. The results for TNT-AES are calculated using the following method based on the results for AES. Let \( \delta \) be the number of bits in the data path in all implementations. Let \( C_{\mathtt {1DFF}} \) be the cost of a 1-bit D flip-flop (D FF), let \( C_{\mathtt {XOR}} \) be the cost of a 2-input XOR gate, and let \( C_{\mathtt {MUX}} \) be the cost of a 2-to-1 Multiplexer in a library. We use Table 4 to estimate \( C_{\mathtt {1DFF}} \), \( C_{\mathtt {XOR}} \), and \( C_{\mathtt {MUX}} \) in various libraries.

Table 3. Software performance of AES-128 when rekeying for every block and that of TNT-AES when fixing a key but retweaking for every block, both with plaintexts as data (unlike in Table 2 where we consider both “Plaintext” and “Tweak” as data), and both with help of AES-NI (on an Intel(R) Core(TM) i7-8565U CPU 1.80 GHz, which belongs to products formerly Whiskey Lake).

Compared with implementations of AES, the additional area cost for implementations of TNT-AES comes from the cost for storing a 128-bit tweak and the cost for implementing the XOR with the tweak (we ignore the additional cost for the signals controlling the tweak/key inputs). We note that there are cases where the tweak, as an input, can be sent twice by the external provider. In such cases, extra storage for the tweak can be saved. We note that this is possible for a design without a “tweak-schedule”. For other designs, such as those that permute the bytes of the tweak, this becomes difficult as it requires this permutation to be followed by the external provider if the tweak is not stored locally. In TNT-AES, there is no tweak-schedule, hence no storage for the tweak is required in such cases. When storage is required, the 128-bit tweak can be stored using 128 1-bit D FFs. To implement the XOR with the tweak, besides \( \delta \) 2-input XOR gates, \( \delta \) 2-to-1 multiplexers are also required for selecting the bits of the tweak after the \( n_1 \)-th round and the \( (n_1 + n_2) \)-th round and selecting constant 0 after other rounds. The additional area cost for XOR gates and multiplexers is \( \delta \times (C_{\mathtt {XOR}} + C_{\mathtt {MUX}}) \). Thus, the additional area cost is \( 128 \times C_{\mathtt {1DFF}} + \delta \times (C_{\mathtt {XOR}} + C_{\mathtt {MUX}}) \) when the tweak needs to be stored locally, and \( \delta \times (C_{\mathtt {XOR}} + C_{\mathtt {MUX}}) \) otherwise. To get a better view of the performance, we provide the gate sizes for both scenarios.

For latency, selecting and XOR-ing the bits of the tweak can be done in the same clock cycles as AddRoundKey and SubBytes, and thus costs no additional cycles. The additional cycle cost comes from the fact that TNT-AES has more rounds and that its last round is a complete round instead of one omitting MixColumns. Thus, to estimate the latency of TNT-AES, we take the number of clock cycles of one full round of AES (denoted by \( {Cycles}_{\mathtt {round}} \)), multiply it by the total number of rounds (\( n_1 + n_2 + n_3 \)), and add the cycles taken by the last AddRoundKey (\( 128/\delta \) cycles), i.e., \( {Cycles}_{\mathtt {round}} \times (n_1 + n_2 + n_3) + 128/\delta \), where \( {Cycles}_{\mathtt {round}} \) is listed in Table 5 (column 8 for AES).
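As a minimal sketch of this latency estimate, the formula can be evaluated as follows. The function name is hypothetical, the default round numbers (6, 6, 6) are those of TNT-AES, and the per-round cycle counts in the example are placeholders rather than the Table 5 entries.

```python
def tnt_aes_latency(cycles_round, delta, rounds=(6, 6, 6), block_bits=128):
    """Estimated clock cycles: Cycles_round * (n1 + n2 + n3) + 128 / delta."""
    if delta <= 0 or block_bits % delta != 0:
        raise ValueError("delta must be a positive divisor of the block size")
    n1, n2, n3 = rounds
    # Cycles of the final AddRoundKey over a delta-bit data path
    last_add_round_key = block_bits // delta
    return cycles_round * (n1 + n2 + n3) + last_add_round_key


if __name__ == "__main__":
    # Placeholder per-round cycle counts (assumption, NOT from Table 5).
    print(tnt_aes_latency(cycles_round=1, delta=128))
    print(tnt_aes_latency(cycles_round=16, delta=8))
```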

Table 5 shows that, when the tweak has to be stored locally, the hardware performance of TNT-AES is slightly inferior to that of SKINNY-128-256 and Deoxys-BC-256; otherwise, the hardware performance of TNT-AES can be superior.

Table 4. The (estimated) cost (in Gate Equivalent, GE) of regular flip-flops, scan flip-flops, 2-input XOR gates, and 2-to-1 Multiplexers in different libraries.
Table 5. A table of comparison with other TBCs on hardware area (in GEs) and latency (all TBCs are with 128-bit block, 128-bit master key, and 128-bit tweak)

Comparison to TAES. Here, we briefly compare the performance of TNT-AES with that of TAES, an AES-based TBC used to instantiate ZOCB and ZOTR, which are two tweakable-blockcipher modes for authenticated encryption with full absorption [3]. TAES tweaks AES-256 by simply replacing the second half of the secret key with a 128-bit tweak and keeping all other operations and parameters unchanged. Thus, it has 14 rounds, 128-bit blocks, 128-bit keys, and 128-bit tweaks.

Because TNT-AES consists of 18 AES rounds, i.e., \( 4 \) more rounds than TAES, TAES outperforms TNT-AES in use-cases where both the key and the tweak are fixed and all sub-tweaks/sub-keys can be precomputed. In use-cases where retweaking is necessary, however, TNT-AES is expected to perform better, for the following reasons. TNT-AES has no tweak-schedule, while the tweak-schedule of TAES is tied to the key-schedule of AES-256. For software implementations using AES-NI, the instruction for one round of encryption outperforms that for the key-schedule, as mentioned above. Thus, in retweaking use-cases, TNT-AES will be much faster than TAES. For hardware implementations, when the 128-bit tweak can be kept in external storage, TNT-AES needs no additional storage to process the tweak. Its area requirement is hence much lower than that of TAES, which requires local storage to hold and process the tweak.

6 Conclusion and Open Questions

In this paper, we proposed a new mode named TNT for constructing tweakable block ciphers with proven BBB security based on three block ciphers. To demonstrate the effectiveness of the mode, we proposed an AES-based instantiation named TNT-AES, which enjoys the long-standing security of AES, fast software performance due to the AES New Instructions (AES-NI), and hardware efficiency due to the simplicity of the TNT mode. Following the prove-then-prune design strategy, we reduced the number of rounds of the three underlying AES-based block ciphers from 10 for the original AES to \( 6 \), \( 6 \), and \( 6 \), respectively. Our comprehensive cryptanalysis shows no security issues in TNT-AES, while the reduced number of rounds allows it to achieve software and hardware performance competitive with existing TBCs designed through modular methods. We expect TNT to be a generic way to turn a block cipher into a tweakable block cipher securely, especially for lightweight block ciphers with smaller block lengths.
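To make the structure of the mode concrete, the following is a toy sketch of the three-permutation cascade with the tweak XORed in after the first and after the second permutation, matching the hardware description in which the tweak is added after the \( n_1 \)-th and the \( (n_1 + n_2) \)-th rounds. The 8-bit block size, the seeded random permutations standing in for keyed block ciphers, and all function names are assumptions made for illustration only; this is not TNT-AES and offers no security.

```python
import random


def make_toy_permutation(seed, n_bits=8):
    """Return (table, inverse) for a seeded random permutation on n_bits bits.

    Stand-in for a keyed block cipher; the seed plays the role of the key.
    """
    rng = random.Random(seed)
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def tnt_encrypt(perms, tweak, block):
    """Compute P3(P2(P1(block) XOR tweak) XOR tweak)."""
    (p1, _), (p2, _), (p3, _) = perms
    return p3[p2[p1[block] ^ tweak] ^ tweak]


def tnt_decrypt(perms, tweak, block):
    """Invert tnt_encrypt by undoing the three layers in reverse order."""
    (_, i1), (_, i2), (_, i3) = perms
    return i1[i2[i3[block] ^ tweak] ^ tweak]


if __name__ == "__main__":
    perms = [make_toy_permutation(seed) for seed in (1, 2, 3)]
    c = tnt_encrypt(perms, tweak=0x5A, block=0x3C)
    assert tnt_decrypt(perms, tweak=0x5A, block=c) == 0x3C
```

Because the tweak enters only through XORs between the permutations, changing it requires no schedule or precomputation, which is the property the performance comparison relies on.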

Potential Applications. While TNT-AES only supports n-bit tweaks, which may seem a limitation compared to \(\mathsf {CLRW2} _2\), such a parameter is already sufficient for many important applications. For example, many TBC-based MACs, including the chaining-via-tweak mode proposed by Liskov et al. [54] (its security was later proved optimal by Landecker et al. [51]) and the AXU-hash-based MACs proposed by Cogliati et al. [19], are built exactly from TBCs with n-bit tweaks, and thus instantiating the TBCs with \(\mathsf {CLRW2} _2\) (as done in [51]) wastes power and causes unnecessary efficiency loss. Consequently, TNT-AES would probably be a better building block. Moreover, TNT-AES could also be used to build BBB-secure variable-length domain extenders via the construction of Chen et al. [13], or double-length block ciphers via the construction of Coron et al. [21]. As discussed in [13], such constructions may further motivate highly secure format-preserving encryption schemes, which might be a very valuable alternative to the recently broken standards.

Besides, TNT-AES could be used to replace the TBC module in the standard \(\mathsf {OCB3}\) mode and in the \(\mathsf {OTR}\) mode [62] (a second-round candidate in the CAESAR competition). Both modes are optimally secure when the underlying TBC module is optimal [49, 62], but fall down to the birthday bound because the TBC is instantiated with XEX-like constructions [67]. Therefore, once they are instantiated with TNT-AES, we obtain corresponding variants that are BBB-secure up to \(2^{2n/3}\) queries in both cases. Consider the application to \(\mathsf {OCB3}\) for concreteness. The resulting AE scheme TNT-AES-\(\mathsf {\Theta CB}\) is a \(\mathsf {\Theta CB}\) instance [49] with TNT-AES as its underlying TBC, and the security is boosted from the n/2 bits of \(\mathsf {OCB3}\) to 2n/3 bits. Perhaps surprisingly, the hardware efficiency might be improved as well: the original \(\mathsf {OCB3}\) mode requires storing an AXU hash key \(E_K(0)\) during the lifetime of the master key K, which is avoided in TNT-AES-\(\mathsf {TAE}\).
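For a sense of scale, the two security levels mentioned above can be evaluated for concrete block lengths. This is plain arithmetic on the exponents n/2 and 2n/3; the helper name and constants are hypothetical.

```python
from fractions import Fraction

BIRTHDAY = Fraction(1, 2)   # exponent factor of the birthday bound, n/2
TNT_BOUND = Fraction(2, 3)  # exponent factor of the proven TNT bound, 2n/3


def security_bits(n_bits, factor):
    """Return the security level in bits (as an exact Fraction) for an n-bit block."""
    return factor * n_bits


if __name__ == "__main__":
    for n in (64, 128):
        birthday = float(security_bits(n, BIRTHDAY))
        bbb = float(security_bits(n, TNT_BOUND))
        print(f"n={n}: birthday {birthday:.1f} bits, 2n/3 bound {bbb:.1f} bits")
```

For n = 128 this is the difference between 64 bits and roughly 85 bits; for n = 64, between 32 bits and roughly 43 bits.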

We anticipate more such applications, especially in settings where AES-based TBCs constructed from modes other than TNT are currently used.

The Security Gap. Although TNT is proven secure up to \(2^{2n/3}\) queries, no matching attack is known. Dinur et al.’s attack strategy [26] against the 3-round Even-Mansour ciphers does not help here, since the permutations in TNT cannot be queried by the adversary, and Mennink’s distinguisher [59] does not work directly either, due to the \(2^{3n/2}\) offline computational complexity besides the \(2^{3n/4}\) online query complexity. The same applies to the instantiation TNT-AES. It will be interesting to see this gap closed, either by improving the proven security bound or by finding a better attack. We leave this as an open problem to the community.