1 Introduction

Trapdoor functions are an important tool in public key cryptography due to the computational asymmetry they bring about. On the one hand, the function is a proper cryptographic one-way function to anyone who is ignorant of the secret trapdoor information; but on the other hand, anyone who does know this trapdoor information can use it to find inverse images quickly.

The case of surjective trapdoor functions is especially interesting for generating digital signature schemes. A cryptographic hash function maps a message of any size to a random point in the trapdoor function’s output space. An inverse of this point under the trapdoor function, or signature, testifies to the involvement of the trapdoor information, or secret key, in its generation. This testimony ensures the target property of non-repudiation of origin: the secret key holder cannot deny generating the signature at a later date.

Since their inception in the seminal paper by Diffie and Hellman [10], various digital signature schemes have been deployed whose security is based on the hardness of integer factorization [35] and the discrete logarithm problem [30, 36]. However, the advent of quantum computers threatens the security of these signature schemes because both hard problems are solved efficiently by Shor’s quantum algorithm [37]. This ultimatum drives the need to design, develop and deploy so-called post-quantum cryptosystems, i.e., cryptography that can be run on classical hardware but promises to resist attacks by quantum computers.

Even though the RSA trapdoor is broken by quantum computers, the hash-and-sign construction that RSA signatures are based on seems to survive the transition to post-quantum cryptography. To achieve post-quantum secure signature schemes it suffices to exchange the underlying trapdoor for one that has the desired security against quantum adversaries. There is no shortage of trapdoor-based signature schemes based on the MQ problem [11, 21, 34], coding theory [8, 9], or lattices [3, 15, 27].

Unfortunately, the public keys in these schemes are prohibitively large, measurable in hundreds of kilobytes if not megabytes. In contrast, post-quantum signature schemes derived from zero-knowledge proofs require only a one-way function whose selection can be random or might as well be determined by a short seed and an implicit pseudorandom generator. Signature schemes based on zero-knowledge proofs tend to exchange tiny public keys for prohibitively large signatures [7, 18, 23, 38], and moreover require complicated and expansive non-interactivity transforms to retain security against quantum attackers [40]. Although provable security in the case of hash-based signature schemes is much more straightforward, this family of constructions follows the same pattern: tiny public keys but huge signatures [4, 5].

Szepieniec, Beullens and Preneel offer an alternative to the dilemma between large public keys or large signatures [39], motivated by the desire to minimize the combined size of public key and signature. This minimization is particularly important in the context of public key infrastructure (PKI) where a chain of signatures and public keys is transmitted in order to authenticate a message with respect to a pre-shared root public key. The construction of Szepieniec et al. applies specifically to MQ trapdoors and relies on the observation that verifying a couple of random linear combinations of the public key’s polynomial equations can be as good as verifying all of them. The coefficients of this linear combination are determined as a function of the produced signature, and the combination itself is transmitted along with this signature in addition to information authenticating its link to the public key. This transformation reduces the size of public key plus that of the signature by roughly a factor three whilst provably retaining security in the random oracle model; and by a much larger factor at the expense of a heuristic security argument.

This article expands on the paper of Szepieniec et al. in several ways. We observe that this transformation also applies to other post-quantum trapdoor signature schemes, most notably code-based and lattice-based trapdoors. From a general perspective, these three hard problems are variations on a common theme, which we call constrained linear signature schemes. This commonality allows a generic presentation of the transformation. The security proofs of Szepieniec et al. only work in the classical random oracle model. However, security proofs that purport to defend against quantum adversaries should additionally hold in the quantum random oracle model, which our proof does. Moreover, we identify a necessary and sufficient security property, called \((\sigma ,r)\)-hash-and-sign-security (\((\sigma , r)\)-HSS), that a constrained linear signature scheme must have in order for the more aggressive parameter choices of Szepieniec et al. to be provably secure. This leads to an improved understanding of the security of instantiations of this construction, which includes the DualModeMS submission of Faugère et al. [12] to the NIST PQC standardization project [29]. To showcase the key size improvements that can be achieved with the transformation, we apply the transformation to a lattice-based, code-based and multivariate constrained linear signature scheme with parameters targeting 128 bits of security against quantum computers.

2 Preliminaries

Random Oracle Model. We use a hash function in our construction. For the purpose of proving security we model it by a random oracle, which is a random function \(\mathsf {H} : \{0,1\}^* \rightarrow \{0,1\}^\kappa \) with a fixed output length, typically equal to the security parameter. If necessary, the random oracle’s output space can be lifted to any finite set X. We use subscripts to differentiate the random oracles associated with different output spaces. A security proof relying on the modelling of hash functions as random oracles is said to hold in the random oracle model. When quantum adversaries are considered, the security proofs should allow for superposition queries to the random oracle [6]; a security proof with this property is said to hold in the quantum random oracle model.

Trapdoor Functions. A trapdoor function is a function that can be efficiently computed in one direction, but for which computing preimages is hard for anyone who does not know a secret piece of information called the trapdoor. We associate three algorithms to a trapdoor function family:

  • \(\mathsf {GenTrapdoor}\) takes a security parameter as input and outputs a trapdoor function f and a trapdoor t.

  • \(\mathsf {Evaluate}\) takes a description of the trapdoor function f and an argument x as input, and returns the evaluation of f at x. In the rest of the paper, we simply write this as f(x).

  • \(\mathsf {Invert}\) takes the function f, the trapdoor t and an image y as input, and outputs a value x such that \(f(x) = y\).

Signature Scheme. A public key signature scheme is defined as a triple of polynomial-time algorithms \((\mathsf {KeyGen}, \mathsf {Sign}, \mathsf {Verify})\). The probabilistic key generation algorithm takes the security level \(\kappa \) (in unary notation) and produces a secret and public key: \(\mathsf {KeyGen}(1^\kappa ) = ( sk , pk )\); the signature generation algorithm produces a signature: \(s = \mathsf {Sign}(\mathsf {sk}, m) \in \{0,1\}^*\). The verification algorithm takes the public key, the message and the signature and decides if the signature is valid: \(\mathsf {Verify}( pk , m, s) \in \{0, 1\}\); we refer to these outputs as “reject” and “accept”, respectively. The signature scheme is correct if signing a message with the secret key produces a valid signature under the matching public key:

$$ (\mathsf {KeyGen}(1^\kappa ) \Rightarrow ({ sk}, { pk})) \quad \implies \quad \forall m \in \{0,1\}^* \, . \, \mathsf {Verify}\left( pk , m, \mathsf {Sign}( sk , m)\right) = 1 .$$

Here and elsewhere we use \(\Rightarrow \) to denote the event of the probabilistic algorithm on the left hand producing the output on the right hand, and \(\implies \) to denote logical implication.

Security is defined with respect to the Existential Unforgeability under Chosen Message Attack (EUF-CMA) game of Goldwasser et al. [17]. The adversary \(\mathsf A\) is allowed to make a polynomial number of queries \(m_i, i \in \{1,\ldots ,q\}, q\le \kappa ^c\) for some c, which the challenger signs using the secret key and sends back: \(s_i \leftarrow \mathrm {Sign}(\mathsf {sk}, m_i)\). At the end of the game, the adversary must produce a pair of values \((m', s')\) where \(m'\) was not queried before: \(m' \not \in \{m_i\}_{i=1}^q\). The adversary wins if \(\mathsf {Verify}( pk , m', s') = 1\). In the game below, the Iverson brackets \([\![\cdot ]\!]\) return 0 if the expression is \(\mathsf {False}\) or 1 if it is \(\mathsf {True}\).

figure a

We define the insecurity function \(\mathsf {InSec}_{\mathsf {scheme}}^{\text {EUF-CMA}}(Q_\mathsf {S};t)\) as the maximum winning probability across all quantum adversaries that run in time t and that make at most \(Q_\mathsf {S}\) signature queries.

Hash-and-Sign Signature Schemes. Given a trapdoor function family and a hash function \(\mathsf {H}\) that hashes arbitrary messages to elements in the range of the trapdoor functions we can use the hash-and-sign construction to build a (not necessarily secure) signature scheme. The key generation algorithm simply calls the \(\mathsf {GenTrapdoor}\) function to get \((f, t)\). The public key is then the description of f, and the trapdoor t is the private key. To sign a message m, the signer uses the trapdoor t to produce a preimage s for \(\mathsf {H}(m)\). This preimage is the signature for m. Lastly, to verify the validity of a signature the verifier computes \(\mathsf {H}(m)\), uses the public key to evaluate f at s and checks if \(f(s) = \mathsf {H}(m)\).
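The hash-and-sign construction can be illustrated with a minimal sketch. The trapdoor used here is textbook RSA with tiny, hypothetical primes; it is neither secure nor post-quantum and only stands in for \(\mathsf {GenTrapdoor}\), \(\mathsf {Evaluate}\) and \(\mathsf {Invert}\). All names and constants are assumptions made for illustration.

```python
import hashlib

# Toy RSA trapdoor. The primes are hypothetical and far too small to be secure.
P, Q = 1009, 1013
N = P * Q                           # public: part of the description of f
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # secret trapdoor t

def hash_to_range(message: bytes) -> int:
    """H: map a message of any length to a point in the range Z_N of f."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def evaluate(x: int) -> int:
    """f(x): easy for anyone who knows the public key."""
    return pow(x, E, N)

def invert(y: int) -> int:
    """A preimage of y under f: requires the trapdoor D."""
    return pow(y, D, N)

def sign(message: bytes) -> int:
    """The signature is a preimage of H(m)."""
    return invert(hash_to_range(message))

def verify(message: bytes, signature: int) -> bool:
    """Accept if and only if f(s) = H(m)."""
    return evaluate(signature) == hash_to_range(message)
```

Replacing `evaluate` and `invert` by a post-quantum trapdoor leaves `sign` and `verify` unchanged, which is the sense in which the construction survives the transition.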

Merkle Tree. A Merkle tree [26] is a balanced binary tree whose root authenticates a list of data items which are contained in the leaves. Every non-leaf node, including the root, has a value equal to the hash of the concatenation of its two children. A leaf can be proven to be a member of the tree by tracing a path from the leaf to the root and listing all siblings of nodes on that path: every step can be verified by computing one hash. We associate three algorithms with a Merkle tree:

  • \(\mathsf {CalculateMerkleRoot}\) takes a list of leaf items, computes the entire Merkle tree, and returns its root.

  • \(\mathsf {OpenMerklePath}\) takes a list of leaf nodes and an index, and outputs its authentication path: the list of all siblings of nodes on the path from the indicated leaf node to the root.

  • \(\mathsf {VerifyMerklePath}\) takes an index, a leaf node, a Merkle path, and a root, and decides whether the leaf node is a member of the tree with the given root.
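The three Merkle tree algorithms can be sketched as follows. This is an illustration under stated assumptions, not the construction's reference implementation: the number of leaves is assumed to be a power of two, SHA-256 stands in for the tree hash, leaf data is hashed once before entering the tree, and indices start at 0.

```python
import hashlib

def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _parents(level):
    """Hash adjacent pairs of nodes to obtain the next level up."""
    return [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def calculate_merkle_root(leaves):
    level = [_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _parents(level)
    return level[0]

def open_merkle_path(leaves, index):
    """Return the siblings of all nodes on the path from leaf `index` to the root."""
    level = [_hash(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        path.append(level[index ^ 1])   # sibling of the current node
        level = _parents(level)
        index //= 2
    return path

def verify_merkle_path(index, leaf, path, root):
    """Recompute the root from the leaf and its siblings, one hash per step."""
    node = _hash(leaf)
    for sibling in path:
        node = _hash(node + sibling) if index % 2 == 0 else _hash(sibling + node)
        index //= 2
    return node == root
```

For a tree with \(\tau \) leaves the path holds \(\mathsf {log}_2 \, \tau \) hash values, so verification costs that many hash evaluations plus one for the leaf.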

3 Trapdoor-Based Signature Schemes

3.1 MQ Trapdoors

Multivariate quadratic (MQ) trapdoor functions date back to the C\(^*\) scheme of Matsumoto and Imai [25], which has since given rise to a number of viable candidates including HFE\(_v^-\) [32], UOV [21] and Rainbow [11]. The idea is to compose a special quadratic map \(\mathcal {F} : \mathbb {F}_q^n \rightarrow \mathbb {F}_q^m\) with two linear transforms, \(T \in \mathsf {GL}_m(\mathbb {F}_q)\) and \(S \in \mathsf {GL}_n(\mathbb {F}_q)\) to obtain the public key \(\mathcal {P} = T \circ \mathcal {F} \circ S\). A vector \(\mathbf {s} \in \mathbb {F}_q^n\) that represents an assignment to the variables, is a valid signature for the document \(d \in \{0,1\}^*\) whenever

$$\begin{aligned} \mathcal {P}(\mathbf {s}) = \mathsf {H}(d). \end{aligned}$$
(1)

In order to find \(\mathbf {s}\), the signer computes \(\mathbf {z} = \mathsf {H}(d)\), \(\mathbf {y} = T^{-1}\mathbf {z}\), uses the special structure of \(\mathcal {F}\) to sample an inverse \(\mathbf {x}\) such that \(\mathcal {F}(\mathbf {x}) = \mathbf {y}\), and then computes \(\mathbf {s} = S^{-1}\mathbf {x}\).

We focus on the Rainbow submission to the NIST PQC project [29], where the parameter set \((q = 256, v=68, o_1=36, o_2=36)\) is proposed. In this case, \(n=v + o_1 + o_2 = 140\) and \(m = o_1 + o_2 = 72\). While the proposal does not employ Petzoldt’s compression trick [33] we note that it is possible in principle, in which case \(v(v+1)/2 + vo_1\) columns of the public Macaulay matrix are set as the output of a PRG expanding a seed of 32 bytes (Footnote 1). Allocating eight bits per field element, we obtain signatures of 140 bytes and public keys of 356.9 kB. Without Petzoldt’s compression trick the public key is 694.0 kB.
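The quoted sizes can be reproduced with a short calculation. The sketch assumes that one element of \(\mathbb {F}_{256}\) occupies eight bits, that only the \(n(n+1)/2\) quadratic monomials are counted as columns of the Macaulay matrix, that 1 kB means 1024 bytes, and that the 32-byte seed is not counted; these conventions are inferred from the numbers rather than stated in the text.

```python
q, v, o1, o2 = 256, 68, 36, 36
n = v + o1 + o2                              # number of variables
m = o1 + o2                                  # number of equations
bits_per_element = (q - 1).bit_length()      # ceil(log2 q) = 8

quadratic_monomials = n * (n + 1) // 2       # columns of the Macaulay matrix
prg_columns = v * (v + 1) // 2 + v * o1      # columns regenerated from the seed

signature_bytes = n * bits_per_element // 8
public_key_bytes = m * quadratic_monomials * bits_per_element // 8
compressed_key_bytes = m * (quadratic_monomials - prg_columns) * bits_per_element // 8

print(signature_bytes)                        # 140
print(round(public_key_bytes / 1024, 1))      # 694.0
print(round(compressed_key_bytes / 1024, 1))  # 356.9
```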

3.2 Code-Based Trapdoors

The first code-based signature scheme was proposed by Courtois, Finiasz and Sendrier (CFS) [8]; it relies on the difficulty of finding a low Hamming weight word associated with a given syndrome. The public key in such a signature scheme is a parity check matrix \(H \in \mathbb {F}_2^{(n-k) \times n}\). A signature \((\mathbf {s}, i) \in \mathbb {F}_2^{1 \times n} \times \mathbb {Z}\) on a document \(d \in \{0,1\}^*\) consists of an error vector and an index; it is valid when the error vector has Hamming weight at most t and syndrome equal to the hash of the document concatenated with the index i. The index i can be thought of as selecting a different hash function. Formulaically:

$$\begin{aligned} H \mathbf {s}^\mathsf {T} = \mathsf {H}(d \Vert i) \quad and \quad \mathsf{HW}(\mathbf {s}) \le t. \end{aligned}$$
(2)

By our calculations, a 128-bit post-quantum security level is achieved with the parameter set \(m=26\), \(t=15\) and thus \(n=2^m=2^{26}\) and \(n-k = tm = 390\). At this point the public key is 3.05 GB but the signatures are 390 bits. We refer to Appendix A for a derivation of these parameters. We choose not to consider the question whether the cryptosystem is practically usable with these parameters and instead focus on the obtained compression factor. The CFS scheme is used as a generic stand-in for code-based signature schemes using the hash-and-sign paradigm and relying on the hardness of syndrome decoding.

3.3 Lattice-Based Trapdoors

The first trapdoor-based signature scheme from lattices was proposed by Goldreich, Goldwasser and Halevi (GGH) at Crypto’97 [16]. The signatures of this scheme leak information about the private key, and the scheme was broken by Nguyen and Regev [31]. Gentry et al. [15] showed how to sample signatures that do not leak information and constructed a provably secure signature scheme. Later improvements by Alwen and Peikert [3] and by Micciancio and Peikert [27] make the scheme more efficient. The main idea is the same in all schemes: the public key is a matrix \(A \in \mathbb {F}_q^{n \times m}\) with large coefficients but such that there exists another matrix \(S \in \mathbb {Z}^{m \times m}\) with small coefficients with \(AS = 0 \, \mathsf {mod} \, q\). In order to generate a signature for a document \(d \in \{0,1\}^*\), the signer uses the secret key S to obtain a small-coefficient vector \(\mathbf {z} \in \mathbb {Z}^m\). It is a valid signature whenever

$$\begin{aligned} A \mathbf {z} = \mathsf {H}(d) \, \mathsf {mod} \, q \quad and \quad \Vert \mathbf {z} \Vert _2 \le \beta , \end{aligned}$$
(3)

for some length bound \(\beta \in \mathbb {R}_{>0}\).

Using the methodology of [28], and the estimator for the concrete hardness of the SIS problem of Albrecht et al. [1], we choose parameters for the scheme of [27] that achieve 128 bits of security. This results in the parameters \(n = 321, q=2^{26}-5 , m=16692\) and \(\beta = 112296\), a public key of \(n \times m \times 26\) bits \( = 16.6 \) MB, and signatures of \(\lceil \log _2(\beta ) \rceil \times m\) bits \(= 34.6\) KB. We chose q to be prime as this is required for our security proof to work. The first half of the matrix A can be chosen randomly, so we can fix this part with a PRG to cut the size of the public key in half.
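The two size formulas can be checked numerically. The sketch assumes that MB and KB denote \(2^{20}\) and \(2^{10}\) bytes respectively, which is the convention that matches the quoted figures.

```python
import math

n, m = 321, 16692
q = 2**26 - 5
beta = 112296

bits_per_entry = q.bit_length()                   # 26 bits per entry of A
public_key_bits = n * m * bits_per_entry
signature_bits = math.ceil(math.log2(beta)) * m   # 17 bits per coordinate of z

public_key_mib = public_key_bits / 8 / 2**20
signature_kib = signature_bits / 8 / 2**10

print(round(public_key_mib, 1))   # 16.6
print(round(signature_kib, 1))    # 34.6
```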

3.4 A Unifying View

The above three signature schemes can be thought of as variations on a common theme. These schemes are all hash-and-sign signature schemes with a linear trapdoor function \(f: \mathbb {F}_q^{\ell } \rightarrow \mathbb {F}_q^{k}\), but with f restricted to a domain defined by a nonlinear constraint function \(\mathsf {nc} : \mathbb {F}_q^{\ell } \rightarrow \{\mathsf {True}, \mathsf {False}\}\). We call these trapdoor functions constrained linear trapdoor functions, and if they are used in a hash-and-sign construction, we call the resulting signature scheme a constrained linear signature scheme.

For all the constrained linear signature schemes the public key is a matrix \(M \in \mathbb {F}_q^{k \times \ell }\) with \(k < \ell \) which represents the trapdoor function f and a signature is represented by a vector \(\mathbf {s} \in \mathbb {F}_q^\ell \). A signature is valid if \(M\mathbf {s}\) is equal to a target \(\mathbf {t} \in \mathbb {F}_q^k\), which is the evaluation of a hash function at a document, and if the vector \(\mathbf {s}\) also satisfies the constraint \(\mathsf {nc}\). Symbolically:

$$ \mathsf {Verify}({ pk}, m, \mathbf {s}) = 1 \quad \Longleftrightarrow \quad M\mathbf {s} = \mathbf {t} = \mathsf {H}(m) \, \wedge \, \mathsf {nc}(\mathbf {s}) = \mathsf {True}. $$

In the case of lattice-based trapdoors, the signature is valid only if \(\mathbf {s}\) is a short vector. In the case of code-based trapdoors, it is valid only if the Hamming weight of \(\mathbf {s}\) is low. And in the case of MQ trapdoors, the matrix M is the coefficient matrix (or Macaulay matrix) of the quadratic polynomial map \(\mathcal {P}\) and the signature \(\mathbf {s}\) must be factorizable as a vector of products of n variables: \(\mathbf {s}^\mathsf {T} = (x_1^2, x_1x_2, \ldots , x_n^2)\). Formally, we capture this difference between MQ, code-based, and lattice-based trapdoors with the nonlinear constraint \(\mathsf {nc}\), namely by defining for

  • code-based trapdoors: \(\mathsf {nc}(\mathbf {s}) = \mathsf {True} \, \Leftrightarrow \, \text {HW}(\mathbf {s}) \le t\);

  • lattice-based trapdoors: \(\mathsf {nc}(\mathbf {s}) = \mathsf {True} \, \Leftrightarrow \, \Vert \mathbf {s} \Vert _2 \le \beta \);

  • MQ trapdoors: \(\mathsf {nc}(\mathbf {s}) = \mathsf {True} \, \Leftrightarrow \, \exists \, x_1, \ldots , x_n \in \mathbb {F}_q \, . \, \mathbf {s}^\mathsf {T} = (x_1^2, x_1x_2, \ldots , x_n^2)\).
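The unified verification rule can be sketched generically, with the nonlinear constraint passed in as a predicate. The sketch assumes a prime q so that arithmetic modulo q realizes \(\mathbb {F}_q\), takes the target \(\mathbf {t} = \mathsf {H}(m)\) as an already computed vector, and uses hypothetical function names. For the lattice constraint the entries of \(\mathbf {s}\) are assumed to be small signed integers. For MQ trapdoors the constraint is met by construction when the verifier expands the variable assignment into the monomial vector itself.

```python
def mat_vec(M, s, q):
    """Compute M s over the integers modulo q."""
    return [sum(a * b for a, b in zip(row, s)) % q for row in M]

def nc_hamming(t):
    """Code-based constraint: Hamming weight at most t."""
    return lambda s: sum(1 for x in s if x != 0) <= t

def nc_euclidean(beta):
    """Lattice-based constraint: Euclidean norm at most beta."""
    return lambda s: sum(x * x for x in s) <= beta * beta

def monomial_vector(x, q):
    """MQ case: expand an assignment x into (x_1^2, x_1 x_2, ..., x_n^2)."""
    n = len(x)
    return [(x[i] * x[j]) % q for i in range(n) for j in range(i, n)]

def verify(M, q, target, s, nc):
    """Accept if and only if M s = t and nc(s) holds."""
    return mat_vec(M, s, q) == [t % q for t in target] and nc(s)
```

With `M = [[1, 0, 1, 1], [0, 1, 1, 0]]` over \(\mathbb {F}_2\) and target `[1, 0]`, the vector `[0, 1, 1, 0]` satisfies the linear equation but is rejected under `nc_hamming(1)`, which shows that the constraint, not the linear system, carries the hardness.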

3.5 Additional Security Properties

We say that a surjective trapdoor function f is one-way (OW) if it is hard to find a preimage for a randomly chosen output, and we say that f is hash-and-sign secure (HSS) if using the trapdoor function f in the hash-and-sign construction leads to a signature scheme that is EUF-CMA secure. If f is a constrained linear trapdoor function we can define stronger versions of the OW and HSS security properties that will be useful for the security analysis of the transformation (Fig. 1).

\(\varvec{(\sigma ,r)}\)-One-Wayness. For any two non-negative integers \(\sigma > r\) we define \((\sigma ,r)\)-one-wayness and \((\sigma ,r)\)-hash-and-sign security. To break \((\sigma ,r)\)-one-wayness, an adversary has to find \(\sigma \) preimages \(\mathbf {x}_1, \ldots , \mathbf {x}_\sigma \in \mathbb {F}_q^\ell \) for \(\sigma \) vectors \(\mathbf {y}_1, \ldots , \mathbf {y}_\sigma \in \mathbb {F}_q^k\). However, the adversary is allowed to make mistakes in each of the \(\sigma \) preimages it produces, as long as the errors \(f(\mathbf {x}_i)-\mathbf {y_i}\) are contained in a vector space of dimension r. The (1, 0)-one-wayness property is identical to the one-wayness property, because the adversary only needs to find a preimage for one target and it is not allowed to make any mistakes.

The \((\sigma ,r)\)-OW property is a generalization of the AMQ problem introduced in [39]; an MQ trapdoor \(\mathcal {P}\) is \((\sigma ,r)\)-one-way precisely if the Approximate MQ problem with \(\sigma \) targets and rank r is hard for the map \(\mathcal {P}\).

\(\varvec{(\sigma ,r)}\)-Hash-and-Sign Security. We also define a \((\sigma ,r)\)-variant of the HSS property. The security game behind this property is similar to the EUF-CMA game of the hash-and-sign signature scheme induced by f. To break this property, an adversary has to come up with a message m and \(\sigma \) ‘signatures’ \(\mathbf {s}_1 , \cdots , \mathbf {s}_\sigma \) such that the errors \(f(\mathbf {s}_i) - \mathsf {H}(m||i)\) are contained in a subspace of dimension r. The adversary can query a signing oracle \(\mathsf {S}\) any (polynomially bounded) number of times. When given a message \(m'\), this signing oracle uses the trapdoor to produce preimages for \(\mathsf {H}(m'||1),\cdots ,\mathsf {H}(m'||\sigma )\) and returns these \(\sigma \) preimages. The adversary loses the game if it returns a message m for which it has queried the signing oracle, as is the case for the familiar EUF-CMA game.

We define the insecurity function \(\mathsf {InSec}_f^{(\sigma ,r)-\mathsf {HSS}}(Q_\mathsf {S},Q_\mathsf {H};t)\) as the maximal winning probability of an adversary that plays the \((\sigma ,r)\)-HSS game of f, that makes \(Q_\mathsf {S}\) queries to the signing oracle, \(Q_\mathsf {H}\) queries to the random oracle and that runs in time t. The (1, 0)-HSS property is equivalent to the HSS property.
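The winning condition shared by the \((\sigma ,r)\)-OW and \((\sigma ,r)\)-HSS games, namely that the \(\sigma \) error vectors lie in a subspace of dimension r, amounts to a rank check on the matrix whose rows are the errors. A minimal sketch, assuming a prime field \(\mathbb {F}_p\) and hypothetical function names:

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p (p prime), by Gaussian elimination."""
    rows = [[x % p for x in row] for row in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        # Clear the column in every other row.
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def errors_within_rank(errors, r, p):
    """True if the error vectors f(x_i) - y_i span a space of dimension at most r."""
    return rank_mod_p(errors, p) <= r
```

With \(r = 0\) the check forces every error to vanish, which recovers the ordinary condition that all preimages are exact.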

Remark 1

If f is a collision-resistant preimage-sampleable trapdoor function (as is the case for some lattice-based trapdoor functions), the one-wayness of f can be reduced tightly to its hash-and-sign security and so OW and HSS are equivalent [15, Proposition 6.1]. Under the same assumption on f, the security proof of [15] can be modified to prove that \((\sigma ,r)\)-OW and \((\sigma ,r)\)-HSS are equivalent for all \(\sigma >r \ge 0\) (Fig. 2).

4 Construction

4.1 Description

This section describes the transform of Szepieniec et al. but adapted to apply generically to constrained linear signature schemes. The parameters for the transformation are:

  • \((\mathsf {KeyGen}, \mathsf {Sign}, \mathsf {Verify})\), the constrained linear signature scheme to start from. We denote the hash function used in the verification algorithm by \(\mathsf {H}_1\) and the nonlinear constraint by \(\mathsf {nc}\).

  • \(\tau \), the number of leaves in the Merkle tree.

  • e, the extension degree of \(\mathbb {F}_{q^e}\), which is the field over which the error-correcting code is defined. This value is constrained by \(q^e \ge \tau \).

  • \(\vartheta \), the number of Merkle paths that are opened with each new signature.

  • \(\sigma \), the number of signatures of the original signature scheme that is included in each signature of the new scheme.

  • \(\mathsf {H}_2\), a hash function that outputs an \(\alpha \)-by-k matrix over \(\mathbb {F}_q\).

  • \(\mathsf {H}_3\), a hash function that outputs a set of \(\vartheta \) numbers between 1 and \(\tau \).

  • \(\mathsf {H}_4\), a hash function used for building a Merkle tree.

The transformation outputs a new signature scheme (\(\mathsf {NEW.KeyGen}\), \(\mathsf {NEW.Sign}\), \(\mathsf {NEW.Verify}\)) with a smaller public key but larger signatures.

Fig. 1. The security game of the \((\sigma ,r)\)-\(\mathsf {OW}\) property (left) and of the \((\sigma ,r)\)-\(\mathsf {HSS}\) property (right).

Fig. 2. Security properties of constrained linear trapdoor functions, and implications between them.

Random Linear Combinations. A signature of the new signature scheme consists of \(\sigma \) signatures of the original signature scheme, along with some information to verify them. The ith signature is obtained by using the signature generation algorithm of the original constrained linear signature scheme to sign \(d \Vert i\). It is not necessary to communicate the entire public key \(M \in \mathbb {F}_q^{k \times \ell }\). Rather, it suffices to transmit a few random linear combinations of its rows. Therefore, part of the new signature consists of a matrix T that is equal to RM, where R is drawn uniformly at random from the space of \(\alpha \times k\) matrices. Instead of checking whether \(M\mathbf {s}_i = \mathsf {H}_1(d \Vert i)\), the verifier can now check whether \(T \mathbf {s}_i = R \mathsf {H}_1(d \Vert i)\). Obviously, if all signatures are valid, then the latter equations will also be satisfied for any matrix R. Conversely, if at least one signature is invalid, i.e., \(M \mathbf {s}_i \not = \mathsf {H}_1(d \Vert i)\) for some i, then the probability that \(RM \mathbf {s}_i = R\mathsf {H}_1(d \Vert i)\) is at most \(q^{-\alpha }\). By choosing \(\alpha \) large enough, the probability of accepting an invalid signature can be made arbitrarily small.
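The \(q^{-\alpha }\) bound can be confirmed exhaustively for very small parameters. The sketch below fixes a nonzero error vector \(\mathbf {e} = M\mathbf {s}_i - \mathsf {H}_1(d \Vert i)\), enumerates every \(\alpha \times k\) matrix R over a prime field, and counts how many satisfy \(R\mathbf {e} = 0\). The prime q and the function name are assumptions for illustration, and the enumeration is only feasible for toy sizes.

```python
from fractions import Fraction
from itertools import product

def false_accept_rate(error, q, alpha):
    """Fraction of alpha-by-k matrices R over F_q (q prime) with R * error = 0."""
    k = len(error)
    accepted = total = 0
    for entries in product(range(q), repeat=alpha * k):
        R = [entries[i * k:(i + 1) * k] for i in range(alpha)]
        if all(sum(r * e for r, e in zip(row, error)) % q == 0 for row in R):
            accepted += 1
        total += 1
    return Fraction(accepted, total)

print(false_accept_rate([1, 2], q=3, alpha=2))   # 1/9
print(false_accept_rate([0, 0], q=3, alpha=2))   # 1
```

Each row of R independently annihilates a fixed nonzero vector with probability \(1/q\), which is where the exponent \(\alpha \) comes from. A zero error, that is a valid signature, is accepted by every R.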

Determining R. In order for the above argument to work, R must be chosen independently from \(\mathbf {s} = \mathbf {s}_1 \Vert \cdots \Vert \mathbf {s}_\sigma \). Therefore, we determine R with a hash function as \(R = \mathsf {H}_2(d \Vert \mathbf {s}_1 \Vert \cdots \Vert \mathbf {s}_\sigma )\) to ensure that a forger cannot use knowledge about R in his choice of the \(\mathbf {s}_i\).

Verifying T. An attacker can present the verifier with a signature containing a matrix T which is totally unrelated to the matrix M. How can the verifier be sure that the matrix T that is included in the signature is really equal to RM with \(R = \mathsf {H}_2(d \Vert \mathbf {s}_1 \Vert \cdots \Vert \mathbf {s}_\sigma )\)? We solve this problem with a probabilistic test based on an \(\mathbb {F}_{q}\)-linear error correcting code. This is a code whose alphabet consists of the elements of a finite field \(\mathbb {F}_{q}\), with the property that any \(\mathbb {F}_q\)-linear combination of codewords is again a codeword. We work with Reed-Solomon codes (Footnote 2) over \(\mathbb {F}_{q^e}\) with message length \(L = \lceil \ell /e \rceil \) (we pack e elements of \(\mathbb {F}_q\) into each symbol), codeword length \(\tau \) and minimal codeword distance \(D = \tau -L+1\). We use \(\mathsf {Enc} : \mathbb {F}_{q^e}^{a \times L} \rightarrow \mathbb {F}_{q^e}^{a \times \tau }\) to denote the operation of encoding the rows of a matrix.

In the key generation phase, we compute \(E = \mathsf {Enc}(M)\). Then we commit to this matrix E by building a Merkle tree whose leaves contain the columns of E, which are denoted by \(e_i\) for \(i \in \{1, \ldots , \tau \}\). The new public key is the root of this tree. If \(T = RM\), then by \(\mathbb {F}_q\)-linearity of the error correcting code, we have that \(\mathsf {Enc}(T)\) is equal to \(R\mathsf {Enc}(M) = RE\). Conversely, if \(T \not = RM\), then \(\mathsf {Enc}(T)\) and RE differ in at least one row. These rows are different codewords, so they differ in at least D of the \(\tau \) symbols. To verify that \(T = RM\), we now select \(\vartheta \) columns \(e_{b_1},\cdots ,e_{b_\vartheta }\) of E with the hash function \(\mathsf {H}_3\) and we check whether the \(b_i\)-th column of \(\mathsf {Enc}(T)\) agrees with \(R e_{b_i}\) for all i in \(\{1,\cdots ,\vartheta \}\). If T is not equal to RM, this will go undetected with a probability of at most \((\frac{L}{\tau })^\vartheta \).
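The linearity argument and the spot check can be sketched with a toy Reed-Solomon code. The sketch makes simplifying assumptions that differ from the construction above: it works over a small prime field with \(e = 1\), encodes a row by evaluating the polynomial whose coefficients are the row entries at the points \(0, \ldots , \tau -1\), and numbers columns from 0. The field size, matrices and function names are hypothetical.

```python
from fractions import Fraction

P = 97  # hypothetical prime field size; the code length tau must not exceed P

def rs_encode(row, tau):
    """Evaluate the polynomial with coefficient list `row` at the points 0..tau-1."""
    return [sum(c * pow(x, j, P) for j, c in enumerate(row)) % P for x in range(tau)]

def enc(matrix, tau):
    """Encode every row of a matrix."""
    return [rs_encode(row, tau) for row in matrix]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]

def spot_check(T, R, opened, tau):
    """`opened` maps an index b to the column e_b of Enc(M); compare with Enc(T)."""
    enc_T = enc(T, tau)
    for b, e_b in opened.items():
        expected = [sum(r * x for r, x in zip(row, e_b)) % P for row in R]
        if [row[b] for row in enc_T] != expected:
            return False
    return True

def undetected_bound(L, tau, theta):
    """The bound (L / tau)^theta on the probability of missing a wrong T."""
    return Fraction(L, tau) ** theta

tau = 8
M = [[1, 2, 3], [4, 5, 6]]
R = [[7, 8], [9, 10]]
E_matrix = enc(M, tau)
T = mat_mul(R, M)
T_bad = [row[:] for row in T]
T_bad[0][2] = (T_bad[0][2] + 1) % P   # corrupt one entry of T
```

In this toy instance the corrupted row differs from the honest one by the polynomial \(x^2\), which vanishes only at the point 0. Opening column 0 alone therefore misses the corruption, while opening any other column exposes it, which is why several columns are opened.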

Pseudocode. Algorithms 1, 2 and 3 present pseudocode for the new signature scheme (\(\mathsf {NEW.KeyGen}\), \(\mathsf {NEW.Sign}\), \(\mathsf {NEW.Verify}\)) obtained from transforming the old constrained-linear signature scheme (\(\mathsf {KeyGen}\), \(\mathsf {Sign}\), \(\mathsf {Verify}\)).

figure b

Key and Signature Sizes. For a post-quantum security level of \(\kappa \) bits, the new public key is \(2\kappa \) bits in size, as it represents the Merkle root. The new signature consists of \(\sigma \) old signatures, \(\alpha \) linear combinations of the rows of M (each one of which consists of \(\ell \) field elements of size \(\lceil \mathsf {log}_2 \, q\rceil \) bits), \(\vartheta \) columns of \(\mathsf {Enc}(M)\) (each one of which consists of k field elements of \(e \times \lceil \mathsf {log}_2 \, q\rceil \) bits), and \(\vartheta \) Merkle paths, each consisting of \(\mathsf {log}_2 \, \tau \) hash images of \(2\kappa \) bits each. Put all together, we have

$$\begin{aligned} |\mathsf {NEW}.{ signature}| = \sigma |\mathsf {OLD}.{ signature}| + (\alpha \ell + \vartheta k e) \times \lceil \mathsf {log}_2 \, q \rceil + 2\vartheta \kappa \times \mathsf {log}_2 \, \tau . \end{aligned}$$
(4)

The old signatures can be represented as \(\ell \) field elements but in some cases a more concise encoding is possible. For instance, CFS signatures require only the positions of the 1-bits, and MQ signatures require only an assignment to the variables from which the vector of quadratic monomials can be derived.
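Equation (4) translates directly into a small calculator. The function names are hypothetical, and the values in the usage lines are arbitrary placeholders chosen to make the arithmetic easy to follow; they are not proposed parameters.

```python
def new_signature_bits(sigma, old_signature_bits, alpha, ell, theta, k, e, q, kappa, tau):
    """Size of a transformed signature in bits, following Eq. (4)."""
    field_bits = (q - 1).bit_length()     # ceil(log2 q)
    tree_depth = (tau - 1).bit_length()   # ceil(log2 tau)
    return (sigma * old_signature_bits
            + (alpha * ell + theta * k * e) * field_bits
            + 2 * theta * kappa * tree_depth)

def new_public_key_bits(kappa):
    """The new public key is a single Merkle root of 2 * kappa bits."""
    return 2 * kappa

# Placeholder values only: 100 + (2*10 + 3*4*2)*8 + 2*3*128*10
print(new_signature_bits(sigma=1, old_signature_bits=100, alpha=2, ell=10,
                         theta=3, k=4, e=2, q=256, kappa=128, tau=1024))   # 8132
print(new_public_key_bits(128))                                            # 256
```

The term \(\alpha \ell \) grows with the width of M, so the transformation pays off most when the original public key is large relative to \(\ell \).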

4.2 Security

Before we present the security statement and its proof, we need to introduce a pair of security games that will be important for our security analysis. In particular, we need hash functions that are one-way and second-preimage resistant, in both cases with respect to multiple targets. Both games are formalized with respect to a hash function \(\mathsf {H}\) that is randomly selected from a hash function family \(\mathcal {H}\). We follow the formalisms of Hülsing et al. [20].

figure c
  • In the single-function, multiple-target one-wayness (SM-OW) game, the adversary is given a list of target outputs and it wins if it can produce a single input that maps to any one of the outputs. We write \(\mathsf {InSec}_{\mathsf {H},P}^{\text {SM-OW}}(Q)\) to denote the maximum success probability across all adversaries that make at most Q queries and with respect to the hash function family \(\mathcal {H}\) and where P is the number of target outputs.

  • In the single-function, multiple-target second-preimage resistance (SM-SPR) game, the adversary is given a list of inputs and it wins if it can produce a second preimage that maps to the same output as any one of the input preimages. We write \(\mathsf {InSec}_{\mathsf {H},P}^{\text {SM-SPR}}(Q)\) to denote the maximum success probability across all adversaries that make at most Q queries and with respect to the hash function family \(\mathcal {H}\) and where P is the number of input preimages.

figure d

Hülsing et al. obtain values for these insecurity functions in the random oracle model, i.e. where \(\mathsf {H}\) is drawn uniformly at random from the set of all functions from the given input space to the given output space. In the classical random oracle model we have

$$\begin{aligned} \mathsf {InSec}_{\mathsf {H},P}^{ SM-OW }(Q) = \mathsf {InSec}_{\mathsf {H},P}^{ SM-SPR }(Q) = \frac{(Q+1)P}{|\mathsf {range}(\mathsf {H})|}. \end{aligned}$$
(5)

In the quantum random oracle model, where the adversary is allowed \(\hat{Q}\) quantum queries, we have

$$\begin{aligned} \mathsf {InSec}_{\mathsf {H},P}^{ SM-OW }(\hat{Q}) = \mathsf {InSec}_{\mathsf {H},P}^{ SM-SPR }(\hat{Q}) = \varTheta \left( \frac{(\hat{Q}+1)^2P}{|\mathsf {range}(\mathsf {H})|}\right) . \end{aligned}$$
(6)
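The difference between Eqs. (5) and (6) can be made concrete with a small calculation. The classical function evaluates Eq. (5) exactly, whereas the quantum function returns only the expression inside the \(\varTheta \), so it indicates the scaling and ignores the hidden constants. The function names and the 20-bit toy range are assumptions for illustration.

```python
from fractions import Fraction

def insec_classical(Q, P, range_bits):
    """Eq. (5): (Q + 1) * P / |range(H)| for a range of 2^range_bits elements."""
    return Fraction((Q + 1) * P, 2 ** range_bits)

def insec_quantum_scale(Q, P, range_bits):
    """The term inside the Theta of Eq. (6); constants are not modelled."""
    return Fraction((Q + 1) ** 2 * P, 2 ** range_bits)

# With a 20-bit range and a single target, about 2^10 queries leave a classical
# adversary at 1/1024, while the quantum scaling term has already reached 1.
print(insec_classical(2**10 - 1, 1, 20))       # 1/1024
print(insec_quantum_scale(2**10 - 1, 1, 20))   # 1
```

Read this way, a range of \(2^{2\kappa }\) elements keeps the quantum term small until roughly \(2^{\kappa }\) queries, which is consistent with using hash images of \(2\kappa \) bits for a post-quantum security level of \(\kappa \) bits.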
figure e

The SM-OW game does not quite capture one of the transitions in our security proof. The reason for this is that the adversary cannot be given a definite list of target output images because whether an output of the hash function is suitable for the adversary depends on the input of the hash function. We model this task by a new game, marked element search (MES), in which the adversary does not have a list of target outputs but a marking function \(\mathsf {mark} : \mathsf{domain}(\mathsf {H}) \times \mathsf{range}(\mathsf {H}) \rightarrow \{0,1\}\) that determines whether the pair \(({ input}, { output})\) is suitable. We write \(\mathsf {InSec}_{\mathsf {H},\mathsf {mark}}^{ MES }(Q)\) to denote the maximum success probability across all adversaries that make at most Q queries to the hash oracle in the MES game. In the quantum random oracle model this notion is reducible to SM-OW.

figure f

Proposition 1

(\(\mathsf {SM - OW} \le \mathsf {MES}\)). In the (quantum) random oracle model, we have that for any marking function \(\mathsf {mark}\) with \(P = \mathsf {max}_X \left| \{Y \, | \, \mathsf {mark}(X,Y) = 1\} \right| \),

$$\begin{aligned} \mathsf {InSec}_{\mathsf {H},\mathsf {mark}}^{ MES }(Q) \le \mathsf {InSec}_{\mathsf {H},P}^{ SM - OW }(Q). \end{aligned}$$
(7)

Proof

We show an algorithm \(\mathsf {B}_\mathsf {SM - OW}\) for the SM-OW game that simulates a given algorithm \(\mathsf {A}_{\mathsf {MES}}\) for the MES game with marking function \(\mathsf {mark}\), and wins with at least the same probability. The input of \(\mathsf {B}_\mathsf {SM - OW}\) is a list of P images \(\{Y_1, \ldots , Y_P\}\) and access to a random oracle \(\mathsf {H}\). The algorithm \(\mathsf {B}_\mathsf {SM - OW}\) programs a random oracle \(\mathsf {H}'\) that on input X returns \(\sigma _X^{-1}(\mathsf {H}(X))\), where \(\sigma _X\) is a permutation (chosen deterministically) with the property that the elements Y that satisfy \(\mathsf {mark}(X,Y)=1\) are mapped into the set \(\{Y_1, \ldots , Y_P\}\). By assumption, \(\left| \{Y \, | \, \mathsf {mark}(X,Y) = 1\} \right| \le P\), so such a permutation always exists. Note that \(\mathsf {B}_\mathsf {SM - OW}\) is bounded in the number of queries it can make to \(\mathsf {H}\), but not bounded in time or memory. Therefore it will be able to choose such a permutation \(\sigma _X\). Then, \(\mathsf {B}_\mathsf {SM - OW}\) invokes \(\mathsf {A}_\mathsf{MES}\) with the programmed random oracle \(\mathsf {H}'\). Since \(\mathsf {H}'\) only applies a permutation to the output of \(\mathsf {H}\), the outputs of \(\mathsf {H}'\) will be independent and uniformly distributed. Hence, \(\mathsf {H}'\) is itself a perfect random oracle. Pseudocode for \(\mathsf {B}_\mathsf {SM - OW}\) is given below.

figure g

Clearly, the number of queries that \(\mathsf {B}_\mathsf {SM - OW}\) makes to \(\mathsf {H}\) is identical to the number of queries made by the simulated algorithm \(\mathsf {A}_{\mathsf {MES}}\). Eventually, \(\mathsf {A}_{\mathsf {MES}}\) returns a preimage X. \(\mathsf {A}_{\mathsf {MES}}\) wins the MES game if \(\mathsf {mark}(X,\sigma _X^{-1}(\mathsf {H}(X))) = \mathsf {True}\). By our choice of \(\sigma _X\) this implies that \(\sigma _X(\sigma _X^{-1}(\mathsf {H}(X))) = \mathsf {H}(X) \in \{Y_1,\cdots ,Y_P\}\), which shows that \(\mathsf {B}_\mathsf {SM - OW}\) wins its SM-OW game in this case. So \( \mathsf {InSec}_{\mathsf {H},\mathsf {mark}}^\mathsf {MES}(Q) \le \mathsf {InSec}_{\mathsf {H},P}^\mathsf {SM - OW}(Q) .\)    \(\square \)
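The reduction in the proof of Proposition 1 can be played out on a toy instance. The Python sketch below is only an illustration under stated assumptions: the range is the set of integers 0..63, SHA-256 reduced to that range stands in for the random oracle \(\mathsf {H}\), the marking function and the brute-force MES adversary are hypothetical, and the target images are assumed to be distinct. All names are ours.

```python
import hashlib

N = 64  # toy range: outputs are the integers 0..N-1 (assumption of this sketch)

def H(x):
    """Stand-in for the random oracle H: SHA-256 reduced to the toy range."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % N

def mark(x, y):
    """Hypothetical marking function with exactly P = 3 marked outputs per input."""
    return (y - x) % N < 3

def sigma(x, targets):
    """Deterministic permutation of range(N) sending the marked outputs for x into targets."""
    marked = [y for y in range(N) if mark(x, y)]
    assert len(marked) <= len(targets)
    dest = list(targets[:len(marked)])
    perm = dict(zip(marked, dest))
    # Pair the remaining elements in increasing order so that perm is a bijection.
    rest_src = [y for y in range(N) if y not in marked]
    rest_dst = [y for y in range(N) if y not in dest]
    perm.update(zip(rest_src, rest_dst))
    return perm

def make_H_prime(targets):
    """Programmed oracle H'(x) = sigma_x^{-1}(H(x)); one call to H per call to H'."""
    def H_prime(x):
        inverse = {v: k for k, v in sigma(x, targets).items()}
        return inverse[H(x)]
    return H_prime

def mes_adversary(oracle, max_queries):
    """Toy MES adversary: query inputs 0, 1, 2, ... until a marked pair appears."""
    for x in range(max_queries):
        if mark(x, oracle(x)):
            return x
    return None

def B_sm_ow(targets, max_queries):
    """SM-OW adversary: run the MES adversary against H' and forward its answer."""
    return mes_adversary(make_H_prime(targets), max_queries)

targets = [5, 17, 42]  # target images Y_1, ..., Y_P, assumed distinct here
x = B_sm_ow(targets, 10_000)
print(x is not None and H(x) in targets)
```

Whenever the simulated adversary returns an input whose programmed output is marked, the real output \(\mathsf {H}(X)\) lands in the target list, which is exactly the winning condition of the SM-OW game.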

We are now in a position to state and prove our security claim.

Theorem 1

Let NEW be the signature scheme derived from applying the transformation to a constrained linear scheme OLD. The maximum winning probability across all time-t adversaries in the EUF-CMA game against NEW that make \(Q_s\) signature queries and \(Q_1, Q_2, Q_3, Q_4\) queries to the random oracles \(\mathsf {H}_1, \mathsf {H}_2, \mathsf {H}_3, \mathsf {H}_4\) respectively is bounded by

(8)

Proof

We show through a sequence of four games how an adversary for the EUF-CMA game against NEW can be transformed into an adversary for the \((\sigma ,r)\)-HSS property of the underlying constrained linear trapdoor function f that wins with the same probability conditional on each of the transitions being successful. By bounding the failure probability of each transition and summing the terms we obtain a bound on the winning probability of the adversary against NEW. The sequence of games is as follows:

  • The first game \(\mathsf {G}_1\) is the EUF-CMA game against NEW.

  • The second game \(\mathsf {G}_2\) drops the Merkle tree. Instead, the public key consists of all the \(\tau \) columns of E, and the verifier checks directly if the columns that are included in the signature are correct.

  • The third game \(\mathsf {G}_3\) drops the codeword identity testing. Instead, the public key is now the original public key (i.e., M), and the verifier tests directly whether the matrix T, which is included in the signature, is equal to RM.

  • The last game \(\mathsf {G}_4\) drops the random linear combinations for signature validity testing. Instead, \(\mathsf {G}_4\) is won if the errors \(f(\mathbf {s}_i)-\mathsf {H}_1(m||i)\) are contained in a subspace of dimension r. \(\mathsf {G}_4\) is thus the \((\sigma ,r)\)-HSS game for the constrained linear trapdoor function f.

In games \(\mathsf {G}_2\), \(\mathsf {G}_3\) and \(\mathsf {G}_4\), the adversary \(\mathsf {B}\) simulates the previous game’s adversary \(\mathsf {A}\) in order to win its own game. In particular, this means that \(\mathsf B\) must answer the signing queries that \(\mathsf A\) makes. This is not a problem, because in all cases \(\mathsf {B}\) can just forward the queries that \(\mathsf {A}\) makes to its own signing oracle, remove from the signature any information that is not required for the game that \(\mathsf {A}\) is playing, and pass the response back to \(\mathsf {A}\). In each case, we define the transition’s failure probability as the probability that \(\mathsf A\) wins but \(\mathsf B\) does not. In all cases the adversary \(\mathsf A\) has unrestricted access (perhaps even quantum access) to the hash functions \(\mathsf {H}_1\), \(\mathsf {H}_2\), \(\mathsf {H}_3\) and \(\mathsf {H}_4\).

The event that \(\mathsf {A}\) wins \(\mathsf {G}_1\) but \(\mathsf B\) does not win \(\mathsf {G}_2\) occurs only if the signature output by \(\mathsf {A}\) passes the Merkle root test, but the columns included in this signature do not agree with the columns in \(E = \mathsf {Enc}(M)\). This event requires finding a second preimage for one of the \(2\tau - 1\) nodes of the Merkle tree, so the failure probability is bounded by

$$ \mathsf {InSec}_{\mathsf {H}_4, 2\tau - 1}^{\text {SM-SPR}}\left( Q_4 \right) . $$
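The count \(2\tau - 1\) is the number of nodes of a full binary tree with \(\tau \) leaves. The following minimal Python sketch illustrates this, assuming SHA-256 as a stand-in for \(\mathsf {H}_4\), a power-of-two number of leaves, and a plain concatenate-and-hash rule for inner nodes. These choices and the function name are ours, and the tree layout used by the scheme may differ.

```python
import hashlib

def merkle_levels(leaves):
    """All levels of a binary Merkle tree, leaf hashes first and the root last."""
    n = len(leaves)
    assert n > 0 and (n & (n - 1)) == 0, "this sketch assumes a power-of-two leaf count"
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of the concatenation of its two children.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

tau = 8
columns = [bytes([i]) for i in range(tau)]  # placeholder column encodings
levels = merkle_levels(columns)
print(sum(len(level) for level in levels))  # 15, i.e. 2 * tau - 1
```

Each of these nodes is a hash output for which a colliding input would let a forged column pass the root test, which is why all \(2\tau - 1\) of them count as targets in the SM-SPR bound.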

Likewise, the event that \(\mathsf {A}\) wins the \(\mathsf {G}_2\) game but \(\mathsf B\) does not win the \(\mathsf {G}_3\) game occurs only if the columns \(e_{b_1},\cdots ,e_{b_\vartheta }\) of E in the signature output by \(\mathsf {A}\) are correct, but T is still not equal to RM. This implies that \(\mathsf {Enc}(T)\) differs from RE in at least \(\tau -L\) columns (since the rows are codewords from a code with minimal distance \(\tau -L\)), but that none of these columns were chosen by the random oracle \(\mathsf {H}_3\). Finding \(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma ||T\) such that this happens is a marked element search with marking function

$$ \mathsf {mark}_1(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma ||T,b_1||\cdots ||b_\vartheta ) = {\left\{ \begin{array}{ll} \mathsf {False} \quad \,\text {if }T=RM \\ \mathsf {False} \quad \,\text {if } R e_{b_i} \not = \mathsf {Enc}(T)_{\star ,b_i} \text { for some } i \\ \mathsf {True} \quad \text {otherwise} \\ \end{array}\right. }. $$

Since there are at most L indices for which the columns of \(\mathsf {Enc}(T)\) and \(RE\) are identical, there are at most \(\left( {\begin{array}{c}L\\ \vartheta \end{array}}\right) \le L^\vartheta \) marked elements for a given input. The failure probability is therefore bounded by

$$ \mathsf{InSec}_{\mathsf {H}_3,\mathsf {mark}_1}^\mathrm{MES}\left( Q_3 \right) \le \mathsf{InSec}_{\mathsf {H}_3,L^\vartheta }^\mathrm{SM-OW} \left( Q_3 \right) . $$
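To see what this bound looks like numerically in the classical random oracle model, the sketch below plugs \(P = L^\vartheta \) into Eq. (5). It assumes that \(\mathsf {H}_3\) outputs \(\vartheta \) indices in \(\{1,\ldots ,\tau \}\) with repetition allowed, so that its range has size \(\tau ^\vartheta \). That range size, the function name and the toy parameters are assumptions of this illustration, not values taken from the scheme.

```python
from fractions import Fraction
from math import comb

def column_check_failure_bound(tau, L, theta, Q3):
    """Eq. (5) with P = L^theta and an assumed range size of tau^theta."""
    marked = L ** theta        # upper bound on the marked outputs per input
    range_size = tau ** theta  # assumption: theta indices, each in {1, ..., tau}
    return Fraction((Q3 + 1) * marked, range_size)

# Toy parameters: the binomial count is indeed below the cruder bound L^theta.
print(comb(10, 3) <= 10 ** 3)                    # True
print(column_check_failure_bound(100, 10, 3, 0))  # 1/1000
```

Under this assumption the bound equals \((Q_3+1)(L/\tau )^\vartheta \), so it decreases exponentially in the number \(\vartheta \) of opened columns.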

Finally, the event that \(\mathsf {A}\) wins game \(\mathsf {G}_3\) but \(\mathsf B\) does not win \(\mathsf {G}_4\) happens when the errors span a vector space of dimension strictly larger than r (so \(\mathsf {B}\) does not win), while all these errors lie in the kernel of \(R = \mathsf {H}_2(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma )\) (otherwise \(\mathsf {A}\) does not win). Finding \(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma \) such that this happens is a marked element search for the marking function

$$ \mathsf {mark}_2(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma , R) = {\left\{ \begin{array}{ll} \mathsf {False} \quad \text {if } R(f(\mathbf {s}_i)-\mathsf {H}_1(m||i)) \not = 0 \text { for some } i \\ \mathsf {False} \quad \text {if } \dim (\langle f(\mathbf {s}_i)-\mathsf {H}_1(m||i) \rangle _{i=1,\cdots ,\sigma } ) \le r\\ \mathsf {True} \quad \text {otherwise} \end{array}\right. }. $$

For a choice of \(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma \), good matrices R exist only if the space spanned by the errors \(f(\mathbf {s}_i)-\mathsf {H}_1(m||i)\) has dimension at least \(r+1\). If this is the case then the good matrices R are precisely the \(\alpha \)-by-k matrices whose kernel contains the error space. Each of the \(\alpha \) rows of such a matrix is orthogonal to the error space and therefore lies in a subspace of dimension at most \(k-r-1\). Hence there are at most \(q^{\alpha (k-r-1)}\) good matrices for each choice of \(m||\mathbf {s}_1||\cdots ||\mathbf {s}_\sigma \), and the failure probability of the last step is bounded by

$$ \quad \quad \quad \quad \quad \quad \quad \mathsf{InSec}_{\mathsf {H}_2,\mathsf {mark}_2}^\mathrm{MES}(Q_2) \le \mathsf{InSec}_{\mathsf {H}_2,q^{\alpha \times (k-r-1)}}^\mathrm{SM-OW}\left( Q_2\right) . \quad \quad \quad \quad \quad \quad \quad \square $$
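The counting step for the matrices R can be checked by brute force on a toy instance. The Python sketch below assumes a prime q, so that integers modulo q form the field, and takes the error space as a list of linearly independent basis vectors. The function name and the toy parameters are ours. For an error space of dimension d the count should be \(q^{\alpha (k-d)}\), which for \(d \ge r+1\) is at most a \(q^{-\alpha (r+1)}\) fraction of all \(q^{\alpha k}\) matrices.

```python
from itertools import product

def count_annihilating_matrices(q, alpha, k, error_basis):
    """Count alpha-by-k matrices R over F_q (q prime) with R e = 0 for every basis vector e."""
    # A row works iff it is orthogonal to every basis vector of the error space.
    good_rows = sum(
        1
        for row in product(range(q), repeat=k)
        if all(sum(a * b for a, b in zip(row, e)) % q == 0 for e in error_basis)
    )
    # The alpha rows can be chosen independently of each other.
    return good_rows ** alpha

q, alpha, k, r = 3, 2, 4, 1
basis = [(1, 0, 0, 0), (0, 1, 0, 0)]  # error space of dimension r + 1 = 2
count = count_annihilating_matrices(q, alpha, k, basis)
print(count == q ** (alpha * (k - r - 1)))  # True: 81 of the 6561 matrices
```

Dividing the count by the total number of matrices gives \(3^{-4} = q^{-\alpha (r+1)}\) in this instance, in line with the security level \(q^{\alpha \cdot (r+1)/2}\) that appears in the quantum setting.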

Joining Theorem 1 with Eqs. (5) and (6) gives the following corollaries.

Corollary 1

In the classical random oracle model,

Corollary 2

In the quantum random oracle model,

There are two ways to use the transformation. One can choose \(\sigma = 1\) and \(\alpha \) large enough such that \(q^{\alpha /2}\) reaches the required post-quantum security level, i.e., \(q^{\alpha / 2} > 2^\kappa \). Corollary 2 with \(r=0\) then guarantees that the resulting signature scheme is EUF-CMA secure, provided that the constrained linear trapdoor function f that we started from is (1, 0)-HSS. This assumption is equivalent to the EUF-CMA security of the original signature scheme \(\mathsf {OLD}\). We also note that in this case the security proof is tight, meaning that no security is lost (in the QROM) by applying the transformation in this way.

One can also use the transformation with \(\sigma > r\), and a lower value of \(\alpha \) such that \(q^{\alpha \cdot (r+1) /2}\) reaches the required security level. This reduces the size of the public keys even further, but this comes at the cost of a stronger security assumption on the constrained linear trapdoor function f. In this case Corollary 2 says that the resulting signature scheme is EUF-CMA secure, if the underlying constrained linear trapdoor function is \((\sigma ,r)\)-HSS.
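Both parametrizations boil down to choosing the smallest \(\alpha \) for which \(q^{\alpha \cdot (r+1)/2}\) exceeds \(2^\kappa \), with \(r = 0\) in the provably secure case. The helper below is a small sketch of that search using exact integer arithmetic. Its name is ours, and it uses the strict inequality as stated for the case \(\sigma = 1\). The example values of q are illustrative and not taken from Table 1.

```python
def min_alpha(q, kappa, r=0):
    """Smallest alpha with q^(alpha * (r + 1) / 2) > 2^kappa."""
    alpha = 1
    # Squaring both sides avoids fractional exponents:
    # q^(alpha * (r + 1)) > 2^(2 * kappa).
    while q ** (alpha * (r + 1)) <= 2 ** (2 * kappa):
        alpha += 1
    return alpha

print(min_alpha(256, 128))       # 33
print(min_alpha(256, 128, r=3))  # 9
```

Allowing a larger r lowers the required \(\alpha \) roughly by a factor \(r+1\), which is the source of the additional compression in the second parametrization.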

4.3 Applying the Transformation

Table 1 presents a comparison of the transformation applied to the three constrained linear trapdoor signature schemes treated in Sect. 3. For the Rainbow and Micciancio-Peikert schemes part of the public key can be generated with a PRNG to reduce the size of the public key. This trick is compatible with our construction, so we have taken this into account. In all cases, 128 bits of security against quantum computers was targeted for an apples-to-apples comparison.

Table 1. Comparison of constrained linear signature schemes before and after public key compression. Legend: NC = no compression; PS = our provably secure technique based on the assumption that the original hash-and-sign signature scheme is secure; SA = the approach relying on stronger assumptions.

The shrinkage is the most striking when \(k \gg \alpha \cdot \sigma \), because this is when the largest part of the matrix M is omitted. The mediocre shrinkage of \(|{ pk}| + |{ sig}|\) for the provably secure case \((\sigma = 1)\) suggests that for the trapdoors considered, k is already quite close to the lower bound \(k \ge \kappa / \mathsf {log}_2 \, q\) needed for \(\kappa \) bits of security. The greater compression factor attained when \(\sigma >1\) is due mostly to the representation of the old signatures in far less than \(\ell \cdot \mathsf {log}_2 \, q\) bits.

5 Conclusion

This paper generalizes the construction of Szepieniec et al. [39] to a wide class of signature schemes called constrained linear signature schemes. This construction transforms a constrained linear signature scheme into a new signature scheme with tiny public keys, at the cost of larger signatures and while reducing their combined size. We prove the EUF-CMA security of the resulting signature scheme in the quantum random oracle model, and for a more aggressive parametrization we identify the \((\sigma ,r)\)-hash-and-sign security notion as a sufficient property for security. This improves the understanding of the security of instantiations of this construction, which includes the DualModeMS submission to the NIST PQC standardization project [12, 29]. Finally, to showcase the generality and facilitate comparison, the construction is applied to an \(\mathcal {MQ}\)-based, a code-based and a lattice-based signature scheme, all targeting the same security level. In some cases the combined size of a signature and a public key can be reduced by more than a factor 300.

We close with some notes on the practicality of the transformation. From Table 1 we see that our transformation improves the practicality of state-of-the-art multivariate and code-based signature schemes for applications such as public key infrastructure (PKI), where the metric \(|\mathsf {sig}| + |\mathsf {pk}|\) is important and the performance of signing a message is less critical (most signatures in a PKI chain are long-lived and need not be created often). Code-based signature schemes remain not very practical, despite the improvements our construction makes. For example, applying the construction to the CFS scheme results in signatures of 8.15 MB. Still, if better code-based signature schemes are developed, the construction will likely be able to improve the quantity \(|\mathsf {sig}| + |\mathsf {pk}|\). For example, even though the pqsigRM [22] proposal to the NIST PQC project does not have a completely unstructured matrix as public key, our construction can still reduce \(|\mathsf {sig}| + |\mathsf {pk}|\) by a factor of 6, from 329 kB to 60 kB, in this case (with \(\alpha = 4, \sigma = 64\)). Unfortunately, comments on the NIST forum indicate that the pqsigRM proposal might not be secure [2].

State of the art hash-and-sign lattice-based signature schemes are built on structured lattices to achieve smaller public keys (e.g. Falcon relies on NTRU lattices [14]). Therefore, our construction does not improve on state of the art lattice-based schemes. Rather, our construction can be seen as an alternative to using structured lattices that provably does not deteriorate the security of the original schemes. In contrast, it is possible that switching to structured lattices has a negative impact on security.