1 Introduction

Quantum key distribution (QKD) protocols allow for secure transmission of information between two entities, Alice and Bob, by distributing a symmetric secret key via a quantum channel [1, 2]. The process involves a quantum stage, in which quantum information is distributed and measured, followed by classical post-processing. In this purely classical stage, the measurement results undergo a reconciliation process to rectify any discrepancies before a secret key is extracted during the privacy amplification phase. This work focuses on the information reconciliation phase, which impacts the range and throughput of any QKD system.

Despite the considerable development of QKD technology using binary signal formats, its high-dimensional counterpart (HD-QKD) [3] has seen significantly less research effort so far. However, HD-QKD offers several benefits, including higher information efficiency and increased noise resilience [4,5,6,7]. Although the reconciliation phase for binary-based QKD has been extensively researched, little work has been done to analyze and optimize this stage for HD-QKD, apart from the introduction of the layered scheme in 2013 [8]. This study addresses this research gap by introducing two novel methods for information reconciliation for high-dimensional QKD and analyzing their performance.

Unlike the majority of channel coding applications, the (HD-)QKD scenario places lesser demands on latency and throughput while placing significant emphasis on minimizing information leakage. Spurred by this unique setting, the strong decoding performance of nonbinary LDPC codes [9], and their inherent compatibility with high dimensions, we investigate the construction and utilization of nonbinary LDPC codes for post-processing in HD-QKD protocols as the first method.

The second method we investigate is the Cascade protocol [10], one of the earliest proposed methods for reconciling keys. While the many rounds of communication required by Cascade and concerns about the resulting limitations on throughput have led to a focus on syndrome-based methods [11,12,13] in the past decade, recent research has shown that sophisticated software implementations can enable Cascade to achieve high throughput even with realistic latency on the classical channel [14, 15]. Motivated by these findings, we explore the usage of Cascade in the reconciliation stage of HD-QKD and propose a modification that enables high reconciliation efficiency for the respective quantum channel.

To the best of our knowledge, the only prior work investigating information reconciliation for high-dimensional QKD is the aforementioned layered scheme, introduced in 2013. The layered scheme is based on decoding bit layers separately using \(\lceil \log _2(q)\rceil \) binary LDPC codes. It is similar in concept to the multilevel coding and multistage decoding methods used in slice reconciliation for continuous-variable (CV) QKD [16].

LDPC codes have been widely studied and optimized for use in binary QKD systems [17,18,19,20] and can reach good efficiency with high throughput. The use of nonbinary LDPC codes has also been investigated for binary QKD [21]. Modifications of Cascade have been shown to achieve efficiency close to the theoretical limit [22] on binary QKD systems while simultaneously reaching high throughput [14]. Except for the layered scheme, neither LDPC codes, Cascade, nor any other error correction method has yet been optimized, modified, or analyzed for use in high-dimensional QKD.

This work has the following outline: In Sect. 2.1, the general scenario of information reconciliation is introduced. This is followed by an introduction of nonbinary LDPC codes and density evolution in Sect. 2.2. In Sect. 2.3, the usage of Cascade is reviewed and a novel algorithm for high-dimensional information reconciliation, high-dimensional Cascade, is introduced. The results, i.e., the performance of both nonbinary LDPC codes and high-dimensional Cascade, are shown in Sect. 3, followed by a discussion and comparison in Sect. 4. The work concludes with a short review of the achieved results in Sect. 5.

2 Background

In this section, we describe the general setting and channel model, and introduce relevant figures of merit. We then continue to describe the two proposed methods, nonbinary LDPC codes and high-dimensional Cascade, in more detail.

2.1 Information reconciliation

The goal of the information reconciliation stage in QKD is to correct any discrepancies between the keys of the two parties while minimizing the information leaked to potential eavesdroppers. Generally, Alice sends a random string \(\textbf{x}=(x_0,...,x_{n-1})\), \(x_i \in \{0,...,q-1\}\), of n qudits of dimension q to Bob, who measures them and obtains his version of the string \(\textbf{y}=(y_0,...,y_{n-1})\), \(y_i \in \{0,...,q-1\}\). A practical example of a qudit can be seen in Fig. 1. We assume that the quantum channel can be accurately represented by a substitute channel where \(\textbf{x}\) and \(\textbf{y}\) are correlated as a q-ary symmetric channel, since errors are typically uncorrelated and symmetric. The transition probabilities of such a channel are as follows:

$$\begin{aligned} \text {P}(y_i|x_i) = {\left\{ \begin{array}{ll} 1-p &{} y_i=x_i,\\ \frac{p}{q-1} &{} \text {else}. \end{array}\right. } \end{aligned}$$
(1)
Fig. 1

Example of a qudit using a time-bin implementation [31]. The dimension of the qudit is set by the number of bins grouped together, while the value is determined by the measured arrival time

Here, the parameter p represents the channel transition probability. We refer to the symbol error rate between \(\textbf{x}\) and \(\textbf{y}\) as the quantum bit error rate (QBER), in a slight abuse of notation but consistent with experimental works on HD-QKD. In our simulations, we assume the QBER to be an inherent channel property, making it equivalent to the channel parameter p. In addition to the qudits, Alice also sends classical messages, e.g., syndromes or parity bits, which are assumed to be error-free. From a coding perspective, this is equivalent to asymmetric Slepian–Wolf coding with side information at the receiver, where the syndrome \(\textbf{s}\) represents the compressed version of \(\textbf{x}\), and \(\textbf{y}\) is the side information. A more detailed explanation of this equivalence can be found in [23]. For an interpretation of Cascade in the context of linear block codes, see [22]. Any information leaked to a potential eavesdropper at any point during the quantum key distribution must be subtracted from the final secret key during privacy amplification [24]. The information leaked during the information reconciliation stage will be denoted by leak\({}_{\text {IR}}\). In the case of LDPC codes, assuming no rate adaptation, it can be upper-bounded by the syndrome length in bits, \(\text {leak}_{\text {IR}} \le m\), with m being the length of a binary representation of the syndrome string. In the case of Cascade, it can be upper-bounded by the number of parity bits sent from Alice to Bob [25], although attention has to be paid to special cases in relation to the parameter estimation phase of QKD post-processing [26]. Using the Slepian–Wolf bound [27], the minimum amount of leaked information required to successfully reconcile with an arbitrarily low failure probability in the asymptotic limit of infinite length is given by the conditional entropy:

$$\begin{aligned} \text {leak}_{\text {IR}} \ge n\text {H}(X|Y). \end{aligned}$$
(2)

The conditional entropy (base q) of the q-ary symmetric channel, assuming independent and identically distributed input X, can be expressed as

$$\begin{aligned} \text {H}(X|Y) = -((1-p)\text {log}_q(1-p) + p\cdot \text {log}_q(\frac{p}{q-1})). \end{aligned}$$
(3)

A code’s performance in terms of relative information leakage can be measured by its efficiency f, given by

$$\begin{aligned} f = \frac{\text {leak}_{\text {IR}}}{n\text {H}(X|Y)}. \end{aligned}$$
(4)

It is important to note that an efficiency of \(f>1\) corresponds to leaking more bits than required by the theoretical minimum of \(f=1\), which represents the best possible performance according to the Slepian–Wolf bound. In practice, systems have \(f>1\) due to the difficulty of designing optimal codes, finite-size effects, and the inherent trade-off between efficiency and throughput. For more details on achievable information leakage, including with respect to finite-size effects, see, for example, [28]. In the following sections, we restrict ourselves to q being a power of 2. Both approaches can function without this restriction, but it allows for a more efficient implementation of the reconciliation and is commonly seen in physical implementations of the quantum stage due to symmetries.
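
For illustration, the following minimal sketch (not taken from any reference implementation) evaluates the conditional entropy of Eq. (3), the Slepian–Wolf bound of Eq. (2), and the efficiency of Eq. (4). The example values are purely illustrative, and the leakage must be expressed in the same units as the entropy.

    import math

    def h_cond(p, q):
        """H(X|Y) of the q-ary symmetric channel, Eq. (3), in base-q units."""
        if p == 0:
            return 0.0
        return -((1 - p) * math.log(1 - p, q) + p * math.log(p / (q - 1), q))

    def efficiency(leak_ir, n, p, q):
        """f = leak_IR / (n * H(X|Y)), Eq. (4); leak_IR in the same units as H."""
        return leak_ir / (n * h_cond(p, q))

    # Illustrative numbers: q = 8, QBER = 5%, frame of n = 30000 symbols
    print(30000 * h_cond(0.05, 8))           # Slepian-Wolf bound of Eq. (2)
    print(efficiency(5000, 30000, 0.05, 8))  # e.g., a 5000-symbol syndrome -> f > 1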

The information reconciliation phase is succeeded by an Error Verification stage wherein an estimate of the expected probability of correctness is obtained. Here, correctness refers to the agreement of Alice's and Bob's versions of the key after information reconciliation. In practical terms, this is frequently accomplished through the exchange and comparison of hashes [29, 30]. In case of a disagreement, the error correction can be repeated at the cost of doubling the leaked information. If this is not feasible, e.g., if the additional leakage prohibits any secret key extraction, the keys of this round are discarded.

2.2 Nonbinary LDPC codes

2.2.1 Codes & decoding

We provide here a short overview of nonbinary LDPC codes and their decoding, based on the concepts and formalism of binary LDPC codes. For a comprehensive review of those, we refer to [32].

Nonbinary LDPC codes can be described by their parity check matrix \(\textbf{H}\), with m rows and n columns, containing elements in a Galois Field (GF) of order q. To enhance clarity in this section, all variables representing a Galois field element will be marked with a hat, for instance, \(\hat{a}\). Moreover, let \(\oplus , \ominus , \otimes ,\) and \(\oslash \) denote the standard operations on Galois field elements: Addition, subtraction, multiplication, and division [33]. An LDPC code can be depicted as a bipartite graph, known as the Tanner graph. In this graph, the parity check equations form one side, called check nodes, while the codeword symbols represent the other side, known as variable nodes. The Tanner graph of a nonbinary LDPC code also has weighted edges between check and variable nodes, where each weight corresponds to the respective entry of \(\textbf{H}\). The syndrome \(\textbf{s}\) of the q-ary string \(\textbf{x}\) is computed as \(\textbf{s} = \textbf{H}\textbf{x}\).
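
As a small concrete example of the syndrome computation \(\textbf{s} = \textbf{H}\textbf{x}\), the following sketch implements GF(4) arithmetic with explicit tables; it is an illustration only, not the construction used in this work.

    # GF(4) multiplication table for elements labelled 0..3 (addition is XOR)
    GF4_MUL = [
        [0, 0, 0, 0],
        [0, 1, 2, 3],
        [0, 2, 3, 1],
        [0, 3, 1, 2],
    ]

    def gf4_syndrome(H, x):
        """Syndrome s = H x over GF(4); H is an m x n list of lists, x a length-n word."""
        s = []
        for row in H:
            acc = 0
            for h_ij, x_j in zip(row, x):
                acc ^= GF4_MUL[h_ij][x_j]   # multiply in GF(4), add via XOR
            s.append(acc)
        return s

    # Toy example: two checks on four symbols
    H = [[1, 2, 0, 3],
         [0, 1, 1, 2]]
    x = [2, 3, 1, 0]
    print(gf4_syndrome(H, x))   # -> [3, 2]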

For decoding, we employ a log-domain FFT-SPA [34, 35]. In-depth explanations of this algorithm can be found in [36, 37], but we provide a summary here for the sake of completeness. Let Z represent a random variable taking values in GF(q), such that \(\text {P}(Z_i = k)\) indicates the probability that qudit i has the value \(k=0,...,q-1\). The probability vector \(\textbf{p}=(p_0,...,p_{q-1})\), \(p_j = \text {P}(Z=j)\), can be converted into the log-domain using the generalized equivalent of the log-likelihood ratio (LLR) in the binary case, \(\textbf{m}=(m_0,...,m_{q-1})\), \(m_j = \text {log}\frac{\text {P}(Z=0)}{\text {P}(Z=j)} = \log (\frac{p_0}{p_j})\). Unless specified otherwise, the logarithm is taken to base e. Given the LLR representation, probabilities can be retrieved through \(p_j = \exp (-m_j)/\sum _{k=0}^{q-1} \exp (-m_k)\). We use \(p(\cdot )\) and \(m(\cdot )\) to denote these transforms. To further streamline notation, we define the multiplication and division of an element \(\hat{a}\) in GF(q) and an LLR message as a permutation of the indices of the vector:

$$\begin{aligned} \hat{a} \cdot \textbf{m}&:= (m_{\hat{0} \otimes \hat{a}},...,m_{\hat{q-1} \otimes \hat{a}}) \end{aligned}$$
(5)
$$\begin{aligned} \textbf{m} / \hat{a}&:= (m_{\hat{0}\oslash \hat{a}},...,m_{\hat{q-1} \oslash \hat{a}}), \end{aligned}$$
(6)

where the multiplication and division of the indices occur in the Galois Field. These permutations are necessary as we need to weigh messages according to their edge weight during decoding. We further define two transformations involved in the decoding,

$$\begin{aligned} \mathbf {\mathcal {F}}(\textbf{m}, \hat{H}_{ij})&= \mathcal {F}(p(\hat{H}_{ij}\cdot \textbf{m})) \end{aligned}$$
(7)
$$\begin{aligned} \mathbf {\mathcal {F}}^{-1}(\textbf{m}, \hat{H}_{ij})&= m(\mathcal {F}^{-1}(\textbf{m}))/\hat{H}_{ij}, \end{aligned}$$
(8)

where \(\mathcal {F}\) represents the discrete Fourier transform. Note that for q being a power of 2, the fast Walsh Hadamard transform can be utilized. The decoding process then consists of two iterative message-passing phases, from check nodes to variable nodes and vice versa. The message update rule at iteration l for the check node to variable node message corresponding to the parity check matrix entry at (ij) can be expressed as

$$\begin{aligned} \textbf{m}^{(l)}_{ij,\text {CV}} = \mathcal {A}(\hat{s}^{\prime }_i) \mathbf {\mathcal {F}}^{-1}\left( \underset{j^{\prime } \in \mathcal {M}(i)\setminus j}{\Pi } \mathbf {\mathcal {F}}(\textbf{m}^{(l-1)}_{ij^{\prime },\text {VC}}, \hat{H}_{ij^{\prime }}), \hat{H}_{ij}\right) , \end{aligned}$$
(9)

where \(\mathcal {M}(i)\) denotes the set of all variable nodes connected to check node i, i.e., the nonzero positions in row i of \(\textbf{H}\). The matrix \(\mathcal {A}\), defined as \(\mathcal {A}_{kj}(\hat{a}) = \delta ( \hat{a} \oplus k \ominus j) - \delta (\hat{a}\ominus j)\), accounts for the possible occurrence of nonzero syndromes [37]. The weighted syndrome value is calculated as \(\hat{s}^{\prime }_i = \hat{s}_i \oslash \hat{H}_{ij}\). The a posteriori message of column j can be written as

$$\begin{aligned} \tilde{\textbf{m}}^{(l)}_j = \textbf{m}^{(0)}(j) + \underset{i^{\prime } \in \mathcal {N}(j)}{\sum } \textbf{m}^{(l)}_{i^{\prime }j,\text {CV}}, \end{aligned}$$
(10)

where \(\mathcal {N}(j)\) is the set of all check nodes connected to variable node j, i.e., the nonzero positions in column j of \(\textbf{H}\). The best guess \(\tilde{\textbf{x}}\) at each iteration l can be calculated as the index of the minimum value of the a posteriori message, \(\tilde{x}_j^{(l)} = \text {argmin} (\tilde{\textbf{m}}^{(l)}_j)\). The second message-passing phase, from variable nodes to check nodes, is given by

$$\begin{aligned} \textbf{m}_{ij, \text {VC}}^{(l)} = \tilde{\textbf{m}}^{(l)}_j - \textbf{m}^{(l)}_{ij, \text {CV}}. \end{aligned}$$
(11)

The message passing continues until either \(\textbf{H}\tilde{\textbf{x}} = \textbf{s}\) or the maximum number of iterations is reached.
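
For q a power of 2, addition in GF(q) acts as a bitwise XOR on the symbol labels, so the convolution performed at the check nodes becomes a componentwise product under the (fast) Walsh–Hadamard transform. The following sketch illustrates this equivalence for \(q=4\); it is a toy demonstration of the transform step only, not the full decoder.

    import numpy as np

    def wht(v):
        """Unnormalized fast Walsh-Hadamard transform; length must be a power of 2."""
        v = np.array(v, dtype=float)
        h = 1
        while h < len(v):
            for i in range(0, len(v), 2 * h):
                a = v[i:i + h].copy()
                b = v[i + h:i + 2 * h].copy()
                v[i:i + h] = a + b
                v[i + h:i + 2 * h] = a - b
            h *= 2
        return v

    q = 4
    p1 = np.array([0.7, 0.1, 0.1, 0.1])   # P(Z1 = k)
    p2 = np.array([0.6, 0.2, 0.1, 0.1])   # P(Z2 = k)

    # Direct convolution under GF(4) addition (bitwise XOR of the labels)
    direct = np.zeros(q)
    for a in range(q):
        for b in range(q):
            direct[a ^ b] += p1[a] * p2[b]

    # Same result via the transform domain (inverse WHT = WHT / q)
    via_wht = wht(wht(p1) * wht(p2)) / q
    print(direct, via_wht)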

To allow for efficient reconciliation at different QBER values, a rate-adaptive scheme is required. We use the blind reconciliation protocol [11] as it has better efficiency than direct rate adaptation [38]. A fixed fraction \(\delta \) of symbols is chosen to be punctured or shortened. Puncturing refers to replacing a key bit with a random bit that is unknown to Bob. For shortening, the value of the bit is additionally sent to Bob over the public channel. Puncturing, therefore, increases the code rate, while shortening lowers it. The rate of a code with p punctured and s shortened bits is then given by

$$\begin{aligned} R = \frac{n-m-s}{n-p-s}. \end{aligned}$$
(12)

To see how rate adaptation influences the bounding of \(\text {leak}_{\text {IR}}\), see [39]. The blind scheme introduces interactivity into the LDPC reconciliation. Given a specific code, we start out with all of the reserved symbols being punctured and send the respective syndrome to Bob. Bob attempts to decode using the syndrome. If decoding fails, Alice transforms \(\lceil n(0.028 - 0.02R)\rceil \) [40] punctured bits into shortened bits, and resends the syndrome. This value is a heuristic expression and presents a trade-off between the number of communication rounds and the efficiency. Bob tries to decode again and requests more bits to be shortened in case of failure. If there are no punctured bits left to be turned into shortened bits, Alice reveals plain key bits instead. This continues until either decoding succeeds or the whole key is revealed by Alice successively sending all key bits through the public channel. A pseudocode-style sketch of this loop is given below.
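
The sketch takes Bob's perspective; the decoder and the disclosure of values are represented by hypothetical callbacks, as the actual routines depend on the implementation.

    import math

    def blind_reconcile(n, R, n_punctured, try_decode, reveal_values, max_rounds=20):
        """Blind rate adaptation from Bob's view. try_decode(punctured, shortened)
        and reveal_values(count) are hypothetical callbacks standing in for the
        actual decoder and the classical channel. max_rounds caps the loop for
        brevity; the protocol itself continues until the whole key is revealed."""
        punctured, shortened = n_punctured, 0
        for _ in range(max_rounds):
            if try_decode(punctured, shortened):
                return True
            step = math.ceil(n * (0.028 - 0.02 * R))     # heuristic from [40]
            if punctured > 0:
                reveal = min(step, punctured)            # punctured -> shortened
                reveal_values(reveal)
                punctured -= reveal
                shortened += reveal
            else:
                reveal_values(step)                      # reveal plain key bits
        return False

    # Toy usage: a decoder stub that succeeds once enough symbols are shortened
    ok = blind_reconcile(n=30000, R=0.8, n_punctured=1500,
                         try_decode=lambda p, s: s >= 720,
                         reveal_values=lambda k: None)
    print(ok)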

2.2.2 Density evolution

In the case of a uniform edge weight distribution, the asymptotic decoding performance of LDPC codes for infinite code length is entirely determined by two polynomials [41, 42]:

$$\begin{aligned} \lambda (x) = \sum _{i=2}^{d_{\text {v, max}}} \lambda _i x^{i-1} \quad \rho (x) = \sum _{i=2}^{d_{\text {c, max}}} \rho _i x^{i-1}. \end{aligned}$$
(13)

In these expressions, \(\lambda _i\) (\(\rho _i\)) represents the proportion of edges connected to variable (check) nodes with degree i, while \(d_{\text {v,max}}\) (\(d_{\text {c,max}}\)) indicates the highest degree of the variable (check) nodes. Given these polynomials, we can then define the code ensemble \(\mathcal {E}(\lambda , \rho )\), which represents all codes of infinite length with degree distributions specified by \(\lambda \) and \(\rho \). The threshold \(p_t(\lambda , \rho )\) of the code ensemble \(\mathcal {E}(\lambda , \rho )\) is defined as the worst channel parameter (QBER) at which decoding remains possible with an arbitrarily small failure probability. This threshold can be estimated using Monte Carlo density evolution (MC-DE), which is thoroughly described in [43]. This technique repeatedly samples node degrees according to \(\lambda \) and \(\rho \), and draws random connections between nodes for each iteration. With a sufficiently large sample size, this simulates the performance of a cycle-free code. Note that MC-DE is particularly well suited for nonbinary LDPC codes, as the distinct edge weights aid in decorrelating messages [43]. During the simulation, we track the average entropy of all messages [43]. When it falls below a certain value, decoding is considered successful. If this does not occur after a maximum number of iterations, the evaluated channel parameter is above the threshold of \(\mathcal {E}(\lambda ,\rho )\). Utilizing a concentrated check node distribution (which is favorable according to [44]) and a fixed code rate, we can further simplify to \(\mathcal {E}(\lambda )\). The threshold can then be employed as an objective function to optimize the code design, which is commonly achieved using the differential evolution algorithm [45].
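
The threshold returned by MC-DE can serve directly as the objective of this optimization. The sketch below (with a dummy stand-in for the expensive MC-DE call and a hypothetical degree support) shows how SciPy's differential evolution could be wired up with the parameter values reported in Sect. 3.1; it assumes SciPy is available and is not the code used in this work.

    import numpy as np
    from scipy.optimize import differential_evolution   # assumes SciPy is available

    ALLOWED_DEGREES = [2, 3, 4, 8, 16, 40]   # sparse support for lambda(x), hypothetical

    def estimate_threshold(lam):
        """Placeholder for MC-DE: should return the largest QBER for which the
        ensemble E(lambda) still decodes. Replaced by a dummy smooth function here."""
        return -float(np.sum((lam - 0.2) ** 2))

    def normalize(raw):
        w = np.abs(raw) + 1e-12              # keep the coefficients a valid distribution
        return w / w.sum()

    def objective(raw):
        return -estimate_threshold(normalize(raw))   # DE minimizes, so negate

    result = differential_evolution(
        objective,
        bounds=[(0.0, 1.0)] * len(ALLOWED_DEGREES),
        popsize=15,          # population sizes of 15-50 were used in this work
        mutation=0.85,       # differential weight
        recombination=0.7,   # crossover probability
        maxiter=50,
        seed=1,
    )
    print(normalize(result.x))   # optimized lambda_i on the allowed degrees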

2.3 Cascade

2.3.1 Binary cascade

Cascade [10] is one of the earliest schemes proposed for information reconciliation and has seen widespread use due to its simplicity and high efficiency. Successive works on the original Cascade protocol have sought to improve its performance by either substituting the parity exchange with error correction methods [46], or by optimizing parameters like the top-level block sizes [47]. The binary Cascade protocol acting on a single frame can be summarized in the following steps:

  • Iteration 1:

    1. The binary frame is divided into non-overlapping blocks of size \(k_1\), where the value of \(k_1\) usually depends on the estimated QBER of the frame.

    2. Alice and Bob calculate the parity of each top-level block and share the parities over a classical channel.

    3. For those blocks where a mismatch between Alice's and Bob's parities is detected, a binary search is used to locate a single error. In general, whenever the parities of a block differ between Alice and Bob, the block contains an odd number of errors and a binary search can be performed on it (a minimal sketch of this search is given after this list). The binary search consists of three steps:

       (a) Split the respective block in half.

       (b) Calculate and exchange the parities of the two sub-blocks. Exactly one of the sub-blocks has a mismatching parity.

       (c) If the mismatching sub-block contains only 1 bit, the error is found. Otherwise, repeat Step (a) with the mismatching sub-block.

  • Iteration i:

    1. Apply a permutation to the frame and divide it into new top-level blocks of size \(k_i\). Repeat Steps 2 and 3 as described for the first iteration on the new blocks.

    2. Cascade step: Any erroneous bit detected in iteration i also takes part in blocks created in previous iterations. After correcting such a bit, the parities of those blocks mismatch again and allow for the detection of another error using binary search. This error again takes part in blocks of all other iterations and can be used to detect more errors, creating the "cascading" effect of the Cascade step.
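
As a concrete illustration of Step 3, the following minimal sketch simulates the interactive binary search on a single block. Alice's bits are held locally here for simplicity; in a real run only sub-block parities would cross the classical channel, and the helper names are ours, not from a reference implementation.

    def parity(bits):
        return sum(bits) % 2

    def binary_search_error(alice, bob, lo, hi):
        """Locate one error in bob[lo:hi], assuming the parities of the block
        differ (i.e., the block holds an odd number of errors)."""
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if parity(alice[lo:mid]) == parity(bob[lo:mid]):
                lo = mid                     # error is in the second sub-block
            else:
                hi = mid                     # error is in the first sub-block
        return lo

    # Toy usage: one flipped bit inside a block of 8
    alice = [0, 1, 1, 0, 1, 0, 0, 1]
    bob   = [0, 1, 1, 0, 0, 0, 0, 1]         # error at position 4
    print(binary_search_error(alice, bob, 0, len(bob)))   # -> 4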

The protocol stops after a fixed number of iterations. To the best of our knowledge, the state of the art in terms of efficiency, with values as low as \(f=1.025\), is reached by a modification [22] of the original Cascade. The additions of this modification consist mainly of separating bits into groups of similar confidence and choosing optimal block sizes on those groups in Iteration 2. We will denote this version of Cascade as “binary Cascade”, and it will be used for all comparisons. It has also been used as the base for a recent implementation reaching the highest throughput so far of 570 Mbps [14].

2.3.2 High-dimensional Cascade

We propose the following modification to use Cascade for high-dimensional data, which we will denote by high-dimensional Cascade (HD-Cascade). We only highlight the differences compared to binary Cascade as described in [22]. All the modifications we propose are additions to the binary algorithm and reduce back to binary Cascade for \(q=2\) as they rely on correlations that only exist for \(q>2\).

Fig. 2

Example of the cascading step inside the first iteration of HD-Cascade. Blocks i and j are random top-level blocks in the first iteration. The bits at positions (i, 1) & (j, 1) and (\(i,-2\)) & (\(j,-1\)) originate from the same symbol, respectively. A and B denote the current parity of the block for Alice and Bob, respectively. a Block j has a matching parity, so no binary search is possible. Block i has mismatching parity, a binary search reveals position 1 to be erroneous. b After correcting the error, Bob’s parity flips to match that of Alice. By requesting and correcting the associated partner bit in block j, a mismatch in the parities is introduced. We can therefore run a binary search on block j and detect the error at position (\(j,-1\)). c By requesting and correcting the respective partner bit in block i, the parities mismatch again and allow for another round of binary search to reveal the error at position (i, 2). d All blocks have matching parities

  • Modification 1 Initially, we map all symbols to an appropriate binary representation. Prior to the first iteration, we shuffle all bits while maintaining a record of which bits originate from the same symbol. We denote these bits as “partner bits” (a small sketch of this mapping is given after this list). This mapping effectively reduces the expected QBER used for block size calculations in all iterations to \(\text {QBER}_{\text {BIN}} = \frac{q}{2(q-1)} \text {QBER}_{\text {SYM}}\).

  • Modification 2 Upon detecting an error at any point during the protocol, immediately do the following:

    1. Request all partner bits of the erroneous bit.

    2. If any of the partner bits are erroneous, the blocks they have been participating in now have mismatching parities again. Run a binary search on all mismatching blocks. Repeat Step 1 until no new errors are found.

    The conditional probability for a partner bit to be erroneous, given all information exchanged about it so far, is close to 1/2. To be precise, it is equal to 1/2 for bits that have not yet participated in any parity checks and then varies with the length of the smallest block they have participated in [22]. Note that this procedure already allows for a cascading process in the first iteration, and therefore requires all blocks to be processed sequentially, followed by a Cascade step for each single error, for maximum impact of the Cascade step.

  • Modification 3 The fraction of errors corrected in the first iteration is significantly higher (often \(>95\%\) in our simulations for high dimensions) compared to the binary version. This is due to the possibility of running a cascading process in the first iteration already, see Fig. 2 for an example. Consequently, we need to increase the block sizes for the following iterations as the dimensionality increases, see Table 2. The importance of partner bits increases with increasing dimension.
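
The following minimal sketch (hypothetical helper names, not the authors' implementation) illustrates Modification 1: q-ary symbols are mapped to bits and shuffled, and each shuffled position keeps a record of its partner bits; the last function gives the expected bit-level QBER after this mapping.

    import math
    import random

    def to_bits_with_partners(symbols, q):
        """Map q-ary symbols to bits, shuffle them, and record for every shuffled
        position the positions of its partner bits (bits from the same symbol)."""
        v = int(math.log2(q))                        # bits per symbol, q a power of 2
        bits, partners = [], []
        for s in symbols:
            idx = list(range(len(bits), len(bits) + v))
            partners.extend([idx] * v)               # every bit knows its siblings
            bits.extend((s >> k) & 1 for k in range(v))
        perm = list(range(len(bits)))
        random.shuffle(perm)
        inv = {orig: new for new, orig in enumerate(perm)}
        shuffled = [bits[p] for p in perm]
        shuffled_partners = [[inv[j] for j in partners[p]] for p in perm]
        return shuffled, shuffled_partners

    def qber_bin(qber_sym, q):
        """Expected bit error rate after the mapping (Modification 1)."""
        return q / (2 * (q - 1)) * qber_sym

    bits, partners = to_bits_with_partners([2, 3, 1, 0], q=4)
    print(bits, partners[0])     # the first shuffled bit and its partner positions
    print(qber_bin(0.05, 4))     # ~0.033 for a 5% symbol error rate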

2.3.3 Parallel high-dimensional Cascade

While the proposition in Sect. 2.3.2 achieves great efficiency, it also requires all blocks to be processed serially. This results in limited throughput, as a large number of messages is required, i.e., a single message for every requested parity. We therefore propose the following adaptations for a more practical implementation of high-dimensional Cascade:

  1. Modification 4 In the first iteration, we split the binary representation into \(v=\log _2(q)\) groups of size n; n being the number of qudits per frame. Each group only contains bits of the same bit-plane, i.e., the first group contains all the first bits of all symbols, the second group contains all the second bits of all symbols, and so on. We then apply permutations that are restricted to each group only. After locating the error positions of one group using binary search, the corresponding partner bits will be located in all other groups. All blocks of one group can therefore be processed in parallel. When calculating the top-level block sizes of the following groups, we adjust the expected QBER using the number of partner bits. It can be calculated for group i, \(i=1,...,v\), as:

    $$\begin{aligned} \text {QBER}_i = \text {QBER}_{\text {BIN}} - \frac{1}{2n}\sum _{j<i} \frac{\text {PB}_j}{v}, \end{aligned}$$
    (14)

    where \(\text {PB}_j\) denotes the number of partner bit requests originating from group j; a small numerical sketch of this adjustment is given after this list. To reduce the number of messages sent, we do not cascade on the partner bits until all groups have finished this first stage. The block creation for all other iterations follows the procedure described in [22].

  2. Modification 5 This modification replaces Modification 2 and the Cascade step of the binary protocol. After completing the binary search for all top-level blocks of an iteration, we collect all found errors into a list. We then do the following for the list of known errors:

     (a) All known error positions are fed into the Cascade step as a group and processed in parallel. For each error, locate the smallest block containing it whose origin iteration has not yet been flagged for that bit.

     (b) Run a binary search on those smallest blocks that have a mismatch and locate the new error positions. Add the new errors to the list of all errors. Request the partner bits of these new errors if not already known and add them to the list of all known errors if they are erroneous. Flag the origin iterations of the blocks as already used for the respective input bits. The origin iteration of a block is that iteration in which the block has been created, either as a top-level block or as part of the binary search.

     (c) Remove all errors from the list that have all iterations flagged. Correct all known errors and repeat Step (a). If there are no new errors to be found, the Cascade step terminates.

    The Cascade step can be stopped early in a trade-off between efficiency, throughput, and frame error rate.
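
As referenced in Modification 4, the following small helper (hypothetical names, illustrative values) evaluates the QBER adjustment of Eq. (14) for a given group.

    def adjusted_qber(qber_bin, n, v, partner_requests, i):
        """Eq. (14): expected bit-level QBER for group i (1-based), given the
        partner-bit requests PB_j of the groups already processed (j < i)."""
        correction = sum(partner_requests[j] / v for j in range(i - 1)) / (2 * n)
        return qber_bin - correction

    # Toy usage: q = 16 (v = 4 groups), 16384 qudits per frame
    print(adjusted_qber(qber_bin=0.027, n=16384, v=4,
                        partner_requests=[500, 450, 0, 0], i=3))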

3 Results

3.1 Nonbinary LDPC codes

While the code design and decoding techniques described above are feasible for any dimension q, we focus on \(q = 4\) and 8 as those are common in current implementations [48]. Nine codes were designed with code rates between 0.50 and 0.90 for \(q=4\) (\(q=8\)), corresponding to a supported QBER range between 0 and \(18\%\) (\(24.7\%\)). We used 100000 nodes with a maximum of 150 iterations for the MC-DE, and the QBER was swept in 20 steps in a short range below the best possible threshold. In the differential evolution, population sizes between 15 and 50, a differential weight of 0.85, and a crossover probability of 0.7 were used. A sparsity of at most 10 nonzero coefficients in the polynomial was enforced, with the maximum node degree chosen as \(d_{\text {v,max}}=40\). The sparsity allowed for reasonable optimization complexity, while the maximum node degree was chosen to avoid the numerical instability we observed for higher values.

The results of the optimization can be found in Table 1 in the form of the node degree distributions (13) and their performance according to density evolution, reported as their simulated threshold (density evolution threshold, DET). The efficiency was evaluated for the highest supported QBER and is noted as the ensemble efficiency (EEff). The all-zero codeword assumption was used for the optimization and evaluation, which holds for the given scenario of a symmetric channel [36]. For all rates, the designed thresholds are close to the theoretical bound (2). LDPC codes with a length of \(n=30000\) symbols were constructed using Progressive Edge Growth [49], and a log-FFT-SPA decoder was used to reconcile the messages. The simulated performance of the finite-size codes can be seen in Fig. 6 for a span of different QBER values, each data point being the mean of 100 samples. We used the blind reconciliation scheme for rate adaptation [11]. The mean number of decoding tries required for Bob to successfully reconcile is also shown. The valley pattern visible in the efficiency of the LDPC codes is due to the transition between codes of different rates, and a slight degradation in performance for high ratios of puncturing or shortening. The decoder used a maximum of 100 decoding iterations. As expected for finite-size codes, they do not reach the asymptotic ensemble threshold but show sub-optimal performance [37].

Table 1 Degree distributions for four- and eight-dimensional nonbinary LDPC codes

3.2 High-dimensional Cascade

The performance of HD-Cascade, see Sect. 2.3.2, was evaluated on the q-ary symmetric channel for dimensions \(q = 4\), 8, 32, and for a QBER ranging from \(1\%\) to \(20\%\). The results are shown in Fig. 3. For comparison, a direct application of the best-performing Cascade modification on a binary mapping is also included. Binary Cascade's performance can be understood as generating an information leakage corresponding to an input key with an error rate of \(\text {QBER}_{\text {BIN}}\). The proposed high-dimensional Cascade uses the same base Cascade with the additional adaptations discussed in Sect. 2.3.2. For \(q=2\), HD-Cascade reduces to binary Cascade, resulting in equal performance. Both methods use the same frame size of \(n = 2^{16}\) bits for all cases. The top-level block sizes \(k_i\) used for each iteration i can be seen in Table 2, where \([\cdot ]\) denotes rounding to the nearest integer. The block sizes have been chosen heuristically through numerical optimization. Additionally, the layered scheme is included as a reference [8]. All data points have a frame error rate below 1% and represent an average over 1000 samples. The wave pattern observable for the efficiency of Cascade in Fig. 3 and Fig. 6 is due to the integer rounding operation when calculating the block sizes. Block sizes that are a power of two have been shown to be optimal for the binary search in this setting [22].

Fig. 3

Efficiency of different approaches evaluated on a q-ary symmetric channel. Layered refers to the layered scheme, Bin to direct application of binary Cascade (serial), and HD to the high-dimensional Cascade proposed in this work. Data points represent the mean of 1000 samples and have a FER of less than \(1\%\)

Table 2 Block sizes used for HD-Cascade

The increase in both the range and the secret key rate resulting from using HD-Cascade instead of directly applying binary Cascade is depicted in Fig. 4. The protocols used are 1-decoy state QKD protocols [50, 51], with the secret key length \(l_q\) per block given as

$$\begin{aligned} l_q \le \log _2(q) D_0^Z + D_1^Z (\log _2(q) - \text {H}_\text {HD}(\Phi _Z,q) ) - \text {leak}_{\text {IR}} - 6 \log _2(19/\epsilon _{\text {sec}}) - \log _2(2/\epsilon _{\text {cor}}), \end{aligned}$$
(15)

where \(D_0^Z\) is a lower bound on vacuum events and \(D_1^Z\) is a lower bound on single-photon events. We refer to the supplementary information of [50] for the derivation of these bounds. \(\text {H}_\text {HD}(\Phi _Z,q)\) is the high-dimensional Shannon entropy,

$$\begin{aligned} \text {H}_\text {HD}(\Phi _Z,q) = -\Phi _Z \log _2(\Phi _Z / (q-1)) - (1-\Phi _Z) \log _2 (1-\Phi _Z), \end{aligned}$$
(16)

where \(\Phi _Z\) is an upper bound on the phase error rate, \(\epsilon _{\text {sec}}\) is the security parameter, and \(\epsilon _{\text {cor}}\) is the correctness parameter. Experimental parameters for the simulation are derived from [52] for \(q=2\) and 4, where a combination of polarization and path is used to encode the qudits. For \(q=8\) and 32, we used a generalization of the setup. Additional losses might arise due to the increased experimental complexity; these are not considered in the simulation. Some of the parameters are listed in Table 3.
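
For illustration, a small sketch of Eqs. (15) and (16) is given below; the bounds, error rates, and security parameters are placeholder values and are not taken from [52].

    import math

    def h_hd(phi_z, q):
        """High-dimensional Shannon entropy, Eq. (16)."""
        if phi_z == 0:
            return 0.0
        return (-phi_z * math.log2(phi_z / (q - 1))
                - (1 - phi_z) * math.log2(1 - phi_z))

    def secret_key_length(q, d0_z, d1_z, phi_z, leak_ir, eps_sec, eps_cor):
        """Upper bound on the secret key length l_q per block, Eq. (15)."""
        return (math.log2(q) * d0_z
                + d1_z * (math.log2(q) - h_hd(phi_z, q))
                - leak_ir
                - 6 * math.log2(19 / eps_sec)
                - math.log2(2 / eps_cor))

    # Placeholder numbers, purely illustrative
    print(secret_key_length(q=4, d0_z=1e4, d1_z=5e5, phi_z=0.03,
                            leak_ir=2.2e5, eps_sec=1e-9, eps_cor=1e-15))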

Table 3 Parameters used for the SKR simulations

The improvement in the relative secret key rate r obtained using HD-Cascade is shown in Fig. 7. We also analyzed the performance of HD-Cascade on experimental data provided by the experiment in [52], which confirms the simulated performance.

We further analyze the performance of the parallel high-dimensional Cascade implementation for \(q = 4, 16\). The results can be seen in Fig. 5. While a slight penalty in efficiency can be observed, the number of messages sent for a single frame is greatly reduced compared to the serial approach. In the serial approach, the number of messages is roughly given by \(n\text {H}(X|Y)\), i.e., around 8000 messages for \(q=16\), QBER\(=5\%\), and \(f=1.05\), compared to a mean of 239 and 189 messages for \(q=4\) and \(q=16\) for the parallel implementation. Notably, the number of messages seems to decrease for higher dimensions. All data points represent the mean of 2000 samples, the frame error rate is below 0.1% for all QBER values.

Fig. 4

Secret key rate vs. channel loss for different dimensions. The decreasing dotted/dashed lines show results for direct application of binary Cascade whereas the solid lines show the performance for HD-Cascade. The increasing dotted/dashed lines show the respective QBER

Fig. 5

Top: The efficiency of the parallel high-dimensional Cascade implementation for dimensions 4 and 16. Bottom: The number of messages sent from Alice to Bob for a single frame

Fig. 6

Top: Efficiency of using nonbinary LDPC codes and HD-Cascade for different QBER values for \(q=8\). Bottom: Number of decoding tries in the blind scheme used for the LDPC codes, i.e., the number of messages required. The number of messages required for HD-Cascade can be seen in Fig. 5

Fig. 7

Relative improvement of the secret key rate for using HD-Cascade compared to binary Cascade. Experimental data provided by recent experiment [52]

4 Discussion

4.1 Nonbinary LDPC codes

Nonbinary LDPC codes are a natural candidate for the information reconciliation stage of HD-QKD, as their order can be matched to the dimension of the used qudits, and they are known to have good decoding performance [9]. Although they typically come with increased decoding complexity, this drawback is less of a concern in this context, since the keys can be processed and stored before being employed in real-time applications, which reduces the significance of decoding latency. Nevertheless, less complex decoding algorithms like EMS [53] or TEMS [54] can be considered to allow the use of longer codes and to increase the throughput at a small penalty in efficiency.

The node degree distributions we constructed show ensemble efficiencies close to one, \(1.037 - 1.067\) for \(q=4\) and \(1.024-1.080\) for \(q=8\). To the best of our knowledge, there is no inherent reason for the efficiencies of \(q=8\) to be lower than for \(q=4\); it is rather a heuristic result of the optimization parameters fitting better. Although the ensembles we found display thresholds near the Slepian–Wolf bound, we believe that even better results could be achieved by expanding the search over the hyperparameters involved in the optimization, such as the enforced sparsity and the highest degree of \(\lambda \), and by performing a finer sweep of the QBER during density evolution. The evaluated efficiency of finite-size codes shows them performing significantly worse than the thresholds computed with density evolution, with efficiencies ranging from 1.078 to 1.14 for QBER values in a medium range. This gap can be reduced by using longer codes and improving the code construction, e.g., using improved versions of the PEG algorithm [55, 56]. The dependency of the efficiency on the QBER can further be reduced, i.e., flattening the curve in Fig. 6, by improving the placement of punctured bits [57].

While this manuscript was in preparation, the use of nonbinary LDPC codes for information reconciliation was also proposed in [58]. The authors suggest mapping symbols of high dimensionality to symbols of lower, but still larger than binary, dimensionality if beneficial, similar in spirit to the layered scheme. This can further be used to decrease computational complexity if required.

4.2 HD-cascade

HD-Cascade has improved performance on high-dimensional QKD setups compared to directly applying binary Cascade. We can see significant improvement in efficiency, with mean efficiencies of \(f_{\text {HD-Cascade}}=1.06\), 1.07, 1.12 compared to \(f_{\text {Cascade}}=1.22\), 1.36, 1.65 for \(q=4\), 8, 32, respectively. Using the setup parameters of a recent experimental implementation of four-dimensional QKD [52], a resulting improvement in range and secret key rate can be observed, especially for higher dimensions. For \(q=32\), an increase of more than \(10\%\) in secret key rate over all QBER values and an additional 2.5 dB in tolerable channel loss are achievable according to our simulation results.

The serial approach demonstrates high efficiency across all QBER values but suffers from a strong increase in execution time at higher error rates and an impractical number of communication rounds. Apart from the inherent scaling of Cascade with the QBER that is also present in binary implementations, this is additionally attributable to the immediate cascading of partner bits. This penalty can be greatly reduced by implementing the parallel high-dimensional version. The resulting penalty on the mean efficiency is very small, with mean efficiencies of \(f_{\text {HD-Cascade, practical}}=1.07\), 1.08 for \(q=4\), 16, respectively. We again see the sawtooth pattern that can also be observed in binary Cascade and is attributed to discrete jumps in block sizes [47]. We observed a significantly higher variance in the efficiency with respect to QBER for the parallel approach, especially for \(q=4\). We assume that this is due to a better-performing block size selection for \(q=16\) and that an optimized parameter selection will reduce the sawtooth pattern and increase efficiency overall for all values of q. The search for optimized parameters has been investigated for binary Cascade and has a great impact on its efficiency, see again [47] for an overview. The search for well-performing block size selections for HD-Cascade could therefore be an interesting object of further research.

While the many rounds of communication required by Cascade have raised concerns about resulting limitations on throughput, recent research has shown that sophisticated software implementations can enable Cascade to achieve high throughput even with realistic latency on the classical channel [14, 15]. While high-dimensional Cascade requires few additional computations compared to binary Cascade, the number of messages is increased due to the additional rounds of cascading on the partner bits. For \(q=4\), 16, the number of messages per frame can be seen in Fig. 5. Notably, the number of messages required seems to decrease with increasing dimension, resulting in an increasing throughput with increasing dimension. The impact on throughput depends on the block size, the communication delay, and the computing time; an analysis of this for the binary case can be found in [15], and a corresponding analysis is required for the high-dimensional case before a general behavior can be established. This is in contrast to LDPC codes, whose throughput would scale negatively with increasing dimension. In our simulations, the relative increase in throughput between \(q=4\) and \(q=16\) is \(30\%\) for a QBER of \(1\%\) and a delay of 1 ms between Alice and Bob.

4.3 Comparison

Overall, HD-Cascade and nonbinary LDPC codes show good efficiency over all relevant QBER values, with HD-Cascade performing slightly better in terms of efficiency (see Fig. 6) at the expense of increased interactivity. Both show significant improvement compared to the layered scheme. The performance of the layered scheme can be seen for \(q=32\) in Fig. 3, notably for a much smaller block length (data read off Fig. 5 in [8]). Later experimental implementations report efficiencies of \(f=1.25\) [59] (\(q=3\), \(n=1944\), \(p=8\%\)) and \(f=1.17\) [60] (\(q=1024\), \(n=4000\), \(p=39.6\%\)). These papers report their efficiencies in the \(\beta \)-notation; we converted them to the f-notation for comparison. \(\beta \) is commonly used in the continuous-variable QKD community, whereas f is more widespread with respect to discrete-variable QKD. They can be related via [47]

$$\begin{aligned} \beta (\text {H}(X)-\text {H}(X|Y)) = \text {H}(X)-f\text {H}(X|Y). \end{aligned}$$
(17)
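
For reference, this conversion can be written as a one-line helper (illustrative values only):

    def f_from_beta(beta, h_x, h_x_given_y):
        """Solve beta * (H(X) - H(X|Y)) = H(X) - f * H(X|Y) for f, Eq. (17)."""
        return (h_x - beta * (h_x - h_x_given_y)) / h_x_given_y

    # Illustrative values: H(X) = 10 bits (q = 1024), H(X|Y) = 2.5 bits
    print(f_from_beta(0.95, 10.0, 2.5))   # -> 1.15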

HD-Cascade shows a flat efficiency behavior over all ranges, compared to the LDPC codes, which perform poorly for very low QBER values and improve with increasing QBER, see Table 1. This behavior can also be observed for LDPC codes used in binary QKD [17, 18, 21]. While the focus of this work lies in introducing new methods for high-dimensional information reconciliation with good efficiencies, the throughput is another important measure, especially with continuously improving input rates from advancing QKD hardware implementations. While an absolute and direct comparison of throughput strongly depends on the specific implementation and setup parameters and is not a contribution of this work, relative performances can be considered. Cascade has low computational complexity but high interactivity, which can limit throughput in scenarios where the classical channel has a high latency.

Nonbinary LDPC codes, on the other hand, have low requirements on interactivity (usually below 10 syndromes per frame using the blind scheme, see Fig. 6) but high computational costs at the decoder. Their decoding complexity scales strongly with q but only slightly with the QBER, as its main dependence is on the number of entries in its parity check matrix and the node degrees. It should be noted that the QBER is usually fairly stable until the loss approaches the maximum range of the setup, e.g., see Fig. 4, and that higher dimensions tend to operate at higher QBER values [7]. It should be emphasized that for QKD, low latency in post-processing is often not a priority as keys do not need to be available immediately but can be stored for usage, with varying importance for different network scenarios. QKD systems can be significantly bigger and more expensive than setups for classical communication. This allows for reconciliation schemes with comparatively high latency and high computational complexity, for example, by extensive usage of pipelining [14, 61, 62].

5 Conclusion

We introduced two new methods for the information reconciliation stage of high-dimensional quantum key distribution and investigated their performance. We present nonbinary LDPC codes designed and optimized specifically for use in high-dimensional QKD systems. They allow for reconciliation with good efficiency while maintaining low interactivity between the two parties. For HD-QKD systems of dimension 8, the codes we constructed show efficiencies between \(f=1.078\) and \(f=1.14\) for QBER values between \(3\%\) and \(15\%\). We further propose new modifications that make Cascade suitable for high-dimensional QKD and greatly increase its efficiency on those systems. The main modification consists of requesting partner bits as an additional means of detecting errors in Cascade. Our simulations show mean efficiencies of \(f=1.06\), 1.07, and 1.12 for dimensions 4, 8, and 32. We also analyze the impact of HD-Cascade on the secret key rate compared to using binary Cascade and note significant improvement, i.e., more than \(10\%\) for a 32-dimensional system over all possible channel losses and an increase in range corresponding to an additional 2.5 dB of tolerable loss.