The information of high-dimensional time-bin encoded photons

We determine the shared information that can be extracted from time-bin entangled photons using frame encoding. We consider photons generated by a general down-conversion source and also model losses, dark counts and the effects of multiple photons within each frame. Furthermore, we describe a procedure for including other imperfections such as after-pulsing, detector dead-time and jitter. The results are illustrated by deriving analytic expressions for the maximum information that can be extracted from high-dimensional time-bin entangled photons generated by spontaneous parametric down-conversion. A key finding is that, under realistic conditions and using standard SPAD detectors, one can still choose the frame size so as to extract over 10 bits per photon. These results are thus useful for experiments on high-dimensional quantum key distribution systems.


I. INTRODUCTION
It is well known that entangled photons can be used to extract shared random bits. The number of extractable bits per photon pair depends on the dimension of the entangled degree of freedom. For example, polarization entanglement allows at most one shared bit per photon pair. An alternative is to use the arrival time of a photon. Encoding within the arrival times of a pair of photons offers an experimentally viable way of generating high-dimensional entangled states [1][2][3][4]. High-dimensional entangled states have many interesting properties [5][6][7] and can allow multiple shared bits to be extracted from each photon pair. This can be beneficial for quantum key distribution (QKD), where each detected photon pair could encode over 10 bits of information [8].
There are several benefits to encoding within the time of arrival as opposed to other degrees of freedom, such as spatial modes. One key advantage is that it minimizes the effects of detector dead-time, which not only limits the rate at which information can be communicated, but also affects security in QKD [9,10]. Another benefit is that temporal modes can be easily coupled into fibres, which is not the case for beams with non-zero orbital angular momentum [11][12][13].
It is clear that imperfections such as loss have a strong effect on how much information we can extract from high-dimensional entangled photons. It is thus vital to model the effects of realistic experimental errors. Any model must take account of the photon source, channel losses and imperfect detectors. Nevertheless, it has been shown that it may still be possible to extract over 10 bits per photon pair under reasonable experimental conditions [14].
In practice, the amount of extractable information depends on the error-correcting scheme, which in turn can depend on the physical implementation. The case of time-bin encoding raises specific problems. For example, in standard polarization-based QKD schemes, one uses the timing information to help correct losses: it is thus possible to remove all the cases where Alice and Bob do not share coincident photons. This approach is clearly not suitable when information is encoded in the arrival time. Instead, we require a method for correcting errors that does not reveal the timing information. A common way of achieving this is to split the arrival time into time-bins, which are then grouped together to form a frame [15]. Alice and Bob then publicly announce the number of photons detected in each frame. The use of frame encoding, while greatly facilitating error correction, adds an additional constraint on the extractable information. A realistic model must take this into account.
The aim of this work is to determine the extractable information from high-dimensional, temporally entangled photons.
In particular, we determine the maximum number of shared bits that, on average, one obtains using frame encoding schemes. It is important to stress that while the main motivation for this work comes from QKD, we are not proposing a new QKD protocol. As such we do not concern ourselves with the task of securing the bits. Instead, we establish the maximum shared information that can be obtained via reconciliation.
The task of securing the bits will generally depend on the exact nature of the setup.
The results we present are not only useful for QKD. For instance, it has been argued that the mutual information can be used to quantify the entanglement within an SPDC source [18]. Furthermore, the results can also be used to quantify the capacity of a fibre array [14], which can be used, for instance, in time-multiplexing of detectors [19,20].
The general formalism we present can model experiments such as illuminating a nonlinear crystal with a mode-locked laser (see Fig. 1 and the description in the next section). The approach is, however, not tied to this setup and applies to general sources of entangled photons. For instance, the formalism can be applied to cases where the Poissonian approximation is not appropriate. One could thus use our approach to model many different time-bin based experiments. In addition, the formalism also takes account of asymmetric channel losses, dark counts, jitter and other such effects. The range of errors considered here goes beyond that of previous works [15,24].
To understand how these results can be useful, consider a QKD experiment with detector jitter. A common approach to reducing the effect of jitter is to increase the width of the time-bins. This is, however, not always possible or practical. Furthermore, even when we can increase the time-bin width, doing so affects the amount of information one can extract. In this context, an important question is whether it is better to increase the width of the time-bins or to correct the jitter errors using a reconciliation protocol. To answer this question, one must calculate the mutual information in the presence of jitter. This can be achieved using the results of this paper.
Another way in which our approach goes beyond existing results, such as [15,24], is that we calculate explicitly the effect of frames that contain two or more photon pairs. Our findings are thus complementary to those of [25], which presents a layered protocol for extracting information from general multi-array frames. The aim of the current work is to find the maximum possible extractable information using any frame-encoding protocol. This should prove important for the optimization and design of new error-correcting codes for high-dimensional QKD.

II. FRAME ENCODING
Pairs of photons whose arrival times are entangled have been prepared experimentally [1]. A common way of generating such photon pairs is spontaneous parametric down-conversion (SPDC) [8,16,17]. Figure 1 shows a typical setup, where a nonlinear crystal is pumped by a mode-locked laser [21,22]. The incoming pulses are classically coherent. As down-conversion is a unitary process, the coherence between the pulses is transferred to coherence between the amplitudes for generating photon pairs in each time-bin. In the ideal case, two parties, called Alice and Bob, use this setup to generate a random sequence of photon pairs that are perfectly correlated in time. Alice and Bob then use single-photon counting modules and synchronized time-tagging devices to obtain the timing information. Setups such as this have been realized experimentally [21][22][23].
To make use of such time-entangled states, the arrival time is divided into a discrete set of time-bins. For a mode-locked train of pulses, the time-bin width is set by the pulse spacing. In alternative setups, where photons are generated by a single pulse or a continuous-wave laser, the arrival time is discretized by dividing it into time-bins of a chosen width. If the widths of these time-bins are chosen appropriately, then Alice and Bob should detect their photons within the same time-bin. The uncertainty in the arrival time can then be used to extract shared random bits. An eavesdropper could be detected by measuring in another basis [26][27][28][29][30][31][32][33].
FIG. 1: A schematic of an experimental setup that generates and distributes high-dimensional, time-of-arrival entangled photons. A mode-locked laser generates a coherent train of pulses. The pulses pump a nonlinear crystal that produces entangled photon pairs in some of the time-slots; many of the slots contain no photons (color online).
In real experiments, there will always be errors. Alice and Bob will thus carry out error correction to obtain a shared random string. A simple approach is to group together several contiguous time-bins to form a frame [15]. For each frame, Alice and Bob announce the number of time-bins in which they detect photons. Let $K_A$ and $K_B$ denote the number of time-bins in which photons were detected by Alice and Bob, respectively.¹ One can use $K_A$ and $K_B$ to classify the frames; we thus write $(K_A, K_B)$-frames to denote the class of frames where Alice sees $K_A$ clicks while Bob sees $K_B$. Error-correcting codes can then be developed to deal with each class of frame. In many setups, the chance that Alice and Bob will detect multiple photons within a frame is low. In such situations, it is sufficient to consider only cases such as (1, 1)- and (2, 2)-frames.
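To make this bookkeeping concrete, the Python sketch below bins raw time tags into time-bins, groups them into frames of $N$ bins and tallies the $(K_A, K_B)$ classes. The helper `frame_classes` and its arguments are hypothetical illustrations, not part of any existing library, and the time tags in the usage line are made up. Several detections falling in the same bin are counted as a single click, mirroring detectors that do not resolve photon number.

```python
from collections import Counter

def frame_classes(alice_tags, bob_tags, bin_width, frame_size):
    """Tally (K_A, K_B)-frames from two lists of detection times (hypothetical helper).

    Times and bin_width must share units; frame_size is N, the number of
    time-bins per frame. Several detections in one bin count as one click.
    Frames in which neither party clicked, i.e. (0, 0)-frames, are not listed.
    """
    per_frame = {}
    for side, tags in enumerate((alice_tags, bob_tags)):
        clicked_bins = {int(t // bin_width) for t in tags}  # global time-bin indices
        for b in clicked_bins:
            counts = per_frame.setdefault(b // frame_size, [0, 0])
            counts[side] += 1  # side 0 is Alice, side 1 is Bob
    return Counter((ka, kb) for ka, kb in per_frame.values())

# Usage with made-up time tags (bin width 1.0, N = 4 bins per frame):
classes = frame_classes([0.2, 2.5, 5.1], [0.7, 2.4, 6.9, 9.0], 1.0, 4)
```

Here the first frame is a (2, 2)-frame, the second a (1, 1)-frame and the third a (0, 1)-frame; only the per-class counts, not the bin positions, would be announced publicly.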
One can envisage more complicated frame-encoding schemes, in which Alice and Bob do not publicly announce $K_A$ and $K_B$. In all that follows, we consider only simple schemes where $K_A$ and $K_B$ are announced, because an understanding of this situation is also vital for the more complicated protocols. In particular, we will show that the shared information for the case where $K_A$ and $K_B$ are announced differs from that of the more complicated protocols by a single term. Thus, the results we present can also be used to calculate the shared information for more advanced protocols.
Let $N$ represent the number of time-bins that comprise each frame. If Alice observes clicks in $K_A$ time-bins, then the number of possible distinct measurement records she could have is given by the binomial coefficient $\binom{N}{K_A} = \frac{N!}{K_A!\,(N-K_A)!}$. A particular record of measurement results, or measurement pattern, can be denoted by an $N$-bit binary string that indicates the locations of the time-bins where photons are detected. For example, if $N = 4$, $K_A = 2$, and Alice sees clicks in the first and third time-bins, then the corresponding binary string is 1010. It will prove useful to introduce a further piece of notation. We denote Alice's measurement patterns symbolically as $A^{K_A}_r$, where $r$ is the binary string that uniquely describes the pattern. We describe Bob's measurement patterns using the same notation, with each $A$ changed to $B$.
If there are no errors, then Alice and Bob each see the same measurement pattern. When losses are present, Alice's and Bob's measurement patterns can differ. Nevertheless, there will still be some correlation in their results, and we expect it to be common for both to observe clicks within the same time-bins. For a particular frame, let $L$ be the number of time-bins in which they both register clicks. For example, if Alice has the pattern 0101, while Bob sees the pattern 1100, then $L = 1$. Clearly, $\min\{K_A, K_B\} \ge L$. Furthermore, the allowed values of $L$ satisfy the inequality $N \ge K_A + K_B - L$.
¹ Here, $K_A = 1$ means that Alice detected a click within one time-bin. In principle, this click could correspond to multiple photons or even a dark count.
For fixed values of $K_A$, $K_B$ and $L$, the total number of different joint patterns for Alice and Bob is
$$\frac{N!}{L!\,(K_A-L)!\,(K_B-L)!\,(N-K_A-K_B+L)!},$$
which is a multinomial coefficient: the $N$ time-bins are split into $L$ bins where both click, $K_A - L$ where only Alice clicks, $K_B - L$ where only Bob clicks, and the remaining empty bins. These observations will prove useful later on.
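The counting above can be checked directly. This short Python sketch (the function name `patterns` is only illustrative) enumerates every $N$-bit string with $K$ ones, i.e. every measurement record with clicks in $K$ time-bins, and confirms that there are $\binom{N}{K}$ of them.

```python
from itertools import combinations
from math import comb

def patterns(N, K):
    """All N-bit strings with K ones: the possible records with clicks in K time-bins."""
    result = []
    for positions in combinations(range(N), K):
        bits = ["0"] * N
        for p in positions:
            bits[p] = "1"
        result.append("".join(bits))
    return result

print(patterns(4, 2))  # ['1100', '1010', '1001', '0110', '0101', '0011']
assert len(patterns(4, 2)) == comb(4, 2)
```

The pattern 1010 used in the text appears as the second entry.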
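A quick brute-force check of the number of shared clicks $L$ and of the multinomial count may help. The Python sketch below (all function names are illustrative) computes $L$ for a pair of patterns, evaluates the multinomial coefficient, returning zero when $(K_A, K_B, L)$ violates $\min\{K_A,K_B\} \ge L$ or $N \ge K_A + K_B - L$, and compares it with explicit enumeration for small frames.

```python
from itertools import combinations
from math import factorial

def shared_clicks(a, b):
    """L: number of time-bins in which both binary patterns contain a click."""
    return sum(1 for x, y in zip(a, b) if x == y == "1")

def joint_pattern_count(N, KA, KB, L):
    """N!/(L!(KA-L)!(KB-L)!(N-KA-KB+L)!), or 0 if (KA, KB, L) is not allowed."""
    if L < 0 or L > min(KA, KB) or N < KA + KB - L:
        return 0
    return factorial(N) // (factorial(L) * factorial(KA - L)
                            * factorial(KB - L) * factorial(N - KA - KB + L))

def brute_force_count(N, KA, KB, L):
    """Count joint (Alice, Bob) patterns with the given KA, KB and L directly."""
    alice = [set(c) for c in combinations(range(N), KA)]
    bob = [set(c) for c in combinations(range(N), KB)]
    return sum(1 for a in alice for b in bob if len(a & b) == L)
```

For the example in the text, `shared_clicks("0101", "1100")` gives $L = 1$.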
We want to determine the information contained within an average frame. The mutual information per frame is denoted by $H_{\rm frame}(A:B)$. To find the number of shared bits per photon, we divide $H_{\rm frame}(A:B)$ by the average number of photon pairs found within a frame. We calculate $H_{\rm frame}(A:B)$ using the method outlined in [14]. For a frame-encoding scheme, we will not reach the bits-per-photon limit set by $H_{\rm frame}(A:B)$. The reason is that, in a frame-encoding scheme, Alice and Bob publicly announce the number of clicks they see in each frame. They then apply error-correcting codes individually to each class of frame. This necessarily results in a loss of randomness, and hence of random bits.
The shared information that we can extract is related to the mutual information. However, it is not $H_{\rm frame}(A:B)$, but instead the mutual information conditioned on $K_A$ and $K_B$ taking specific values. This implies that we must use conditional probabilities in place of the standard probabilities to determine the conditional mutual information [34].
Suppose Alice observes $x$ clicks while Bob sees $y$. The maximum shared information per frame is then given by the conditional mutual information
$$H(A:B|K_A=x,K_B=y) = \sum_{r,s} P(A^x_r,B^y_s|K_A=x,K_B=y)\,\log_2\frac{P(A^x_r,B^y_s|K_A=x,K_B=y)}{P(A^x_r|K_A=x)\,P(B^y_s|K_B=y)}, \qquad (2)$$
where $P(A^x_r,B^y_s|K_A=x,K_B=y)$ is the joint conditional probability for Alice and Bob to obtain the patterns $A^x_r$ and $B^y_s$, while $P(A^x_r|K_A=x)$ and $P(B^y_s|K_B=y)$ are the marginal conditional probabilities for Alice and Bob, respectively. The conditional mutual information $H(A:B|K_A=x,K_B=y)$ gives the maximum number of bits per frame that can be extracted from $(x,y)$-frames.
On average, Alice and Bob can extract
$$H(A:B|K_A,K_B) = \sum_{x,y} P(K_A=x,K_B=y)\,H(A:B|K_A=x,K_B=y)$$
bits per frame, where $P(K_A=x,K_B=y)$ is the probability for Alice and Bob to observe $x$ and $y$ clicks, respectively. Notice that $H(A:B|K_A,K_B) \neq H_{\rm frame}(A:B)$; some information has been lost. We find that $H_{\rm frame}(A:B) = H(A:B|K_A,K_B) + H(K_A,K_B)$, where the additional term $H(K_A,K_B)$ is the entropy associated with the uncertainty in the number of clicks per frame. The loss of information thus follows simply from the fact that Alice and Bob announce the values of $K_A$ and $K_B$. In a practical application, one may not be able to develop effective error-correcting codes for all of the classes of frame. In this instance, $H(A:B|K_A,K_B)$ will over-estimate the extractable information. The actual extractable information can be found by averaging $H(A:B|K_A=x,K_B=y)$ over the frames for which we do have error-correcting codes. For example, if we only have codes for (1,1)-frames, then the extractable information is $P(K_A=1,K_B=1)\,H(A:B|K_A=1,K_B=1)$.
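As a sketch of how Eq. (2) is evaluated in practice, the Python function below takes a dictionary of joint conditional probabilities for one frame class and returns the conditional mutual information in bits. It assumes the conditional marginals can be obtained by summing the joint table; for identical, independent time-bins this agrees with $P(A^x_r|K_A=x) = 1/\binom{N}{x}$ by symmetry, but with edge effects (dead-time, jitter) the marginals should be supplied separately.

```python
from math import log2

def conditional_mutual_information(joint):
    """H(A:B | K_A = x, K_B = y) in bits, following Eq. (2).

    `joint` maps (r, s) -> P(A_r, B_s | K_A = x, K_B = y). The conditional
    marginals are obtained by summing `joint` (an assumption of this sketch;
    for identical, independent time-bins both equal 1/C(N, x) and 1/C(N, y)).
    """
    pa, pb = {}, {}
    for (r, s), p in joint.items():
        pa[r] = pa.get(r, 0.0) + p
        pb[s] = pb.get(s, 0.0) + p
    return sum(p * log2(p / (pa[r] * pb[s]))
               for (r, s), p in joint.items() if p > 0)

# Perfectly correlated patterns, uniform over four possibilities: 2 bits.
ideal = {(r, r): 0.25 for r in ["1000", "0100", "0010", "0001"]}
```

Terms with zero probability are skipped, following the usual convention $0 \log 0 = 0$.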
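The averaging step, including the restriction to frame classes for which codes exist, amounts to a weighted sum. In this Python sketch the function name and all numerical values are hypothetical placeholders chosen only to illustrate the bookkeeping, not results from any experiment.

```python
def average_information(class_probs, class_info, codes=None):
    """Sum of P(K_A=x, K_B=y) * H(A:B|K_A=x, K_B=y) over frame classes (bits per frame).

    class_probs and class_info are dicts keyed by (x, y). If `codes` is given,
    only those classes (the ones with error-correcting codes) contribute.
    """
    keys = class_probs if codes is None else codes
    return sum(class_probs.get(k, 0.0) * class_info.get(k, 0.0) for k in keys)

# Hypothetical class probabilities and per-class information (bits per frame):
class_probs = {(1, 1): 0.1, (2, 2): 0.01, (1, 0): 0.05}
class_info = {(1, 1): 9.0, (2, 2): 15.0, (1, 0): 0.0}
all_classes = average_information(class_probs, class_info)
only_11 = average_information(class_probs, class_info, codes=[(1, 1)])
```

With these placeholder numbers, restricting to (1,1)-frames drops the (2,2) contribution of 0.15 bits per frame.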

III. CALCULATING THE CONDITIONAL PROBABILITIES AND CONDITIONAL MUTUAL INFORMATION
In this section, we present a general procedure for calculating the conditional probabilities and hence the conditional mutual information. The approach allows us to calculate the probabilities for a general source and includes the effects of errors such as channel losses and detector imperfections. The first step is to work out the detection probabilities for a single time-bin. We then use these to construct the probabilities to observe specific measurement patterns, from which the conditional probabilities are calculated.
One thing to notice is that the coherence between the time-bins does not appear in our calculations. The reason is that our present results are for the case when we measure in the time-of-arrival basis. Such measurements cannot detect coherence between time-bins; indeed, they destroy it. The situation we consider is thus one where the photons' time of arrival has been measured, which inevitably disturbs the temporal coherence. The temporal coherence would, however, be revealed if one measured in a basis consisting of superpositions of time-bins. For instance, consider a mode-locked laser generating a train of coherent pulses that pump a nonlinear crystal. The coherence between the time-bins of the down-converted photons is not evident if we measure the time of arrival. However, it has been demonstrated experimentally using a Franson interferometer [35].

A. Single time-bin probabilities
To calculate the single time-bin detection probabilities, we must first model the source, channel and detectors. The approach we use is based on that presented in [14], so we only give a brief recap of the important points. First, we assume that the source produces pairs of entangled photons, where the probability to produce $m$ pairs within any given time-bin is $P_s(m)$. For simplicity, we initially assume that $P_s(m)$ is the same for each time-bin. Let $\lambda$ be the average number of photon pairs produced per time-bin, hence $\lambda = \sum_m m P_s(m)$.
The information is encoded in the temporal location of the photon pairs, not their number. For this reason, we assume that the detectors do not resolve photon number. For ideal, lossless detectors, the probability to observe a click in a time-bin is $\sum_{m=1}^{\infty} P_s(m)$. All real detectors, however, suffer losses. Let $\xi_A$ and $\xi_B$ be the efficiencies of Alice's and Bob's detectors, respectively, so that the probability to detect a single photon incident on Alice's detector is $\xi_A$. In addition to the losses due to detector inefficiency, there are also losses from transmitting the photons from the source to the detectors. Let $\eta_a$ and $\eta_b$ be the transmission efficiencies of Alice's and Bob's channels, respectively. Combining the two sources of loss into a single total efficiency, Alice's total efficiency is $\eta_A = \xi_A\eta_a$, and Bob's is $\eta_B = \xi_B\eta_b$. We are now in a position to calculate the probability for Alice and Bob to observe photons within a single time-bin. The key mathematical method is to use a moment generating function to include the effects of loss. See [36] for a full discussion of moment generating functions and their properties.
For a source described by the probability distribution $P_s(m)$, and total efficiencies $\eta_A$ and $\eta_B$ for Alice and Bob, respectively, we define a moment generating function for the source, neglecting dark counts for now. Consider a single time-bin. The probability for Alice and Bob to observe a click within a given time-bin is denoted by $\pi^{AB}_{ij}$, where $i, j \in \{0, c\}$, with $c$ representing a click and $0$ signifying no click. Expressions for $\pi^{AB}_{ij}$ in terms of the generating function are derived in [14]. The effect of dark counts is included using the following procedure. Let $P_{ij}$ represent Alice and Bob's probability to detect photons within a single time-bin when dark counts are present. We find that
$$P_{00} = (1-q)^2\pi^{AB}_{00},\quad P_{0c} = (1-q)\big(\pi^{AB}_{0c} + q\,\pi^{AB}_{00}\big),\quad P_{c0} = (1-q)\big(\pi^{AB}_{c0} + q\,\pi^{AB}_{00}\big),\quad P_{cc} = \pi^{AB}_{cc} + q\big(\pi^{AB}_{0c} + \pi^{AB}_{c0}\big) + q^2\pi^{AB}_{00}, \qquad (5)$$
where $q$ is the probability to observe a dark count in a single time-bin. The above probabilities sum to one. The marginal probabilities $P^A_i$ and $P^B_j$ are found from the joint probabilities $P_{ij}$, e.g., $P^A_0 = P_{00} + P_{0c}$. A key feature of these general expressions for $P_{ij}$ is that they are valid for any choice of the source probability $P_s(m)$. Thus, our results are not limited to any particular physical implementation.
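To illustrate that $\lambda$ depends only on the chosen pair statistics, the Python sketch below evaluates $\lambda = \sum_m m P_s(m)$ for two example sources. The thermal distribution $P_s(m) = \mu^m/(1+\mu)^{m+1}$ (often used for single-mode SPDC) and the Poisson distribution are assumptions made here for illustration; the formalism itself accepts any $P_s(m)$.

```python
from math import exp, factorial

def thermal(mu):
    """Example P_s(m): thermal pair statistics with mean mu (assumed, single-mode SPDC)."""
    return lambda m: mu**m / (1 + mu) ** (m + 1)

def poisson(mu):
    """Example P_s(m): Poissonian pair statistics with mean mu (assumed)."""
    return lambda m: exp(-mu) * mu**m / factorial(m)

def mean_pairs(ps, m_max=100):
    """lambda = sum_m m P_s(m), truncating the sum at m_max."""
    return sum(m * ps(m) for m in range(m_max + 1))

def total_probability(ps, m_max=100):
    """sum_m P_s(m); close to one when m_max is large enough."""
    return sum(ps(m) for m in range(m_max + 1))

lam = mean_pairs(thermal(0.01))
```

The truncation `m_max=100` is ample for $\lambda \ll 1$; it also keeps `factorial(m)` small enough to convert to a float.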
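One way to evaluate the single time-bin probabilities numerically is sketched below in Python. It assumes that each pair sends one photon to Alice and one to Bob, that photons are lost independently, and that each detector has an independent dark-count probability $q$ per time-bin; under these assumptions $\pi^{AB}_{00} = G\big((1-\eta_A)(1-\eta_B)\big)$ with $G(z) = \sum_m P_s(m) z^m$ (a probability generating function), and the other $\pi^{AB}_{ij}$ follow by inclusion-exclusion. The function names are illustrative only.

```python
from math import exp, factorial

def poisson(mu):
    """Example source: Poissonian pair statistics P_s(m) with mean mu (an assumption)."""
    return lambda m: exp(-mu) * mu**m / factorial(m)

def single_bin_probs(ps, eta_a, eta_b, q, m_max=50):
    """Joint single time-bin probabilities P_ij, keys '00', '0c', 'c0', 'cc' (Alice first).

    Sketch assumptions: one photon of each pair goes to each party, photons are
    lost independently, and dark counts occur independently with probability q.
    """
    def G(z):
        # generating function sum_m P_s(m) z^m, truncated at m_max
        return sum(ps(m) * z**m for m in range(m_max + 1))

    la, lb = 1 - eta_a, 1 - eta_b            # probability a single photon is lost
    pi00 = G(la * lb)                         # every photon lost on both sides
    pi0c = G(la) - pi00                       # Alice loses all, Bob detects one or more
    pic0 = G(lb) - pi00                       # Bob loses all, Alice detects one or more
    picc = G(1.0) - G(la) - G(lb) + pi00      # both detect one or more photons
    return {
        "00": (1 - q) ** 2 * pi00,
        "0c": (1 - q) * (pi0c + q * pi00),
        "c0": (1 - q) * (pic0 + q * pi00),
        "cc": picc + q * (pi0c + pic0) + q**2 * pi00,
    }

def marginals(P):
    """(P^A_0, P^A_c, P^B_0, P^B_c) from the joint single-bin probabilities."""
    return (P["00"] + P["0c"], P["c0"] + P["cc"],
            P["00"] + P["c0"], P["0c"] + P["cc"])

P = single_bin_probs(poisson(0.1), 0.5, 0.3, 1e-3)
```

For a Poissonian source without dark counts, Alice's no-click probability reduces to $e^{-\lambda\eta_A}$, which is a convenient sanity check.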

B. Probabilities for each frame
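The probability for Alice (or Bob) to observe a particular measurement pattern is calculated using the relevant single time-bin probabilities $P^A_i$ or $P^B_j$. The probability for Alice to see a pattern $A^{K_A}_r$ is
$$P(A^{K_A}_r) = (P^A_c)^{K_A}(P^A_0)^{N-K_A}.$$
The total probability for Alice to observe a measurement pattern with photons detected in $x$ time-bins is
$$P(K_A=x) = \binom{N}{x}(P^A_c)^{x}(P^A_0)^{N-x}. \qquad (6)$$
The probabilities for Bob have the same form, but use the probabilities $P^B_j$ instead. The joint detection probability $P(A^{K_A}_r, B^{K_B}_s)$ is calculated using the single time-bin joint detection probabilities. We find that
$$P(A^{K_A}_r, B^{K_B}_s) = P_{cc}^{L}\,P_{c0}^{K_A-L}\,P_{0c}^{K_B-L}\,P_{00}^{N-K_A-K_B+L}, \qquad (7)$$
where $L$ is the number of time-bins in which both Alice and Bob detect clicks. The total probability for Alice and Bob to detect photons within $x$ and $y$ time-bins, respectively, is thus
$$P(K_A=x,K_B=y) = \sum_{L} \frac{N!}{L!\,(x-L)!\,(y-L)!\,(N-x-y+L)!}\,P_{cc}^{L}\,P_{c0}^{x-L}\,P_{0c}^{y-L}\,P_{00}^{N-x-y+L}, \qquad (8)$$
where the sum runs over $\max\{0, x+y-N\} \le L \le \min\{x,y\}$. Equation (8) leads to the same marginal probabilities as Eq. (6). The conditional probabilities are calculated from Eqs. (6), (7) and (8) by recalling the definition of a conditional probability: $P(X|Y) = P(X,Y)/P(Y)$.
The conditional probabilities are therefore
$$P(A^x_r,B^y_s|K_A=x,K_B=y) = \frac{P(A^x_r,B^y_s)}{P(K_A=x,K_B=y)}, \qquad P(A^x_r|K_A=x) = \frac{P(A^x_r)}{P(K_A=x)} = \binom{N}{x}^{-1}, \qquad (9)$$
with the analogous expression for Bob. In the limit of no losses and no dark counts, the joint conditional probability vanishes unless $x = y$ and $r = s$. The reason for this is that, in this limit, Alice and Bob must observe the same patterns and $K_A = K_B$. This implies that $P(K_A=x,K_B=y)$ will be zero if $x \neq y$. In the case of $\eta_A = \eta_B = 1$ (and $q = 0$), $K_A = K_B = K$ and the conditional mutual information has the simple form
$$H(A:B|K_A=K_B=K) = \log_2\binom{N}{K}.$$
In the remainder of this section, we focus on the case where $\eta_A$ and $\eta_B$ are less than one.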
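The frame-level probabilities can be checked numerically. This Python sketch (function names are illustrative) implements the single-pattern probability, the sum over $L$ for $P(K_A=x,K_B=y)$ and Alice's binomial marginal, so that one can confirm that the joint distribution is normalized and that summing over $y$ reproduces $P(K_A=x)$. The single-bin probabilities `P` in the test are arbitrary illustrative values.

```python
from math import comb, factorial

def joint_pattern_prob(P, N, KA, KB, L):
    """P(A_r, B_s) for one joint pattern with KA, KB clicks and L shared bins."""
    return (P["cc"] ** L * P["c0"] ** (KA - L) * P["0c"] ** (KB - L)
            * P["00"] ** (N - KA - KB + L))

def frame_class_prob(P, N, x, y):
    """P(K_A = x, K_B = y): sum over allowed L of (number of joint patterns) * probability."""
    total = 0.0
    for L in range(max(0, x + y - N), min(x, y) + 1):
        count = factorial(N) // (factorial(L) * factorial(x - L)
                                 * factorial(y - L) * factorial(N - x - y + L))
        total += count * joint_pattern_prob(P, N, x, y, L)
    return total

def alice_class_prob(P, N, x):
    """P(K_A = x) = C(N, x) (P^A_c)^x (P^A_0)^(N - x)."""
    pa_c = P["c0"] + P["cc"]
    pa_0 = P["00"] + P["0c"]
    return comb(N, x) * pa_c**x * pa_0 ** (N - x)
```

Dividing `joint_pattern_prob` by `frame_class_prob` then gives the joint conditional probabilities needed in Eq. (2).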
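The lossless form follows in two lines. The derivation below assumes, in addition to $\eta_A=\eta_B=1$, that there are no dark counts ($q=0$), so that Alice and Bob always record the same pattern and every pattern with $K$ clicks is equally likely:

```latex
P(A^K_r, B^K_s \,|\, K_A = K_B = K) = \binom{N}{K}^{-1}\delta_{rs},
\qquad
P(A^K_r \,|\, K_A = K) = P(B^K_s \,|\, K_B = K) = \binom{N}{K}^{-1},

H(A:B \,|\, K_A = K_B = K)
  = \sum_{r} \binom{N}{K}^{-1}
    \log_2 \frac{\binom{N}{K}^{-1}}{\binom{N}{K}^{-2}}
  = \log_2 \binom{N}{K}.
```

For example, with $N = 1024$ and $K = 1$ this gives exactly $\log_2 1024 = 10$ bits per frame.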

C. Information per photon pair
We calculate the various entropic quantities by using Eqs. (6) through (9); the frame mutual information $H_{\rm frame}(A:B)$, for instance, follows directly from these probabilities. A typical application of this theory is in high-dimensional QKD, which aims to encode multiple bits on each photon pair. It is thus worth considering the bits per photon pair. The average number of photon pairs generated within each frame is $N\lambda$. Due to losses, the average number of photon pairs that one detects per frame is $N(\eta_A\eta_B\lambda + q^2)$. The average number of bits per generated photon pair is $H_{1,1}(A:B)/(N\lambda)$, while the average number of bits per detected photon pair is $H_{1,1}(A:B)/[N(\eta_A\eta_B\lambda + q^2)]$, where $H_{1,1}(A:B) = P(K_A=1,K_B=1)\,H(A:B|K_A=1,K_B=1)$ is the information per frame extracted from (1,1)-frames. These quantities can be calculated using the conditional probabilities given in Eq. (9) within Eq. (2). We find that, by using only (1,1)-frames, the average number of shared bits per detected photon is
$$H^{1,1}_d(A:B) = \frac{P_{00}^{N-2}}{\eta_A\eta_B\lambda + q^2}\Big[\Gamma\log_2\frac{N}{\Gamma} + P_{cc}P_{00}\log_2(P_{cc}P_{00}) + (N-1)P_{c0}P_{0c}\log_2(P_{c0}P_{0c})\Big], \qquad (11)$$
where $\Gamma = (N-1)P_{c0}P_{0c} + P_{cc}P_{00}$ and $P_{ij}$ are given in Eq. (5). The expression for the average number of bits per generated photon has the same form as Eq. (11), but with $\eta_A\eta_B\lambda + q^2$ in the denominator replaced by $\lambda$. We stress that Eq. (11) includes the effects of losses, dark counts and a general source. Furthermore, it can also be applied to situations where the dark count rates are different on each side. This is accomplished by modifying only the probabilities $P_{ij}$, not the form of Eq. (11).
If we have error-correcting codes for both (1,1)- and (2,2)-frames, then the total amount of extractable information is the sum of the information for the two cases, i.e., $H_d(A:B) = H^{1,1}_d(A:B) + H^{2,2}_d(A:B)$. In general, the information we can extract using only $(x,y)$-frames is calculated using the probabilities in Eqs. (8) and (9). As $x$ and $y$ become large, the resulting expressions become more complex. Nevertheless, we can still obtain analytic expressions by following the same straightforward procedure.
As Alice and Bob publicly announce $K_A$ and $K_B$, they lose all the information contained within the correlations of these quantities. It is possible to develop approaches that retain some of this information. As shown in Sec. II, the correlations in $K_A$ and $K_B$ contribute a total of $H(K_A,K_B)$ bits per frame to the shared information per frame. In terms of bits per detected photon pair, we could thus gain up to an additional $H(K_A,K_B)/[N(\eta_A\eta_B\lambda + q^2)]$ bits per photon. In practice, protocols will generally access only a fraction $f$ of these bits, where $0 \le f \le 1$.
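The closed form for (1,1)-frames can be cross-checked against a direct sum of Eq. (2) over all $N^2$ joint (1,1) patterns, using the fact that the conditional marginals are $1/N$. In this Python sketch the single-bin probabilities and parameters are illustrative placeholders (in practice `P` would come from Eq. (5)), and the closed form assumes $P_{cc}P_{00} > 0$ and $P_{c0}P_{0c} > 0$ so that the logarithms are defined.

```python
from math import log2

def bits_per_detected_11(P, N, eta_a, eta_b, lam, q):
    """Bits per detected pair from (1,1)-frames, closed form of Eq. (11)."""
    a = P["cc"] * P["00"]                 # same-bin contribution
    b = P["c0"] * P["0c"]                 # different-bin contribution
    gamma = a + (N - 1) * b
    bracket = gamma * log2(N / gamma) + a * log2(a) + (N - 1) * b * log2(b)
    return P["00"] ** (N - 2) * bracket / (eta_a * eta_b * lam + q**2)

def bits_per_detected_11_brute(P, N, eta_a, eta_b, lam, q):
    """Same quantity, summing Eq. (2) over all N*N joint (1,1) patterns."""
    same = P["cc"] * P["00"] ** (N - 1)            # Alice and Bob click in one bin
    diff = P["c0"] * P["0c"] * P["00"] ** (N - 2)  # clicks in two different bins
    p11 = N * same + N * (N - 1) * diff            # P(K_A = 1, K_B = 1)
    h = 0.0
    for r in range(N):
        for s in range(N):
            p = (same if r == s else diff) / p11   # joint conditional probability
            if p > 0:
                h += p * log2(p * N * N)           # conditional marginals are 1/N
    return p11 * h / (N * (eta_a * eta_b * lam + q**2))

# Illustrative numbers only:
P = {"00": 0.9, "0c": 0.03, "c0": 0.02, "cc": 0.05}
args = (P, 8, 0.5, 0.5, 0.1, 1e-3)
```

The two functions agree to rounding error, which confirms that the bracket in Eq. (11) is just Eq. (2) multiplied by $P(K_A=1,K_B=1)$ and rearranged.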
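For small frames, the information from any $(x,y)$ class can also be obtained by brute-force enumeration rather than a closed form. The Python sketch below (function name illustrative) returns $P(K_A=x,K_B=y)$ and $H(A:B|K_A=x,K_B=y)$, using conditional marginals $1/\binom{N}{x}$ and $1/\binom{N}{y}$; the single-bin probabilities are placeholder values, and the cost grows as $\binom{N}{x}\binom{N}{y}$, so this is only a check, not a replacement for the analytic route.

```python
from itertools import combinations
from math import comb, log2

def class_information(P, N, x, y):
    """Return (P(K_A=x, K_B=y), H(A:B | K_A=x, K_B=y)) by explicit enumeration.

    P maps '00', '0c', 'c0', 'cc' to joint single time-bin probabilities
    (Alice's outcome first). Only practical for small N.
    """
    weights = []
    for a in combinations(range(N), x):
        for b in combinations(range(N), y):
            L = len(set(a) & set(b))           # bins where both clicked
            weights.append(P["cc"] ** L * P["c0"] ** (x - L)
                           * P["0c"] ** (y - L) * P["00"] ** (N - x - y + L))
    p_xy = sum(weights)
    inv_marginals = comb(N, x) * comb(N, y)    # 1 / (P(A_r|K_A=x) P(B_s|K_B=y))
    h = sum(w / p_xy * log2(w / p_xy * inv_marginals) for w in weights if w > 0)
    return p_xy, h

# Extractable information per frame with codes for (1,1)- and (2,2)-frames only,
# for illustrative single-bin probabilities (not fitted to any experiment):
P = {"00": 0.9, "0c": 0.03, "c0": 0.02, "cc": 0.05}
per_frame = sum(p * h for p, h in (class_information(P, 8, k, k) for k in (1, 2)))
```

Dividing `per_frame` by $N(\eta_A\eta_B\lambda + q^2)$ converts it to bits per detected photon pair.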

IV. ADDITIONAL ERRORS: AFTER-PULSING AND DETECTOR DEAD-TIME
Losses and dark counts are not the only errors that affect the extractable information. There are also effects such as detector jitter, after-pulsing and detector dead-times. The discussion of jitter is postponed until the next section. In this section, we explain how the formalism is modified to describe after-pulsing and dead-time.
After-pulsing occurs when the detection of a photon sets up a feedback process that can lead to the detector registering a click at a later time [37]. After-pulsing thus temporarily increases the chance of seeing a dark count after a click has been registered. One approximate model of after-pulsing is to increase the dark count probability $q$ for some fixed number of time-bins $\beta$ after a detection. An important feature of after-pulsing is that it occurs regardless of what triggered the detector, so it also follows dark counts. The single time-bin detection probabilities, Eq. (5), include contributions from dark counts. Our approach therefore takes account of after-pulsing generated both by photons and by dark counts.
The value of $\beta$ can be large [37]. This means that a click near the beginning of a frame can result from after-pulsing from the previous frame. Similarly, the positions of detected photons are random, which means that the locations of the $\beta$ affected time-bins will also be random. Recall, however, that we are calculating the shared information for an average frame. To take account of these difficulties, the fairest approach is to increase $q$ for all time-slots. In this case, the information per photon retains the form given in Eqs. (11) and (12), but with a suitably increased value of $q$.
After a photon is detected, it is common for a detector to lose sensitivity to subsequent photons for a period of time. This interval is known as the detector's dead-time [38]. If the duration of the dead-time equals the width of $M_d$ time-bins, then we will not observe photons for at least the next $M_d$ time-slots after a detection. Dead-time is not a serious problem for (1,1)-frames, provided that the frame is longer than the dead-time. In this limit, Eq. (11) is still valid. However, the effects of detector dead-time are important for classes of frames such as (2,2)-frames. For these cases, we must adopt the following modified procedure.
First, we calculate the moment generating function and the single time-bin probabilities. We then calculate the probabilities to observe each pattern; now, however, we must set $P(A^{K_A}_r)$, $P(B^{K_B}_s)$ and $P(A^{K_A}_r, B^{K_B}_s)$ equal to zero if $r$ or $s$ contains two 1's separated by fewer than $M_d$ empty time-slots. We then calculate the new probabilities $P(K_A, K_B)$, $P(K_A)$ and $P(K_B)$, together with the new conditional probabilities. Finally, the conditional probabilities are used in Eq. (2) to calculate the conditional mutual information. The approach is best illustrated by an example.
Suppose we have a detector with a dead-time of the order of one time-bin width, i.e., $M_d = 1$. This means that it is impossible for Alice to observe two-photon measurement patterns such as 1100 or 0110. The probability to observe such patterns must be set to zero, hence $P(A^2_{1100}) = 0$. This reduces the number of two-photon patterns that Alice can observe from $N(N-1)/2$ to $(N-1)(N-2)/2$. In general, dead-time reduces the total number of allowed two-photon patterns to $\binom{N-M_d}{2} = (N-M_d)(N-M_d-1)/2$. It is convenient to introduce a function $\Delta_{M_d}(X^K_r)$ that is 0 if the pattern $X^K_r$ contains two 1's separated by fewer than $M_d$ empty time-bins, and 1 otherwise. For example, $\Delta_1(1010) = 1$, while $\Delta_1(1100) = 0$. The new probabilities to observe measurement patterns can be expressed in terms of $P(A^{K_A}_r, B^{K_B}_s)$, $P(A^{K_A}_r)$ and $P(B^{K_B}_s)$, the probabilities when there is no dead-time. As an example, consider the new probabilities, denoted by a tilde, for two-click patterns when $M_d = 1$. We find that
$$\tilde P(A^2_r, B^2_s) = \frac{\Delta_1(A^2_r)\,\Delta_1(B^2_s)\,P(A^2_r, B^2_s)}{(P_{00})^2}, \qquad \tilde P(A^2_r) = \frac{\Delta_1(A^2_r)\,P(A^2_r)}{(P^A_0)^2}, \qquad (13)$$
with the analogous expression for Bob. The reason for dividing by either $(P_{00})^2$ or $(P^A_0)^2$ is to take into account the fact that, in the time-bin following each of our two clicks, we cannot detect anything. This is distinct from not observing a click, which happens with joint probability $P_{00}$ and marginal probabilities $P^A_0$ and $P^B_0$. We note that Eq. (13) neglects the effect of obtaining a click within the last time-bin, e.g., observing a pattern such as 01001. In cases like this, we should divide $P(A^{K_A}_r)$ by $P^A_0$ only once, because the dead-time following the last detection falls outside the frame. When the frame size is large, the relative probability to observe a click within the last time-bin becomes small. In this regime, our approximation is very good.
The modification of the above results to the case where $M_d > 1$ is straightforward. If we neglect the effects of the frame edge, then $\tilde P(A^2_r) = \Delta_{M_d}(A^2_r)\,P(A^2_r)/(P^A_0)^{2M_d}$, with analogous expressions for Bob and for the joint probabilities (where $P^A_0$ is replaced by $P_{00}$). When $M_d$ becomes larger relative to the frame size, this approximation may seem dubious. One can explicitly take account of the edges by changing probabilities such as $P(A^2_{1\cdots001})$. However, as $M_d$ becomes large, the probability that a detector cannot register photons at the beginning of a frame, due to dead-time from a click in the previous frame, also increases. These two edge effects act in opposite ways: one increases the pattern probabilities, while the other decreases them. The net effect is that, to some extent, the two compensate for each other. It is thus still a good approximation to neglect both edges.
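The text does not fix how much $q$ should be increased. One simple, purely illustrative way to choose the frame-averaged value is sketched below in Python: it assumes a hypothetical elevated dark-count probability `q_after` that applies to a bin whenever any of the preceding $\beta$ bins produced a click, treats bins as independent with click probability `p_click`, and averages over the frame. Both the model and the parameter names are assumptions of this sketch, not of the original analysis.

```python
def effective_dark_count(q, q_after, beta, p_click):
    """Frame-averaged dark-count probability including after-pulsing (hypothetical model).

    A bin has dark-count probability q_after (> q) if any of the preceding beta
    bins produced a click, and q otherwise; bins are treated as independent,
    each clicking with probability p_click.
    """
    p_window = 1 - (1 - p_click) ** beta   # chance of a click within the last beta bins
    return q + (q_after - q) * p_window
```

Because `p_click` itself depends on $q$, one would in practice iterate: compute $P_{ij}$ with the current $q$, update `p_click` from the marginal click probability, and repeat until the effective $q$ stops changing.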
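The dead-time filter $\Delta_{M_d}$ and the resulting count of allowed two-photon patterns are easy to check by enumeration. In this Python sketch (names illustrative), a pattern survives if every pair of consecutive clicks has at least $M_d$ empty time-bins between them, i.e. click positions differ by more than $M_d$.

```python
from itertools import combinations
from math import comb

def delta(pattern, Md):
    """Delta_{Md}: 1 if consecutive clicks have at least Md empty bins between them, else 0."""
    ones = [i for i, bit in enumerate(pattern) if bit == "1"]
    return int(all(j - i > Md for i, j in zip(ones, ones[1:])))

def allowed_patterns(N, K, Md):
    """All K-click patterns of length N that survive a dead-time of Md time-bins."""
    out = []
    for pos in combinations(range(N), K):
        pattern = "".join("1" if i in pos else "0" for i in range(N))
        if delta(pattern, Md):
            out.append(pattern)
    return out
```

For $N = 4$ and $M_d = 1$, only 1010, 1001 and 0101 remain, matching $(N-1)(N-2)/2 = 3$; the edge corrections discussed in the text (for example a click in the last time-bin) are not modelled here.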

V. DETECTOR JITTER
One thing we have omitted so far is the temporal response of the detectors. In any real detector, there can be a randomly fluctuating delay between a photon being incident on the detector and it firing. This is very important in time-binned experiments, as it can cause a photon to be registered in the wrong time-bin. This effect is known as detector jitter [39]. In this section we show how jitter can be included within our model. For the sake of clarity, we illustrate the approach only for (1, 1)-frames. The general method, however, can also be applied to other frame classes.
A simple way of modeling jitter is to calculate a discrete set of 'jump' probabilities from the temporal response. Mathematically, the temporal response is the probability distribution to register a photon at a time $t$ after it was incident on the detector.² By integrating over the width of each time-bin, we convert the continuous probability distribution into a discrete set of detection probabilities. Suppose a photon is generated within the $r$-th time-bin. Let $J_n$ be the probability that we observe a click within the $(r+n)$-th time-slot. The probability to observe the photon within the correct time-bin is thus $J_0$. Clearly, $\sum_n J_n = 1$. Often, $J_n$ is non-zero only for $n$ up to 1 or 2.
One difficulty in modeling jitter is the presence of dark counts. The single time-bin detection probabilities include dark counts, which are not subject to jitter. However, the probability $q$ is the same for each time-bin,³ so the dark count probability is invariant under shifts in time. This suggests that the contribution from dark counts within $P_{ij}$ should also be approximately invariant under temporal shifts; hence, to an excellent approximation, it is not affected by jitter.⁴ This observation means that we can apply the jump probabilities directly to the probabilities $P_{ij}$, without having to separate out the contribution from dark counts.
We begin by looking at the marginal probability for Alice to observe a particular measurement pattern. It is convenient to modify our notation. As we are interested in the case where $K_A = K_B = 1$, we represent Alice's pattern as $A_i$, where $i$ is the location of the time-bin in which the photon is detected. For example, $A_1$ represents the pattern where Alice observes a click within the first time-bin.
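As an illustration of turning a continuous temporal response into jump probabilities, the Python sketch below assumes, purely for concreteness, that the photon arrives at the start of its time-bin and that the detector delay is exponentially distributed with time constant `tau`; then $J_n = e^{-nT/\tau} - e^{-(n+1)T/\tau}$ for bin width $T$. Real detector responses are often closer to a Gaussian with an exponential tail, in which case the same integration would be done numerically.

```python
from math import exp

def jump_probabilities(tau, bin_width, n_max):
    """Jump probabilities J_0 .. J_{n_max} from an exponential delay (illustrative model).

    Assumes the photon arrives at the start of its time-bin and the delay t has
    density exp(-t/tau)/tau, so J_n = Pr(n*T <= t < (n+1)*T). The tail beyond
    n_max is lumped into J_{n_max} so that the J_n sum to one.
    """
    J = [exp(-n * bin_width / tau) - exp(-(n + 1) * bin_width / tau)
         for n in range(n_max)]
    J.append(exp(-n_max * bin_width / tau))   # remaining probability
    return J

# e.g. a 50 ps response with 500 ps time-bins, keeping only J_0 and J_1:
J = jump_probabilities(50.0, 500.0, 1)
```

With these numbers $J_1 = e^{-10} \approx 4.5\times 10^{-5}$, so the $J_0 + J_1 = 1$ approximation used below is well justified.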
As a further simplification, we limit our analysis to the case when the detector's response is short enough so that only J 0 and J 1 are greater than zero and J 0 + J 1 = 1 while J 2 = 0. Appendix A explains how to generalize the results to the case when J 2 = 0. Consider two time bins that are away from the edges of the frame. In these time-bins we observe a single click in the second of them, i.e., our measurement pattern is 01. If there were no jitter, then this pattern occurs with probability P A 0 P A c . When the detector jitter is not negligible, then the pattern could have arisen from two possible situations. First, there was no delay and we observe the photons in the correct time-slot, which occurs with probability J 0 . Alternatively, jitter could have caused a photon that was in the first time-bin to be registered within the second. We find that the total probability to observe the pattern 01 is If a photon is incident on the last time-bin within a frame, then jitter can cause it to be lost to the frame. Similarly, a click within the first time-slot of each frame could have come from the previous frame. Let P e be the probability to not see a click within the last time-bin of a frame. We find that We thus see that the probabilities for Alice to observe a given pattern is where 1 < i < N . The results for Bob will have the same form, but with each A changed to a B. The joint probabilities, P (A i , B j ), are more involved. To simplify our exposition, we assume symmetric channel losses, i.e., η A = η B . The case where η A = η B is described in Appendix B. Each pattern can be broken up into a small set of events, from which each pattern can be constructed. For example, one event is described by the probability for both Alice and Bob to observe clicks within the same time-bin. The probabilities of these events can be expressed in terms of J 0 , J 1 and the single timebin probabilities P ij . 
The probabilities $P(A_i, B_j)$ are then expressed in terms of these event probabilities.
Let $P_{11}$ be the probability for Alice and Bob both to observe clicks within the same time-bin. Because the detectors suffer from jitter, we must consider two time-bins to calculate $P_{11}$. It is found that
where we use the fact that $P^{AB}_{c0} = P^{AB}_{0c}$ when $\eta_A = \eta_B$. The probability that neither Alice nor Bob sees a click in the last time-bin of their frame is $P^e_{00} = P_{00} + J_1^2 P_{cc} + 2J_1 P_{0c}$, where we have again used the fact that $P^{AB}_{0c} = P^{AB}_{c0}$. Alice and Bob can also see clicks in adjacent time-bins. For example, Alice's detector could fire within the $n$-th time-bin, while Bob's fires within the $(n+1)$-th. Let $P_{1*}$ be the probability for Alice's detector to fire in the time-bin directly before Bob's. Similarly, let $P_{*1}$ be the probability for Bob to observe a detection in the $n$-th time-bin while Alice sees one in the $(n+1)$-th time-slot. To calculate $P_{*1}$ and $P_{1*}$, we need to consider three time-bins. We find that
where $P_0 = P^A_0 = P^B_0$ and $P_c = P^A_c = P^B_c$. The final situation that we consider is when Alice and Bob obtain clicks in different time-bins that are not adjacent. Because $J_2 = 0$, we can be certain that such clicks did not come from the same photon pair. Let $P_{10}$ be the probability for Alice to observe a click in a time-bin when Bob does not see a click in the same or adjacent time-bins. Similarly, $P_{01}$ is the probability for Bob to observe a click while Alice does not see one in nearby time-slots. We find that $P_{10} = P_{01} = J_0 P_{00} P_{c0} + J_1 P_{c0} P_0$.

These event probabilities can be used to construct $P(A_i, B_j)$. One complication is that we must be careful with detection events near the edges of the frame. For instance, the probability $P(A_1, B_1)$ differs from $P(A_3, B_3)$, because a detection in the first time-bin could have come from the previous frame.
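As a quick numerical illustration, the expression for $P^e_{00}$ quoted above can be evaluated directly. The sketch below is a minimal transcription under the same assumptions ($\eta_A = \eta_B$, $J_2 = 0$, $J_0 = 1 - J_1$); the function name `prob_no_click_last_bin` and the single-bin probabilities in the example are hypothetical placeholders, not values from the text. Two limits serve as checks: with no jitter the expression reduces to $P_{00}$, and for normalized symmetric inputs with $J_1 = 1$ it equals one, consistent with every click being shifted out of the last time-bin.

```python
def prob_no_click_last_bin(p00: float, p0c: float, pcc: float, j1: float) -> float:
    """Evaluate P^e_00 = P_00 + J_1^2 P_cc + 2 J_1 P_0c.

    Assumes symmetric losses (so P_0c = P_c0) and J_2 = 0, as in the text.
    """
    if not 0.0 <= j1 <= 1.0:
        raise ValueError("J_1 must lie in [0, 1]")
    return p00 + j1**2 * pcc + 2.0 * j1 * p0c


# Hypothetical single-bin probabilities (P_00, P_0c = P_c0, P_cc); they sum to one.
P00, P0C, PCC = 0.90, 0.04, 0.02

for j1 in (0.0, 0.1, 1.0):
    print(j1, prob_no_click_last_bin(P00, P0C, PCC, j1))
```

The correction to $P_{00}$ is first order in $J_1$ through the $P_{0c}$ term, so even modest jitter noticeably changes the last-bin probability.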
When the frame is large, the relative effect of the edges becomes small, so one can neglect it.⁵ In this case, the probabilities for Alice and Bob to observe particular measurement patterns are given by Eq. (21); the full form of $P(A_i, B_j)$, in which edge effects are not neglected, is given in Appendix C. Using the probabilities (21), we obtain Eq. (22), and the post-selected information per detected photon pair then follows as Eq. (23). Because edge effects are neglected, Eqs. (21), (22) and (23) are valid only for $N \geq 6$. In Appendix C, we compare these approximate results with the more complicated exact results and show that even for $N = 8$ the difference between the exact and approximate expressions can be very small (less than 0.1%). We can therefore safely use the approximate expression given in Eq. (23).
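To see how a post-selected information of the form of Eq. (23) can be evaluated numerically, the sketch below assumes, as in the edge-neglecting approximation, that the joint distribution of a (1,1)-frame depends only on the separation of Alice's and Bob's clicks: one weight for the same time-bin, one for Alice's click directly before Bob's ($P_{1*}$-type), one for the reverse ($P_{*1}$-type), and one for non-adjacent time-bins. The weights would come from Eq. (21), which is not reproduced here; the values in the example and the function name `postselected_information` are hypothetical. The mutual information is computed from the full $N \times N$ table rather than from a closed form, so this is a cross-check, not a replacement for Eq. (23).

```python
import numpy as np


def postselected_information(n: int, w_same: float, w_a_first: float,
                             w_b_first: float, w_far: float) -> float:
    """Mutual information (bits) of an n x n joint distribution whose entries
    depend only on the signed separation d = j - i of Bob's and Alice's clicks."""
    idx = np.arange(n)
    d = idx[None, :] - idx[:, None]          # rows: Alice's bin i, columns: Bob's bin j
    joint = np.full((n, n), float(w_far))
    joint[d == 0] = w_same                   # same time-bin
    joint[d == 1] = w_a_first                # Alice's click directly before Bob's
    joint[d == -1] = w_b_first               # Bob's click directly before Alice's
    joint /= joint.sum()                     # condition on a (1,1)-frame
    pa = joint.sum(axis=1)                   # Alice's marginal
    pb = joint.sum(axis=0)                   # Bob's marginal
    mask = joint > 0
    ratio = joint[mask] / np.outer(pa, pb)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))


# Hypothetical weights: strong same-bin correlation, some jitter, weak background.
for n in (64, 256, 1024):
    print(n, postselected_information(n, 1.0, 0.05, 0.05, 1e-6))
```

With only the same-bin weight nonzero the result is exactly $\log_2 N$ bits; jitter and background weights pull it below that bound.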

VI. RESULTS FOR A MODE-LOCKED LASER PUMPING A SPDC SOURCE
The previous results will now be illustrated by looking at a specific experimental setup. We consider a mode-locked laser that produces a train of pulses that pump a nonlinear crystal; the pulses are mutually coherent [21][22][23]. We fix the parameters of the crystal and laser such that SPDC produces pairs of photons that are correlated in time. Each down-converted photon pair is split, with one photon kept by Alice while the other is sent to Bob. The experimental configuration is shown in Fig. 1.
The spacing of the pulses defines natural time-bins for Alice and Bob, who choose the widths of their time-bins so that each contains a single pulse. To a good approximation, the probability that $m$ photon pairs are generated in a time-bin is given by a Poissonian distribution, $e^{-\lambda}\lambda^m/m!$, where $\lambda$ is the average number of photon pairs generated in each time-bin. For this source, the moment generating function, defined in Eq.
Using this within Eqs. (4) and (5) yields the joint detection probabilities for each time-bin. The marginal probabilities for Alice (Bob) are P . Suppose that the main sources of errors are losses and dark counts. One can then use Eq. (25) directly within Eq. (11) to determine how many shared bits per detected photon can be extracted using only (1,1)-frames. The extra information contained in (2,2)-frames can be calculated using Eq. (12). These results can be used to optimize the frame size $N$. One can also investigate how the experimental parameters affect the number of bits per photon, which is important, for instance, when evaluating the benefit of improving the detector's efficiency.
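For a Poissonian source, the single time-bin probabilities can be written in closed form if one assumes that each photon of a pair reaches its detector independently with probability $\eta_A$ or $\eta_B$ and that dark counts occur independently with probability $q$ per time-bin. Under these assumptions (ours, and possibly differing in detail from Eq. (25)), a detector registers nothing from $m$ pairs with probability $(1-\eta)^m$, and averaging over the Poisson distribution gives $e^{-\lambda\eta}$. The sketch below uses this to compute $P_{00}$, $P_{0c}$, $P_{c0}$ and $P_{cc}$, with the first index referring to Alice; the function name `single_bin_probabilities` is hypothetical.

```python
import math


def single_bin_probabilities(lam: float, eta_a: float, eta_b: float,
                             q_a: float, q_b: float) -> dict[str, float]:
    """Joint click probabilities for one time-bin (first index: Alice, second: Bob).

    Assumptions: Poisson(lam) pairs per bin, independent losses per photon,
    independent dark counts with probability q per bin, threshold detectors.
    """
    # Alice sees nothing: no pair photon detected and no dark count.
    p0_a = (1.0 - q_a) * math.exp(-lam * eta_a)
    p0_b = (1.0 - q_b) * math.exp(-lam * eta_b)
    # Neither sees anything: every pair must be missed by both detectors.
    p00 = (1.0 - q_a) * (1.0 - q_b) * math.exp(-lam * (eta_a + eta_b - eta_a * eta_b))
    p0c = p0_a - p00          # Alice no click, Bob click
    pc0 = p0_b - p00          # Alice click, Bob no click
    pcc = 1.0 - p00 - p0c - pc0
    return {"00": p00, "0c": p0c, "c0": pc0, "cc": pcc}


# SPAD-like values quoted in the text: eta = 0.7, q = 6.53e-8, lambda = 5.33e-5.
probs = single_bin_probabilities(5.33e-5, 0.7, 0.7, 6.53e-8, 6.53e-8)
for key, value in probs.items():
    print(f"P_{key} = {value:.4e}")
```

For small $\lambda$ and $q$, the coincidence probability is close to $\lambda\eta_A\eta_B$, which is a convenient sanity check on any implementation of Eq. (25).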
To illustrate our results, we look at typical parameters for two detectors: a single-photon avalanche detector (SPAD) and a superconducting nanowire detector. We assume time-bins of width 130 ps. The SPAD has efficiency $\eta = 0.7$, a dark count rate of 500/s and an after-pulsing rate of 0.5%. The effective dark count probability, which includes the effects of after-pulsing, is $q = 6.53 \times 10^{-8}$. For the superconducting nanowire detector, $\eta = 0.9$, the dark count rate is 1/s and the after-pulsing rate is effectively zero; the corresponding dark count probability is $q = 1.3 \times 10^{-10}$. Figures 2 and 3 show the information within (1,1)- and (2,2)-frames as a function of the frame size $N$, for a SPAD and a superconducting nanowire detector, for two different values of $\lambda$.

In both (a) and (b), the dashed red line is for (1,1)-frames, the dotted blue line is for (2,2)-frames and the solid black line is for both the (1,1)- and (2,2)-frames. Fig. (a) is for a single-photon avalanche detector with $\eta = 0.7$ and $q = 6.53 \times 10^{-8}$. Fig. (b) is for a nanowire detector with $\eta = 0.9$ and $q = 1.3 \times 10^{-10}$.
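The quoted dark count probabilities are consistent with a simple per-bin estimate: multiply the dark count rate by the 130 ps bin width and inflate the result by the after-pulsing fraction. Indeed, $500\,\mathrm{s^{-1}} \times 130\,\mathrm{ps} = 6.5 \times 10^{-8}$, a 0.5% increase gives $6.53 \times 10^{-8}$, and $1\,\mathrm{s^{-1}} \times 130\,\mathrm{ps} = 1.3 \times 10^{-10}$. Treating after-pulsing as a fixed multiplicative factor is our assumption and may not be how the effective $q$ was actually obtained; the function name `dark_count_probability` is hypothetical.

```python
BIN_WIDTH_S = 130e-12  # time-bin width quoted in the text


def dark_count_probability(dark_rate_hz: float, afterpulse_fraction: float = 0.0,
                           bin_width_s: float = BIN_WIDTH_S) -> float:
    """Per-bin dark count probability.

    Assumption: q ~ (dark count rate x bin width), inflated by the after-pulsing
    fraction. Valid only while rate x bin width << 1.
    """
    return dark_rate_hz * bin_width_s * (1.0 + afterpulse_fraction)


q_spad = dark_count_probability(500.0, 0.005)   # SPAD: 500/s, 0.5% after-pulsing
q_snspd = dark_count_probability(1.0)           # nanowire: 1/s, no after-pulsing
print(f"SPAD q = {q_spad:.3g}, nanowire q = {q_snspd:.3g}")
```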
We see in Fig. 2 that, for $N = 1000$, (2,2)-frames can contain a significant fraction of the total shared bits. This is not always the case, however, as shown in Fig. 3. Fig. 3 also shows that, for $N = 3000$, we can extract over 11 bits per photon pair with either detector. Both detectors we considered have high efficiency. With low-efficiency detectors we would obtain less information, and optimizing the frame size would become crucial. For example, consider a detector with $q = 6.53 \times 10^{-8}$ and $\eta = 0.3$. For a source with $\lambda = 5.33 \times 10^{-5}$, we obtain 10.3 bits from the (1,1)-frames by choosing $N = 3579$.
frames can be difficult. If we find that, for given values of loss and dark count rate, $H_d(A : B|K_A = 2, K_B = 2)$ is negligible, then it would not be worth using these frames.⁶ This result also provides a good guide to the regime in which (2,2)-frames contribute significantly. Similarly, one can use the results of Sec. III to calculate the information within (2,1)-frames, and thus investigate the gains from developing error-correcting codes for these and other situations.
The previous results did not include the effects of detector dead-time. However, to fully evaluate the information contained within (2,2)-frames, we must take this effect into account, using the approach detailed in Sec. IV. For the SPAD, the dead-time is 30 ns, which corresponds to approximately 230 time-bins, while for the superconducting nanowire the dead-time is 20 ns, which corresponds to 154 time-bins. Figure 4 compares the shared information in (2,2)-frames with and without dead-time; part (a) is for the SPAD and (b) is for the superconducting nanowire. All curves are for $\lambda = 5.33 \times 10^{-5}$. While dead-time reduces the information, useful information can still be extracted from (2,2)-frames.
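The conversion from dead-time to time-bins, and a rough sense of how much dead-time can remove from two-click frames, can be sketched as follows. Purely for illustration, we assume that the two clicks in a frame fall on a uniformly random pair of distinct time-bins and that the second click is lost whenever it lies within the dead-time of the first; partially covered bins are counted as dead, a conservative rounding that gives 231 rather than the approximately 230 bins quoted for the SPAD. This ignores the details of the model in Sec. IV, and the function names are hypothetical.

```python
import math

BIN_WIDTH_S = 130e-12  # time-bin width used in the text


def dead_time_in_bins(dead_time_s: float, bin_width_s: float = BIN_WIDTH_S) -> int:
    """Number of time-bins blocked after a click (partially covered bins count as blocked)."""
    return math.ceil(dead_time_s / bin_width_s)


def surviving_fraction(n: int, dead_bins: int) -> float:
    """Fraction of click pairs (i < j) in an n-bin frame with j - i > dead_bins,
    assuming all pairs of distinct bins are equally likely."""
    if n < 2 or dead_bins >= n - 1:
        return 0.0
    # Pairs with gap g number n - g; summing over g > dead_bins gives a triangle number.
    allowed = (n - dead_bins) * (n - dead_bins - 1) // 2
    return allowed / (n * (n - 1) // 2)


for name, tau in (("SPAD", 30e-9), ("nanowire", 20e-9)):
    d = dead_time_in_bins(tau)
    print(name, round(tau / BIN_WIDTH_S, 2), d, surviving_fraction(1000, d))
```

For $N = 1000$ this crude estimate suggests that a dead-time of a few hundred time-bins removes a sizeable, but not dominant, fraction of two-click patterns.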
In many realistic situations, detector jitter is non-negligible. We can include its effects using the formalism described in Sec. V. There is, however, a subtlety when one applies the theoretical results to an experiment. It is common to calculate the heralded efficiency directly from experimental data. Detector jitter decreases the probability of observing photons within a particular period of time, so if one is not careful, one can overestimate the losses and hence underestimate $\eta$. To illustrate this, consider a source that produces a single photon within a specific time-bin. One can use this source to estimate $\eta$ from the probability $w$ of detecting the photon in that time-bin. If our detectors suffer from jitter, then $w \neq \eta$; instead, neglecting dark counts, $w = \eta J_0$.
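The bias in the heralded efficiency can be checked with a small Monte Carlo simulation, assuming the jitter model of Sec. V with $J_2 = 0$ and no dark counts: each photon survives losses with probability $\eta$ and, if it survives, is registered in its own time-bin with probability $J_0$ and in the next one otherwise. Counting only clicks in the correct time-bin reproduces $w \approx \eta J_0$, so dividing by $J_0$ recovers $\eta$. The function name, trial count and seed are arbitrary choices.

```python
import random


def simulated_click_fraction(eta: float, j0: float, trials: int = 200_000,
                             seed: int = 1) -> float:
    """Fraction of single photons that produce a click in their own time-bin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.random() >= eta:      # photon lost in the channel or detector
            continue
        if rng.random() < j0:        # registered without delay (probability J_0)
            hits += 1
        # otherwise it is registered one time-bin late (probability J_1 = 1 - J_0)
    return hits / trials


eta_true, j0 = 0.7, 0.9              # SPAD-like values used in this section
w = simulated_click_fraction(eta_true, j0)
print(f"w = {w:.3f}  (eta * J_0 = {eta_true * j0:.3f})")
print(f"naive eta estimate = {w:.3f}, jitter-corrected = {w / j0:.3f}")
```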
In some situations, underestimating the efficiency can be a good thing. For example, a reduction in $\eta$ decreases our estimate of the number of bits we can extract, which is a crude way of accounting for jitter. Such an approach would, however, be too pessimistic if jitter is already included explicitly within our model. In the rest of this section, we assume that the total efficiency $\eta$ has been estimated such that it is entirely associated with losses.
Jitter decreases the correlation between Alice's and Bob's timing information, which inevitably decreases the number of shared bits. To evaluate this effect, we calculate $H_d(A : B|K_A = 1, K_B = 1)$ for the SPAD and the superconducting nanowire detector, with $\lambda = 2.0 \times 10^{-5}$. We take $J_0 = 0.9$ for the SPAD and $J_0 = 0.97$ for the nanowire detector. Figure 5 shows $H_d(A : B|K_A = 1, K_B = 1)$ as a function of $N$; the solid black curve corresponds to the SPAD, while the dashed blue line is for the superconducting nanowire. For an appropriate choice of $N$, we can still obtain more than 10 bits per photon with either detector. To get a better feel for how jitter affects the extractable information, consider how $H_d(A : B|K_A = 1, K_B = 1)$ changes with $J_0$ for a setup with $\lambda = 2.0 \times 10^{-5}$, $\eta = 0.7$ and $q = 6.53 \times 10^{-8}$, i.e., the parameters for the SPAD. Without jitter ($J_0 = 1$), we could extract 11.1 bits per photon for $N = 4000$; with $J_0 = 0.9$, we could extract 10.2 bits per photon for the same frame size.

VII. CONCLUSIONS
The time-of-arrival degree of freedom provides an experimentally viable means of implementing high-dimensional quantum information protocols, and is particularly well suited to quantum communication.
One important example of this is high-dimensional QKD. Such schemes can, however, be hampered by the difficulty of performing error correction. A pertinent example is time-bin based QKD where, unlike in polarization-based QKD, one cannot use each photon's arrival time to help correct errors. Instead, it is common to split the arrival time into discrete time-bins, which are grouped together to form frames. A key question is how this affects the amount of shared information that Alice and Bob can extract. We answer this question and obtain general results for the maximum number of extractable shared bits for photons entangled in their time of arrival when using frame encoding.
Our results go beyond existing work in a number of areas. First, we present results for frames that contain photon pairs within multiple time-bins. We can thus investigate how many bits are lost by neglecting such events and when such events should be kept. These results can be used to improve the efficiency with which shared bits are extracted from noisy experimental setups.
Another way in which the current work improves on existing work is the range of errors considered. We study systems that suffer from asymmetric losses, dark counts, after-pulsing, dead-time and jitter, and the formalism applies to a general choice of source. The results for jitter are of particular interest: we have found analytic expressions for the extractable information when jitter is present in addition to losses and dark counts. This could be important for optimizing high-dimensional QKD. For example, in some experiments we are free to choose the time-bin width. Often, one chooses the width so as to minimize the effect of jitter; using our results, one can instead choose the time-bin width so as to optimize the shared information.
The results are illustrated by considering entangled photons generated by a nonlinear crystal pumped by a mode-locked laser, which produces a train of mutually coherent pulses. Two different types of detector were considered: a SPAD and a superconducting nanowire detector. The results show that, under appropriate conditions, one can choose a frame size so as to extract over 10 shared bits per photon pair.
One issue we have not considered is how one might actually extract the shared information, i.e., reconciliation. There has been some work in this area [15,24,25]. Some of the present authors have developed a reconciliation protocol that is tailored to the case of a mode-locked laser pumping a down-conversion source [40]. This protocol can treat multi-photon events and can recover some of the frame-to-frame information contained within the photon number uncertainty.
The calculations in Sec. V assumed that $J_n = 0$ for $n \geq 2$. This is consistent with the detector's temporal response (i.e., its probability distribution) being effectively confined to two time-bins. While this assumption often holds, there are detectors for which it does not. In this Appendix, we briefly outline how to generalize the previous results. The aim is not to present extensive results, but to show how the previous results can be adapted. The general approach is illustrated by investigating the case $J_2 \neq 0$ but $J_3 = J_4 = \dots = 0$, hence $J_0 + J_1 + J_2 = 1$.
Recall that $J_0$ is the conditional probability of registering a photon in the correct time-bin, i.e., the time-bin in which the photon was actually incident on the detector. The conditional probability of registering a click $n$ time-bins after the photon was incident on the detector is $J_n$. We first consider Alice's (or equivalently Bob's) marginal probabilities. In the absence of jitter, the probability for Alice to see a click in a given time-bin is $P^A_c$. With jitter, a detected photon could have originated in previous time-slots, so the probability of observing a click changes. To calculate the new probability $P_1$, we consider three time-bins, as $J_2$ can delay a photon by two time-bins. We find that the probability for Alice to see a click in a given time-bin is
where $P^A_0 = 1 - P^A_c$. Equation (A1) is composed of three separate terms. The first term corresponds to a photon that has 'jumped' two time-bins due to jitter. The second term is for a photon that is detected within the time-bin directly after the correct one. Finally, the third term corresponds to the detector firing within the time-slot in which the photon was incident on the detector.
To calculate the probability $P^e$ that we do not see a click in the last time-bin of a frame, we must consider two time-bins. We find that the probability is
Again, there are three terms, corresponding to three possible ways in which the event can be realized. The probability to observe a given measurement pattern is again constructed from $P_1$, $P^e$ and $P^A_0$. We must take care with photons detected in time-slots near the edge of each frame, since these may correspond to photons from previous frames that are registered in a later frame due to jitter. These edge effects mean that $P(A_1)$ and $P(A_2)$ will not equal $P(A_i)$, where $i$ is a time-bin in the middle of the frame. Similarly, $P(A_N) \neq P(A_i)$. We find that
where $2 < i < N - 1$. The probability for Alice to post-select on a $K_A = 1$ frame is $P(K_A = 1) = \sum_j P(A_j)$. The results for Bob have the same form.
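The edge effect described above can be quantified per time-bin for a general jitter distribution: a photon incident on time-bin $N - k$ is registered in a later frame whenever its delay exceeds $k$ time-bins. The sketch below assumes the $J_n$ are supplied as a list that sums to one; the function name `leakage_near_frame_end` is hypothetical. For $J_2 = 0$ only the last time-bin leaks (with probability $J_1$), while for $J_2 \neq 0$ the last two do.

```python
def leakage_near_frame_end(jitter: list[float]) -> dict[str, float]:
    """Probability that a photon incident on bin N - k is registered after the frame ends.

    jitter[n] is J_n, the probability of a delay of n time-bins.
    """
    if abs(sum(jitter) - 1.0) > 1e-9:
        raise ValueError("the J_n must sum to one")
    max_delay = len(jitter) - 1
    leaks = {}
    for k in range(max_delay):                   # only bins N, ..., N - max_delay + 1 can leak
        label = "N" if k == 0 else f"N-{k}"
        leaks[label] = sum(jitter[k + 1:])       # delays larger than k leave the frame
    return leaks


print(leakage_near_frame_end([0.9, 0.1]))        # J_2 = 0: only bin N leaks
print(leakage_near_frame_end([0.8, 0.15, 0.05])) # J_2 != 0: bins N and N-1 leak
```

The same counting, read in reverse, explains why the first $\max n$ time-bins of the next frame can receive photons from the previous one.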
The joint probabilities $P(A_m, B_n)$ can be calculated in the same fashion by first recalculating $P_{11}$, $P^e_{00}$, $P_{10}$ and $P_{1*}$. However, we now require an extra term, $P_{1**}$, which corresponds to Alice seeing a photon two time-bins before Bob does. Each of these probabilities is again calculated by looking at several time-bins. For instance, to calculate $P_{11}$, we must consider three time-bins for Alice and Bob. As an example, the new form for $P_{11}$ is

$$P_{11} = J_0^2 P_{00}^2 P_{cc} + J_1^2 P_{00} P_{cc} + J_2^2 P_{cc} + 2J_0 J_1 P_{00} P_{c0} P_{cc} + 2J_0 J_2 P_{c0} P_0 P_1 + 2J_1 J_2 P_{c0} P_1, \qquad (A4)$$

where we have used the fact that $P_{c0} = P_{0c}$ when $\eta_A = \eta_B$. Notice that (A4) contains more terms than Eq. (17), which was derived for $J_2 = 0$. These extra terms arise because the detector's response is now longer, so jitter can cause a photon to be registered two time-slots after it was incident on the detector.

Appendix B: Jitter with asymmetric losses
The results for jitter given in Sec. V assumed that $\eta_A = \eta_B$ to simplify the expressions. In this Appendix, we briefly show how the results are modified for asymmetric losses. The marginal probabilities for Alice and Bob contain terms that depend only on $\eta_A$ or only on $\eta_B$, so there is no need to modify them. The joint probabilities $P(A_i, B_j)$ will, however, need to be modified.
The first step is to calculate the probabilities of the individual events, e.g., $P_{11}$, $P^e_{00}$, etc. The key issue is that now $P_{c0} \neq P_{0c}$, whereas their equality was implicitly assumed within the earlier derivations. First, consider the probability that Alice and Bob both see a click within the same time-bin. We find that

$$P_{11} = J_0^2 P_{00} P_{cc} + 2J_0 J_1 (P_{0c} P_{c0}) + J_1^2 P_{cc}. \qquad (B1)$$

The new probability that neither Alice nor Bob detects a photon in the last time-bin of their frame is $P^e_{00} = P_{00} + J_1^2 P_{cc} + J_1 [P_{0c} + P_{c0}]$.
The probability for Alice and Bob to obtain clicks in adjacent time-bins is given by $P_{1*}$ and $P_{*1}$. Previously, we found that $P_{1*} = P_{*1}$, which is not true in general. We find that

$$P_{1*} = J_0^2 P_{00} P_{c0} P_{0c} + J_1^2 P^B_c P^A_0 P_{c0} + J_0 J_1 \left(P^A_0 P_{00} P_{cc} + P^B_0 P_{0c} P_{c0}\right),$$
$$P_{*1} = J_0^2 P_{00} P_{c0} P_{0c} + J_1^2 P^A_c P^B_0 P_{0c} + J_0 J_1 \left(P^B_0 P_{00} P_{cc} + P^A_0 P_{0c} P_{c0}\right). \qquad (B3)$$

The final event probability is for the case when Alice and Bob obtain clicks in different, non-adjacent time-bins. Again we find that $P_{10} \neq P_{01}$; explicitly,

$$P_{10} = J_0 P_{00} P_{c0} + J_1 P_{c0} P^B_0 + J_0 J_1 P_{00} P_{cc} + J_1^2 P_{c0} P^B_c,$$
$$P_{01} = J_0 P_{00} P_{0c} + J_1 P_{0c} P^A_0 + J_0 J_1 P_{00} P_{cc} + J_1^2 P_{0c} P^A_c.$$
The event probabilities are again used to construct the joint frame probabilities $P(A_i, B_j)$. As before, we simplify our results by assuming that we can neglect edge effects. Using this assumption, we find that P (A i , B i ) = P 11 P e 00 P AB
where $|i - j| > 1$. The conditional probabilities and all the relevant entropic quantities can then be calculated as before.
Appendix C: Comparison of the exact and the approximate results for jitter

In this Appendix, we compare the approximate results for jitter with the longer, but more accurate, exact results. In all of what follows, we assume that only $J_0$ and $J_1$ are nonzero and that $\eta_A = \eta_B$.
In general, the frame edges influence the probabilities for each pattern, e.g., $P(A_1, B_j) \neq P(A_3, B_j)$. This is because we analyze each frame separately, and thus lose information about what happens in the time-bins directly before the beginning of each frame. The probabilities shown in Eq. (21) are derived by neglecting the edges. When we include the edges, we find that the probabilities become P (A 1 , B 1 ) = P 11 P e 00 P AB
There are $N - 2$ terms $P(A_i, B_i)$ with $1 < i < N$. Similarly, there are $N - 3$ terms each of the forms $P(A_1, B_i)$, $P(A_i, B_1)$, $P(A_j, B_N)$ and $P(A_N, B_j)$, where $2 < i < N$ and $1 < j < N - 1$. One can also verify that there are $N - 1$ terms of each of the forms $P(A_j, B_{j+1})$ and $P(A_{j+1}, B_j)$, where $j = 1, \dots, N - 1$. Finally, the number of remaining terms can be found by recalling that the joint probability contains a total of $N^2$ different outcomes. Equation (C1) is significantly more complicated than (21). Using these probabilities, we calculate $H_d(A : B|K_A = 1, K_B = 1)$ and compare it with the approximate result given in (23). Figure 6(a) shows a direct comparison, as a function of $N$, for $\eta_A = \eta_B = 0.3$, $\lambda = 5.33 \times 10^{-4}$, $q = 3.9 \times 10^{-8}$ and $J_1 = 0.4$. The solid black curve is the exact expression, while the dashed red curve is the approximate expression. The percentage difference between the exact and approximate results is shown in Fig. 6(b). The agreement between the two results is excellent for large $N$. Somewhat surprisingly, the approximation is accurate to within 1% for frames as small as $N = 8$. The agreement between the exact and approximate results also holds for other values of $\eta_A$, $\eta_B$, $\lambda$ and $q$. For example, for $\eta_A = \eta_B = 0.7$, $\lambda = 5.33 \times 10^{-5}$, $q = 6.53 \times 10^{-8}$ and $J_1 = 0.1$, we find a percentage difference of less than 0.001% for $N = 10$.

Fig. (a) shows plots of the post-selected information $H_d(A : B|K_A = 1, K_B = 1)$ as a function of the frame size $N$. The black curve is the exact result, while the dashed red curve is the approximate result. Fig. (b) shows the percentage difference between the exact and approximate results as a function of the frame size. All curves are for $\eta_A = \eta_B = 0.3$, $\lambda = 5.33 \times 10^{-4}$, $q = 3.9 \times 10^{-8}$ and $J_1 = 0.4$.
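The counting of terms in Eq. (C1) can be checked by brute force. The sketch below classifies every pair $(i, j)$ of Alice's and Bob's time-bins (numbered 1 to $N$, as in the text) into the categories listed above and tallies them; the class labels and function names are our own. The remaining non-adjacent pairs with both time-bins in the interior number $(N-3)(N-4)$, which together with the other classes accounts for all $N^2$ outcomes.

```python
from collections import Counter


def classify(i: int, j: int, n: int) -> str:
    """Label the outcome (A_i, B_j) of an n-bin frame; bins are numbered 1..n."""
    if i == j:
        return "diag_edge" if i in (1, n) else "diag_interior"
    if j == i + 1:
        return "adjacent_A_first"        # P(A_j, B_{j+1})
    if i == j + 1:
        return "adjacent_B_first"        # P(A_{j+1}, B_j)
    if {i, j} == {1, n}:
        return "opposite_corners"        # P(A_1, B_N) and P(A_N, B_1)
    if i == 1:
        return "A1_far"                  # P(A_1, B_i), 2 < i < N
    if j == 1:
        return "B1_far"                  # P(A_i, B_1), 2 < i < N
    if j == n:
        return "BN_far"                  # P(A_j, B_N), 1 < j < N - 1
    if i == n:
        return "AN_far"                  # P(A_N, B_j), 1 < j < N - 1
    return "interior_far"


def count_terms(n: int) -> Counter:
    """Tally all n * n outcomes by class."""
    return Counter(classify(i, j, n) for i in range(1, n + 1) for j in range(1, n + 1))


print(dict(count_terms(8)))
```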