A Little Bit of Classical Magic to Achieve (Super-)Quantum Speedup

We introduce the nebit, a classical bit with a signed probability distribution. We study its properties and the basic transformations that can be applied to it. Then, we introduce a simple dynamical model: a classical random walk supplemented with nebits. We show that such a model exhibits some counterintuitive non-classical properties and that it can achieve or even exceed the speedup of Grover's quantum search algorithm. The proposed classical dynamics never reveals the negativity of nebits and thus we do not need any operational interpretation of negative probabilities. We argue that nebits can be useful as a measure of non-classicality as well as a tool to find new quantum algorithms.


Introduction
We know that quantum computers can be much faster at certain tasks than classical Turing machines. Flagship examples are Shor's factoring algorithm [1], Grover's search algorithm [2] and boson sampling [3]. What we still do not know, at least to some degree, is why this is so. It is clear that quantum superposition and quantum entanglement appear in quantum computations, but how exactly do they contribute to the quantum speed-up? Moreover, we do not have a measure of how non-classical any given quantum computation is. And, perhaps most importantly, designing quantum algorithms is notoriously hard. Since the birth of the idea of quantum computation [4], only a handful of quantum codes in the quantum supremacy regime have been found. The idea of classical dynamics with nebits is an attempt to address these increasingly important questions at the advent of rapidly developing quantum computers [5].
This paper flirts with non-orthodox or even iconoclastic ideas. Thus, to preempt an immediate dismissal, we note that these ideas were entertained long ago by other researchers (see for example [6][7][8][9][10][11][12][13][14][15][16][17][18]). For instance, R. Feynman studied negative probabilities in the hope of resolving renormalisation issues of quantum electrodynamics [10]. E. Wigner was perhaps more successful, leaving behind the potent Wigner function formalism [11], which is extensively used in quantum optics and other branches of quantum physics. Both Feynman and Wigner were extremely pragmatic in their approach to negative probabilities: as long as the final stage of calculations does not contain negative probabilities, everything is perfectly all right. In this sense, negative probabilities were for them just a computational tool, and this is our modus operandi as well. This neatly removes any need to speculate about the meaning of negative probabilities and lets us focus on more hardheaded tasks, such as a deeper understanding of the advantages of quantum computers.
Before we proceed with our arguments we would like to make another remark, following Wheeler, who coined the phrase "great smokey dragon" as a poetic metaphor highlighting some counterintuitive aspects of quantum mechanics. It is well accepted by most physicists (if they are forced to make such a philosophical declaration) that quantum theory is an input-output black box process. You prepare a quantum state (input), evolve it (black box) and finally measure it (output). Wheeler's "great smokey dragon" symbolises the strange internal workings of this quantum black box, which do not have a classical equivalent. It is futile to look inside the box to see what happens because any observation disturbs and changes it, unlike in classical theories where things exist objectively without a need for observation. Our approach parallels this quantum mechanical paradigm. We present a black box with internal classical dynamics supplemented with 'hidden' nebits, whose output mimics the output of the Grover search algorithm on a quantum computer. Nebits are never observed at the output; they are permanently locked inside the black box. Again, in a truly classical box one could peek inside and see what is in there, but we do not allow this as one of the rules of the game.
Finally, techniques developed in this paper, we hope, can be used as a tool to find new quantum algorithms. Such algorithms are notoriously difficult to come by. Mimicking quantum mechanics with classical stochastic dynamics supplemented with nebits might provide some intuitions or at least hints of how to translate it into purely quantum algorithms.

Nebit and its Properties
We first study some basic properties of a nebit. What follows is by no means a systematic study of this highly counterintuitive yet mathematically precise object. It is rather a collection of loose observations that we need to explain the main ideas of this paper. Consider a binary system, a nebit, with states labelled 0 and 1 in analogy to a classical bit. However, unlike for classical bits, we allow these states to be taken with signed probabilities, i.e., probabilities that can be negative but are still normalised to one: $p_0 = 1 + \delta$, $p_1 = -\delta$, where $\delta \ge 0$. As we wrote in the Introduction, we do not attempt to give any operational interpretation to negative probabilities, which we take simply as objects obeying precise mathematical rules [6]. Those rules, with a few irrelevant nuances, are those of Kolmogorovian probability theory.

Entropy of Nebit
To derive the entropy of a nebit we use the following scenario. Let us first consider a random process $S$ that is applied to a single bit in a well defined state, say 0. The process does not change the value of the bit with probability $q$ and flips it with probability $1 - q$. It is therefore represented by the following bi-stochastic matrix
$$ S = \begin{pmatrix} q & 1-q \\ 1-q & q \end{pmatrix}. $$
The process $S$ can be realized via the controlled-NOT (CNOT) operation for which the control bit is a random bit with the distribution $p_0 = q$ and $p_1 = 1 - q$. The CNOT operation flips the target bit if the control bit is 1 and does nothing otherwise.
The initial entropy of the target bit is $H_0 = 0$, but after the process it changes to
$$ H_1 = -q \log q - (1-q) \log (1-q), $$
which is exactly the entropy of the control bit. In the above and throughout the paper we use the convention that the logarithm is base two. Next, let us apply another CNOT operation to this target bit, but this time let the control bit be a nebit with the distribution $p_0 = 1 + \delta$ and $p_1 = -\delta$. The effective operation realized on the target bit is given by the quasi-bi-stochastic matrix
$$ S' = \begin{pmatrix} 1+\delta & -\delta \\ -\delta & 1+\delta \end{pmatrix}. $$
The crucial observation is that if $q = \frac{1+\delta}{1+2\delta}$ the process $S'$ is the inverse of the process $S$, i.e., the product of the corresponding matrices is the identity matrix, $S' \cdot S = I$. Therefore, if $q = \frac{1+\delta}{1+2\delta}$, the target bit after the second CNOT is in its initial well determined state with entropy $H_2 = H_0 = 0$. This means that the process $S'$ removes exactly $H_1$ of entropy from the target bit, therefore the entropy of the nebit can be defined as
$$ H_\delta = -H_1 = \frac{1+\delta}{1+2\delta} \log \frac{1+\delta}{1+2\delta} + \frac{\delta}{1+2\delta} \log \frac{\delta}{1+2\delta}, $$
which is a negative number, $H_\delta \in (-1, 0)$. Moreover, $H_\delta \to -1$ as $\delta \to \infty$.
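The inverse relation and the resulting entropy can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `flip_matrix`, `matmul` and `nebit_entropy` are hypothetical, and the value $\delta = 1/2$ is an arbitrary choice made only for illustration.

```python
from fractions import Fraction
from math import log2

def flip_matrix(p_keep):
    """2x2 matrix of a CNOT whose control reads 0 with (signed) weight p_keep."""
    p_flip = 1 - p_keep
    return [[p_keep, p_flip], [p_flip, p_keep]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def nebit_entropy(delta):
    """H_delta = q log2 q + (1 - q) log2 (1 - q) at q = (1 + delta) / (1 + 2 delta).

    Requires delta > 0, since log2(0) is undefined.
    """
    q = (1 + delta) / (1 + 2 * delta)
    return q * log2(q) + (1 - q) * log2(1 - q)

delta = Fraction(1, 2)                   # illustrative negativity (assumption)
q = (1 + delta) / (1 + 2 * delta)        # the matching classical parameter
S = flip_matrix(q)                       # ordinary random flip
S_prime = flip_matrix(1 + delta)         # flip controlled by the nebit
product = matmul(S_prime, S)             # exact arithmetic, should be the identity

print(product)
print(nebit_entropy(delta))
```

Exact rational arithmetic is used for the matrices so that the identity is verified without rounding; the entropy is evaluated in floating point.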

Negativity Catalysis
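In the following sections we are going to show that nebits can be used to improve the efficiency of some classical protocols and that this efficiency depends on the amount of negativity $\delta$. It is therefore natural to ask if it is possible to increase $\delta$, say, with the help of another nebit. Let us consider two independent nebits, the first one described by probabilities $\{1+\delta, -\delta\}$ and the second one by $\{1+\delta', -\delta'\}$. Their corresponding probability vectors are
$$ \mathbf{p}_2 = \begin{pmatrix} 1+\delta \\ -\delta \end{pmatrix}, \qquad \mathbf{p}'_2 = \begin{pmatrix} 1+\delta' \\ -\delta' \end{pmatrix}, $$
where the subscript '2' denotes a two-state system. The statistical independence implies that these two nebits can be jointly considered as a four-state system with the probability vector
$$ \mathbf{p}_4 = \mathbf{p}_2 \otimes \mathbf{p}'_2 = \begin{pmatrix} (1+\delta)(1+\delta') \\ -(1+\delta)\delta' \\ -\delta(1+\delta') \\ \delta\delta' \end{pmatrix}. $$
Next, consider a CNOT operation on both nebits, with the first nebit as the control,
$$ \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. $$
After the CNOT the resulting marginal probability distributions are
$$ \{1+\delta, -\delta\} \quad \text{and} \quad \{1+\delta'', -\delta''\}, \qquad \delta'' = \delta + \delta' + 2\delta\delta'. $$
We see that the control nebit does not change but the second nebit becomes more negative. This logical nebit operation can be interpreted as catalysis of negativity. Interestingly, one is unable to use the same two nebits in a second catalysis process because of the correlations created by the CNOT. The second application of the CNOT operation reverses the catalysis and restores the nebits to the initial state. The CNOT catalysis only works for independent nebits.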
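The catalysis step can be reproduced with a few lines of exact arithmetic. This is a sketch under stated assumptions: the first nebit is taken as the control, the function names are hypothetical, and the negativities $1/2$ and $1/3$ are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import product

def nebit(delta):
    """Signed distribution {1 + delta, -delta} over the states 0 and 1."""
    return [1 + delta, -delta]

def joint(control, target):
    """Joint distribution of two independent two-state systems, keyed by (c, t)."""
    return {(c, t): control[c] * target[t] for c, t in product((0, 1), repeat=2)}

def cnot(dist):
    """Flip the target whenever the control reads 1."""
    out = {}
    for (c, t), weight in dist.items():
        key = (c, t ^ c)
        out[key] = out.get(key, 0) + weight
    return out

def marginals(dist):
    """Return the marginal distributions of the control and of the target."""
    control = [sum(w for (c, _), w in dist.items() if c == v) for v in (0, 1)]
    target = [sum(w for (_, t), w in dist.items() if t == v) for v in (0, 1)]
    return control, target

d1, d2 = Fraction(1, 2), Fraction(1, 3)      # illustrative negativities (assumption)
once = cnot(joint(nebit(d1), nebit(d2)))     # first CNOT: catalysis
control_after, target_after = marginals(once)
twice = cnot(once)                           # second CNOT: undoes the catalysis

print(control_after, target_after)
```

The target marginal comes out as a nebit with negativity $d_1 + d_2 + 2 d_1 d_2$, while applying `cnot` a second time returns the original product distribution, in line with the remark that the catalysis cannot be repeated on the same correlated pair.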

Creation of General Negative Probability Distributions
We are also going to show that in some cases it is useful to work with signed probability distributions over more than two states. We are therefore going to present how to obtain such distributions from a single nebit.
Consider a signed distribution over $K$ events in which the first $k$ probabilities are non-negative and the remaining ones are negative, i.e., $p_k \ge 0$ and $p_{k+1} < 0$. Below we show how to generate this distribution starting from a nebit with the distribution $\{1+\delta, -\delta\}$.
Let us consider two stochastic processes, $S_1$ and $S_2$. The process $S_1$ starts with a single event that occurs with probability one and generates a probability distribution over the first $k$ events; $S_2$ starts with a single event and generates a probability distribution over the remaining events.

Classical Dynamics with Nebits
Dynamics of a classical system can be represented as a trajectory in the state space. As we are interested in computational algorithms we limit ourselves to discrete state spaces and discrete time. This is not a serious limitation and one can translate our results to the continuous state space.
We enumerate time steps with integers, $t = 0, 1, 2, \dots$, and thus a $T$-step state space trajectory is a sequence of states
$$ s_0 \to s_1 \to \dots \to s_T, $$
where $s_0$ is the initial state of the system. The first step takes the system from $s_0$ to $s_1$ and so on. After $T$ steps the system ends up in the state $s_T$.
The dynamics does not have to be reversible but we assume that it is deterministic, i.e., for any given state there exists a unique state to which the system is transformed in the next step. Indeed, this is what defines a trajectory. In a trajectory corresponding to a reversible dynamics every state has a unique predecessor, whereas a trajectory corresponding to an irreversible dynamics, given that the transition rules are time-independent, can look like
$$ a \to b \to b \to b \to \dots $$
The irreversibility in the latter follows from the loss of information in the state $b$ about its predecessor: has it been $a$ or $b$?
Randomness in a deterministic dynamics is the result of the observer's ignorance about the observed system. It can have two different origins. The first one is insufficient precision of preparation and measurement readouts, in which case the initial, intermediate and final states are statistical ensembles. The second one is insufficient knowledge of the exact dynamics of the system, such that at some point one is unsure if the correct transition rule is $a \to b$ or $a \to c$. In the first case, instead of a single trajectory one follows a collection of trajectories in the state space. In particular, one follows the evolution of a whole region of the state space. For reversible dynamics the size/cardinality of this region is conserved (Liouville theorem) but this may not be true for irreversible systems. In the second case one observes splitting and merging of the trajectories, as in Brownian motion, which can be modelled by a random walk. In this model the trajectories split because one is not sure whether some complex external forces make the particle move to the left or to the right.
Before we introduce the aforementioned nebit black magic, let us go through a simple example illustrating the exact workings of the model considered in this work. This may be perceived by some readers as unnecessary pedantry, but one can never be too careful with non-orthodox ideas where intuitions have to be built up afresh.
We start with the state space consisting of $N$ states, $S = \{a_1, a_2, \dots, a_N\}$, and choose the following cyclic and reversible transition rule: $a_i \to a_{i+1}$ with $a_N \to a_1$. If the system is initialised in the state $a_7$ and it evolves for a sufficiently long time, its trajectory is
$$ a_7 \to a_8 \to \dots \to a_N \to a_1 \to a_2 \to \dots $$
Now, if the initial state of the system is not precisely defined, say, it is $a_3$ with probability $p$ and $a_7$ with probability $1 - p$, one ends up with a mixture of trajectories
$$ p:\ a_3 \to a_4 \to a_5 \to \dots, \qquad 1-p:\ a_7 \to a_8 \to a_9 \to \dots $$
Therefore, after two steps the system is in the state $a_5$ with probability $p$ and in $a_9$ with probability $1 - p$.
Next, let us consider a different situation. Imagine that the system is initiated in the above random state and that due to some external influences there is a probability 1/2 that in one step the transition rule will be applied twice. In other words, before each step one tosses a fair coin and if the result is heads one applies the transition rule once, but if it is tails one applies the transition rule twice. Therefore, after one step the mixture of the trajectories is
$$ \tfrac{p}{2}:\ a_3 \to a_4, \quad \tfrac{p}{2}:\ a_3 \to a_5, \quad \tfrac{1-p}{2}:\ a_7 \to a_8, \quad \tfrac{1-p}{2}:\ a_7 \to a_9, $$
after two steps it is
$$ \tfrac{p}{4}:\ a_3 \to a_4 \to a_5, \quad \tfrac{p}{4}:\ a_3 \to a_4 \to a_6, \quad \tfrac{p}{4}:\ a_3 \to a_5 \to a_6, \quad \tfrac{p}{4}:\ a_3 \to a_5 \to a_7, $$
$$ \tfrac{1-p}{4}:\ a_7 \to a_8 \to a_9, \quad \tfrac{1-p}{4}:\ a_7 \to a_8 \to a_{10}, \quad \tfrac{1-p}{4}:\ a_7 \to a_9 \to a_{10}, \quad \tfrac{1-p}{4}:\ a_7 \to a_9 \to a_{11}, $$
and so on. After two steps the probability of finding the system in the state $a_6$ is $\frac{p}{4} + \frac{p}{4} = \frac{p}{2}$, since there are two trajectories going to this state and we need to sum up their corresponding probabilities.
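The bookkeeping in this example, splitting every trajectory at each coin toss and summing the weights of trajectories that end in the same state, can be sketched as follows. The values $N = 12$ and $p = 1/3$ and the function names are assumptions made for illustration only.

```python
from fractions import Fraction

def step(mixture, n_states):
    """One coin-toss step: every trajectory splits into 'advance by one' and
    'advance by two', each branch carrying half of the original weight."""
    new_mixture = []
    for weight, path in mixture:
        for jump in (1, 2):
            # cyclic rule a_i -> a_{i+1} with a_N -> a_1, applied `jump` times
            nxt = (path[-1] - 1 + jump) % n_states + 1
            new_mixture.append((weight / 2, path + (nxt,)))
    return new_mixture

def state_probabilities(mixture):
    """Sum the weights of all trajectories that currently end in the same state."""
    probs = {}
    for weight, path in mixture:
        probs[path[-1]] = probs.get(path[-1], 0) + weight
    return probs

p = Fraction(1, 3)                       # illustrative value (assumption)
N = 12                                   # illustrative chain size (assumption)
mixture = [(p, (3,)), (1 - p, (7,))]     # a_3 with probability p, a_7 with 1 - p
for _ in range(2):
    mixture = step(mixture, N)
probs = state_probabilities(mixture)

print(probs)
```

After two steps the mixture holds eight trajectories, and the entry for state 6 equals `p / 2`, reproducing the sum of the two contributing trajectory weights.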

Trajectories with Negative Probabilities
Now we inject negative probabilities into the dynamics. We start with the following observation: in the previous example there was an obvious assumption that we did not trace trajectories whose probabilities were zero. Such trajectories simply did not occur. However, this assumption is not that obvious when negative probabilities come into play: one needs to be very careful about what is traced and what is not.
In order to get some intuition about what can happen, let us start with a simple example that leads to troublesome interpretations, which we would like to avoid in the future. Imagine a single step of a random evolution that allows for trajectories with negative probabilities. The system can follow one of four possible trajectories, the first three with probability $1/2$ and the last one with probability $-1/2$:
$$ \tfrac12:\ a_0 \to a_0, \qquad \tfrac12:\ a_0 \to a_1, \qquad \tfrac12:\ a_2 \to a_2, \qquad -\tfrac12:\ a_2 \to a_3. $$
At $t = 0$ the system is in $a_0$, since the probability of finding it in this state is $\frac12 + \frac12 = 1$. There are also trajectories starting from $a_2$, but we do not observe this state, since the corresponding probability is $\frac12 - \frac12 = 0$. Therefore, in the orthodox scenario with only positive probabilities these trajectories would not be considered. However, at $t = 1$ the trajectories split and we suddenly observe that the system can be in one of four different states: $a_0$, $a_1$ and $a_2$, each with probability $1/2$, and $a_3$ with probability $-1/2$. From the point of view of an observer a single trajectory starting from $a_0$ splits into two and another two trajectories spontaneously emerge from $a_2$. The problem we do not want to deal with is how to interpret trajectories with negative probabilities as well as events with inflated probabilities, e.g. events that are complementary to events with negative probabilities.
In order to avoid these problems we need to set some restrictions on the possible dynamics so that negative and inflated probabilities are never directly observed. Perhaps one of the simplest solutions is to impose that whenever a negative probability trajectory and a positive probability trajectory split, the negative probability trajectory must immediately merge with some other positive probability trajectory. This way negative probabilities will always be 'hidden under' positive probabilities. Therefore, in our approach we allow for some spontaneous emergence of negative trajectories, provided that they are always properly compensated by the positive ones. We study some examples of such dynamics in more detail in the next sections.

Negative Random Walks
Let us consider $T$ steps of a dynamics generated by a random distribution over some set of $K$ trajectories
$$ p_j:\ a^{(j)}_0 \to a^{(j)}_1 \to \dots \to a^{(j)}_T, \qquad j = 1, \dots, K, $$
where $a^{(j)}_m \in S = \{a_1, a_2, \dots, a_N\}$ corresponds to the state of the system after the $m$-th step along the $j$-th trajectory. This trajectory occurs with probability $p_j$, which can be negative. Let us assume that the distribution $\{p_1, \dots, p_K\}$ is analogous to the distribution described by Eq. (10), which can be generated with the help of a single nebit.
The above evolution can be interpreted as a kind of random walk on the system's state space $S$. We call it a negative random walk. In order to be sure that one never observes negative and inflated probabilities we need to set some restriction on the trajectories and the distribution $\{p_1, \dots, p_K\}$. To do this let us note that the probability that after $m$ steps the system is in the state $a_k$ is
$$ P(a_k, m) = \sum_{j=1}^{K} p_j \left[ a^{(j)}_m = a_k \right], $$
where the bracket equals 1 if $a^{(j)}_m = a_k$ and 0 if $a^{(j)}_m \neq a_k$. Therefore, the restriction takes the following form
$$ 0 \le P(a_k, m) \le 1 \quad \text{for all } k \text{ and } m. $$
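The restriction amounts to checking every state marginal at every step. A minimal sketch of such a check is given below; it is applied to the troublesome four-trajectory example with weights $1/2, 1/2, 1/2, -1/2$ from the previous section, and the function names and string labels are hypothetical.

```python
from fractions import Fraction

def marginals(mixture, m):
    """Signed probability of each state after m steps: the sum of the weights
    of all trajectories that pass through that state at step m."""
    probs = {}
    for weight, path in mixture:
        probs[path[m]] = probs.get(path[m], 0) + weight
    return probs

def never_reveals_negativity(mixture):
    """True if every state marginal lies in [0, 1] at every step."""
    n_steps = len(mixture[0][1])
    return all(0 <= prob <= 1
               for m in range(n_steps)
               for prob in marginals(mixture, m).values())

half = Fraction(1, 2)
# Single-step example: three trajectories with weight 1/2, one with -1/2.
troublesome = [
    (half, ("a0", "a0")),
    (half, ("a0", "a1")),
    (half, ("a2", "a2")),
    (-half, ("a2", "a3")),
]
# The same example with the negative trajectory removed and weights renormalised.
harmless = [(half, ("a0", "a0")), (half, ("a0", "a1"))]

print(marginals(troublesome, 0))
print(marginals(troublesome, 1))
print(never_reveals_negativity(troublesome), never_reveals_negativity(harmless))
```

At step 0 the example looks perfectly classical, and only the marginals at step 1 expose the negative entry, which is why the check has to run over all steps rather than over the initial and final states alone.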

Super-Ballistic Negative Random Walk on a Chain
A single step of the above random walk can in principle take a system from any state to any other state. However, for many physically motivated examples the topology of a state space S is not arbitrary and is given by some graph. This graph determines which states can be placed on subsequent positions for any allowed trajectory.
Perhaps the most studied example of such a graph is a chain graph, i.e., a segment of a one-dimensional discrete space in which positions are described by integers $x = 1, 2, \dots, N$. The chain graph is often used to study diffusion in one-dimensional space. The system can be interpreted as a particle walking on a discrete line where the states represent positions, $a_k \equiv k$. In a single step the particle can change its position from $x$ to $x \pm 1$ or stay in the same place (with the exception of the boundaries, where it cannot step outside the chain).
Let us first consider a standard random walk in which the particle is initially localized at some position. If we prepare a uniform probability distribution over all trajectories starting from the initial position we will observe a diffusive spread. However, we can manipulate this probability distribution to obtain a ballistic spread. Diffusive spread means that the standard deviation of the spatial probability distribution is proportional to the square root of the number of steps, whereas ballistic means that it is proportional to the number of steps. Due to the topology of a chain graph a random walk on it can be at most ballistic. The greatest spread can be achieved by the following random walk. Consider $T = N - 1$ steps of a walk generated by an even mixture of two trajectories
$$ \tfrac12:\ 1 \to 1 \to \dots \to 1, \qquad \tfrac12:\ 1 \to 2 \to \dots \to N. \tag{22} $$
The particle starts at position $x = 1$ and then it either stays at $x = 1$ or always moves one step to the right. Each possibility occurs with probability 1/2. After $T$ steps the particle is in an even mixture of being at $x = 1$ and $x = N$. It is straightforward to show that the standard deviation of the position after $m$ steps is
$$ \sigma(m) = \frac{m}{2}. \tag{23} $$
Next we show that the corresponding negative random walk on a chain can spread much faster, i.e., it can be super-ballistic. This effect stems from the spontaneous emergence of trajectories that we observed in the previous section; however, this time we are going to show how to avoid direct observation of negative probabilities.
We consider $T = N - 1$ steps along a mixture of trajectories with the following structure. The trajectories that occur with positive probabilities represent a particle that stays at a fixed position. The only movement is generated by trajectories that occur with negative probabilities. The sum of all positive probabilities in this distribution is $\frac32 + (N - 2)\delta$, whereas the sum of all negative ones is $-(N - 2)\delta - T\delta$. Since all probabilities need to add up to one we conclude that $T\delta = \frac12$, hence $\delta = \frac{1}{2T}$. The probability that after the $m$-th step the particle is at position $x$ is given by
$$ P(x, m) = \begin{cases} 1 - m\delta, & x = 1, \\ m\delta, & x = N, \\ 0, & \text{otherwise}. \end{cases} $$
This is quite a counter-intuitive nonlocal process. The particle does not move through the chain but rather jumps directly from $x = 1$ to $x = N$, seemingly ignoring the topology of the graph. Interestingly, at the beginning and at the end of the walk the above probabilities are the same as in the standard positive probability case considered above. However, it is straightforward to show that in this case the standard deviation of the position after $m$ steps is
$$ \sigma_{\mathrm{neg}}(m) = (N-1)\sqrt{m\delta\,(1 - m\delta)} = \frac12 \sqrt{m\,(2T - m)}. \tag{28} $$
In Fig. 1 we plot standard deviations (23) and (28) and show that the negative random walk on a chain is super-ballistic.
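To make the mechanism concrete, the sketch below builds one explicit mixture that is consistent with the properties stated above: stationary positive trajectories with total weight $3/2 + (N-2)\delta$, moving negative trajectories with total weight $-(N-2)\delta - T\delta$, and steps of at most one site. This particular construction is an assumption introduced here for illustration and is not claimed to be the mixture used in the text; the function names and the choice $N = 6$ are likewise hypothetical.

```python
from fractions import Fraction
from math import sqrt

def negative_walk(n):
    """Hypothetical signed mixture on the chain 1..n, run for T = n - 1 steps."""
    T = n - 1
    delta = Fraction(1, 2 * T)
    stay = lambda x: [x] * (T + 1)
    mix = [(Fraction(1), stay(1)), (Fraction(1, 2), stay(n))]
    for k in range(2, n):
        mix.append((delta, stay(k)))                      # positive, stationary
        # negative: start at k, walk left one site per step, then rest at 1
        mix.append((-delta, [max(1, k - m) for m in range(T + 1)]))
    for j in range(1, T + 1):
        # negative: wait at n for j - 1 steps, then walk left one site per step
        mix.append((-delta, [n if m < j else max(1, n - (m - j + 1))
                             for m in range(T + 1)]))
    return mix, delta

def position_distribution(mix, m):
    """Signed probability of each position after m steps."""
    dist = {}
    for weight, path in mix:
        dist[path[m]] = dist.get(path[m], 0) + weight
    return dist

def std(dist):
    """Standard deviation of the position for a given distribution."""
    mean = sum(w * x for x, w in dist.items())
    second = sum(w * x * x for x, w in dist.items())
    return sqrt(second - mean ** 2)

N = 6                                   # illustrative chain length (assumption)
mix, delta = negative_walk(N)
for m in range(N):
    negative_std = std(position_distribution(mix, m))
    classical_std = m / 2               # ballistic two-trajectory walk
    print(m, classical_std, negative_std)
```

In this construction every negative trajectory always sits on top of a positive stationary one, so all position marginals stay between 0 and 1 while the weight at $x = 1$ drains linearly into $x = N$.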
Finally, we should mention at this point the research on quantum walks. Quantum walks are quantum counterparts of classical random walks that take advantage of the interference phenomenon and are known to spread faster than classical ones (though still ballistically on a chain graph). The fast spreading can be used to construct efficient quantum algorithms [19,20]. We speculate that many features of quantum walks can be simulated with classical random walks supplemented with nebits.
Fig. 1 Orange dots correspond to a negative random walk and blue dots correspond to a classical random walk described by Eq. (22). The negative random walk exhibits super-ballistic behaviour.

Search with Negative Probabilities
Consider a database consisting of $N$ elements. We assume that these elements correspond to states of some system, $S = \{a_1, a_2, \dots, a_N\}$. Next imagine that there is a marked element, say the state $a_N$, and our goal is to find it. Let us first discuss a standard probabilistic search algorithm. In this case the optimal search method is the simplest one. We are going to evolve the system through all states and check whether the state of the system is marked or not. Of course, the number of elements we need to check before we find the marked one depends on the way we order them, but since we do not know in advance which element is marked, the best solution is to consider a random order. Therefore, we prepare an even mixture of all trajectories
$$ \frac{1}{N!} \sum_{\pi} \left[ a_{\pi(1)}, a_{\pi(2)}, \dots, a_{\pi(N)} \right] \tag{29} $$
and evolve the system. In the above the sum is taken over all $N!$ permutations $\pi$ of the states. A single step of the protocol consists of two parts. First, we check the state of the system. If it is in the marked state $a_N$ we stop the protocol and announce: FOUND. If it is not, we PROCEED to the next state along the trajectory.
What is the probability that the announcement is made after exactly $t$ steps of the evolution? There are $(N - 1)!$ trajectories for which $a_N$ is at position $t$, therefore the probability that the announcement is made exactly after $t$ steps is $1/N$. Hence, the average search time is
$$ \langle t \rangle = \sum_{t=1}^{N} \frac{t}{N} = \frac{N+1}{2}. $$
Moreover, the probability that the announcement is made after $t$ steps, or earlier, is $t/N$. Therefore, it takes $N$ steps to be sure that the marked state is found.
Next, let us consider a strategy we could use if we had access to a nebit. Let us start with the same initial mixture of trajectories (29) as before. The new strategy is a simple modification of the previous one. This time, if we find that the system is in the marked state we check the outcome of the nebit. If the outcome is zero (with probability $1 + \delta$) we announce: FOUND. However, if the outcome is one (with probability $-\delta$) we PROCEED to evolve the system along the trajectory.
First, note that the probability that the announcement is made exactly after $t$ steps is $(1 + \delta)/N$. This is due to the same reason as before and the $\delta/N$ improvement comes from the use of the nebit. The probability that the announcement is made after $t$ steps, or earlier, is $t(1 + \delta)/N$. This time it takes $\tau = \lfloor N/(1 + \delta) \rfloor$ steps to be almost sure that the marked state is found. The average search time is
$$ \langle t \rangle = \sum_{t=1}^{\tau} t\, \frac{1+\delta}{N} = \frac{(1+\delta)\,\tau(\tau+1)}{2N} \approx \frac{\tau + 1}{2}. $$
Clearly, for $\delta = \sqrt{N}$ we achieve the efficiency of the Grover search algorithm, but in principle we can achieve our goal in a single step if $\delta = N - 1$.
It is obvious that the achieved speedup is due to the inflated probability $(1 + \delta)$. What is not obvious is that the proposed protocol, which runs for $\tau$ steps, does not allow one to directly observe negative or inflated probabilities. To prove it, note that because of the symmetry the probability that the system is in an unmarked state is the same for all unmarked states. Say we focus on the state $a_1$. We already know that there are $(N - 1)!$ trajectories for which $a_1$ is at position $t$. Because we have used the nebit, we need to count for how many of such trajectories the marked state $a_N$ precedes $a_1$. If we fix the positions of $a_1$ and $a_N$ we get $(N - 2)!$ different trajectories. Given that $a_1$ is at position $t$, $a_N$ can take one of the $t - 1$ preceding positions or one of the $N - t$ remaining positions. A trajectory for which $a_N$ follows $a_1$ occurs with the probability $\frac{1}{N!}$, but a trajectory for which $a_N$ precedes $a_1$ occurs with the probability $-\frac{\delta}{N!}$. Therefore, the probability that during the $t$-th step the system is in the state $a_1$ is
$$ P(a_1, t) = \frac{(N-2)!}{N!} \left[ (N - t) - \delta (t - 1) \right] = \frac{N - t - \delta (t-1)}{N(N-1)}. $$
This probability decreases linearly in $t$ and it reaches zero for $t = \frac{N+\delta}{1+\delta} \ge \tau$.
For large $N$ the unrealistic single-step speedup comes at the cost of the negative nebit entropy $H_{N-1}$ of around $-1$. In the case of the Grover speedup $|H_{\sqrt{N}}| < |H_{N-1}|$; however, $H_{\sqrt{N}} \to -1$ as $N$ becomes large. Hence in both cases the negative entropy cost is comparable.
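For small databases the counting argument can be verified by brute force over all orderings. The snippet below is an illustrative sketch; the function name and the choice of five elements are assumptions.

```python
from fractions import Fraction
from itertools import permutations

def classical_search_stats(n):
    """Enumerate all n! orderings with equal weight and record the step at
    which the marked element (labelled n) is encountered."""
    marked = n
    orders = list(permutations(range(1, n + 1)))
    weight = Fraction(1, len(orders))
    found_at = {t: Fraction(0) for t in range(1, n + 1)}
    for order in orders:
        found_at[order.index(marked) + 1] += weight
    average = sum(t * prob for t, prob in found_at.items())
    return found_at, average

found_at, average = classical_search_stats(5)
print(found_at)
print(average)
```

Every step carries the same probability of producing the announcement, and the average search time agrees with $(N+1)/2$.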
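The same brute-force enumeration can be extended with signed weights to check that the unmarked states keep legitimate probabilities during the run of the protocol. This is a sketch under stated assumptions: the function name is hypothetical, and $N = 5$ with nebit negativity $3/2$ is an arbitrary choice that makes the cumulative announcement probability reach one after two steps.

```python
from fractions import Fraction
from itertools import permutations

def nebit_search(n, delta):
    """Enumerate all orderings of n states with the nebit rule applied at the
    marked state n: FOUND with weight 1 + delta, PROCEED with weight -delta."""
    marked = n
    orders = list(permutations(range(1, n + 1)))
    base = Fraction(1, len(orders))
    announce = {t: Fraction(0) for t in range(1, n + 1)}
    visit_a1 = {t: Fraction(0) for t in range(1, n + 1)}
    for order in orders:
        t_marked = order.index(marked) + 1
        t_a1 = order.index(1) + 1
        announce[t_marked] += base * (1 + delta)
        # a_1 is reached with weight 1/N! if the marked state comes later,
        # and with weight -delta/N! if the walk proceeded past the marked state
        visit_a1[t_a1] += base if t_a1 < t_marked else -delta * base
    return announce, visit_a1

n, delta = 5, Fraction(3, 2)            # illustrative values (assumption)
announce, visit_a1 = nebit_search(n, delta)
tau = n // (1 + delta)                  # floor(N / (1 + delta))

for t in range(1, n + 1):
    print(t, announce[t], visit_a1[t])
```

Within the first `tau` steps the probability of observing the unmarked state stays between 0 and 1; it would turn negative only if the protocol were allowed to run beyond $t = (N+\delta)/(1+\delta)$.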

Discussion
It is a well established tradition in theoretical physics to play with non-orthodox ideas to gain some insights into the behaviour of complicated physical systems. In this paper, we study the limits of non-classical behaviour with the help of a hypothetical nebit, a binary system with signed probabilities. We present a simple classical simulation of some aspects of quantum computation. We are not interested in the philosophy of nebits but only in their formal mathematical properties necessary to achieve quantum speedup using simple, classical dynamical systems. We hope that the insights we can gain from studying nebits will show us different ways of understanding quantum computing and quantum mechanical processes in general. What immediately springs to mind is (a) a quantitative classification of the non-classicality of quantum computing algorithms in relation to their classical counterparts and (b) a way to generate new quantum algorithms from nebit-supplemented classical ones. We already elaborated on (a) in this manuscript by calculating the nebit cost of the Grover search algorithm (analysis of quantum contextuality and other quantum algorithms is in preparation). It will be interesting to elaborate on (b), but this is left for our future research.
The nebit model presented here may resemble a non-contextual hidden variables (NHV) theory, but this similarity is superficial. NHV models are, by definition, positive joint probability distributions for quantum measurements, including complementary observables such as spin measurements along two orthogonal directions or the position and momentum of a quantum particle. We must stress that negative probabilities never appear in NHV models. The nebit model simulates positive probability distributions observed in the lab, but non-observable events, forbidden by quantum theory, can have negative probability distributions in the nebit model, something we do not know how to interpret and which is thus irrelevant to us.
In this sense, we can see nebits as an alternative to quantum mechanical probability amplitudes.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.