## The main idea

The key idea of [1] is to apply a self-inverse operation to a quantum state $$|I\rangle$$ encoding a quantum image and a key state $$|K\rangle$$ which can be interpreted as a quantum image of the same format as well. The output of the proposed encryption operation is a quantum state $$|X\rangle$$ which again corresponds to a quantum image of the same format.

## Discussion of the encryption operation

In Eq. (9) of [1], the authors define the NEQR model for a quantum image I of size $$2^n\times 2^n$$ as follows:

\begin{aligned} |I\rangle =\frac{1}{2^n}\sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1}|c_{i,j}\rangle |i\rangle |j\rangle . \end{aligned}
(1)

The indices i and j encode the x and y coordinates of a pixel with color value $$c_{i,j}$$. The key K for the encryption is represented in the same way (see also Eqs. (22) and (23) of [1]) as

\begin{aligned} |K\rangle =\frac{1}{2^n}\sum _{\mu =0}^{2^n-1}\sum _{\nu =0}^{2^n-1}|k_{\mu , \nu }\rangle |\mu \rangle |\nu \rangle . \end{aligned}
(2)

For clarity, we use Roman indices i, j for the image state $$|I\rangle$$ and Greek indices $$\mu ,\nu$$ for the key state $$|K\rangle$$.

The encryption operation is said to be given by Eq. (24) of [1], which we reproduce verbatim here:

\begin{aligned}&|x_{i,j}^7,x_{i,j}^6,\ldots ,x_{i,j}^0\rangle \otimes |i\rangle |j\rangle \nonumber \\&\quad ={\hat{C}}_{\text {not}}^{\otimes 8}|k_{i,j}^7,k_{i,j}^6,\ldots ,k_{i,j}^0\rangle |c_{i,j}^7,c_{i,j}^6,\ldots ,c_{i,j}^0\rangle \otimes |i\rangle |j\rangle \nonumber \\&\quad =|C_{\text {not}}(k_{i,j}^7,c_{i,j}^7)C_{\text {not}}(k_{i,j}^6,c_{i,j}^6)\ldots C_{\text {not}}(k_{i,j}^0,c_{i,j}^0)\rangle \otimes |i\rangle |j\rangle . \end{aligned}
(3)

Note that the second line of (3) includes the eight qubits specifying the “color” values $$k^7_{i,j}\ldots k^0_{i,j}$$ of the pixel at position (i, j) in the key state $$|K\rangle$$, while those qubits are missing from the first and last lines.

In the following, we discuss two possible interpretations of the proposed encryption operation. The first is based on the quantum circuit for the encryption algorithm given in Figure 4 of [1] (reproduced here in Fig. 1), while the second is based on the explanation given by the authors around Eq. (24):

“...each entry $$|x_{i,j}^7,x_{i,j}^6,\ldots ,x_{i,j}^0\rangle \otimes |i\rangle |j\rangle$$ is computed according to Eq. (24) for each pair of indices $$|i\rangle$$, $$|j\rangle$$.”

### Encryption following Fig. 4

Looking at the quantum circuit for the encryption algorithm given in Figure 4 of [1], one finds that the qubits related to the pixel positions, denoted $$|X_u\rangle$$ and $$|Y_i\rangle$$ in the figure, do not enter any operation. Furthermore, the qubits related to the pixel positions of the key state $$|K\rangle$$ are not shown at all. The qubits related to the color values in the key state $$|K\rangle$$ are drawn in magenta, and the horizontal lines depicting them are not connected to any output. Relatedly, the number of qubits on the right-hand side of the first line of Eq. (24) differs from the number on the left-hand side.

When we apply the eight CNOT gates of Eq. (24) (also shown in the figure) to the tensor product of the image $$|I\rangle$$ and the key $$|K\rangle$$, we obtain

\begin{aligned} {\hat{C}}_{\text {not}}^{\otimes 8}\bigl (|K\rangle |I\rangle \bigr )&={\hat{C}}_{\text {not}}^{\otimes 8}\left( \frac{1}{2^n}\sum _{\mu =0}^{2^n-1}\sum _{\nu =0}^{2^n-1}|k_{\mu ,\nu }\rangle |\mu \rangle |\nu \rangle \otimes \frac{1}{2^n}\sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1}|c_{i,j}\rangle |i\rangle |j\rangle \right) \nonumber \\&= \frac{1}{2^n}\sum _{\mu =0}^{2^n-1}\sum _{\nu =0}^{2^n-1}|k_{\mu ,\nu }\rangle |\mu \rangle |\nu \rangle \otimes \frac{1}{2^n}\sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1}|c_{i,j} \oplus k_{\mu ,\nu }\rangle |i\rangle |j\rangle \nonumber \\&= \frac{1}{4^n}\sum _{\mu =0}^{2^n-1}\sum _{\nu =0}^{2^n-1} \sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1} |k_{\mu ,\nu }\rangle |\mu \rangle |\nu \rangle \otimes |c_{i,j}\oplus k_{\mu ,\nu }\rangle |i\rangle |j\rangle . \end{aligned}
(4)

Note that the state on the right of the tensor product sign in (4) depends on both the indices i, j and $$\mu ,\nu$$, i.e., the state is in general entangled with respect to the bipartition indicated by the tensor product sign.

When we discard the qubits of the key state $$|K\rangle$$, the state corresponding to the encrypted image will be a mixed state, i.e., a state involving some randomness. An alternative description is to perform a measurement of the qubits related to the key state $$|K\rangle$$ with respect to the computational basis. This will yield a random, uniformly distributed pixel position $$(\mu _0,\nu _0)$$ with corresponding color value $$k_{\mu _0,\nu _0}$$. (As the CNOT gates do not depend on the variables $$\mu ,\nu$$, it is actually sufficient to consider the color value $$k_{\mu _0,\nu _0}$$.) Then, the state of the encrypted image will be

\begin{aligned} |X(\mu _0,\nu _0,k_{\mu _0,\nu _0})\rangle = \frac{1}{2^n} \sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1} |c_{i,j}\oplus k_{\mu _0,\nu _0}\rangle |i\rangle |j\rangle . \end{aligned}
(5)

Note that all pixels will be modified in the very same way. More seriously, the random value $$k_{\mu _0,\nu _0}$$ is unknown and cannot be deterministically reproduced in the decryption step.

The state in Eq. (5) is clearly different from the state

\begin{aligned} |{\widetilde{X}}\rangle = \frac{1}{2^n} \sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1} |c_{i,j}\oplus k_{i,j}\rangle |i\rangle |j\rangle , \end{aligned}
(6)

which one would obtain by taking the uniform superposition of the states on the right-hand side of Eq. (24) for all values of i, j.
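The difference between the mixed state obtained from Eq. (4) and the intended state (6) can be checked numerically on a toy instance. The following is a minimal NumPy sketch, not code from [1]: it assumes a $$2\times 2$$ image with 2 color qubits per pixel instead of 8, and the names `neqr`, `reduced_image_state`, `image` and `key` as well as the pixel values are hypothetical choices made for illustration.

```python
import numpy as np

Q = 2          # colour qubits per pixel (assumption: [1] uses 8; 2 keeps the state small)
N = 2          # side length 2^n of the image, here n = 1
C = 2 ** Q     # number of colour values

def neqr(colors):
    """Amplitudes psi[c, i, j] of the NEQR state of Eq. (1) for an N x N colour array."""
    psi = np.zeros((C, N, N))
    for i in range(N):
        for j in range(N):
            psi[colors[i, j], i, j] = 1.0 / N
    return psi

def reduced_image_state(state):
    """Density matrix of the image register after tracing out the key register."""
    m = state.reshape(C * N * N, C * N * N)   # rows: key register, columns: image register
    return m.T @ m                            # amplitudes are real, so no conjugation is needed

image = np.array([[0, 1], [2, 3]])   # toy colour values c_{i,j} (hypothetical)
key = np.array([[1, 2], [3, 1]])     # toy key values k_{mu,nu} (hypothetical)

# Product state |K>|I>, indexed as [k, mu, nu, c, i, j].
joint = np.einsum("kmn,cij->kmncij", neqr(key), neqr(image))

# Bitwise CNOTs controlled by the key colour register only: |k>|c> -> |k>|c XOR k>.
encrypted = np.zeros_like(joint)
for k in range(C):
    for c in range(C):
        encrypted[k, :, :, c ^ k, :, :] = joint[k, :, :, c, :, :]

rho = reduced_image_state(encrypted)
purity = float(np.trace(rho @ rho))        # equals 1 only for a pure state

target = neqr(image ^ key).ravel()         # intended state of Eq. (6)
fidelity = float(target @ rho @ target)    # overlap <X~|rho|X~>

print(f"purity   = {purity:.4f}")
print(f"fidelity = {fidelity:.4f}")
```

A purity below 1 indicates that the image register is left in a mixed state once the key register is discarded, and a fidelity below 1 indicates that this state differs from the state (6).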

### Encryption conditioned on the pixel position

As already mentioned, the authors add the explanation that the encrypted image should be computed “for each pair of indices $$|i\rangle , |j\rangle$$.” Recall that the pixel indices i, j in the original image and the pixel indices $$\mu ,\nu$$ in the image used as key are independent of each other.

One attempt in that direction could be to condition the CNOT operations on the indices i, j and $$\mu ,\nu$$ being equal. In this case, the state of the whole system reads

\begin{aligned}&\frac{1}{4^n} \sum _{i=0}^{2^n-1}\sum _{j=0}^{2^n-1} |k_{i,j}\rangle |i\rangle |j\rangle |c_{i,j}\oplus k_{i,j}\rangle |i\rangle |j\rangle \end{aligned}
(7)
\begin{aligned}&\quad + \frac{1}{4^n}\sum _{\mu ,\nu =0}^{2^n-1} \sum _{\begin{array}{c} {\scriptstyle i,j=0}\\ {\scriptstyle (i,j)\ne (\mu ,\nu )} \end{array}}^{2^n-1} |k_{\mu ,\nu }\rangle |\mu \rangle |\nu \rangle |c_{i,j}\rangle |i\rangle |j\rangle . \end{aligned}
(8)

While the summand in (7) somewhat resembles the desired state (6), the pixel position (i, j) occurs in the qubits related to the key $$|K\rangle$$ as well. Ignoring those registers would be equivalent to measuring the pixel position. Moreover, the whole state is a superposition of the states (7) and (8), again different from the intended result.
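The position-conditioned variant can be checked on a toy instance in the same spirit. The sketch below is an illustration under stated assumptions rather than the construction of [1]: it uses a $$2\times 2$$ image with 2 color qubits per pixel, applies the bitwise XOR only when the key position equals the image position, and uses hypothetical names (`neqr`, `image`, `key`) and pixel values.

```python
import numpy as np

Q = 2          # colour qubits per pixel (assumption: [1] uses 8)
N = 2          # side length 2^n of the image, here n = 1
C = 2 ** Q     # number of colour values

def neqr(colors):
    """Amplitudes psi[c, i, j] of the NEQR state of Eq. (1) for an N x N colour array."""
    psi = np.zeros((C, N, N))
    for i in range(N):
        for j in range(N):
            psi[colors[i, j], i, j] = 1.0 / N
    return psi

image = np.array([[0, 1], [2, 3]])   # toy colour values c_{i,j} (hypothetical)
key = np.array([[1, 2], [3, 1]])     # toy key values k_{mu,nu} (hypothetical)

# Product state |K>|I>, indexed as [k, mu, nu, c, i, j].
joint = np.einsum("kmn,cij->kmncij", neqr(key), neqr(image))

# Apply |k>|c> -> |k>|c XOR k> only when (mu, nu) == (i, j); otherwise act trivially.
# For fixed k the map c -> c XOR k is a bijection, so this permutes the basis states.
encrypted = np.zeros_like(joint)
for k, m, n, c, i, j in np.ndindex(joint.shape):
    new_c = c ^ k if (m, n) == (i, j) else c
    encrypted[k, m, n, new_c, i, j] = joint[k, m, n, c, i, j]

# Trace out the key register [k, mu, nu] to get the state of the image register.
mat = encrypted.reshape(C * N * N, C * N * N)   # rows: key register, columns: image register
rho = mat.T @ mat                               # real amplitudes, no conjugation needed
purity = float(np.trace(rho @ rho))             # equals 1 only for a pure state

target = neqr(image ^ key).ravel()              # intended state of Eq. (6)
fidelity = float(target @ rho @ target)         # overlap <X~|rho|X~>

print(f"purity   = {purity:.4f}")
print(f"fidelity = {fidelity:.4f}")
```

Here, too, a purity below 1 reflects the entanglement between the key and image registers in the superposition of (7) and (8), and the fidelity quantifies how far the discarded-key state is from the state (6).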

## Concluding remarks

More generally, the authors aim at a reversible quantum operation $$U_{\text {enc}}$$ that acts on a quantum image $$|I\rangle$$ and a key $$|K\rangle$$ in order to produce an encoded quantum image $$|X(K)\rangle$$ of the same format as the original image. At the same time, the encoded image $$|X\rangle$$ should not be entangled with the rest of the output of the operation, i.e.,

\begin{aligned} U_{\text {enc}}\bigl (|K\rangle \otimes |I\rangle \bigr )=|K'\rangle \otimes |X(K)\rangle . \end{aligned}
(9)

In the proposed encryption protocol, the modification of the image $$|I\rangle$$ is controlled by the basis states of the Hilbert space related to the key $$|K\rangle$$. In general, the key $$|K\rangle$$ is a superposition with respect to that basis, and hence any nontrivial encoding operation necessarily creates entanglement between the Hilbert space of the key and the Hilbert space of the image. Ignoring the key after the encryption operation (partial trace) results in a mixed state of the output, and decryption is not possible in general.
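
This argument can be made explicit with a short calculation. The notation is introduced here for illustration and does not appear in [1]: let $$|a\rangle$$ denote the basis states of the key register that control the operation, $$\alpha _a$$ the expansion coefficients of the key, $$|K\rangle =\sum _a\alpha _a|a\rangle$$, and $$U_a$$ the unitary applied to the image when the key register is in the state $$|a\rangle$$. Then

\begin{aligned} |K\rangle \otimes |I\rangle&\mapsto \sum _a \alpha _a |a\rangle \otimes U_a|I\rangle , \nonumber \\ \rho _X&= \sum _a |\alpha _a|^2\, U_a |I\rangle \langle I| U_a^{\dagger }, \end{aligned}

where $$\rho _X$$ is the state of the image register after the partial trace over the key register. The state $$\rho _X$$ is pure only if the states $$U_a|I\rangle$$ with $$\alpha _a\ne 0$$ all coincide up to a phase, i.e., only if all key values that actually occur in $$|K\rangle$$ act on $$|I\rangle$$ in the same way.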