1 Introduction

Humans routinely detect edges in 2D images by visual inspection. In industrial applications, edge detection is used, e.g., for extracting the structure of objects, features, or regions within an image. Thereby, changes in material properties such as surface defects can be detected.

In classical image processing, a standard way to highlight edges is to compute the image gradient. It requires the pixel-wise computation of the partial gray value derivatives, which is achieved by convolving the image with a filter mask assigning suitable weights to pixels in a chosen discrete neighborhood. In the filtered image, high gray values indicate a gray value change in the original, whereas low gray values indicate homogeneous neighborhoods without changes and edges. Various methods for edge detection have been suggested, for instance, the Prewitt, Sobel, or Laplace filters or Canny’s edge detector (Gonzalez and Woods 2018). The idea is to calculate pixel-wise approximations of the derivatives in vertical and horizontal direction. The filters mentioned above differ in the choice of the weights in a \(3 \times 3\) filter mask. In the Canny edge detector, filtering is complemented by threshold functions that suppress non-maxima and decide which components are real edges and which are due to noise.

Exploiting a real quantum computer, we can benefit from exponentially lower memory usage in terms of the number of qubits compared to the number of bits needed to represent an image classically. Several approaches for quantum edge detection have been proposed. However, most of them are only formulated in theory or for a quantum computer simulator (Mastriani 2014; Fan et al. 2019; Widiyanto et al. 2019; Ma et al. 2020; Zhang et al. 2015). When applied on real quantum computers, they are limited by high error rates, the small number of qubits available, and low coherence times. This can lead to results too noisy to be interpretable.

For example, in QSobel (Zhang et al. 2015) — a quantum version of the well-known classical Sobel filter — some steps like the COPY operation or the quantum black box for calculating the gradients of all pixels cannot currently be implemented. Filling these gaps is a topic of current research. The Quantum Hadamard Edge Detection algorithm was suggested as a more efficient alternative (Yao et al. 2017). Implementations for a state vector simulator for an \(8\times 8\) pixel gray value image and for a \(2\times 2\) pixel image on a real quantum computer are provided in the Qiskit textbook (Asfaw et al. 2021). Larger image sizes are briefly discussed, too, but to our knowledge have not yet been tested in practice.

Here, we introduce a hybrid method motivated by classical filtering and making use of Tacchino’s quantum machine learning algorithm (Tacchino et al. 2019) and its extension to gray value images (Mangini et al. 2020). We use a quantum information-based cost function to compare an image patch of a test image with a binary filter mask. We perform this calculation for two filter masks highlighting vertical and horizontal edges and combine their results. With the filter mask size, we control the number of qubits and gates. For the edge detection task, we only need very few gates and by that keep the error in the current NISQ era low. In our method, the error of each circuit is independent of the image size. This way, we can push the size of images that can be processed on the current circuit-based superconducting quantum computers of IBM (2021) to a yet unreached limit.

This paper is organized as follows. In Section 2, we roll out the strategy for solving the edge detection task by an artificial neuron and filtering in the purely classical setting. In Section 2.2, we explain some preliminaries needed to replace the classical artificial neuron by its quantum version in Section 2.3. In Section 2.4, we present the idea of our quantum edge detector with 2D and 1D masks, and discuss the improvements of the version with the 1D mask in theory. We describe the experimental setup in Section 3. Experimental results are shown and discussed in Section 4. Section 5 concludes the paper.

2 Method

2.1 Edge detection by artificial neurons

In classical image processing, we find edges of objects in an image by filtering (Gonzalez and Woods 2018). First- or second-order derivatives can be used to track gray value changes in the image and thereby to detect edges. Here, we opt for the first-order derivatives. We calculate digital approximations of the partial derivatives at every pixel location in the image. Let \(I_{in}\) be a gray value input image. Then, the partial derivatives in x- and y-direction are estimated by

$$\begin{aligned} \frac{\partial I_{in}(x,y)}{\partial x}\approx I_{in}(x+1,y)-I_{in}(x,y) \end{aligned}$$


$$\begin{aligned} \frac{\partial I_{in}(x,y)}{\partial y}\approx I_{in}(x,y+1)-I_{in}(x,y). \end{aligned}$$

This can be implemented by convolving \(I_{in}(x,y)\) with the one-dimensional filters from Fig. 1 a and b, respectively.
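For illustration, the derivative approximations above can be written as a short plain-Python sketch. The function names `dx` and `dy` are our own and not from the paper; images are stored as lists of row lists.

```python
def dx(img, x, y):
    """Approximate horizontal derivative I(x+1, y) - I(x, y)."""
    return img[y][x + 1] - img[y][x]

def dy(img, x, y):
    """Approximate vertical derivative I(x, y+1) - I(x, y)."""
    return img[y + 1][x] - img[y][x]

# A 4x4 test image with a vertical edge between columns 1 and 2.
I = [[0, 0, 255, 255]] * 4
print(dx(I, 1, 0))  # strong response across the edge: 255
print(dy(I, 1, 0))  # no gray value change in vertical direction: 0
```

The horizontal filter responds strongly at the vertical edge, while the vertical filter sees a homogeneous neighborhood.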

Fig. 1
figure 1

One-dimensional (a, b) and two-dimensional filter masks (c, d) for horizontal and vertical direction

Several approaches for avoiding edge effects are commonly used, e.g., zero padding, replicate padding, or mirroring, see Gonzalez and Woods (2018). We use mirroring to prevent generation of artificial edges. Two-dimensional discrete derivative filters can be defined analogously, as the examples in Fig. 1 c and d. In general, there is no limit to the range of values of the weights or the size of the filter masks (Gonzalez and Woods 2018). However, we only cover these two filter masks for the two-dimensional case in this paper. Using the derivative filters for both directions, we create two output images \(I_{h}, I_{v}\) highlighting horizontal and vertical edges, respectively. We combine them by a pixel-wise maximum:

$$\begin{aligned} I_{both}=\max (I_{h},I_{v}). \end{aligned}$$

A pixel in the image \(I_{both}\) has a high gray value if it belongs to an edge in horizontal or vertical direction. If neither the horizontal nor the vertical filter detects an edge, then both \(I_{h}\) and \(I_{v}\) have low values which yields a small maximum in \(I_{both}\).
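The combination of the two filtered images can be sketched as follows; `pixelwise_max` is a hypothetical helper name of our own choosing.

```python
def pixelwise_max(img_a, img_b):
    """Pixel-wise maximum of two equally sized gray value images."""
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

# Toy responses of the horizontal and vertical filters.
I_h = [[0, 200], [0, 0]]
I_v = [[50, 0], [0, 180]]
print(pixelwise_max(I_h, I_v))  # [[50, 200], [0, 180]]
```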

Finally, we segment the edge pixels by a binarization, for instance, a global gray value threshold chosen by Otsu’s method (Gonzalez and Woods 2018). Post-processing is performed by using Fraunhofer ITWM’s image processing software ToolIP (Fraunhofer Institute for Industrial Mathematics 2021).

A special case of classical filtering is also found in the basic element of a feed-forward network, a particular kind of artificial neural network. The goal is to learn weight values such that a function f mapping the inputs to outputs y is well approximated. Let \(c\), \(w\) be the real-valued classical input and weight vectors, respectively. The basic element of an artificial neural network is an artificial neuron: a mathematical function which first calculates the weighted sum of one or more inputs and then applies a non-linear activation to yield the output. It is defined by

$$\begin{aligned} y=f(c, w)=\rho (w^T \,c+b), \end{aligned}$$

where \(\rho\) is the activation function and b an additional so-called bias shifting the activation function for more flexibility of the network. By connecting a large number of artificial neurons in layers and by ordering layers consecutively, we can construct a feed-forward neural network. For further details, we refer to Goodfellow et al. (2016).
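As a minimal sketch of such a neuron, the following plain-Python example uses the logistic function as one common choice for \(\rho\); the paper leaves \(\rho\) generic, so this choice is an assumption for illustration only.

```python
import math

def neuron(c, w, b=0.0):
    """Artificial neuron y = rho(w^T c + b) with logistic activation rho."""
    z = sum(wi * ci for wi, ci in zip(w, c)) + b
    return 1.0 / (1.0 + math.exp(-z))  # rho: logistic function

print(neuron([1.0, 0.0], [2.0, -1.0]))  # rho(2.0), approximately 0.88
```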

2.2 Quantum image processing preliminaries

We summarize some quantum image processing preliminaries before explaining our quantum version of the edge detector. We start from a classical image, want to find its edges using a quantum computer, and finally obtain a classical image in which the edges are highlighted. A key element to achieve this is the encoding of the gray values of the classical image into quantum states. For encoding, unitary operations, also called gates, are applied to an initial state of the quantum computer. There are several encoding methods like basis, amplitude, or phase encoding (Weigold et al. 2021).

We use phase encoding to keep the number of qubits low. That means, we first transform the 8-bit gray values of an image into angles \(\theta =(\theta _0, \ldots , \theta _{N-1})\) with \(\theta _j\in [0,\pi ]\), \(j\in \{0,\ldots ,N-1\}\). Similar to Geng et al. (2021), we use the linear transformation

$$\begin{aligned} \theta _j=c_j/255\cdot \pi , \end{aligned}$$

calculated element-wise, for all \(j\in \{0,\ldots , N-1\}\). The transformed input vector is defined by

$$\begin{aligned} \tilde{c}=(e^{i\theta _0}, e^{i\theta _1}, \ldots , e^{i\theta _{N-1}}). \end{aligned}$$

This way, we transform the gray values to angles, encode them as phases in the quantum computer, and measure the outcome. The measurement itself is probabilistic. That means, we run the same algorithm multiple times, count the frequencies of the possible states, and derive an empirical probability distribution. The number of executions is also called number of shots.
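The transformation and encoding steps can be sketched in plain Python; `phase_encode` is our own naming, not from the paper.

```python
import cmath
import math

def phase_encode(gray_values):
    """Map 8-bit gray values c_j to angles theta_j = c_j/255 * pi and
    return the phase vector (e^{i theta_0}, ..., e^{i theta_{N-1}})."""
    thetas = [g / 255 * math.pi for g in gray_values]
    return [cmath.exp(1j * t) for t in thetas]

c_tilde = phase_encode([0, 255])       # a black and a white pixel
print(abs(c_tilde[0] - 1) < 1e-12)     # e^{i*0} = 1: True
print(abs(c_tilde[1] + 1) < 1e-12)     # e^{i*pi} = -1: True
```

All entries lie on the unit circle, so only the phase carries the gray value information.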

To compare the outcomes of the quantum computer, we use the Hellinger fidelity (Hellinger 1909) derived from the Hellinger distance. Let \(P\), \(Q\) be two discrete probability distributions with probability weights \(p=(p_1, \ldots , p_n)\), \(q=(q_1, \ldots , q_n)\). Then, the Hellinger distance is defined by

$$\begin{aligned} HD(P,Q)=\frac{1}{\sqrt{2}}\sqrt{\sum _{j=1}^n\left( \sqrt{p_j}-\sqrt{q_j}\right) ^2}, \end{aligned}$$

see Hellinger (1909). The Hellinger fidelity is defined by

$$\begin{aligned} F(P,Q)=\left( 1-HD^2(P,Q)\right) ^2=\left( \sum _{j=1}^n\sqrt{p_jq_j}\right) ^2. \end{aligned}$$

It takes values in the interval [0, 1] with higher values for more similar distributions.
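Both quantities can be implemented directly from the two definitions above; the following sketch uses our own function names.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance of two discrete distributions of equal length."""
    return math.sqrt(sum((math.sqrt(pj) - math.sqrt(qj)) ** 2
                         for pj, qj in zip(p, q))) / math.sqrt(2)

def hellinger_fidelity(p, q):
    """Hellinger fidelity F = (1 - HD^2)^2, taking values in [0, 1]."""
    return (1 - hellinger_distance(p, q) ** 2) ** 2

print(hellinger_fidelity([0.5, 0.5], [0.5, 0.5]))  # identical: 1.0
print(hellinger_fidelity([1, 0], [0, 1]))          # disjoint support: 0.0
```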

Gibbs and Su (2002) compare a variety of distance metrics for probability distributions, including the popular discrepancy metric, Kullback-Leibler divergence, and total variation distance, and discuss their relation to the Hellinger distance. We follow the quantum computing community’s mainstream in choosing the Hellinger fidelity because it matches the square of the absolute amplitude of a quantum state. Fuzzy similarity (Acampora et al. 2021) has been suggested, too. Systematically comparing distance measures is however outside the scope of this paper.

For all quantum calculations, we use the open-source software development kit Qiskit (Abraham et al. 2019). Besides the standard gates like NOT (X), controlled-NOT (CX), multi-controlled-NOT (C\(^n\)X), or Hadamard (H) gates, we also use phase shift gates (P) in this paper to encode the classical information. They have the matrix form

$$\begin{aligned} {P}(\theta )=\left( \begin{array}{cc} 1 &{} 0\\ 0 &{} e^{i\theta } \end{array}\right) , \end{aligned}$$

and represent a rotation around the Z-axis by an angle \(\theta\) in the Bloch sphere. Ancilla qubits are also used, especially in conjunction with controlled operations such as the CX- or C\(^n\)X-gates. These are additional qubits which can be used for storing states, since quantum computers implement only reversible logic. At the price of more CX- or C\(^n\)X-gates, ancilla qubits can reduce the number of measurements: only the ancilla qubit needs to be measured. Thus, the structure is comparable to the classical artificial neuron with a single output.

For the definitions and implementations of the standard gates and other basic concepts, we refer to Nielsen and Chuang (2000) and Asfaw et al. (2021). Additionally, Table 1 provides some standard quantum mechanical notions from linear algebra used in this paper.

Table 1 Standard quantum mechanical notions from linear algebra, similar to Nielsen and Chuang (2000)

2.3 Quantum artificial neuron

Our quantum edge detector is motivated by Tacchino’s (Tacchino et al. 2019) quantum algorithm for an artificial neuron. We will use the extension from Mangini et al. (2020) which also allows for treating gray value images. Our method is sketched in Fig. 2. In the following, we explain each step, from left to right.

Fig. 2
figure 2

Scheme for edge detection in a \(30 \times 30\) pixels sample image using \(2\times 2\) filter masks. We use the ‘qasm_simulator’ and IBM’s German backend ‘ibmq_ehningen’ (executed on November 15, 2021) with 32,000 shots, and ToolIP (Fraunhofer Institute for Industrial Mathematics 2021) for post-processing

Let \(\left| {k}\right. \rangle\) denote the computational basis states of the \(2^n\)-dimensional state space, indexed by the decimal number k corresponding to the vector of zeros and ones read as a binary number. We write the corresponding quantum state for the input vector \(\tilde{c}\), see (6), using \(n=\log _2N\) qubits as

$$\begin{aligned} \left| {\Theta }\right. \rangle =\frac{1}{2^{n/2}}\sum _{k=0}^{2^n-1}\tilde{c_k}\left| {k}\right. \rangle , \end{aligned}$$

and encode the weight vector analogously as

$$\begin{aligned} \left| {\Gamma }\right. \rangle =\frac{1}{2^{n/2}}\sum _{k=0}^{2^n-1}\tilde{w_k}\left| {k}\right. \rangle , \end{aligned}$$

for weights \(\gamma =(\gamma _0, \ldots , \gamma _{N-1})\) with \(\gamma _j\in [0,\pi ]\) and corresponding vector

$$\begin{aligned} \tilde{w}=(e^{i\gamma _0}, e^{i\gamma _1}, \ldots , e^{i\gamma _{N-1}}). \end{aligned}$$

The inner product of the encoded input \(\Theta\) and weight quantum states \(\Gamma\) is then

$$\begin{aligned} \left\langle \Gamma \vert \Theta \right\rangle&=\frac{1}{2^n}\sum _{k,l=0}^{2^n-1}\tilde{c}_k\tilde{w}_l^{\ast }\left\langle l\vert k\right\rangle \\&=\frac{1}{2^n}\,\tilde{c}^{T}\tilde{w}^{\ast }=\frac{1}{2^n}\left( e^{i(\theta _0-\gamma _0)}+\cdots +e^{i(\theta _{2^n-1}-\gamma _{2^n-1})}\right) , \end{aligned}$$

where the second equality follows from the orthonormality of \(\left| {k}\right. \rangle\) and \(\left| {l}\right. \rangle\). Thus, the calculation corresponds to the scalar product of the input vector from (6) and the conjugated weight vector from (12), analogous to the classical artificial neuron. We set \(b=0\) in (4) and use \(\rho (\cdot )= |\cdot |^2\) as the activation function of the quantum neuron.
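This correspondence can be verified numerically. The following minimal sketch (plain Python, our own illustration; variable names are not from the paper) builds two phase-encoded state vectors for \(n=2\) and checks the overlap against the closed-form sum of phase differences.

```python
import cmath
import math

n = 2                                    # two qubits, N = 2^n = 4 amplitudes
theta = [0.0, math.pi / 2, math.pi, math.pi / 4]   # input angles (example)
gamma = [0.0, math.pi, 0.0, math.pi / 4]           # weight angles (example)

# Phase-encoded, normalized state vectors |Theta> and |Gamma>.
Theta = [cmath.exp(1j * t) / 2 ** (n / 2) for t in theta]
Gamma = [cmath.exp(1j * g) / 2 ** (n / 2) for g in gamma]

# Overlap <Gamma|Theta> versus (1/2^n) * sum_j e^{i(theta_j - gamma_j)}.
overlap = sum(gk.conjugate() * tk for gk, tk in zip(Gamma, Theta))
closed_form = sum(cmath.exp(1j * (t - g))
                  for t, g in zip(theta, gamma)) / 2 ** n
print(abs(overlap - closed_form) < 1e-12)  # True
```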

To encode the inner product, unitary operations/gates have to be applied. In quantum computing, the qubits are usually initialized in well-prepared states. First, we transform this initial state into the input quantum state by the unitary operation \(U_I\). The following operation \(U_W\) yields the inner product of input and weight quantum state. Via a multi-controlled-NOT (C\(^n\)X) gate targeting an ancilla qubit and controlled by n qubits, we extract the result by measuring the ancilla qubit.

In Qiskit (Abraham et al. 2019), the state \(\left| {0}\right. \rangle\) is the initial state for all qubits. Thus, the n-qubit state at the beginning is \(\left| {00\ldots 0}\right. \rangle =\left| {0}\right. \rangle ^{\otimes n}\), where \(\otimes\) stands for the tensor product. The operation \(U_I\) creates the input quantum state

$$\begin{aligned} U_I\left| {0}\right. \rangle ^{\otimes n}=\left| {\Theta }\right. \rangle \end{aligned}$$

as given in (10).

It can be built in two steps. First, we apply Hadamard gates \({H}^{\otimes n}\) to the qubits, to create a balanced superposition state \(\left| {+}\right. \rangle ^{\otimes n}\) with \(\left| {+}\right. \rangle =(\left| {0}\right. \rangle +\left| {1}\right. \rangle )/\sqrt{2}\).

Second, the appropriate phase has to be added to the equally weighted superposition of all the states in the n qubits computational basis, in order to obtain \(\left| {\Theta }\right. \rangle\). This corresponds to the diagonal unitary operation

$$\begin{aligned} U(\theta )=\left( \begin{array}{cccc} e^{i\theta _0} &{} 0 &{}\cdots &{} 0 \\ 0 &{}e^{i\theta _1} &{}\cdots &{} 0 \\ \vdots &{} \vdots &{}\ddots &{} \vdots \\ 0 &{} 0 &{}\cdots &{} e^{i\theta _{2^n-1}} \end{array}\right) . \end{aligned}$$

Instead of calculating the complete unitary matrix \(U(\theta )\), we decompose it into

$$\begin{aligned} U(\theta )=\prod _{j=0}^{2^n-1}U(\theta _j) \end{aligned}$$

with \(U(\theta _j)\left| {j}\right. \rangle =e^{i\theta _j}\left| {j}\right. \rangle\). With one \(U(\theta _j)\), we apply a phase shift to one computational basis state and leave all other states unchanged. Practically, this is realized by a combination of X-gates (selecting the computational basis state to which the phase shift is applied) and a multi-controlled phase shift gate C\(^{n-1}\)P\(( \theta )\), which for \(n=1\) reduces to the P-gate defined in (9). In total, we have

$$\begin{aligned} U_I\left| {0}\right. \rangle ^{\otimes n}=U(\theta ) {H}^{\otimes n}\left| {0}\right. \rangle ^{\otimes n}=\left| {\Theta }\right. \rangle . \end{aligned}$$
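For \(n=1\), the construction of \(U_I\) can be checked numerically. The sketch below is our own plain-Python illustration rather than a Qiskit circuit: it applies a Hadamard to \(\left| {0}\right. \rangle\), then the diagonal phase operation, and compares the result with \(\left| {\Theta }\right. \rangle\).

```python
import cmath
import math

theta = [0.0, math.pi / 3]                         # example angles

state = [1.0 + 0j, 0.0 + 0j]                       # initial state |0>
s = 1 / math.sqrt(2)
state = [s * (state[0] + state[1]),                # Hadamard gate
         s * (state[0] - state[1])]
state = [cmath.exp(1j * t) * a                     # diagonal U(theta)
         for t, a in zip(theta, state)]

# Target state |Theta> = (e^{i theta_0}, e^{i theta_1}) / sqrt(2).
Theta = [cmath.exp(1j * t) / math.sqrt(2) for t in theta]
print(max(abs(a - b) for a, b in zip(state, Theta)) < 1e-12)  # True
```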

The unitary \(U_W\) is encoded similarly, just conjugated, see Mangini et al. (2020). Consequently, the actual prepared quantum state is

$$\begin{aligned} \left| {\phi }\right. \rangle =(U(\gamma ) {H}^{\otimes n})^{\dagger} \left| {\Theta }\right. \rangle . \end{aligned}$$

To extract the results, we apply X-gates X\(^{\otimes n}\) to the qubits, such that the desired coefficient is the one of the state \(\left| {1}\right. \rangle ^{\otimes n}\). This step completes the unitary operator

$$\begin{aligned} U_W= {X}^{\otimes n} {H}^{\otimes n}U(\gamma )^\dagger . \end{aligned}$$

Finally, we use an ancilla qubit as in Mangini et al. (2020) and map the result to it by a multi-controlled-NOT gate (C\(^n\)X) with n control qubits

$$\begin{aligned} {C}^n {X}( {X}^{\otimes n}\left| {\phi }\right. \rangle \left| {0}\right. \rangle _a)=\sum _{j=0}^{2^n-2}r_j\left| {j}\right. \rangle \left| {0}\right. \rangle _a+r_{2^n-1}\left| {11\cdots 1}\right. \rangle \left| {1}\right. \rangle _a, \end{aligned}$$

where \(r_{2^n-1}=\left\langle {\Gamma }\vert {\Theta }\right\rangle\).

Probabilities in quantum mechanics are represented by the squared modulus of wave function amplitudes (Born rule (Born 1926)). This fact, combined with the global phase invariance, yields the activation function

$$\begin{aligned} |\left\langle {\Gamma }\vert {\Theta }\right\rangle |^2=\frac{1}{2^{2n}}\left|\sum _{j=0}^{2^n-1}e^{i(\theta _j-\gamma _j)}\right|^2=\frac{1}{2^{2n}}\left|1+\sum _{j=1}^{2^n-1}e^{i(\tilde{\theta }_j-\tilde{\gamma }_j)}\right|^2, \end{aligned}$$

where \(\tilde{\theta }_j=\theta _j-\theta _0\) and \(\tilde{\gamma }_j=\gamma _j-\gamma _0\) for \(j\in \{1,\ldots ,2^n-1\}\). Figure 3 shows the circuit for \(n=2\) qubits.

Fig. 3
figure 3

Quantum circuit for a \(2\times 2\) input image patch (encoded in \(U_I\)) and \(2\times 2\) filter mask (encoded in \(U_W\)). Two qubits plus an additional ancilla qubit are needed. The redefined pixel values for the input are encoded in angles \(\tilde{\theta }_j\) and the filter mask weights in \(\tilde{\gamma }_j\), for \(j\in \{1,2,3\}\). It holds \(\tilde{\theta }_j=\theta _j-\theta _0\) and \(\tilde{\gamma }_j=\gamma _j-\gamma _0\) for \(j\in \{1,2,3\}\)

2.4 Quantum edge detection

2.4.1 Quantum edge detection with 2D mask

In order to use the idea of the quantum artificial neuron of the previous section for quantum edge detection, we have to split the input image into \(2\times 2\) patches. The vectorized version of this patch serves as input vector c and the vectorized version of the 2D mask as weight vector w. The main idea of our quantum edge detection is to replace the classical calculation of the inner product by the quantum artificial neuron. All the other classical steps from Section 2.1, like selecting vertical and horizontal directions, combining them, and applying a threshold, remain the same and are calculated on a classical computer.

2.4.2 Quantum edge detection with 1D mask

For the sake of generality, we chose an approach motivated by classical 2D filtering. However, if we are interested in edges only, then we can also use one-dimensional filter masks as in Fig. 1 a and b. This is advantageous since we only have to encode two classical pixel values into a quantum state. We only need one qubit for that and fewer gates compared to the two-dimensional filtering described above. That way, the algorithm is much less error-prone. A circuit for the one-dimensional case is shown in Fig. 4.

Fig. 4
figure 4

Quantum circuit for a two-pixel input image patch (encoded in \(U_I\)) and two-pixel filter mask (encoded in \(U_W\)). The redefined pixel values for the input are encoded in angles \(\theta _j, \theta _{j+1}\) and the filter mask weights in \(\gamma _j, \gamma _{j+1}\). By that, we get the one-dimensional filter as visualized in Fig. 1 a and b. The H- and P-gates are converted further into basis gates in the transpilation step. In total, two SX- and three R\(_ {z}\)-gates are needed for this circuit

Only two Hadamard gates (H), four Phase gates (P), and two NOT gates (X) are required. Analytically, we describe the circuit by

$$\begin{aligned} \begin{aligned} U(\theta , \gamma )\left| {0}\right. \rangle &=U_W U_I\left| {0}\right. \rangle = {HP}(-\gamma _j) {XP}(-\gamma _{j+1}) {P}(\theta _{j+1}) {XP}(\theta _j) {H} \left| {0}\right. \rangle \\&=0.5\left( e^{i\lambda _{j+1}}+e^{i\lambda _j}\right) \left| {0}\right. \rangle + 0.5\left( e^{i\lambda _{j+1}}-e^{i\lambda _j}\right) \left| {1}\right. \rangle , \end{aligned} \end{aligned}$$

where \(\lambda _j=\theta _j-\gamma _j\) and \(\lambda _{j+1}=\theta _{j+1}-\gamma _{j+1}\). Note that the angles for the filter mask weights in \(U_W\) are the negatives of the original ones and that the order of the gates in \(U_W\) is reversed compared to \(U_I\) (see (13) and Mangini et al. 2020). At the end, we get the probability of measuring the qubit in state \(\left| {0}\right. \rangle\) as

$$\begin{aligned} |\langle \left. {0}\right| U(\theta ,\gamma )\left| {0}\right. \rangle |^2=\frac{1}{4}|e^{i\lambda _{j+1}}+e^{i\lambda _j}|^2, \end{aligned}$$

where \(\theta =(\theta _j, \theta _{j+1})\) and \(\gamma =(\gamma _j, \gamma _{j+1})\). This probability is estimated by counting the frequencies of the \(\left| {0}\right. \rangle\) state over multiple runs and coincides with the absolute square of the amplitude of the \(\left| {0}\right. \rangle\) state in (22).
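With the filter mask weights \(\gamma =(0,\pi )\) (first mask pixel black, second white), this probability can be evaluated classically. The sketch below uses our own function name and default arguments; it shows that a homogeneous patch yields probability 0 while a maximal gray value jump yields 1, i.e., the \(\left| {0}\right. \rangle\) state marks edges.

```python
import cmath
import math

def p_zero(theta_j, theta_j1, gamma_j=0.0, gamma_j1=math.pi):
    """P(|0>) = 1/4 |e^{i lambda_{j+1}} + e^{i lambda_j}|^2 with
    lambda = theta - gamma; defaults encode the mask gamma = (0, pi)."""
    lam_j = theta_j - gamma_j
    lam_j1 = theta_j1 - gamma_j1
    return abs(cmath.exp(1j * lam_j1) + cmath.exp(1j * lam_j)) ** 2 / 4

print(round(p_zero(0.0, 0.0), 6))      # homogeneous patch: 0.0
print(round(p_zero(0.0, math.pi), 6))  # black/white edge: 1.0
```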

On a real backend, all unitary gates of the circuits are transformed into basis gates. Here, this results in a decomposition into three R\(_{z}\)- and two SX-gates and a circuit depth of six. Other gates, in particular the error-prone CX-gates, are not required. For the two-dimensional case shown in Fig. 3, we need around 31 R\(_ {z}\)-, 22 CX-, 6 SX-, and 5 NOT gates, and obtain a circuit depth of approximately 49, depending on the coupling map and the configuration of the chosen backend.

A drawback of the quantum edge detection with a 1D mask is that some elements of the image are missed, as visible in Fig. 14 in Section 4.2. To circumvent this problem, an additional direction is required. We add input image patches for the diagonal direction \(I_{in}(x,y)\), \(I_{in}(x+1,y+1)\) of the input image. The outcome \(I_{d}\) is combined with those for the vertical and horizontal directions by the pixel-wise maximum, yielding

$$\begin{aligned} I_{total}=\max _{m\in \{h,v,d\}}I_{m}. \end{aligned}$$

2.4.3 Improved quantum edge detection with 1D mask

Using the 1D masks, we need fewer gates and therefore also observe less noise. Now we modify this solution in several ways to further reduce the number of circuits and jobs and to shorten the execution time. We compare six variants of the implementation in the following. The first one, denoted by Std32T, is the one-dimensional variant with 32,000 shots from above. In the second one, Std50, we decrease the number of shots to 50 while the method and the circuit remain the same.

The remaining four variants of the one-dimensional quantum edge detector involve mid-circuit measurement, parallelism, and likewise 50 measurements, and are dedicated to detecting edges in larger images using fewer circuits, making them applicable on current quantum computers. So far, we had to execute each of the three directions (horizontal, vertical, and diagonal) separately. In the third variant, Seq50, we combine all three directions in one circuit sequentially by using mid-circuit measurements, which allow qubits to be individually measured at any point in the circuit. IBM launched this feature of their backends at the beginning of 2021 (IBM 2021). We use it to measure the required qubit three times, once for each direction. Note that we reset the qubit to its initial state \(\left| {0}\right. \rangle\) after the first and second measurement. Figure 5 shows this variant.

Fig. 5
figure 5

Circuit scheme for Seq50. Quantum circuit for a two pixel input image patch. The first qubit encodes the pixels in diagonal, the second those in horizontal, and the third those in vertical direction. We have \(\gamma _m=(0,\pi )\), for all \(m\in \{h,v,d\}\), since in all cases the first pixel of the filter mask is black and the second white (see Fig. 1 a and b). The H- and P-gates must be converted further into basis gates. In total, six SX- and nine R\(_{z}\)-gates are needed for this circuit

With this improvement, we decrease the number of circuits by a factor of 3. To retrieve the results for the three directions, we marginalize the counts from the experiment over the three indices (0 for diagonal, 1 for horizontal, and 2 for vertical). For that, we use Qiskit’s utility function marginal_counts (Abraham et al. 2019).
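A simplified stand-in for this marginalization might look as follows. This is our own sketch, not Qiskit's actual `marginal_counts` implementation; we assume Qiskit's little-endian convention, where index 0 refers to the rightmost classical bit.

```python
from collections import Counter

def marginalize(counts, index):
    """Keep only the classical bit at `index` (0 = rightmost bit) and
    sum the shot counts over all other bits."""
    out = Counter()
    for bitstring, c in counts.items():
        out[bitstring[::-1][index]] += c
    return dict(out)

# Toy counts for a 3-bit measurement (bit 0: diagonal, 1: horizontal,
# 2: vertical direction).
counts = {'000': 10, '001': 20, '101': 5, '110': 15}
print(marginalize(counts, 0))  # diagonal direction: {'0': 25, '1': 25}
```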

In the fourth variant, Para50, we combine the three directions in parallel instead of sequentially as before and marginalize the counts as in the third variant. Figure 6 yields the circuit for this variant.

Fig. 6
figure 6

Circuit scheme for Para50. Quantum circuit for a two-pixel input image patch. The first qubit encodes the pixels in diagonal, the second those in horizontal, and the third those in vertical direction. We have \(\gamma _m=(0,\pi )\), for all \(m\in \{h,v,d\}\), since in all cases the first pixel of the filter mask is black and the second white (see Fig. 1 a and b). The H- and P-gates must be converted further into basis gates. In total, six SX- and nine R\(_ {z}\)-gates are needed for this circuit

Thanks to the parallel execution, we need less time to apply all gates than in Seq50. However, we need three qubits instead of one. The fifth variant, Para50_3pix, extends Para50’s main idea to more pixels. Instead of encoding only one pixel per circuit we parallelize the scheme for three pixels in one circuit as shown in Fig. 7.

Fig. 7
figure 7

Circuit scheme for Para50_3pix. We apply Para50  three times in parallel. The first index of the angle \(\theta\) is the result \(\{0,1,2\}\) of calculating the actual position of the input image patch modulo 3. That means, the first input image patch is encoded in the qubits \(qr_0-qr_2\), the second in \(qr_3-qr_5\), the third in \(qr_6-qr_8\), the fourth again in \(qr_0-qr_2\), and so on. The second index describes the direction \(m\in \{h,v,d\}\). It holds \(\gamma _m=(0,\pi )\), for all \(m\in \{h,v,d\}\), since in all cases the first pixel of the filter mask is black and the second white (see Fig. 1 a and b). In total, 18 SX- and 27 R\(_ {z}\)-gates are needed for this circuit, if we decompose the Hadamard and Phase gates into basis gates

With this adaption, we triple the number of required qubits but simultaneously divide the number of required circuits by three. Clearly, this idea can be extended to more qubits, but we refrain from exemplifying this here.

Finally, a mixture of Seq50  and Para50  leads to the sixth and last variant, that we cover in this paper, SeqPara50. We take the mid-circuit measurement from Seq50, but encode four pixel values in the three directions, instead of only one. That is, we extend Seq50  by two pixels per qubit and parallelize this scheme on two qubits. If we apply operations to a qubit after a mid-circuit measurement, we reset the qubit to the initial state \(\left| {0}\right. \rangle\). The circuit for SeqPara50  is shown in Fig. 8.

Fig. 8
figure 8

Circuit scheme for SeqPara50. We apply Seq50 two times sequentially and two times in parallel. The first index of the angle \(\theta\) is the actual position \(\{0,1,2,3\}\) of the input image patch modulo 4. That means, the first input image patch for all three directions is encoded in the top left Seq block, the second in the top right, the third in the bottom left, and the fourth in the bottom right block. The second index describes the direction \(m\in \{h,v,d\}\). We have \(\gamma _m=(0,\pi )\), for all \(m\in \{h,v,d\}\), since in all cases the first pixel of the filter mask is black and the second white (see Fig. 1 a and b). In total, 24 SX- and 36 R\(_ {z}\)-gates are needed for this circuit, if we decompose the Hadamard and Phase gates into basis gates

This way we divide the number of required circuits by 12 compared to Std32T and Std50. We need 12 measurements per circuit for the four pixels. Clearly, the idea of this method can be extended for more pixels, both by more qubits and by more operations per qubit.

To compare the outcomes of the six variants, we use the Hellinger fidelity, which is defined in (8).

As the reference image, we calculate the pixel-wise maximum over the horizontal, vertical, and diagonal directions of the corresponding analytical descriptions (23). Note that we use the state \(\left| {0}\right. \rangle\) there. The state \(\left| {1}\right. \rangle\) could also be used but would return the inverse image, with black edges and white background. For the three directions \(m\in \{h,v,d\}\), we have the analytical description

$$\begin{aligned} |\langle \left. {0}\right| U(\theta _m,\gamma _m)\left| {0}\right. \rangle |^2=\frac{1}{4}|e^{i\lambda _{j+1,m}}+e^{i\lambda _{j,m}}|^2, \end{aligned}$$

where \(\lambda _{j,m}=\theta _{j,m}-\gamma _{j}\) and \(\lambda _{j+1,m}=\theta _{j+1,m}-\gamma _{j+1}\).

The pixel-wise maximum of the three resulting images is the reference image. The gray value frequencies of this image enter the Hellinger fidelity as entries \(q_j\). The frequencies of the outcome of the real backends are plugged into (8) as \(p_j\).

3 Near-term quantum computers setting

Here, we describe our setting for evaluating our method from the previous section. It includes software, a classical computer, and quantum computers.

We use the open-source software development kit Qiskit (Abraham et al. 2019) for working with IBM’s circuit-based superconducting quantum computers (IBM 2021). They provide a variety of systems, also known as backends, which differ in the type of the processor, the number of qubits, and their connectivity. Access is provided via a cloud. In this paper, we use the backends ‘ibm_auckland’, ‘ibm_washington’, ‘ibmq_guadalupe’, ‘ibmq_mumbai’, ‘ibmq_sydney’, and ‘ibmq_ehningen’. The corresponding coupling maps are shown in Fig. 9.

Fig. 9
figure 9

Coupling maps of backends used in this paper. Colors code the readout errors (points) and the CX errors for the connections between the qubits (lines). Dark blue indicates a small error, purple a large one. We only show one example (‘ibm_auckland’) for the backends with 27 qubits (ibm_auckland, ‘ibmq_mumbai’, ‘ibmq_sydney’, and ‘ibmq_ehningen’) as the others look quite similar except for small deviations in the errors (see Table 3)

Additionally, the processor types and the performance values of the respective backend in terms of scale (number of qubits), quality (quantum volume), and speed (circuit layer operations per second [CLOPS]) are given in Table 2.

Table 2 Processor type and actual performance of the used backends as measured in December 2021. The backend ‘ibmq_ehningen’ has currently no value for the speed

Besides the various coupling maps and performance values, the backends are subject to external influences. Characteristics of the backends, like CX error, readout error, or decoherence times, can change hourly. Calibration should diminish this effect; the reported errors are, however, averaged over 24 h. Typical average values for CX error, readout error, decoherence times T1, T2, and frequency are shown in Table 3.

Table 3 Typical average calibration data of the six chosen backends. The values are from December 2021

In addition to quantum computers, a classical computer is needed for preparing data and generating and storing the circuits before sending them to the quantum computer. We use a computer with an Intel Xeon E5-2670 processor running at 2.60 GHz, a total RAM of 64 GB, and Red Hat Enterprise Linux 7.9.

Transpilation is needed for transferring a circuit designed on a classical computer to a quantum computer: First, to match the topology of a specific backend (see, for example, Fig. 9). Second, to transform all gates into basis gates. Third, to optimize the operations. We use the default transpiler of Qiskit (Abraham et al. 2019; Asfaw et al. 2021).

4 Experimental results

In this section, we show examples of what can be expected with current hardware for a classical edge detection task.

4.1 Quantum edge detection with 2D mask

4.1.1 Binary image

Starting with the experiment from Fig. 2, we use a \(30\times 30\) binary sample image and two binary filter masks in the horizontal and vertical directions (see Fig. 1 c and d). Black pixels are interpreted as an angle of 0 and white pixels as π. For each combination of input image patch and filter mask, we create one circuit. Thus, the \(30\times 30\) sample image requires 900 circuits for each direction. The results are interpretable and correct (see Fig. 2 on the right side) even without error correction or mitigation techniques to reduce noise.
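The circuit-per-patch bookkeeping can be illustrated with the following Python sketch, which slides a window over a zero-padded \(30\times 30\) image and converts gray values to angles. The linear gray-value-to-angle mapping, the \(2\times 2\) patch shape, and all function names are our own illustrative assumptions, not the exact encoding of Section 2.3:

```python
import math

def gray_to_angle(g: int) -> float:
    """Map a gray value in [0, 255] to a rotation angle in [0, pi].

    Black (0) maps to angle 0 and white (255) to pi, consistent with the
    binary interpretation in the text; the exact mapping of Section 2.3
    may differ (assumption).
    """
    return g / 255 * math.pi

def patch_angles(image, row, col):
    """Angles of the 2x2 patch with top-left corner (row, col).

    Pixels outside the image are zero-padded; the patch shape is an
    illustrative assumption.
    """
    h, w = len(image), len(image[0])
    patch = []
    for dr in range(2):
        for dc in range(2):
            r, c = row + dr, col + dc
            g = image[r][c] if r < h and c < w else 0  # zero padding
            patch.append(gray_to_angle(g))
    return patch

# One circuit per pixel position and per filter direction:
image = [[255 if (r + c) % 7 == 0 else 0 for c in range(30)] for r in range(30)]
circuits_per_direction = [patch_angles(image, r, c)
                          for r in range(30) for c in range(30)]
print(len(circuits_per_direction))  # 900 circuits, as stated in the text
```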

If we plot the histogram of the pixel-wise maximum of both directions \(I_{both}\) (see Fig. 10), the various types of pixels (edges, background, diagonals, or endpoints of lines) are clearly separated into three areas. Consequently, it is easy to choose a suitable threshold value.
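This classical post-processing step can be sketched as follows; the function name and the toy values are hypothetical, and in practice the threshold would be read off a histogram like the one in Fig. 10:

```python
def combine_and_threshold(img_h, img_v, threshold):
    """Pixel-wise maximum of the two filtered images, then binarization.

    img_h, img_v: equally sized 2D lists of gray values (0..255).
    Returns the binary edge image: 255 where the maximum exceeds the
    threshold, 0 elsewhere.
    """
    return [[255 if max(a, b) > threshold else 0
             for a, b in zip(row_h, row_v)]
            for row_h, row_v in zip(img_h, img_v)]

# Toy example: an edge visible in only one direction still survives
# the pixel-wise maximum.
img_h = [[10, 200], [10, 10]]
img_v = [[10, 10], [180, 10]]
print(combine_and_threshold(img_h, img_v, 128))  # [[0, 255], [255, 0]]
```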

Fig. 10 Typical histogram of the combined output image obtained by applying the pixel-wise maximum of both directions. Each possible gray value from 0 to 255 forms a histogram bin (horizontal axis); the vertical axis shows the corresponding frequencies. The threshold can be chosen within a whole range even if the method is applied on a real backend (backend used here: ‘ibmq_ehningen’)

For binary images, it is in theory also possible to use an approach based on the generation of hypergraph states similar to Tacchino et al. (2019) instead of the circuit given in Fig. 3. This is due to the fact that the prepared real equally weighted states as in (10) and (11) (\(\tilde{c_k}, \tilde{w_k}\in \{-1,1\}\)) coincide with quantum hypergraph states (Rossi et al. 2013). In this way, we can decrease the number of gates, especially the number of controlled gates. Since our circuit is quite small, using the hypergraph states yields only a small improvement. For larger circuits, especially those with multiple qubits that have to be entangled, the difference will be more pronounced.

4.1.2 Gray value image

As a toy example for a gray value image, we created a \(30\times 30\) image (see Fig. 11a) with sharp edges. The quantum algorithm and the method are the same as above since the algorithm is already adapted to gray value images. We insert the angles (converted gray values as shown in Section 2.3) into the quantum algorithm, get the results, and post-process as in the binary case. The outcomes are shown in Fig. 11.

Fig. 11 Results for a \(30\times 30\) gray value image created for test purposes. Only small deviations between \(I_{both}\) from the ‘qasm_simulator’ (sim) and the backend ‘ibmq_ehningen’ (back). All edges are detected in both cases

Of course, the gray values affect the values of the outcome. Compared to Fig. 2, the \(I_{both}\) image in Fig. 11 b and c also shows lower values for the foreground and higher values for the background, which makes the threshold choice more difficult (see Fig. 12). The three areas visible in Fig. 10 are no longer fully distinguishable for every single pixel. However, it is still possible to detect all of the edges.

Fig. 12 Typical histogram of the combined output image obtained by applying the pixel-wise maximum of both directions (backend used here: ‘ibmq_ehningen’). Each possible gray value from 0 to 255 forms a histogram bin (horizontal axis); the vertical axis shows the corresponding frequencies

Figure 13 shows the outcomes for a downscaled classical image processing test image. The main edges in the image are detected.

Fig. 13 Results for the downscaled \(30\times 30\) House image from the USC-SIPI image database (Sawchuk et al. 1973). Only small deviations between \(I_{both}\) from the ‘qasm_simulator’ (sim) and the backend ‘ibmq_ehningen’ (back)

4.2 Quantum edge detection with 1D mask

As in the two-dimensional case, we move the two-pixel sliding window across the whole image. For each step, we create one circuit as visualized in Fig. 4. In total, the same number of circuits is needed to encode the image, that is, 900 circuits for a \(30\times 30\) gray value image. The outcome for the binary sample image (see Fig. 2) with the one-dimensional quantum edge detector is shown in Fig. 14.
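The quantity such a one-qubit circuit estimates can be emulated classically. Under the assumption that the two pixels are phase-encoded into an equally weighted state as in (10) and (11) and that the weight state is \((|0\rangle - |1\rangle)/\sqrt{2}\), the measured probability has the closed form \(\sin^2((\theta_1-\theta_2)/2)\). The sketch below is a stand-in for, not a reproduction of, the five-gate circuit of Fig. 4:

```python
import cmath
import math

def edge_signal(theta1: float, theta2: float) -> float:
    """Classical stand-in for the probability the one-qubit circuit estimates.

    Assumption: the two pixels are phase-encoded as the equally weighted
    state (e^{i*theta1}|0> + e^{i*theta2}|1>)/sqrt(2), as in (10)-(11),
    and the weight state is (|0> - |1>)/sqrt(2). The squared inner
    product then equals sin((theta1 - theta2)/2)**2.
    """
    inner = (cmath.exp(1j * theta1) - cmath.exp(1j * theta2)) / 2
    return abs(inner) ** 2

print(round(edge_signal(0.0, math.pi), 6))  # 1.0 - maximal contrast: an edge
print(round(edge_signal(0.0, 0.0), 6))      # 0.0 - homogeneous neighborhood
```

Intermediate gray value differences yield intermediate probabilities, which is why a threshold on the output image is still needed afterwards.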

Fig. 14 Results for our \(30\times 30\) binary test image with one-dimensional filtering. Only small deviations between \(I_{both}\) from the ‘qasm_simulator’ (sim) and the backend ‘ibmq_ehningen’ (back) with 32,000 shots. Due to the one-dimensional filtering, pixels at the top left corners of the objects are not detected as edges

The method is well suited to detect vertical and horizontal edges in the image. However, some connections between the detected edges are missing, such as those at the top left corners of the objects.

This effect also holds for the diagonal edges of the house roof or the tree and explains the differences between the outcomes in Fig. 2 and Fig. 14. With the adaptation of (24), the missing edge pixels in Fig. 14 are detected, as shown in Fig. 15.

Fig. 15 Results \(I_{total}\) for our \(30\times 30\) binary test image with one-dimensional filtering in three directions. This solves the missing-pixel problem of Fig. 14. Backend used: ‘ibmq_ehningen’. Outcomes for the ‘qasm_simulator’ are omitted here since there are no visual differences compared to the backend outcomes

4.3 Comparison of quantum edge detection with 1D and 2D mask

The main difference between the two variants is the size and depth of the quantum circuits. In the one-dimensional case, only one qubit and five gates are needed. In the two-dimensional case, we need three qubits, more gates, and, in particular, the error-prone CX-gates. If there is no connection between required qubits, additional SWAP-gates (three CX-gates per SWAP-gate) are inserted in the transpilation step. Therefore, the depth of the transpiled circuit on the real backend grows.
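The CX overhead caused by routing can be estimated from the coupling map: qubits at shortest-path distance \(d\) require \(d-1\) SWAP-gates, i.e., \(3(d-1)\) extra CX-gates, before a single CX can act on them. A rough lower-bound sketch (Qiskit’s actual routing passes may insert a different number of gates):

```python
from collections import deque

def cx_overhead(coupling, a, b):
    """Extra CX-gates needed to bring qubits a and b together.

    coupling: adjacency list of the backend's coupling map. Each SWAP
    along the shortest path costs three CX-gates, and (distance - 1)
    SWAPs make the qubits adjacent. This is a rough estimate only.
    """
    dist = {a: 0}
    queue = deque([a])
    while queue:  # breadth-first search for the shortest path
        q = queue.popleft()
        if q == b:
            return 3 * (dist[q] - 1)
        for nb in coupling[q]:
            if nb not in dist:
                dist[nb] = dist[q] + 1
                queue.append(nb)
    raise ValueError("qubits not connected")

# Linear chain 0-1-2-3: qubits 0 and 3 are at distance 3,
# so two SWAPs (six CX-gates) are inserted before one CX can act.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(cx_overhead(chain, 0, 3))  # 6
```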

Since the quantum edge detection with 1D filters needs fewer gates and no CX-gates, it is also more robust to noise than the variant with 2D filters. The variety of combinations that can occur is a further reason. We calculate the inner product of the encoded input and weight quantum states. In the one-dimensional case, these are built from the two angles of the input image patch.

In the two-dimensional case, we have three angles for the input image patch and three angles for the weights. Hence, there are more classes (see, for example, Fig. 10). Not all of them can be distinguished from each other with a simple threshold value. Especially for gray value images, the values for edges can become indistinguishable from those of the background under noise. This effect is visualized in Fig. 16, especially for the lower number of shots (1000 shots).

Fig. 16 Comparison of the one-dimensional and two-dimensional quantum edge detectors on three \(30\times 30\) sample images with 1000 or 32,000 shots. Backend used: ‘ibmq_ehningen’

For the one-dimensional quantum edge detector, there are only small visual differences between the results with 1000 shots and those with the maximum number of 32,000 when using, for example, the backend ‘ibmq_ehningen’. The edges are visible and not strongly influenced by noise. This is not true for the two-dimensional variant. The larger number of gates and the resulting errors make edge detection difficult, especially for the \(30\times 30\) House image (see the bottom row in Fig. 16). There, 1000 shots are not sufficient to handle these errors; with a higher number of shots, such as 32,000, it is possible.

The execution time depends linearly on the number of shots. Thus, reducing the number of shots is an effective way to reduce execution times. For example, quantum edge detection with 32,000 shots takes nearly 43 min per job (assuming that 300 circuits can be processed per job), whereas 1000 shots only require 90 s on the backend ‘ibmq_ehningen’. Consequently, with the one-dimensional quantum edge detector, more jobs can be executed in the same time interval, usually with better results, as shown for example in Fig. 16.
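A quick consistency check of the linear-scaling claim against the two quoted timings:

```python
# Per-job timings quoted in the text for 'ibmq_ehningen'.
time_1000_shots_s = 90        # 1000 shots
time_32000_shots_s = 43 * 60  # 32,000 shots, ~43 min = 2580 s

# Pure linear extrapolation from the 1000-shot measurement:
predicted_s = time_1000_shots_s * (32000 / 1000)
print(predicted_s)            # 2880.0 s = 48 min, versus 2580 s observed

# The prediction agrees to within roughly 12%; per-job overheads that
# do not scale with the number of shots explain the remaining gap.
print(round(abs(predicted_s - time_32000_shots_s) / time_32000_shots_s, 2))
```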

4.4 Improved quantum edge detection with 1D mask

Table 4 summarizes the six variants of the one-dimensional quantum edge detector. For the comparison, we take a \(30\times 30\) gray value image as reference and assume that 300 circuits can be executed per job on the real backends. This was the case for IBM’s advanced backends in December 2021.

Table 4 Summary of the six variants for the one-dimensional quantum edge detector at the example of a \(30 \times 30\) gray value image. We take 300 circuits per job as reference number (limit of IBM’s advanced backends November 2021). To determine the execution times, we repeated the calculations on IBM’s backend ‘ibm_auckland’ 24 times between 22 and 29 December 2021

The six methods differ in the number of shots, the number of qubits, the number of circuits, and therefore also in the number of jobs that have to be submitted to IBM. As a consequence, the execution time on the real backends varies between the six variants, too. We already see a reduction of the execution time when using only 50 shots instead of 32,000 due to the linear relation between the number of shots and the execution time (Asfaw et al. 2021). Using Seq50 or Para50, we decrease the number of jobs by a factor of three, so the execution time also decreases approximately by that factor. Furthermore, the smaller number of circuits/jobs explains why Para50_3pix and SeqPara50 need even less time.

To compare the results of the six variants quantitatively, we use the Hellinger fidelity as defined in Section 2.4.3. Figure 17 contains boxplots of the fidelities for five backends.

Fig. 17 Hellinger fidelity for the six variants and five of IBM’s backends. For the boxplots, we executed the codes three times a day between 22 and 29 December 2021, resulting in 24 runs
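For reference, the fidelity can be computed directly from two count dictionaries. We follow the convention of Qiskit’s `hellinger_fidelity`, the squared Bhattacharyya coefficient of the normalized distributions; whether this matches (8) term by term is an assumption on our side:

```python
import math

def hellinger_fidelity(p: dict, q: dict) -> float:
    """Hellinger fidelity between two count dictionaries.

    Computes (sum_j sqrt(p_j * q_j))**2 on the normalized distributions,
    matching the convention of Qiskit's quantum_info.hellinger_fidelity
    (assumption: this agrees with (8) in the text).
    """
    n_p, n_q = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    bc = sum(math.sqrt(p.get(k, 0) / n_p * q.get(k, 0) / n_q) for k in keys)
    return bc ** 2

# Identical outcome frequencies give fidelity 1, disjoint ones give 0:
print(hellinger_fidelity({'0': 500, '1': 500}, {'0': 50, '1': 50}))  # 1.0
print(hellinger_fidelity({'0': 100}, {'1': 100}))                    # 0.0
```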

The backends used in our study have more qubits than required for the six variants (see Fig. 9 and Table 4). There are several strategies to select the qubits, such as choosing the qubits with the lowest CX errors or those with high connectivity to other qubits in order to avoid additional SWAP-gates. Since we apply only single-qubit operations, we select the qubits with the lowest readout error.
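A minimal sketch of this selection strategy; the calibration values below are hypothetical and would in practice be read from the backend’s calibration data (e.g., `backend.properties()` in Qiskit):

```python
def pick_qubits(readout_errors: dict, n: int) -> list:
    """Select the n physical qubits with the lowest readout error.

    readout_errors maps a physical qubit index to its calibrated
    readout error. Since our circuits use only single-qubit gates,
    the readout error is the dominant error source.
    """
    return sorted(readout_errors, key=readout_errors.get)[:n]

# Hypothetical calibration snapshot:
errors = {0: 0.021, 1: 0.013, 2: 0.047, 3: 0.009, 4: 0.030}
print(pick_qubits(errors, 2))  # [3, 1]
```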

Note that measurement error mitigation or alternatives like zero noise extrapolation (LaRose et al. 2020) are often used to reduce the errors of qubits and gates. In contrast to our previous work (Geng et al. 2021), the exact probability of measuring a particular state is not required here. We apply a threshold to the resulting probabilities such that we can handle some noise in the calculations. Therefore we refrained from applying error mitigation.

All of the backends perform quite similarly, except for the newly released backend ‘ibm_washington’. One reason for this could be its release date right before our executions. For older backends, possible bugs have already been discovered, whereas this may not yet be the case for ‘ibm_washington’. However, the improvement of the systems, especially of the newer ones, is an ongoing process. Quantum computers become more stable and less error-prone with calibrations and adjustments. So, better results can already be expected by now.

Thanks to our method’s robustness with respect to noise, even the low-fidelity results from ‘ibm_washington’ are completely interpretable. See Fig. 18 for the outcomes of SeqPara50 before and after applying a threshold. Some noise effects are visible, for example in the background of Fig. 18a, with slightly higher gray values than expected. However, the foreground and background still differ sufficiently. Figure 18c shows the edges detected in the worst case of SeqPara50 on the ‘ibm_washington’ backend after applying an Otsu threshold. All edge pixels are detected even with the worst result of all experiments.
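Otsu’s method chooses the threshold that maximizes the between-class variance of the gray value histogram. A self-contained sketch (the toy histogram is hypothetical; in practice, e.g., `skimage.filters.threshold_otsu` could be used):

```python
def otsu_threshold(hist):
    """Otsu's threshold for a 256-bin gray value histogram.

    Returns the threshold t maximizing the between-class variance;
    pixels with values <= t are treated as background.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = s0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # background weight up to bin t
        if w0 == 0:
            continue
        w1 = total - w0        # foreground weight
        if w1 == 0:
            break
        s0 += t * hist[t]
        m0, m1 = s0 / w0, (sum_all - s0) / w1  # class means
        var = w0 * w1 * (m0 - m1) ** 2         # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Bimodal toy histogram: background around 30, edge pixels around 220.
hist = [0] * 256
hist[28] = hist[30] = hist[32] = 300  # noisy background
hist[218] = hist[222] = 40            # edge pixels
print(otsu_threshold(hist))           # a threshold between the two modes
```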

As expected, Std32T  features the highest fidelity due to the high number of shots decreasing the effect of the finite-sampling shot noise.

For the last five variants, we observe only slightly worse results, with more variation for Para50_3pix and SeqPara50. A reason for this is the higher number of measurements per circuit: we need 9 and 12 measurements per circuit for Para50_3pix and SeqPara50, respectively. This increases the error per circuit and decreases the fidelity in the end.

Fig. 18 Worst and best results of 24 executions between 22 and 29 December 2021 for SeqPara50 using ‘ibm_washington’ before applying a threshold. The value in brackets refers to the fidelity. Even with the worst result of all presented backends and variants of one-dimensional quantum edge detection, the edges are detectable by applying an Otsu threshold afterwards, as shown in Fig. 18c

4.5 Larger images

The advantage of our hybrid method is that for larger images the method itself, and thus the basic errors, remain the same. We create more circuits while keeping the size of the circuits constant. Therefore, our method is beneficial for practical usage in the current NISQ era. The House image in its original size of \(256\times 256\) and the corresponding results are shown in Fig. 19. Based on the findings from Section 4.3, we use the one-dimensional variant Seq50 as an example.

Fig. 19 Results for the \(256\times 256\) House image (Sawchuk et al. 1973). We use method Std50, but the other five variants of the one-dimensional quantum edge detector yield similar results. Only small deviations of \(I_{total}\) between the ‘qasm_simulator’ (sim) and the backend ‘ibm_auckland’ (back) are visible

The simulator and the backend outcomes differ only minimally, and the edges of the house are recognizable. This low noise level is mainly due to the very short quantum circuits. Thus, we can detect edges in arbitrarily large images on the current backends in today’s NISQ era.

Due to the limitations on the maximal number of circuits (exploratory and advanced 300, core 900, and open backends 100 circuits per job at IBM as of November 8, 2021 (IBM 2021)), we split the circuits into several jobs and execute them sequentially. The jobs for the input image patches should be executed as consecutively as possible, or at least under comparable calibrations. Otherwise, calibration variations show up in the images, especially for larger quantum circuits with many gates. All six one-dimensional variants turned out to produce similar results; hence, we only show the outcome of Std50 here.

Theoretically, it is also possible to process the entire \(256\times 256\) image in one circuit, e.g., with an extension of SeqPara50. However, the number of measurements per job is currently limited. The exact number is not publicly available, but our own experiments have shown that about 16,000 measurements per job are possible. For \(2^{a}\times 2^{a}\) images, where \(a\in \mathbb {N}\), this means a maximum image size of \(64\times 64\) with SeqPara50 in one job. For larger images, we split the image into several parts and combine the results of multiple jobs classically afterwards.

5 Conclusion and discussion

In this paper, we practically implement a hybrid quantum edge detector in the current NISQ era. Starting from the quantum algorithm for an artificial neuron, we first develop a method that allows us to find edges in a gray value image using two-dimensional filter masks and later replace these by one-dimensional ones. This allows us to significantly reduce the circuit depth, the number of gates, and therefore also the influence of noise. In particular, we do not need any error-prone CX-gates. Due to this improvement, our method detects edges with a number of shots as low as 50, which reduces the execution time significantly.

We develop four additional variants of the one-dimensional quantum edge detection algorithm to adapt the method for larger images. In these, we consider several directions or pixels sequentially and/or in parallel, which leads to a reduction in the number of circuits. That way, we have to submit fewer jobs and can reduce the execution times further.

All the methods discussed in this paper aim at minimizing the errors that currently occur in the context of quantum computing. This allows for very good results with current hardware at the price of a high number of circuits. To reduce this number, larger filter masks or a larger step size of the filter masks could be applied. Even padding could be omitted. However, this would deteriorate the results and impede the detection of individual edges.

Of course, we are not limited to the presented variants. For example, we can encode more pixels sequentially in Seq50 or extend the idea of Para50_3pix further. In particular, SeqPara50 leaves space for customization. There, we use a \(2\times 2\) pattern, where two pixels are encoded sequentially, and repeat this for a second qubit. Instead, we can encode more pixels in one circuit. For example, we can implement a \(16\times 16\) pattern. Then, in each circuit, 256 pixels are encoded for all three directions. With that, we only need 256 circuits to encode a \(256\times 256\) image like the one in Fig. 19a. This number of circuits is within the range of allowed circuits per job. Thus, we theoretically need only one job on an IBM backend.
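The circuit budget of this hypothetical \(16\times 16\) pattern is a simple calculation:

```python
import math

# Circuit budget for the hypothetical 16x16 SeqPara50 pattern described
# above: 256 pixels per circuit (covering all three directions) and a
# 256x256 input image.
pixels = 256 * 256
pixels_per_circuit = 16 * 16
circuits = math.ceil(pixels / pixels_per_circuit)
print(circuits)  # 256 circuits, as stated in the text

# With 300 circuits allowed per job, a single job would suffice:
jobs = math.ceil(circuits / 300)
print(jobs)  # 1
```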

Note that currently the total number of measurements per job is limited. This, for example, restricts the flexibility of SeqPara50  as not all pixels of a large image can be encoded in one job. Instead, the results of multiple jobs have to be combined classically afterwards.

Each of the presented methods solves the quantum edge detection task. Other filtering tasks can be solved, too, by simply adapting the weights of the filter mask. For example, we can adapt the algorithm to enhance, denoise, or blur an image.

To summarize, we implement a hybrid edge detector for larger images on a real quantum computer. To our knowledge, this has not been done before. The algorithmic idea based on quantum machine learning can be adapted flexibly to other tasks. This is a clear advantage compared to pure edge detection methods.