1 Introduction

Particle accelerator experiments aim to understand the nature of particles by colliding groups of particles at high energies and observing the creation of new particles and their decays, e.g. to validate theories. The Large Hadron Collider (LHC) at the European Organisation for Nuclear Research (CERN) provides proton-proton collisions to four main experiments as well as to several smaller and fixed-target experiments. In order to achieve a high sensitivity, these experiments use advanced software and hardware.

In addition, these experiments require very fast processing units, as the time between two consecutive collisions is very short (with trigger rates reaching up to 1 MHz for ATLAS and CMS according to The ATLAS Collaboration (2015), Contardo et al. (2015) and Albrecht et al. (2019)). A big data storage and processing problem arises when the fast data acquisition is combined with sensitive hardware. A total of 990 PetaBytes of disk and tape space and around 550 thousand CPU cores were pledged to the LHC experiments in 2017, according to a report by the CERN Computing Resources Scrutiny Group (CRSG) (Lucchesi 2017).

Currently, the LHC is going through an upgrade period to increase the number of particles in the beam (i.e. luminosity) (Apollinari et al. 2015). Therefore, the future High Luminosity LHC (HL-LHC) experiments will require much faster electronics and software to process the increased rate of collisions.

In the particle track reconstruction problem, the aim is to identify the trajectories of particles using the measurements of the tracking detectors. Accelerated particles interact/collide near the origin of the detector coordinate system. Products of these interactions travel outwards from the origin. Charged particles bend in a direction depending on their electric charge. When these particles pass through the detectors, they create signals in the detector called hits. Particle track reconstruction aims to connect the hits belonging to the same particle in order to assign a trajectory (Fig. 1).

Fig. 1 Drawing of the particle track reconstruction problem. Particles interact near the origin of the coordinate system. Products of these interactions travel outwards from the origin. Charged particles bend in a direction depending on their electric charge. When these charged particles pass through the detectors, they create signals called hits. Particle track reconstruction aims to connect the hits belonging to the same particle

The efficient reconstruction of particle tracks is one of the most important challenges of the HL-LHC upgrade. Although there are novel algorithms (ATLAS Collaboration 2019; Bocci et al. 2020) that can handle the current rate of collisions, they suffer at higher collision rates, as they scale worse than quadratically (e.g. \(\mathcal {O}(n^{6})\)) with the number of hits (Magano et al. 2021).

Recent developments in Quantum Computing (QC) have allowed scientists to look at computational problems from a new perspective. There is a great effort to make use of these new tools provided by QC to gain high speed-ups for many computational tasks in High Energy Physics (Guan et al. 2021). Many problems have been investigated. These include, but are not limited to, physics analysis at the LHC using kernel (Wu et al. 2021a; Heredge et al. 2021) and variational methods (Wu et al. 2021b; Terashi et al. 2021), simulating parton showers (Jang et al. 2021) and imitating calorimeter outputs using Quantum Generative Adversarial Networks (Chang et al. 2021).

Researchers have been investigating QC tools for a computational advantage in the particle track reconstruction problem, since it also suffers from scaling. While there are several attempts using adiabatic QC (Bapst et al. 2019; Zlokapa et al. 2019), Quantum Associative Memory (Shapoval and Calafiura 2019) or quantum search routines (Magano et al. 2021), this work focuses on hybrid variational methods.

In this work, we aim to give a complete overview of our developments, in which we investigated the use of a hybrid quantum-classical graph neural network (QGNN) approach to solve the particle track reconstruction problem (Tüysüz et al. 2020a, 2020b, 2020c), trained on the publicly available TrackML Challenge dataset (Amrouche et al. 2019, 2021). We present an analysis of several well-performing Quantum Circuits and give a comparison with the classical equivalent, HEP.TrkX (Farrell et al. 2018), on which our approach is based.

The rest of the paper is organized as follows. Details of the dataset and pre-processing methods are given in Section 2. The QGNN model is explained in detail in Section 3. Results and comparisons with novel methods are given in Section 4, along with a discussion of the findings. Finally, our summary and comments on possible improvements are presented in Section 5.

2 The dataset and pre-processing

The publicly available TrackML Challenge dataset provides 10,000 events to emulate the HL-LHC conditions (Amrouche et al. 2019). It has become a benchmark dataset for researchers after the conclusion of the challenge (Amrouche et al. 2021) and allows comparisons across different methods. The simulated tracking detector geometry of the dataset is that of a general-purpose collider experiment. The schema of this geometry in cylindrical coordinates (r, z) can be seen in Fig. 2. Horizontal layers in the center of the detector represent a barrel-shaped geometry, while vertical layers represent a disk-shaped geometry and are generally referred to as end-cap layers.

Fig. 2 TrackML detector geometry projected to 2 dimensions (r, z). The region highlighted in orange indicates the detector layers used in this work. Drawing is adapted from Amrouche et al. (2019)

In many Quantum Machine Learning applications, it is very hard to work with large datasets due to restrictions on simulation times. A pre-processing step is therefore necessary, which reduces the number of samples and prepares the data format for the model.

The pre-processing procedure starts by selecting the first 100 events of the dataset. Although it would be ideal to use all events of the dataset, computation time restrictions of the QC simulation limited us to a portion of it. Then, particle hits are restricted to the barrel region of the detector, which is the region highlighted in Fig. 2. This limits the number of tracks and the ambiguity in identifying the particle trajectories. In addition, a cut on the transverse momentum of the particles is applied to further reduce the number of tracks.

After reducing the number of tracks to reasonable numbers, the next step is to create graphs out of the remaining particle hits. Particle hits become nodes, and track segment candidates are defined as the edges of the graph at this stage. Then, a set of restrictions is applied to all possible graph edges in order to create a graph with as few fake edges as possible, while preserving as many true edges as possible.

These restrictions are defined using a cylindrical coordinate system, which is widely used in High Energy Physics to leverage the symmetries of the detectors. We follow the same convention and present some of the definitions visually in Fig. 3 for further clarification.

Fig. 3 A sketch of the cylindrical coordinate system for particle collisions. The beam is along the z-axis and the particles collide near z = 0. The r axis is the radial coordinate in the transverse (x-y) plane

As the first step of the hit graph construction, only edges that connect nodes on consecutive detector layers are considered. Then, edges with a pseudorapidity (\(\eta = -\ln (\tan (\theta / 2))\), where θ is the polar angle to the z-axis) larger than 5 are eliminated. Pseudorapidity is a measure of the angle to the z-axis used in High Energy Physics.

Next, the ratio of the difference in ϕ to the difference in r (Δϕ/Δr), where ϕ is the azimuthal angle in the transverse plane, is required to be smaller than 6 × 10−4. Finally, the z intercept (z0) of each edge is required to be smaller than 200 mm to eliminate highly oblique edges.

Pseudo-code of the algorithm is presented in Algorithm 1. Detailed plots of these selections are presented in Appendix A.

Algorithm 1 Graph construction procedure
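Algorithm 1 itself is given as a figure and cannot be reproduced here; the following minimal Python sketch illustrates the edge selection it describes. All names (select_edges, r, phi, z, layer) are our own illustrative choices, and the default cut values are the ones quoted above; this is a sketch, not the original implementation.

```python
import numpy as np

def select_edges(r, phi, z, layer, eta_max=5.0, dphi_dr_max=6e-4, z0_max=200.0):
    """Illustrative edge selection: pair hits on consecutive barrel layers
    and keep edges passing the eta, dphi/dr and z0 cuts described in the text."""
    edges = []
    layers = np.unique(layer)
    for l1, l2 in zip(layers[:-1], layers[1:]):          # consecutive layers only
        for i in np.where(layer == l1)[0]:
            for j in np.where(layer == l2)[0]:
                dr, dz = r[j] - r[i], z[j] - z[i]
                # wrap the azimuthal difference into [-pi, pi]
                dphi = np.arctan2(np.sin(phi[j] - phi[i]), np.cos(phi[j] - phi[i]))
                theta = np.arctan2(dr, dz)               # polar angle of the edge
                eta = -np.log(np.tan(theta / 2.0))       # pseudorapidity of the edge
                z0 = z[i] - r[i] * dz / dr               # z intercept at r = 0
                if (abs(eta) < eta_max and abs(dphi / dr) < dphi_dr_max
                        and abs(z0) < z0_max):
                    edges.append((i, j))
    return edges
```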

In total, 100 graphs from 100 events are obtained with this method. The graph production achieves an efficiency of 99% and a purity of 51%, which are defined as:

$$ \text{Efficiency} = \frac{\text{\# of selected true track segments}}{\text{\# of initial track segments}}, $$
(1)
$$ \text{Purity} = \frac{\text{\# of selected true track segments}}{\text{\# of selected track segments}}. $$
(2)

After the graph construction, the dataset is stored in the form of 4 matrices: \(X \in \mathbb{R}^{N_{V} \times 3}\) stores the 3 spatial coordinates of all nodes (in cylindrical coordinates; r,ϕ,z), Ri and Ro (\(R_{i}, R_{o} \in \{0,1\}^{N_{V} \times N_{E}}\)) store the input and output nodes of all edges, and \(y \in \{0,1\}^{N_{E}}\) stores the labels of the edges, where \(N_{V}\) and \(N_{E}\) are the numbers of nodes and edges. Their definitions can be seen below.

$$ R_{i}^{jk} = \begin{cases} 1, & \text{if the } k^{th} \text{ edge is an input of the } j^{th} \text{ node} \\ 0, & \text{otherwise} \end{cases} $$
(3)
$$ R_{o}^{jk} = \begin{cases} 1, & \text{if the } k^{th} \text{ edge is an output of the } j^{th} \text{ node} \\ 0, & \text{otherwise} \end{cases} $$
(4)
$$ y_{k} = \begin{cases} 1, & \text{if the nodes of the } k^{th} \text{ edge belong to the same particle} \\ 0, & \text{otherwise} \end{cases} $$
(5)
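As an illustration of these definitions, a minimal NumPy sketch that builds the two incidence matrices and the labels from an edge list follows; the function name and the particle_id array are hypothetical conveniences, not part of the released code.

```python
import numpy as np

def build_graph_matrices(num_nodes, edges, particle_id):
    """Build R_i, R_o (Eqs. 3 and 4) and the labels y (Eq. 5).

    edges: list of (j_in, j_out) node-index pairs;
    particle_id: array mapping node index -> particle id (illustrative).
    """
    R_i = np.zeros((num_nodes, len(edges)), dtype=np.float32)
    R_o = np.zeros((num_nodes, len(edges)), dtype=np.float32)
    y = np.zeros(len(edges), dtype=np.float32)
    for k, (j_in, j_out) in enumerate(edges):
        R_i[j_in, k] = 1.0   # the k-th edge is an input of node j_in
        R_o[j_out, k] = 1.0  # the k-th edge is an output of node j_out
        y[k] = float(particle_id[j_in] == particle_id[j_out])
    return R_i, R_o, y
```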

The constructed graphs have 8784 ± 1877 edges (NE) and 5583 ± 804 nodes (NV) on average. An example graph showing fake and true edges is presented in Fig. 4.

Fig. 4 Graphs produced by the pre-processing of an event. A 2D projection of the hits, fake and true edges of an event is plotted. All hits are plotted with black circles. Fake (left) and true (right) edges of a graph are plotted in Cartesian coordinates (transverse plane). There are 5162 true and 5508 fake edges in this event

The pre-processing method is identical to the one used in the HEP.TrkX project (Farrell et al. 2018), except for the pT restriction, which is used to reduce the total number of particles in an event. This is done intentionally in order to compare our results with the classical equivalent model.

3 The hybrid quantum-classical graph neural network model

A graph neural network (GNN) is a Neural Network model that acts on features of a graph, such as nodes, edges or global features (Veličković et al. 2018). GNNs have shown great success on many occasions for node and graph classification and link prediction (Wu et al. 2021c). Their success led to applications in High Energy Physics for many problems, such as track and particle flow reconstruction (Farrell et al. 2018; Ju et al. 2020; Shlomi et al. 2021; Biscarat et al. 2021; Pata et al. 2021). This attracted the interest of the Quantum Machine Learning community to develop quantum graph neural networks for different applications (Verdon et al. 2019; Chen et al. 2021).

The hybrid quantum-classical graph neural network (QGNN) model that we propose takes a graph as input and returns a probability for each edge of the initial graph. The QGNN builds on the attention-passing graph neural network model proposed by Veličković et al. (2018), following the same strategy as the HEP.TrkX project of Farrell et al. (2018). In contrast to the classical GNN approach, we add a Quantum Neural Network (QNN) layer to the Multi Layer Perceptrons (MLPs).

The QGNN consists of 3 parts. The first one is the Input Network, whose task is to increase the dimension of the input data. It takes the spatial coordinate information (i.e. the 3 cylindrical coordinates) and passes it through a single fully connected Neural Network layer with sigmoid activation and an output size corresponding to the hidden dimension size (ND). Then, these new data points are concatenated (⊕) with the input to form the initial node feature vector, where \(v \in \mathbb{R}^{N_{V} \times (N_{D}+3)}\).

$$ v = x \oplus \phi_{FC}(x) $$
(6)

As a next step, the node feature vector is fed to the Edge and Node Networks, which process the graph iteratively in order to obtain a final edge probability value (e) for each edge. During this process, the same Edge and Node Networks are executed sequentially for a predetermined number of iterations (NI), and finally the same Edge Network is used one more time to obtain the final edge probabilities (\(e \in [0,1]^{N_{E}} \)). This pipeline is summarized with a simple drawing in Fig. 5.

Fig. 5 Schematic of the QGNN architecture. The pre-processed graph is fed to an Input Network, which increases the dimension of the node features. Then, the graph’s features are updated with the Edge and Node Networks iteratively, for the number of iterations (NI). Finally, the same Edge Network is used one more time to extract the edge features of the graph that predict the track segments. There is only one Edge Network in the pipeline; two Edge Networks are drawn only for visual purposes. The pipeline is adapted from Farrell et al. (2018)

3.1 The Edge Network

The Edge Network takes pairs of nodes into account and returns the probability for those two nodes to be connected. Initially, the connectivity of each pair of nodes is given by the connectivity matrices Ri and Ro. Using these matrices, the node feature vectors bo and bi of all initially connected edges, the so-called doublets (\(b_{o}, b_{i}\)), are obtained.

$$ b_{o}^{k} = \sum\limits_{j=1}^{N_{V}} R_{o}^{jk} v_{j}, \qquad b_{i}^{k} = \sum\limits_{j=1}^{N_{V}} R_{i}^{jk} v_{j} $$
(7)

The feature vectors of the input and output nodes of each edge are concatenated in order to be fed into a Hybrid Neural Network (HNN, ϕEdgeNetwork). The HNN returns the edge features (e), which are the probabilities for each edge to be part of a real trajectory. Next, the edge features are passed to the Node Network.

$$ e_{k} = \phi_{EdgeNetwork}\left( b_{o}^{k} \oplus b_{i}^{k}\right) $$
(8)
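Eqs. 7 and 8 reduce to two matrix products and a concatenation. The following is a minimal NumPy sketch under illustrative naming conventions; phi_edge stands in for the HNN described in Section 3.3 and is not part of the original code.

```python
import numpy as np

def edge_network(v, R_i, R_o, phi_edge):
    """Eqs. 7-8: gather doublet features and score each edge.

    v: (N_V, d) node features; R_i, R_o: (N_V, N_E) incidence matrices;
    phi_edge: callable HNN returning one probability per edge (illustrative).
    """
    b_o = R_o.T @ v                   # (N_E, d) output-node features (Eq. 7)
    b_i = R_i.T @ v                   # (N_E, d) input-node features  (Eq. 7)
    doublets = np.concatenate([b_o, b_i], axis=1)
    return phi_edge(doublets)         # e in [0,1]^{N_E}              (Eq. 8)
```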

3.2 The Node Network

The Node Network builds on the edge feature matrix given by its predecessor, the Edge Network. Based on this input information, the node features are updated. In this case, a combination of each node of interest and its neighbors from the upper and lower detector layers is created, forming a triplet. Here, the node features of the neighbors are scaled with the corresponding edge features.

$$ v^{\prime}_{j,input} = \sum\limits_{k=1}^{N_{E}} e_{k} R_{i}^{jk} b_{o}^{k}, \qquad v^{\prime}_{j,output} = \sum\limits_{k=1}^{N_{E}} e_{k} R_{o}^{jk} b_{i}^{k} $$
(9)

Similar to the Edge Network, the triplet is fed to a Hybrid Neural Network (ϕNodeNetwork).

$$ v_{j} := \phi_{NodeNetwork}\left( v^{\prime}_{j,input} \oplus v^{\prime}_{j,output} \oplus v_{j}\right) $$
(10)

This time, the HNN returns the new node features v. The updated features are passed again to the Edge Network, and this process is repeated NI times. This allows the aggregation of information from farther nodes of the graph and updates the hidden features accordingly.
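A matching sketch of the Node Network update (Eqs. 9 and 10), under the same illustrative conventions as the Edge Network sketch above; repeating edge_network and node_network NI times and applying edge_network once more reproduces the pipeline of Fig. 5.

```python
import numpy as np

def node_network(v, e, R_i, R_o, phi_node):
    """Eqs. 9-10: aggregate edge-weighted neighbour features, update nodes."""
    b_o = R_o.T @ v                        # output-node features per edge
    b_i = R_i.T @ v                        # input-node features per edge
    v_in = R_i @ (e[:, None] * b_o)        # (N_V, d) weighted sum   (Eq. 9)
    v_out = R_o @ (e[:, None] * b_i)       # (N_V, d) weighted sum   (Eq. 9)
    triplets = np.concatenate([v_in, v_out, v], axis=1)
    return phi_node(triplets)              # updated node features  (Eq. 10)
```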

3.3 The hybrid neural network

Our approach employs Hybrid Neural Networks (HNNs), which combine classical and quantum layers. The HNN starts with a single fully connected neural network (FC NN 1) layer with sigmoid activation. The output dimension of this layer is equal to the number of qubits (NQ) used by the quantum layer. Then, the output of FC NN 1 is used in the encoding step of the QNN. Finally, the measurements of the QNN are fed to another FC NN with sigmoid activation, which has an output dimension of 1 (in the case of the Edge Network) or the hidden dimension size (ND) (in the case of the Node Network). This architecture, as presented in Fig. 6, allows full flexibility in the hidden dimension size, the number of qubits and the type of the QNN. Details of the input and output dimensions of all layers can be seen in Table 1.

Fig. 6 The Hybrid Neural Network (HNN) architecture. The input is first fed into a classical fully connected Neural Network (FC NN) layer with sigmoid activation. Then, its output is encoded in the QNN with the information encoding circuit (IEC). Next, the parametrized quantum circuit (PQC) applies transformations to the encoded states. The output of the QNN is obtained as expectation values of each measured qubit. A final FC NN layer with sigmoid activation is used to combine the results of the different qubit measurements. The same HNN architecture is used in the Edge (upper input and output dimensions) and Node Networks (lower input and output dimensions) with different parameters. The input and output dimension sizes change according to the network type. Details of the dimensions of each layer are given in Table 1

Table 1 Input and output dimensions of the layers used in the HNN. The QNN has an output dimension of 1 if the circuit measures only one qubit (e.g. MPS and TTN)

The QGNN model is tested with different quantum layers to understand their potential benefits. These types of quantum models with parametrized quantum circuits have been given different names in the literature (McClean et al. 2018; Farhi and Neven 2018; Benedetti et al. 2019; Mitarai et al. 2018; McClean et al. 2016; Romero and Aspuru-Guzik 2021). Here, we use the name Quantum Neural Network (QNN), as we use them in a similar fashion to Neural Network layers.

The QNN of our choice consists of three consecutive parts. An information encoding circuit (IEC) encodes the classical data into the states of the qubits, followed by a parametrized quantum circuit (PQC) that is applied to transform these states to their optimal location in the Hilbert Space. Finally, measurements are performed along the z-axis with the σz operator.

Information encoding has a significant effect on the training capacity of QNN models (Schuld et al. 2021); therefore, a lot of attention is required when deciding on how to do it. We employ angle encoding, because it uses significantly fewer gates compared to other encodings, e.g. amplitude encoding, and it needs almost no classical processing (Larose and Coyle 2020).

Encodings such as amplitude encoding allow encoding classical information using significantly fewer qubits, but this advantage is usually reverted by the number of gates required to build the circuit. For example, the amplitude encoding of an n-dimensional feature vector uses only log2 n qubits, but needs 4n single- and two-qubit gates. On the other hand, angle encoding requires n qubits and a single one-qubit gate per qubit (Leymann and Barzen 2020). This allows an easier implementation and experimentation with angle encoding. An angle encoding of a four-dimensional feature vector can be performed, e.g. with the circuit given in Fig. 7.

Fig. 7 Angle encoding quantum circuit of a four-dimensional feature vector with respect to the y-axis

The QNN encodes the incoming classical information on the qubits via rotational gates about the desired axis using angle encoding. In order to obtain a unique and bijective representation of the classical data, the rotation angle is mapped to 𝜃 ∈ [0,π] due to the periodicity of the cosine function. This is relevant since the expectation value is taken with respect to the σz operator at the end of the circuit execution.
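A minimal Cirq sketch of this encoding step (our own illustrative code, assuming features are already normalized to [0,1] by the preceding sigmoid layer): each feature x is rescaled to θ = πx and applied as an RY rotation, matching Fig. 7.

```python
import numpy as np
import cirq

def angle_encode(features):
    """Angle-encode a feature vector (values assumed in [0, 1]) with RY gates."""
    qubits = cirq.LineQubit.range(len(features))
    circuit = cirq.Circuit()
    for q, x in zip(qubits, features):
        circuit.append(cirq.ry(np.pi * x)(q))  # map x -> theta in [0, pi]
    return circuit, qubits

circuit, qubits = angle_encode([0.1, 0.4, 0.7, 0.9])
print(circuit)
```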

The PQC is the part of the QNN model that is tuned in order to provide the desired output. As in classical Neural Network layers, its initially randomly assigned parameters are optimized during training to fit the training objective, i.e. to minimize the overall loss function. In order to achieve a good training performance, choosing a good combination of IEC and PQC is essential. Although there is a lot of practical and theoretical work towards understanding this better (Sim et al. 2019; Leyton-Ortega et al. 2021; Hubregtsen et al. 2021), our current understanding of which combination works for which task is still limited (Schuld et al. 2021). We therefore try to cover a range of PQCs and fix the IEC to a specific angle encoding to provide more controlled results. We consider two types of PQCs.

The first PQC type consists of circuits with a hierarchical architecture. Matrix Product State (MPS) (Bhatia et al. 2019) and Tree Tensor Network (TTN) (Grant et al. 2018) inspired circuits belong to this group. However, these PQCs measure only one qubit. Thus, they are only implemented in the case of the Edge Network, as a multi-dimensional output is needed for the Node Network. Examples of MPS and TTN circuits can be seen in Fig. 8.

Fig. 8 Hierarchical (a, b) and layered (c, d) PQCs used in the HNN. (a) MPS applies two-qubit gates in a ladder-like architecture, (b) while TTN uses a tree-like architecture to implement RY and CZ gates. MPS and TTN circuits measure only one qubit at the end. (c) Circuit 19 employs RX, RZ and CRX gates in a nearest-neighbour fashion, (d) while Circuit 10 does this with RY and CZ gates. Circuit 10 and Circuit 19 can be extended to any number of layers by repeating the circuit. Circuit 10 and Circuit 19 measure all available qubits

The second type of PQC is more common in the QML literature. It consists of layers of parametrized gates that are generally followed by controlled operations. These circuits act on all qubits fairly, meaning all qubits can be measured to obtain information. This makes them suitable for both the Edge and Node Networks. Another important advantage of this type of PQC is the availability of descriptors in the literature, which allows comparing different properties such as expressibility and entanglement capability. In this work, two circuits with different expressibility and entanglement capability were chosen from Sim et al. (2019), namely Circuit 10 (Fig. 8d) and Circuit 19 (Fig. 8c).
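For concreteness, here is a hedged Cirq sketch of one layer of a Circuit-10-style PQC, as we read its description above (trainable RY rotations followed by nearest-neighbour CZ gates); the sympy parameter naming is an illustrative convention, not the original code, and the exact gate pattern of Sim et al.'s Circuit 10 should be taken from that reference.

```python
import sympy
import cirq

def circuit10_layer(qubits, layer_idx):
    """One layer of a Circuit-10-style PQC: trainable RY rotations
    followed by nearest-neighbour CZ entanglers (illustrative sketch)."""
    ops = []
    for i, q in enumerate(qubits):
        theta = sympy.Symbol(f"theta_{layer_idx}_{i}")  # trainable parameter
        ops.append(cirq.ry(theta)(q))
    for q0, q1 in zip(qubits, qubits[1:]):              # nearest-neighbour CZ
        ops.append(cirq.CZ(q0, q1))
    return cirq.Circuit(ops)

qubits = cirq.LineQubit.range(4)
pqc = circuit10_layer(qubits, layer_idx=0)
```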

This work compares two different configurations of PQCs. Models with the labels circuit 10 and circuit 19 use the same circuit, with different initial parameters, for the Edge and Node Networks. Models with the TTN-10 and MPS-10 labels use Circuit 10 for the Node Network and either a TTN or an MPS type of PQC for the Edge Network. These definitions are also presented in Table 2.

Table 2 Labels of PQC settings used in the HNN

Expressibility measures a PQC’s ability to explore the Hilbert Space (Sim et al. 2019). It is a numerical method that samples pairs of random states from a given PQC. The fidelities of these state pairs are computed, and a distribution \((\hat {P}_{PQC}(F ; \theta ))\) is obtained after collecting many samples (e.g. 5000 samples for four qubits). Then, this process is repeated by sampling Haar-random states. Finally, the two distributions are compared using the Kullback-Leibler divergence (DKL). Expressibility (E) is expressed as;

$$ \mathrm{E} = D_{KL} (\hat{P}_{PQC}(F ; \theta) \parallel P_{Haar}(F)) $$
(11)

The value of Expressibility is lower for more expressive circuits. In order to avoid confusion, E′ = −log10(E) will be used as Expressibility. Hence, the Expressibility value (E′) increases with more expressive circuits (Hubregtsen et al. 2021).
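A hedged sketch of this estimate (illustrative code; it uses the known analytic Haar fidelity density P_Haar(F) = (2^n − 1)(1 − F)^{2^n − 2} and discretizes the KL divergence over a histogram):

```python
import numpy as np
import cirq

def expressibility(pqc_fn, n_params, n_qubits, n_pairs=5000, n_bins=75, seed=0):
    """Estimate E (Eq. 11): KL divergence between the PQC fidelity
    distribution and the Haar distribution (illustrative sketch).

    pqc_fn: callable mapping a parameter vector to a cirq.Circuit.
    """
    rng = np.random.default_rng(seed)
    sim = cirq.Simulator()
    fids = []
    for _ in range(n_pairs):
        psi = [sim.simulate(pqc_fn(rng.uniform(0, 2 * np.pi, n_params)))
                  .final_state_vector for _ in range(2)]
        fids.append(abs(np.vdot(psi[0], psi[1])) ** 2)
    hist, edges = np.histogram(fids, bins=n_bins, range=(0, 1), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    dim = 2 ** n_qubits
    p_haar = (dim - 1) * (1 - centers) ** (dim - 2)   # Haar fidelity density
    mask = (hist > 0) & (p_haar > 0)
    # discrete KL over histogram bins (bin width = 1 / n_bins)
    return np.sum(hist[mask] / n_bins * np.log(hist[mask] / p_haar[mask]))
```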

Similarly, Entanglement Capability is a numerical method that quantifies a PQC’s ability to produce entangled states (Sim et al. 2019). It averages the Meyer-Wallach entanglement measure (Q) over many random samples obtained from the PQC (e.g. 5000 samples for four qubits). For example, a fully entangled two-qubit state \(\left (|{{\varPsi }}\rangle = \frac {|{00}\rangle +|{11}\rangle }{\sqrt {2}} \right )\) has Q(|Ψ〉) = 1, and a state with no entanglement (e.g. |Φ〉 = |01〉) has Q(|Φ〉) = 0. Entanglement Capability (Ent) is expressed as;

$$ Ent = \frac{1}{|S|} \sum\limits_{\theta_{i} \in S} Q(|\psi_{\theta_{i}}\rangle) $$
(12)

Expressibility and Entanglement Capability are descriptors that allow comparing layered PQCs. Both improve with more layers and will be a point of interest in our discussions. For example, Circuit 10 has lower Expressibility and Entanglement Capability than Circuit 19 with the same number of qubits and layers. This is because Circuit 10 has only RY and CZ gates, while the additional RZ and CRX gates of Circuit 19 bring additional degrees of freedom. The differences between these circuits will be another point of interest in our comparisons to better understand the behaviour of PQCs. The Expressibility and Entanglement Capability of Circuit 10 and Circuit 19 are presented in Appendix B.
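Q can be written in terms of single-qubit purities as Q = 2(1 − (1/n) Σk Tr ρk²); the following minimal NumPy sketch uses that form (illustrative code, reproducing the two examples above). Ent (Eq. 12) is then the average of Q over many randomly sampled PQC output states.

```python
import numpy as np

def meyer_wallach_q(state, n_qubits):
    """Meyer-Wallach measure: Q = 2 * (1 - mean_k Tr[rho_k^2])."""
    psi = state.reshape([2] * n_qubits)
    purities = []
    for k in range(n_qubits):
        # single-qubit reduced density matrix of qubit k
        psi_k = np.moveaxis(psi, k, 0).reshape(2, -1)
        rho_k = psi_k @ psi_k.conj().T
        purities.append(np.real(np.trace(rho_k @ rho_k)))
    return 2 * (1 - np.mean(purities))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>) / sqrt(2)
print(meyer_wallach_q(bell, 2))              # -> 1.0
prod = np.array([0, 1, 0, 0])                # |01>
print(meyer_wallach_q(prod, 2))              # -> 0.0
```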

The reason behind analyzing PQCs under two types is related to their response to scaling with an increasing number of qubits and layers. We use gradient-based optimizers due to the hybrid nature of the QGNN model. Gradient-based optimizers require the model to produce strong enough gradient signals to be able to explore the loss landscape. This can become a problem when a model is scaled. Barren Plateaus is the name given to the flattening of the loss landscape (McClean et al. 2018). They appear in models whose gradients vanish exponentially with increasing model size, and they are one of the greatest challenges in training variational quantum algorithms (VQAs) (Cerezo et al. 2021). In general, layered-type PQCs suffer from Barren Plateaus, which makes them hard or even impossible to train for a large number of qubits and layers. On the other hand, the absence of Barren Plateaus has been shown for some PQCs with hierarchical architectures, such as Quantum Convolutional Neural Networks (QCNNs) (Pesah et al. 2020) and TTNs (Zhao and Gao 2021). Because of this, comparing these two types of PQCs is of great importance to better understand the behaviour of hybrid models at large scales.

3.4 Training the network

Training hybrid quantum-classical neural networks requires software that can differentiate both types of networks. Pennylane by Bergholm et al. (2018) is one of the most popular open-source tools that provides this feature. Pennylane was used along with PyTorch (Paszke et al. 2019) during the early stages of this work. However, this combination turned out to be too slow to handle both the dataset and the model. A computational speed-up in training was achieved using Qulacs (Suzuki et al. 2020). Although it provided faster training, it was still not enough. Finally, the combination of Cirq, Tensorflow and Tensorflow Quantum (Cirq Developers 2021; Abadi et al. 2016; Broughton et al. 2020) produced the optimal scenario, in which we were able to reduce the training times to less than a week. The quantum circuit simulations take only analytical results into account, i.e. the quantum circuits are not sampled. Although analytical results do not reflect hardware conditions, we made this choice in order to obtain results in a reasonable amount of time.

The 100 events selected from the dataset are separated randomly with a 50/50 ratio into training and validation sets. Models are trained using the binary cross entropy loss function given in Eq. 13, where yi is the truth label and \(\hat {y}_{i}\) is the model prediction for an edge.

$$ L = -\frac{1}{N_{E}} \sum\limits_{i=1}^{N_{E}} \left[ y_{i} \log \hat{y}_{i} + (1-y_{i})\log (1-\hat{y}_{i}) \right] $$
(13)

The Adam optimizer (Kingma and Ba 2017) with a learning rate of 0.01 is used to train all trainable parameters of the hybrid model. The training is done with a batch size of 1 graph and continued for up to 10 or 20 epochs, depending on model complexity. All models are trained for 5 independent initializations, and their mean is presented in all results.
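A hedged TensorFlow sketch of this training loop (illustrative; model stands in for the full QGNN of Fig. 5, whose quantum layers are not reproduced here):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
bce = tf.keras.losses.BinaryCrossentropy()  # Eq. 13

def train_epoch(model, graphs):
    """One epoch over the training set with a batch size of one graph."""
    for X, R_i, R_o, y in graphs:
        with tf.GradientTape() as tape:
            e = model([X, R_i, R_o])          # predicted edge probabilities
            loss = bce(y, e)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
```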

4 Results and discussion

We trained the hybrid model in many configurations to explore the potential of the method. Here, we present four key comparisons of features that have a significant effect on the model’s performance.

First, the effect of the angle embedding axis choice on the training performance of circuit 10 and circuit 19 is compared. Circuit 10 is a PQC consisting of RY and CZ gates, while circuit 19 is a PQC with RZ, RX and CRX gates. The comparison is made by setting the number of qubits and the hidden dimension size to 4 (NQ = ND = 4). Then, the number of layers is set to 1 (NL = 1) and the number of iterations to 3 (NI = 3). The best loss values of each model are plotted in Fig. 9a. In both cases, the x- and y-axis embeddings resulted in better loss values compared to the z-axis. The z-axis embedding requires deeper circuits to match the representation capacity of the other axes, since the measurements are taken with respect to the Pauli-Z operator. Because of this outcome, the y-axis angle embedding is used for the rest of the results. Training curves of these comparisons are presented in Appendix F.

Fig. 9 Best validation loss comparison with respect to different parameters of the hybrid GNN model. (a) The angle embedding axis comparison considers the best loss obtained for different embedding axes by setting ND = NQ = 4, NL = 1 and NI = 3. (b) The number of layers comparison considers the best loss for various numbers of layers (NL) by setting ND = NQ = 4 and NI = 3. (c) The number of iterations comparison considers the best loss for different numbers of iterations (NI) by setting ND = NQ = 4 and NL = 1. (d) The hidden dimension size comparison considers the loss for different hidden dimension sizes (ND) by setting NQ = ND, NL = 1 and NI = 3. 5 instances of each model with different initial parameters are trained for 10 or 20 epochs (depending on complexity) for each setting, and the mean of the best losses is presented. The error bars represent the ± standard deviation of the best losses of all 5 runs

There are contrasting results in the literature on how expressibility and entanglement capability affect training performance. Recently, Hubregtsen et al. (2021) showed a positive correlation between expressibility and accuracy, while Leyton-Ortega et al. (2021) showed the opposite. They found that more expressive models perform worse and also overfit more. On the other hand, entanglement has been shown to limit the trainability of models, depending on how it propagates between qubits, by Marrero et al. (2020) and Zhang et al. (2020). To better understand the situation in our case, we tested two models, with circuit 19 and circuit 10, with various numbers of layers. circuit 19 has better expressibility and entanglement capability than circuit 10 (Sim et al. 2019). The best loss values obtained after training for 20 epochs are plotted with respect to the number of layers in Fig. 9b, with NQ = ND = 4 and NI = 3. We could not observe a significant difference between the two models. This might be a result of using an encoder and decoder consisting of fully connected neural network layers in the model, which could have compensated for the different expressive capacities of the models. However, increasing the expressibility and entanglement capability of both models resulted in worse performance in both cases. In this regard, our results are consistent with those of Leyton-Ortega et al. (2021). This behaviour is thought to be the result of Barren Plateau formation (McClean et al. 2018). Training curves of these comparisons are presented in Appendix D.

The number of iterations of a GNN is an important parameter that determines the model’s performance (Farrell et al. 2018; Veličković et al. 2018). It allows the propagation of information to farther nodes. A comparison with NQ = ND = 4 and NL = 1 is made with circuit 10 and circuit 19, and the results are presented in Fig. 9c. The training results show that the best loss is obtained for NI = 3 in the hybrid cases. However, this does not hold for the classical case: Ju et al. (2020) report NI = 8 as the optimal value for their model with 128 hidden dimensions in their extended project Exa.TrkX. The increase in the value of the lowest loss with an increasing number of iterations might be due to the low expressive capacity of the whole model, as this comparison is made only with ND = 4. Training curves of these comparisons are presented in Appendix E.

In order to investigate how these hybrid models scale, their performance with respect to increasing hidden dimension size and number of qubits is compared. This comparison is made with the choice of NQ = ND, NI = 3 and NL = 1, for the four PQC configurations defined in Table 2 (circuit 10, circuit 19, TTN-10 and MPS-10).

A comparison of the best losses after 10 epochs is presented in Fig. 9d. The performance of the models improves consistently with increasing hidden dimension size. This shows that the learning capacity of the model benefits from more dimensions. The model with circuit 10 consistently outperforms the rest. However, the best loss appears to saturate as the hidden dimension size increases. Training curves of these comparisons are presented in Appendix C.

Finally, the hybrid model is compared against the classical model at different hidden dimension sizes in Fig. 10. For this comparison, the same choice of NQ = ND, NI = 3 and NL = 1 is followed. The result shows that the hybrid model scales similarly to the classical model up to a certain hidden dimension size. We did not perform simulations with more than 16 qubits, due to restrictions set by simulation times and classical hardware resources.

Fig. 10 Best validation loss comparison with respect to different hidden dimension sizes (ND) of the hybrid and classical GNN models. The comparison is made with the choice of NQ = ND, NI = 3 and NL = 1. 5 instances of each model with different initial parameters are trained for 10 epochs, and the mean of the best losses is presented. The error bars represent the ± standard deviation of the best losses of all 5 runs

5 Conclusion

In this work, we implemented a hybrid quantum-classical GNN (QGNN) model for particle track reconstruction using the TrackML dataset (Amrouche et al. 2019). To the best of our knowledge, this is the first end-to-end implementation of a hybrid quantum-classical GNN model with attention passing. We showed that the model can perform similarly to classical approaches for small hidden dimension sizes. We investigated how the model scales for different hyper-parameters. circuit 10 consistently performed best in all comparisons, even though it has the worst expressibility and is the least entangling model in a single-layer configuration. Numerical results indicate that larger PQC models are harder to train, as has been shown in many instances (McClean et al. 2018; Leyton-Ortega et al. 2021).

The current status of quantum hardware restricted us to simulations of Quantum Circuits. This was mainly due to the thousands of circuit executions required by the model. Because of the high pile-up conditions of the TrackML dataset, the graphs have thousands of nodes and edges, and therefore using hardware is a challenge for this approach. A forward pass of the presented QGNN model builds NI + 1 circuits for each edge and NI circuits for each node. This also limited us in experimenting with larger models, due to restrictions in simulating Quantum Circuits. In order to cope with this problem, we enforced a pT cut to reduce the number of particles and used a small number of qubits (up to 16). We also used analytical results with no noise and trained models for at most 20 epochs. Very large RAM requirements and the significant increase of training times, to more than a week for models with 16 qubits, were the limiting factors for our results.

This work explores a potential advantage in reducing the size of high-dimensional NN layers with Quantum Circuits that have significantly fewer qubits. However, results obtained with NQ = ND were only able to match the performance of the classical model. On the other hand, this does not mean that an advantage is impossible. There is more to explore to better understand the potential of this approach.

First, the QGNN model was only tested with simple encoding circuits (angle encoding), while more sophisticated encodings are conjectured to significantly affect the performance of QNN-based models (Schuld et al. 2021). Furthermore, recent work by Abbas et al. (2021) showed that QNN models with four qubits and ZZ feature maps of depth two have a larger effective dimension than their classical equivalents, which leads to a better learning capacity. Therefore, a further study is needed to explore the potential benefits of different data encodings.

On top of that, this work does not explore any noise effects, which are considered one of the limiting factors of VQAs, as they can lead to Barren Plateaus (Wang et al. 2021). Hardware noise is conjectured to slow down the training of VQAs. However, this does not mean that noise always disfavors VQAs. It has been argued that hardware noise can help to explore the loss landscape (Cerezo et al. 2021). In many instances, VQAs were shown to have noise resilience (Sharma et al. 2020; Gentini et al. 2020) or to benefit from noise (Cao and Wang 2021; Campos et al. 2021). Furthermore, Mari et al. (2021) showed that gradients and higher-order derivatives can be accurately obtained under hardware and shot noise. These results from the literature indicate that understanding the effect of noise on variational models is essential to estimate their potential. In this work, experiments with noise were attempted but then abandoned due to technical limitations posed by the size of the dataset.

The QGNN model can be further improved by employing better training schemes (Leyton-Ortega et al. 2021) and noise-aware optimizers (Arrasmith et al. 2020). Further research directions include exploring more sophisticated data encodings and understanding the effect of noise. It would also be beneficial to work with a smaller dataset when exploring hybrid GNN models that target NISQ hardware.