Event Classification with Quantum Machine Learning in High-Energy Physics

We present studies of quantum algorithms exploiting machine learning to classify events of interest from background events, one of the most representative machine learning applications in high-energy physics. We focus on a variational quantum approach to learn the properties of input data and evaluate the performance of the event classification using both simulators and quantum computing devices. A comparison with standard multi-variate classification techniques based on a boosted-decision tree and a deep neural network running on classical computers shows that the quantum algorithm has comparable performance to the standard techniques in the considered ranges of the number of input variables and the size of training samples. The variational quantum algorithm is tested with quantum computers, demonstrating that the discrimination of interesting events from background is feasible. Characteristic behaviors observed during the learning process using quantum circuits with extended gate structures are discussed, as well as the implications of the current performance for applications in high-energy physics experiments.

in particle collisions occurred using high-energy accelerators. In high-energy physics (HEP) experiments, particles created by collisions are observed by layers of high-precision detectors surrounding collision points, producing a large amount of data. This large data volume, so-called big data, has motivated the use of machine learning (ML) techniques in many aspects of experiments, including triggering, event reconstruction, detector simulation, data-quality control and data analysis, to improve their performance. In addition, adopting relatively new techniques such as ML is expected to reduce the computational resources needed for specific tasks. This trend will continue over the next decades; for example, a next-generation proton-proton collider, the High-Luminosity Large Hadron Collider (LHC) [1,2] at CERN¹, is expected to deliver a few exabytes of data every year and requires huge computing resources for the data processing. Quantum computing (QC), on the other hand, has been evolving rapidly over the past years, with a promise of a significant speed-up or reduction of computational resources in certain tasks. Early attempts to use QC for HEP have been made, e.g., on data analysis [3,4], charged particle tracking [5,6] and vertexing [7], particle shower simulation [8,9] and jet clustering algorithms [10]. Techniques developed in HEP have also been adapted to QC, e.g., the unfolding techniques for physics measurements are applied to QC in Refs. [11,12]. Among these attempts, quantum machine learning (QML) is considered to be one of the QC algorithms that could bring quantum advantages over classical methods, as discussed in the literature, e.g., [13].
The most frequently used ML technique in HEP data analysis is the discrimination of events of interest, e.g., signal events originating from new physics beyond the Standard Model (SM) of particle physics, from background events. The ATLAS [14] and CMS [15] experiments at the LHC have adopted ML algorithms in various physics analyses, including, e.g., measurements of the properties of the Higgs boson [16] and searches for new particles such as those predicted by the theory of Supersymmetry (SUSY) [17]. In this paper, we investigate the application of QML techniques to the task of event classification in HEP data analysis. To our knowledge, the first attempt to utilize QC for HEP data analysis was made in Ref. [3] for the classification of the Higgs boson using quantum annealing [18].

¹ The European Organization for Nuclear Research, located in Geneva, Switzerland, https://www.cern.ch
We focus on QML algorithms developed for gate-based quantum computers, in particular algorithms based on variational quantum circuits [19]. In the variational circuit approach, the classical input data are encoded into quantum states, and a quantum computer is used to prepare and measure quantum states that vary with tunable parameters. Because the Hilbert space grows exponentially with the number of "quantum bits" (or qubits) in a quantum computer, whereas the representational ability of classical ML grows only linearly with the number of classical bits, QML is expected to have a far superior representational ability. This motivates the application of ML techniques to quantum computers, which could lead to an advantage over the classical approach. The optimization of the parameters is performed using a classical computer; the variational method is therefore considered suitable for present quantum computers, which have difficulty processing deep quantum circuits due to limited quantum coherence. In practice, the actual performance of the variational quantum algorithm depends on the implementation of the algorithm and the properties of the QC device. The primary aim of this paper is to demonstrate the feasibility of ML for event classification in HEP data analysis using gate-based quantum computers.
First, the variational quantum algorithms are described in Sect. 2, followed by the classical approaches that are used for the comparison. Section 3 discusses the experimental setup used in the study, including the dataset, software simulators and quantum computers. Results of the experiments are discussed in Sect. 4, followed by discussions of several observations about the performance of the quantum algorithms in Sect. 5. We conclude the studies in Sect. 6.

Variational Quantum Approaches
In this study we consider an approach based on a variational quantum circuit with tunable parameters [19]. The quantum circuit used in this algorithm is constructed, as shown in Fig. 1, from three components: 1) quantum gates to encode classical input data $x$ into quantum states (denoted as $U_{\rm in}(x)$), 2) quantum gates to produce output states used for supervised learning (denoted as $U(\theta)$) and 3) measurement gates to obtain output values from the circuit, which are subsequently compared with the corresponding input labels $y$. In this study the measurement is performed 1,024 times on each event to obtain the values of the observables, e.g., the expectation values $\langle Z \rangle$ of the Pauli-$Z$ operators. For the classification of events into two categories, the first two qubits are typically measured. The $U(\theta)$ gates used in 2) are parameterized such that they are optimized to model the input training data by iterating the computational processes of 1)-3) $N_{\rm iter}$ times and tuning the parameters $\theta$. The parameter tuning is performed using a classical computer by minimizing a cost function, which is defined such that the difference between the input labels $y$ and the measured values $\langle Z \rangle$ can be quantified. The optimized $U(\theta)$ circuit with the tuned parameters is used, with the same $U_{\rm in}(x)$ gates, to classify unseen data for testing. The $U_{\rm in}(x)$ and $U(\theta)$ are often built by applying the same set of quantum gates multiple times, and the numbers of repetitions are denoted by $N^{\rm depth}_{\rm in}$ and $N^{\rm depth}_{\rm var}$, respectively. In this study, we use two implementations of the variational quantum algorithms, called Quantum Circuit Learning (QCL) [20] and Variational Quantum Classification (VQC) [21]. The QCL is used for simulating the performance of the variational quantum algorithm. The VQC is used for testing the variational algorithm on real quantum computers and a simulator with small samples, as discussed in detail below.
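The encode, rotate, measure and tune loop described above can be illustrated with a deliberately small sketch. Everything in it beyond the 1,024 measurements per event is an assumption made for illustration: a single qubit, RY gates only, the mapping p(y=1) = (1 - <Z>)/2, the toy data, and a coarse scan over one angle standing in for the classical optimizer. It is not the QCL or VQC circuit used in this study.

```python
import math
import random

SHOTS = 1024  # number of measurements per event, as quoted in the text


def exact_z(x, theta):
    """Exact <Z> for the toy circuit RY(theta) RY(asin(x)) |0> (illustrative)."""
    return math.cos(math.asin(x) + theta)


def sampled_z(x, theta, rng, shots=SHOTS):
    """Estimate <Z> from a finite number of simulated measurements."""
    p0 = 0.5 * (1.0 + exact_z(x, theta))  # probability of outcome 0 (Z = +1)
    n0 = sum(1 for _ in range(shots) if rng.random() < p0)
    return (2.0 * n0 - shots) / shots


def cost(theta, xs, ys, rng):
    """Cross-entropy between labels y in {0, 1} and p(y=1) = (1 - <Z>) / 2.

    The mapping from <Z> to a probability is an assumption of this sketch.
    """
    eps = 1e-9
    total = 0.0
    for x, y in zip(xs, ys):
        p1 = 0.5 * (1.0 - sampled_z(x, theta, rng))
        p1 = min(max(p1, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p1) + (1 - y) * math.log(1.0 - p1)
    return total / len(xs)


rng = random.Random(0)
xs = [-0.8, -0.5, 0.4, 0.9]  # toy inputs, already scaled to [-1, 1]
ys = [0, 0, 1, 1]            # toy labels

# A coarse scan over theta stands in for the classical optimizer.
thetas = [k * math.pi / 16 for k in range(-16, 17)]
best_theta = min(thetas, key=lambda t: cost(t, xs, ys, rng))
print("best theta:", best_theta)
```

Because each cost evaluation uses finite statistics, the selected angle fluctuates slightly from run to run with a different seed, which is the same kind of statistical noise a shot-based optimizer has to tolerate.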

Quantum Circuit Learning
A QCL circuit used in this study for the 3-variable classification is shown in Fig. 2. The $U_{\rm in}(x)$ in QCL is characterized by a series of single-qubit rotation gates $R_Y$ and $R_Z$ [20]. The angles of the rotation gates are obtained from the input data $x$ as $\sin^{-1}(x)$ and $\cos^{-1}(x^2)$, respectively. The input data need to be normalized to the range $[-1, 1]$ by scaling linearly using the maximum and minimum values of the input variables. The $U(\theta)$ is constructed using a time-evolution gate, denoted as $e^{-iHt}$, with the Hamiltonian $H$ of an Ising model with random coefficients (for creating entanglement between qubits), followed by a series of $R_X$, $R_Z$ and $R_X$ gates with angles as parameters. The nominal $N^{\rm depth}_{\rm var}$ value is set to 3 after optimization studies. This results in 27 parameters in total for the 3-variable case. The structure of the 5- and 7-variable circuits is the same as in the 3-variable case, leading to 45 and 63 parameters in total, respectively. The cost function is defined using a cross-entropy function in the scikit-learn package [22], and the minimization of the cost function is performed using COBYLA. See [20] for more details about the implementation.

Variational Quantum Classification
Figure 3 shows a VQC circuit for the 3-variable classification used in this study. The $U_{\rm in}(x)$ consists of a set of Hadamard gates and rotation gates with angles from the input data $x$ (the latter is represented as $U_{\phi}(x)$ in the figure). The $U_{\phi}(x)$ is composed of single-qubit rotation gates using the $U_{\phi_{\{k\}}}(x)$ term written in Eq. (32) of the supplementary information of Ref. [21], also referred to as the "First Order Expansion" (FOE). The $U_{\rm in}(x)$ is not repeated in this study unless otherwise stated, thus $N^{\rm depth}_{\rm in} = 1$. The $U(\theta)$ part of the circuit is also taken from that in [21] but simplified by not repeating the set of entangling gates ($U_{\rm ent}$) and single-qubit rotation gates $R_Y$ and $R_Z$ (surrounded by the dashed box in Fig. 3). The $U_{\rm ent}$ is implemented using the Hadamard and CNOT gates, as in Fig. 3. The total number of $\theta$ parameters is 12 (20, 28) for the 3 (5, 7)-variable classification. The cost function for the VQC algorithm is a cross-entropy function and the minimization is performed using COBYLA as well.

Fig. 3 The $U_{\rm in}(x)$ and $U(\theta)$ circuits used in this study for the VQC algorithm.
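The input scaling, the QCL encoding angles and the quoted parameter totals can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the assumption of two $R_Y$/$R_Z$ rotation layers in the VQC $U(\theta)$ is inferred from the quoted totals rather than stated in the text.

```python
import math


def scale_to_unit_range(values):
    """Linearly map a list of raw values onto [-1, 1] using its min and max."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def qcl_encoding_angles(x):
    """Rotation angles for one scaled input x: (RY angle, RZ angle)."""
    return math.asin(x), math.acos(x * x)


def qcl_n_parameters(n_var, depth=3):
    """Three rotation angles (RX, RZ, RX) per qubit and per U(theta) layer."""
    return 3 * n_var * depth


def vqc_n_parameters(n_var, rotation_layers=2):
    """Two angles (RY, RZ) per qubit per rotation layer.

    rotation_layers=2 is an assumption chosen to reproduce the quoted totals.
    """
    return 2 * n_var * rotation_layers


for n_var in (3, 5, 7):
    print(n_var, qcl_n_parameters(n_var), vqc_n_parameters(n_var))
```

With one qubit per input variable, this reproduces 27, 45 and 63 parameters for the QCL and 12, 20 and 28 for the VQC.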

Classical Approaches
The ML application to the classification of events has been widely attempted in HEP data analyses. Among others, a Boosted Decision Tree (BDT) in the TMVA framework [23] is one of the most commonly used algorithms. A neural network (NN) is another class of multivariate analysis methods, and an algorithm with a deep neural network (DNN) has been proven to be powerful for modelling complex multi-dimensional problems. We use BDT and DNN as benchmark tools for comparison with the performance of the variational quantum algorithms.
In this study we use the TMVA package 4.2.1 for the BDT and Keras 2.1.6 with the TensorFlow 1.8.0 backend for the DNN. The BDT and DNN parameters used are summarized in Table 1. The maximum depth of the decision tree (MaxDepth) and the number of trees in the forest (NTrees) vary with the number of events used in the training ($N^{\rm train}_{\rm event}$) to avoid over-training. The DNN model is a fully-connected feed-forward network composed of 2-6 hidden layers with 16-256 nodes. The numbers of hidden layers and nodes are also optimized separately for each $N^{\rm train}_{\rm event}$ to avoid over-training.

Table 1 Parameter settings for the BDT and DNN used in this study. The definitions of the BDT parameters are documented in Ref. [23].

BDT Parameter    Value
BoostType        Grad
NTrees           10 (N train …

Experimental Setup
Our experimental test of the variational quantum algorithms is performed using both simulators of quantum computers and real quantum computers available via the IBM Q Network [24]. As a benchmark scenario for the HEP data analysis, we consider a problem of discriminating events with SUSY particles from the most representative background events.

Dataset
We use the "SUSY Data Set" available in the UC Irvine Machine Learning Repository [25], which was prepared for the studies of Ref. [26]. The signal process, labelled true, targets chargino-pair production via a Higgs boson. Each chargino decays into a neutralino that escapes detection and a W-boson that subsequently decays into a charged lepton and a neutrino, resulting in a final state with two charged leptons and missing transverse momentum. The background process, labelled false, is W-boson pair production (WW) with each W-boson decaying into a charged lepton and a neutrino. Therefore, both the signal and background processes have the same final state. Monte Carlo simulation is used to produce events of these processes as described in [26].
In our main studies only a small fraction of the data is used because processing the full dataset (5 million events) with the quantum algorithms requires significant computing resources. For the comparison of the quantum and classical MLs, five sets of data containing 100, 500, 1,000, 5,000 and 10,000 events are used for training, and another five sets with the same numbers of events are used for testing. For the classical MLs, four additional sets of data containing 50,000, 100,000, 200,000 and 500,000 events are used to study the dependence on the sample size.
The dataset contains 18 variables characterizing the properties of the SUSY signal and WW background events, ranging from low-level variables such as lepton transverse momenta to high-level variables such as those reflecting the kinematics of W-bosons and/or charginos (detailed in [26]). Figure 4 shows the normalized distributions of the 18 variables for the signal and background events. Among those, subsets of 3, 5 and 7 variables, referred to below as $N_{\rm var}$ = 3, 5 and 7, are considered in the main study. The choice of these variables is based on a ranking of the AUC (area under the ROC curve) values obtained using the DNN algorithm. In addition, all 18 variables are used to evaluate the best performance that the classical MLs can reach, as described below.

Simulator
We use quantum circuit simulators to evaluate the performance of the quantum algorithms. The QCL circuit is implemented using Qulacs 0.1.8 [27], a fast quantum circuit simulator implemented in C++, with Python 3.6.5 and gcc 7.3.0, and the performance is evaluated on cloud Linux servers managed by OpenStack at CERN.
The VQC circuit is implemented using Aqua 0.6.1 in Qiskit 0.14.0 [28], a quantum computing software development framework (referred to as the Qiskit Aqua framework). The VQC performance is evaluated using a QASM simulator on a local machine as well as on the real quantum computers described below.

Quantum Computer
We use the 20-qubit IBM Q Network quantum computers, called Johannesburg and Boeblingen, for evaluating the VQC performance. The quantum computers are accessed using the QuantumInstance class in the Qiskit Aqua framework. The $U_{\rm in}(x)$ part of the VQC circuit (Fig. 3) is created separately for each event because the $U_{\phi}(x)$ gates depend on the input data $x$. For the training and testing, we use 40 events each, composed of 20 signal and 20 background events. The $\theta$ parameters are determined by iterating the training process as explained in Sect. 2.1. The $N_{\rm iter}$ is set to 100 unless otherwise stated.

Qulacs Simulator
First, the classification performance of the QCL algorithm evaluated using the Qulacs simulator is compared with those of the BDT and DNN. Due to a significant increase of the computational resources with $N_{\rm var}$ for the QCL (discussed later), $N_{\rm var}$ is considered only up to 7. Figure 6 shows the AUC values in the testing phase as a function of $N^{\rm train}_{\rm event}$ for $N_{\rm var}$ = 3, 5 and 7. For each algorithm, a single AUC value is obtained from a test sample after each training, and the calculation is repeated 100 (30) times at $N^{\rm train}_{\rm event} \le 10{,}000$ ($50{,}000 \le N^{\rm train}_{\rm event} \le 500{,}000$). Shown in the figure is the average of the AUC values and its uncertainty. As expected, it is apparent from the BDT and DNN curves that the performance of these two algorithms improves rapidly with increasing $N^{\rm train}_{\rm event}$ and then flattens out. The BDT works well over the entire $N^{\rm train}_{\rm event}$ range, while the DNN performance appears to improve faster at very small $N^{\rm train}_{\rm event}$ and to exceed the BDT at $N^{\rm train}_{\rm event}$ beyond $\sim$1,000. In the case of $N_{\rm var} = 7$ and $N^{\rm train}_{\rm event} = 500{,}000$, the AUC values are 0.8729 ± 0.0003 for the DNN and 0.8696 ± 0.0006 for the BDT. When using all 18 variables with 2,000,000 events each for the training and testing, the average AUC value from only five trials is 0.8772 ± 0.0004 (0.8750 ± 0.0004) for the DNN (BDT).
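The averaging procedure described above, one AUC value per trained model followed by a mean and an uncertainty over repeated trials, can be sketched as follows. The pairwise-comparison formula for the AUC is standard; taking the uncertainty to be the standard error of the mean is an assumption of this sketch, and the function names and toy numbers are illustrative only.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve from pairwise signal/background comparisons.

    Equals the probability that a signal event (label 1) gets a higher
    classifier score than a background event (label 0); ties count as 1/2.
    """
    sig = [s for s, y in zip(scores, labels) if y == 1]
    bkg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for s in sig:
        for b in bkg:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sig) * len(bkg))


def mean_and_uncertainty(values):
    """Mean and standard error of the mean over repeated trials (assumed)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Toy example: one AUC per trial, then the average over the trials.
trial_aucs = [
    auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]),
    auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
]
print(mean_and_uncertainty(trial_aucs))
```

The double loop is quadratic in the sample size, which is acceptable for a sketch; a rank-based implementation would be preferable for the larger test samples.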
The performance of the QCL algorithm is characterized by relatively flat AUC values regardless of $N^{\rm train}_{\rm event}$. Increasing $N_{\rm var}$ appears to degrade the performance if $N^{\rm train}_{\rm event}$ is fixed, and the same behavior is also seen for the DNN with $N^{\rm train}_{\rm event} \le 500$ (it is not clearly visible for the BDT). The DNN algorithm overcomes this and eventually improves the performance with increasing $N_{\rm var}$ by using more data. Investigating how the QCL algorithm behaves with more data is a subject for future study, as discussed below. Nevertheless, for the $N_{\rm var}$ and $N^{\rm train}_{\rm event}$ ($\le$ 10,000) ranges considered all the three al-

Quantum Computer and QASM Simulator
The VQC algorithm with $N_{\rm var} = 3$ has been tested on the 20-qubit IBM Q Network quantum computers and the QASM simulator, as explained in Sect. 3.3. The present study focuses only on the classification accuracy with the real quantum computer. Figure 7 shows the values of the cost function as a function of $N_{\rm iter}$ for both the quantum computer and the simulator in the training phase. For each of the quantum computer and the simulator, the training is repeated five times over the same set of events and the cost-function values of each trial are shown. When running the algorithm on the quantum computer, the first three hardware qubits [0, 1, 2] are used [29]. The figure shows that both the quantum computer and the simulator reach the minimum values of the cost function after about 50 iterations. However, the cost values for the quantum computer are consistently higher and fluctuate more after reaching the minimum values, indicating that there are contributions from errors due to hardware noise.
The ROC curves for the quantum computer and the simulator obtained from the training and testing samples are shown in Fig. 8, averaged over the five trials of the training or testing. The AUC values for the testing samples are considerably worse than those for the training ones because of the small sample sizes. This has been checked by increasing $N^{\rm train}_{\rm event}$ from 40 to 70, 100, 200, 500 and 1,000 for the simulator (Table 2). As seen in the table, the over-training largely disappears as the sample sizes increase. Figure 9 shows the ROC curves from the simulator for the two sample sizes of $N^{\rm train}_{\rm event}$ = 40 and 1,000, confirming that the over-training is not significant for the latter. The AUC values are consistent between the quantum computer and the simulator within the standard deviation (Fig. 8), but the simulator results are considered to be systematically better because the input samples are identical. The VQC results are also compared with the QCL operating under the same conditions, i.e., $N_{\rm var} = 3$, $N^{\rm train}_{\rm event} = 40$ and $N_{\rm iter} = 100$. The QCL results vary with the depth of the $U(\theta)$ circuit (the nominal $N^{\rm depth}_{\rm var}$ is 3), but they agree with the VQC results within relatively large uncertainties. The AUC values and their standard deviations in the training phase are summarized in Table 3.

Performance with different QCL models
As seen in Fig. 6, the QCL performance stays approximately flat in $N^{\rm train}_{\rm event}$ and gets slightly worse when increasing $N_{\rm var}$ at fixed $N^{\rm train}_{\rm event}$. Since the computational resources needed to explore the QCL model with more variables ($N_{\rm var} \gtrsim 10$) or larger sample sizes ($N^{\rm train}_{\rm event} > 10{,}000$) are beyond our capacity, as discussed below, understanding the behavior is a subject for future study.
To investigate the possibility that the QCL performance could be limited by insufficient flexibility of the circuit used (Fig. 2), alternative QCL models with the $U(\theta)$ circuit of $N^{\rm depth}_{\rm var}$ = 5 or 7, instead of 3, are tested. This changes the AUC values by 1-2% at most for $N^{\rm train}_{\rm event}$ of 100 or 1,000 events, which is negligible compared to the statistical fluctuation. Another type of QCL circuit is also considered by modifying the $U_{\rm in}(x)$ to include 2-qubit gates for creating entanglement, as shown in Fig. 10 ... Fig. 2. On the other hand, the new $U_{\rm in}(x)$ appears to improve the performance by 5-10% with respect to the original $U_{\rm in}(x)$ when $N^{\rm depth}_{\rm var}$ is set to 1. This indicates that a more complex structure in the $U_{\rm in}(x)$ could help improve the performance when the $U(\theta)$ is simplified. However, the performance of the new $U_{\rm in}(x)$ with $N^{\rm depth}_{\rm var} = 1$ is still considerably worse than that of the nominal QCL model in Fig. 2.

Performance with different VQC models
The VQC circuit used in this study (Fig. 3) is simplified with respect to the one used in Ref. [21]. To examine whether more extended circuits could improve the performance, alternative VQC models are tested using the QASM simulator. The first alternative model is the one in which the $U_{\phi}(x)$ in Fig. 3 (FOE) is replaced with the so-called "Second Order Expansion" (SOE), constructed from the $U_{\phi_{\{l,m\}}}(x)$ and $U_{\phi_{\{k\}}}(x)$ terms in Eq. (32) of the supplementary information of Ref. [21]. The second alternative model is the one with extended $U_{\rm in}(x)$ and $U(\theta)$. Testing these models shows that the AUC values stay almost constant (within at most 2%) regardless of $N^{\rm depth}_{\rm in}$ or $N^{\rm depth}_{\rm var}$ if the $U_{\phi}(x)$ is fixed to either FOE or SOE. However, the performance improves by about 10% when changing the $U_{\phi}(x)$ from FOE to SOE at fixed $N^{\rm depth}_{\rm in}$ and $N^{\rm depth}_{\rm var}$. On the other hand, no improvement is observed when testing the SOE with a real quantum computer. Moreover, the standard deviation of the AUC values becomes significantly larger for the SOE with the quantum computer. These observations could be qualitatively understood as being due to increased errors from hardware noise, because the number of single- and two-qubit gate operations increases by 60% when switching from the FOE to the SOE at $N^{\rm depth}_{\rm in} = N^{\rm depth}_{\rm var} = 1$.

Comparison with DNN model with fewer parameters
A characteristic difference between the QCL and DNN algorithms is the number of trainable parameters ($N_{\rm par}$). As described in Sect. 2.1, $N_{\rm par}$ is fixed to 27 (45, 63) for the QCL in the 3 (5, 7)-variable case. For the DNN model in Table 1, $N_{\rm par}$ varies with $N^{\rm train}_{\rm event}$ as given in Table 4. Typically the $N_{\rm par}$ of the DNN model is about 6-13 times larger than that of the QCL model at $N^{\rm train}_{\rm event} = 100$, and the ratio increases to 75-165 (200-470) at $N^{\rm train}_{\rm event}$ = 1,000 (10,000). Comparing the two algorithms with a similar number of trainable parameters could give more insight into the QCL performance and reveal a potential advantage of the variational quantum approach over the classical method. A new DNN model is thus constructed to contain only one hidden layer with 5 (6, 7) nodes for the 3 (5, 7)-variable case, resulting in an $N_{\rm par}$ of 26 (43, 64). The rest of the model parameters are identical to those in Table 1. Shown in Fig. 11 is the comparison of the AUC values for the new DNN and QCL models at $N^{\rm train}_{\rm event} \le 10{,}000$. The figure indicates that the QCL can learn more efficiently than the simple feed-forward network with a similar number of parameters when the sample size is below 1,000. Exploiting this feature in the application to HEP data analysis would be an interesting subject for future study.
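The parameter counts quoted for the reduced DNN follow from simple bookkeeping of weights and biases. The sketch below assumes a single output node with a bias term, which is not stated explicitly in the text but reproduces the quoted totals; the function name is hypothetical.

```python
def dense_n_parameters(layer_sizes):
    """Trainable parameters of a fully-connected network.

    layer_sizes lists the number of nodes per layer, inputs first.
    Each layer contributes (n_in weights + 1 bias) per output node.
    """
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# One hidden layer with 5 (6, 7) nodes for 3 (5, 7) input variables,
# and a single output node (assumed).
for n_var, n_hidden in ((3, 5), (5, 6), (7, 7)):
    print(n_var, dense_n_parameters([n_var, n_hidden, 1]))
```

This gives 26, 43 and 64 parameters, to be compared with 27, 45 and 63 for the QCL circuits.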

CPU/memory usages for QCL implementation
The QCL algorithm runs on the Qulacs simulator with cloud Linux servers, as described in Sect. 3.2. Under this condition, we examine how the computational resources scale with the problem size. For the creation of input quantum states with $U_{\rm in}(x)$, both CPU time and memory usage grow approximately linearly with $N_{\rm var}$ or $N^{\rm train}_{\rm event}$. The creation of the variational quantum states with $U(\theta)$ shows an exponential increase in CPU time and memory usage with $N_{\rm var}$ (i.e., the number of qubits) up to $N_{\rm var} = 12$: CPU time (memory) increases by roughly a factor of 8 (4) each time $N_{\rm var}$ is incremented by one. The overall CPU time is by far dominated by the minimization process with COBYLA. It increases linearly with $N^{\rm train}_{\rm event}$ but grows exponentially with $N_{\rm var}$, making it impractical to run the algorithm a sufficient number of times for $N_{\rm var} \sim 10$ or more. The memory usage stays constant over $N_{\rm var}$ during the COBYLA minimization process.
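One possible reading of the quoted growth factors, offered here purely as an assumption and not as a statement about the Qulacs internals or the implementation used in this study, is that a dense $2^n \times 2^n$ matrix for the time-evolution gate is built for $n$ qubits. Such a matrix grows by a factor of 4 in memory per added qubit, and dense linear algebra that is cubic in the matrix dimension grows by a factor of 8. The short sketch below only illustrates this arithmetic; all names are hypothetical.

```python
BYTES_PER_COMPLEX = 16  # one double-precision complex number


def statevector_bytes(n_qubits):
    """Memory for a state vector with 2^n complex amplitudes."""
    return BYTES_PER_COMPLEX * 2 ** n_qubits


def dense_operator_bytes(n_qubits):
    """Memory for a dense 2^n x 2^n complex matrix (assumed representation)."""
    return BYTES_PER_COMPLEX * 4 ** n_qubits


def cubic_cost(n_qubits):
    """Operation count of an O(d^3) dense algorithm with d = 2^n (assumed)."""
    return (2 ** n_qubits) ** 3


def growth_per_qubit(func, n_qubits):
    """Ratio of func at n+1 qubits to func at n qubits."""
    return func(n_qubits + 1) / func(n_qubits)


for n in (3, 7, 12):
    print(n, statevector_bytes(n), dense_operator_bytes(n))
print(growth_per_qubit(dense_operator_bytes, 7), growth_per_qubit(cubic_cost, 7))
```

Under this assumption a 12-qubit dense operator already occupies 256 MiB, whereas the corresponding state vector needs only 64 KiB.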

Conclusion
In this paper, we present studies of quantum machine learning for event classification, a common application of conventional machine learning techniques in high-energy physics. The studies focus on the application of variational quantum algorithms using the QCL and VQC implementations, and evaluate the performance in terms of the AUC values of the ROC curves. The QCL performance is compared with the standard classical multi-variate classification techniques based on the BDT and DNN, and the VQC performance is tested using the simulator and real quantum computers. The overall QCL performance is comparable to the standard techniques if the problem is restricted to $N_{\rm var} \le 7$ and $N^{\rm train}_{\rm event} \lesssim 10{,}000$. The QCL algorithm shows relatively flat AUC values in $N^{\rm train}_{\rm event}$, in contrast to the BDT and DNN algorithms, whose AUC values increase with increasing $N^{\rm train}_{\rm event}$ in the considered range. This characteristic QCL behavior could be considered a possible advantage over the classical method at small $N^{\rm train}_{\rm event}$, where the DNN performance gets considerably worse if the number of trainable parameters of the DNN model is constrained to be similar to that of the QCL.
The VQC algorithm has been tested on quantum computers only for a small problem of $N^{\rm train}_{\rm event} = 40$, but the test shows that the algorithm does acquire discrimination power. There is an indication that the actual VQC performance differs between the simulator and the real quantum computer, most likely due to errors in the quantum hardware. This appears to prevent us from using an extended quantum circuit such as the Second Order Expansion for the encoding of classical input data. The QCL and VQC algorithms show similar performance when they run on the simulators under the same conditions for the $N_{\rm var}$ and $N^{\rm train}_{\rm event}$ values. With better control of the measurement and gate errors, it is expected that the performance of variational quantum machine learning will further improve.