Quantum Machine Learning for b-jet identification

Machine Learning algorithms have played an important role in hadronic jet classification problems. The large variety of models applied to Large Hadron Collider data has demonstrated that there is still room for improvement. In this context, Quantum Machine Learning is a new and almost unexplored methodology, in which the intrinsic properties of quantum computation could be used to exploit correlations between particles and improve the jet classification performance. In this paper, we present a new approach to identify whether a jet contains a hadron formed by a b or $\bar{b}$ quark at the moment of production, based on a Variational Quantum Classifier applied to simulated data of the LHCb experiment. Quantum models are trained and evaluated using LHCb simulation. The jet identification performance is compared with that of a Deep Neural Network model to assess which method performs better.


Introduction
Machine Learning (ML) methods are widely used for data analysis in experimental particle physics [1]. One of the most successful applications is the classification of hadronic jets at the Large Hadron Collider (LHC) experiments [2]. Jets are streams of particles produced via fragmentation and hadronization of quarks and gluons that emerge from particle collisions, e.g. proton-proton collisions at the LHC. They are complex objects formed by many detectable particles, and it is possible to identify their properties by exploiting particle content and correlations, commonly referred to as jet substructure. Typical jet classification problems are the identification of the heavy-flavour hadron produced in the jet hadronization (e.g. b hadron vs c hadron) or the identification of the charge of the heavy-flavour quark that constitutes this hadron (e.g. b vs $\bar{b}$). State-of-the-art ML methods, such as Deep Neural Networks (DNN) [3], Graph Neural Networks [4], Tensor Networks [5] and Particle Clouds [6], have been applied to jet data collected by the LHC experiments, with a clear improvement in classification performance with respect to classical non-ML methods.
Recently, Quantum Computing (QC) has set the scene for a revolution in ML. The new approach consists of using quantum circuits to tackle classification tasks, in the framework of Quantum Machine Learning (QML) [7]: data are embedded into a quantum state, which is then passed to a variational quantum circuit, and the circuit parameters are varied in a training procedure that minimises a classical loss function. Probability measurements of the final state are then used to perform the classification. Given the intrinsic properties of quantum computation, namely superposition and entanglement, the new approach could lead to new insights from the classification point of view. Jets that originate from gluons, or from quarks of a certain charge and flavour, have a characteristic particle content and characteristic correlations among their particles, which could be exploited to aid the identification of the original particle. This could open research possibilities in physics that have not yet been considered.
QML techniques have recently been applied to solve High Energy Physics (HEP) problems, such as signal versus background separation [8][9][10][11], and particle track reconstruction [12,13]. A more detailed review of QML applications to HEP can be found in Ref. [14]. This paper presents the first application of QML to the task of jet identification. QML methods are run on simulators and applied to simulated LHCb samples, to identify the charge of the b quark that forms the b hadron produced in the jet hadronization. In the rest of the paper, this task is referred to simply as b-jet tagging.
The paper is structured as follows: Sec. 2 describes the LHCb jet reconstruction and identification, together with the dataset used. In Sec. 3 the QML algorithms considered are presented, while the analysis flow is described in Sec. 4. The results are discussed in Sec. 5. The conclusions and future developments are presented in Sec. 6.

Jet reconstruction and identification at LHCb
LHCb [15] is a single-arm spectrometer designed to study b and c hadrons in the forward region of proton-proton collisions. The reference system used at LHCb is defined by the z-axis (the beam axis) parallel to the proton beam, the x-axis parallel to the gravity acceleration, and the y-axis perpendicular to the other two. The direction of particle momentum is identified by the angles θ and φ, where θ is the angle between the momentum and the z-axis, and φ is the azimuthal angle between the projection of the momentum in the xy plane and the y-axis. The pseudorapidity η is defined as $\eta = -\ln\tan(\theta/2)$. The LHCb detector covers the pseudorapidity range 2 < η < 5, and consists of a tracking system and a particle identification system [16]. The tracking system is formed by a vertex detector, several tracking stations and a dipole magnet. The vertex detector efficiently reconstructs the decay vertices of b and c hadrons, while the tracking stations measure the trajectories (tracks) of charged particles and their momenta. The particle identification system is formed by two Ring Imaging Cherenkov detectors, two calorimeters (electromagnetic and hadronic) and a muon detector, which together allow the type of the particles produced in the collision to be determined precisely.
Jets are reconstructed using inputs selected by the Particle Flow algorithm [17]. These are charged particles detected by the tracking system, and neutral particles reconstructed in the calorimeter system as energy clusters isolated from tracks [18]. The selected particles are the input of the anti-$k_t$ clustering algorithm [19] used for jet clustering. The jet momentum is defined as the sum of the momenta of the particles forming the jet. The jet axis is defined as the direction of the jet momentum. Particles inside a jet are approximately contained in a cone around the jet axis of radius $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} = 0.5$, where Δη is the difference in pseudorapidity and Δφ is the difference in azimuthal angle with respect to the jet axis.
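The kinematic quantities defined above can be made concrete with a short sketch. The function names are illustrative and not part of the LHCb software; the only assumption beyond the text is that the azimuthal difference is wrapped into [-π, π) before ΔR is computed.

```python
import math

def pseudorapidity(theta: float) -> float:
    """eta = -ln(tan(theta / 2)) for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi) (assumed convention)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Distance in the (eta, phi) plane between a particle and the jet axis."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def in_jet_cone(eta: float, phi: float, jet_eta: float, jet_phi: float,
                radius: float = 0.5) -> bool:
    """True if the particle lies within the cone of the given radius."""
    return delta_r(eta, phi, jet_eta, jet_phi) <= radius
```

For instance, a particle at (η, φ) = (3.2, 0.1) and a jet axis at (3.0, 0.0) are separated by ΔR of about 0.22, so the particle falls inside the cone.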
The goal is to distinguish between jets that contain a b or $\bar{b}$ hadron just after the hadronization, i.e. at the instant of b-hadron production and not at decay, since neutral B mesons can undergo flavour oscillation. Therefore the analysis is restricted to a sample of jets that belong to these two categories, labelled as b jets and $\bar{b}$ jets.
This preliminary selection is performed in two steps:
• reconstruction of a vertex (secondary vertex) significantly displaced from the primary proton-proton interaction point, using tracks detected by the vertex detector [20], representing the b-hadron decay point;
• identification of the jet that contains the secondary vertex within its cone.
The b-jet tagging then becomes a binary classification problem in which the jet belongs to one of two exclusive categories: b jets or $\bar{b}$ jets. The charge of the b quark at production is correlated with the charge of the b-hadron decay products. This correlation is not perfect, since neutral B mesons can oscillate, and the charge of the b quark at production may be different from the charge at decay. As an example, in semi-leptonic decays the b hadron can produce a muon, whose charge is directly related to the charge of the b quark. However, because of possible B-meson oscillations and of the large number of particles, including muons, in a jet, the information is diluted, and this has to be taken into account. Two types of b-jet tagging algorithms are currently used in LHCb, as shown in Fig. 1:
• exclusive algorithms, based on information coming from particles inside the jet strictly correlated with the b-hadron decay, such as the muon;
• inclusive algorithms, which aim to exploit the jet substructure, i.e. information coming from the jet constituents.
The QML approach presented in this paper belongs to the category of inclusive algorithms. However, for the sake of comparison, the results are compared to the muon tagging [21], which is an exclusive algorithm. This tagger selects the muon inside the jet with the highest momentum transverse to the z-axis, $p_T$. This simple requirement is sufficient to identify the muon coming from a b hadron and therefore to infer the quark charge by measuring the muon charge. The efficiency of this algorithm is limited by the probability that a b hadron decays to a final state with a muon, which is ∼ 10% [22].
The performance of different b-jet tagging algorithms is compared using the tagging power $\varepsilon_{\mathrm{tag}}$ [23][24][25] as the figure of merit, defined as
$$\varepsilon_{\mathrm{tag}} = \varepsilon_{\mathrm{eff}}\,(2a - 1)^2 \qquad (1)$$
where $\varepsilon_{\mathrm{eff}}$ is the tagging efficiency, i.e. the fraction of jets for which the classifier takes a decision, and a is the accuracy, i.e. the fraction of correctly tagged jets among the tagged jets. The tagging power is the effective fraction of events that contribute to the statistical uncertainty in a measurement where the b-jet tagging is applied.
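As a worked illustration of Eq. 1, the sketch below computes the tagging power from event counts. The function name and the example numbers are hypothetical and chosen only to show the arithmetic.

```python
def tagging_power(n_total: int, n_tagged: int, n_correct: int) -> float:
    """Tagging power eps_tag = eps_eff * (2a - 1)^2 from raw counts.

    n_total   -- all jets presented to the classifier
    n_tagged  -- jets for which a decision was taken
    n_correct -- tagged jets whose decision matches the true label
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_tagged == 0:
        return 0.0  # no decision taken: nothing contributes
    eff = n_tagged / n_total      # tagging efficiency
    acc = n_correct / n_tagged    # accuracy on tagged jets
    return eff * (2.0 * acc - 1.0) ** 2

# Hypothetical example: 10% of jets tagged, 70% of those tagged correctly.
example = tagging_power(n_total=1000, n_tagged=100, n_correct=70)
```

With these illustrative counts the tagging power is 0.1 × (0.4)² = 0.016, which shows how a low efficiency suppresses the figure of merit even when the accuracy is well above 0.5. An accuracy of exactly 0.5 gives zero tagging power regardless of the efficiency.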

Data samples description
LHCb simulated samples are used in the studies presented in this paper. The $b\bar{b}$ di-jet samples are produced within the LHCb simulation framework [26], which uses Pythia8 [27] to generate proton-proton interactions and jet fragmentation and hadronization at a centre-of-mass energy of 13 TeV, and EvtGen [28] to simulate b-hadron decays. The Geant4 software [29], embedded in the LHCb framework, is used to simulate the detector response. Pairs of b and $\bar{b}$ jets are selected by requiring each jet to have a $p_T$ greater than 20 GeV/c and an η in the range 2.2 < η < 4.2, to ensure that they are well inside the instrumented part of the detector.
After the pre-selection, a fixed number of 16 features related to the jet substructure are used as input to the classifiers. Among the reconstructed particles inside a jet, the muon, kaon, pion, electron and proton with the highest $p_T$ are selected. For each particle three physical variables are considered: the magnitude of the momentum transverse to the jet axis ($p_T^{\mathrm{rel}}$), the charge (q), and the distance between the particle and the jet axis, measured in the (η, φ) space (ΔR). If a particle type is missing, the corresponding features are set to 0. The last feature is the weighted jet charge Q, defined as the sum of the charges of the particles inside the jet weighted with the particles' $p_T^{\mathrm{rel}}$:
$$Q = \frac{\sum_i (p_T^{\mathrm{rel}})_i\, q_i}{\sum_i (p_T^{\mathrm{rel}})_i}$$
The analysis for the b-jet identification is performed with two datasets. The complete dataset includes the events selected with the 16 features described above. The muon dataset contains jets with at least one muon and only four features: $p_T^{\mathrm{rel}}$, ΔR and q of the muon, and the weighted jet charge Q. Table 1 summarises the characteristics of the data samples.
Table 1: Summary of the features contained in the two datasets.
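The construction of the 16 input features can be sketched as follows. The dictionary keys, the ordering of the features and the function names are assumptions made for illustration; in particular, the weighted jet charge is taken here as the $p_T^{\mathrm{rel}}$-weighted mean of the particle charges, and that normalisation should be treated as an assumption of the sketch.

```python
PARTICLE_TYPES = ("muon", "kaon", "pion", "electron", "proton")

def weighted_jet_charge(particles):
    """Charge of the jet constituents weighted by pt_rel (assumed normalisation)."""
    total_weight = sum(p["pt_rel"] for p in particles)
    if total_weight == 0:
        return 0.0
    return sum(p["q"] * p["pt_rel"] for p in particles) / total_weight

def jet_features(particles):
    """Return the 16 features: (pt_rel, q, dR) for 5 particle types, plus Q.

    Each particle is a dict with keys "type", "pt", "pt_rel", "q", "dr"
    (hypothetical names). Missing particle types contribute zeros.
    """
    features = []
    for kind in PARTICLE_TYPES:
        candidates = [p for p in particles if p["type"] == kind]
        if candidates:
            best = max(candidates, key=lambda p: p["pt"])  # highest-pT particle
            features += [best["pt_rel"], best["q"], best["dr"]]
        else:
            features += [0.0, 0.0, 0.0]
    features.append(weighted_jet_charge(particles))
    return features

def muon_features(particles):
    """Four features of the muon dataset: pt_rel, dR, q of the muon, and Q."""
    full = jet_features(particles)
    return [full[0], full[2], full[1], full[-1]]
```

Setting the features of a missing particle type to zero keeps the input length fixed, which matters because both embeddings described later require a fixed number of inputs.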

Quantum Machine Learning models
A quantum algorithm is implemented by means of a quantum circuit, namely a collection of linked quantum gates acting on an n-qubit quantum state: the measurements on the final state represent the outcome of the quantum algorithm. Parametrized Quantum Circuits (PQCs) [30] are a type of circuit that contains gates with tunable parameters. The Variational Quantum Classifier (VQC) [31] is a hybrid quantum-classical algorithm that performs classification tasks using a Machine Learning model based on a PQC with the following structure:
• Data encoding: the data x, in this application the features representing the jet substructure, are pre-processed and encoded into a subset of the parameters of a PQC. This stage produces a quantum state $|x\rangle$ representing the input jet.
• Variational circuit: the state $|x\rangle$ is processed by a PQC, U(θ), featuring trainable parameters θ to be optimised during the training phase. This stage produces a final state $|\psi\rangle = U(\theta)|x\rangle$.
• Prediction: expectation values computed on the final state $|\psi\rangle$ are mapped to probabilities for the two labels, $P_b$ and $P_{\bar{b}}$. The training process aims to match the label predictions with the true charge of the b hadron in the jet at the instant of production, which is available in the simulation.
Two different PQC models are studied in this work: Amplitude Embedding and Angle Embedding, described below.

Amplitude Embedding
The Amplitude Embedding model consists of a PQC made of an embedding circuit followed by a variational circuit. A schematic representation of this model is shown in Fig. 2. The embedding circuit consists of an Amplitude Encoder that encodes up to $2^n$ features into the amplitudes of an n-qubit quantum state or, equivalently, a vector of N features can be encoded using $\log_2 N$ qubits:
$$|x\rangle = \sum_{i=1}^{N} x_i\,|n_i\rangle$$
where $x_i$ is the i-th feature and $|n_i\rangle$ is the i-th vector of the computational basis. The definition requires the vector x to be normalised,
$$\sum_{i=1}^{N} |x_i|^2 = 1$$
If the number of features to encode is not a power of 2, the remaining amplitudes can be padded with constant values. This model embeds the 16 (4) variables of the complete dataset (muon dataset) into the amplitudes of a 4-qubit (2-qubit) quantum state.
The variational stage consists of a variable number L of strongly entangling layers. A strongly entangling layer consists of trainable generic rotational gates $R(\alpha_i, \beta_i, \gamma_i)$ applied to each qubit, followed by a collection of CNOT gates applied to neighbouring pairs of qubits, considering the last qubit as a neighbour of the first one. The complexity of this kind of circuit can be tuned by changing the number of strongly entangling layers L: for a generic n-qubit circuit, the number of trainable parameters of the model $N_{\mathrm{par}}$ is equal to
$$N_{\mathrm{par}} = 3 \times n \times L \qquad (5)$$
On the final state, the expectation value of the Pauli operator on the first qubit, $\langle\sigma_z^0\rangle \in [-1, +1]$, is measured and used to define the probabilities $P_b$ and $P_{\bar{b}}$ of being a b jet and a $\bar{b}$ jet, respectively:
$$P_b = \frac{\langle\sigma_z^0\rangle + 1}{2} \qquad (6)$$
$$P_{\bar{b}} = 1 - P_b \qquad (7)$$
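The amplitude encoding step and the parameter count of Eq. 5 can be illustrated with a minimal classical sketch. The function names are hypothetical, and padding with zeros is only one possible choice of the constant padding value mentioned in the text.

```python
import math

def amplitude_encode(features, pad_value=0.0):
    """Return (n_qubits, amplitudes) for a feature vector.

    The vector is padded to the next power of two with `pad_value`
    (an assumed choice) and normalised so that sum |x_i|^2 = 1.
    """
    n_qubits = max(1, math.ceil(math.log2(len(features))))
    dim = 2 ** n_qubits
    padded = list(features) + [pad_value] * (dim - len(features))
    norm = math.sqrt(sum(x * x for x in padded))
    if norm == 0:
        raise ValueError("cannot normalise an all-zero feature vector")
    return n_qubits, [x / norm for x in padded]

def n_trainable_parameters(n_qubits: int, n_layers: int) -> int:
    """N_par = 3 * n * L: three rotation angles per qubit per layer."""
    return 3 * n_qubits * n_layers
```

For the complete dataset, 16 features fit in 4 qubits, so a circuit with L layers has 12 L trainable parameters; the 2-qubit circuit of the muon dataset has 6 L.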

Angle Embedding
The structure of the Angle Embedding model, represented in Fig. 3, differs from the Amplitude Embedding model in the encoding used to embed the features of the datasets into the quantum state: in this case, the embedding circuit consists of an Angle Encoder that embeds the 16 (4) features of the complete dataset (muon dataset) as rotation angles of 16 (4) x-axis rotational gates $R_x(\theta_i)$. This circuit structure therefore requires a one-to-one correspondence between qubits and input features, which makes it impractical for high-dimensional datasets because of the computational constraints of quantum simulators. The variational stage of the circuit is identical to that of the Amplitude Embedding model, featuring a variable number L of strongly entangling layers that can be suitably chosen to tune the number of parameters $N_{\mathrm{par}}$, defined in Eq. 5, and therefore the complexity. The measurement of the expectation value of the Pauli $\sigma_z$ operator is mapped to the tagging probabilities $P_b$ and $P_{\bar{b}}$ as expressed in Eq. 6 and Eq. 7, identically to the Amplitude Embedding model.
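The three stages of the Angle Embedding classifier can be sketched with a small pure-Python state-vector simulation. This is an illustrative toy rather than the Pennylane implementation used in the analysis: the generic rotation is assumed to decompose as RZ(γ)·RY(β)·RZ(α), qubit 0 is taken as the most significant bit, and the mapping $P_b = (1 + \langle\sigma_z^0\rangle)/2$ is the linear map assumed in this sketch.

```python
import cmath
import math

def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit (qubit 0 is the most significant bit)."""
    shift = n - 1 - qubit
    new = list(state)
    for i in range(len(state)):
        if (i >> shift) & 1 == 0:
            j = i | (1 << shift)
            new[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            new[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return new

def apply_cnot(state, control, target, n):
    """Flip the target bit of every basis state whose control bit is 1."""
    c_shift, t_shift = n - 1 - control, n - 1 - target
    new = list(state)
    for i in range(len(state)):
        if (i >> c_shift) & 1:
            new[i ^ (1 << t_shift)] = state[i]
    return new

def rx(theta):
    """Rotation about the x axis, used to embed one feature."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def rot(alpha, beta, gamma):
    """Generic rotation, assumed here to be RZ(gamma) RY(beta) RZ(alpha)."""
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    return [
        [cmath.exp(-0.5j * (alpha + gamma)) * c, -cmath.exp(0.5j * (alpha - gamma)) * s],
        [cmath.exp(-0.5j * (alpha - gamma)) * s, cmath.exp(0.5j * (alpha + gamma)) * c],
    ]

def vqc_probability(features, weights):
    """Return P_b for one jet; weights[l][q] = (alpha, beta, gamma)."""
    n = len(features)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for q, x in enumerate(features):            # data encoding
        state = apply_1q(state, rx(x), q, n)
    for layer in weights:                       # variational circuit
        for q, (a, b, g) in enumerate(layer):
            state = apply_1q(state, rot(a, b, g), q, n)
        if n > 1:
            for q in range(n):                  # CNOT ring, last qubit -> first
                state = apply_cnot(state, q, (q + 1) % n, n)
    shift = n - 1                               # prediction: <sigma_z> on qubit 0
    expval = sum(abs(amp) ** 2 * (1 if (i >> shift) & 1 == 0 else -1)
                 for i, amp in enumerate(state))
    return 0.5 * (1.0 + expval)
```

The state vector has $2^n$ entries, so the memory cost doubles with every additional feature; this is the practical reason why the one-qubit-per-feature encoding becomes expensive to simulate for the 16-feature dataset.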

b-jets identification procedure
Quantum circuits are simulated by means of noiseless simulators (the impact of noise is studied in Sec. 5.3) using Pennylane [32], a Python framework designed specifically for QML applications. The quantum circuit is embedded into a classical optimisation algorithm, using the Jax [33] Python library. Since the results of the quantum algorithms are compared to those of a classical DNN, the same analysis is performed with a standard feed-forward network, implemented using the Keras [34] framework with the TensorFlow [35] back-end. Additional details on the structure and the optimisation of the DNN are reported in App. A.

Training and testing phases
The muon and complete datasets are both split into training and testing sub-datasets: about 60% of the samples are used in the training process, which also includes the validation, and the remaining 40% are used to test, evaluate and compare the classifiers. In the muon dataset analysis, 60000 jets are used for training and 40000 jets are used for testing. The complete dataset training is performed on 400000 jets and the remaining 290000 are used for testing and assessing performance. In the analysis of the muon dataset the Angle Embedding structure is studied and compared to a DNN with the same input variables.
In the case of the complete dataset, both Angle Embedding and Amplitude Embedding classifiers are considered, and the 16 variables are used as input for a DNN. The training process aims to find the values of the model parameters θ that minimise the Mean Squared Error loss function
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(P_b^i - T^i\right)^2 \qquad (8)$$
where N is the number of training jets, $P_b^i$ is the predicted probability, defined in Eq. 6, for the i-th jet, and $T^i$ is the target probability for the i-th jet, i.e. 1 for a b jet and 0 for a $\bar{b}$ jet. Because of the large number of jets in the datasets, the quantum models are trained with a mini-batch gradient descent [36] algorithm using the ADAM optimiser [37] to minimise Eq. 8. The training dataset is split into several mini-batches containing a fixed number of training samples. During each training step, the gradient of Eq. 8 is evaluated, averaging over the training samples of a mini-batch, and used to update the model parameters.
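The ingredients of this training procedure can be sketched in plain Python. The helper names are hypothetical; the ADAM hyperparameters other than the learning rate (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸) are the commonly used defaults and are assumed here, since the text does not quote them.

```python
import math
import random

def mse_loss(predictions, targets):
    """Mean squared error between predicted P_b and targets (1 for b, 0 for b-bar)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def minibatches(samples, batch_size, seed=0):
    """Yield shuffled mini-batches; one pass over all of them is one epoch."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]

def adam_step(params, grads, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update. `t` is the 1-based step count; returns (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)               # bias corrections
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

In a full training loop the gradient passed to `adam_step` would be the gradient of `mse_loss` with respect to the circuit parameters, averaged over the jets of one mini-batch.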
A training epoch is completed when the whole training dataset has been processed, namely after a number of steps equal to the number of mini-batches. Unless specified otherwise, the models are trained with a learning rate ξ = 0.01 for 100 epochs, while the mini-batch size is fixed to the maximum value allowed by memory constraints. The output of the classifier gives the probability that a jet is generated by a b or a $\bar{b}$ quark. The label with the highest probability is assigned to the jet, i.e. if $P_b > 0.5$ ($P_b < 0.5$) then it is classified as a b jet ($\bar{b}$ jet). Fig. 4a shows the output distributions for the two classes (b and $\bar{b}$ jets) after the training procedure; a separation between the two distributions around 0.5 is visible, leading to a good classification. It should be noted that the $P_b$ distribution is shifted toward 1 for b quarks, and toward 0 for $\bar{b}$ quarks, as expected. Fig. 4b shows the output distribution for the Angle Embedding classifier on 16 qubits: the jets that are correctly classified are shown in green, the jets that are wrongly classified in red, and the sum of all jets in grey. As expected, correctly classified jets tend to stay close to 0 and 1, while the wrongly classified jets peak around 0.5, where the prediction power is minimal. Figure 5 shows the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) for the DNN and the quantum classifiers, for the muon dataset and the complete dataset.

Results on b-jet tagging
The performance of the classifiers is evaluated using the jet tagging power, defined in Eq. 1. The tagging power is computed as a function of the jet $p_T$ and η for both the quantum and the classical classifiers. In order to optimise the tagging power, a region symmetric with respect to 0.5 is defined, in which no classification is performed. The width $\Delta_{\mathrm{cut}}$ of the excluded region is defined for each classifier by maximising the tagging power evaluated using all the jets in the dataset. Such an exclusion region reduces the tagging efficiency because fewer jets are tagged, but enhances the identification probability. The probability distributions and the excluded region are shown in Fig. 6: the excluded region is indeed the one where the prediction power is minimal. The widths $\Delta_{\mathrm{cut}}$ of the excluded region for each classifier, for the muon and complete datasets, are summarised in Tab. 2. For comparison, the unoptimised tagging power, obtained by identifying as b ($\bar{b}$) the jets with $P_b$ above (below) 0.5, is presented in App. B.
The tagging power $\varepsilon_{\mathrm{tag}}$ as a function of jet $p_T$ and η for the classical and quantum classifiers applied to the muon dataset is shown in Fig. 7. All the distributions have a similar behaviour, indicating that no bias is created by any algorithm. The dependence of the tagging power on the jet $p_T$ is as expected, since at high $p_T$ the reconstruction and identification of the jet particle content is more difficult, leading to a lower tagging power [38]. The Angle Embedding results are slightly below the DNN ones, but they are compatible within the statistical uncertainty, while both algorithms outperform the Amplitude Embedding model and the muon tagging approach. The lower performance of the muon tagging is expected, since it only uses the muon charge q for the prediction, whereas the other algorithms also use the muon $p_T^{\mathrm{rel}}$, the muon ΔR and the weighted jet charge Q. The simple 2-qubit Amplitude Embedding model performs slightly better than the muon tagging, but still worse than the DNN and the Angle Embedding model.
The tagging power $\varepsilon_{\mathrm{tag}}$ for the DNN and the quantum classifiers on the complete dataset, as a function of jet $p_T$ and η, is shown in Fig. 8. In this case too, the usual dependence on the jet $p_T$ is visible. As expected, the tagging power of the QML and DNN classifiers is higher for the complete dataset than for the muon dataset, since a larger number of features is used. As before, the QML and DNN performance is above that of the muon tagging approach, given that these classifiers use the information coming from the jet substructure. Unlike in the muon dataset, in the complete dataset the QML algorithms perform slightly worse than the DNN, with the Angle Embedding structure performing slightly better than the Amplitude Embedding one. This suggests that the DNN makes better use of the features when a larger number of them is available.
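The optimisation of the exclusion window described in this section can be sketched as a simple scan. The function names, the list of candidate widths and the toy inputs are hypothetical; the sketch only assumes that a jet is left untagged when $|P_b - 0.5|$ is smaller than half of the window width.

```python
def tagging_power_with_cut(probs, labels, delta_cut):
    """Tagging power when jets with |P_b - 0.5| < delta_cut / 2 are left untagged.

    probs  -- predicted P_b for each jet
    labels -- true labels, 1 for a b jet and 0 for a b-bar jet
    """
    half_width = delta_cut / 2.0
    n_tagged = n_correct = 0
    for p_b, is_b in zip(probs, labels):
        if abs(p_b - 0.5) < half_width:
            continue                      # inside the exclusion window: no decision
        n_tagged += 1
        predicted_b = p_b > 0.5
        n_correct += int(predicted_b == bool(is_b))
    if n_tagged == 0:
        return 0.0
    eff = n_tagged / len(probs)
    acc = n_correct / n_tagged
    return eff * (2.0 * acc - 1.0) ** 2

def best_delta_cut(probs, labels, candidate_widths):
    """Return the candidate width that maximises the tagging power."""
    return max(candidate_widths,
               key=lambda w: tagging_power_with_cut(probs, labels, w))

# Toy example: the two jets closest to 0.5 are misclassified.
toy_probs = [0.9, 0.8, 0.55, 0.45, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_best = best_delta_cut(toy_probs, toy_labels, [0.0, 0.2, 0.7])
```

In the toy example a width of 0.2 removes the two misclassified jets, raising the accuracy to 1 at the cost of a lower efficiency (4/6), while a width of 0.7 discards correctly tagged jets as well and lowers the tagging power again.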

Dependence of the results on number of training events and circuit depth
The dependence of the performance of the quantum algorithms on the number of training events and on the circuit complexity has to be evaluated if near-term applications on quantum hardware are considered. These parameters have an impact on the execution time and therefore on the feasibility of such applications. The dependence of the performance on the number of training samples is also an interesting way to compare QML and DNN methods and to assess the differences between the two approaches. Given the high computational cost of simulating complex circuits with several qubits, only the muon dataset is used.
For QML, the Angle Embedding structure is considered with different numbers of strongly entangling layers and different numbers of training events. The results are compared with the same DNN considered in the previous section; a comparison with more complex networks is described in App. A. The metric used to quantify the quality of the quantum classifier is the accuracy on a test subset of 40000 jets. The performance is calculated by averaging over 10 training rounds. Fig. 9 (a) shows the accuracy of the Angle Embedding circuit as a function of the number of layers of the circuit. As expected, the accuracy increases with the depth of the circuit, and therefore with its complexity. This behaviour stops at around 5 layers, where the accuracy saturates and no further improvement is evident. For a given number of features and training data, the Angle Embedding model therefore does not benefit from an arbitrarily large number of layers, and it is possible to keep a low number of layers, and consequently a lower circuit complexity, while obtaining the best performance. This also reduces the computing time and resources needed for the simulation. The accuracy as a function of the number of training events for the Angle Embedding circuit and the DNN is shown in Fig. 9 (b). For a large number of training events the performance of the quantum algorithm is similar to that of the DNN, but when the number of training events decreases the quantum algorithm maintains its performance, whereas the DNN is no longer able to classify well. This means that, with respect to the DNN, the QML method reaches its optimal performance with a lower number of events, which helps to reduce resource usage.

Time performance
Time performance is a fundamental figure of merit for understanding the feasibility of simulating such quantum algorithms. Here, the time performance of the Angle Embedding and the Amplitude Embedding circuits is evaluated for the muon dataset, as a function of the number of strongly entangling layers in the circuit structure. The quantum algorithms are trained using 4 NVIDIA Volta V100 GPUs, and the time needed to train on 60000 jets for 100 epochs is evaluated. The results are shown in Fig. 10. It takes less time to train the Amplitude Embedding circuit than the Angle Embedding structure, and the training time increases with the number of layers, at a greater rate for the Angle Embedding circuit: this is to be expected, since the complexity of the circuit increases with the number of layers and it therefore takes more time to simulate the quantum circuit.

Noise models
In order to assess the performance of the algorithms on quantum hardware it is important to understand the impact of noise on quantum circuits. Two kinds of noise can affect quantum algorithms:
• coherent noise: it originates from unitary errors in the application of quantum gates. This leads to the construction of a quantum state different from the desired one. A typical source of this kind of noise is a non-ideal calibration of the quantum hardware;
• incoherent noise: it results from the interaction between the quantum hardware and the environment. The resulting quantum states are no longer pure and are described by mixed states, i.e. probability distributions over different states.
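The difference between the two kinds of noise can be illustrated on a single qubit. The sketch below is a generic toy and not one of the device noise models used in this study: coherent noise is modelled as an over-rotation of an $R_x$ gate, and incoherent noise as a depolarizing channel $\rho \to (1 - p)\rho + p\,I/2$, both chosen here purely for illustration.

```python
import math

def rx_state(theta):
    """Amplitudes of R_x(theta)|0>."""
    return (complex(math.cos(theta / 2), 0.0), complex(0.0, -math.sin(theta / 2)))

def density_matrix(state):
    """Pure-state density matrix rho = |psi><psi|."""
    return [[a * b.conjugate() for b in state] for a in state]

def depolarize(rho, p):
    """Depolarizing channel: rho -> (1 - p) rho + p I / 2 (illustrative model)."""
    return [[(1 - p) * rho[i][j] + (p / 2 if i == j else 0.0) for j in range(2)]
            for i in range(2)]

def purity(rho):
    """Tr(rho^2): equal to 1 for pure states, below 1 for mixed states."""
    return sum((rho[i][j] * rho[j][i]).real for i in range(2) for j in range(2))

def expval_z(rho):
    """Expectation value of sigma_z."""
    return (rho[0][0] - rho[1][1]).real

ideal = density_matrix(rx_state(0.30))
coherent = density_matrix(rx_state(0.35))   # miscalibrated angle: still a pure state
incoherent = depolarize(ideal, 0.2)         # mixed state with reduced purity
```

The over-rotated state remains pure but gives a shifted expectation value, whereas the depolarized state keeps the same direction on the Bloch sphere with $\langle\sigma_z\rangle$ scaled by (1 − p) and a purity below 1.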
The simulations of the noise contribution, taking into account both sources of noise in quantum circuit measurements, have been performed using the pennylane-qiskit plugin [32,39]. This plugin allows noise models derived from real IBM quantum computers [40] to be simulated while keeping the Pennylane syntax. The result is a simulation of a quantum algorithm on a real device structure. Four IBM quantum computers are considered: ibmq-belem, ibmq-santiago, ibmq-jakarta and ibmq-toronto, which have different numbers of qubits (respectively 5, 5, 7 and 27 qubits), different quantum volumes (respectively 16, 32, 16 and 32) and different qubit layouts, as shown in Fig. 11 for ibmq-santiago and ibmq-belem, which have the same number of qubits.
Studies are performed on the Angle Embedding circuit structure with three strongly entangling layers. A small subset of the muon dataset is used, because simulating circuits including the noise contribution is more demanding in terms of time and computing resources; on the other hand, the performance of the quantum algorithm is already sufficiently high with a low number of events, as shown in Fig. 9. A subset of 1000 jets of the muon dataset is therefore selected for training, while validation is performed on a subset of 10000 jets. For each noise model the training is performed for 50 epochs using ADAM with a learning rate ξ = 0.01 and a batch size of 10 jets. The results are averaged over five rounds of training, using five independent training subsets. The relevant figure of merit to assess the performance of the noise models is the accuracy on the validation set. The results are shown in Fig. 12 and summarised in Tab. 3. Models including noise need more epochs to reach convergence, but in the end the results are consistent with those of the noiseless simulations within uncertainties. This result indicates that the proposed circuit model for the muon dataset is robust to noise.
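The statement that the noisy results are consistent with the noiseless one can be checked with a short calculation on the accuracies quoted in Tab. 3. The sketch assumes that the quoted uncertainties are Gaussian and independent, which is an assumption of this illustration and not a statement from the analysis.

```python
import math

# Accuracies and uncertainties quoted in Tab. 3.
NOISELESS = (0.640, 0.017)
NOISY = {
    "ibmq-belem": (0.629, 0.047),
    "ibmq-santiago": (0.633, 0.038),
    "ibmq-jakarta": (0.637, 0.042),
    "ibmq-toronto": (0.631, 0.044),
}

def pull(reference, result):
    """Difference in units of the combined uncertainty (assumed independent)."""
    (ref_value, ref_error), (value, error) = reference, result
    return (ref_value - value) / math.hypot(ref_error, error)

pulls = {name: pull(NOISELESS, result) for name, result in NOISY.items()}
```

Under these assumptions every noisy accuracy lies well within one combined standard deviation of the noiseless value; the largest difference, for ibmq-belem, corresponds to about 0.2 standard deviations.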

Conclusions
The first application of QML algorithms to identify the charge of the b hadron produced in the jet hadronization at the LHCb experiment has been presented. The results using the muon dataset show that the Angle Embedding structure reaches a performance consistent with the DNN while being better than the muon tagging approach. The Amplitude Embedding structure presents no evident improvement with respect to the muon tagging method. When the complete dataset is used, the best quantum algorithm performance is given by the Angle Embedding structure. The study of the dependence of the performance on the circuit depth has shown that the number of layers is a parameter to be optimised. More layers, which means more complex structures, do not necessarily result in improved performance, since saturation is reached. The impact of the noise on the results appears to be negligible, suggesting that the proposed circuits could be implemented on quantum hardware if available. QML algorithms achieve performance consistent with classical methods like the DNN with low-complexity circuits and a smaller number of training events. This could have important implications for LHC experiments, where the training phase is often the most expensive in terms of resources. However, when a large number of features is employed, the DNN performs better than the QML algorithms. Here large improvements are expected once the hardware becomes available. In fact, the comparison of QML models to classical kernel methods [42] shows that QML models achieve classification tasks by separating data in Hilbert spaces whose dimension scales exponentially with the number of qubits involved. As this number increases, quantum simulations on classical computers become unfeasible.

Noise model      Accuracy
no noise         0.640 ± 0.017
ibmq-belem       0.629 ± 0.047
ibmq-santiago    0.633 ± 0.038
ibmq-jakarta     0.637 ± 0.042
ibmq-toronto     0.631 ± 0.044
The full exploitation of QML in high energy physics experiments at colliders has only just begun. As for any new tool, unexpected applications may emerge with use. For example, results obtained with tensor network methods [5] have shown that quantum algorithms make it possible to study correlations among the features. This is done by measuring the entanglement correlations between qubits of an optimised multi-qubit system. This approach could make it possible to extract information on correlations among jet constituents, starting from the measurement of the entanglement between the qubits of a trained VQC.

A Deep Neural Network model and further comparisons
The DNN is built following a classical feed-forward structure, shown in Fig. 13: it starts with a Batch Normalisation layer [43] followed by several Dense layers, each one followed by a Dropout [44] layer. The input vector of the DNN depends on the number of input features (4 variables for the muon dataset and 16 for the complete one). For the hidden layers the ReLU activation is used, while a sigmoid function is applied to the output node. The network is trained using the ADAM optimiser [37]. The model is trained for 250 epochs with an early stopping of 25 epochs on the test loss function and a learning rate of 0.0001. Given the classification task, cross entropy is taken as the loss function. Different DNN structures have also been considered, in order to have a fair comparison between quantum and classical classifiers. State-of-the-art networks usually make use of LSTM [45] and convolutional layers, therefore the following DNN models are studied:
• "LSTM" model: starting from the DNN structure, an LSTM layer is applied to the particle features (therefore excluding the "global" variable of the total jet charge) before the first Dense layer. The Dense structure is identical to the DNN structure.
• "LSTM+Conv" model: starting from the LSTM model, a convolutional layer is first applied to the particle features (as before, not to the total jet charge).
The two models are shown in Fig. 14. The DNN performance in terms of tagging power is compared to that of the LSTM and LSTM+Conv models. Results for the tagging power as a function of the jet $p_T$ and η are shown in Fig. 15: the results for the different models are comparable within uncertainties, which allows just the DNN model to be considered for the comparison with the quantum algorithms.

B Unoptimised tagging power distributions
In this section the distributions for the unoptimised tagging power are shown, both for the muon and the complete datasets. This is done in order to check whether there are performance biases when cutting on the probability distributions to maximise the tagging power.

Figure 1: Sketch representing possible jet tagging methods. In the exclusive method the information comes from a particle, e.g. the muon, whose charge is correlated to the b hadron (lower jet); in the inclusive method, the information is extracted from the jet constituents (upper jet). The magnitude of the particle momentum transverse to the jet axis is labelled as $p_T^{\mathrm{rel}}$.

Figure 2: Circuit representation of the Amplitude Embedding model. In blue, variables are embedded into the amplitudes of a quantum state. In red, trainable generic rotational gates to be optimised during the training phase. In green, CNOT gates entangling qubits with a circular topology.

Figure 3: Circuit representation of the Angle Embedding model. In blue, x-axis rotational gates used to embed the variables into the quantum circuit. In red, trainable generic rotational gates to be optimised during the training phase. In green, CNOT gates entangling qubits with a circular topology.

Figure 4: (a) Probability distributions for jets tagged as b (blue) and $\bar{b}$ (yellow) quarks, showing a separation around 0.5. (b) Probability distribution for the Angle Embedding circuit: jets that are correctly (wrongly) tagged are plotted in green (red), showing a worse classification around 0.5. The probability distribution for all jets is shown in grey.

Figure 5: ROC curves and AUC scores for the DNN (green), Angle Embedding (blue) and Amplitude Embedding (yellow) circuits for the muon dataset (a) and the complete dataset (b). The dashed line represents a random classifier.

Figure 6: Probability distributions for jets tagged as (blue) b and (yellow) $\bar{b}$ quarks

Figure 7: Tagging power $\varepsilon_{\mathrm{tag}}$ with respect to (a) jet $p_T$ and (b) jet η for the muon dataset. The Angle Embedding circuit and the DNN show similar performance.

Figure 8: Tagging power $\varepsilon_{\mathrm{tag}}$ with respect to (a) jet $p_T$ and (b) jet η for the complete dataset. The quantum algorithms perform slightly worse than the DNN, with the Angle Embedding circuit performing better than the Amplitude Embedding circuit.

Figure 9: (a) Accuracy of the Angle Embedding structure on the muon dataset versus the number of layers. (b) Accuracy of the (red) Angle Embedding structure and (blue) DNN on the muon dataset versus the number of training events.

Figure 10: Training time for 100 epochs for the Angle Embedding (red) and Amplitude Embedding (blue) circuits on the muon dataset with respect to the number of circuit layers.

Figure 12: Validation accuracy for noise models as a function of the number of epochs. The blue (light-blue) band represents the 1σ (2σ) uncertainty bounds for the noiseless model.

In Figs. 16 and 17 the unoptimised tagging power distributions as a function of jet $p_T$ and η are shown for the muon and the complete datasets, respectively. No evident biases are present, and the same considerations made for the optimised distributions remain valid.

Figure 16: Unoptimised tagging power $\varepsilon_{\mathrm{tag}}$ with respect to (a) jet $p_T$ and (b) jet η for the muon dataset. The Angle Embedding circuit and the DNN show similar performance.

Table 2: Width $\Delta_{\mathrm{cut}}$ for different classifiers and datasets.

Table 3: Accuracy for noisy circuits obtained with the muon dataset.