Learning the noise fingerprint of quantum devices

Noise sources unavoidably affect any quantum technological device. Noise’s main features are expected to strictly depend on the physical platform on which the quantum device is realized, in the form of a distinguishable fingerprint. Noise sources are also expected to evolve and change over time. Here, we first identify and then characterize experimentally the noise fingerprint of IBM cloud-available quantum computers, by resorting to machine learning techniques designed to classify noise distributions using time-ordered sequences of measured outcome probabilities.


Introduction
In the context of quantum technologies, no quantum device can be considered an isolated (ideal) quantum system. For this reason, the term Noisy Intermediate-Scale Quantum (NISQ) technology has recently been introduced [1] to identify the class of early devices in which noise in quantum gates dramatically limits the size of the circuits and algorithms that can be reliably executed [2,3]. As early quantum devices become more widespread, a natural question is whether, at the experimental level, the signature left by the inner noise processes of a generic quantum device exhibits universal features or is characteristic of the specific quantum platform. Moreover, one may ask whether such a noise signature has a time-dependent profile or can be effectively considered stable, i.e., constant over time, while the device is operating.
The answers to these questions are expected to be crucial in defining a proper strategy to mitigate the influence of noise and systematic errors [4-8], possibly going beyond standard quantum sensing techniques [9-14] and overcoming current limitations on probe dimension and resolution [9,10,15-18]. On top of that, these answers gain even more importance if noise signatures prove to be peculiar to the single device, with the consequence that attenuating noise effects may be harder than expected. Indeed, each quantum technology platform, ranging from superconducting circuits [19,20] to trapped-ion quantum computers [21], photonic chips [22,23] and topological qubits [24], could need ad hoc solutions that are usually expensive and incompatible from one device to another. In addition, if the noise properties of a quantum device happen to be time-dependent, the system necessarily requires continuous calibration, thus hindering not only the available runtimes, but also the accessibility for external users and the replicability of the experiments performed on it. Furthermore, if the noise fingerprint of the considered device can be easily discerned and remains unchanged over time, one could identify the specific quantum device from which certain data were generated just by looking at the noise fingerprint. This aspect, however, might in principle create problems for possible future uses of the device in privacy-sensitive applications.
In this paper, we aim to shed light on the aspects discussed above by providing a powerful tool, based on Machine Learning (ML) techniques, for the classification of noise fingerprints in quantum devices with the same technical specifications but physically placed in different environmental conditions. ML [25,26], originally introduced in the classical domain to learn from data, identify distinctive patterns, and then make decisions with minimal human intervention, has already proven useful to characterize open quantum dynamics [27-29] and to carry out quantum sensing tasks [8,30-32], for example the learning and classification of non-Markovian noise [30,32] or the detection of qubit correlations [31].
Here, we first design a quantum circuit that mimics the transport dynamics of a quantum particle on a network of 16 nodes, identified by the states of the computational basis $\{|0000\rangle, |0001\rangle, \ldots, |1110\rangle, |1111\rangle\}$. The designed quantum circuit is measured (by locally applying the Z Pauli operators on some qubits of the circuit) in 9 distinct parts that, from now on, we denote as measurement steps. The routine that records all the outcomes in each measurement step is denoted as an execution, and the repetition of a given number of executions is called a run. Employing the open-access quantum computers offered by the IBM Quantum Experience [33], we experimentally classify a set of quantum devices by executing the same quantum transport circuit (testbed circuit) on all of them. The classification is enabled by the presence of a peculiar noise fingerprint associated with each quantum machine. In more detail, the ML models are trained by taking as input the distributions of the outcomes recorded at the 9 measurement steps of the testbed circuit. As shown in the next sections, the classification is successfully achieved with a test accuracy greater than 99%, both across different IBM machines and on a single device at different times. Indeed, from our experiments we observe that the noise fingerprint of each tested quantum device also has a clear time dependence, meaning that executions of a quantum circuit implemented at different times can be associated with distinctive traits.
This experimental evidence leads us to the conclusion that different IBM quantum devices exhibit distinctive, and thus distinguishable, noise fingerprints that can nevertheless be characterized and predicted by ML methods. Therefore, the proposed solution might be pivotal to certify the time scheduling and the specific machine on which a given quantum computation is executed. Moreover, learning the noise fingerprint of the quantum device under analysis could play a key role both for diagnostic purposes, especially in all those contexts where logic quantum operations cannot be error-corrected [2] and thus need to be as "noise-free" as possible, and for the benchmarking and certification [34] of quantum noise sources within a pre-established error threshold.

Experimental platform
For our experiments we employ the IBM Quantum cloud services to remotely run quantum circuits on several machines. To interact with the remote services, we use the Qiskit Software Development Kit (SDK) [35], an open-source Python SDK useful both to simulate quantum dynamics (with or without noise) and to program a given set of operations on a real quantum device. Overall, we have at our disposal up to 11 superconducting quantum computers ranging from a single qubit up to 15 qubits, with different topologies and calibration routines. For all the available devices and their characteristics, we direct the reader to the IBM documentation [33].
The accessibility and availability of the IBM devices allow us to carry out real experiments with the flexibility of either taking many samples in a short amount of time, or collecting samples from the same circuit at longer time intervals. As will be shown below, both these aspects are exploited in our experiments. Moreover, one can run the exact same circuit not only on a single device but on multiple machines, thus enabling the creation of complete datasets of quantum experiments to be fed into ML algorithms. Regarding the generation of our datasets, we refer the reader to the source code at the address provided at the end of the manuscript.
Fig. 1 On the left, circuit implementation of the quantum transport dynamics employed as a testbed. The quantum circuit, which involves 4 qubits, is repeated more than once (in our experiments, 3 times), and 2 of the 4 qubits are measured at regular steps. The outcome probabilities obtained by our measurements, which together form the datasets used to train, validate and test the ML models, are fed into a Support Vector Machine (SVM), schematically represented on the right, in order to be classified.
Overall, several experiments (explained in detail later) have been conducted on different IBM chips (specifically, 'Yorktown', 'Athens', 'Bogota', 'Casablanca', 'Lima', 'Quito', 'Santiago', 'Belem', and 'Rome'). The chips differ in two main aspects. The first is the architecture (or connectivity) of the qubits, which ranges from a simple line topology to a ladder or a star topology. The second is the so-called quantum volume [36] (8, 16 or 32 for the machines used in our experiments), which quantifies the maximum dimension of a circuit that can be effectively executed and is also correlated with the noise affecting each device. Indeed, some quantum machines are inherently noisier than others, and even single qubits inside a machine can have a distinctive noise profile. All these differences in noise and topology constitute the fingerprint that we aim to exploit with our method.
Before proceeding, it is worth stressing that, although the proposed experiments are carried out on superconducting devices, the gate-model approach adopted here is valid in principle for a large class of NISQ devices.

Testbed quantum circuit
To learn the noise fingerprint of IBM quantum devices, we employ a quantum walker on a network of 16 nodes realized by the quantum states $|0000\rangle, |0001\rangle, \ldots, |1110\rangle, |1111\rangle$ through the circuit in Fig. 1. Notice that, for our purposes, a few qubits suffice for the testbed circuit; however, this does not imply that the proposed solutions cannot be applied to circuits of generic dimension.
The idea of our testbed circuit is to simulate quantum transport dynamics: a quantum particle is initialized in one node of the network (specifically, in the state $|0000\rangle$) and then "flows" across the allowed pathways thanks to the action of local operations and of controlled-NOT (CNOT) and Toffoli gates (denoted in Fig. 1 by light blue and light purple circles, respectively, with a 'plus' symbol inside). We recall that the CNOT is a two-qubit quantum operation, commonly used to entangle/disentangle Bell states, that flips the second qubit when the first qubit is in $|1\rangle$. The Toffoli gate, instead, is a universal "controlled-controlled-NOT" (3-qubit) operation in which a third qubit is flipped when the two control qubits are both in $|1\rangle$. In our circuit in Fig. 1, two qubits ($q_3$ and $q_2$ in the figure) are used to get information on the particle, providing at each measurement one of the pairs of bits (0, 0), (0, 1), (1, 0), (1, 1), where the first and second bits correspond, respectively, to the outcomes measured on $q_3$ and $q_2$. Conversely, qubits $q_0$ and $q_1$ are employed as ancilla qubits to move, and thus control, the particle. This quantum circuit is then repeated 3 times, with the aim of collecting data on the quantum dynamics in each IBM device. As already mentioned in the Introduction, the resulting quantum circuit (given by repeating 3 times the circuit in Fig. 1) is locally measured in 9 distinct parts (corresponding to the measurement steps) through the simultaneous application of the Z Pauli operators $\sigma_z$ on the qubits $q_3$ and $q_2$, from which the measurement outcomes are collected. It is worth noting that the proposed procedure is not based on repeated measurements as in a quantum monitoring protocol or in Zeno quantum dynamics [37-40], since, each time a measurement is performed at a given measurement step (say the $k$-th, with $k = 1, \ldots, 9$), the whole testbed quantum circuit is regenerated and then (locally) measured at the subsequent step, i.e., the $(k + 1)$-th.
In a single repetition, the quantum circuit is initialized in $|0000\rangle$, which corresponds to the measurement outcomes (0, 0), and then Hadamard gates (blue squares 'H' in Fig. 1) are applied to both $q_0$ and $q_1$. Thus, since the two CNOT gates are conditioned on $q_0$ and $q_1$ respectively, the probability of getting 1 or 0 in $q_2$ and $q_3$ after the CNOTs is 0.5. In this way, after the Pauli-X rotation (green squares 'X' in Fig. 1) and the Toffoli gate, the system is in the state $\frac{1}{2}\left(|0110\rangle + |0111\rangle + |1001\rangle + |1100\rangle\right)$ before the qubits $q_2$, $q_3$ are measured along the z-axis (black squares in Fig. 1). This entails that, at the end of the circuit, measuring $q_3$ and $q_2$ provides the results {(0, 0), (0, 1), (1, 0), (1, 1)} with probabilities {0, 0.5, 0.25, 0.25}, respectively. Of course, such dynamics only occur under ideal unitary evolution, which is not the case for the implementation on real experimental devices. In our case, the noisy environment in which the machines are immersed alters each realization of the simulated quantum (transport) dynamics, thus making the evolution of the particle within the circuit stochastic. As we will show below, this randomness is a specific feature of each machine and changes from one device to another, thus allowing us to perform classification tasks. Specifically, it is the discrepancies between the measured outcome probabilities (from qubits $q_2$ and $q_3$) on one or more IBM machines that make it possible to learn the corresponding noise fingerprint, and then to classify from which device the input data have been generated. Here, it is worth noting that, although a slightly different physical Hamiltonian may be implemented in the chips of each quantum device from one implementation to another, the variations observed in the measurement outcome distributions, which have a prominent random nature, are not ascribable to such a deterministic aspect, but to a stochastic cause pertaining to an external noise source. However, the same stochastic process can affect differently two equivalent quantum dynamics that originate from two distinct physical Hamiltonian operators. Therefore, the fingerprint that we leverage for the classification can be due not only to differences in the noise profiles affecting the quantum devices, but also to their dependence on the way the testbed circuit is physically implemented.
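The ideal outcome probabilities quoted above can be checked directly from the stated pre-measurement state. The following minimal sketch assumes that the basis labels are ordered as $q_3 q_2 q_1 q_0$ (an assumption, chosen because it reproduces the quoted probabilities); the function name is illustrative.

```python
from collections import defaultdict

# Ideal pre-measurement state quoted in the text, as {basis label: amplitude}.
# Assumption: labels are ordered q3 q2 q1 q0.
amplitude = 0.5
ideal_state = {"0110": amplitude, "0111": amplitude,
               "1001": amplitude, "1100": amplitude}


def marginal_q3_q2(state):
    """Return P(q3, q2) by summing |amplitude|^2 over the ancillas q1, q0."""
    probs = defaultdict(float)
    for label, amp in state.items():
        q3, q2 = int(label[0]), int(label[1])
        probs[(q3, q2)] += abs(amp) ** 2
    # Report all four outcomes, including those with zero probability.
    return {(a, b): probs[(a, b)] for a in (0, 1) for b in (0, 1)}


ideal_probs = marginal_q3_q2(ideal_state)
print(ideal_probs)
```

On a real device, the measured frequencies deviate from these ideal values, and it is these deviations that the classifier exploits.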
While the picture of the implemented quantum dynamics as those of a quantum walker may be useful for illustrative purposes, we stress that this kind of dynamics has been chosen just for its simplicity and generalizability to other types of NISQ devices. Indeed, the results shown in the following are quite general, since they do not depend on the specific dynamics and do not require initial assumptions. Accordingly, we expect that such results may be reproduced in other quantum devices, even ones not necessarily designed to carry out computing tasks.

Machine learning model
Let us provide some details on the adopted ML model, i.e., the popular Support Vector Machine (SVM) [26].
The dataset provided as input to the SVM is a set of $n$ points $x_q \in \mathbb{R}^p$, with $q = 1, \ldots, n$, each living in the $p$-dimensional space of the data features, where a feature is a distinctive attribute of the dataset elements.
In binary classification problems, each $x_q$ with $q = 1, \ldots, n$ is associated with a class $y_q \in \{-1, 1\}$ that represents the desired output of the SVM. In our problem, the binary classes $y_q$ denote whether a given set of points $x_q$ has been generated ($+1$) or not ($-1$) by a specific machine or in a given time window/interval. An SVM for binary classification is trained such that the two classes of points (provided as input to the ML model) are separated by the hyperplane that maximizes the distance between the hyperplane itself and the nearest points of the classes (commonly denoted as the margin). If the points $x_q$ of the dataset are not linearly separable (which is most often the case), then the value of the margin is negative and the points cannot be classified. To circumvent this problem, SVMs employ a mapping into a higher-dimensional space (called feature space) with polynomial or Radial Basis Function (RBF) kernels, which allows for an easy classification as in Fig. 1. The extension to multiclass classification problems is then obtained by associating each $x_q$ with a class that can take multiple values. In our experiments, part of the generated dataset is used as a validation set to choose the best mapping among the kernels: linear (meaning that the data are already linearly separable), polynomial with degree 2, 3 and 4, and RBF. In many cases, the simple linear kernel is enough to successfully perform the classification, but in other cases (e.g., in multiclass classification) the more complex kernels may be beneficial.
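To make the role of the kernels concrete, the sketch below writes the three kernel families in plain Python and shows, on a toy XOR-like set of four points, how an explicit degree-2 map makes non-separable points linearly separable. This is an illustration only, not the training code used for the experiments: the parameters `coef0` and `gamma`, the toy points, and the hand-picked hyperplane are all assumptions introduced here.

```python
import math


def linear_kernel(u, v):
    """Inner product in the original feature space."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """Polynomial kernel; coef0 is an assumed offset, not given in the text."""
    return (linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel; gamma is an assumed width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Toy XOR-like points: no straight line in the plane separates the classes.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [-1, -1, 1, 1]


def degree2_map(x):
    """Explicit map to a higher-dimensional space with one degree-2 term."""
    return (x[0], x[1], x[0] * x[1])


# Hand-picked separating hyperplane w.z + b = 0 in the mapped space.
weights, bias = (1.0, 1.0, -2.0), -0.5
predictions = [
    1 if linear_kernel(weights, degree2_map(x)) + bias > 0 else -1
    for x in points
]
print(predictions == labels)
```

A kernel SVM never builds the map explicitly; it only evaluates the kernel between pairs of points, which is why the kernel type can be treated as a configuration choice selected on the validation set.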
Finally, in our experiments, the classification accuracy is computed by comparing the predictions $\hat{y}$ returned by the ML models with the desired classes $y$ of the test set:
$$\alpha(\hat{y}, y) = \frac{1}{m} \sum_{q=1}^{m} \mathbb{1}\{\hat{y}_q = y_q\},$$
where $m$ is the number of elements of the test set and $\mathbb{1}\{\cdot\}$ is the indicator function such that $\mathbb{1}\{c\} = 1$ if $c$ is true, and $\mathbb{1}\{c\} = 0$ otherwise. In this regard, to clarify the naming convention, we use "training", "validation" and "test" sets to identify three non-overlapping partitions of the data, used respectively to train the model, validate the best parameters, and test the performance on unseen data. In the experiments we randomly select 60% of the data to train the SVM model, 20% to validate different configurations (i.e., the SVM kernel type), and 20% to report the results on unseen data.
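The accuracy and the random 60/20/20 partition described above can be sketched as follows. The function names and the fixed seed are assumptions made for illustration; the actual splitting code is the one in the authors' repository.

```python
import random


def accuracy(predicted, desired):
    """Fraction of predictions equal to the desired classes."""
    if len(predicted) != len(desired):
        raise ValueError("predicted and desired must have the same length")
    matches = sum(1 for p, d in zip(predicted, desired) if p == d)
    return matches / len(desired)


def split_60_20_20(samples, seed=0):
    """Randomly partition samples into non-overlapping train/validation/test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = (60 * len(shuffled)) // 100
    n_val = (20 * len(shuffled)) // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


train, validation, test = split_60_20_20(range(100))
print(len(train), len(validation), len(test))
print(accuracy([1, -1, 1, 1], [1, -1, -1, 1]))
```

The validation partition is the one on which the kernel type would be compared; the test partition is touched only once, to report the final accuracy.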

Experiments description
The results shown below concern three series of ML experiments that use two different datasets, obtained from the IBM quantum chips mentioned above.
In the first two experiment series, the ML models are trained both to discriminate the noise fingerprints of different quantum devices and to identify a time dependence in each of them. Some of the models are trained on the dataset here denoted as FAST, which collects the outcome distributions measured in temporally close executions of the testbed quantum circuit on 7 different IBM quantum machines (i.e., 'Athens', 'Bogota', 'Casablanca', 'Lima', 'Quito', 'Santiago', 'Yorktown'). In these experiments, 20 parallel tasks (the maximum allowed number) are appended to the IBM fair-share queue and, once a task is concluded, another task is immediately added. For each task, the testbed circuit is run 8 000 times for each of the 9 measurement steps, and the outcome probabilities are computed over groups of 1 000 shots out of the total 8 000, yielding 8 different outcome distributions for each of the 9 steps per task.
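As an illustration of how 8 000 shots at one measurement step turn into 8 outcome distributions, the sketch below groups the shots into blocks of 1 000 and computes the relative frequencies of the four $(q_3, q_2)$ outcomes. Taking consecutive blocks is an assumption (the text only states that the probabilities are computed over 1 000 shots among the 8 000), and the shots are synthetic stand-ins drawn from the ideal probabilities, not device data.

```python
import random
from collections import Counter

OUTCOMES = ("00", "01", "10", "11")  # bit pairs (q3, q2)


def outcome_distributions(shots, shots_per_distribution=1000):
    """Split shots into consecutive blocks and return one distribution per block."""
    distributions = []
    last_start = len(shots) - shots_per_distribution
    for start in range(0, last_start + 1, shots_per_distribution):
        block = shots[start:start + shots_per_distribution]
        counts = Counter(block)
        distributions.append(
            [counts[outcome] / shots_per_distribution for outcome in OUTCOMES]
        )
    return distributions


# Synthetic stand-in for the 8 000 shots of one measurement step in one task.
rng = random.Random(0)
shots = rng.choices(OUTCOMES, weights=[0.0, 0.5, 0.25, 0.25], k=8000)

distributions = outcome_distributions(shots)
print(len(distributions))
```

Each of the resulting 4-component distributions is one input element for a given measurement step; a full sequence collects one such distribution for each of the 9 steps.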
Conversely, in the third ML experiment series, we perform a robustness analysis by making the time constraints on the employed datasets stricter. Specifically, in those experiments, and in part of the previous ones, we employ a second dataset, called SLOW, which is composed of measurement distributions extracted from executions on two different quantum machines ('Belem' and 'Quito') and collected more "slowly" than the data in the first dataset. As represented in Fig. 2, more "slowly" means that only one task at a time is appended to the queue and then run, waiting at least 2 minutes after the conclusion of the previous task. Moreover, for each task the testbed circuit is executed 1 000 times for each of the 9 steps, which corresponds to the number of shots used to compute the outcome distributions.
Fig. 2 Elapsed hours to collect all the measurement outcomes on the IBM machines 'Belem' and 'Quito' (solid blue and dashed red lines, respectively) for the dataset SLOW. Each point of the curves, obtained over 1 000 executions of the testbed quantum circuit for each measurement step $k = 1, \ldots, 9$, is associated with the physical (real) time at which the measurement probabilities are computed in a single run. Notice that, compared with the time scale of the vertical axis (y-axis), which is expressed in hours, the computation of the 9 000 executions of each run can be considered practically instantaneous, i.e., on the order of some seconds. Moreover, the anomalous behaviour of the curves after 1 500 runs has to be attributed to the policy of the IBM fair-share queue.
We recall that in each execution, for both the FAST and SLOW datasets, the qubits $q_3$, $q_2$ of the testbed quantum circuit (the full circuit is obtained by repeating 3 times the circuit in Fig. 1) are measured iteratively after each CNOT and Toffoli gate, for a total of 9 outcome probabilities at the consecutive measurement steps $k = 1, \ldots, 9$. Overall, for each machine, we have collected 2 000 sequences of 9 probability distributions built with the measurement outcomes from the qubits $q_3$ and $q_2$ of the testbed circuit. Since each distribution is computed over 1 000 shots, a total of 2 000 000 × 9 single executions have been run on each quantum machine employed to generate the FAST dataset, and similarly for the SLOW one.
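The bookkeeping behind these totals can be made explicit with a few lines of arithmetic. The number of FAST tasks per machine computed below is not stated in the text; it is derived here under the assumption that all 8 distributions produced by each task enter the dataset.

```python
SEQUENCES_PER_MACHINE = 2000      # sequences of 9 distributions per machine
SHOTS_PER_DISTRIBUTION = 1000     # shots used for each outcome distribution
MEASUREMENT_STEPS = 9             # measurement steps k = 1, ..., 9
DISTRIBUTIONS_PER_FAST_TASK = 8   # 8 000 shots per step / 1 000 per distribution

# Executions needed for one measurement step across all sequences.
executions_per_step = SEQUENCES_PER_MACHINE * SHOTS_PER_DISTRIBUTION

# The circuit is regenerated for every step, so the steps multiply the total.
total_executions = executions_per_step * MEASUREMENT_STEPS

# Derived figure (assumption: every distribution of every task is kept).
fast_tasks_per_machine = SEQUENCES_PER_MACHINE // DISTRIBUTIONS_PER_FAST_TASK

print(executions_per_step, total_executions, fast_tasks_per_machine)
```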
As a final remark, let us note that the FAST dataset is employed for the experiments illustrated in section 6 and part of section 7, while the SLOW dataset is used to complete the experiments in section 7 and to perform, in section 8, a robustness analysis at different time scales.

Quantum devices classification
First, we present binary classification experiments. For each pair of IBM machines, an SVM model is trained using the dataset FAST (introduced in section 5) with the aim of identifying on which device the executions of the testbed quantum circuit are run. The inputs of the SVM model are the distributions of the measurement outcomes from qubits $q_3$ and $q_2$ recorded at the discrete measurement steps $k = 1, \ldots, 9$. Specifically, two different kinds of inputs are considered: in the first we take only the outcome distributions measured at the single step $k$ with $k \in [1, 9]$, while in the second we concatenate all the measurement probabilities in the ordered sequences $1, \ldots, k$. Our ML experiments are then performed by taking each of the two types of inputs in turn; we report below the resulting accuracy values for both of them. From the results of our experiments, reported in table 1, we observe that it is sufficient to use only the outcome probabilities corresponding to the first three measurements at $k = 1, 2, 3$ to reach more than 99% accuracy in discriminating all the pairs of tested machines. This implies that, in a realistic deployment scenario, one needs less data than the amount acquired here to reach good classification performance. An additional observation is that the accuracy is not monotonic in $k$ for the classifier that uses single measurement data. This may be because distinguishing the noise fingerprint from a single measurement probability is easier at some measurement steps and harder at others. On the other hand, we also observe that the accuracy steadily increases when the input is the sequence of all outcome distributions up to any measurement step $k$. Hence, we can deduce that, to identify the noise fingerprint of IBM quantum devices, sequences of outcome distributions recorded at more than one measurement step need to be taken into account. This is also the reason why we deem it important to frame the issues addressed in this paper in terms of a noise fingerprint in time rather than single-shot measurements.
Table 2 Classification accuracy, denoted as $\alpha(\cdot)$, of multiclass SVMs trained with the measurement probabilities collected in the dataset FAST. A large number of executions are run on 7 different IBM machines (1st column of the table) at the measurement steps $k \in [1, 9]$ (2nd column). Different inputs are tested: outcome distributions at single steps (3rd column), sequences of measurement probabilities computed on windows of width from 2 to 5 steps before each $k$ (4th to 7th columns), and the sequences of all measurement probabilities obtained from the 1st to the $k$th step (8th column). Finally, the last row of the table reports, for all the values of $k$, the averages of the accuracy values in the rows above; the average of the last column is omitted since the accuracy values therein are calculated on models with different numbers of input measurement steps.
Let us now extend the binary SVM algorithms to multiclass classification problems, in which several quantum devices are simultaneously discriminated. In our experiments, the so-called one-vs-rest strategy is adopted [25], where for $n$ distinct classes we train $n$ different binary classifiers, each discriminating the elements of one class from the others. In particular, our multiclass SVM is trained to identify to which IBM quantum machine, among the 7 that have been used, a given set of measured outcome probabilities (from the testbed quantum circuit) of the FAST dataset belongs. Table 2 reports the test accuracy values returned by the models trained with different input data. As in binary classification, for one kind of input data the model is trained with the outcome distributions obtained at the single step $k$ with $k \in [1, 9]$ (3rd column of table 2), while another set of input data is provided by the concatenated measurement probabilities $1, \ldots, k$ (8th column). Moreover, for the purpose of multiclass classification, further inputs are also adopted: at each step $k$ the model is trained not only with the outcome distributions at the $k$th step, but also with a window of preceding measurement probabilities belonging to $[k - s, k]$, with $s$ an integer. Regarding $s$, the range from 1 (4th column of table 2) to 5 (7th column of table 2) is considered. As in the binary case, the SVM is able to successfully discriminate between the tested machines just by using the measurement outcomes taken in a few measurement steps. While the accuracy obtained with the outcomes at single measurement steps oscillates, the accuracy obtained with the time-ordered sequence increases monotonically. This confirms our previous observations about the need for a time sequence to have a reliable fingerprint. In addition, the models trained with input data on sliding windows allow us to assess the actual need for outcome distributions taken from more than a single measurement step for the classification of the noise fingerprint. In this case, we observe that the accuracy at each step $k$ steadily increases with the size of the set of considered steps, and this also holds for the average of the accuracy values computed over all the measurement steps. It is worth noting that the last column of table 2 expresses a similar strategy, where the single accuracy values are provided as output of the models trained on a window (of increasing dimension) that always spans from the 1st to the $k$th step. In other words, the first accuracy values at the top of columns 3 to 7 correspond to the elements of the last column for $k$ from 1 to 5.
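The three kinds of input vectors discussed above (single step, sliding window, and full prefix up to step $k$) can be sketched as simple slicing operations on a sequence of 9 outcome distributions. The function names are illustrative, the window is parametrized here by its width in steps, and clipping the window at step 1 when $k$ is smaller than the width is an assumption about a detail the text does not specify.

```python
def single_step_features(sequence, k):
    """Outcome distribution at measurement step k (1-based)."""
    return list(sequence[k - 1])


def window_features(sequence, k, width):
    """Concatenate the distributions of the last `width` steps ending at k.

    Assumption: the window is clipped at step 1 when k < width.
    """
    first = max(1, k - width + 1)
    return [p for step in sequence[first - 1:k] for p in step]


def prefix_features(sequence, k):
    """Concatenate all the distributions from step 1 to step k."""
    return window_features(sequence, k, width=k)


# Placeholder sequence: entry 10*j + i marks component i of step j,
# so that the slicing is easy to read (these are not probabilities).
sequence = [[10 * j + i for i in range(4)] for j in range(1, 10)]

print(single_step_features(sequence, 3))
print(window_features(sequence, 3, width=2))
print(len(prefix_features(sequence, 9)))
```

With 4 outcome probabilities per step, the prefix input at $k = 9$ has 36 components, whereas the single-step input always has 4; this growth in input size is consistent with the remark that averages over the prefix column are not comparable across $k$.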
The high level of accuracy (even more than 99%) in carrying out binary and multiclass classification of the IBM quantum machines is evidence of a strong underlying noise fingerprint in the dynamics of NISQ devices. Indeed, this is the key feature that allows one to identify, basically in a deterministic way, from which quantum machine a given set of measurements has been obtained.

Noise fingerprint at different time scales
Since the environment of the IBM quantum devices changes quite often (e.g., the machines are calibrated up to multiple times in an hour), we have slightly modified our experiments to prove also the existence of a noise fingerprint that pertains to the temporal evolution of the chip on which a given quantum circuit is executed.To confirm this hypothesis, we have designed a temporal classification setting that we employ with data from both the FAST and SLOW datasets.
Regarding the experiments using the FAST dataset, two sets of measurement outcome distributions are collected for the machine 'Casablanca', temporally separated from each other by 24 hours. After that, similarly to what was done in the previous experiments, an SVM model is used to discriminate the executions implemented on the IBM device on the first day from the ones performed on the second day. From these experiments, whose results are shown in table 3, we observe that the designed ML algorithms are able to detect a characteristic fingerprint, still induced by the presence of noise sources, in a single quantum device but in measurements separated by a quite long (24 hours) time interval. In such classification tasks, an accuracy of 95% is achieved by the ML models just by taking as input the sequence of outcome distributions at the first measurement steps $k = 1, 2, 3$.
Table 3 Classification accuracy, denoted as $\alpha(\cdot)$, of SVMs, trained with two sets of outcome distributions from the dataset FAST temporally separated by 24 hours, to predict which executions on the IBM machine 'Casablanca' were implemented on the first day and which on the second day. Also in this case, the inputs to the SVMs at the measurement steps $k$ (2nd column of the table) are the outcome distributions at single steps (3rd column) or the sequences of measurement probabilities computed at each $k$ (4th column).

In order to better quantify the evolution in time of the noise fingerprint, we use data from the runs of 'Belem' in the SLOW dataset. With respect to the previous dataset, the data from the runs in SLOW are more evenly distributed in time, so we have split the data into 10 adjacent windows, each containing 200 consecutive runs. Subsequently, the SVM models are trained to classify whether a run has been computed in the first window (from run 1 to run 200) or in another of the remaining 9 windows. From the results in table 4, we observe that it is difficult to distinguish the runs pertaining to the first window from the runs in the adjacent window (i.e., runs from 201 to 400 in the third column), whether we consider as input the single outcome distributions at the $k$th measurement step (top part of table 4) or the sequences of measurement probabilities from step 1 to step $k$ (bottom part). As a matter of fact, we do not reach 90% in either case. Conversely, when we consider the subsequent windows (runs after 400 in the next columns), thus at a greater distance from the first window, the classification task becomes easier.
Fig. 3 Maximum accuracy reached by SVM models trained on sequences of measurement outcomes for all the steps $k = 1, \ldots, 9$ taken from the 'Belem' quantum machine and collected in the dataset SLOW. The model is trained to distinguish the executions in the window of runs from 1 to 200 from the ones in a subsequent window of 200 runs. Initially, the latter is adjacent to the first window; then it is moved by increasing the gap between the two windows. The plotted curve is obtained by drawing the accuracy values for the corresponding gaps, expressed in hours. Note that a gap of 6 hours corresponds to approximately 90 runs.
Analogously to the previous experiments, the single measurement outcomes do not seem to carry enough information on the noise fingerprint, and the classification accuracy depends on the choice of $k$. Instead, when we consider the sequences of outcomes for all the steps, we observe that the noise fingerprint in the first window of runs can be much better distinguished from the corresponding fingerprint in all the subsequent windows, except the neighbouring one. The window from run 1401 to run 1600 seems more challenging to classify than the others. One possible reason is that, as one can see from fig. 2, around run 1500 the policies of the IBM fair-share queue caused a discontinuity in time. This means that the data distribution inside that window has more variance than the data in the other windows, so the ML models may find the data more difficult to classify. However, even in that case the classification accuracy reaches 100% when using the sequence of measurement probabilities for all the steps $k = 1, \ldots, 9$.
In these experiments, the execution time for all the runs in each window is approximately 12 hours (except for the previously discussed window from run 1401 to run 1600). Thus, we can deduce that a time distance of 12 hours between the windows is sufficient to distinguish the noise fingerprint at different times with 100% accuracy. To find the minimum necessary gap in hours, in fig. 3 we report the accuracy reached by an SVM model trained to distinguish the runs in the first window (from run 1 to run 200) of 'Belem' within the SLOW dataset from the runs in another window, with an increasing time gap between them. We observe that, in this case, the noise fingerprint is already distinguishable with an accuracy of 100% after 6 hours. In general, we observe that, even starting from different windows in time and using different window sizes, more than 95% accuracy is reached after a few hours (on the order of one day).
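The windowing used in these temporal experiments can be sketched as index arithmetic on the run numbers. The sketch assumes 1-based run numbering with windows of 200 runs, as in the text; the function names and the choice of $+1$ for the first window and $-1$ for the other are illustrative assumptions.

```python
def run_window(index, window_size=200):
    """Inclusive 1-based run range of window number `index` (starting at 1)."""
    first = (index - 1) * window_size + 1
    return first, first + window_size - 1


def shifted_window(gap_runs, window_size=200):
    """Window of `window_size` runs starting `gap_runs` after the first window."""
    first = window_size + gap_runs + 1
    return first, first + window_size - 1


def first_vs_other_labels(other_window, window_size=200):
    """Label runs in the first window +1 and runs in `other_window` -1."""
    first_lo, first_hi = run_window(1, window_size)
    other_lo, other_hi = other_window
    labels = {run: 1 for run in range(first_lo, first_hi + 1)}
    labels.update({run: -1 for run in range(other_lo, other_hi + 1)})
    return labels


print(run_window(2))          # adjacent window
print(run_window(8))          # window affected by the queue discontinuity
print(shifted_window(90))     # gap of about 90 runs, roughly 6 hours
labels = first_vs_other_labels(shifted_window(90))
print(len(labels))
```

Sweeping `gap_runs` from 0 upwards and training one binary classifier per gap is the kind of procedure that would produce a curve of accuracy against gap such as the one reported in fig. 3.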
Overall, we can thus conclude that a clear temporal dependence of the noise fingerprint is present in our experiments, even when the same quantum machine is taken into account.

Robustness analysis
Finally, we investigate the robustness of the learned fingerprint at different time scales. For this purpose, taking the IBM machines 'Belem' and 'Quito', we temporally order all the executions of the testbed quantum circuit and divide them into 10 distinct windows of 400 consecutive runs, i.e., 200 runs per machine. The elapsed time between runs has already been reported in Fig. 2. In this way, after having generated the SLOW dataset (introduced in Section 5) with 2000 runs per machine, the SVMs are trained to classify on which device, 'Belem' or 'Quito', the testbed quantum circuit has been executed. Specifically, in any experiment designed for the robustness analysis, the ML model is trained over the data collected in a time window of 200 consecutive runs (overall, we consider 10 distinct time windows), and then tested in all the considered time windows, including the one used for the training.
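The train-on-one-window, test-on-all-windows protocol can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the data are synthetic (both machines drift by 0.03 per window and stay 0.10 apart, with bounded uniform noise), and a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library. All function names are hypothetical.

```python
import random

N_WINDOWS, RUNS_PER_MACHINE, N_STEPS = 10, 200, 9  # sizes quoted in the text

def synthetic_window(rng, w):
    """Toy stand-in for one time window: a list of (features, label) pairs.

    Assumed model, not measured data: each run is a vector of 9 values
    centred on a machine-specific base that drifts with the window index.
    """
    data = []
    for label, base in (("belem", 0.30), ("quito", 0.40)):
        for _ in range(RUNS_PER_MACHINE):
            run = [base + 0.03 * w + rng.uniform(-0.02, 0.02)
                   for _ in range(N_STEPS)]
            data.append((run, label))
    return data

def fit_centroids(data):
    """Nearest-centroid classifier (a simple stand-in for the SVM)."""
    model = {}
    for label in {lab for _, lab in data}:
        rows = [x for x, lab in data if lab == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))

def accuracy(model, data):
    return sum(predict(model, x) == lab for x, lab in data) / len(data)

def robustness_matrix(seed=0):
    rng = random.Random(seed)
    windows = [synthetic_window(rng, w) for w in range(N_WINDOWS)]
    models = [fit_centroids(win) for win in windows]
    # Row i: model trained on window i; column j: tested on window j.
    return [[accuracy(m, win) for win in windows] for m in models]

matrix = robustness_matrix()
```

In this toy model the accuracy is perfect on the diagonal and falls to chance level far from it, which only mimics the "fading" intuition; it does not reproduce the measured values or the recovery of accuracy at distant windows. Note also that the diagonal entries here are evaluated on the training data itself; with real data a held-out split would presumably be needed.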
All the obtained results, summarized in Table 5, point out the following peculiar feature. Unsurprisingly, the SVM reaches 100% accuracy in the time window used for the training of the ML model (corresponding to the diagonal of the table), and then, in proximity of the time windows on the diagonal, the accuracy decreases monotonically. This corresponds to the intuition that the machine-related noise fingerprint "fades" with time, due to the evidence, discussed in the previous section, that the noise fingerprint of the IBM quantum devices exhibits a quite prominent time dependence. However, surprisingly, we observe that the accuracy returns to 100% for time windows of runs far from the training one. We conjecture that this counter-intuitive phenomenon may be due either to the periodic calibration of the machines or to the slowdown induced by the fair-share queue. The latter, indeed, may also be observed in the last part of the SLOW dataset in Fig. 2, and is supported by the evidence that, if we restrict the experiment to the runs from 1 to 1000 (i.e., the range where the execution times of the tested machines are more homogeneous, as shown in Fig. 2), the resulting accuracy values decrease with time.

Table 5
Classification accuracy of SVMs trained to classify on which quantum device, 'Belem' or 'Quito', a given set of data has been generated. The training of the models is performed with the outcome distributions collected in the dataset SLOW, divided into 10 distinct time windows of 200 runs (the first window includes the runs from 1 to 200, the second from 201 to 400, etc.). We recall that each run contains the outcomes from all the 9 measurement steps in each execution. The row and column indexes denote, respectively, the time windows whose data are used to train and test the ML model. Finally, the reported accuracy values are calculated by using the outcome distributions computed at all the measurement steps k = 1, . . ., 9.

Train \ Test      1      2      3      4      5      6      7      8      9     10
 1            1.000  1.000  0.995  0.925  0.880  0.865  0.995  1.000  1.000  1.000
 2            1.000  1.000  0.995  0.925  0.920  0.910  0.980  1.000  1.000  1.000
 3            1.000  1.000  1.000  0.970  0.950  0.950  0.980  1.000  1.000  1.000
 4            1.000  0.980  0.995  1.000  1.000  1.000  1.000  1.000  1.000  1.000
 5            0.980  0.935  0.955  0.995  1.000  0.995  1.000  1.000  1.000  1.000
 6            0.995  0.995  0.995  1.000  1.000  1.000  1.000  1.000  1.000  1.000
 7            1.000  1.000  0.995  0.985  1.000  0.990  1.000  1.000  1.000  1.000
 8            1.000  1.000  0.995  0.995  1.000  0.990  0.995  1.000  1.000  1.000
 9            1.000  1.000  0.995  0.995  0.970  0.960  1.000  1.000  1.000  1.000
10            1.000  1.000  0.995  0.995  0.995  0.995  0.995  1.000  1.000  1.000
The general result that can be deduced from the experiments of the robustness analysis is that, by training our ML model on just 200 runs (corresponding to the diagonal time windows of the table), we are able to identify the device-related noise fingerprint with high accuracy for all the 1800 remaining ones. In this regard, it is worth noting that up to a week of real execution time elapses between the training samples and the last test ones (as one can see in Fig. 2). This means that we can consider our classifier to be fairly robust in time, despite the changes in the environment and calibration of the machines that might occur even at time scales of weeks.
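The "high accuracy" statement can be checked directly against the reported numbers. The short sketch below transcribes the accuracy values of Table 5 and extracts a few summary figures; the function name `summarize` and the 0.95 threshold are illustrative choices, not quantities defined in the paper.

```python
# Accuracy values transcribed from Table 5
# (rows: training window, columns: test window).
TABLE5 = [
    [1.000, 1.000, 0.995, 0.925, 0.880, 0.865, 0.995, 1.000, 1.000, 1.000],
    [1.000, 1.000, 0.995, 0.925, 0.920, 0.910, 0.980, 1.000, 1.000, 1.000],
    [1.000, 1.000, 1.000, 0.970, 0.950, 0.950, 0.980, 1.000, 1.000, 1.000],
    [1.000, 0.980, 0.995, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000],
    [0.980, 0.935, 0.955, 0.995, 1.000, 0.995, 1.000, 1.000, 1.000, 1.000],
    [0.995, 0.995, 0.995, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000],
    [1.000, 1.000, 0.995, 0.985, 1.000, 0.990, 1.000, 1.000, 1.000, 1.000],
    [1.000, 1.000, 0.995, 0.995, 1.000, 0.990, 0.995, 1.000, 1.000, 1.000],
    [1.000, 1.000, 0.995, 0.995, 0.970, 0.960, 1.000, 1.000, 1.000, 1.000],
    [1.000, 1.000, 0.995, 0.995, 0.995, 0.995, 0.995, 1.000, 1.000, 1.000],
]

def summarize(table, threshold=0.95):
    """Summarize a train-window x test-window accuracy table."""
    cells = [(acc, i + 1, j + 1)
             for i, row in enumerate(table) for j, acc in enumerate(row)]
    off_diag = [acc for acc, i, j in cells if i != j]
    return {
        # (accuracy, training window, test window) of the lowest entry
        "worst": min(cells),
        "n_off_diagonal": len(off_diag),
        "n_above_threshold": sum(acc >= threshold for acc in off_diag),
        "diagonal_all_one": all(table[k][k] == 1.0 for k in range(len(table))),
    }

summary = summarize(TABLE5)
```

The lowest entry is 0.865 (training on window 1, testing on window 6), and 83 of the 90 off-diagonal entries are at or above 0.95.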

Conclusions
In this work we prove the existence of a noise fingerprint, also admitting a clear time-dependent profile, in the tested IBM quantum machines, which are just a particular class of NISQ devices. We have also demonstrated that such noise fingerprints can be exploited to reliably distinguish the machines by means of SVM models. As general results, our experiments confirm that (i) all the analysed quantum devices exhibit a clear machine-related noise fingerprint that is robust, in the sense that the fingerprint is highly predictable over time in windows of consecutive runs; (ii) the noise fingerprint also has a time dependence, namely it changes over time and after a few hours becomes different enough to be distinguished from the fingerprint in the past; (iii) in each quantum device, sequences of measurement outcome distributions are required for the accurate learning of the corresponding noise fingerprint. One may conjecture that a possible reason behind the latter aspect is that the noisy dynamics in the IBM machines can be non-Markovian, due to the presence of time correlations among consecutive samples of the noise field. However, it is worth observing that the SVMs we successfully used in this work are memory-less ML models, which thus ignore possible temporal relations across the measurement steps. Therefore, the gathered data and the adopted ML models are not suited to validating any hypothesis on non-Markovianity. These aspects, deserving further investigation, will be addressed in another contribution in which memory-less ML models will be compared with other ML architectures that process time series data with variable memory length. In conclusion, although the microscopic reasons for the existence of a machine-related noise fingerprint are still unknown (indeed, the IBM machines are partly inaccessible), we can now affirm that one can reliably leverage such noise profiles to distinguish, and possibly in the future characterize, different NISQ quantum devices.
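The sense in which an SVM ignores the temporal ordering of the measurement steps can be illustrated with a small sketch. It assumes a Gaussian (RBF) kernel, which the text does not specify, and uses random vectors in place of measured outcome probabilities. Since the kernel depends only on the squared distance between two input vectors, applying the same reordering of the 9 steps to both inputs leaves its value unchanged.

```python
import math
import random

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def reorder(x, perm):
    """Return x with its entries rearranged according to perm."""
    return [x[i] for i in perm]

rng = random.Random(0)
# Hypothetical 9-step sequences standing in for two runs.
run_a = [rng.random() for _ in range(9)]
run_b = [rng.random() for _ in range(9)]

# One fixed shuffling of the step order, applied to every run.
perm = list(range(9))
rng.shuffle(perm)

k_ordered = rbf_kernel(run_a, run_b)
k_shuffled = rbf_kernel(reorder(run_a, perm), reorder(run_b, perm))

# Equal up to floating-point rounding of the reordered sum.
same = math.isclose(k_ordered, k_shuffled, rel_tol=1e-12)
```

A kernel machine built on such a kernel would therefore give the same predictions if the measurement steps were consistently relabelled, which is why it cannot by itself probe temporal correlations across steps.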
As an outlook, learning the noise fingerprint of quantum devices from time-ordered measurements of testbed quantum circuits is expected to open the way, in the near future, to many other experiments and ideas. The proposed methodology, indeed, may be applied not only to IBM quantum machines, but also to a larger class of quantum devices, in both commercial and laboratory scenarios. In all of them, classification ML models, exploiting the presence of intrinsic noise sources that give rise to an identifiable noise fingerprint in the devices, may be employed to predict on which machine, and at which time, a given quantum circuit or algorithm was executed. Moreover, our procedures could be adopted to predict if and when the noise fingerprint of a specific quantum device changes over time, e.g., due to calibration actions. Such knowledge will help in mitigating (time-varying) errors occurring in the computation and, possibly, in performing ad hoc error corrections.

Fig. 3
Maximum reached accuracy for SVM models trained on sequences of measurement outcomes for all the steps k = 1, . . ., 9 taken from the 'Belem' quantum machine and collected in the dataset SLOW. The model is trained to distinguish the executions in the window of runs from 1 to 200 from the ones in a subsequent window of 200 runs. Initially, the latter is adjacent to the first window, then it is moved by increasing the gap between the two windows. The plotted curve is then obtained by drawing the accuracy values for the corresponding gaps, expressed in hours. Note that a gap of 6 hours corresponds to approximately 90 runs.

Table 1
Classification accuracy, denoted as α(•), of all the possible binary SVMs trained with the measurement probabilities collected in the dataset FAST. For each experiment, a large number of executions are run on two different IBM machines (whose names are in the 1st column and in the 1st row of the table) at the measurement steps k (1st column of each sub-table). Then, two different inputs are tested: outcome distributions at single steps (whose accuracy values are in the 2nd column of the sub-tables) and sequences of measurement probabilities obtained up to each k (accuracy values in the 3rd column of each sub-table).

Table 4
Binary classification accuracy, denoted as α(•), of SVMs trained to classify the outcome distributions belonging to two distinct sets of data. One set is composed of the runs of 'Belem' in the SLOW dataset numbered from 1 to 200 in temporal order. The other set is also composed of runs of 'Belem' in the SLOW dataset, but collected within the temporal windows specified in the column titles (from run 201 to run 400, from run 401 to run 600, etc.). In the top sub-table the models are trained with the outcome distributions taken at the kth measurement step, while in the bottom sub-table the inputs are the sequences of measurement probabilities from step 1 to step k.