1 Introduction

The field of quantum computing has experienced rapid growth in recent years, both in the number of quantum computing hardware providers and in their respective processors’ computing power. Companies such as D-Wave Systems, Rigetti, and IBM offer access to their quantum processors, and their use in proof-of-concept demonstrations has been widely discussed in the literature. Quantum processing units (QPUs) have been used to solve a wide variety of problems such as traffic flow (Neukart et al. 2017), logistics and scheduling (Venturelli et al. 2015; Stollenwerk et al. 2020), quantum simulation (Streif et al. 2019; McCaskey et al. 2019; Grimsley et al. 2019), and more (Venturelli and Kondratyev 2019; Nishimura et al. 2019). Notably, a recent study by Google showed how their QPU can perform the task of sampling from random quantum circuits faster than state-of-the-art classical software (Arute et al. 2019), ushering in a new era in the field of quantum computing. These applications use so-called noisy intermediate-scale quantum (NISQ; Preskill 2018) processors to solve various forms of optimization and sampling problems. Most commonly, the problem is formulated as a quadratic unconstrained binary optimization (QUBO) problem, or its equivalent form of an Ising Hamiltonian. The former uses a basis of binary {0,1} variables, and the latter makes use of spin variables {−1,1}. Both can be solved using existing quantum computing hardware.
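The two formulations are interchangeable through the standard change of variables between binary and spin values (a textbook identity, noted here only for completeness):

$$ s_{i} = 2x_{i} - 1, \qquad x_{i} = \frac{1 + s_{i}}{2}, $$

so that any QUBO objective can be rewritten as an Ising Hamiltonian with local fields \(h_{i}\) and couplings \(J_{ij}\), up to an additive constant.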

The QPUs provided by D-Wave Systems use a quantum annealing algorithm that implements a transverse-field Ising Hamiltonian (Johnson et al. 2011). This quantum protocol prepares an initial Hamiltonian with a simple ground state, and transitions to a Hamiltonian whose ground state is difficult to find. This is referred to as Adiabatic Quantum Computation (AQC) (Van Dam et al. 2001), and under open quantum system conditions as quantum annealing (Kadowaki and Nishimori 1998). Because AQC has been shown to be polynomially equivalent to gate-based quantum computation (Aharonov et al. 2008), and minimizing the Ising spin glass has been shown to be NP-hard (Barahona 1982), AQC (and quantum annealing) has the potential to significantly impact the fields of optimization, machine learning, and operations research. Similarly, on gate-model QPUs such as those produced by Google, IBM, and Rigetti, the quantum approximate optimization algorithm (QAOA) is used to solve such Ising Hamiltonians. This algorithm also attempts to minimize a target Ising Hamiltonian, by alternating between a driver and a mixer Hamiltonian until the sampling procedure converges to the target state population. The derivation and details of the QAOA algorithm are beyond the scope of this paper, and are discussed in detail in Farhi et al. (2014).

At the end of a quantum annealing run, or a QAOA circuit execution, the measurements are a projection along the z-component of the qubits’ spins, resulting in a sequence of classical bit strings. These states can be interpreted as approximations to finite-temperature Boltzmann states from the classical spin-glass Ising Hamiltonian (Raymond et al. 2016; Verdon et al. 2017):

$$ H(s) = \sum_{i} h_{i} s_{i} + \sum_{i<j} J_{ij} s_{i} s_{j}. $$
(1)
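As a minimal illustration of Eq. (1), the sketch below (our own, not tied to any particular library) evaluates the classical Ising energy of a measured spin configuration:

```python
import numpy as np

def ising_energy(h, J, s):
    """Classical Ising energy H(s) = sum_i h_i*s_i + sum_{i<j} J_ij*s_i*s_j.

    h: length-n array of local fields.
    J: dict {(i, j): J_ij} with i < j, the pairwise couplings.
    s: length-n array of spins in {-1, +1}, e.g. a projective measurement
       with bits mapped to spins via s = 2*bit - 1.
    """
    energy = float(np.dot(h, s))
    for (i, j), coupling in J.items():
        energy += coupling * s[i] * s[j]
    return energy

# Example: two antiferromagnetically coupled spins with no local fields.
print(ising_energy(np.zeros(2), {(0, 1): 1.0}, np.array([1, -1])))  # -1.0
```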

The task of programming a quantum annealer or a QAOA circuit involves finding a suitable Ising Hamiltonian representation of the optimization task. In this paper, we motivate classification of time series (TS) data based on extracting features that exist within the data, and use combinatorial optimization techniques to match and reconstruct data with other time series. We start by reducing the dimensionality of the TS data and encoding it as a string. We introduce a pulling procedure that compares the encoded strings to form a collection of sets in which common features between the strings are extracted. All extracted common features are pooled together, and can then be used to construct new TS and compare existing ones. We perform these tasks by using the pulled features as elements of the universe in the set cover problem, which has a known QUBO/Ising formulation and can be solved using quantum computers. By reformulating the critical task in our clustering algorithm as a set cover problem, we introduce two novel ideas to quantum clustering algorithms: (1) we avoid representing single vectors with polynomial numbers of qubits, instead representing the features within the data as the qubits, and (2) we perform the clustering task by transferring the core concepts of clustering (and reconstruction) to the quantum algorithm for set cover, as opposed to a direct translation of a distance-based minimization procedure. This results in an algorithm that avoids a classical “learning” procedure, thereby requiring significantly fewer computational resources compared to other classical and quantum methods.

The rest of this paper is organized as follows. Section 2 provides a short overview of existing methods for both classical and quantum clustering. Section 3 motivates the task of TS reconstruction, explains the methods used to discretize the data, and how to convert the discretized data to the set cover problem and its representative QUBO. Section 4 shows how to extend the reconstruction method to classify the TS data. Section 5 outlines the experimental setup used in this analysis to test the developed method using various open-source data sets, and Section 6 reviews the results from those experiments. Section 7 presents the conclusions from our work and outlines future research in this area.

2 Previous works

Quantum computing-based approaches in the literature differ fundamentally from the one introduced in this work; we provide a brief overview of some key methods and algorithms related to quantum clustering. Assuming the existence of error-corrected quantum processors (and the existence of quantum RAM), it has been shown that quantum computers could perform k-means clustering exponentially faster than their classical counterparts (Lloyd et al. 2013). Other works have also shown how to reformulate parts of classical clustering algorithms as quantum subroutines that can be executed on error-corrected gate-model QPUs (Alexander et al. 2018; Aimeur et al. 2013; Wiebe et al. 2015; Horn and Gottlieb 2001). In quantum annealing, a similar approach has been shown in which the objective function of the clustering task (minimizing distance metrics between high-dimensional vectors) is directly translated to a QUBO, with each vector’s possible assignment represented via one-hot encoding to physical qubits (Kumar et al. 2018; Neukart et al. 2018).

Classical time series analysis is considered a challenging task due to the high number of dimensions involved, resulting in the curse-of-dimensionality phenomenon. A series of works address the question of efficient dimensionality reduction (Keogh et al. 2005; Lin et al. 2003; Lin et al. 2002; Senin et al. 2018; Schäfer and Högqvist 2012; Patel et al. 2002; Guo et al. 2010; Xiaodong et al. 2002), explaining the trade-off between information loss and search space size. The main results presented in this manuscript are obtained with the symbolic Fourier approximation (SFA) method (Schäfer and Högqvist 2012) due to its pruning power, noise robustness, and scalability. SFA represents each real-valued TS in the frequency domain by a symbolic string using the discrete Fourier transform. These transformed TS can then be used by classical string-based similarity algorithms such as phonetic-distance-based, Levenshtein, Hamming, Jaro, and Jaro-Winkler measures, among others (Gomaa et al. 2013).

Classical TS clustering techniques can be split into the following categories: model-based, feature-based, shape-based, and their combinations (Aghabozorgi et al. 2015). In the model-based approach, the TS is encoded and fit by parametric models, and clustering is applied to the extracted parameters (Liao 2005). In feature-based methods, features of the TS, such as Fourier components, periodicity, trend, number of peaks, and variance, are extracted and later clustered by conventional algorithms (Hautamaki et al. 2008; Christ et al. 2018). Shape-based approaches compare the shapes of TS directly and match them according to specifically chosen metrics. A typical example of this approach is Dynamic Time Warping (DTW) (Sakoe and Chiba 1978), which has been shown to outperform Euclidean metrics (Chu et al. 2002; Vlachos et al. 2002). DTW-based classical methods are used to evaluate the accuracy of our approach in Section 6. For more details on classical approaches to TS clustering, we refer the reader to Gonzalez et al. (2014), Fu (2011), and Aghabozorgi et al. (2015).

3 Time series reconstruction: problem formulation

Clustering techniques generally require specific data representation, similarity measure definitions, and clustering algorithm selection. Similarly, in our quantum computing-based approach, we represent the TS data as encoded strings, from which we formulate semi-supervised clustering and optimal reconstruction as a set cover problem, and provide metrics based on solutions to the set cover problem. While different from classical approaches (Iwama et al. 2018; Acharya et al. 2010; Frieze et al. 1999; Skiena and Sundaram 1995), we preserve the computational complexity of the problem, while introducing a method that is based on latent features within the data.

In order to reconstruct given time series data, we start by discretizing the data and comparing the encoded strings to generate the elements of our universe for the set cover problem. This pulling technique is crucial, as it allows feature-wise comparison of the data, as well as arbitrary reconstruction of a TS using existing (or training) data. We use existing techniques for discretization, and explain the pulling procedure in detail. We then show how to use this data to construct the set cover problem for quantum optimization.

3.1 Discretization and pulling technique

There are many ways to discretize time series data, as reviewed previously. For our purposes, we use the symbolic Fourier approximation (SFA) method (Schäfer and Högqvist 2012), as it provides differentiation between separate TS classes and features in high-dimensional data sets, allowing us to use these representative symbols for our quantum algorithm. Nevertheless, the exact discretization is data-dependent, with various hyperparameters (such as the number of letters in the alphabet and the length of each encoded string) present in the method; for a full explanation we refer the reader to Schäfer and Högqvist (2012). Given the encoded strings, we compare the time series using the following pulling procedure, illustrated in Fig. 1. This pair-wise comparison is a preprocessing step necessary to formulate our set cover problem. Starting with one fixed string (red in the figure), we consider each encoded character as an independent element in the universe set (\(U = \{0,1,2,3,4\}\) in the figure). A second string (green in the figure) is compared element-wise by successively moving the second string along the first, as illustrated. At every iteration, all character matches between the two strings are recorded as a new set. In the example from Fig. 1, the set of sets is \(V = \{ \{0\}, \{\emptyset\}, \{0,2\}, \{\emptyset\}, \{1,2,3\}, \{\emptyset\}, \{\emptyset\}, \{3\}, \{\emptyset\} \}\).

Fig. 1

Schematic illustration of the TS encoding and pulling procedure to produce subsets of the set \(V = \{ \{0\}, \{\emptyset\}, \{0,2\}, \{\emptyset\}, \{1,2,3\}, \{\emptyset\}, \{\emptyset\}, \{3\}, \{\emptyset\} \}\). The optimal selection to cover \(U = \{0,1,2,3,4\}\) in this case would be the underlined subsets \(\{\{0\}, \{1,2,3\}\}\), with item numbers 0 and 4

The procedure is repeated for the rest of the encoded training TS to form the set of sets V. This set, which is the union of all subsets obtained via the pulling technique, represents the features in common between the target time series and all other time series in the data set. Given this aggregate set, the goal is now to select the minimal collection of subsets that most closely reconstructs the universe, which is the NP-hard set cover problem. In the case illustrated in Fig. 1, the optimal selection of subsets is underlined in red. In principle, solutions of this set cover problem do not preserve the order of elements, and allow the use of the same element multiple times. This feature is useful for TS comparison, as elements of the time series data can be permuted and duplicated without affecting our reconstruction method. A minimal sketch of the pair-wise pulling step is given below.
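The sketch below is one possible reading of this pulling step (an illustration of ours, not the authors' code); for two words of length 5 it produces nine candidate subsets, as in the example of Fig. 1.

```python
def pull(target, other):
    """Slide `other` along `target` and record, at each offset, the set of
    target positions whose characters match. Positions index the universe
    U = {0, ..., len(target) - 1}; offsets with no matches yield empty sets."""
    n, m = len(target), len(other)
    subsets = []
    # Offsets run from `other` overhanging the left edge to overhanging the right.
    for offset in range(-(m - 1), n):
        matches = {i for i in range(n)
                   if 0 <= i - offset < m and target[i] == other[i - offset]}
        subsets.append(matches)
    return subsets

def build_V(target, training_words):
    """Collect all subsets obtained by pulling every training word past `target`."""
    V = []
    for word in training_words:
        V.extend(pull(target, word))
    return V

# Tiny usage example with two hypothetical encoded words of length 5.
print(pull("ABCDE", "ABFDE"))
```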

3.2 Formulating the set cover problem

Given the encoded strings and the common set of features V, we can now formulate the set cover problem as a QUBO, following the method demonstrated in Lucas (2014). Consider the universe set \(U=\{1, \ldots, n\}\) and a set of subsets \(V_{i}\), such that \(U=\bigcup_{i=1}^{N} V_{i}\), \(V_{i} \subseteq U\). Finding the smallest number of subsets \(V_{i}\) whose union is U is a well-known NP-hard optimization problem in the worst case (Karp 1972). In order to map the set cover problem to a QUBO problem, we use the following binary variables:

$$ x_{i}=\left\{ \begin{array}{ll} 1, & \text{if set }\ i \text{ is included}, \\ 0, & \text{otherwise,} \end{array}\right. $$
(2)

and

$$ x_{\alpha,m}=\left\{ \begin{array}{ll} 1, & \text{if the number of}\ V_{i} \text{ which include element } \alpha \text{ is equal to } m, \\ 0, & \text{otherwise.} \end{array}\right. $$
(3)

Here, \(\alpha \in U\) denotes an element of the universe set, and the index m signifies that element \(\alpha\) appears in exactly m of the selected subsets. We consider the full QUBO as a sum of two components:

$$ H_{A}=A\sum_{\alpha=1}^{n}\left( 1-\sum_{m=1}^{N} x_{\alpha,m}\right)^{2}+A\sum_{\alpha=1}^{n}\left( \sum_{m=1}^{N} m x_{\alpha,m}-\sum_{i:\alpha\in V_{i}} x_{i}\right)^{2}, $$
(4)

and

$$ H_{B}=B\sum_{i=1}^{N}x_{i}. $$
(5)

The complete QUBO is given by \(H = H_{A} + H_{B}\) (Lucas 2014). The first summation in \(H_{A}\) imposes, via a one-hot encoding, that exactly one of the \(x_{\alpha,m}\) must be selected in the minimum. The second term in \(H_{A}\) enforces that the number of selected subsets in which \(\alpha\) appears is equal to m (since only one \(x_{\alpha,m}\) can be 1 in the minimum). The final term \(H_{B}\) (5) serves to minimize the number of \(V_{i}\) needed to cover the universe U. The total number of variables required is \(N + n(1 + M)\), where M is the maximal number of sets that contain a given element of U (see Lucas (2014) for details). The limiting case, in which each included subset \(V_{i}\) covers only one element of U, constrains the coefficients of \(H_{A}\) and \(H_{B}\) to 0 < B < A. The closer B is to A, the more weight is given to (5), minimizing the number of elements selected from V.
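As a purely illustrative sketch of this construction, the following function builds the QUBO of Eqs. (4) and (5) as a dictionary of coefficients; the variable labels and the helper name set_cover_qubo are our own choices rather than part of any published implementation.

```python
from collections import defaultdict
from itertools import combinations

def set_cover_qubo(U, V, A=2.0, B=1.0):
    """Build the set cover QUBO H = H_A + H_B of Eqs. (4)-(5) as a dict
    {(label_1, label_2): coefficient}, dropping the constant offset A*n.

    Labels: 'x_i' selects subset V[i]; 'y_a_m' is 1 iff element a of U is
    covered by exactly m selected subsets (m = 1..M). Requires 0 < B < A.
    """
    Q = defaultdict(float)

    def add(u, v, w):
        Q[(u, v) if u <= v else (v, u)] += w

    M = max(sum(1 for Vi in V if a in Vi) for a in U)  # max coverage of any element

    for a in U:
        ys = [(f"y_{a}_{m}", m) for m in range(1, M + 1)]
        xs = [(f"x_{i}", -1) for i, Vi in enumerate(V) if a in Vi]

        # First term of H_A: A * (1 - sum_m y_{a,m})^2, a one-hot constraint.
        for y, _ in ys:
            add(y, y, -A)                       # A*y^2 - 2A*y = -A*y for binary y
        for (y1, _), (y2, _) in combinations(ys, 2):
            add(y1, y2, 2 * A)

        # Second term of H_A: A * (sum_m m*y_{a,m} - sum_{i: a in V_i} x_i)^2.
        terms = ys + xs                         # (label, coefficient) pairs
        for v, c in terms:
            add(v, v, A * c * c)
        for (v1, c1), (v2, c2) in combinations(terms, 2):
            add(v1, v2, 2 * A * c1 * c2)

    # H_B: B * sum_i x_i, penalizing the number of selected subsets.
    for i in range(len(V)):
        add(f"x_{i}", f"x_{i}", B)

    return dict(Q)
```

Empty subsets produced by the pulling step can be discarded before building the QUBO, since they can never contribute to a cover.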

In our application of time series reconstruction, the final size of the QUBO depends heavily on the choices made during discretization. For example, the number of binary variables is equal to \(N_{\text{train TS}}(2L-1)(L+1)\), where \(N_{\text{train TS}}\) is the number of TS in the training set used for reconstruction, and L is the length of the string that encodes each TS. Increasing the string length used to encode each TS changes the size of the universe U, and allowing longer encoded strings to represent the data creates more subsets \(V_{i}\). Therefore, there exists a trade-off between the granularity of the encoded strings and the ability to solve the set cover representation of the problem. Including more characters in the discretization alphabet changes the number of non-empty sets \(V_{i}\), on which the number of quadratic terms in the QUBO depends. The general trend, however, is that the number of quadratic terms decreases as the number of characters in the alphabet increases. This is explained by the properties of the pulling procedure described above, since a smaller alphabet produces more non-empty sets \(V_{i}\) which could be used for reconstruction of the universe U. In Fig. 2 we show how varying these hyperparameters of the discretization affects the size of the QUBO problem, based on 20 test samples from the BeetleFly data set (Hills et al. 2014).
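As a quick illustration of this scaling, using only the variable-count formula quoted above (the training-set size and word length here are hypothetical):

```python
def num_qubo_variables(n_train_ts, L):
    """Number of binary variables, N_trainTS * (2L - 1) * (L + 1), as quoted above."""
    return n_train_ts * (2 * L - 1) * (L + 1)

# Hypothetical example: 20 training TS encoded as words of length 5.
print(num_qubo_variables(20, 5))   # 20 * 9 * 6 = 1080
```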

Fig. 2

a The number of quadratic terms (in millions) as a function of string and alphabet length. b Quadratic terms as a function of alphabet length, with the string length fixed to 6. c Quadratic terms as a function of string length, with the alphabet length fixed to 6. The corresponding isolines (b, c) are shown with dashed lines on the surface plot (a). The analysis was performed using 20 test samples from the BeetleFly data set (Hills et al. 2014)

4 Semi-supervised classification

We can now combine the methods described in the previous sections (constructing the universe set U from the discretized data and the subsets V) to perform semi-supervised clustering. We start by separating the input TS data into two groups: training and test data. In our case, we use training data sets with known labels, and the task we solve is to use the labeled data to assign labels to the test set. Normally, the labeled training set is significantly smaller than the unlabeled test set, which we exploit in our method.

We encode both the training and test data sets into strings using the method described in Section 3.1. We then perform the reconstruction procedure for every TS in our test set using the entire training set. Each TS from the test set is assumed to individually form a universe U, and is reconstructed using the sets \(V_{i}\) obtained via the pulling procedure. Explicitly, in terms of Fig. 1, the red string is the TS from the test data set, and all strings in the training set are pulled through (green strings) to obtain the \(V_{i}\)’s. This allows us to compare every test TS to the full training set in a one-versus-all manner. Then, using the universe U and the \(V_{i}\)’s from the pulling procedure, we formulate the set cover problem outlined in Section 3.2. Thus, a single solution to that set cover problem (even a sub-optimal one in the worst case) allows us to reconstruct each TS from the test set using a set of discretized features obtained from all elements that appear in the training set. Furthermore, since annealing-based sampling methods produce finite-temperature Boltzmann distributions (Raymond et al. 2016), different optima of the set cover problem could yield different ways to reconstruct the test TS using the training set. It is therefore the user’s task to use these reconstructions to associate each test TS with a label from the training set. We outline the steps of our classification procedure using pseudo-code in Algorithm 1.

Algorithm 1 Pseudo-code of the semi-supervised classification procedure
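Since the published pseudo-code is not reproduced here, the following Python sketch outlines the procedure as described in the text. It reuses the illustrative pull and set_cover_qubo helpers from the earlier sketches; sfa_encode, solve_qubo, and assign_label are hypothetical placeholders for the discretization step (Section 3.1), the QUBO solver (Section 5.2), and the similarity metrics (Section 4, sketched after the metric definitions below).

```python
def classify_test_set(train_ts, train_labels, test_ts):
    """Semi-supervised classification via set cover reconstruction (sketch)."""
    train_words = [sfa_encode(ts) for ts in train_ts]      # discretize training TS
    predictions = []
    for ts in test_ts:
        target = sfa_encode(ts)                            # discretize test TS
        U = set(range(len(target)))                        # universe: word positions
        V, owners = [], []                                 # subsets and source TS index
        for idx, word in enumerate(train_words):
            for subset in pull(target, word):
                if subset:                                 # discard empty matches
                    V.append(subset)
                    owners.append(idx)
        chosen = solve_qubo(set_cover_qubo(U, V))          # indices of selected subsets
        predictions.append(assign_label([V[k] for k in chosen],
                                        [owners[k] for k in chosen],
                                        train_labels))
    return predictions
```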

To classify the reconstructed test TS data we evaluated three different similarity metrics based on set cover solutions: the largest common subset \(V_{i}\), the highest number of common subsets \(V_{i}\), and the largest sum of common elements in the selected \(V_{i}\). We briefly explain how each metric is calculated, and discuss the performance of each.

  • Largest common subset. Given a candidate solution to the set cover problem, the label corresponding to the \(V_{i}\) which contains the most elements is selected. The label is then assigned to the test TS. This metric captures the longest continuous set of features from the training TS data, and assumes that this is sufficient to determine the label.

  • Number of common subsets. Frequently, multiple \(V_{i}\)’s from the same training TS are used to reconstruct a test TS. In this metric, we count the number of \(V_{i}\) subsets used to cover the universe. The test TS is assigned the same label as the training TS which appears most frequently in the set cover solution.

  • Largest sum of subsets. This metric is a combination of the previous two. For every training TS used to reconstruct a test TS, the total number of elements it contributes is counted (summed over all of its selected \(V_{i}\)’s). The label which corresponds to the training TS with the largest sum is assigned to the test TS.

These metrics allow us to quantify the accuracy of our semi-supervised clustering algorithm. The first two metrics, being based on large sets of common features between the TS, performed the best (results are shown in the next section). There was no significant difference between the two, and the superiority of one metric over the other varied between data sets. The third metric, which is a combination of the first two, performed worse than either of the first two metrics in the majority of the cases tested. While unexpected at first, this observation could be explained by the fact that the third metric admits matches with many small subsets \(V_{i}\) selected in the set cover, and could therefore miss significant signatures present in the TS data. The largest common subset metric was selected for the experiments presented in the next section. It should also be noted that our use of labeled training data is not designed to reach the accuracy of supervised learning methods. Moreover, there are modifications that could be made to the methods presented to improve the accuracy, for example increasing the word length and/or using a larger training set. Both are constrained in our use case to prevent excessively large QUBOs from being constructed. The goal of this method, as described, is to allow for relatively high accuracy using small sets of training data.
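To make the three metrics concrete, the sketch below (our own illustration, with hypothetical data structures) assigns a label from a set cover solution. Here selected is the list of chosen subsets and owners maps each chosen subset back to the training TS that produced it, as in the earlier sketches.

```python
from collections import Counter

def assign_label(selected, owners, train_labels, metric="largest_common_subset"):
    """Assign a label to a test TS from the subsets chosen by the set cover.

    selected: list of sets (the chosen V_i's, empty subsets already discarded).
    owners:   owners[k] is the index of the training TS that produced selected[k].
    train_labels: labels of the training TS.
    """
    if metric == "largest_common_subset":
        # Label of the training TS contributing the single largest subset.
        k = max(range(len(selected)), key=lambda k: len(selected[k]))
        return train_labels[owners[k]]
    if metric == "number_of_common_subsets":
        # Label of the training TS contributing the most chosen subsets.
        return train_labels[Counter(owners).most_common(1)[0][0]]
    if metric == "largest_sum_of_subsets":
        # Label of the training TS contributing the most elements in total.
        totals = Counter()
        for subset, owner in zip(selected, owners):
            totals[owner] += len(subset)
        return train_labels[totals.most_common(1)[0][0]]
    raise ValueError(f"unknown metric: {metric}")
```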

We provide an illustrative example of our QUBO-based reconstruction and classification in Fig. 3 using the BeetleFly data set. The task is to reconstruct the data in Fig. 3a using (b) and (c). For this example, an alphabet of size 5 was used for encoding, color-coded in the figure. The results of the set cover problem, formulated using the methods explained in the previous sections, are three sets, shown as \(v_{1}\), \(v_{2}\), and \(v_{3}\) in Fig. 3. That is, each box (each representing a fifth of the TS data) that appears in one of the subsets forming the solution is marked accordingly. Specifically, \(v_{1}\) = [‘A’, ‘E’], \(v_{2}\) = [‘E’, ‘B’], and \(v_{3}\) = [‘C’]. Therefore, the union \(v_{1} \cup v_{2} \cup v_{3} = U\), where U = ‘ACEEB’, the test TS data to reconstruct. To classify the reconstructed sample, we refer to the classes of the training data used for the reconstruction, and note that the training samples in Fig. 3b and c belong to two different classes. Using the similarity metrics defined above, it is easy to determine that \(v_{1}\) and \(v_{2}\) both originate from the time series in (b), whereas only \(v_{3}\) (which contains a single element) is obtained from (c). Therefore, (a) is assigned the same label as (b). This example is representative of the majority of cases encountered during classification, with components of the reconstructed TS varying across multiple training samples, and often also across multiple classes.

Fig. 3

An illustrative example of reconstruction and classification from the BeetleFly data set. a A test TS sample (encoded as ‘ACEEB’) reconstructed from two training TS. Each box in the sub-figure is encoded as a single letter in a string, as per the color bar. The subsets \(v_{i}\) obtained from the pulling procedure and used to reconstruct this data are shown both in the reconstructed (test) TS and in the training TS. b The first training data used for reconstruction and classification (encoded as ‘EABBE’). c A second time series used for reconstruction (encoded as ‘CCEAB’)

5 Experimental setup

The experiments performed in this work used open-source labeled TS data available publicly (Bagnall et al. 2017; Bagnall et al.). We restricted our analysis to univariate TS data with two classes and small training set size to make our work amenable to NISQ devices in the near future. However, this method of semi-supervised classification can be used with any number of classes, at the cost of QUBO size. Since both the number of TS in the training data and the word length used to encode the TS contribute to the number of variables in the QUBO, we select data sets that have small numbers of TS in the training set. The test and training sets used in these experiments are already determined and labeled by the source, allowing us to easily calculate the classification rate of our method and avoid the step of selecting a training set. To benchmark the performance of our classification method, we compared the accuracy of our labeling to semi-supervised and unsupervised classical classification methods. The results of these experiments for the various data sets are summarized in Table 2.

5.1 Data sources

To test the robustness of our method we collected a variety of data sources of different types. We briefly review each source and provide a literature reference for further details. We note that in the works cited alongside the data sources, higher classification rates than ours are reported using supervised algorithms. In our analysis we do not consider supervised classification algorithms, and compare our semi-supervised quantum-based approach to similar classical algorithms.

SonyAIBORobotSurface1 (Mueen et al. 2011) data is sensor data collected from a small, dog-shaped, quadruped robot equipped with multiple sensors, including a tri-axial accelerometer. In the experiments section we classify roll-accelerometer measurements recorded on two classes of surfaces: soft carpet and hard cement.

GunPoint (Ratanamahatana and Keogh 2005) data includes motion tracking of actors’ hands during gun-drawing and gun-pointing actions. In both cases the X-component of the actor’s right-hand centroid is tracked and used to distinguish between the two classes.

TwoLeadECG (AL et al. 2003) and ECG200 (AL et al. 2003) are electrocardiogram data sets available in the PhysioNet database (AL et al. 2003; Wagner et al. 2020). The first includes long-term measurements from the same patient using two different leads, and the classification task aims to differentiate between the two lead signals. In contrast, the second set, ECG200 (AL et al. 2003), contains electrical activity recorded during one heartbeat. The two classes are normal heartbeats and myocardial infarction records.

BeetleFly (Hills et al. 2014) time series data is generated from binary images developed for the testing of shape descriptors. The external contour of these images is extracted and mapped to the distance from the image center. The two image classes are contours of beetles and flies.

Chinatown (Chinatown 2020) data is collected by an automated pedestrian counting system in the city of Melbourne, Australia. The classes are based on weekday or weekend traffic.

It is important to note that all data sources used in this experiment are real-world data sources which are available for public use. Furthermore, the sources are pre-divided into labeled training and test sets. These sets were used as-is in the experiments performed below. Validation is performed by measuring the classification rate of each method on the labeled test data.

5.2 QUBO sizes and optimization

The QUBOs generated by our methods were too large to be optimized using the largest available QPUs (D-Wave 2000Q) at the time of the experiments. The exact sizes of the QUBOs for each data set are shown in Fig. 4. To solve the QUBOs we used simulated thermal annealing (SA), a well-known classical heuristic for solving such optimization problems. The specific implementation of SA was from the D-Wave Python package for classical QUBO optimizers (D-Wave Systems 2021). We found that 20,000 samples and 1000 SA sweeps (with geometric interpolation of the inverse temperature) were sufficient to ensure that low-energy local minima were sampled within a reasonable time per QUBO. We used the default settings in the package for the initial and terminal inverse temperatures (for more information about the implementation of SA we refer the reader to D-Wave Systems (2021)).
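A minimal sketch of this step, assuming the open-source neal package (D-Wave's simulated annealing sampler; exact package and argument names may differ between releases):

```python
import neal

# Toy QUBO with the same label convention as the set_cover_qubo sketch above;
# in practice Q would come from set_cover_qubo(U, V).
Q = {("x_0", "x_0"): -1.0, ("x_1", "x_1"): -1.0, ("x_0", "x_1"): 2.0}

sampler = neal.SimulatedAnnealingSampler()
sampleset = sampler.sample_qubo(
    Q,
    num_reads=20000,                 # number of samples, as quoted in the text
    num_sweeps=1000,                 # SA sweeps per sample
    beta_schedule_type="geometric",  # geometric interpolation of inverse temperature
)
best = sampleset.first.sample        # lowest-energy assignment found
chosen = [v for v, bit in best.items() if bit == 1 and v.startswith("x_")]
print(chosen)
```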

Fig. 4

Distribution of the number of QUBO variables for all data sets in Table 1

The specific parameters used for the TS encoding to generate the QUBOs are shown in Table 1. In general, the longer the TS and the fewer TS in the training set, the finer the discretization required to accurately classify the test data. In all data sets we were able to reconstruct each test TS with elements from the training set, as explained in Section 3.2. The distribution of the number of variables in the QUBOs for all data sets is shown in Fig. 4.

Table 1 Data set description: the number of TS in the training and test sets, the length of the TS, the length of each encoded string, and the number of different letters used to encode each data set

5.3 Classification benchmarking

For the purposes of evaluating our QUBO-based classification method quantitatively, we compare against two classical time series classification algorithms based on dynamic time warping (DTW) (Sakoe and Chiba 1978) measures: k-means clustering and a classical analogue of the semi-supervised method described in this paper. The motivation for using these specifically is that both are based on pair-wise similarity metrics, as in our approach. DTW aligns a pair of temporal sequences in a non-linear way to minimize their differences before calculating the Euclidean distance. The DTW measure can be applied directly in unsupervised k-means clustering, or in a semi-supervised fashion analogous to the method described in this paper. We use k-means clustering with pairwise DTW metrics calculated on the original TS (before encoding), with labels assigned based on membership in one of the two clusters. The second method assigns the test TS labels directly by the DTW metric, calculated pairwise between each training and test TS (without encoding). We use these two methods to calculate classification rates for all data sources in the experiments (Table 2).
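A compact sketch of these two baselines, assuming the open-source tslearn library (our choice for illustration; the paper does not state which DTW implementation was used), with the semi-supervised baseline interpreted as nearest-neighbor label assignment under the pairwise DTW distance:

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.metrics import dtw

def kmeans_dtw_baseline(test_ts, n_clusters=2, seed=0):
    """Unsupervised baseline: k-means with the DTW metric on the raw TS."""
    model = TimeSeriesKMeans(n_clusters=n_clusters, metric="dtw", random_state=seed)
    return model.fit_predict(test_ts)

def nearest_dtw_baseline(train_ts, train_labels, test_ts):
    """Semi-supervised baseline: give each test TS the label of the training TS
    with the smallest pairwise DTW distance (no encoding applied)."""
    labels = []
    for ts in test_ts:
        distances = [dtw(ts, tr) for tr in train_ts]
        labels.append(train_labels[int(np.argmin(distances))])
    return labels
```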

Table 2 Clustering accuracy measured on the two classes, and the weighted average, reported for the QUBO-based and classical DTW-based methods

6 Results

As expected, the semi-supervised QUBO-based method outperforms the classical unsupervised methods. We note, however, that the QUBO-based method operates on a reduced dimensionality, in contrast to the classical methods, which use the original TS in which full information is preserved. Even under this consideration, the accuracy of the QUBO-based method is comparable with that of the semi-supervised DTW methods, and could be improved further by enriching the set V, i.e., by augmenting the training set or increasing the discretization granularity.

The worst performance of the QUBO-based algorithm is observed on the TwoLeadECG data set. This could be explained by the nature of our method, as well as by the sensitivity of the ECG data. By using the set cover problem, we allow for permutations of subsets of TS data in the reconstruction of the test TS. It is likely that this permutation of TS segments, combined with the similar representation in Fourier space of the signals from the two leads in the ECG measurements, makes our method unsuitable for this kind of data. To confirm this, and to improve the classification accuracy, we applied SAX (Senin et al. 2018) encoding, which is based on sliding-window time series magnitudes rather than the Fourier transform. Using this encoding, with each TS represented as a word of length 5 constructed from a 5-letter alphabet, the accuracy improved to 0.62 and 0.85 for the two respective classes (0.74 weighted average). In contrast to the Fourier encoding, the better results with the ECG200 data are due to the significant differences between the classes of normal and ischemia ECG readings.

The highest accuracy is obtained on the BeetleFly and Chinatown data sets. In the first case, many permutations of the training set to construct the test set are permissible, which our method takes advantage of. The accuracy of our method is additionally improved by the relative size of the training set, further augmenting the combinatorial space of permutations. This robustness can also be explained by the dimensionality reduction technique used for this data set: the 2D BeetleFly images (with different orientations) were mapped to 1D series of distances to the image center, which again is beneficial for permutation-based methods. The Chinatown data set, for comparison, contains significantly shorter TS than BeetleFly. Encoding the Chinatown TS data with the same word length as BeetleFly resulted in higher-granularity representations, and ultimately higher accuracy. This provides additional evidence that the accuracy of our method can be improved by increasing the granularity of the encoding.

7 Conclusions

We present a QUBO-based method for TS reconstruction and semi-supervised classification that reaches accuracy scores comparable with classical DTW pairwise approaches, and in most cases outperforms unsupervised clustering. Among the advantages of our method are the utilization of significantly less data than conventional classical methods, as well as a one-versus-all comparison that allows the selection of segments of data from multiple sources to reconstruct a single TS. This provides our method with additional robustness to permutations of TS segments during the reconstruction. We showed how to reformulate the task of TS reconstruction as the set cover problem with a minimal number of subsets. In order to formulate this problem as a QUBO, we apply TS dimensionality reduction by encoding each time series as a separate string. This encoding procedure and the selection of comparison metrics (as discussed in Section 4) define the hyperparameter space of the problem. The QUBO-based classification method performed best on the image and traffic data, which is consistent with our method’s inherent ability to utilize permutations of features/data within the TS to perform reconstruction.

Time series reconstruction and classification have a wide variety of useful applications, such as management of energy systems, factory process control, sensor systems, and many more. The methods introduced in this paper show how to reformulate the tasks of reconstruction and classification of such data using quantum computing. The fact that our work uses small training sets of labeled data means that the QUBOs produced could be solved by next-generation NISQ devices. Using quantum technologies, this method could analyze significantly more complex TS data, even in a live setting. The results of the optimization process (the selected subsets used for the reconstruction) would also be informative as feedback for live process optimization. Future work in this area will focus on generalizing the method to multivariate TS cases, finding application-ready data sets, and executing the presented methods on quantum processors. Specifically, with the advancement of hybrid quantum-classical algorithms, we will focus on converting the methods presented in this paper to be suitable for commercial applications.