1 Introduction

The field of quantum computing has seen large leaps in building usable quantum hardware during the past decade. As one of the first vendors, D-Wave provided access to a quantum device that can solve specific types of optimization problems Johnson et al. (2011). Motivated by this, quantum computing has not only received much attention in the research community, but has also started to be perceived as a valuable technology in industry. Volkswagen published a pioneering result on using the D-Wave quantum annealer to optimize traffic flow in 2017 Neukart et al. (2017), which prompted a number of works by other automotive companies Mehta et al. (2019); Ohzeki et al. (2019); Yarkoni et al. (2021). Since then, quantum annealing has been applied to a number of industry-related problems in areas like chemistry Streif et al. (2019); Xia et al. (2017), aviation Stollenwerk et al. (2019), logistics Feld et al. (2019), and finance Grant et al. (2021). Aside from quantum annealing, gate-based quantum devices have gained increased popularity, not least after the first demonstration of a quantum device outperforming its classical counterparts Arute et al. (2019). A number of industry-motivated works have since been published in the three main application areas that are currently of interest for gate-based quantum computing: optimization Streif et al. (2021); Streif and Leib (2020); Amaro et al. (2022); Dalyac et al. (2021); Luckow et al. (2021), quantum chemistry and simulation Arute et al. (2020); Malone et al. (2022), and machine learning Melnikov et al. (2023); Rudolph et al. (2020); Skolik et al. (2021, 2022); Peters et al. (2021); Alcazar et al. (2020); Perelshtein et al. (2022); Sagingalieva et al. (2022); Kordzanganeh et al. (2022). Research in the industrial context has been largely motivated by noisy intermediate-scale quantum (NISQ) devices Kordzanganeh et al. (2022), i.e., early quantum devices with a small number of qubits and no error correction.
In this regime, variational quantum algorithms (VQAs) have been identified as the most promising candidate for near-term advantage due to their robustness to noise Cerezo et al. (2021). In a VQA, a parametrized quantum circuit (PQC) is optimized by a classical outer loop to solve a specific task like finding the ground state of a given Hamiltonian or classifying data based on given input features. As qubit numbers are expected to stay relatively low within the next years, hybrid alternatives to models realized purely by PQCs have been explored Zhang et al. (2021); Mari et al. (2020); Zhao and Gao (2019); Dou et al. (2021); Sebastianelli et al. (2021); Pramanik et al. (2021); Perelshtein et al. (2022); Rainjonneau et al. (2023); Sagingalieva et al. (2022). In these works, a quantum model is combined with a classical model and optimized end-to-end to solve a specific task. In the context of machine learning, this means that a PQC and neural network (NN) are trained together as one model, where the NN can be placed either before or after the PQC in the chain of execution. When the NN comes first, it can act as a dimensionality reduction technique for the quantum model, which can then be implemented with relatively few qubits.

In this work, we use a hybrid quantum ResNet model to perform image classification on a subset of the Stanford Cars dataset Krause et al. (2013). Image classification is a ubiquitous problem in the automotive industry and can be used for tasks like sorting out parts with defects. Supervised learning algorithms for classification have also been extensively studied in the quantum literature Havlíček et al. (2019); Schuld and Killoran (2019); Schuld et al. (2020); Rebentrost et al. (2014), and it has been proven that there exist specific learning tasks based on the discrete logarithm problem where a separation between quantum and classical learners exists for classification Liu et al. (2021). While the separation in Liu et al. (2021) is based on Shor’s algorithm and is therefore not expected to transfer to realistic learning tasks such as the car classification mentioned above, it motivates further experimental study of quantum-enhanced models for classification on real-world datasets.

In combining PQCs and classical NNs into hybrid quantum-classical models, we encounter a challenge in searching hyperparameter configurations that produce performance gains in terms of model accuracy and training. Hyperparameters can be considered values that are set for the model and do not change during the training regime and may include variables such as learning rate, decay rates, choice of optimizer for the model, number of qubits, or layer sizes. Often in practice, these parameters are selected by experts based upon some a priori knowledge and trial-and-error. This limits the search space, but in turn can lead to producing a suboptimal model configuration.

Hyperparameter optimization is the process of automating the search for the best set of hyperparameters, reducing the need for expert knowledge in hyperparameter configurations for models, with an increase in computation required to evaluate configurations of models in search of an optimum. In the 1990s, researchers reported performance gains leveraging a wrapper method, which tuned parameters for specific models and datasets using best-first search and cross validation Kohavi et al. (1995). In more recent years, researchers have proposed search algorithms using bandits Li et al. (2017), which leverage early stopping methods. Successive halving algorithms such as the one introduced in Karnin et al. (2013) and the parallelized version introduced in Li et al. (2018) allocate more resources to more promising configurations. Sequential model-based optimization leverages Bayesian optimization with an aggressive dual racing mechanism and also has shown performance improvements for hyperparameter optimization Hutter et al. (2011); Lindauer and Hutter (2018). Evolutionary and population-based heuristics for black-box optimization have also achieved state-of-the-art results when applied to hyperparameter optimization in numerous competitions for black-box optimization Vermetten et al. (2020); Bäck (1996); Awad et al. (2020). In recent years, a whole field has formed around automating the process of finding optimal hyperparameters for machine learning models, with some prime examples being neural architecture search Elsken et al. (2019) and automated machine learning (AutoML) Hutter et al. (2019). Automating the search of hyperparameters in a quantum machine learning (QML) context has also started to attract attention, and the authors of Gómez et al. (2022) have explored the first version of AutoQML.

Our contribution in this paper is not only to examine the performance gains of hybrid quantum ResNet models vs. purely classical ones, but also to investigate whether quantum-enhanced or quantum-inspired methods may offer an advantage in automating the search over the configuration space of the models. We show a reduction in computational complexity with regard to expected run times and evaluations for various configurations of models, the high cost of which motivates this investigation. We investigate using the tensor train decomposition for searching the hyperparameter space of the hybrid quantum neural network (HQNN), framed as a global optimization problem as in Zheltkov and Osinsky (2020). This method has been successful in optimizing models of social networks in Kabanikhin et al. (2019) and as a method of compression for deep neural networks Wang et al. (2021).

Fig. 1 The hyperparameter optimization problem description (a). The tabular methods for hyperparameter optimization: the grid search algorithm (b) and the tensor train algorithm (c–d)

2 Results

2.1 Hyperparameter optimization

The problem of hyperparameter optimization (HPO) is described schematically in Fig. 1(a). Given a certain dataset and a machine learning (ML) model, the learning model demonstrates an accuracy \(A(\bar{h})\) which depends on the hyperparameters \(\bar{h}\). To achieve the best possible model accuracy, one has to optimize the hyperparameters. To perform the HPO, an unknown black-box function \(A(\bar{h})\) has to be explored. The exploration is an iterative process, where at each iteration the HPO algorithm provides a set of hyperparameters \(\bar{h}\) and receives the corresponding model accuracy \(A(\bar{h})\). As a result of this iterative process, the HPO algorithm outputs the best achieved performance \(A(\bar{h}_\textrm{opt})\) with the corresponding hyperparameters \(\bar{h}_\textrm{opt}\).

The HPO could be organized in different ways. One of the standard methods for HPO is a tabular method of grid search (GS), also known as a parameter sweep (Fig. 1(b)). To illustrate how a grid search works, we have chosen two hyperparameters: the learning rate (\(h_1\)) and the multiplicative factor of learning rate (\(h_2\)). They are plotted along the x-axis and the y-axis, respectively. The color on the contour shows the accuracy of the model \(A(h_1,h_2)\) with two given hyperparameters changing from light pink (the lowest accuracy) to dark green (the highest accuracy). In the GS method, the hyperparameter values are discretized, which results in a grid of values shown as big dots. The GS algorithm goes through all the values from this grid with the goal of finding the maximum accuracy. As one can see in this figure, there are only three points at which this method can find a high accuracy with 25 iterations (shown as 25 points in Fig. 1(b)). This example shows that there could be a better tabular HPO in terms of the best achievable accuracy and the number of iterations used.
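As a minimal sketch of the parameter sweep described above, the Python snippet below evaluates every point of a 5 × 5 grid over the two hyperparameters. The function `toy_accuracy` and the grid values are hypothetical stand-ins for \(A(h_1,h_2)\) and are not taken from the experiments; in practice, each call would train and evaluate the model.

```python
import itertools
import math


def grid_search(score, grids):
    """Evaluate score on the full Cartesian grid and keep the best point."""
    best_point, best_score, n_evals = None, float("-inf"), 0
    for point in itertools.product(*grids):
        value = score(*point)
        n_evals += 1
        if value > best_score:
            best_point, best_score = point, value
    return best_point, best_score, n_evals


def toy_accuracy(lr, gamma):
    # Hypothetical smooth surrogate that peaks at lr = 1e-3, gamma = 0.5.
    return 1.0 - (math.log10(lr) + 3.0) ** 2 / 10.0 - (gamma - 0.5) ** 2


learning_rates = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]   # h_1
decay_factors = [0.1, 0.3, 0.5, 0.7, 0.9]         # h_2
best_point, best_score, n_evals = grid_search(
    toy_accuracy, [learning_rates, decay_factors]
)
```

The number of evaluations is fixed in advance by the grid (here 25), regardless of how the scores are distributed.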

2.2 Tensor train approach to hyperparameter optimization

Here, we propose a quantum-inspired approach to hyperparameter optimization based on tensor train (TT) programming. The TT approach was initially introduced in the context of quantum many-body system analysis, e.g., for finding a ground state with minimal energy of multi-particle Hamiltonians via the density matrix renormalization group White (1992). In this approach, the ground state is represented in the TT format, often referred to as the matrix product state in physics Cirac et al. (2021). We employ the TT representation (shown in Fig. 1(c)) in another way here and use it for hyperparameter optimization. As one can see in Fig. 1(c), the TT is represented as a product of tensors, where an individual tensor is shown as a circle whose number of “legs” corresponds to the rank of the tensor. The \(h_1\) and \(h_d\) circles are matrices of dimension \(n\times r\), and each of \(\{h_i\}_{i={2}}^{i={d-1}}\) is a rank-3 tensor of dimension \(n \times r^2\). The two arrows in Fig. 1(c) illustrate the right and left sweeps along the TT, which are part of the algorithm described below. Leveraging the locality of the problem, i.e., a small correlation between hyperparameters, we perform black-box optimization based on the cross-approximation technique applied to tensors Oseledets and Tyrtyshnikov (2010); Zheltkov and Tyrtyshnikov (2020).

Similar to the previously discussed GS method, TT optimization (TetraOpt) discretizes the hyperparameter space and considers a tensor composed of scores that can be estimated by running an ML model with the corresponding set of hyperparameters. However, in contrast to GS, the TT method is dynamic, which means that the next set of evaluation points in the hyperparameter space is chosen based on the knowledge accumulated during all previous evaluations. With TetraOpt, we do not estimate all the scores \(A(\bar{h})\) available to the model. Instead, we approximate \(A(\bar{h})\) via the TT, referring to a limited number of tensor elements using the cross-approximation method Oseledets and Tyrtyshnikov (2010). During the process, new sets of hyperparameters for which the model needs to be evaluated are determined using the MaxVol routine Goreinov and Oseledets (2010). The MaxVol routine is an algorithm that finds an \(r \times r\) submatrix of maximum volume, i.e., a square submatrix with the maximum determinant modulus, in an \(n \times r\) matrix.

Hyperparameters are changed in an iterative process, in which one is likely to find a better accuracy \(A(\bar{h})\) after each iteration and thus find a good set of hyperparameters. Notably, the TetraOpt algorithm requires estimating \(\mathcal {O}(d n r^2)\) elements and performing \(\mathcal {O}(d n r^3)\) calculations, where d is the number of hyperparameters, n is the number of discretization points, and r is a fixed rank. Compared with the GS algorithm, which requires the estimation of \(\mathcal {O}(n^d)\) elements, one expects practical advantages, especially with a large number of hyperparameters.
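To make the scaling concrete, the following sketch counts score evaluations for both methods. The TT count assumes a single sweep in one direction that evaluates \(nr\) elements for the first core and \(nr^2\) elements for each of the cores \(2, \ldots, d-1\), following the step-by-step description of the algorithm given below. The function names are illustrative, and repeated points or multiple sweeps would change the constants.

```python
def grid_search_evals(n, d):
    """Grid search visits every point of the n^d grid."""
    return n ** d


def tt_evals_per_sweep(n, d, r):
    """Upper bound for one directional sweep: n*r for the first core,
    n*r^2 for each of the d-2 intermediate cores."""
    return n * r + (d - 2) * n * r * r


# Setting assumed here for illustration: n = 4 grid points, rank r = 2.
for d in (2, 5, 10):
    print(d, grid_search_evals(4, d), tt_evals_per_sweep(4, d, 2))
```

The TT count grows linearly in d, whereas the grid search count grows exponentially.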

Algorithm 1 Tensor train optimization.

The TetraOpt algorithm for the HPO is presented as the Algorithm 1 pseudocode that also corresponds to Fig. 1(d). The TetraOpt algorithm can be described with 9 steps:

  1. Suppose each of the d hyperparameters is defined on some interval \(h_i \in [h_i^\textrm{min}, h_i^\textrm{max}]\), where \(i \in [1, d]\). One first discretizes each of the d hyperparameters by defining n points

    $$\begin{aligned} {\{ h_i(1), h_i(2), \ldots , h_i(n)\}}_{i=1}^{i=d}. \end{aligned}$$
  2. Then, we need to choose the rank r. This choice is a trade-off between computational time and accuracy, which respectively require a small and a large rank.

  3. r random combinations of the remaining hyperparameters

    $$\begin{aligned} {\{h_2^1(j), h_3^1(j), \ldots , h_d^1(j)\}}_{j=1}^{j=r} \end{aligned}$$

    are chosen.

  4. In the next three steps, we implement an iterative process called the “sweep right.” The first step of this iterative process is related to the first TT core evaluation:

    • The accuracy of nr elements is estimated with all n values of the first hyperparameter \({\{h_1(i_1)\}}_{i_1=1}^{i_1=n}\) and for the r combinations of \({\{h_2^{1}(j), h_3^{1}(j), \ldots , h_d^{1}(j)\}}_{j=1}^{j=r}\):

      $$\begin{aligned} {\begin{matrix} {\{A(h_1(i_1), h_2^1(j), h_3^1(j), \ldots ,} \\ {h_d^1(j))\}}_{j=1, i_1=1}^{j=r, i_1=n}. \end{matrix}} \end{aligned}$$

    • In this matrix of size \(n \times r\), we search for an \(r \times r\) submatrix with the maximum determinant modulus:

      $$\begin{aligned} {\{A(h_1^1(i_1), h_2^1(j), h_3^1(j), \ldots , h_d^1(j))\}}_{j=1, i_1=1}^{j=r, i_1=r}. \end{aligned}$$

      The corresponding r values of the first hyperparameter are fixed: \(\{h_1^1(i_1)\}_{i_1=1}^{i_1=r}\).

  5. The next step of this iterative process is related to the second TT core evaluation:

    • We fix the r values \(\{h_1^1(i_1)\}_{i_1=1}^{i_1=r}\) of the previous step as well as the r combinations \({\{h_3^1(j), h_4^1(j), \ldots , h_d^1(j)\}}_{j=1}^{j=r}\) of the third step. We then estimate the accuracy of the \(nr^2\) elements with all n values of the second hyperparameter \({\{h_2(i_2)\}}_{i_2=1}^{i_2=n}\):

      $$\begin{aligned} {\begin{matrix} {\{A(h_1^1(i_1), h_2(i_2), h_3^1(j), \ldots ,} \\ {h_d^1(j))\}}_{j=1, i_1=1, i_2=1}^{j=r, i_1=r, i_2=n} \end{matrix}} \end{aligned}$$

    • Again, in this matrix of size \(nr \times r\), we search for an \(r \times r\) submatrix with the maximum determinant modulus:

      $$\begin{aligned} {\begin{matrix} {\{A((h_1^2(k), h_2^2(k)), h_3^1(j), \ldots ,} \\ {h_d^1(j))\}}_{j=1, k=1}^{j=r, k=r} \end{matrix}} \end{aligned}$$

      The r combinations \({\{(h_1^2(k), h_2^2(k))\}}_{k=1}^{k=r}\) of the first and the second hyperparameters are fixed.

  6. The \((d-1)\)-th TT core evaluation:

    • We fix the r combinations \(\{(h_1^{d-2}(k), h_2^{d-2}(k), \ldots , h_{d-2}^{d-2}(k))\}_{k=1}^{k=r}\) of the \((d-2)\)-th TT core as well as the r combinations \({\{h_d^1(j)\}}_{j=1}^{j=r}\) of the third step. We then estimate the accuracy of the \(nr^2\) elements with all n values of the \((d-1)\)-th hyperparameter \({\{h_{d-1}(i_{d-1})\}}_{i_{d-1}=1}^{i_{d-1}=n}\):

      $$\begin{aligned} {\begin{matrix} {\{A((h_1^{d-2}(k), \ldots , h_{d-2}^{d-2}(k)), }\\ {h_{d-1}(i_{d-1}), h_d^1(j))\}}_{k=1,i_{d-1}=1, j=1}^{k=r, i_{d-1}=n, j=r} \end{matrix}} \end{aligned}$$

    • Again, in this matrix of size \(nr \times r\), we search for an \(r \times r\) submatrix with the maximum determinant modulus:

      $$\begin{aligned} {\begin{matrix} {\{A((h_1^{d-1}(k), h_2^{d-1}(k), \ldots ,} \\ {h_{d-1}^{d-1}(k)), h_d^1(j))\}}_{k=1, j=1}^{k=r, j=r} \end{matrix}} \end{aligned}$$

      The r combinations \(\{(h_1^{d-1}(k), h_2^{d-1}(k), \ldots , h_{d-1}^{d-1}(k))\}_{k=1}^{k=r}\) of hyperparameters are fixed.

    The end of one “sweep right” is reached.

  7. Similar to step 3, we have r combinations of hyperparameters, but they are not random anymore. We next perform a similar procedure in the reverse direction (from the last hyperparameter to the first). This process is called the “sweep left.” One first changes the index order:

    $$\begin{aligned} {\begin{matrix} {\{(h_1^{d-1}(k), h_2^{d-1}(k), \ldots , h_{d-1}^{d-1}(k))\}}_{k=1}^{k=r} \Longrightarrow \textrm{relabel} \\ {\{(h_{d-1}^{d-1}(j), h_{d-2}^{d-1}(j), \ldots , h_{1}^{d-1}(j))\}}_{j=1}^{j=r} \end{matrix}} \end{aligned}$$

    The algorithm then continues from the fourth step.

  8. A combination of the “sweep right” and the “sweep left” is a full sweep. We do \(n_\textrm{swp}\) full sweeps in this algorithm.

  9. Throughout all iterations, we record every new maximum score that is found. An expected runtime comparison of this method against grid search for increasing problem dimensionality is shown in Fig. 2.
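The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TetraOpt implementation: the MaxVol routine is replaced by a brute-force search over all row subsets (feasible only for small n and r), scores are cached so that repeated points are not re-evaluated, and `toy_score` is a hypothetical stand-in for training a model. The sketch requires \(d \ge 2\) and \(n \ge r\).

```python
import itertools
import random


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size, result = len(m), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda i: abs(m[i][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, size):
            f = m[i][c] / m[c][c]
            for j in range(c, size):
                m[i][j] -= f * m[c][j]
    return result


def max_volume_rows(matrix, r):
    """Brute-force stand-in for MaxVol: rows of the r x r submatrix with max |det|."""
    return max(itertools.combinations(range(len(matrix)), r),
               key=lambda rows: abs(det([matrix[i] for i in rows])))


def tt_search(score, grids, r=2, n_sweeps=2, seed=0):
    rng = random.Random(seed)
    d, cache = len(grids), {}

    def evaluate(values, order):
        point = [None] * d
        for value, k in zip(values, order):   # map sweep order to natural order
            point[k] = value
        point = tuple(point)
        if point not in cache:
            cache[point] = score(*point)
        return cache[point]

    order = list(range(d))
    # Step 3: r random combinations of all hyperparameters except the first.
    right = [tuple(rng.choice(grids[k]) for k in order[1:]) for _ in range(r)]
    for _ in range(2 * n_sweeps):             # one full sweep = right + left
        left = [()]
        for pos in range(d - 1):              # steps 4-6: cores 1 .. d-1
            rows = [l + (v,) for l in left for v in grids[order[pos]]]
            tails = [t[pos:] for t in right]
            matrix = [[evaluate(row + tail, order) for tail in tails] for row in rows]
            left = [rows[i] for i in max_volume_rows(matrix, r)]
        order = order[::-1]                   # step 7: reverse the index order
        right = [t[::-1] for t in left]
    best = max(cache, key=cache.get)          # step 9: best score seen so far
    return best, cache[best], len(cache)


def toy_score(a, b, c):
    # Hypothetical score with its maximum at (0.3, 0.7, 0.5).
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2 + (c - 0.5) ** 2)


grids = [[0.1, 0.3, 0.5, 0.7]] * 3
best, best_score, n_evals = tt_search(toy_score, grids, r=2, n_sweeps=2)
```

The returned `n_evals` is the number of distinct points that were scored, which can be compared directly with the \(n^d = 64\) evaluations of a full grid search on the same grid.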

Fig. 2 Tensor train (TT) and grid search (GS): expected runtime in maximum objective function evaluations vs. growth of problem dimension d

2.3 Benchmarking HPO methods

In order to ascertain the solution quality of our proposed method for hyperparameter optimization, we tested it on three black-box objective functions: the Schwefel, Fletcher-Powell, and Vincent functions from the optproblems Python library optproblems (2022). We ran 100 randomly initialized trials and recorded the average fitness and the maximum number of function evaluations in response to the change in the problem size d for each objective function. We compared grid search (GS) and tensor train (TT), both tabular methods for hyperparameter optimization (Table 1). For both tensor train and grid search, we partitioned the hyperparameter ranges with 4 discrete points per hyperparameter. For tensor train, we set the rank parameter \(r=2\).
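For orientation, the Schwefel function in its common textbook form can be written as below. This is an assumption about the form of the benchmark: the optproblems implementation may use a different sign, scaling, or domain convention, so the snippet only illustrates the kind of black-box objective involved, together with a 4-point discretization per dimension. The helper `grid_points` is a hypothetical name.

```python
import itertools
import math


def schwefel(x):
    """Textbook form on [-500, 500]^d, minimized near x_i = 420.9687."""
    return 418.9829 * len(x) - sum(v * math.sin(math.sqrt(abs(v))) for v in x)


def grid_points(low, high, n):
    """n equally spaced points including both end points."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


d = 3
axis = grid_points(-500.0, 500.0, 4)            # 4 discrete points per dimension
grid = list(itertools.product(axis, repeat=d))  # 4^d candidate points
best = min(grid, key=schwefel)                  # exhaustive search on the grid
```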

Table 1 Table of results comparing HPO methods for Schwefel, Fletcher-Powell, and Vincent objective functions

Fig. 3 Classical (a) and hybrid quantum (b) ResNet architectures

2.4 Car classification with hybrid quantum neural networks

Computer vision and classification systems are ubiquitous within the mobility and automotive industries. In this article, we investigate the car classification problem using the car dataset Krause et al. (2013) provided by the Stanford CS Department. Examples of cars in the dataset are shown in Fig. 3. The Stanford Cars dataset contains 16,185 images of 196 classes of cars. The data is split into 8144 training images and 8041 testing images. The classes are typically at the level of make, model, and year, e.g., Volkswagen Golf Hatchback 1991 or Volkswagen Beetle Hatchback 2012. Since the images in this dataset have different sizes, we resized all images to 400 by 400 pixels. In addition, we applied random rotations by at most \(15^\circ \), random horizontal flips, and normalization to the training data. For the testing data, only normalization was applied.

Table 2 The table shows which hyperparameters are being optimized, their labels, limits of change, and the best values found during HPO

We use transfer learning to solve the car classification problem. Transfer learning is a powerful method for training neural networks in which experience in solving one problem helps in solving another problem Neyshabur et al. (2020). In our case, the ResNet (residual neural network) He et al. (2015), pretrained on the ImageNet dataset Imagenet dataset (2022), is used as a base model. One can fix the weights of the base model, but if the base model is not flexible enough, one can “unfreeze” certain layers and make them trainable. Training deep networks is challenging due to the vanishing gradient problem, but ResNet addresses this problem with so-called residual blocks, in which the inputs of a block are also passed directly to the next layer. In this way, deeper layers can see information about the input data. ResNet has established itself as a robust network architecture for solving image classification problems. We downloaded ResNet34 via PyTorch PyTorch (2022), where the number after the model name, 34, indicates the number of layers in the network.

As shown in Fig. 3(a), in the classical network we add three fully connected layers after ResNet34. Each output neuron corresponds to a particular class of the classification problem, e.g., Volkswagen Golf Hatchback 1991 or Volkswagen Beetle Hatchback 2012. The output neuron with the largest value determines the output class. Since the output from ResNet34 is composed of 512 features, the first fully connected layer consists of 512 input neurons and a bias neuron and has n output features. The second fully connected layer connects n input neurons and a bias neuron with nq output features. The values of n and q can vary, thus changing the number of weights in the classical network. Since the network classifies k classes in the general case, the third fully connected layer takes nq neurons and a bias neuron as input and feeds k neurons as output.
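The layer sizes just described determine the number of trainable weights in the classical head. The short sketch below counts them, assuming standard fully connected layers with one bias per output neuron; the function name is illustrative, and the example values of n, q, and k are arbitrary.

```python
def classical_head_weights(n, q, k, n_features=512):
    """Weights and biases of the three fully connected layers after ResNet34."""
    first = (n_features + 1) * n      # 512 inputs + bias -> n outputs
    second = (n + 1) * (n * q)        # n inputs + bias   -> n*q outputs
    third = (n * q + 1) * k           # n*q inputs + bias -> k outputs
    return first + second + third


# Arbitrary illustrative values, not the optimum reported in the paper.
example = classical_head_weights(n=2, q=3, k=2)
```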

In the hybrid analog shown in Fig. 3(b), we replace the second fully connected layer with a quantum one. It is worth noting that the number of qubits needed for the efficient operation of the model is initially unknown. The position of this layer was chosen to be between two classical layers that can appropriately pre-process the outputs of the ResNet (the first classical layer) and post-process the quantum outputs (the final classical layer). In the quantum layer, the Hadamard transform is applied to each qubit, and then the input data is encoded into the angles of rotation about the y-axis. The variational layer consists of the application of CNOT gates and rotations about the x-, y-, and z-axes. The number of variational layers can vary, and accordingly the number of weights in the hybrid network can also change. The measurement is made in the X-basis: for each qubit, the local expectation value of the X operator is measured. This produces a classical output vector, suitable for additional post-processing. Since the optimal number of variational layers (q, the depth of the quantum circuit) and the optimal number of qubits n are not known in advance, we choose these values as hyperparameters. A thorough analysis of the quantum circuit for \(n=2\) is given in the Appendix, where three approaches are employed to measure the efficiency, trainability, and expressivity of this quantum model.
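The encoding and measurement steps can be illustrated on a single qubit with plain Python. This sketch is a simplification and not the circuit used in the experiments: it omits the CNOT gates and keeps only one trainable rotation (about the z-axis), and all function names are illustrative. For this reduced circuit, the X expectation value is \(\cos(x)\cos(w)\) for an input feature x and a weight w.

```python
import cmath
import math


def apply(gate, state):
    """Multiply a 2 x 2 gate with a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def hadamard():
    s = 1.0 / math.sqrt(2.0)
    return [[s, s], [s, -s]]


def ry(theta):
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return [[c, -s], [s, c]]


def rz(theta):
    return [[cmath.exp(-0.5j * theta), 0.0], [0.0, cmath.exp(0.5j * theta)]]


def expect_x(feature, weight):
    """<X> after H, RY(feature) encoding, and one trainable RZ(weight)."""
    state = [1.0, 0.0]                       # |0>
    for gate in (hadamard(), ry(feature), rz(weight)):
        state = apply(gate, state)
    a, b = state
    return 2.0 * (a.conjugate() * b).real    # <psi|X|psi> = 2 Re(a* b)


value = expect_x(0.4, 0.9)
```

The output lies in \([-1, 1]\), which is what the final classical layer receives from each qubit.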

We use the cross-entropy as a loss function

$$\begin{aligned} l = -\sum _{c=1}^{k}{y_c \log p_c} \end{aligned}$$

where \(p_c\) is the predicted probability of class c, \(y_c\) is 1 if the image belongs to class c and 0 otherwise, and k is the number of classes. We use the Adam optimizer Adam optimizer (2022); Kingma and Ba (2014) and reduce the learning rate after several epochs. Note that in the simulation of the HQNN we assumed a precise (infinite-shots) and noise-free simulator, as investigating the effects of these sources of noise fell outside the scope of this work. There is no one-size-fits-all rule for choosing a learning rate. Moreover, in most cases, dynamic control of the learning rate of a neural network can significantly improve the efficiency of the backpropagation algorithm. For these reasons, we choose the initial learning rate, the period of learning rate decay, and the multiplicative factor of the learning rate decay as hyperparameters. In total, together with the number of variational layers and the number of qubits, we optimize the five hyperparameters presented in Table 2 to improve the accuracy of car classification.
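As a small worked example, the loss and a learning rate schedule with the three hyperparameters named above can be written as follows. The step-wise decay rule is an assumption: the text does not name the scheduler, and the rule shown here is the one computed by step schedulers such as PyTorch's `StepLR`. Function names are illustrative.

```python
import math


def cross_entropy(probs, label):
    """l = -sum_c y_c log p_c; with one-hot y only the true class survives."""
    return -math.log(probs[label])


def step_decay_lr(initial_lr, period, factor, epoch):
    """Multiply the learning rate by `factor` once every `period` epochs."""
    return initial_lr * factor ** (epoch // period)


loss = cross_entropy([0.25, 0.75], label=1)          # -log(0.75)
lr_at_25 = step_decay_lr(0.01, period=10, factor=0.5, epoch=25)
```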

Fig. 4 (a) Dependence of accuracy on the number of HPO iterations. TetraOpt found a set of hyperparameters for the hybrid model that gives an accuracy of 0.852 after 6 iterations and 0.977 after 18 iterations; for the classical model, it found 0.977 after 6 iterations. Grid search found a set of hyperparameters for the hybrid model that gives an accuracy of 0.989 after 75 iterations; for the classical model, it found 0.920 after 75 iterations. (b) Dependence of accuracy on the number of epochs with the optimal set of hyperparameters found

2.5 Simulation results

We next perform a simulation of the hybrid quantum ResNet described in the previous section. The simulation is compared to its classical analog, the residual neural network, in a test car classification task. Because of the limited number of qubits available and computational time constraints, we used a classification between two classes, Volkswagen Golf Hatchback 1991 and Volkswagen Beetle Hatchback 2012, to compare the classical and hybrid networks fairly. In total, we used 88 testing images and 89 training images. Both the HQNN model and the classical NN model were used together with the GS and TetraOpt methods for hyperparameter optimization. All machine learning simulations were carried out in the QMware cloud, on which the classical part was implemented with the PyTorch framework, and the quantum part was implemented with the <basiq> SDK QMware (2022); Perelshtein et al. (2022); Kordzanganeh et al. (2022). The results of the simulations are shown in Fig. 4.

Figure 4(a) shows the dependence of accuracy on the number of HPO iterations on the test data, where one iteration of HPO is one run of the model. Green shows the dependence of accuracy on the number of iterations for the HQNN, and blue shows it for the classical NN. As one can see from Fig. 4(a), TetraOpt works more efficiently than GS and finds hyperparameters that give an accuracy above 0.9 in fewer iterations. The HQNN with TetraOpt (marked with green crosses) finds a set of hyperparameters that yields 97.7% accuracy after 18 iterations. The GS (marked with a solid green line) took 44 iterations to pass the threshold of 98% accuracy.

For the classical NN, TetraOpt finds a set of hyperparameters in 6 iterations that gives an accuracy of 97.7%, the same accuracy as that given by the set of hyperparameters found for the HQNN in 18 iterations. As for the GS, the optimization works more efficiently for the HQNN than for the classical NN: it requires fewer iterations to achieve a higher accuracy. A possible reason is that a quantum layer with a relatively large number of qubits and a greater depth works better than its classical counterpart.

Fig. 5 Examples of test car images that were correctly classified by the hybrid quantum ResNet

The best values found during HPO are displayed in Table 2. The quantum circuit corresponding to the optimal set of hyperparameters has 52 variational parameters, leading to a total of 6749 weights in the HQNN. In the classical NN, there are 9730 weights. Therefore, there are significantly fewer weights in the HQNN than in the classical NN. Nevertheless, as can be seen from Fig. 4(b), the HQNN with the hyperparameters found using the GS reaches the highest overall accuracy (98.9%). Figure 5 shows examples of car images that were classified correctly by the HQNN model.
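The weight count of the hybrid head can be checked with a short calculation. The sketch assumes that the first classical layer maps the 512 ResNet features to one value per qubit, that the quantum layer contributes only its variational parameters, and that the final classical layer maps one expectation value per qubit to k = 2 classes. The qubit count of 13 used below is an inferred, illustrative value rather than one quoted from Table 2; under these assumptions it reproduces the 6749 weights stated above.

```python
def hybrid_head_weights(n_qubits, n_quantum_params, n_classes, n_features=512):
    """Trainable parameters of the hybrid head (two classical layers + PQC)."""
    first = (n_features + 1) * n_qubits     # 512 inputs + bias -> n qubits
    last = (n_qubits + 1) * n_classes       # n expectation values + bias -> k
    return first + n_quantum_params + last


# Assumed n_qubits = 13 (illustrative); 52 variational parameters from the text.
total = hybrid_head_weights(n_qubits=13, n_quantum_params=52, n_classes=2)
```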

3 Discussion

We introduced two new ML developments to image recognition. First, we presented a quantum-inspired method of tensor train decomposition for choosing ML model hyperparameters. This decomposition enabled us to optimize hyperparameters similarly to other tabular search methods, e.g., grid search, but required only \(\mathcal {O}(d n r^2)\) hyperparameter choices instead of the \(\mathcal {O}(n^d)\) required by grid search. We verified this method on various black-box functions and found that the tensor train method achieved comparable results in average fitness, with a reduced expected run time for most of the test functions compared to grid search. This indicates that this method may be useful for high-dimensional hyperparameter searches for expensive black-box functions. Future work could investigate using this method in combination with a local search heuristic, where the tensor train optimizer performs a sweep over a larger search space within a budget and seeds another optimization routine for a local search around this region. This method could also be applied to the B/n problem for the successive halving algorithm by decomposing the search space to find the optimal ratio of budget B over configurations n. Future work could investigate these applications in more detail.

Fig. 6 The only changes we could make to this circuit are fusing some constant spiders, which we will need to re-introduce later for circuit efficiency. Additionally, measurements are in the X-basis, so all variational parameters to the right of the last CNOT only contribute to the qubit that they are applied to. This is particularly evident in Fig. 10(a) and (e), where there is only one CNOT in the system. This allows us to assign a variable specific to each qubit which we can use to tune the output of each qubit independently

Second, we presented a hybrid quantum ResNet model for supervised learning. The hybrid model consisted of the combination of ResNet34 and a quantum circuit part, whose size and depth became hyperparameters. The size and flexibility of the hybrid ML model allowed us to apply it to car image classification. The hybrid ML model with GS showed an accuracy of 0.989 after 75 iterations in our binary classification tests with images of Volkswagen Golf Hatchback 1991 and Volkswagen Beetle Hatchback 2012. This accuracy was better than that of a comparable classical ML model with GS, which showed an accuracy of 0.920 after 75 iterations. In the same test, the hybrid ML model with TetraOpt showed an accuracy of 0.977 after 18 iterations, whereas the comparable classical ML model with TetraOpt showed the same accuracy of 0.977 after 6 iterations. Our developments provide new ways of using quantum and quantum-inspired methods in practical industry problems. In future research, exploring the sample complexity of the hybrid quantum model is of importance, in addition to generalization bounds of the quantum models similar to the research in Caro et al. (2022). Future work could also entail investigating state-of-the-art improvements in hyperparameter optimization for classical and quantum-hybrid neural networks and other machine learning models by leveraging quantum-inspired or quantum-enhanced methods.