1 Introduction

“Nature isn’t classical, dammit, and if you want to make a simulation of nature, you’d better make it quantum mechanical”—Richard Feynman [1]. Natural Language Processing (NLP) is one of Nature’s problems that could benefit from quantum computation. But how can semantic meaning be brought into a quantum computer?

To answer this, consider how illiterate individuals comprehend meaning. How can people use language so fluently if they cannot read or write? Simply because a word obtains its meaning from its usage: from where it occurs in a sentence and from the other words associated with it. This is also how a quantum machine can interpret natural language meaning and structure [2]. In 2016, Zeng et al. [3] proposed an interesting technique that interprets word meaning as quantum states and grammatical structure as quantum measurements, as illustrated in Fig. 1.

Fig. 1
figure 1

Interpreting word meaning as quantum states and grammatical structures as quantum measurements [3]

Quantum Natural Language Processing (QNLP) is a relatively new field of study that aims to design and implement NLP models that use quantum properties like superposition and entanglement to perform language-related tasks on quantum hardware. The scope of this paper focuses on the possible performance and prediction accuracy benefits of bringing NLP problems to quantum platforms. Complexity analysis of QNLP techniques in search of quantum advantage cases is an active area of research to verify this potential [3,4,5,6]. As quantum hardware continues to scale up over the next few years [7], experimental comparison with cutting-edge classical language models such as GPT-3 [8], GPT-3.5 [9], and BERT [10] could become feasible. In this work, an experimental comparison between classical tensor network-based NLP models and QNLP models is given while varying the following factors:

  • NLP problem size

  • parametrized quantum circuit used for training

  • backend simulator noise model

In the implementation, we depend on the lambeq library pipeline as described by Kartsaklis et al. [11] and illustrated in Fig. 2 to build our model. We describe each step of the pipeline from input dataset reading and parsing to the model training and evaluation.

Fig. 2
figure 2

QNLP experiment steps

This article is organized as follows: Sect. 2 gives an introduction to the notion of Quantum Neural Network (QNN), Sect. 3 gives an overview of recent QNLP experiments and related QNN evaluation experiments, Sect. 4 explains the QNLP experiment stages, the sentence to diagram conversion is explained in Sect. 5, Sect. 6 covers the diagram to circuit conversion, Sects. 7 and 8 cover the experimental settings and the experimental results, respectively, and then, Sect. 9 discusses the results with a comparison with related work. Section 10 concludes the article.

2 Quantum neural network

Quantum Neural Networks (QNN) [12] are an example of variational algorithms [13]. By analogy with classical machine learning, quantum machine learning employs quantum circuits instead of classical neural networks to learn patterns from data. The term “neural” refers to the fact that the parameters \(\theta _i\) are adjusted in each iteration to minimize the model’s cost function, much like the weights in a classical neural network. Abbas et al. [14] conducted a comprehensive comparison of classical and quantum neural networks in terms of effective dimension and trainability. But how can a string diagram be converted into a parametrized quantum circuit? The short answer is by using the DisCoPy library [15]. Section 6 covers more details about this conversion.

2.1 Computational advantage

Motivated by the strong structural similarity between compositional models of natural language and categorical quantum mechanics, some QNLP experts claim that NLP problems are “quantum native” and expect a large-scale quantum computational advantage when more powerful quantum hardware becomes available in the near future [16]. As both quantum theory and these language models use vector spaces to describe states, and vector space operations are handled more efficiently on quantum hardware, QNLP models can achieve up to a quadratic speedup over classical direct calculation methods. Some recent research works have already claimed a computational advantage over certain classical NLP methods on currently available quantum devices [3]. Our experiments extend those early trials to demonstrate the prospective potential of QNLP methods over some types of classical ones in terms of enhanced prediction accuracy.

2.2 String diagram

The Distributional Compositional Categorical (DisCoCat) model of language meaning [17] is a mathematical framework that describes the meaning of a phrase as a combination of the meanings of its constituent words and the grammatical relationships between these words as shown in Fig. 3. In contrast, many classical NLP models approach sentences as “bags of words,” ignoring their grammatical structure.

Fig. 3
figure 3

DisCoCat package

DisCoCat includes a graphical representation, allowing each sentence to be represented by a string diagram. A string diagram is an abstract diagrammatic representation that reflects computations in a monoidal category, and this way of modeling closely matches how a quantum computer works and processes data. This match eventually translates into up to a quadratic speedup over classical direct calculation methods [3, 18]. Given their resemblance to quantum circuits and their independence from low-level, hardware-dependent design decisions, string diagrams are the default method of encoding sentences in lambeq and DisCoCat. They imitate enriched tensor networks. More details about sentence to diagram conversion are covered in Sect. 5.
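As a concrete illustration of how a string diagram imitates a tensor network, the DisCoCat meaning of a transitive sentence can be evaluated classically as a tensor contraction, with the cups becoming contracted indices. The following minimal numpy sketch (with hypothetical toy dimensions and random word states, not lambeq code) evaluates “Alice likes Bob”:

```python
import numpy as np

# Toy DisCoCat evaluation of "Alice likes Bob" as a tensor contraction.
# Dimensions d_n (noun space) and d_s (sentence space) are hypothetical.
d_n, d_s = 2, 2

rng = np.random.default_rng(0)
alice = rng.random(d_n)              # word state of type n
bob = rng.random(d_n)                # word state of type n
likes = rng.random((d_n, d_s, d_n))  # word state of type n^r . s . n^l

# The two cups contract the noun wires; a sentence-space vector remains.
sentence = np.einsum('i,isj,j->s', alice, likes, bob)
print(sentence.shape)  # (2,)
```

On quantum hardware, the same contraction is realized by preparing the word states as quantum states and implementing the cups as Bell effects.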

3 Related work

The availability of Noisy Intermediate-Scale Quantum (NISQ) devices has already enabled researchers to run simple QNLP experiments on quantum platforms [19, 20]. Despite the NISQ devices’ limitations, those early experiments are critical to better understanding the process, technicalities, and unique nature of this new computational paradigm [21]. Lorenz et al. [19] and Meichanetzidis et al. [20] experimented with QNLP classification models on both quantum hardware and classical noiseless simulation. Karamlou et al. [22] implemented a hybrid quantum-classical sentence generator based on the procedural generation method. Additionally, Kartsaklis et al. [11] gave a detailed overview of the lambeq package with experimental results for different training models. Khatri et al. [23] evaluated different ansatz designs when used as a QNLP classifier and also evaluated the model against barren plateau symptoms [24]. In an attempt to compare how grammatical structure is unique to every language, Abbas-Zadeh et al. [25] compared two parametrized quantum circuits corresponding to two synonymous sentences in two different languages, English and Persian. Waseem et al. [26] analyzed the DisCoCirc [27, 28] text circuits developed for two distinct languages and concluded that the grammatical structure variations between English and Urdu vanish when transformed into text circuits, since they have the same meaning.

The scope of QNLP experiments is expanding, with much currently active research work. In the scope of evaluating Quantum Neural Network (QNN) ansatz circuits in classification experiments, the authors in [29] evaluated nineteen different ansatz designs when embedded as binary variational quantum classifiers. Hubregtsen et al. [30] empirically demonstrated the correlation between variational circuits’ expressibility values and their classification accuracy values. A hybrid quantum-classical binary classifier was created by Arthur et al. [31] and tested on both simulated and real quantum hardware. ElMahalawy et al. [32] compared the performance of a QNN model with other ML methods in evaluating the optoelectronic performance of an NTCDA/p-Si UV photodiode.

4 QNLP stages

Starting with input sentences, the QNLP model must convert the input dataset into quantum states while preserving semantic meaning. As shown in Fig. 2, a typical QNLP experiment can be divided into the following sub-modules:

  • Sentence to Diagram Conversion After reading the dataset, each sentence is converted into its corresponding DisCoCat diagram. Each word is assigned an atomic type based on where it occurs in the sentence. An atomic type can be a noun, a verb, an adjective, etc.

  • Diagram Optimization Due to the limitations of quantum hardware, optimizing the diagram is critical to ensure the QNLP model’s performance. Using rewrite rules, a diagram can be transformed with the aim of minimizing the resulting circuit width, i.e., the number of qubits used.

  • Diagram to Circuit Conversion To create a sentence circuit from a diagram, each wire is mapped to a qubit system and each box to a variational quantum circuit. Lambeq [11] provides an interface for creating quantum ansatz circuits as well as classical tensor networks. In this study, the experimental results of quantum ansatz circuits are compared to those of classical tensor networks.

  • Model Building The training model under study is determined by the backend to be tested. For the noiseless quantum simulation experiments, a NumPyModel from the lambeq package is used for training. On the other hand, a PyTorch model is trained in the classical experiment. A TketModel is used for the rest of the experiments on the noisy Qiskit Aer simulation [33].

  • Model Training/Validation The accuracy and loss values for both the training and validation datasets are examined for a sampling of the training epochs.

5 Sentence to diagram conversion

First, a pregroup grammatical derivation of the sentences is performed. This grammatical derivation is then depicted as a monoidal diagram. The main approach consists of considering the diagram as a process, an approach inspired by functional programming. Diagrams are read from top to bottom and are made up of basic processes depicted as boxes with input and output wires carrying types. In Fig. 4, one can see that the atomic noun type n is assigned to the nouns “Alice” and “Bob.” However, nouns alone do not constitute a sentence. To form a complete sentence, one must combine a noun with at least one verb.

Fig. 4
figure 4

Monoidal diagram example

The transitive verb “likes" is used in the preceding example. To form a sentence, a transitive verb requires a noun on both its right and left sides. In other words, a transitive verb is algebraically assigned to the pregroup type \(n^\textrm{r} \cdot s \cdot n^\textrm{l}.\) The superscripts \(\cdot ^\textrm{r}\) and \(\cdot ^\textrm{l}\) denote right and left adjoints, respectively. A right adjoint cancels out a noun on its left side, i.e., \(n \cdot n^\textrm{r} \xrightarrow {} 1\), and a left adjoint cancels out a noun on its right side: \(n^\textrm{l} \cdot n \xrightarrow {} 1\). Hence, the complete type reduction for the sentence “Alice likes Bob” looks like this:

$$\begin{aligned} n \cdot n^\textrm{r} \cdot s \cdot n^\textrm{l} \cdot n \xrightarrow {} s \cdot n^\textrm{l} \cdot n \xrightarrow {} s \end{aligned}$$
(1)

The type reduction yields the atomic sentence type s, which indicates a correct grammatical structure. The “cups” (\(\bigcup \)) denote the grammar reductions. This was the compositional part. To give this diagram distributional meaning, simply reinterpret it as a quantum process, where word states are pure quantum states and verbs are maximally entangled “Bell” effects. Each type is assigned a vector space dimension or, equivalently, a number of qubits. These are regarded as model-specific hyperparameters. Consider a set of classical parameters for each word that describes the state preparation in terms of quantum word embeddings. These parameters could be ansatz circuit angles (phases) defined on a number of qubits, which are specified by the word types. In other words, both the compositional and distributional parts of the setup are specified. The final step is to run this quantum process by preparing the quantum word states and then applying the “grammar-aware” Bell effects. Thus, in a Hilbert space of dimension \(2^{q_s}\), one has created a quantum state that encodes the meaning of the entire sentence.
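The reduction in Eq. (1) can be sketched as a simple cancellation procedure. The following illustrative Python snippet (not lambeq’s parser; the encoding of adjoints as −1/0/+1 is an assumption made here) reduces the pregroup type of “Alice likes Bob” to the sentence type s:

```python
# Minimal sketch of pregroup type reduction (illustrative only).
# A type is a list of (base, adjoint) pairs, where adjoint is -1 for a
# left adjoint (x^l), 0 for a plain type, and +1 for a right adjoint (x^r).
def reduce_types(types):
    """Repeatedly cancel adjacent pairs x . x^r and x^l . x."""
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            (b1, a1), (b2, a2) = types[i], types[i + 1]
            if b1 == b2 and ((a1 == 0 and a2 == 1) or (a1 == -1 and a2 == 0)):
                del types[i:i + 2]  # apply the cup (grammar reduction)
                changed = True
                break
    return types

# "Alice likes Bob":  n . (n^r s n^l) . n
sentence = [('n', 0), ('n', 1), ('s', 0), ('n', -1), ('n', 0)]
print(reduce_types(sentence))  # [('s', 0)]
```

Each `del` step corresponds to one cup in the diagram, matching the two reductions shown in Eq. (1).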

5.1 Diagram optimization

Removing as many cups as possible has a computational advantage in terms of decreasing the width of the corresponding circuit [19], since each cup leads to 2\(q_n\) (or 2\(q_s\)) qubits that need to be post-selected. As shown in Fig. 5, 6 out of 7 qubits would have had to be post-selected in the unreduced diagram, versus 4 out of 5 in the reduced one. With slightly longer sentences and a limited number of shots on actual quantum hardware, there are severe statistical limitations on the ability to accurately estimate the desired circuit outcome.

Fig. 5
figure 5

Diagram optimization by cup removal
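The qubit counts quoted for Fig. 5 can be reproduced with a back-of-the-envelope calculation. The sketch below assumes \(q_n = q_s = 1\), three cups in the unreduced diagram and two in the reduced one (consistent with the quoted counts), and applies the rule stated above that each cup forces 2q qubits to be post-selected; the helper function is illustrative, not part of lambeq:

```python
# Back-of-the-envelope qubit accounting for cup removal (illustrative
# helper): each cup forces 2 * q qubits to be post-selected.
def circuit_cost(total_qubits, n_cups, q=1):
    """Return (circuit width, number of post-selected qubits)."""
    return total_qubits, 2 * q * n_cups

# Counts consistent with Fig. 5 (three cups unreduced, two reduced):
print(circuit_cost(7, 3))  # (7, 6): 6 of 7 qubits post-selected
print(circuit_cost(5, 2))  # (5, 4): 4 of 5 qubits post-selected
```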

6 Diagram to circuit conversion

The abstract DisCoCat diagram [17] representation is given a more concrete form in this step: the DisCoCat diagram is mapped to a specific quantum circuit. This map is determined by: (a) selecting the numbers \(q_n\) and \(q_s\) of qubits to which each wire of type n and s, as well as dual types thereof, is mapped; and (b) selecting concrete parametrized quantum states (effects) with which all word states (effects) are consistently replaced. The combination of such choices is known as an ansatz. Each cup is replaced by its equivalent Bell effect. An example is shown in Fig. 6.

Fig. 6
figure 6

Diagram to circuit example

It is important to note that the mapping output is a circuit whose connectivity is determined by the syntax of the sentence, whereas the choice of ansatz determines the number of parameters for each word’s representation. When choosing the ansatz design, one should mind the parameterization size problem for two reasons: First, the number of parameters p needed to prepare an arbitrary state is exponential in the number of qubits q, so the parameter count should remain feasible for the sentence length and the size of the dataset. Second, each quantum device might have a different set of “native” quantum gates with different decoherence error values. These factors should be considered when choosing a suitable parametrized quantum circuit (ansatz).

7 Experimental settings

This section highlights the details of the QNLP experiments. The model was built and tested using the lambeq package. As illustrated in Fig. 7, lambeq integrates the DisCoCat package and the AI model used for training. The quantum trainer is implemented and run on the TKet platform. The noise effect is introduced through an IBM Qiskit Aer simulator.

Fig. 7
figure 7

Experimental environment

7.1 Input sentence parsing

One hundred pairs of sentencesFootnote 1 are fed into the model for training and validation. Each sentence belongs to one of two topics, “food” or “IT.” Each pair is associated with a label that indicates whether the pair of sentences is matched or unmatched, i.e., whether the two sentences belong to the same topic, as shown in Fig. 8. The dataset is clean and balanced. It is divided into training and test sets in a 4:1 ratio.

Fig. 8
figure 8

The training/validation dataset

The language diagrams for the classical-quantum trainer comparison experiment were produced using the BobcatParser. For the remaining experiments, rewrite rules were combined with the “spiders_reader” to reduce the resulting circuit depth and improve experiment performance. Each diagram is then converted into a predefined ansatz circuit. The effect of the ansatz circuit is evaluated based on the model convergence in each experiment.

7.2 Quantum and classical trainer

Two different quantum models are tested on two different backends to study the noise effect on the classification experiments. A NumPyModel is used with JAX to run the noiseless simulation. A TketModel is used with the Qiskit [33] Aer simulator and the “ibmq_manila” noise model. In the classical experiment, the first step in creating a tensor network is to translate the pairs of sentences into string diagrams using the sentences2diagrams module. The classical pipeline uses the SpiderAnsatz, with two dimensions for each of the noun and phrase spaces. The PyTorch backend uses the Adam optimizer with binary cross entropy as the loss function, completing the training on Google’s TensorNetwork library. Plots for accuracy and loss on the training and development sets are shown in Fig. 9.
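For reference, the binary cross-entropy loss minimized by the classical trainer has a simple closed form. The numpy sketch below is a stand-in for the PyTorch implementation; the toy labels and predictions are invented for illustration:

```python
import numpy as np

# Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p)).
# Predictions are clipped to avoid log(0).
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Toy matched/unmatched labels and model outputs:
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.1, 0.8, 0.2])
print(round(binary_cross_entropy(y_true, y_pred), 4))  # 0.1643
```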

7.3 Ansatz circuits

In the context of quantum machine learning, an ansatz is a subroutine in variational circuits that consists of a sequence of gates applied to specific wires. Similar to the structure of a neural network, this only defines the base structure, while the types of gates and/or their free parameters can be optimized by the variational procedure. Numerous variational circuit ansatze have been proposed in recent related work [35,36,37]. The strength of an ansatz is determined by the desired use case, and it is not always obvious what constitutes a good ansatz. However, Sim et al. [35] suggested nineteen ansatz designs used successfully in variational algorithms [12, 38]. The QNLP experiments are tested on four different ansatz designs: (1) Instantaneous Quantum Polynomial (IQP) Ansatz: the default ansatz used for training in the lambeq package [11], (2) Circuit14: a modified version of one of Sim’s suggested circuits [35] for QML experiments, (3) Circuit15: a modified design from the same study by Sim et al. [35], and (4) StronglyEntanglingAnsatz: as defined in the PennyLane package [39, 40].
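To give a feel for how the designs differ in parameter budget, the sketch below counts trainable parameters per layer under assumed structures: (q − 1) controlled-phase angles per IQP layer, and three rotation angles per qubit per strongly entangling layer. Exact counts depend on the lambeq and PennyLane versions used, so these formulas are illustrative assumptions rather than library definitions:

```python
# Rough per-ansatz parameter counts (assumed structures, see lead-in).
def iqp_params(n_qubits, n_layers):
    # One Hadamard layer plus (q - 1) controlled-Rz phases per layer.
    return n_layers * (n_qubits - 1)

def strongly_entangling_params(n_qubits, n_layers):
    # Three Euler-angle rotations per qubit per layer.
    return n_layers * n_qubits * 3

for q in (2, 3, 4):
    print(q, iqp_params(q, 2), strongly_entangling_params(q, 2))
```

The strongly entangling design grows its parameter budget roughly three times faster per qubit, which is one reason it can converge faster on larger problem spaces at the cost of deeper circuits.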

8 Results

In this section, QNLP experimental results are presented with increasing complexity of the experimental setting. This study presents the effect of three variables on the QNLP model convergence: (1) NISQ device noise, (2) problem size, and (3) rewrite rules.

8.1 No rewrite rules

A noiseless simulation of a minimal-dimensional QNLP problem is presented without using any rewrite rules. Figure 9 demonstrates a 15% increase in both training and validation set classification accuracy over the classical experiment results.

Fig. 9
figure 9

Classical versus quantum experimental results without rewrite rules

8.2 Noiseless simulation experiments

The simplest case (shown in Fig. 10a) is a noiseless simulation run to classify a QNLP experiment with one-dimensional noun and sentence spaces on a single-layer ansatz circuit. As the depth of the ansatz circuit increases, it becomes necessary to use rewrite rules to improve experiment performance.

Fig. 10
figure 10

Experimenting with different Ansatze designs with a gradually complex problem

8.3 Introducing noise to experiments

The Qiskit Aer noisy simulator is then used to investigate the effect of noise on QNLP classification accuracy. The QNLP problem dimension is also gradually increased in terms of noun dimension, as well as the number of layers in the training ansatz circuit as shown in Fig. 10b–d.
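As a minimal picture of what a noisy backend does to the circuit states, the sketch below applies a single-qubit depolarizing channel, a simplified stand-in for the full Qiskit Aer device noise model (which also includes gate-dependent and readout errors), and shows how state fidelity degrades with the noise strength p:

```python
import numpy as np

# Depolarizing channel on a density matrix: rho -> (1-p)*rho + p*I/d.
def depolarize(rho, p):
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d

plus = np.array([[0.5, 0.5], [0.5, 0.5]])  # |+><+| projector
for p in (0.0, 0.1, 0.3):
    noisy = depolarize(plus, p)
    fidelity = np.trace(plus @ noisy).real  # <+|rho|+> = 1 - p/2
    print(p, round(fidelity, 3))
```

For this channel the fidelity falls linearly as 1 − p/2, which is consistent with the observed graceful, rather than catastrophic, degradation of classification accuracy under simulated noise.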

9 Discussion

By comparing the different circuits’ behaviors, one can notice that the QNLP model converges faster with a minimal problem space (\(q_s\) = \(q_n\) = 1). When increasing the QNLP problem space, the StronglyEntanglingAnsatz was the fastest to converge, followed by Circuit15. The improvement in QNLP convergence is in line with the hypothesis that various quantum machine learning models benefit from quantum properties like superposition and entanglement. Additionally, the continuous decrease in training and test loss values confirms that the model is not overfitting. When increasing the number of ansatz layers to two, the more basic ansatz designs converged faster, namely Circuit14 and the IQPAnsatz.

Prior to training, it was necessary to optimize the diagrams in order to reduce the runtime and resources needed for the experiments. Rewrite functions were applied uniformly across all tests, except the no-rewrite-rules experiment, to isolate the impact of the ansatz design. It is anticipated that diagram optimization prior to circuit formation improves the learning outcome and raises classification accuracy, particularly for grammatically complicated sentences. Rewrite rules and diagram optimization greatly improved classification accuracy regardless of the training circuit characteristics: the improvement over the classical trainer grew from 15% to 35% and 45% for the training and validation sets, respectively.

The tested QNLP models showed resilience to noise when tested on a Qiskit Aer simulator with an “ibmq_manila” noise model. Extending the meaning representation space by assigning more than three qubits to each AtomicType is one possible extension of the experiments.

9.1 Comparison with prior work

Our work differs from previous work in terms of the dataset used for training, as well as in introducing the noise effect into the classical simulation. Lorenz et al. [19] used the IQP ansatz to train their model while changing the problem size and using a noiseless classical simulation. A single run on quantum hardware was also performed, and the quantum device’s results also showed the model’s resilience to noise.

Khatri et al. [23] calculated the expressibility, defined as the Kullback–Leibler (KL) divergence between \(P_\textrm{Circ}\) and \(P_\textrm{Haar}\), for the DisCoCat circuits: the IQP, Circuit14, and Circuit15 ansatze. For the ensemble of Haar random states, the analytical form of the probability density function of fidelities is known:

$$\begin{aligned} P_\textrm{Haar}(F) = (N - 1) (1 - F)^{N-2} \end{aligned}$$

where F corresponds to the fidelity and N is the dimension of the Hilbert space [41]. After collecting sufficient samples of state fidelities, the Kullback–Leibler (KL) divergence [42], often used in machine learning applications, between the estimated fidelity distribution and that of the Haar-distributed ensemble can be computed to quantify expressibility (“Expr”):

$$\begin{aligned} \hbox {Expr} = D_\textrm{KL}\left( {\hat{P}}_\textrm{Circ}(F; \theta ) \,\Vert \, P_\textrm{Haar}(F)\right) . \end{aligned}$$

The Expr values for the mentioned ansatz designs were found to be low, and a low KL divergence corresponds to high expressibility when the circuit is used as a quantum classifier. This is why the corresponding accuracy values were high [30, 35].
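The expressibility estimate can be sketched end to end with numpy. In the snippet below the “circuit” ensemble is replaced by Haar-random states (an illustrative stand-in for sampled ansatz output states), so the resulting Expr value should be close to zero; the Haar density is used with the standard exponent N − 2:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_state(n, rng):
    # Normalized complex Gaussian vector = Haar-random pure state.
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

N = 4  # Hilbert-space dimension (two qubits)
# Sample fidelities |<psi|phi>|^2 between independent state pairs.
fids = np.array([abs(np.vdot(haar_state(N, rng), haar_state(N, rng))) ** 2
                 for _ in range(5000)])

bins = np.linspace(0, 1, 51)
p_circ, _ = np.histogram(fids, bins=bins, density=True)
centers = (bins[:-1] + bins[1:]) / 2
p_haar = (N - 1) * (1 - centers) ** (N - 2)  # Haar fidelity density

# Discretized KL divergence over non-empty bins.
mask = p_circ > 0
expr = np.sum(p_circ[mask] * np.log(p_circ[mask] / p_haar[mask])) \
       * (bins[1] - bins[0])
print(expr)  # close to zero for a Haar-like ensemble
```

For a real ansatz one would instead sample parameter vectors, run the circuit twice, and estimate the fidelity of the two output states; a small Expr then indicates the ansatz explores the state space almost as uniformly as Haar-random states.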

10 Conclusions

Quantum computing promises to reduce the computational cost of many problems in the near future. For instance, a quantum computational advantage could be achieved over some classical NLP models using currently available quantum devices. In the upcoming years, experimental comparison with larger deep learning language models may become feasible as quantum technology continues to scale up [22].

In this work, an increase in classification accuracy was achieved on noiseless and noisy classical simulations. The article goes through the details of the different QNLP experiment stages while explaining the uniqueness of a QNLP experiment. In terms of problem size, QNLP stands out among other types of classification experiments. When the word meaning dimension is increased, the problem size grows exponentially, which affects the classification process. Another scalability factor is the sentence length. Depending on how many qubits are assigned to each pregroup type, the number of qubits on which a sentence’s corresponding quantum circuit is constructed, i.e., the circuit width, will scale with the sentence’s length. More importantly, the cost of post-selection increases exponentially with longer sentences. Notably, in the long run one should use more advanced protocols that require only one qubit to be measured instead of aiming to post-select, leading to additive approximations of an amplitude encoding a tensor contraction [43]. Essentially, sentences in natural language are typically upper bounded in length, so these can be considered initial fixed costs.

As expected from variational circuits [44], the tested QNLP models showed resilience to noise when tested on a classical simulator with an induced noise model. On the other hand, the QNLP model training typically takes more than one hour to complete; one possible improvement is to run the training on multiple NISQ devices in parallel. Another possible improvement is using more advanced rewrite rules borrowed from classical NLP methods like those used by Le et al. [45] and Kha et al. [46].

The type of quantum advantage over any classical algorithm varies with the problem and is determined by the task at hand. The expressibility of quantum models is a reasonable direction for attempting to establish a quantum advantage in the NISQ era. However, in order to make a fair comparison to prove a quantum advantage, both classical and quantum models must be placed on equal terms. One possible approach is using tools from information geometry [47], rather than simply trying highly expressive circuits intuitively.